This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.
OpenAI
The OpenAI Model Function allows Flink SQL to call the OpenAI API for inference tasks.
Overview
The function lets Flink SQL call remote OpenAI model services for prediction (inference) tasks. The following tasks are currently supported:
- Chat Completions: generate a model response from a list of messages comprising a conversation.
- Embeddings: get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.
Usage examples
The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews.
First, create the chat completions model with the following SQL statement:
```sql
CREATE MODEL ai_analyze_sentiment
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/chat/completions',
  'api-key' = '<YOUR KEY>',
  'model' = 'gpt-3.5-turbo',
  'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
);
```
Suppose the input data is stored in a view named `movie_comment` and the prediction results are written to a table named `print_sink`:
```sql
CREATE TEMPORARY VIEW movie_comment(id, movie_name, user_comment, actual_label)
AS VALUES
  -- Single quotes inside a SQL string literal are escaped by doubling them ('')
  (1, 'Good Stuff', 'The part where children guess the sounds is my favorite. It''s a very romantic narrative compared to other movies I''ve seen. Very gentle and full of love.', 'positive');

CREATE TEMPORARY TABLE print_sink(
  id BIGINT,
  movie_name VARCHAR,
  predicted_label VARCHAR,
  actual_label VARCHAR
) WITH (
  'connector' = 'print'
);
```
Then the following SQL statement predicts a sentiment label for each movie review and writes it to `print_sink`:
```sql
INSERT INTO print_sink
SELECT id, movie_name, content AS predicted_label, actual_label
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
```
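Each row in `movie_comment` also carries an `actual_label`, so the predictions can be checked against it. The query below is a minimal sketch and is not part of the original example. The column name `is_correct` is hypothetical. The `LOWER(TRIM(...))` normalization assumes the model might return the label with different casing or surrounding whitespace, despite the "Output only the label" instruction.

```sql
-- Sketch: compare the model's label with the ground-truth label per review.
SELECT
  id,
  movie_name,
  content AS predicted_label,
  actual_label,
  -- Normalize casing and whitespace before comparing (assumption: the model
  -- may not return the label in exactly the expected form)
  LOWER(TRIM(content)) = actual_label AS is_correct
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
```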
Model Options
Common
Option | Required | Default | Type | Description |
---|---|---|---|---|
provider | required | (none) | String | Specifies the model function provider to use; must be 'openai'. |
endpoint | required | (none) | String | Full URL of the OpenAI API endpoint, e.g. https://api.openai.com/v1/chat/completions or https://api.openai.com/v1/embeddings. |
api-key | required | (none) | String | OpenAI API key used for authentication. |
model | required | (none) | String | Model name, e.g. gpt-3.5-turbo or text-embedding-ada-002. |
Chat Completions
Option | Required | Default | Type | Description |
---|---|---|---|---|
system-prompt | optional | "You are a helpful assistant." | String | Content of the message sent with the system role. |
temperature | optional | null | Double | Controls the randomness of the output, range [0.0, 1.0]. See `temperature` in the OpenAI API reference. |
top-p | optional | null | Double | Probability cutoff for token selection (use instead of temperature). See `top_p` in the OpenAI API reference. |
stop | optional | null | String | Stop sequences, given as a comma-separated list. See `stop` in the OpenAI API reference. |
max-tokens | optional | null | Long | Maximum number of tokens to generate. See `max_tokens` in the OpenAI API reference. |
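The following sketch shows how these options combine. It defines a variant of `ai_analyze_sentiment` with generation options set. The model name `ai_analyze_sentiment_tuned` and the specific values are illustrative assumptions, not recommendations. As in the usage example, every option value is written as a string in the `WITH` clause. Only `temperature` is set, because `top-p` is meant to be used instead of it.

```sql
-- Hypothetical variant of ai_analyze_sentiment with chat options tuned.
CREATE MODEL ai_analyze_sentiment_tuned
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/chat/completions',
  'api-key' = '<YOUR KEY>',
  'model' = 'gpt-3.5-turbo',
  'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.',
  -- Low randomness so the same review tends to get the same label
  'temperature' = '0.0',
  -- A single label needs only a few tokens (illustrative value)
  'max-tokens' = '5'
);
```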
Embeddings
Option | Required | Default | Type | Description |
---|---|---|---|---|
dimension | optional | null | Long | Dimension of the embedding vector. See `dimensions` in the OpenAI API reference. |
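The following sketch defines an embeddings model and writes one vector per review to a print sink, reusing `movie_comment` from the usage example. The names `ai_embed_comment` and `embedding_sink` and the `embedding` output column are hypothetical. The sketch assumes, as in the chat example, that the single OUTPUT column only needs the right type, not a particular name. The `dimension` value assumes a model that supports shortened embeddings. According to OpenAI's documentation, that means the text-embedding-3 family; `text-embedding-ada-002` does not support it. The OUTPUT column is declared as ARRAY<FLOAT>, matching the schema requirement for embeddings.

```sql
-- Hypothetical embeddings model; endpoint points at the embeddings API.
CREATE MODEL ai_embed_comment
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
  'provider' = 'openai',
  'endpoint' = 'https://api.openai.com/v1/embeddings',
  'api-key' = '<YOUR KEY>',
  'model' = 'text-embedding-3-small',
  -- Request shorter vectors (illustrative value)
  'dimension' = '256'
);

CREATE TEMPORARY TABLE embedding_sink(
  id BIGINT,
  movie_name VARCHAR,
  embedding ARRAY<FLOAT>
) WITH (
  'connector' = 'print'
);

-- Embed each review text and print the resulting vector.
INSERT INTO embedding_sink
SELECT id, movie_name, embedding
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_embed_comment,
  DESCRIPTOR(user_comment));
```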
Schema Requirement
Task | Input Type | Output Type |
---|---|---|
Chat Completions | STRING | STRING |
Embeddings | STRING | ARRAY<FLOAT> |
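Both tasks take a single STRING input. If the model should see several columns, they must first be combined into one string. The following is a minimal sketch. It assumes the `movie_comment` view and the `ai_analyze_sentiment` model from the usage example. The view name `movie_comment_with_title` and the prompt layout are hypothetical.

```sql
-- Combine the title and the review into one STRING column for the model input.
CREATE TEMPORARY VIEW movie_comment_with_title AS
SELECT
  id,
  movie_name,
  CONCAT('Movie: ', movie_name, '. Review: ', user_comment) AS review_text,
  actual_label
FROM movie_comment;

-- DESCRIPTOR selects the single STRING column passed to the model.
SELECT id, movie_name, content AS predicted_label, actual_label
FROM ML_PREDICT(
  TABLE movie_comment_with_title,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(review_text));
```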