This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.

OpenAI #

The OpenAI Model Function allows Flink SQL to call the OpenAI API for inference tasks.

Overview #

The function supports calling remote OpenAI model services via Flink SQL for prediction/inference tasks. Currently, the following tasks are supported:

Chat Completions: generate a model response from a list of messages comprising a conversation.
Embeddings: get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

Usage examples #

The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews.

First, create the chat completions model with the following SQL statement:

CREATE MODEL ai_analyze_sentiment
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
    'provider'='openai',
    'endpoint'='https://api.openai.com/v1/chat/completions',
    'api-key' = '<YOUR KEY>',
    'model'='gpt-3.5-turbo',
    'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
);

Suppose the following data is available in a view named movie_comment, and the prediction results are to be written to a table named print_sink:

CREATE TEMPORARY VIEW movie_comment(id, movie_name,  user_comment, actual_label)
AS VALUES
  (1, 'Good Stuff', 'The part where children guess the sounds is my favorite. It''s a very romantic narrative compared to other movies I''ve seen. Very gentle and full of love.', 'positive');

CREATE TEMPORARY TABLE print_sink(
  id BIGINT,
  movie_name VARCHAR,
  predict_label VARCHAR,
  actual_label VARCHAR
) WITH (
  'connector' = 'print'
);

Then the following SQL statement can be used to predict sentiment labels for movie reviews:

INSERT INTO print_sink
SELECT id, movie_name, content AS predict_label, actual_label
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
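
Because each row carries both the model output and the expected label, the two can also be compared directly in a query. The following is a minimal sketch that reuses the model and view defined above. The column names `predicted_label` and `is_correct` are hypothetical, and the lowercase/trim normalization is an assumption about how the model may format its answer, not something the function guarantees.

```sql
-- Sketch: flag rows where the model's label matches the expected label.
SELECT
  id,
  movie_name,
  content AS predicted_label,
  actual_label,
  -- Normalize the raw model output before comparing (assumed to be needed).
  LOWER(TRIM(BOTH ' ' FROM content)) = actual_label AS is_correct
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
```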

Model Options #

Common #

Option	Required	Default	Type	Description
provider	required	(none)	String	Specifies the model function provider to use, must be 'openai'.
endpoint	required	(none)	String	Full URL of the OpenAI API endpoint, e.g. `https://api.openai.com/v1/chat/completions` or `https://api.openai.com/v1/embeddings`.
api-key	required	(none)	String	OpenAI API key for authentication.
model	required	(none)	String	Model name, e.g. `gpt-3.5-turbo`, `text-embedding-ada-002`.

Chat Completions #

Option	Required	Default	Type	Description
system-prompt	optional	"You are a helpful assistant."	String	The input message for the system role.
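temperature	optional	null	Double	Controls the randomness of the output, range `[0.0, 1.0]`. See the `temperature` parameter in the OpenAI API reference.
top-p	optional	null	Double	Probability cutoff for token selection (nucleus sampling), used as an alternative to temperature. See the `top_p` parameter in the OpenAI API reference.
stop	optional	null	String	Stop sequences, given as a comma-separated list. See the `stop` parameter in the OpenAI API reference.
max-tokens	optional	null	Long	Maximum number of tokens to generate. See the `max_tokens` parameter in the OpenAI API reference.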
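
As an illustration of how the optional settings above might be combined, the following sketch defines a variant of the sentiment model from the usage example. The model name `ai_analyze_sentiment_tuned` and the option values are hypothetical and chosen only for demonstration; suitable values depend on the model and the task.

```sql
-- Sketch: sentiment classifier with optional sampling controls.
CREATE MODEL ai_analyze_sentiment_tuned
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'api-key' = '<YOUR KEY>',
    'model' = 'gpt-3.5-turbo',
    'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.',
    -- Illustrative value: a low temperature for more repeatable labels.
    'temperature' = '0.0',
    -- Illustrative value: a single label needs only a few tokens.
    'max-tokens' = '8'
);
```

A low temperature and a small token budget are a plausible choice for a classification task, where only a short, repeatable label is wanted. `top-p` could be set in place of `temperature` in the same way.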

Embeddings #

Option	Required	Default	Type	Description
dimension	optional	null	Long	Dimension of the embedding vector. See the `dimensions` parameter in the OpenAI API reference.

Schema Requirement #

Task	Input Type	Output Type
Chat Completions	STRING	STRING
Embeddings	STRING	ARRAY<FLOAT>
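
To illustrate the embeddings schema, the following sketch declares a model with a STRING input and an ARRAY<FLOAT> output and applies it to the movie_comment view from the usage example. The model name `text_embedder` and the output column name `embedding` are hypothetical. The `dimension` option is omitted here and could be added if the chosen model supports it.

```sql
-- Sketch: embeddings model matching the STRING -> ARRAY<FLOAT> schema.
CREATE MODEL text_embedder
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/embeddings',
    'api-key' = '<YOUR KEY>',
    'model' = 'text-embedding-ada-002'
);

-- Compute one embedding vector per review.
SELECT id, movie_name, embedding
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL text_embedder,
  DESCRIPTOR(user_comment));
```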
