OpenAI #

The OpenAI Model Function allows Flink SQL to call the OpenAI API for inference tasks.

Overview #

The function supports calling remote OpenAI model services via Flink SQL for prediction/inference tasks. Currently, the following tasks are supported:

  • Chat Completions: generate a model response from a list of messages comprising a conversation.
  • Embeddings: get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

Usage examples #

The following example creates a chat completions model and uses it to predict sentiment labels for movie reviews.

First, create the chat completions model with the following SQL statement:

CREATE MODEL ai_analyze_sentiment
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
    'provider'='openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'api-key' = '<YOUR KEY>',
    'model'='gpt-3.5-turbo',
    'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
);

Suppose the input data is available in a view named movie_comment, and the prediction results are to be written to a table named print_sink:

CREATE TEMPORARY VIEW movie_comment(id, movie_name, user_comment, actual_label)
AS VALUES
  (1, 'Good Stuff', 'The part where children guess the sounds is my favorite. It''s a very romantic narrative compared to other movies I''ve seen. Very gentle and full of love.', 'positive');

CREATE TEMPORARY TABLE print_sink(
  id BIGINT,
  movie_name VARCHAR,
  predicted_label VARCHAR,
  actual_label VARCHAR
) WITH (
  'connector' = 'print'
);

Then the following SQL statement can be used to predict sentiment labels for movie reviews:

INSERT INTO print_sink
SELECT id, movie_name, content AS predicted_label, actual_label
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
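
Before writing to a sink, it can help to preview the predictions next to the known labels. The query below is a minimal sketch that reuses the view and model defined above. The column name is_match is hypothetical, and the LOWER/TRIM normalization assumes the model may return the label with different casing or surrounding whitespace.

```sql
-- Sketch: compare the model output with the labelled value for each row.
-- `is_match` is an illustrative column name, not part of the model function.
SELECT
  id,
  movie_name,
  content AS predicted_label,
  actual_label,
  (LOWER(TRIM(content)) = actual_label) AS is_match
FROM ML_PREDICT(
  TABLE movie_comment,
  MODEL ai_analyze_sentiment,
  DESCRIPTOR(user_comment));
```

The normalization is only a convenience for comparison; the raw model response is still available in the content column.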

Model Options #

Common #

Option | Required | Default | Type | Description
provider | required | (none) | String | Specifies the model function provider to use; must be 'openai'.
endpoint | required | (none) | String | Full URL of the OpenAI API endpoint, e.g. https://api.openai.com/v1/chat/completions or https://api.openai.com/v1/embeddings.
api-key | required | (none) | String | OpenAI API key for authentication.
model | required | (none) | String | Model name, e.g. gpt-3.5-turbo, text-embedding-ada-002.

Chat Completions #

Option | Required | Default | Type | Description
system-prompt | optional | "You are a helpful assistant." | String | The input message for the system role.
temperature | optional | null | Double | Controls the randomness of the output; range [0.0, 1.0]. See the OpenAI temperature parameter.
top-p | optional | null | Double | Probability cutoff for token selection (used instead of temperature). See the OpenAI top_p parameter.
stop | optional | null | String | Stop sequences, given as a comma-separated list. See the OpenAI stop parameter.
max-tokens | optional | null | Long | Maximum number of tokens to generate. See the OpenAI max_tokens parameter.
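
As an illustration of how these options fit together, the following sketch defines a chat completions model with the optional settings filled in. The model name ai_analyze_sentiment_tuned and all option values are assumptions chosen for a short, near-deterministic classification response; they are not recommended defaults. Option values are written as string literals, as in the other WITH clauses on this page.

```sql
-- Sketch: chat completions model with optional tuning options set.
-- The model name and every value below are illustrative assumptions.
CREATE MODEL ai_analyze_sentiment_tuned
INPUT (`input` STRING)
OUTPUT (`content` STRING)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'api-key' = '<YOUR KEY>',
    'model' = 'gpt-3.5-turbo',
    'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.',
    -- Low randomness, since a single label is expected.
    'temperature' = '0.0',
    -- A label needs only a few tokens.
    'max-tokens' = '8',
    -- Two stop sequences, separated by a comma.
    'stop' = 'END,STOP'
);
```

Because top-p is described as an alternative to temperature, the sketch sets only one of the two.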

Embeddings #

Option | Required | Default | Type | Description
dimension | optional | null | Long | Dimension of the embedding vector. See the OpenAI dimensions parameter.
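
A minimal sketch of an embeddings model that sets the dimension option is shown below. The model name ai_embed_text, the output column name embedding, and the value 256 are assumptions for illustration. It also assumes an OpenAI model that accepts a requested output size, such as text-embedding-3-small; not every embedding model does.

```sql
-- Sketch: embeddings model with a reduced output dimension.
-- `ai_embed_text`, `embedding`, and '256' are illustrative choices.
CREATE MODEL ai_embed_text
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/embeddings',
    'api-key' = '<YOUR KEY>',
    'model' = 'text-embedding-3-small',
    'dimension' = '256'
);
```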

Schema Requirement #

Task | Input Type | Output Type
Chat Completions | STRING | STRING
Embeddings | STRING | ARRAY<FLOAT>
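
To show how these types appear in practice, the sketch below runs an embeddings model end to end, in the same way as the chat completions example earlier on this page. All object names (ai_embed_comment, comment_text, embedding_sink) and the sample row are hypothetical. The point to note is that the model's OUTPUT column and the sink column are both declared as ARRAY<FLOAT>, while the input column passed through DESCRIPTOR is a STRING.

```sql
-- Sketch: STRING input, ARRAY<FLOAT> output for the embeddings task.
-- All names and the sample row are illustrative assumptions.
CREATE MODEL ai_embed_comment
INPUT (`input` STRING)
OUTPUT (`embedding` ARRAY<FLOAT>)
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/embeddings',
    'api-key' = '<YOUR KEY>',
    'model' = 'text-embedding-ada-002'
);

CREATE TEMPORARY VIEW comment_text(id, user_comment)
AS VALUES
  (1, 'Very gentle and full of love.');

CREATE TEMPORARY TABLE embedding_sink(
  id BIGINT,
  embedding ARRAY<FLOAT>
) WITH (
  'connector' = 'print'
);

INSERT INTO embedding_sink
SELECT id, embedding
FROM ML_PREDICT(
  TABLE comment_text,
  MODEL ai_embed_comment,
  DESCRIPTOR(user_comment));
```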