Oracle Vector AI

The document contains a series of questions and answers related to vector embeddings, similarity search, and database operations in Oracle Database 23ai. It covers topics such as storage options for vector embeddings, factors affecting similarity search results, DDL operations on VECTOR columns, and the use of various SQL and PL/SQL functions. Additionally, it discusses the integration of Generative AI with Autonomous Database and the implications of different distance metrics in vector searches.

Uploaded by

riyasathsafran

1. When generating vector embeddings outside the database, what is the most suitable option for storing the embeddings for later use?

1. In a CSV file

2. In a binary FVEC file with the relational data in a CSV file

3. In the database as BLOB (Binary Large Object) data

4. In a dedicated vector database

2. When generating vector embeddings for a new dataset outside of Oracle Database

23ai, which factor is crucial to ensure meaningful similarity search results?

1. The choice of programming language used to process the dataset (for

example, Python, Java)

2. The physical location where the vector embeddings are stored

3. The storage format of the new dataset (for example, CSV, JSON)

4. The same vector embedding model must be used for vectorizing the data

and creating a query vector

3. You are working with vector search in Oracle Database 23ai and need to ensure the integrity of your vector data during storage and retrieval. Which factor is crucial for maintaining the accuracy and reliability of your vector search results?

1. Using the same embedding model for both vector creation and similarity

search

2. Regularly updating vector embeddings to reflect changes in the source

data

3. The specific distance algorithm employed for vector comparisons

4. The physical storage location of the vector data

4. Which DDL operation is NOT permitted on a table containing a VECTOR column in

Oracle Database 23ai?

1. Creating a new table using CTAS (CREATE TABLE AS SELECT) that includes the VECTOR column from the original table

2. Dropping an existing VECTOR column from the table

3. Modifying the data type of an existing VECTOR column to a non-VECTOR

type

4. Adding a new VECTOR column to the table

5. Which SQL statement correctly adds a VECTOR column named v with 4 dimensions and FLOAT32 format to an existing table named my_table?

1. ALTER TABLE my_table MODIFY (V VECTOR(4, FLOAT32))

2. ALTER TABLE my_table ADD (V VECTOR(4, FLOAT32))

3. UPDATE my_table SET v = VECTOR(4, FLOAT32)

4. ALTER TABLE my_table ADD v VECTOR(4, FLOAT32)

6. A machine learning team is using IVF indexes in Oracle Database 23ai to find similar images in a large dataset. During testing, they observe that the search results are often incomplete, missing relevant images. They suspect the issue lies in the number of partitions probed. How should they improve the search accuracy?

1. Add the TARGET ACCURACY clause to the query with a higher value for

the accuracy.

2. Change the index type to HNSW for better accuracy.

3. Increase the VECTOR_MEMORY_SIZE initialization parameter.

4. Re-create the index with a higher EFCONSTRUCTION value.

7. What happens when querying with an IVF index if you increase the value of the NEIGHBOR PARTITION PROBES parameter?

1. The number of centroids decreases.

2. Accuracy decreases.

3. Index creation time is reduced.

4. More partitions are probed, improving accuracy, but also increasing query
latency
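
The trade-off between probed partitions, accuracy, and work done can be illustrated with a toy partitioned index. This is a minimal pure-Python sketch, not Oracle's implementation: the grid centroids, the random 2-D points, and the helper names `ivf_search` and `recall` are all assumptions made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable
points = [(random.random(), random.random()) for _ in range(400)]

# Hypothetical centroids on a 3x3 grid; a real IVF index derives them from the data.
centroids = [(x / 6, y / 6) for x in (1, 3, 5) for y in (1, 3, 5)]

# Assign every point to the partition of its nearest centroid.
partitions = [[] for _ in centroids]
for p in points:
    nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
    partitions[nearest].append(p)

def ivf_search(query, k, probes):
    """Scan only the `probes` partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [p for i in order[:probes] for p in partitions[i]]
    return sorted(candidates, key=lambda p: math.dist(query, p))[:k], len(candidates)

def recall(query, k, probes):
    """Fraction of the true top-k (from a full scan) that the probed search returns."""
    truth = set(sorted(points, key=lambda p: math.dist(query, p))[:k])
    found, _ = ivf_search(query, k, probes)
    return len(truth & set(found)) / k

query = (0.34, 0.66)  # deliberately close to a partition boundary
for probes in (1, 3, 9):
    _, scanned = ivf_search(query, 10, probes)
    print(probes, scanned, recall(query, 10, probes))
```

Probing more partitions scans more candidates, so recall can only stay the same or improve, at the cost of more distance computations per query.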

8. Which PL/SQL package is primarily used for interacting with Generative AI services in Oracle Database 23ai?

1. DBMS_AI

2. DBMS_ML

3. DBMS_VECTOR_CHAIN

4. DBMS_GENAI

9. Which SQL function is used to create a vector embedding for a given text string in Oracle Database 23ai?

1. GENERATE_EMBEDDING

2. CREATE_VECTOR_EMBEDDING

3. EMBED_TEXT

4. VECTOR_EMBEDDING

10. Which PL/SQL function converts documents such as PDF, DOC, JSON, XML, or HTML to plain text?

1. DBMS_VECTOR.TEXT_TO_PLAIN

2. DBMS_VECTOR_CHAIN.UTL_TO_TEXT

3. DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS

4. DBMS_VECTOR.CONVERT_TO_TEXT

11. What is the primary purpose of the DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS function in a RAG application?

1. To generate vector embeddings from a text document

2. To load a document into the database

3. To split a large document into smaller chunks to improve vector quality by

minimizing token truncation

4. To convert a document into a single, large text string.

12. What is the first step in setting up the practice environment for Select AI?

1. Optionally create an OCI compartment.

2. Create a policy to enable access to OCI Generative AI.

3. Drop any compartment that does not use OCI Generative AI.

13. How is the security interaction between Autonomous Database and OCI Generative AI managed in the context of Select AI?

1. By encrypting all communication between the Autonomous Database and OCI Generative AI using TLS/SSL protocols

2. By utilizing Resource Principals, which grant the Autonomous Database instance access to OCI Generative AI without exposing sensitive credentials

3. By establishing a secure VPN tunnel between the Autonomous Database and OCI Generative AI service

4. By requiring users to manually enter their OCI API keys each time they execute a natural language query

14. You are storing 1,000 embeddings in a VECTOR column, each with 256

dimensions using FLOAT32. What is the approximate size of the data on disk?

1. 1 MB

2. 4 MB

3. 256 KB

4. 1GB
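
The estimate can be checked with simple arithmetic. A minimal sketch, assuming 4 bytes per FLOAT32 dimension and ignoring any per-row or per-vector storage overhead (the question does not specify it):

```python
# Raw payload size of 1,000 FLOAT32 vectors with 256 dimensions each.
# Assumption: 4 bytes per FLOAT32 value; row and header overhead is ignored.
NUM_VECTORS = 1_000
DIMENSIONS = 256
BYTES_PER_FLOAT32 = 4

total_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_FLOAT32
total_mb = total_bytes / 1_000_000

print(f"{total_bytes} bytes, about {total_mb:.2f} MB")
```

The product is 1,024,000 bytes, which is on the order of 1 MB rather than kilobytes or gigabytes.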

15. Which Oracle Cloud Infrastructure (OCI) service is directly integrated with Select AI?

1. OCI Language

2. OCI Generative AI

3. OCI Vision

4. OCI Data Science

16. Which is NOT a feature or capability related to AI and Vector Search in Exadata?

1. Native Support for Vector Search Only within the Database Server

2. Vector Replication with GoldenGate

3. Loading Vector Data using SQL*Loader

4. AI Smart Scan

17. Which statement best describes the core functionality and benefit of Retrieval Augmented Generation (RAG) in Oracle Database 23ai?

1. It empowers LLMs to interact with private enterprise data stored within the database, leading to more context-aware and precise responses to user queries.

2. It primarily aims to optimize the performance and efficiency of LLMs by using advanced data retrieval techniques, thus minimizing response times and reducing computational overhead.

3. It allows users to train their own specialized LLMs directly within the Oracle Database environment using their internal data, thereby reducing the reliance on external AI providers.

4. It enables Large Language Models (LLMs) to access and process real-time data

streams from diverse sources to generate the most up-to-date insights

18. If a query vector uses a different distance metric than the one used to create the index, what happens?

1. The query fails.

2. An exact match search is triggered.

3. The index automatically updates.

4. A warning is logged, but the query executes.


19. What are the key advantages and considerations of using Retrieval Augmented Generation (RAG) in the context of Oracle AI Vector Search?

1. It excels at optimizing the performance and efficiency of LLM inference through advanced caching and precomputation techniques, leading to faster response times but potentially increasing storage requirements.

2. It prioritizes real-time data extraction and summarization from various sources to ensure the LLM always has the most up-to-date information.

3. It focuses on training specialized LLMs within the database environment for

specific tasks, offering greater control over model behavior and data privacy but

potentially requiring more development effort.

4. It leverages existing database security and access controls, thereby enabling

secure and controlled access to both the database content and the LLM.

20. Which Python library is used to vectorize text chunks and the user's question in the following example?

import oracledb

connection = oracledb.connect(user=un, password=pw, dsn=cs)

table_name = 'Page'

with connection.cursor() as cursor:
    # Create the table
    create_table_sql = f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            ...
        )"""
    try:
        cursor.execute(create_table_sql)
    except oracledb.DatabaseError as e:
        raise
    connection.autocommit = True

from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('all-MiniLM-L12-v2')

1. sentence_transformers

2. oci

3. oracledb

4. json

21. What is the function of the COSINE parameter in the SQL query used to retrieve similar vectors?

topk = 3

sql = f"""select payload, vector_distance(vector, :vector, COSINE) as score from {table_name} order by score fetch approx first {topk} rows only"""

...

1. It filters out vectors with a cosine similarity below a certain threshold.

2. It converts the vectors to a format compatible with the SQL database.

3. It indicates that the cosine distance metric should be used to measure similarity

between vectors.

4. It specifies the type of vector encoding used in the database.
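
What a cosine distance metric measures can be shown in a few lines. This is a minimal pure-Python sketch of the usual definition (1 minus cosine similarity); the function name `cosine_distance` and the sample vectors are illustrative assumptions, not Oracle code.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity, so 0 means 'same direction'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
# Hypothetical stored vectors, keyed by a description of their direction.
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Smaller distance means more similar, matching "order by score" ascending.
ranked = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
print(ranked)
```

Because only the angle matters, the vector with twice the length but the same direction still has distance 0 from the query.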

22. You are tasked with finding the closest matching sentences across books, where each book has multiple paragraphs and sentences. Which SQL structure should you use?

A. GROUP BY with vector operations

B. FETCH PARTITIONS BY clause

C. A nested query with ORDER BY

D. Exact similarity search with a single query vector

23. In the following Python code, what is the significance of prepending the source filename to each text chunk before storing it in the vector database?

docs = [{'text': filename + '|' + section, 'path': filename} for filename, sections in faqs.items() for section in sections]

# Sample the resulting data

docs[:2]

1. It preserves context and aids in the retrieval process by associating each

vectorized chunk with its original source file.

2. It helps differentiate between chunks from different files but has no impact on
vectorization.

3. It speeds up the vectorization process by providing a unique identifier for each

chunk.

4. It improves the accuracy of the LLM by providing additional training data.
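
To see what the comprehension in this question produces, here is a small runnable sketch. The contents of `faqs` are hypothetical sample data; the source does not show the real dictionary.

```python
# Hypothetical sample data: file name -> list of text sections.
faqs = {
    "billing.txt": ["How do I pay?", "Where is my invoice?"],
    "login.txt": ["How do I reset my password?"],
}

# Prefix every section with its source file name and keep the path alongside.
docs = [
    {"text": filename + "|" + section, "path": filename}
    for filename, sections in faqs.items()
    for section in sections
]

# Sample the resulting data
print(docs[:2])
```

Each chunk's text now carries its file name, so the name is embedded together with the section and is also available as metadata when the chunk is retrieved.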

24. How does an application use vector similarity search to retrieve relevant information from a database, and how is this information then integrated into the generation process?

1. Encodes the question and database chunks into vectors, finds the most similar

using cosine similarity, and includes them in the LLM prompt.

2. Trains a separate LLM on the database and uses it to answer, ignoring the

general LLM.

3. Converts the question to keywords, searches for matches, and inserts the text

into the response.

4. Clusters similar text chunks and randomly selects one from the most relevant

cluster.
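
The retrieve-then-prompt flow described in this question can be sketched end to end in plain Python. The `embed` function below is a hypothetical bag-of-words stand-in for a real embedding model, and the chunks, question, and prompt wording are made-up examples.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

chunks = [
    "vector indexes speed up similarity search",
    "data pump exports and imports tables",
    "chunking splits long documents before similarity search",
]
question = "how do vector indexes help similarity search"

# The same vocabulary (that is, the same "model") is used for chunks and question.
vocab = sorted({w for text in chunks + [question] for w in text.lower().split()})
q_vec = embed(question, vocab)

# Rank chunks by similarity to the question and keep the top two.
ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, embed(c, vocab)), reverse=True)
top_chunks = ranked[:2]

# The retrieved chunks become context in the prompt sent to the LLM.
prompt = "Answer using only this context:\n" + "\n".join(top_chunks) + "\nQuestion: " + question
print(prompt)
```

The LLM call itself is omitted; the point is that retrieval selects the context and the prompt carries it to the model.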

25. When using SQL*Loader to load vector data for search applications, what is a critical consideration regarding the formatting of the vector data within the input CSV file?

A. Enclose vector components in curly braces ({}).

B. As FVEC is a binary format and the vector dimensions have a known width, fixed offsets can be used to make parsing the vectors fast and efficient.

C. Use sparse format for vector data.

D. Rely on SQL*Loader's automatic normalization of vector data.

26. Which function is used to generate vector embeddings within an Oracle database?

A. DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS
B. DBMS_VECTOR_CHAIN.UTL_TO_TEXT

C. DBMS_VECTOR_CHAIN.UTL_TO_EMBEDDINGS

D. DBMS_VECTOR_CHAIN.UTL_TO_GENERATE_TEXT

27. Which statement best describes the capability of Oracle Data Pump for handling vector data in the context of vector search applications?

1. Data Pump can only export and import vector data if the vector embeddings are stored as BLOB (Binary Large Object) data types in the database.

2. Data Pump treats vector embeddings as regular text strings, which can lead to data corruption or loss of precision when transferring vector data for vector search.

3. Data Pump provides native support for exporting and importing tables containing vector data types, facilitating the transfer of vector data for vector search applications.

4. Because of the complexity of vector data, Data Pump requires a specialized plug-in to handle the export and import operations involving vector data types.

28.What happens when you attempt to insert a vector with an incorrect

number of dimensions into a VECTOR column with a defined number of

dimensions?

1. The database truncates the vector to fit the defined dimensions.

2. The database pads the vector with zeros to match the defined

dimensions.

3. The insert operation fails, and an error message is thrown.

4. The database ignores the defined dimensions and inserts the vector

as is
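
An application can mirror this kind of dimension check on the client side before sending an insert. The sketch below is a hypothetical helper, not an Oracle API; `check_dimensions` and its error message are assumptions for illustration.

```python
def check_dimensions(vector, expected_dims):
    """Raise ValueError when the vector length differs from the declared dimension count."""
    if len(vector) != expected_dims:
        raise ValueError(
            f"expected {expected_dims} dimensions, got {len(vector)}"
        )
    return vector

check_dimensions([0.1, 0.2, 0.3], 3)  # accepted: length matches

try:
    check_dimensions([0.1, 0.2], 3)  # rejected: too few dimensions
except ValueError as err:
    print("rejected:", err)
```

The helper neither pads nor truncates; a mismatch is reported as an error and the caller decides what to do.
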
29. In Oracle Database 23ai, which data type is used to store vector embeddings for similarity search?

1. VECTOR2

2. BLOB

3. VECTOR

4. VARCHAR2

30. What is created to facilitate the use of OCI Generative AI with Autonomous Database?

1. An AI profile for OCI Generative AI

2. A dedicated OCI compartment

3. A new user account with elevated privileges

31. Why would you choose to NOT define a specific size for the VECTOR column during development?

1. It impacts the accuracy of similarity searches.

2. It restricts the database to a single embedding model.

3. It limits the length of text that can be vectorized.

4. Different external embedding models produce vectors with varying

dimensions and data types.

32. What is the correct order of steps for building a RAG application using

PL/SQL in Oracle Database 23ai?

1. Load ONNX Model, Vectorize Question, Load Document, Split Text

into Chunks, Create Embeddings, Perform Vector Search, Generate


Output.

2. Load Document, Split Text into Chunks, Load ONNX Model, Create

Embeddings, Vectorize Question, Perform Vector Search, Generate

Output.

3. Vectorize Question, Load ONNX Model, Load Document, Split Text

into Chunks, Create Embeddings, Perform Vector Search, Generate

Output.

4. Load Document, Load ONNX Model, Split Text into Chunks, Create

Embeddings, Vectorize Question, Perform Vector Search, Generate

Output.

33. What is the primary purpose of a similarity search in Oracle Database 23ai?

1. To optimize relational database operations

2. To compute distances between all data points in a database

3. To find exact matches in BLOB data

4. To retrieve the most semantically similar entries using distance

metrics between different vectors.

34. What is the advantage of using Euclidean Squared Distance rather than Euclidean Distance in similarity search queries?

1. It is the default distance metric for Oracle AI Vector Search.

2. It supports hierarchical partitioning of vectors.

3. It is simpler and faster because it avoids square-root calculations.

4. It guarantees higher accuracy than Euclidean Distance.
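
Skipping the square root is safe for ranking because the square root is monotonic: it changes the distance values but not their order. A minimal pure-Python sketch with made-up vectors:

```python
import math

def euclidean_squared(a, b):
    """Sum of squared component differences; no square root is taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(euclidean_squared(a, b))

query = [0.0, 0.0]
vectors = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]  # illustrative data

by_l2 = sorted(vectors, key=lambda v: euclidean(query, v))
by_l2_squared = sorted(vectors, key=lambda v: euclidean_squared(query, v))

print(by_l2 == by_l2_squared)  # both metrics produce the same ranking
```

The squared form returns different numbers (25 instead of 5 for the first vector) but the same nearest-neighbor order, with one fewer operation per comparison.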

35. You need to prioritize accuracy over speed in a similarity search for a dataset of images. Which should you use?

1. Approximate similarity search with HNSW indexing and target accuracy of 70%.

2. Multivector similarity search with partitioning.

3. Exact similarity search using a full table scan.

4. Approximate similarity search with IVF indexing and target accuracy

of 70%.

36. What is the significance of splitting text into chunks in the process of loading data into Oracle AI Vector Search?

1. To reduce the computational burden on the embedding model

2. To facilitate parallel processing of the data during vectorization

3. To minimize token truncation as each vector embedding model has its

own maximum token limit
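
The idea of chunking to stay under a token limit can be sketched as follows. This is a simplified illustration, not how DBMS_VECTOR_CHAIN works internally: whitespace-separated words stand in for model tokens, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, max_tokens):
    """Split text into consecutive chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token.
    Real embedding models use their own tokenizers.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

text = "one two three four five six seven eight nine ten"
chunks = split_into_chunks(text, 4)
print(chunks)
```

Every chunk fits the limit, so nothing is cut off when each chunk is embedded separately, and joining the chunks gives back the original text.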

37. What is the purpose of the VECTOR_DISTANCE function in Oracle Database 23ai similarity search?

1. To fetch rows that match exact vector embeddings.

2. To create vector indexes for efficient searches

3. To group vectors by their exact scores

4. To calculate the distance between vectors using a specified metric.

38. You are tasked with creating a table to store vector embeddings with the following characteristics: each vector must have exactly 512 dimensions, and the dimensions should be stored as 32-bit floating point numbers. Which SQL statement should you use?

1. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512));

2. CREATE TABLE vectors (id NUMBER, embedding VECTOR);

3. CREATE TABLE vectors (id NUMBER, embedding VECTOR(*, INT8));

4. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512, FLOAT32));

39. Which function should you use to determine the storage format of a

vector?

1. VECTOR_DIMENSION_FORMAT

2. VECTOR_CHUNKS

3. VECTOR_NORM

4. VECTOR_EMBEDDING

40. What security enhancement is introduced in Exadata System Software 24ai?

1. Integration with third-party security tools

2. Enhanced encryption algorithm for data at rest

3. SNMP security

41. You need to generate a vector from the string '[1.2, 3.4]' in FLOAT32 format with 2 dimensions. Which function will you use?

1. TO_VECTOR

2. VECTOR_DISTANCE

3. FROM_VECTOR

4. VECTOR_SERIALIZE

42. What is the primary purpose of the VECTOR_EMBEDDING function in Oracle Database 23ai?

1. To calculate vector dimensions

2. To calculate vector distances

3. To serialize vectors into a string

4. To generate a single vector embedding for data


43. What is a key characteristic of HNSW vector indexes?

1. They are hierarchical with multilayered connections

2. They require exact match for searches

3. They are disk-based structures.

4. They use hash-based clustering.

44. What is the primary function of AI Smart Scan in Exadata System Software 24ai?

1. To provide real-time monitoring and diagnostics for AI applications.

2. To accelerate AI workloads by leveraging Exadata RDMA Memory (XRMEM), Exadata Smart Flash Cache, and on-storage processing.

3. To automatically optimize database queries for improved

performance.

45. Which parameter is used to define the number of closest vector

candidates considered during HNSW index creation?

1. EFCONSTRUCTION

2. VECTOR_MEMORY_SIZE

3. NEIGHBOURS

4. TARGET_ACCURACY

46. You want to quickly retrieve the top-10 matches for a query vector from a dataset of billions of vectors, prioritizing speed over exact accuracy. What is the best approach?

1. Exact similarity search using flat search


2. Approximate similarity search with a low target accuracy setting

3. Relational filtering combined with an exact search

4. Exact similarity search with a high target accuracy setting

47. Which is a characteristic of an approximate similarity search in Oracle

Database 23ai?

1. It compares every vector in the dataset.

2. It trades off accuracy for faster performance.

3. It always guarantees 100% accuracy.

4. It is slower than exact similarity search.

48. Which operation is NOT permitted on tables containing VECTOR

columns?

1. SELECT

2. UPDATE

3. DELETE

4. JOIN ON VECTOR columns.

50. You are asked to fetch the top five vectors nearest to a

query vector, but only for a specific category of documents

Which query structure should you use?

1. Use UNION ALL with vector operations.

2. Perform the similarity search without a WHERE

clause.

3. Apply relational filters and a similarity search in the

query.

4. Use VECTOR_INDEX_HINT and NO WHERE clause
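
Combining a relational filter with a similarity search amounts to narrowing the rows first and ranking what remains by distance. A minimal pure-Python sketch; the rows, the `category` values, and the helper `top_k_in_category` are hypothetical.

```python
import math

# Hypothetical table rows: an id, a relational attribute, and a 2-D vector.
rows = [
    {"id": 1, "category": "manual", "vec": (0.0, 1.0)},
    {"id": 2, "category": "faq", "vec": (0.1, 0.1)},
    {"id": 3, "category": "manual", "vec": (0.2, 0.1)},
    {"id": 4, "category": "manual", "vec": (0.9, 0.9)},
    {"id": 5, "category": "faq", "vec": (0.0, 0.2)},
]

def top_k_in_category(rows, query, category, k):
    """Apply the relational filter, then rank the survivors by Euclidean distance."""
    filtered = [r for r in rows if r["category"] == category]
    return sorted(filtered, key=lambda r: math.dist(query, r["vec"]))[:k]

query = (0.0, 0.0)
nearest = top_k_in_category(rows, query, "manual", 2)
print([r["id"] for r in nearest])
```

Row 2 is the closest vector overall, but it is excluded because it belongs to a different category, which is the effect a WHERE clause has in the SQL version.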


51. What is the primary function of an embedding model in the

context of vector search?

1. To define the schema for a vector database

2. To execute similarity search operations within a

database

3. To transform text or data into numerical vector representations

4. To store vectors in a structured format for efficient

retrieval

52. What is the significance of using local ONNX models for

embedding within the database?

1. Support for legacy SQL*Plus clients

2. Improved accuracy compared to external models

3. Reduced embedding dimensions for faster processing

4. Enhanced security because data remains within the

Database

53. Which of the following actions will result in an error when

using VECTOR_DIMENSION_COUNT() in Oracle Database 23ai?

1. Providing a vector with a dimensionality that exceeds

the specified dimension count.

2. Using a vector with a data type that is not supported

by the function.

3. Providing a vector with duplicate values for its

components.

4. Calling the function on a vector that has been created


with to_vector()

54. An application needs to fetch the top-3 matching sentences

from a dataset of books while ensuring a balance between

speed and accuracy. Which query structure should you use?

1. Approximate similarity search with the VECTOR_DISTANCE

function

2. Exact similarity search with Euclidean distance

3. Multivector similarity search with approximate fetching

and target accuracy

4. A combination of relational filters and similarity search.

55. You are tasked with finding the closest matching sentences across books, where each book has multiple paragraphs and sentences. Which SQL structure should you use?

1. A nested query with ORDER BY

2. Exact similarity search with a single query vector

3. GROUP BY with vector operations

4. FETCH PARTITIONS BY clause

56. What is the primary difference between the HNSW and IVF

vector indexes in Oracle Database 23ai?

1. Both operate identically but differ in memory usage.

2. HNSW guarantees accuracy, whereas IVF sacrifices performance for accuracy.

3. HNSW uses an in-memory neighbor graph for faster approximate searches, whereas IVF uses the buffer cache with partitions.

4. HNSW is partition based, whereas IVF uses neighbor graphs for indexing.

57. A database administrator wants to change the VECTOR_MEMORY_SIZE parameter for a pluggable database (PDB) in Oracle Database 23ai. Which SQL command is correct?

1. ALTER SYSTEM SET vector_memory_size=1G SCOPE=BOTH;

2. ALTER DATABASE SET vector_memory_size=1G SCOPE=VECTOR;

3. ALTER SYSTEM SET vector_memory_size=1G SCOPE=SGA;

4. ALTER SYSTEM RESET vector_memory_size;

58. Which vector index available in Oracle Database 23ai is known for its speed and accuracy, making it a preferred choice for vector search?

1. Binary Tree (BT) index

2. Inverted File System (IFS) index

3. Full-Text (FT) index

4. Hierarchical Navigable Small World (HNSW) index

59. What is the purpose of the Vector Pool in Oracle Database 23ai?

1. To manage database partitioning

2. To store HNSW vector indexes and IVF index metadata


3. To enable longer SQL execution

4. To store non-vector data types

60. What is the default distance metric used by the VECTOR_DISTANCE function if none is specified?

1. Euclidean

2. Hamming

3. Cosine

4. Manhattan

61. In Oracle Database 23ai, which SQL function calculates the distance between two vectors using the Euclidean metric?

1. L1_DISTANCE

2. L2_DISTANCE

3. HAMMING_DISTANCE

4. COSINE_DISTANCE
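
The metrics named in these options differ only in how component differences are combined. The sketch below gives plain-Python analogs of three of them under their standard mathematical definitions; the function names echo the SQL names for readability, and the sample vectors are made up.

```python
import math

def l1_distance(a, b):
    """Manhattan distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

a = [1, 2, 3]
b = [1, 0, 7]

print(l1_distance(a, b), l2_distance(a, b), hamming_distance(a, b))
```

For these vectors the differences are 0, 2, and 4, so the Manhattan distance is 6, the Euclidean distance is the square root of 20, and two positions differ.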

62. What is a key advantage of using GoldenGate 23ai for managing and distributing vector data for AI applications?

1. Real-time vector data updates across locations.

2. Automatic translation of vector embeddings between

formats.

3. Specialized vector embedding compression.

4. Built-in version control for vector data.

63. What happens when you attempt to insert a vector with an

incorrect number of dimensions into a VECTOR column with a


defined number of dimensions?

1. The database pads the vector with zeros to match the

defined dimensions

2. The database ignores the defined dimensions and

inserts the vector as is.

3. The database truncates the vector to fit the defined

dimensions.

4. The insert operation fails, and an error message is

thrown

Common questions

The crucial factor is that the same vector embedding model is used for vectorizing the data and for creating the query vector; otherwise similarity search results are not meaningful.

To improve the accuracy of an IVF index search, add the TARGET ACCURACY clause to the query with a higher value, which causes more partitions to be probed. EFCONSTRUCTION is an HNSW index parameter and does not apply to IVF indexes.

Text chunking plays a critical role by minimizing token truncation, as each vector embedding model has its own maximum token limit. Splitting text into chunks keeps each piece within that limit, which improves vector quality.

If a query uses a different distance metric than the one used to create the vector index, the index is not used and an exact search is performed instead.

Storing 1,000 embeddings of 256 FLOAT32 dimensions takes roughly 1,000 × 256 × 4 bytes, which is about 1 MB.

The primary function of RAG is to empower LLMs to interact with private enterprise data stored within the database, leading to more context-aware and precise responses to user queries.

Choosing not to define a specific size for a VECTOR column can be advantageous because different external embedding models produce vectors with varying dimensions and data types. This flexibility allows the database to accommodate multiple embedding models without being constrained to a single dimension size.

The database does not zero-pad or truncate a vector whose number of dimensions differs from the number defined for the VECTOR column. The insert operation fails, and an error message is thrown.
