From agents to models, from search to training, one platform for all your AI data and workloads
AI thrives on more than text. It needs multimodal data. Today’s complex workloads demand more than a database. They need a new foundation built for AI at scale.
Data lakes handle only tabular data, search engines work only with vectors, and neither works well with multimodal data. Researchers using today's infrastructure face more complexity, higher costs, and slower progress.
LanceDB provides one place for all your AI data and workloads so your team can move fast from idea to petabyte-scale production.
Fast scans and random access. Large blob storage. Zero-copy, fine-grained data evolution at petabyte scale.
table.add_columns({
    "title_frame": extract_key_frame("video", 0),
    "description": img2txt("title_frame"),
    "embedding": embed("description")
})
Blazing fast hybrid search, filter, and rerank over billions of vectors. Compute-storage separation for up to 100x savings.
(table.search("flying cars", query_type="hybrid")
    .where("date > '2025-01-01'")
    .reranker("cross_encoder_tuned")
    .select(["id"]).limit(10)
    .to_pandas())
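A hybrid query returns two ranked lists, one from vector search and one from keyword search, which then have to be merged into a single ordering. A minimal sketch of one common merging technique, reciprocal rank fusion, is shown below. It illustrates the idea only and is not a description of how LanceDB implements hybrid search; the function name, the constant `k=60`, and the sample ids are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Hypothetical helper: each id scores 1 / (k + rank) in every list
    it appears in, and the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result ids from a vector search and a keyword search.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Ids that rank well in both lists rise to the top, which is why "b" and "a" come ahead of ids found by only one of the two searches.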
Declarative, distributed, and versioned preprocessing for faster feature experimentation and iteration cycles. Native support for LLM-as-UDF.
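import lance
import pyarrow as pa
import pyarrow.compute as pc

ds = lance.dataset("s3://bucket/path.lance")
# Batch UDF: takes a RecordBatch and returns the new column "two".
@lance.batch_udf()
def multiply_by_two(x: pa.RecordBatch) -> pa.RecordBatch:
    return pa.RecordBatch.from_arrays(
        [pc.multiply(x["id"], 2)], ["two"]
    )
ds.add_columns(multiply_by_two)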
High-performance SQL for multimodal data.
db.sql("SELECT decode('audio_track', 'wav') "
       "FROM table WHERE id IN ('1', '5', '324')")
Faster data loading, global shuffling, and integrated filters for large-scale training with PyTorch or JAX.
for batch in DataLoader(table.where("video_height>=720").shuffle()):
    inputs, targets = batch["description"], batch["title_frame"]
    outputs = model(inputs)
    ...
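To make "integrated filters" and "global shuffling" concrete, here is a small pure-Python sketch of the pattern: filter rows first, permute every surviving row index at once, then cut the permutation into batches. It is a toy under stated assumptions, not LanceDB's data loader; the function `filtered_shuffled_batches`, its parameters, and the sample rows are hypothetical.

```python
import random


def filtered_shuffled_batches(rows, predicate, batch_size, seed=0):
    """Yield batches of rows that pass `predicate`, in a globally shuffled order."""
    # Filter first so rejected rows never reach the training loop.
    indices = [i for i, row in enumerate(rows) if predicate(row)]
    # Global shuffle: permute all surviving indices at once,
    # rather than shuffling only within a small buffer.
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [rows[i] for i in indices[start:start + batch_size]]


# Made-up rows standing in for a video table.
heights = [480, 720, 1080, 360, 720, 2160]
rows = [{"id": i, "video_height": h} for i, h in enumerate(heights)]

batches = list(
    filtered_shuffled_batches(rows, lambda r: r["video_height"] >= 720, batch_size=3)
)
for batch in batches:
    print([row["id"] for row in batch])
```

Shuffling indices rather than rows keeps the permutation cheap, since only the rows in the current batch need to be read.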
From prototype to production.
01
Get started fast with a simple install and intuitive interface.
02
Grow your project to petabyte scale without worrying about infrastructure.
03
Streamline your workflow and focus on high-value experimentation.
01
Unlock the value in your sales calls, decks, contracts, and more.
02
Keep your data private and secure. Works with your existing data lake.
03
Unlock massive scalability and unmatched price-performance.
Highest search QPS on a single table
Massive scalability at a fraction of the cost
Largest table under management
Safety and security guaranteed for your data.