Releases: datahub-project/datahub
v1.4.0.2
There is no v1.4.0.1; we moved directly to v1.4.0.2 to stay consistent with the CLI version for this release.
Changes:
- Bump default CLI version
Full Changelog: v1.4.0...v1.4.0.2
v1.4.0
Release Highlights
DataHub v1.4.0 is packed with exciting updates, including:
- AI & Context: Introducing Context Documents, which bring organizational knowledge into DataHub. Create context documents directly in DataHub, or import them from Notion & Confluence. Curate, refine, and semantically search across your documents using the DataHub MCP Server & Agent Context Kit. Requires admin configuration.
- Major UI Improvements: Redesigned ingestion source creation workflow with a guided step-by-step experience, modernized login/signup pages, support for Service Accounts, a new asset “Summary” profile tab with a modular layout, and the ability to upload files to asset documentation. Read more below!
- New Ingestion Connectors: New connectors for Google Dataplex, Azure Data Factory, IBM Db2, Notion, & Confluence. Major enhancements include Airflow 3.x support, Snowflake Streamlit apps and Semantic Views ingestion, Databricks OAuth authentication, and Kafka Connect Confluent Cloud integration.
- SDK Features: New Java SDK V2 with a fluent builder API, Python SDK Tag entity support, parametrized assertion runs, and full Pydantic v2 migration.
- Platform Improvements: Elasticsearch 8 support with a multi-client shim and semantic search infrastructure.
User Experience
This release includes significant improvements to the user interface and user experience:
Improved Experiences: Home Page, Lineage Explorer, Entity Profiles, & More
Simplified Home Page
DataHub’s simplified, modular home page experience is now enabled by default for all DataHub instances.
Learn more about the new Home Page here.
Support for the old home page will be dropped in an upcoming release. Until then, you can revert to the previous home page by setting the following environment variable in the datahub-gms container:
SHOW_HOME_PAGE_REDESIGN to false
Entity Profile Summary Tabs
Check out the new summary tabs available on Domain, Glossary Term, & Data Product profile pages. Summaries provide an overview of the key details about each entity at a glance.
Streamlined Data Lineage Explorer
Experience the most seamless version of data lineage yet: navigate across data dependencies with the redesigned lineage navigator. Behavior should remain largely the same as in the old lineage UI. The old UI will be removed in a future release; for now, you can revert to it by setting the following environment variable in the datahub-gms container:
LINEAGE_GRAPH_V3 to false
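If you manage datahub-gms through a Docker env file, the two revert flags above can be generated rather than hand-edited. The sketch below is illustrative only: gms_revert_overrides and to_env_file are hypothetical helpers, not part of DataHub, and they simply emit KEY=value lines in the format accepted by Docker's --env-file option. How you actually inject environment variables into datahub-gms depends on your deployment (Docker Compose, Helm, etc.).

```python
from typing import Dict


def gms_revert_overrides(revert_home_page: bool = False, revert_lineage: bool = False) -> Dict[str, str]:
    """Hypothetical helper: collect datahub-gms env vars that restore the older UIs."""
    overrides: Dict[str, str] = {}
    if revert_home_page:
        # Fall back to the previous home page
        overrides["SHOW_HOME_PAGE_REDESIGN"] = "false"
    if revert_lineage:
        # Fall back to the old lineage UI
        overrides["LINEAGE_GRAPH_V3"] = "false"
    return overrides


def to_env_file(overrides: Dict[str, str]) -> str:
    """Render overrides as KEY=value lines, sorted for stable diffs."""
    return "".join(f"{key}={value}\n" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    print(to_env_file(gms_revert_overrides(revert_home_page=True, revert_lineage=True)), end="")
```

Both flags are temporary escape hatches: the old home page and the old lineage UI are both slated for removal in an upcoming release.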
Other UX Improvements
We’ve also included modernized Ingestion, Login, Sign Up, and Analytics pages in this release. Check them out and let us know what you think!
Important: As of this release, the legacy DataHub UI is disabled by default. You can no longer toggle between the legacy UI & new UI in settings; the new UI is shown by default. In a future release, the legacy UI will be removed from the UI codebase completely.
Context Documents & Semantic Search
Introducing Context Documents V1, a new feature that lets you add AI-related context and documentation to assets, with optional configuration for semantic search (beta).
- Added models and APIs for Context Documents [#15280]
- Introduced UI flows for Context Documents. [#15279]
- Various UI improvements for Context Documents. [#15413]
- Support viewing and adding related context to all asset types. [#15453]
- Support for ingesting external context documents from Notion & Confluence (see Ingestion Updates below).
- Support for configuring semantic indexing of document contents, and semantic search via the semanticSearchAcrossEntities GraphQL resolver, the DataHub MCP Server, and the Agent Context Kit document search tools.
  - Note that semantic search must be configured via Ingestion Recipes (Notion, Confluence, or the DataHub Documents ingestion source). For more details, see Semantic Search Configuration.
- Semantic Search is only supported if you are using OpenSearch version 2.19.3+. It is NOT currently supported for Elasticsearch deployments.
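Because semantic search depends on the search backend, it can help to check your cluster before enabling it. The sketch below assumes you have already fetched the JSON returned by the search cluster's root endpoint (GET /), where OpenSearch reports version.distribution as "opensearch" and version.number as the release string; Elasticsearch responses lack that distribution value, so they are treated as unsupported. The function names are hypothetical, and the check covers only the 2.19.3+ requirement stated above, not the other platform capabilities semantic search needs.

```python
import re
from typing import Any, Mapping, Tuple

# Minimum OpenSearch release stated in these notes for semantic search
MIN_OPENSEARCH: Tuple[int, int, int] = (2, 19, 3)


def parse_version(number: str) -> Tuple[int, int, int]:
    """Parse the leading MAJOR[.MINOR[.PATCH]] of a version string; missing parts become 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", number.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {number!r}")
    major, minor, patch = (int(part) if part is not None else 0 for part in match.groups())
    return major, minor, patch


def supports_semantic_search(root_info: Mapping[str, Any]) -> bool:
    """Hypothetical check: True only for OpenSearch clusters at or above MIN_OPENSEARCH."""
    version = root_info.get("version", {})
    if version.get("distribution") != "opensearch":
        # Elasticsearch deployments are not currently supported
        return False
    return parse_version(version.get("number", "0")) >= MIN_OPENSEARCH
```

Comparing tuples rather than raw strings avoids the classic pitfall where "2.9.0" would sort after "2.19.3" lexically.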
This feature is enabled by default, but can be disabled by setting the following environment variable in the datahub-gms container:
CONTEXT_DOCUMENTS_ENABLED to false
Read more about Context Documents here. And read about configuring platform capabilities required for semantic search here.
Agent Context Kit: Snowflake, LangChain, MCP Server
As of v1.4.0, DataHub publishes a new Agent Context Kit Python library.
Shipped in this release:
- Snowflake: Providing a new datahub agent CLI command that lets you provision a Snowflake Cortex Agent with built-in access to DataHub tools for searching assets and documents, retrieving lineage, sample queries, and more. Learn more here.
- LangChain: Providing a Python tools library that makes it easy to build LangChain agents with access to DataHub assets & metadata. Learn more here.
We’ve also added a host of new tools to the DataHub MCP Server:
- Mutation Tools: Edit tags, terms, owners, descriptions, structured properties, domains, & more.
- Document Tools: Search (keyword OR semantic) across context documents, create new documents in the “Shared” space.
These tools will be available in v0.5.0 of the DataHub MCP Server.
Service Accounts
Support for creating named service accounts, generating API access tokens, and granting permissions via DataHub’s Access Policies system. Useful for creating dedicated
- Add support for service accounts in DataHub [#52765]
This feature is enabled by default. Read more about service accounts here.
Upload Files to Asset Documentation
New capability to upload and download files when documenting any type of asset in DataHub, using a configurable S3 storage backend. Requires configuring DataHub’s backend server to read from and write to a specific S3 bucket.
- File upload to S3 extension in UI. [#15061]
- Presigned upload URL endpoint. [#14943]
- Inline previews for text, PDF, and video files. [#15182]
- Support for schema field and asset documentation. [#15055]
- Permission checks for file downloads. [#15059]
This feature is disabled by default and can be enabled by setting the following environment variables in the datahub-gms container:
- DOCUMENTATION_FILE_UPLOAD_V1 to true
- The S3 configs:
  DATAHUB_BUCKET_NAME: # The S3 bucket name to use for storing data
  DATAHUB_ROLE_ARN: # The AWS IAM role ARN to assume for S3 reads and writes
Note that this assumes AWS credentials with permission to read & write to the specified bucket are available & mounted in the environment where DataHub is running.
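Since the feature only works when all of these settings line up, a pre-flight check can catch a missing variable before datahub-gms starts. The following is a minimal sketch, assuming the variables are read from a plain mapping such as os.environ. file_upload_config_problems is a hypothetical helper; it treats both S3 settings as required because they are listed together above (your deployment may differ), and its ARN pattern is only a rough sanity check. It does not verify that the role or bucket exist or that credentials are mounted.

```python
import os
import re
from typing import List, Mapping

# Rough shape of an IAM role ARN, e.g. arn:aws:iam::123456789012:role/datahub-files
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def file_upload_config_problems(env: Mapping[str, str]) -> List[str]:
    """Hypothetical pre-flight check for the documentation file upload settings."""
    problems: List[str] = []
    if env.get("DOCUMENTATION_FILE_UPLOAD_V1", "").strip().lower() != "true":
        problems.append("DOCUMENTATION_FILE_UPLOAD_V1 is not set to true")
    for name in ("DATAHUB_BUCKET_NAME", "DATAHUB_ROLE_ARN"):
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    role_arn = env.get("DATAHUB_ROLE_ARN", "").strip()
    if role_arn and not ROLE_ARN_PATTERN.match(role_arn):
        problems.append("DATAHUB_ROLE_ARN does not look like an IAM role ARN")
    return problems


if __name__ == "__main__":
    # Report any problems found in the current process environment
    for problem in file_upload_config_problems(os.environ):
        print(f"file upload config: {problem}")
```

An empty list means the variables are present and plausibly shaped; permission problems on the bucket will still only surface at runtime.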
Other Improvements
- Support linking multiple Applications to entities. [#15160]
- Structured properties infinite scroll with backend search. [#14991]
- Option to hide structured properties with empty values. [#14872]
- Model signature table for MLModel summary tab. [#15205]
- Improved More Filters UX. [#15794]
- Role selector with pagination and search. [#15858]
- Tag editing updates with new menu. [#14884]
- Show all views in settings. [#14971]
- Runs tab for DataFlow entities. [#15775]
Metadata Ingestion
We're continuously improving our integrations to add new capabilities and squash bugs.
New Sources
- Google Dataplex: New connector for Google Dataplex metadata ingestion. In incubation. [#15379]
- Azure Data Factory: New connector for Azure Data Factory pipelines and datasets. In incubation. [#15499]
- Microsoft Fabric OneLake: New connector to ingest from Fabric workspaces, lakehouses, warehouses, schema, and tables.
- IBM Db2: New source for IBM Db2 databases. In incubation. [#14968]
- Notion: Added as ingestion source for Context Documents. In incubation. [#15970]
- **Confluence...
v1.4.0rc12
chore(): Bump default packaged CLI release (#16127) Co-authored-by: John Joyce
v1.4.0rc11
feat(ui): Add support for notion + confluence to ingestion form V1 (#…
v1.4.0rc9
Release Highlights
Fix scroll for groups.
v1.4.0rc8
Release Notes
Same as last, but includes default feature flags.
v1.4.0rc10
fix(documents): prevent duplicate document creation on page refresh (…
v1.4.0rc7
Release Highlights
Same as previous; just retrying the build.
v1.4.0rc6
Release Highlights
Attempting to trigger another publish.
v1.4.0rc5
Release Highlights
Same as previous, still RC.