

Azure Databricks components


06/18/2025

This article introduces fundamental components you need to understand in order to use Azure
Databricks effectively.

Accounts and workspaces


In Azure Databricks, a workspace is an Azure Databricks deployment in the cloud that functions as
an environment for your team to access Databricks assets. Your organization can choose to have
either multiple workspaces or just one, depending on its needs.

An Azure Databricks account represents a single entity that can include multiple workspaces.
Accounts enabled for Unity Catalog can be used to manage users and their access to data
centrally across all of the workspaces in the account.

Billing: Databricks units (DBUs)


Azure Databricks bills based on Databricks units (DBUs), which are units of processing capability
per hour based on VM instance type.

See the Azure Databricks pricing page.

Authentication and authorization


This section describes concepts that you need to know when you manage Azure Databricks
identities and their access to Azure Databricks assets.

User
A unique individual who has access to the system. User identities are represented by email
addresses. See Manage users.

Service principal
A service identity for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD
platforms. Service principals are represented by an application ID. See Service principals.

Group
A collection of identities. Groups simplify identity management, making it easier to assign access
to workspaces, data, and other securable objects. All Databricks identities can be assigned as
members of groups. See Groups.

Access control list (ACL)


A list of permissions attached to the workspace, cluster, job, table, or experiment. An ACL specifies
which users or system processes are granted access to the objects, as well as what operations are
allowed on the assets. Each entry in a typical ACL specifies a subject and an operation. See Access
control lists.

Personal access token (PAT)


A personal access token is a string used to authenticate REST API calls, connections to
technology partners, and other tools. See Azure Databricks personal access token authentication.

Microsoft Entra ID tokens can also be used to authenticate to the REST API.
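
For illustration, here is a minimal sketch of authenticating a REST API call with a PAT passed as a bearer token. The workspace URL is a hypothetical placeholder, and the token is assumed to be stored in an environment variable:

```python
import os
import requests

# Hypothetical workspace URL; substitute your own deployment's URL.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"

# The PAT is sent as a bearer token on every request.
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["state"])
```

The same Authorization header pattern works with a Microsoft Entra ID token in place of the PAT.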

Azure Databricks interfaces


This section describes the interfaces for accessing your assets in Azure Databricks.

UI
The Azure Databricks UI is a graphical interface for interacting with features, such as workspace
folders and their contained objects, data objects, and computational resources.

REST API
The Databricks REST API provides endpoints for modifying or requesting information about Azure
Databricks account and workspace objects. See account reference and workspace reference.

SQL REST API


The SQL REST API allows you to automate tasks on SQL objects. See SQL API.


CLI
The Databricks CLI is hosted on GitHub. The CLI is built on top of the Databricks REST API.

Data management
This section describes the tools and logical objects used to organize and govern data on Azure
Databricks. See Database objects in Azure Databricks.

Unity Catalog
Unity Catalog is a unified governance solution for data and AI assets on Azure Databricks that
provides centralized access control, auditing, lineage, and data discovery capabilities across
Databricks workspaces. See What is Unity Catalog?.

Catalog
Catalogs are the highest-level container for organizing and isolating data on Azure Databricks.
You can share catalogs across workspaces within the same region and account. See What are
catalogs in Azure Databricks?.

Schema
Schemas, also known as databases, are contained within catalogs and provide a more granular
level of organization. They contain database objects and AI assets, such as volumes, tables,
functions, and models. See What are schemas in Azure Databricks?.

Table
Tables organize and govern access to structured data. You query tables with Apache Spark SQL
and Apache Spark APIs. See Introduction to Azure Databricks tables.
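
Tables sit at the third level of the catalog.schema.table namespace described above. As a minimal sketch (the catalog, schema, and table names below are hypothetical), the following notebook cell creates a table and queries it with both Spark SQL and the DataFrame API:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already defined; getOrCreate() returns it.
spark = SparkSession.builder.getOrCreate()

# `main`, `sales`, and `orders` are hypothetical catalog, schema, and table names.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        amount   DOUBLE
    )
""")

# Query the same table through the DataFrame API.
spark.table("main.sales.orders").groupBy().sum("amount").show()
```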

View
A view is a read-only object derived from one or more tables and views. Views save queries that
are defined against tables. See What is a view?.


Volume
Volumes represent a logical volume of storage in a cloud object storage location and organize
and govern access to non-tabular data. Databricks recommends using volumes for managing all
access to non-tabular data on cloud object storage. See What are Unity Catalog volumes?.
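
As a minimal sketch (with hypothetical catalog, schema, volume, and file names), non-tabular files in a volume are addressed through a /Volumes/<catalog>/<schema>/<volume>/ path:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already defined; getOrCreate() returns it.
spark = SparkSession.builder.getOrCreate()

# `main.sales.raw_files` is a hypothetical catalog.schema.volume name.
spark.sql("CREATE VOLUME IF NOT EXISTS main.sales.raw_files")

# Files in the volume are read and written through the /Volumes path.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/sales/raw_files/orders.csv")
)
df.show()
```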

Delta table
By default, all tables created in Azure Databricks are Delta tables. Delta tables are based on the
Delta Lake open source project, a framework for high-performance ACID table storage over
cloud object stores. A Delta table stores data as a directory of files on cloud object storage and
registers table metadata to the metastore within a catalog and schema.

Find out more about technologies branded as Delta.
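
As an illustration (the table name is hypothetical), a DataFrame written with saveAsTable becomes a Delta table by default, and Delta-specific commands such as DESCRIBE HISTORY then work against it:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already defined; getOrCreate() returns it.
spark = SparkSession.builder.getOrCreate()

# `main.sales.events` is a hypothetical three-level table name.
df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
df.write.mode("overwrite").saveAsTable("main.sales.events")

# Delta tables keep a transaction log, so you can inspect the table's history.
spark.sql("DESCRIBE HISTORY main.sales.events").show(truncate=False)
```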

Metastore
Unity Catalog provides an account-level metastore that registers metadata about data, AI assets,
and permissions across catalogs, schemas, and tables. See Metastore.

Azure Databricks provides a legacy Hive metastore for customers that have not adopted Unity
Catalog. See Hive metastore table access control (legacy).

Catalog Explorer
Catalog Explorer allows you to explore and manage data and AI assets, including schemas
(databases), tables, models, volumes (non-tabular data), functions, and registered ML models. You
can use it to find data objects and owners, understand data relationships across tables, and
manage permissions and sharing. See What is Catalog Explorer?.

DBFS root

Important:

Storing and accessing data using DBFS root or DBFS mounts is a deprecated pattern and not
recommended by Databricks. Instead, Databricks recommends using Unity Catalog to
manage access to all data. See What is Unity Catalog?.


The DBFS root is a storage location available to all users by default. See What is DBFS?.

Computation management
This section describes concepts that you need to know to run computations in Azure Databricks.

Cluster
A set of computation resources and configurations on which you run notebooks and jobs. There
are two types of clusters: all-purpose and job. See Compute.

- You create an all-purpose cluster using the UI, CLI, or REST API (a minimal sketch using the
  REST API follows this list). You can manually terminate and restart an all-purpose cluster.
  Multiple users can share such clusters to do collaborative interactive analysis.
- The Azure Databricks job scheduler creates a job cluster when you run a job on a new job
  cluster and terminates the cluster when the job is complete. You cannot restart a job cluster.
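
As a minimal sketch of the REST API path mentioned above (the workspace URL, runtime version, and VM size are hypothetical placeholders), you might create a small all-purpose cluster like this:

```python
import os
import requests

# Hypothetical workspace URL; authentication reuses a PAT as a bearer token.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Placeholder runtime version and VM size; choose values available in your workspace.
payload = {
    "cluster_name": "interactive-analysis",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 60,
}

resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=headers, json=payload)
resp.raise_for_status()
print("cluster_id:", resp.json()["cluster_id"])
```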

Pool
A set of idle, ready-to-use instances that reduce cluster start and auto-scaling times. When
attached to a pool, a cluster allocates its driver and worker nodes from the pool. See Pool
configuration reference.

If the pool does not have sufficient idle resources to accommodate the cluster's request, the pool
expands by allocating new instances from the instance provider. When an attached cluster is
terminated, the instances it used are returned to the pool and can be reused by a different cluster.

Databricks runtime
The set of core components that run on the clusters managed by Azure Databricks. See Compute.
Azure Databricks has the following runtimes:

- Databricks Runtime includes Apache Spark but also adds a number of components and
  updates that substantially improve the usability, performance, and security of big data
  analytics.
- Databricks Runtime for Machine Learning is built on Databricks Runtime and provides
  prebuilt machine learning infrastructure that is integrated with all of the capabilities of the
  Azure Databricks workspace. It contains multiple popular libraries, including TensorFlow,
  Keras, PyTorch, and XGBoost.

Jobs & Pipelines UI


The Jobs & Pipelines workspace UI provides entry to the Jobs, Lakeflow Declarative Pipelines, and
Lakeflow Connect UIs, which are tools that allow you to orchestrate and schedule workflows.

Jobs
A non-interactive mechanism for orchestrating and scheduling notebooks, libraries, and other
tasks. See Lakeflow Jobs.
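
A minimal sketch of defining and triggering such a job through the Jobs REST API (the workspace URL, notebook path, and cluster ID are hypothetical placeholders):

```python
import os
import requests

# Hypothetical workspace URL; authentication reuses a PAT as a bearer token.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# A one-task job that runs a notebook on an existing all-purpose cluster.
job = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/Users/someone@example.com/ingest"},
            "existing_cluster_id": "0613-084243-abcd1234",
        }
    ],
}
job_id = requests.post(f"{HOST}/api/2.1/jobs/create", headers=headers, json=job).json()["job_id"]

# Trigger a run immediately; the same job can also run on a schedule.
requests.post(f"{HOST}/api/2.1/jobs/run-now", headers=headers, json={"job_id": job_id})
```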

Pipelines
Lakeflow Declarative Pipelines provide a declarative framework for building reliable, maintainable,
and testable data processing pipelines. See Lakeflow Declarative Pipelines.

Workload
Workload is the amount of processing capability needed to perform a task or group of tasks.
Azure Databricks identifies two types of workloads: data engineering (job) and data analytics (all-
purpose).

- Data engineering: An (automated) workload runs on a job cluster, which the Azure Databricks
  job scheduler creates for each workload.
- Data analytics: An (interactive) workload runs on an all-purpose cluster. Interactive workloads
  typically run commands within an Azure Databricks notebook. However, running a job on an
  existing all-purpose cluster is also treated as an interactive workload.

Execution context
The state for a read–eval–print loop (REPL) environment for each supported programming
language. The languages supported are Python, R, Scala, and SQL.

Data engineering


Data engineering tools aid collaboration among data scientists, data engineers, data analysts, and
machine learning engineers.

Workspace
A workspace is an environment for accessing all of your Azure Databricks assets. A workspace
organizes objects (notebooks, libraries, dashboards, and experiments) into folders and provides
access to data objects and computational resources.

Notebook
A web-based interface for creating data science and machine learning workflows that can contain
runnable commands, visualizations, and narrative text. See Databricks notebooks.

Library
A package of code available to the notebook or job running on your cluster. Databricks runtimes
include many libraries, and you can also upload your own. See Install libraries.

Git folder (formerly Repos)


A folder whose contents are co-versioned together by syncing them to a remote Git repository.
Databricks Git folders integrate with Git to provide source and version control for your projects.

AI and machine learning


Databricks provides an integrated end-to-end environment with managed services for developing
and deploying AI and machine learning applications.

Mosaic AI
The brand name for products and services from Databricks Mosaic AI Research, a team of
researchers and engineers responsible for Databricks' biggest breakthroughs in generative AI.
Mosaic AI products include the ML and AI features in Databricks. See Mosaic Research.

Machine learning runtime


To help you develop ML and AI models, Databricks provides a Databricks Runtime for Machine
Learning, which automates compute creation with pre-built machine learning and deep learning
infrastructure including the most common ML and DL libraries. It also has built-in, pre-configured
GPU support including drivers and supporting libraries. For information about the latest
runtime releases, see Databricks Runtime release notes versions and compatibility.

Experiment
A collection of MLflow runs for training a machine learning model. See Organize training runs
with MLflow experiments.
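
A minimal MLflow sketch (the experiment path, parameter, and metric are hypothetical) of grouping training runs under an experiment:

```python
import mlflow

# Hypothetical workspace path for the experiment.
mlflow.set_experiment("/Users/someone@example.com/churn-model")

# Each run records its parameters and metrics under the experiment.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("auc", 0.87)
```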

Features
Features are an important component of ML models. A feature store enables feature sharing and
discovery across your organization and also ensures that the same feature computation code is
used for model training and inference. See Feature management.

Generative AI models
Databricks supports the exploration, development, and deployment of generative AI models,
including:

- AI playground, a chat-like environment in the workspace where you can test, prompt, and
  compare LLMs. See Chat with LLMs and prototype generative AI apps using AI Playground.
- A built-in set of pre-configured foundation models that you can query:
  - See Pay-per-token Foundation Model APIs.
  - See [Recommended] Deploy foundation models from Unity Catalog for foundation
    models you can serve with a single click.
- Third-party hosted LLMs, called external models. These models are meant to be used as-is.
- Capabilities to customize a foundation model to optimize its performance for your specific
  application (often called fine-tuning). See Foundation Model Fine-tuning.

Model registry
Databricks provides a hosted version of MLflow Model Registry in Unity Catalog. Models
registered in Unity Catalog inherit centralized access control, lineage, and cross-workspace
discovery and access. See Manage model lifecycle in Unity Catalog.
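
A minimal sketch of registering a logged model under a Unity Catalog name (the run ID placeholder and the three-level model name are hypothetical):

```python
import mlflow

# Point the MLflow client at the Unity Catalog model registry.
mlflow.set_registry_uri("databricks-uc")

# Register a model logged in an earlier run under catalog.schema.model.
mlflow.register_model(
    model_uri="runs:/<run_id>/model",   # placeholder run ID
    name="main.ml.churn_model",         # hypothetical three-level name
)
```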


Model serving
Mosaic AI Model Serving provides a unified interface to deploy, govern, and query AI models.
Each model you serve is available as a REST API that you can integrate into your web or client
application. With Mosaic AI Model Serving, you can deploy your own models, foundation models,
or third-party models hosted outside of Databricks. See Deploy models using Mosaic AI Model
Serving.
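
As a sketch of querying a served model over REST (the workspace URL, endpoint name, and input schema are hypothetical and depend on the model you deployed):

```python
import os
import requests

# Hypothetical workspace URL; authentication reuses a PAT as a bearer token.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# `churn-model` and the feature names are placeholders for your endpoint and schema.
resp = requests.post(
    f"{HOST}/serving-endpoints/churn-model/invocations",
    headers=headers,
    json={"dataframe_records": [{"tenure_months": 12, "monthly_spend": 49.0}]},
)
print(resp.json())
```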

Data warehousing
Data warehousing refers to collecting and storing data from multiple sources so it can be quickly
accessed for business insights and reporting. Databricks SQL is the collection of services that
bring data warehousing capabilities and performance to your existing data lakes. See Data
warehousing architecture.

Query
A query is a valid SQL statement that allows you to interact with your data. You can author queries
using the in-platform SQL editor, or connect using a SQL connector, driver, or API. See Access and
manage saved queries to learn more about how to work with queries.
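
For example, a minimal sketch using the Databricks SQL Connector for Python (the hostname, HTTP path, and table name are hypothetical placeholders taken from a warehouse's connection details):

```python
import os
from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details copied from a SQL warehouse's connection settings.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT order_id, amount FROM main.sales.orders LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```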

SQL warehouse
A computation resource on which you run SQL queries. There are three types of SQL warehouses:
Classic, Pro, and Serverless. Azure Databricks recommends using serverless warehouses where
available. See SQL warehouse types to compare available features for each warehouse type.

Query history
A list of executed queries and their performance characteristics. Query history allows you to
monitor query performance, helping you identify bottlenecks and optimize query runtimes. See
Query history.

Visualization
A graphical presentation of the result of running a query. See Visualizations in Databricks
notebooks and SQL editor.


Dashboard
A presentation of data visualizations and commentary. You can use dashboards to automatically
send reports to anyone in your Azure Databricks account. Use the Databricks Assistant to help you
build visualizations based on natural language prompts. See Dashboards. You can also create a
dashboard from a notebook. See Dashboards in notebooks.

For legacy dashboards, see Legacy dashboards.

Important:

Databricks recommends using AI/BI dashboards (formerly Lakeview dashboards). Earlier
versions of dashboards, previously referred to as Databricks SQL dashboards, are now
called legacy dashboards.

End of support timeline:

- As of April 7, 2025: Official support for the legacy version of dashboards has ended.
  You can no longer create new legacy dashboards. Only critical security issues and
  service outages will be addressed.

- November 3, 2025: Databricks will begin archiving legacy dashboards that have not
  been accessed in the past six months. Archived dashboards will no longer be accessible,
  and the archival process will occur on a rolling basis. Access to actively used
  dashboards will remain unchanged.

Databricks will work with customers to develop migration plans for active legacy
dashboards after November 3, 2025.

Convert legacy dashboards using the migration tool or REST API. See Clone a legacy
dashboard to an AI/BI dashboard for instructions on using the built-in migration tool.
See Dashboard tutorials for tutorials on creating and managing dashboards using the
REST API.
