
Master Azure Databricks Day - 5

Getting Started with Azure Databricks Assets

Praveen Patel | Azure Data Engineer
Azure Databricks Assets

In Azure Databricks, "assets" refer to the various components you use to build, run, and manage data workflows. These assets include notebooks, clusters, jobs, datasets, models, and more.



Cluster

A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

Example

Cluster Name: DataProcessingCluster
Node Type: Standard_DS3_v2
Runtime: Databricks Runtime 13.3 LTS (includes Apache Spark 3.4)

Use Case

Attach the notebook to a compute cluster.
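
For completeness, here is a minimal sketch of creating a cluster like the example above programmatically through the Databricks Clusters REST API. The workspace URL and access token are placeholders, not values from this document.

import requests

# Placeholder workspace URL and personal access token.
host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "cluster_name": "DataProcessingCluster",
    "spark_version": "13.3.x-scala2.12",  # Databricks Runtime 13.3 LTS
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
}

# Create the cluster; the response includes the new cluster_id.
resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())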



Notebooks

A notebook is a web-based interface to a document containing a series of runnable cells (commands) that operate on files and tables, together with visualizations and narrative text. Commands can be run in sequence, referring to the output of one or more previously run commands.

Example

Notebook cell:
# Read the sales data and show total sales by region.
df = spark.read.csv("/mnt/data/sales.csv", header=True, inferSchema=True)
df.groupBy("Region").sum("Sales").show()
Use Case

Write Python code to explore data and build a model.



Jobs

Jobs are automated executions of notebooks, JARs, Python scripts, or Delta Live Tables pipelines.

Example

Job Name: Daily Sales ETL
Task: Runs a notebook every morning at 7:00 AM
Email alert: Sends success/failure notifications

Use Case

Schedule the model retraining every week.
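
As a sketch, a scheduled job like the example above could be defined through the Jobs API. The notebook path, cluster id, and email address below are hypothetical placeholders.

import requests

host = "https://<your-workspace>.azuredatabricks.net"  # placeholder
token = "<personal-access-token>"                       # placeholder

job_spec = {
    "name": "Daily Sales ETL",
    "tasks": [{
        "task_key": "run_etl",
        "notebook_task": {"notebook_path": "/Repos/etl/daily_sales"},  # placeholder path
        "existing_cluster_id": "<cluster-id>",
    }],
    # Quartz cron expression: every day at 7:00 AM.
    "schedule": {"quartz_cron_expression": "0 0 7 * * ?", "timezone_id": "UTC"},
    "email_notifications": {
        "on_success": ["team@example.com"],
        "on_failure": ["team@example.com"],
    },
}

# Create the job; the response includes the new job_id.
resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())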



Libraries

A library makes third-party or locally-built code available to notebooks and jobs running
on your clusters.

Example

Installed pandas, scikit-learn, or a custom .whl file for ML processing.

Use Case

Use scikit-learn or xgboost installed on the cluster.
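
One common way to install such packages for the current notebook session is the %pip magic; a minimal sketch, with the package choice illustrative:

# Install notebook-scoped libraries; they are available only to this notebook.
%pip install scikit-learn xgboost

Cluster-scoped libraries can instead be attached through the cluster's Libraries tab, so every notebook running on that cluster can use them.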



Data

You can import data into a distributed file system mounted into an Azure Databricks
workspace and work with it in Azure Databricks notebooks and clusters. You can also use
a wide variety of Apache Spark data sources to access data.

Example

Mount Azure Data Lake Gen2 to /mnt/datalake in Databricks

Use Case

Load customer behavior data from Azure Data Lake.
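
A minimal sketch of that mount, assuming a service principal with access to the storage account; the account, container, tenant, and secret scope names are all placeholders:

# OAuth configuration for ADLS Gen2 via a service principal (all values are placeholders).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the container at /mnt/datalake so notebooks can read from it.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)

Once mounted, notebooks can read paths under /mnt/datalake with spark.read, just like the sales example earlier.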



Experiment

MLflow experiments let you organize and track machine learning model training runs in Azure Databricks.

Example
Log model metrics:

import mlflow

with mlflow.start_run():
    # Record the run's root-mean-squared error.
    mlflow.log_metric("rmse", 0.87)

Use Case

Save your trained model with MLflow.
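
A minimal sketch of saving a trained model as a run artifact; the toy model and the artifact name "model" are illustrative, not from this document:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# Toy model trained on illustrative data.
model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])

with mlflow.start_run():
    mlflow.log_metric("rmse", 0.87)
    # Persist the trained model under the run's artifacts.
    mlflow.sklearn.log_model(model, "model")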



Praveen Patel | Azure Data Engineer
Follow me to get more content like this.
