Master Azure Databricks Day 5
Getting Started with Azure Databricks Assets
Praveen Patel | Azure Data Engineer
Follow Me to Get More Such Content
Azure Databricks Assets
In Azure Databricks, "assets" refer to the various components you use to build, run, and
manage data workflows. These assets can include notebooks, clusters, jobs, datasets,
models, and more.
Praveen Patel | Azure Data Engineer
Cluster
A Databricks cluster is a set of computation resources and configurations on which you
run data engineering, data science, and data analytics workloads, such as production
ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
Example
Cluster Name: DataProcessing
Node Type: Standard_DS3_v2
Runtime: Databricks Runtime 13.3 LTS (includes Apache Spark 3.4)
Use Case
Attach the notebook to a compute cluster.
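Clusters like the one above are usually created through the UI, but the same definition can be expressed as a JSON payload for the Databricks Clusters API. The sketch below shows what such a payload might look like; the worker count and auto-termination values are illustrative assumptions, not part of the original example.

```python
import json

# Sketch of a payload for the Databricks Clusters API
# (POST /api/2.0/clusters/create) describing the cluster above.
# num_workers and autotermination_minutes are assumed values.
cluster_spec = {
    "cluster_name": "DataProcessing",
    "spark_version": "13.3.x-scala2.12",  # Databricks Runtime 13.3 LTS
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,                     # assumed worker count
    "autotermination_minutes": 30,        # stop idle clusters to save cost
}

payload = json.dumps(cluster_spec)
print(payload)
```

Once the cluster is running, you attach a notebook to it from the notebook's compute selector.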
Notebooks
A notebook is a web-based interface to documents containing a series of runnable cells
(commands) that operate on files and tables, visualizations, and narrative text.
Commands can be run in sequence, referring to the output of one or more previously run
commands.
Example
Notebook cell
df = spark.read.csv("/mnt/data/sales.csv", header=True, inferSchema=True)
df.groupBy("Region").sum("Sales").show()
Use Case
Write Python code to explore data and build a model.
Jobs
Automated executions of notebooks, JARs, Python scripts, or Delta Live Tables pipelines.
Example
Job Name: Daily Sales ETL
Task: Runs a notebook every morning at 7:00 AM
Email alert: Sends success/failure notifications
Use Case
Schedule the model retraining every week.
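A job like the "Daily Sales ETL" example above can also be defined as a Jobs API payload. The sketch below assumes the Jobs API 2.1 shape; the notebook path, cluster ID, and email address are made-up placeholders.

```python
import json

# Hedged sketch of a Jobs API 2.1 create-job payload (POST /api/2.1/jobs/create)
# for the "Daily Sales ETL" job described above.
job_spec = {
    "name": "Daily Sales ETL",
    "tasks": [
        {
            "task_key": "run_etl_notebook",
            "notebook_task": {"notebook_path": "/Repos/etl/daily_sales"},  # assumed path
            "existing_cluster_id": "1234-567890-abcde123",                 # placeholder
        }
    ],
    # Quartz cron expression: fire at 07:00 every day
    "schedule": {
        "quartz_cron_expression": "0 0 7 * * ?",
        "timezone_id": "UTC",
    },
    # Success/failure notifications, as in the example above
    "email_notifications": {
        "on_success": ["team@example.com"],  # placeholder address
        "on_failure": ["team@example.com"],
    },
}

print(json.dumps(job_spec, indent=2))
```

For the weekly retraining use case, only the cron expression changes (e.g. fire once per week instead of daily).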
Libraries
A library makes third-party or locally-built code available to notebooks and jobs running
on your clusters.
Example
Install pandas, scikit-learn, or a custom .whl file for ML processing.
Use Case
Use scikit-learn or xgboost installed on the cluster.
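Cluster libraries can likewise be installed programmatically. The sketch below shows a possible payload for the Databricks Libraries API covering both PyPI packages and a custom wheel; the cluster ID, package versions, and wheel path are illustrative placeholders.

```python
import json

# Sketch of a Libraries API payload (POST /api/2.0/libraries/install)
# installing PyPI packages and a custom wheel on a cluster.
# cluster_id, the pinned version, and the wheel path are placeholders.
library_spec = {
    "cluster_id": "1234-567890-abcde123",  # placeholder
    "libraries": [
        {"pypi": {"package": "scikit-learn==1.3.0"}},  # assumed version pin
        {"pypi": {"package": "xgboost"}},
        {"whl": "dbfs:/libraries/my_custom_lib-0.1-py3-none-any.whl"},  # assumed path
    ],
}
print(json.dumps(library_spec))
```

For a single notebook session, `%pip install <package>` in a notebook cell is a lighter-weight alternative to cluster-wide installation.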
Data
You can import data into a distributed file system mounted into an Azure Databricks
workspace and work with it in Azure Databricks notebooks and clusters. You can also use
a wide variety of Apache Spark data sources to access data.
Example
Mount Azure Data Lake Gen2 to /mnt/datalake in Databricks
Use Case
Load customer behavior data from Azure Data Lake.
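The mount example above can be sketched in code. The snippet assumes OAuth authentication with a service principal; the storage account, container, and all credential values are placeholders, and `dbutils` only exists inside a Databricks notebook, so the mount call itself is shown as a comment.

```python
# Sketch of mounting an ADLS Gen2 container at /mnt/datalake via OAuth.
# Storage account, container, and credentials are placeholders.
storage_account = "mydatalake"  # assumed storage account name
container = "raw"               # assumed container name

source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",     # placeholder
    "fs.azure.account.oauth2.client.secret": "<client-secret>",  # placeholder
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Inside a Databricks notebook you would then run:
# dbutils.fs.mount(source=source, mount_point="/mnt/datalake", extra_configs=configs)
print(source)
```

After mounting, notebooks read the data with normal paths, e.g. `spark.read.parquet("/mnt/datalake/customer_behavior/")`.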
Experiment
MLflow Experiments let you organize and track machine learning model training runs.
Example
Log model metrics:
import mlflow

with mlflow.start_run():
    mlflow.log_metric("rmse", 0.87)
Use Case
Save your trained model with MLflow.
Praveen Patel | Azure Data Engineer
Follow Me to Get More Such Content Like This