Master Azure Databricks Day 5
Getting Started with Azure Databricks Assets
Praveen Patel | Azure Data Engineer
Follow Me to Get More Such Content
Azure Databricks Assets
In Azure Databricks, "assets" refer to the various components you use to build, run, and
manage data workflows. These assets can include notebooks, clusters, jobs, datasets,
models, and more.
Praveen Patel | Azure Data Engineer
Cluster
A Databricks cluster is a set of computation resources and configurations on which you
run data engineering, data science, and data analytics workloads, such as production
ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
Example
Cluster Name: DataProcessing
Node Type: Standard_DS3_v2
Runtime: Databricks Runtime 13.3 LTS (includes Apache Spark 3.4)
Use Case
Attach the notebook to a compute cluster.
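Clusters like the one above are usually created through the UI, but the same definition can be expressed as a JSON payload for the Databricks Clusters API. The sketch below shows what such a payload might look like; the worker count and auto-termination values are illustrative assumptions, not part of the original example.

```python
import json

# Sketch of a payload for the Databricks Clusters API
# (POST /api/2.0/clusters/create) describing the cluster above.
# num_workers and autotermination_minutes are assumed values.
cluster_spec = {
    "cluster_name": "DataProcessing",
    "spark_version": "13.3.x-scala2.12",  # Databricks Runtime 13.3 LTS
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,                     # assumed worker count
    "autotermination_minutes": 30,        # stop idle clusters to save cost
}

payload = json.dumps(cluster_spec)
print(payload)
```

Once the cluster is running, you attach a notebook to it from the notebook's compute selector.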
Notebooks
A notebook is a web-based interface to documents containing a series of runnable cells
(commands) that operate on files and tables, visualizations, and narrative text.
Commands can be run in sequence, referring to the output of one or more previously run
commands.
Example
Notebook cell
df = spark.read.csv("/mnt/data/sales.csv", header=True, inferSchema=True)
df.groupBy("Region").sum("Sales").show()
Use Case
Write Python code to explore data and build a model.
Jobs
Automated executions of notebooks, JARs, Python scripts, or Delta Live Tables pipelines.
Example
Job Name: Daily Sales ETL
Task: Runs a notebook every morning at 7:00 AM
Email alert: Sends success/failure notifications
Use Case
Schedule the model retraining every week.
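A job like the "Daily Sales ETL" example above can also be defined as a Jobs API payload. The sketch below assumes the Jobs API 2.1 shape; the notebook path, cluster ID, and email address are made-up placeholders.

```python
import json

# Hedged sketch of a Jobs API 2.1 create-job payload (POST /api/2.1/jobs/create)
# for the "Daily Sales ETL" job described above.
job_spec = {
    "name": "Daily Sales ETL",
    "tasks": [
        {
            "task_key": "run_etl_notebook",
            "notebook_task": {"notebook_path": "/Repos/etl/daily_sales"},  # assumed path
            "existing_cluster_id": "1234-567890-abcde123",                 # placeholder
        }
    ],
    # Quartz cron expression: fire at 07:00 every day
    "schedule": {
        "quartz_cron_expression": "0 0 7 * * ?",
        "timezone_id": "UTC",
    },
    # Success/failure notifications, as in the example above
    "email_notifications": {
        "on_success": ["team@example.com"],  # placeholder address
        "on_failure": ["team@example.com"],
    },
}

print(json.dumps(job_spec, indent=2))
```

For the weekly retraining use case, only the cron expression changes (e.g. fire once per week instead of daily).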
Libraries
A library makes third-party or locally-built code available to notebooks and jobs running
on your clusters.
Example
Install pandas, scikit-learn, or a custom .whl file for ML processing.
Use Case
Use scikit-learn or xgboost installed on the cluster.
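Cluster libraries can likewise be installed programmatically. The sketch below shows a possible payload for the Databricks Libraries API covering both PyPI packages and a custom wheel; the cluster ID, package versions, and wheel path are illustrative placeholders.

```python
import json

# Sketch of a Libraries API payload (POST /api/2.0/libraries/install)
# installing PyPI packages and a custom wheel on a cluster.
# cluster_id, the pinned version, and the wheel path are placeholders.
library_spec = {
    "cluster_id": "1234-567890-abcde123",  # placeholder
    "libraries": [
        {"pypi": {"package": "scikit-learn==1.3.0"}},  # assumed version pin
        {"pypi": {"package": "xgboost"}},
        {"whl": "dbfs:/libraries/my_custom_lib-0.1-py3-none-any.whl"},  # assumed path
    ],
}
print(json.dumps(library_spec))
```

For a single notebook session, `%pip install <package>` in a notebook cell is a lighter-weight alternative to cluster-wide installation.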
Data
You can import data into a distributed file system mounted into an Azure Databricks
workspace and work with it in Azure Databricks notebooks and clusters. You can also use
a wide variety of Apache Spark data sources to access data.
Example
Mount Azure Data Lake Gen2 to /mnt/datalake in Databricks
Use Case
Load customer behavior data from Azure Data Lake.
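The mount example above can be sketched in code. The snippet assumes OAuth authentication with a service principal; the storage account, container, and all credential values are placeholders, and `dbutils` only exists inside a Databricks notebook, so the mount call itself is shown as a comment.

```python
# Sketch of mounting an ADLS Gen2 container at /mnt/datalake via OAuth.
# Storage account, container, and credentials are placeholders.
storage_account = "mydatalake"  # assumed storage account name
container = "raw"               # assumed container name

source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",     # placeholder
    "fs.azure.account.oauth2.client.secret": "<client-secret>",  # placeholder
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Inside a Databricks notebook you would then run:
# dbutils.fs.mount(source=source, mount_point="/mnt/datalake", extra_configs=configs)
print(source)
```

After mounting, notebooks read the data with normal paths, e.g. `spark.read.parquet("/mnt/datalake/customer_behavior/")`.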
Experiment
MLflow Experiments let you organize and track machine learning model training runs.
Example
Log model metrics:
import mlflow

with mlflow.start_run():
    mlflow.log_metric("rmse", 0.87)
Use Case
Save your trained model with MLflow.
Praveen Patel | Azure Data Engineer
Follow Me to Get More Such Content Like This