Azure Terraform Pipeline — DevOps
Terraform is an open-source infrastructure as code (IaC) tool that allows users to
define and deploy infrastructure resources, such as servers, storage, and networking,
using simple, human-readable configuration files.
The keys to production Data & AI applications with Azure Databricks – Terraform
As businesses increasingly prioritize data-driven and AI capabilities, leveraging Databricks
platform on Azure Cloud can expedite achieving these goals. However, navigating the
complexities of building end-to-end production applications on this platform can be
challenging during the initial phase of the project.
In this blog post, we delve into crafting a unified, production-grade architecture with various
Databricks resources that can be adapted and expanded to suit business needs.
Production Requirements
Some of the key challenges that we faced while working with customers were:
1. Automated infrastructure provisioning for different teams and lines of business, each with its own distinct role within the organization.
2. Data governance, auditing, understanding the data lineage across all kinds of
data, including handling of sensitive data assets.
3. Organizing data sharing needs with different lines of business and business partners.
4. A unified development experience, ease of development, managing the lifecycle
of projects, and having a proper testing framework in place.
5. Proper monitoring and logging in place.
All the above should be a repeatable process for continuously extending and building on the
existing platform.
Building a Unified Platform: Methods and Practical Insights
Based on actual customer implementations, the approach to constructing this unified
platform is outlined in four steps below, to be executed in order.
1. Platform setup using Databricks Terraform Provider.
2. Streamlining development of complex projects using Databricks Asset
Bundles (DAB).
3. Enable secure data sharing through Delta Sharing, enhancing collaboration and
data access within the platform.
4. Monitoring and logging
Below is what the conceptual architecture might look like.
Conceptual Architecture
Platform Setup Using Databricks Terraform Provider
Before we can start working on Databricks platform activities, we have to complete the following prerequisites in the Azure Portal and design a few things that will be leveraged by Terraform.
Essential Takeaways
1. You need a valid Azure Subscription
2. Work with your cloud engineering team to design the VNet and subnets required for the different Databricks Workspaces.
3. Design security hardening for workspaces as per the company’s security
processes (see: Azure Databricks Security Checklist).
4. Design all high-level Microsoft Entra ID (Azure AD) groups that will be synced into Databricks.
5. An Azure Service Principal is required for the deployment of the different Databricks resources.
6. Organize Terraform scripts into different modules for ease of management.
7. Manage Terraform state in an Azure Storage backend (a minimal sketch follows this list).
8. All the source code for the Terraform project is managed in Azure Repos.
9. Design a CI/CD process per your organization's needs to create the infrastructure as part of Azure Pipelines.
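For illustration, a minimal root configuration might look like the sketch below. It assumes an azurerm backend and per-team modules; the resource group, storage account, container, and module paths are placeholder names, not the author's actual values.

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"       # placeholder
    storage_account_name = "tfstateaccount"   # placeholder
    container_name       = "tfstate"          # placeholder
    key                  = "databricks-platform.terraform.tfstate"
  }
}

# Hypothetical module layout; one module per platform concern.
module "workspaces" {
  source      = "./modules/workspaces"
  environment = var.environment
}

module "unity_catalog" {
  source      = "./modules/unity_catalog"
  environment = var.environment
}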
Exploring Automation Capabilities of the Databricks Terraform Provider
At a minimum, leverage the Databricks Terraform provider to:
1. Manage Databricks Workspaces
2. Convert a user group to account admin
3. Create a metastore
4. Setup Service Principals and user groups
5. Setup catalogs
6. Setup cluster policies and compute resources
This will get you started with the Databricks platform for your various business use cases. You can keep extending this platform to add other resources as your needs evolve.
Let’s look at each of these now in some detail.
Manage Databricks Workspaces
This example will create Workspaces and assign the Azure Service Principal used by the Terraform script as an admin to each Workspace. Some organizations require close to 20–30 Workspaces in use by different teams, so leveraging an automated process is highly recommended. By utilizing the Databricks Terraform provider, teams can ensure consistency, repeatability, and reliability across their Workspace deployments. Terraform's declarative configuration enables teams to define infrastructure as code, promoting version control and auditing capabilities.
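A minimal sketch of this pattern is shown below. It assumes a map of team names to regions and uses placeholder naming conventions; it is not the author's exact script.

variable "workspaces" {
  description = "Map of team name to Azure region."
  type        = map(string)
  default = {
    data_engineering = "westeurope"
    data_science     = "westeurope"
  }
}

resource "azurerm_resource_group" "ws" {
  for_each = var.workspaces
  name     = "rg-dbx-${each.key}"
  location = each.value
}

resource "azurerm_databricks_workspace" "ws" {
  for_each                    = var.workspaces
  name                        = "dbx-${each.key}"
  resource_group_name         = azurerm_resource_group.ws[each.key].name
  location                    = each.value
  sku                         = "premium"
  managed_resource_group_name = "rg-dbx-${each.key}-managed"
}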
Once you have run the Terraform script, log into the Admin Console
and Workspace(s) to ensure everything was created successfully.
The initial Databricks deployment will have the Azure Service Principal used by Terraform as the account admin. This should be changed to a user group. To make this change, set up the account-level Databricks SCIM connector to sync the Azure account admin AD group to the Databricks account (see: Configure SCIM provisioning using Microsoft Entra ID). This is a one-time step, and all Azure AD groups will be synced periodically to Databricks at the account level.
Convert a User Group to Account Admin
As of this writing, we cannot assign a user group as the Databricks account admin from the web UI, so we are leveraging the Databricks Terraform Provider for this activity. The account admin user group will include all users (typically the cloud engineering team) and the Azure Service Principal used by Terraform scripts. Make sure to add the account admin user group to all Workspaces created above with the Workspace admin role.
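One possible sketch follows, assuming the SCIM-synced Azure AD group already exists in the Databricks account and that the provider's databricks_group_role resource accepts the account_admin role (verify both against the current Databricks Terraform provider documentation; the group name is a placeholder).

# Look up the group that SCIM synced from Microsoft Entra ID (placeholder name).
data "databricks_group" "account_admins" {
  display_name = "dbx-account-admins"
}

# Assumed usage: grant the group the account admin role.
resource "databricks_group_role" "account_admin" {
  group_id = data.databricks_group.account_admins.id
  role     = "account_admin"
}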
Create a Metastore
A metastore is the top-level container of objects in Unity Catalog. It
stores data assets (tables and views) and the permissions that
govern access to them. Databricks account admins can create
metastores and assign them to Databricks workspaces in order to
control which workloads use the metastore. This example will help
you to create a new metastore.
This step only needs to be done once per Azure region. The only
reason to create a new metastore would be for Disaster Recovery
(DR). If using Terraform, then the Azure Service Principal will be
the default owner of the metastore. Similar to the previous step, make sure to change the default metastore admin to an Azure AD user group created for managing metastores. This group will include all users (typically the data admin team) and the Azure Service Principal used by Terraform scripts. Keep in mind this is a manual step.
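A rough sketch of the metastore pieces is below, assuming an account-level Databricks provider configuration and reusing the workspace map from the earlier sketch; the ADLS Gen2 path is a placeholder.

resource "databricks_metastore" "this" {
  name          = "primary-metastore"
  storage_root  = "abfss://metastore@ucstorageaccount.dfs.core.windows.net/"  # placeholder ADLS Gen2 path
  force_destroy = false
}

# Attach every workspace created earlier to the metastore.
resource "databricks_metastore_assignment" "ws" {
  for_each     = azurerm_databricks_workspace.ws
  metastore_id = databricks_metastore.this.id
  workspace_id = each.value.workspace_id
}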
Setup Service Principals and User Groups
This example assigns all the user groups that were created in Azure AD and synced into Databricks (using the SCIM application created above) to the required Workspaces. Any service principals required to run jobs should be created and added to the required Workspaces.
The script should be designed to conditionally add user groups to different Workspaces. In larger organizations with multiple lines of business, not all user groups will have access to all Workspaces; access is limited by business function and data access. For QA and production environments, the recommendation is to use only service principals to run jobs; individual user groups do not have access to run them.
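A hedged sketch of that pattern is shown below; the names, variables, and the account-level permission-assignment resource should be checked against the current provider documentation before use.

resource "databricks_service_principal" "jobs_runner" {
  display_name   = "sp-jobs-prod"              # placeholder
  application_id = var.jobs_sp_application_id  # Azure AD application (client) ID, assumed to be supplied as input
}

# Look up a SCIM-synced group and assign it only to selected workspaces (hypothetical ID list).
data "databricks_group" "sales_analysts" {
  display_name = "sales-analysts"              # placeholder group name
}

resource "databricks_mws_permission_assignment" "sales_analysts" {
  for_each     = toset(var.sales_workspace_ids)
  workspace_id = each.value
  principal_id = data.databricks_group.sales_analysts.id
  permissions  = ["USER"]
}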
Setup Catalogs
This example is responsible for creating all the catalogs and granting permissions on securable objects to user groups. It should be designed to conditionally create catalogs and fine-grained permissions on them. For example, admin user groups can have full privileges on a catalog while the development team only has permission to create schemas. The typical recommendation is to create a matrix of user groups with different permissions on catalogs, schemas, and other securables within Unity Catalog.
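As an illustration, a sketch with placeholder catalog and group names (the privilege sets would come from your permission matrix):

resource "databricks_catalog" "sales_dev" {
  name    = "sales_dev"
  comment = "Development catalog for the sales line of business"
}

resource "databricks_grants" "sales_dev" {
  catalog = databricks_catalog.sales_dev.name

  grant {
    principal  = "sales-data-admins"   # placeholder admin group: full privileges
    privileges = ["ALL_PRIVILEGES"]
  }

  grant {
    principal  = "sales-developers"    # placeholder developer group: can only create schemas
    privileges = ["USE_CATALOG", "CREATE_SCHEMA"]
  }
}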
Here is a sample user-group UC permission matrix:
UserGroup-UnityCatalog permission matrix
Setup cluster policies and compute resources
This script is responsible for creating all the compute policies in the different environments and for creating compute resources based on those policies. It also grants permissions to different user groups and service principals based on the specific policies and resources.
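A small sketch using the same resources that appear later in this document (policy limits and group names are placeholders):

resource "databricks_cluster_policy" "small_dev" {
  name = "small-dev-clusters"
  definition = jsonencode({
    "autotermination_minutes" : { "type" : "fixed", "value" : 30, "hidden" : true },
    "dbus_per_hour" : { "type" : "range", "maxValue" : 5 }
  })
}

# Let the development group use the policy, but nothing more.
resource "databricks_permissions" "small_dev_policy" {
  cluster_policy_id = databricks_cluster_policy.small_dev.id
  access_control {
    group_name       = "developers"   # placeholder group
    permission_level = "CAN_USE"
  }
}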
Compute resource creation for job execution is controlled with Databricks Asset Bundles, which we will see in the next section. You will create compute clusters in advance with the Terraform script for teams that have specific data exploration, ad hoc data analysis, or AI/ML needs. For all of these, make sure that compute cluster creation is done with specific policies for different environments to control cost and aid in cost attribution.
CI/CD for Terraform Projects using Azure Pipelines
All the Terraform setup should be managed as part of a CI/CD process. A typical flow for creating resources in Databricks involves the following steps (a minimal pipeline sketch follows the list):
1. All resources should be created from the master branch.
2. Development teams required to create resources should
create a feature branch of the project.
3. Manage the input to the scripts in
an environments folder.
4. Create a pull request with a proper description of what
resources need to be created and what files are part of the
changes.
5. Assign the pull request to at least two approvers, one from the development team and the other from the DevOps team, to review the changes.
6. On approval and merge into master branch, trigger the
respective Azure DevOps pipeline.
7. Leverage Azure DevOps best practices like branch policies.
8. Update the release notes at the bottom of the page with the dates and changes.
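A minimal azure-pipelines.yml sketch for the apply stage is shown below. The variable group, environments folder, and branch name are assumptions; adapt them to your organization's layout.

trigger:
  - master

pool:
  vmImage: ubuntu-latest

variables:
  - group: terraform-secrets   # hypothetical variable group holding the service principal credentials

steps:
  - script: |
      terraform init
      terraform plan -out=tfplan
      terraform apply -auto-approve tfplan
    displayName: Terraform init, plan and apply
    workingDirectory: environments/prod   # hypothetical environments folder from step 3
    env:
      ARM_CLIENT_ID: $(ARM_CLIENT_ID)
      ARM_CLIENT_SECRET: $(ARM_CLIENT_SECRET)
      ARM_SUBSCRIPTION_ID: $(ARM_SUBSCRIPTION_ID)
      ARM_TENANT_ID: $(ARM_TENANT_ID)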
Streamlining Development of Complex Projects using
Databricks Asset Bundle (DAB)
Databricks Asset Bundles (DABs) are a new tool for streamlining the
development of complex data, analytics, and ML projects for the
Databricks platform. Bundles make it easy to manage complex
projects during active development by providing CI/CD capabilities
in your software development workflow with a single concise and
declarative YAML syntax that works with the Databricks CLI.
By using bundles to automate your project's tests, deployments, and configuration management, you can reduce errors while promoting software best practices across your organization as templated projects. Here are some sample projects that use DABs to manage resource and job configurations: DAB-examples.
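For orientation, a minimal databricks.yml sketch is shown below; the bundle name, workspace URLs, job, and cluster settings are placeholders rather than a recommended configuration.

bundle:
  name: demo_etl_project   # placeholder bundle name

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net   # placeholder workspace URL
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net   # placeholder workspace URL

resources:
  jobs:
    nightly_etl:
      name: nightly_etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/etl_notebook.py
          new_cluster:
            spark_version: 15.4.x-scala2.12   # placeholder LTS runtime
            node_type_id: Standard_DS3_v2
            num_workers: 1

A typical workflow then runs databricks bundle validate, databricks bundle deploy -t dev, and databricks bundle run nightly_etl -t dev from the Databricks CLI, locally or from the CI/CD pipeline.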
Driving Project Efficiency with Databricks Asset Bundles
An execution strategy that will ensure project success with
Databricks Asset Bundles involves the following:
1. Engineering teams are responsible for setting up the DAB
projects.
2. Create a DAB custom template suited to your organization's needs; all teams should then create projects from this template for consistent configurations.
3. Leverage Azure Repos for source code management.
4. Leverage Azure DevOps pipelines for the CI/CD build and deploy process, with unit and integration test cases to prevent and catch errors (see: Run a CI/CD workflow with DAB).
Navigating Data Security: Exploring Different Strategies
Organizations have many policies in place when it comes to data
security, but here we will focus on a few strategies which should be
considered.
Handling sensitive data by filtering out rows of data and
masking columns of data. If you are working with sensitive PII or
PHI data, there are several strategies to ensure data security.
In the short term, use dynamic views for row level filtering and
column level masking.
In the medium term, you can filter sensitive data using row filters and column masks on tables. There are certain limitations to this approach, hence we recommend it as a medium-term strategy. For example, materialized views and streaming tables in Delta Live Tables do not support row filters or column masks. Furthermore, Delta Sharing and Delta Lake time travel do not work with row-level security or column masks.
In the long term, Databricks continues to innovate with Unity
Catalog on a regular basis, so it is worth reviewing the release notes
to see how new features may improve your data security strategy.
Other strategies to be implemented are:
Strong access control is emphasized through the implementation
of user groups and roles, allowing for granular control over
permissions within Databricks.
Leverage the “is_member” function to determine if the current user
belongs to specific groups, particularly those handling sensitive
PII/PHI data.
Isolation of the environment is achieved through platform-level
security measures such as VPNs, Azure Private Link, and Restricted
IP Lists. Adherence to Azure Databricks Security best practices is
essential, including the creation of separate workspaces and cluster
policies tailored for the handling of PII/PHI data, ensuring data
integrity and confidentiality.
Audit logging is recommended to track and monitor activities
within the environment. Leveraging Databricks Audit
Logs alongside Cloud Storage Access Logs, Cloud Provider Activity
Logs, and Virtual Network Traffic Flow Logs provides
comprehensive visibility into user actions and system events.
Additionally, Databricks Lakehouse Monitoring tools aid in monitoring the health of the environment, while the Security Analysis Tool enables proactive identification and mitigation of security risks. These practices collectively enhance security posture and ensure compliance with data protection regulations.
Databricks Lakehouse Monitoring
Image taken from https://docs.databricks.com/en/lakehouse-monitoring/index.html
Enable Secure Data Sharing through Delta Sharing,
Enhancing Collaboration and Data Access within the
Platform
One of the requirements that many organizations have is sharing data with different lines of business and partners to maximize the value of their data. Delta Sharing meets this requirement in a simple way.
What is Delta Sharing?
Delta Sharing is an open protocol for secure data sharing with other
organizations regardless of which computing platforms they use. It
can share collections of tables in a Unity Catalog metastore in real
time without copying them, so that data recipients can immediately
begin working with the latest version of the data.
Delta Sharing
Image taken from https://www.databricks.com/product/delta-sharing
Conclusion
This article focused on streamlining platform deployment and
enhancing development workflows for Databricks on Azure Cloud.
We simplified the setup and configuration process of secure
environments (including data security), and established automated
testing and deployment using tools like Terraform and Databricks
Asset Bundles. We enhanced collaboration and data sharing through
Delta Sharing, facilitating efficient utilization of data resources.
By prioritizing security, efficiency, collaboration, and scalability, the
architecture discussed here is capable of adapting to evolving
business needs and technological advancements.
Databricks Terraform provider
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/terraform/
HashiCorp Terraform is a popular open source tool for creating safe and predictable
cloud infrastructure across several cloud providers. You can use the Databricks
Terraform provider to manage your Azure Databricks workspaces and the associated
cloud infrastructure using a flexible, powerful tool. The goal of the Databricks
Terraform provider is to support all Databricks REST APIs, supporting automation of
the most complicated aspects of deploying and managing your data platforms.
Databricks customers are using the Databricks Terraform provider to deploy and
manage clusters and jobs and to configure data access. You use the Azure
Provider to provision Azure Databricks workspaces.
Getting started
In this section, you install and configure requirements to use Terraform and the
Databricks Terraform provider on your local development machine. You then
configure Terraform authentication. Following this section, this article provides
a sample configuration that you can experiment with to provision an Azure
Databricks notebook, cluster, and a job to run the notebook on the cluster in an
existing Azure Databricks workspace.
Requirements
1. You must have the Terraform CLI. See Download Terraform on the
Terraform website.
2. You must have a Terraform project. In your terminal, create an empty
directory and then switch to it. (Each separate set of Terraform
configuration files must be in its own directory, which is called a
Terraform project.) For example: mkdir terraform_demo && cd
terraform_demo.
mkdir terraform_demo && cd terraform_demo
Include Terraform configurations for your project in one or more
configuration files in your Terraform project. For information about the
configuration file syntax, see Terraform Language Documentation on the
Terraform website.
3. You must add to your Terraform project a dependency for the Databricks
Terraform provider. Add the following to one of the configuration files in
your Terraform project:
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
}
}
}
4. You must configure authentication for your Terraform project.
See Authentication in the Databricks Terraform provider documentation.
Sample configuration
This section provides a sample configuration that you can experiment with to
provision an Azure Databricks notebook, a cluster, and a job to run the notebook on
the cluster, in an existing Azure Databricks workspace. It assumes that you have
already set up the requirements, as well as created a Terraform project and
configured the project with Terraform authentication as described in the previous
section.
1. Create a file named me.tf in your Terraform project, and add the
following code. This file gets information about the current user (you):
# Retrieve information about the current user.
data "databricks_current_user" "me" {}
2. Create another file named notebook.tf, and add the following code. This
file represents the notebook.
variable "notebook_subdirectory" {
description = "A name for the subdirectory to store the notebook."
type = string
default = "Terraform"
}
variable "notebook_filename" {
description = "The notebook's filename."
type = string
}
variable "notebook_language" {
description = "The language of the notebook."
type = string
}
resource "databricks_notebook" "this" {
path     = "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
language = var.notebook_language
source = "./${var.notebook_filename}"
}
output "notebook_url" {
value = databricks_notebook.this.url
}
3. Create another file named notebook.auto.tfvars, and add the following
code. This file specifies the notebook’s properties.
notebook_subdirectory = "Terraform"
notebook_filename = "notebook-getting-started.py"
notebook_language = "PYTHON"
4. Create another file named notebook-getting-started.py, and add the
following code. This file represents the notebook’s contents.
display(spark.range(10))
5. Create another file named cluster.tf, and add the following code. This
file represents the cluster.
variable "cluster_name" {
description = "A name for the cluster."
type = string
default = "My Cluster"
}
variable "cluster_autotermination_minutes" {
description = "How many minutes before automatically terminating due to inactivity."
type = number
default = 60
}
variable "cluster_num_workers" {
description = "The number of workers."
type = number
default = 1
}
# Create the cluster with the "smallest" amount
# of resources allowed.
data "databricks_node_type" "smallest" {
local_disk = true
}
# Use the latest Databricks Runtime
# Long Term Support (LTS) version.
data "databricks_spark_version" "latest_lts" {
long_term_support = true
}
resource "databricks_cluster" "this" {
cluster_name = var.cluster_name
node_type_id = data.databricks_node_type.smallest.id
spark_version           = data.databricks_spark_version.latest_lts.id
autotermination_minutes = var.cluster_autotermination_minutes
num_workers = var.cluster_num_workers
}
output "cluster_url" {
value = databricks_cluster.this.url
}
6. Create another file named cluster.auto.tfvars, and add the following
code. This file specifies the cluster’s properties.
cluster_name = "My Cluster"
cluster_autotermination_minutes = 60
cluster_num_workers = 1
7. Create another file named job.tf, and add the following code. This file
represents the job that runs the notebook on the cluster.
variable "job_name" {
description = "A name for the job."
type = string
default = "My Job"
}
resource "databricks_job" "this" {
name = var.job_name
existing_cluster_id = databricks_cluster.this.cluster_id
notebook_task {
notebook_path = databricks_notebook.this.path
}
email_notifications {
on_success = [ data.databricks_current_user.me.user_name ]
on_failure = [ data.databricks_current_user.me.user_name ]
}
}
output "job_url" {
value = databricks_job.this.url
}
8. Create another file named job.auto.tfvars, and add the following code. This file specifies the job's properties.
job_name = "My Job"
9. Run terraform plan. If there are any errors, fix them, and then run the
command again.
10. Run terraform apply.
11. Verify that the notebook, cluster, and job were created: in the output of
the terraform apply command, find the URLs
for notebook_url, cluster_url, and job_url, and go to them.
12. Run the job: on the Jobs page, click Run Now. After the job finishes,
check your email inbox.
13. When you are done with this sample, delete the notebook, cluster, and
job from the Azure Databricks workspace by running terraform destroy.
Note
For more information about the terraform plan, terraform apply,
and terraform destroy commands, see Terraform CLI
Documentation in the Terraform documentation.
14. Verify that the notebook, cluster, and job were deleted: refresh the
notebook, cluster, and Jobs pages to each display a message that the
resource cannot be found.
Next steps
1. Create an Azure Databricks workspace.
2. Manage workspace resources for an Azure Databricks workspace.
Troubleshooting
Note
For Terraform-specific support, see the Latest Terraform topics on the HashiCorp
Discuss website. For issues specific to the Databricks Terraform Provider,
see Issues in the databrickslabs/terraform-provider-databricks GitHub repository.
Error: Failed to install provider
Issue: If you did not check in a terraform.lock.hcl file to your version control system,
and you run the terraform init command, the following message appears: Failed to
install provider. Additional output may include a message similar to the following:
Error while installing databrickslabs/databricks: v1.0.0: checksum list has no
SHA-256 hash for
"https://github.com/databricks/terraform-provider-databricks/releases/download/
v1.0.0/terraform-provider-databricks_1.0.0_darwin_amd64.zip"
Cause: Your Terraform configurations reference outdated Databricks Terraform
providers.
Solution:
1. Replace databrickslabs/databricks with databricks/databricks in all of
your .tf files.
To automate these replacements, run the following Python command
from the parent folder that contains the .tf files to update:
python3 -c "$(curl -Ls https://dbricks.co/updtfns)"
2. Run the following Terraform command and then approve the changes
when prompted:
terraform state replace-provider databrickslabs/databricks databricks/databricks
For information about this command, see Command: state replace-
provider in the Terraform documentation.
3. Verify the changes by running the following Terraform command:
terraform init
Error: Failed to query available provider packages
Issue: If you did not check in a terraform.lock.hcl file to your version control system,
and you run the terraform init command, the following message appears: Failed to
query available provider packages .
Cause: Your Terraform configurations reference outdated Databricks Terraform
providers.
Solution: Follow the solution instructions in Error: Failed to install provider.
Enable logging
The Databricks Terraform provider outputs logs that you can enable by setting
the TF_LOG environment variable to DEBUG or any other log level that Terraform
supports.
By default, logs are sent to stderr. To send logs to a file, set
the TF_LOG_PATH environment variable to the target file path.
For example, you can run the following command to enable logging at the debug
level, and to output logs in monochrome format to a file named tf.log relative to the
current working directory, while the terraform apply command runs:
TF_LOG=DEBUG TF_LOG_PATH=tf.log terraform apply -no-color
For more information about Terraform logging, see Debugging Terraform.
Additional examples
Deploy an Azure Databricks workspace using Terraform
Manage Databricks workspaces using Terraform
Create clusters
Create a cluster, a notebook, and a job
Control access to Databricks SQL tables
Implement CI/CD pipelines to deploy Databricks resources using the
Databricks Terraform provider
Create a sample legacy dashboard
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/terraform/cluster-notebook-job
Create clusters, notebooks, and jobs
with Terraform
In this article
1. Step 1: Create and configure the Terraform project
2. Step 2: Run the configurations
3. Step 3: Explore the results
4. Step 4: Clean up
This article shows how to use the Databricks Terraform provider to create a cluster,
a notebook, and a job in an existing Azure Databricks workspace.
This article is a companion to the following Azure Databricks getting started articles:
Tutorial: Run an end-to-end lakehouse analytics pipeline, which uses a
cluster that works with Unity Catalog, a Python notebook, and a job to
run the notebook.
Quickstart: Run a Spark job on Azure Databricks Workspace using the
Azure portal, which uses a general-purpose cluster and a Python
notebook.
You can also adapt the Terraform configurations in this article to create custom
clusters, notebooks, and jobs in your workspaces.
Step 1: Create and configure the Terraform project
1. Create a Terraform project by following the instructions in
the Requirements section of the Databricks Terraform provider overview
article.
2. To create a cluster, create a file named cluster.tf, and add the following content to the file. This content creates a cluster with the smallest amount of resources allowed. This cluster uses the latest Databricks Runtime Long Term Support (LTS) version.
For a cluster that works with Unity Catalog:
variable "cluster_name" {}
variable "cluster_autotermination_minutes" {}
variable "cluster_num_workers" {}
variable "cluster_data_security_mode" {}
# Create the cluster with the "smallest" amount
# of resources allowed.
data "databricks_node_type" "smallest" {
local_disk = true
}
# Use the latest Databricks Runtime
# Long Term Support (LTS) version.
data "databricks_spark_version" "latest_lts" {
long_term_support = true
}
resource "databricks_cluster" "this" {
cluster_name = var.cluster_name
node_type_id = data.databricks_node_type.smallest.id
spark_version           = data.databricks_spark_version.latest_lts.id
autotermination_minutes = var.cluster_autotermination_minutes
num_workers = var.cluster_num_workers
data_security_mode = var.cluster_data_security_mode
}
output "cluster_url" {
value = databricks_cluster.this.url
}
For an all-purpose cluster:
variable "cluster_name" {
description = "A name for the cluster."
type = string
default = "My Cluster"
}
variable "cluster_autotermination_minutes" {
description = "How many minutes before automatically terminating due to inactivity."
type = number
default = 60
}
variable "cluster_num_workers" {
description = "The number of workers."
type = number
default = 1
}
# Create the cluster with the "smallest" amount
# of resources allowed.
data "databricks_node_type" "smallest" {
local_disk = true
}
# Use the latest Databricks Runtime
# Long Term Support (LTS) version.
data "databricks_spark_version" "latest_lts" {
long_term_support = true
}
resource "databricks_cluster" "this" {
cluster_name = var.cluster_name
node_type_id = data.databricks_node_type.smallest.id
spark_version           = data.databricks_spark_version.latest_lts.id
autotermination_minutes = var.cluster_autotermination_minutes
num_workers = var.cluster_num_workers
}
output "cluster_url" {
value = databricks_cluster.this.url
}
3. To create a cluster, create another file named cluster.auto.tfvars, and
add the following content to the file. This file contains variable values for
customizing the cluster. Replace the placeholder values with your own
values.
For a cluster that works with Unity Catalog:
cluster_name = "My Cluster"
cluster_autotermination_minutes = 60
cluster_num_workers = 1
cluster_data_security_mode = "SINGLE_USER"
For an all-purpose cluster:
cluster_name = "My Cluster"
cluster_autotermination_minutes = 60
cluster_num_workers = 1
4. To create a notebook, create another file named notebook.tf, and add
the following content to the file:
variable "notebook_subdirectory" {
description = "A name for the subdirectory to store the notebook."
type = string
default = "Terraform"
}
variable "notebook_filename" {
description = "The notebook's filename."
type = string
}
variable "notebook_language" {
description = "The language of the notebook."
type = string
}
resource "databricks_notebook" "this" {
path     = "${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}"
language = var.notebook_language
source = "./${var.notebook_filename}"
}
output "notebook_url" {
value = databricks_notebook.this.url
}
5. If you are creating a cluster, save the following notebook code to a file in
the same directory as the notebook.tf file:
For the Python notebook for Tutorial: Run an end-to-end lakehouse analytics pipeline, a file named notebook-getting-started-lakehouse-e2e.py with the following contents:
# Databricks notebook source
external_location = "<your_external_location>"
catalog = "<your_catalog>"
dbutils.fs.put(f"{external_location}/foobar.txt", "Hello world!",
True)
display(dbutils.fs.head(f"{external_location}/foobar.txt"))
dbutils.fs.rm(f"{external_location}/foobar.txt")
display(spark.sql(f"SHOW SCHEMAS IN {catalog}"))
# COMMAND ----------
from pyspark.sql.functions import col
# Set parameters for isolation in workspace and reset demo
username = spark.sql("SELECT regexp_replace(current_user(), '[^a-zA-Z0-9]', '_')").first()[0]
database = f"{catalog}.e2e_lakehouse_{username}_db"
source = f"{external_location}/e2e-lakehouse-source"
table = f"{database}.target_table"
checkpoint_path = f"{external_location}/_checkpoint/e2e-lakehouse-demo"
spark.sql(f"SET c.username='{username}'")
spark.sql(f"SET c.database={database}")
spark.sql(f"SET c.source='{source}'")
spark.sql("DROP DATABASE IF EXISTS ${c.database} CASCADE")
spark.sql("CREATE DATABASE ${c.database}")
spark.sql("USE ${c.database}")
# Clear out data from previous demo execution
dbutils.fs.rm(source, True)
dbutils.fs.rm(checkpoint_path, True)
# Define a class to load batches of data to source
class LoadData:
def __init__(self, source):
self.source = source
def get_date(self):
try:
df = spark.read.format("json").load(source)
except:
return "2016-01-01"
batch_date = df.selectExpr("max(distinct(date(tpep_pickup_datetime))) + 1 day").first()[0]
if batch_date.month == 3:
raise Exception("Source data exhausted")
return batch_date
def get_batch(self, batch_date):
return (
spark.table("samples.nyctaxi.trips")
.filter(col("tpep_pickup_datetime").cast("date") ==
batch_date)
)
def write_batch(self, batch):
batch.write.format("json").mode("append").save(self.source)
def land_batch(self):
batch_date = self.get_date()
batch = self.get_batch(batch_date)
self.write_batch(batch)
RawData = LoadData(source)
# COMMAND ----------
RawData.land_batch()
# COMMAND ----------
# Import functions
from pyspark.sql.functions import col, current_timestamp
# Configure Auto Loader to ingest JSON data to a Delta table
(spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "json")
.option("cloudFiles.schemaLocation", checkpoint_path)
.load(source)
.select("*", col("_metadata.file_path").alias("source_file"),
current_timestamp().alias("processing_time"))
.writeStream
.option("checkpointLocation", checkpoint_path)
.trigger(availableNow=True)
.option("mergeSchema", "true")
.toTable(table))
# COMMAND ----------
df = spark.read.table(table)
# COMMAND ----------
display(df)
For the Python notebook for Quickstart: Run a Spark job on Azure Databricks Workspace using the Azure portal, a file named notebook-quickstart-create-databricks-workspace-portal.py with the following contents:
# Databricks notebook source
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Seattle"
blob_sas_token = r""
# COMMAND ----------
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set('fs.azure.sas.%s.%s.blob.core.windows.net' %
(blob_container_name, blob_account_name), blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# COMMAND ----------
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# COMMAND ----------
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
6. If you are creating a notebook, create another file
named notebook.auto.tfvars, and add the following content to the file.
This file contains variable values for customizing the notebook
configuration.
For the Python notebook for Tutorial: Run an end-to-end lakehouse
analytics pipeline:
notebook_subdirectory = "Terraform"
notebook_filename = "notebook-getting-started-lakehouse-e2e.py"
notebook_language = "PYTHON"
For the Python notebook for Quickstart: Run a Spark job on Azure
Databricks Workspace using the Azure portal:
notebook_subdirectory = "Terraform"
notebook_filename = "notebook-quickstart-create-databricks-workspace-portal.py"
notebook_language = "PYTHON"
7. If you are creating a notebook, in your Azure Databricks workspace, be
sure to set up any requirements for the notebook to run successfully, by
referring to the following instructions for:
o The Python notebook for Tutorial: Run an end-to-end lakehouse
analytics pipeline
o The Python notebook for Quickstart: Run a Spark job on Azure
Databricks Workspace using the Azure portal
8. To create the job, create another file named job.tf, and add the
following content to the file. This content creates a job to run the
notebook.
variable "job_name" {
description = "A name for the job."
type = string
default = "My Job"
}
resource "databricks_job" "this" {
name = var.job_name
existing_cluster_id = databricks_cluster.this.cluster_id
notebook_task {
notebook_path = databricks_notebook.this.path
}
email_notifications {
on_success = [ data.databricks_current_user.me.user_name ]
on_failure = [ data.databricks_current_user.me.user_name ]
}
}
output "job_url" {
value = databricks_job.this.url
}
9. If you are creating a job, create another file named job.auto.tfvars, and
add the following content to the file. This file contains a variable value
for customizing the job configuration.
job_name = "My Job"
Step 2: Run the configurations
In this step, you run the Terraform configurations to deploy the cluster, the
notebook, and the job into your Azure Databricks workspace.
1. Check to see whether your Terraform configurations are valid by running
the terraform validate command. If any errors are reported, fix them,
and run the command again.
terraform validate
2. Check to see what Terraform will do in your workspace, before Terraform
actually does it, by running the terraform plan command.
terraform plan
3. Deploy the cluster, the notebook, and the job into your workspace by
running the terraform apply command. When prompted to deploy,
type yes and press Enter.
terraform apply
Terraform deploys the resources that are specified in your project.
Deploying these resources (especially a cluster) can take several minutes.
Step 3: Explore the results
1. If you created a cluster, in the output of the terraform apply command,
copy the link next to cluster_url, and paste it into your web browser’s
address bar.
2. If you created a notebook, in the output of the terraform
apply command, copy the link next to notebook_url, and paste it into your
web browser’s address bar.
Note
Before you use the notebook, you might need to customize its contents.
See the related documentation about how to customize the notebook.
3. If you created a job, in the output of the terraform apply command, copy
the link next to job_url, and paste it into your web browser’s address bar.
Note
Before you run the notebook, you might need to customize its contents.
See the links at the beginning of this article for related documentation
about how to customize the notebook.
4. If you created a job, run the job as follows:
1. Click Run now on the job page.
2. After the job finishes running, to view the job run’s results, in
the Completed runs (past 60 days) list on the job page, click the
most recent time entry in the Start time column. The Output pane
shows the result of running the notebook’s code.
Step 4: Clean up
In this step, you delete the preceding resources from your workspace.
1. Check to see what Terraform will do in your workspace, before Terraform
actually does it, by running the terraform plan command.
terraform plan
2. Delete the cluster, the notebook, and the job from your workspace by
running the terraform destroy command. When prompted to delete,
type yes and press Enter.
terraform destroy
Terraform deletes the resources that are specified in your project.
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/terraform/azure-workspace
Deploy an Azure Databricks workspace
using Terraform
In this article
1. Simple setup
2. Provider configuration
The following sample configuration uses the azurerm Terraform provider to deploy an
Azure Databricks workspace. It assumes you have signed in to Azure ( az login) on
your local machine with an Azure user that has Contributor rights to your
subscription.
For more information about the azurerm Terraform plugin for Databricks,
see azurerm_databricks_workspace.
Simple setup
terraform {
required_providers {
azurerm = "~> 2.33"
random = "~> 2.2"
}
}
provider "azurerm" {
features {}
}
variable "region" {
type = string
default = "westeurope"
}
resource "random_string" "naming" {
special = false
upper = false
length = 6
}
data "azurerm_client_config" "current" {
}
data "external" "me" {
program = ["az", "account", "show", "--query", "user"]
}
locals {
prefix = "databricksdemo${random_string.naming.result}"
tags = {
Environment = "Demo"
Owner = lookup(data.external.me.result, "name")
}
}
resource "azurerm_resource_group" "this" {
name = "${local.prefix}-rg"
location = var.region
tags = local.tags
}
resource "azurerm_databricks_workspace" "this" {
name = "${local.prefix}-workspace"
resource_group_name = azurerm_resource_group.this.name
location = azurerm_resource_group.this.location
sku = "premium"
managed_resource_group_name = "${local.prefix}-workspace-rg"
tags = local.tags
}
output "databricks_host" {
value = "https://${azurerm_databricks_workspace.this.workspace_url}/"
}
Provider configuration
In Manage Databricks workspaces using Terraform, use the special configurations for
Azure:
provider "databricks" {
host = azurerm_databricks_workspace.this.workspace_url
}
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/terraform/workspace-management
Manage Databricks workspaces using
Terraform
In this article
1. Standard functionality
2. Workspace security
3. Storage
4. Advanced configuration
This article shows how to manage resources in an Azure Databricks workspace using
the Databricks Terraform provider.
The following configuration blocks initialize the most common
variables, databricks_spark_version, databricks_node_type,
and databricks_current_user.
terraform {
required_providers {
databricks = {
source = "databricks/databricks"
}
}
}
provider "databricks" {}
data "databricks_current_user" "me" {}
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
local_disk = true
}
Standard functionality
These resources do not require administrative privileges. More documentation is
available at the dedicated pages: databricks_secret_scope, databricks_token, databricks_secret, databricks_notebook, databricks_job, databricks_cluster, databricks_cluster_policy, databricks_instance_pool.
resource "databricks_secret_scope" "this" {
name = "demo-${data.databricks_current_user.me.alphanumeric}"
}
resource "databricks_token" "pat" {
comment = "Created from ${abspath(path.module)}"
lifetime_seconds = 3600
}
resource "databricks_secret" "token" {
string_value = databricks_token.pat.token_value
scope = databricks_secret_scope.this.name
key = "token"
}
resource "databricks_notebook" "this" {
path = "${data.databricks_current_user.me.home}/Terraform"
language = "PYTHON"
content_base64 = base64encode(<<-EOT
token = dbutils.secrets.get('${databricks_secret_scope.this.name}', '${databricks_secret.token.key}')
print(f'This should be redacted: {token}')
EOT
)
}
resource "databricks_job" "this" {
name = "Terraform Demo (${data.databricks_current_user.me.alphanumeric})"
new_cluster {
num_workers = 1
spark_version = data.databricks_spark_version.latest.id
node_type_id = data.databricks_node_type.smallest.id
}
notebook_task {
notebook_path = databricks_notebook.this.path
}
email_notifications {}
}
resource "databricks_cluster" "this" {
cluster_name = "Exploration (${data.databricks_current_user.me.alphanumeric})"
spark_version = data.databricks_spark_version.latest.id
instance_pool_id = databricks_instance_pool.smallest_nodes.id
autotermination_minutes = 20
autoscale {
min_workers = 1
max_workers = 10
}
}
resource "databricks_cluster_policy" "this" {
name = "Minimal (${data.databricks_current_user.me.alphanumeric})"
definition = jsonencode({
"dbus_per_hour" : {
"type" : "range",
"maxValue" : 10
},
"autotermination_minutes" : {
"type" : "fixed",
"value" : 20,
"hidden" : true
}
})
}
resource "databricks_instance_pool" "smallest_nodes" {
instance_pool_name = "Smallest Nodes (${data.databricks_current_user.me.alphanumeric})"
min_idle_instances = 0
max_capacity = 30
node_type_id = data.databricks_node_type.smallest.id
preloaded_spark_versions = [
data.databricks_spark_version.latest.id
]
idle_instance_autotermination_minutes = 20
}
output "notebook_url" {
value = databricks_notebook.this.url
}
output "job_url" {
value = databricks_job.this.url
}
Workspace security
Managing security requires administrative privileges. More documentation is available at the dedicated pages: databricks_secret_acl, databricks_group, databricks_user, databricks_group_member, databricks_permissions.
resource "databricks_secret_acl" "spectators" {
principal = databricks_group.spectators.display_name
scope = databricks_secret_scope.this.name
permission = "READ"
}
resource "databricks_group" "spectators" {
display_name = "Spectators (by ${data.databricks_current_user.me.alphanumeric})"
}
resource "databricks_user" "dummy" {
user_name = "dummy+${data.databricks_current_user.me.alphanumeric}@example.com"
display_name = "Dummy ${data.databricks_current_user.me.alphanumeric}"
}
resource "databricks_group_member" "a" {
group_id = databricks_group.spectators.id
member_id = databricks_user.dummy.id
}
resource "databricks_permissions" "notebook" {
notebook_path = databricks_notebook.this.id
access_control {
user_name = databricks_user.dummy.user_name
permission_level = "CAN_RUN"
}
access_control {
group_name = databricks_group.spectators.display_name
permission_level = "CAN_READ"
}
}
resource "databricks_permissions" "job" {
job_id = databricks_job.this.id
access_control {
user_name = databricks_user.dummy.user_name
permission_level = "IS_OWNER"
}
access_control {
group_name = databricks_group.spectators.display_name
permission_level = "CAN_MANAGE_RUN"
}
}
resource "databricks_permissions" "cluster" {
cluster_id = databricks_cluster.this.id
access_control {
user_name = databricks_user.dummy.user_name
permission_level = "CAN_RESTART"
}
access_control {
group_name = databricks_group.spectators.display_name
permission_level = "CAN_ATTACH_TO"
}
}
resource "databricks_permissions" "policy" {
cluster_policy_id = databricks_cluster_policy.this.id
access_control {
group_name = databricks_group.spectators.display_name
permission_level = "CAN_USE"
}
}
resource "databricks_permissions" "pool" {
instance_pool_id = databricks_instance_pool.smallest_nodes.id
access_control {
group_name = databricks_group.spectators.display_name
permission_level = "CAN_ATTACH_TO"
}
}
Storage
Depending on your preferences and needs, you can
Manage JAR, Wheel, and Egg libraries through
the databricks_dbfs_file resource.
List entries on DBFS with the databricks_dbfs_file_paths data source.
Get contents of small files with the databricks_dbfs_file data source.
Mount your Azure storage using
the databricks_azure_adls_gen1_mount, databricks_azure_adls_gen2_mount,
and databricks_azure_blob_mount resources.
Advanced configuration
More documentation is available at the dedicated pages for
the databricks_workspace_conf and databricks_ip_access_list resources.
data "http" "my" {
url = "https://ifconfig.me"
}
resource "databricks_workspace_conf" "this" {
custom_config = {
"enableIpAccessLists": "true"
}
}
resource "databricks_ip_access_list" "only_me" {
label = "only ${data.http.my.body} is allowed to access workspace"
list_type = "ALLOW"
ip_addresses = ["${data.http.my.body}/32"]
depends_on = [databricks_workspace_conf.this]
}
Terraform is an open-source infrastructure as code (IaC) tool that
allows users to define and deploy infrastructure resources, such as
servers, storage, and networking, using simple, human-readable
configuration files.
Azure Provider
The Azure Provider can be used to configure infrastructure in
Microsoft Azure using the Azure Resource Manager APIs.
Docs: https://registry.terraform.io/providers/hashicorp/azurerm/
latest/docs
Next, create a Service Principal and a Client Secret.
Log in to the Azure CLI in VS Code. The browser opens automatically; enter your Azure credentials.
az login
Successfully logged in to Azure via the VS Code terminal. If you are using Git Bash on Windows, disable automatic path conversion so that subscription scopes are passed to the Azure CLI correctly:
export MSYS_NO_PATHCONV=1
Organizations can use subscriptions to manage costs and the
resources that are created by users, teams, and projects.
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/20000000-0000-0000-0000-000000000000"
An Azure service principal is an identity created for use with
applications, hosted services, and automated tools to access Azure
resources. This access is restricted by the roles assigned to the
service principal, giving you control over which resources can be
accessed and at which level.
Create a Service Principal
These values map to the Terraform variables like so:
appId is the client_id defined above.
password is the client_secret defined above.
tenant is the tenant_id defined above.
az login --service-principal -u CLIENT_ID -p CLIENT_SECRET --tenant TENANT_ID
Successfully logged in with the Service Principal account.
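As a sketch, the same values can also be passed to the azurerm provider (or exported as the corresponding ARM_* environment variables); the variable names here are illustrative.

provider "azurerm" {
  features {}
  client_id       = var.client_id        # appId from the az ad sp create-for-rbac output
  client_secret   = var.client_secret    # password from the output
  tenant_id       = var.tenant_id        # tenant from the output
  subscription_id = var.subscription_id
}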
How Terraform Works
Terraform creates and manages resources on cloud platforms and
other services through their application programming interfaces
(APIs).
main.tf: This is our main configuration file where we are going to
define our resource definition.
variables.tf: This is the file where we are going to define our
variables.
outputs.tf: This file contains output definitions for our resources.
backend.tf: Defines where the state file of the current infrastructure will be stored, i.e., where Terraform stores its state data files.
Terraform keeps track of the managed resources. This state can be stored locally or remotely. The state file contains full details of the resources in our Terraform code. When you modify something in your code and apply it to the cloud, Terraform compares the code against the state file and determines which changes need to be made to the infrastructure. When you run the terraform apply command to create infrastructure in the cloud, Terraform creates a state file called "terraform.tfstate".
version.tf: Declares version constraints so Terraform can check the version of the installed Terraform binary that executes the configuration.
terraform.tfvars: Allows us to manage variable assignments systematically in a file with the extension .tfvars or .tfvars.json.
Terraform Core → Responsible for lifecycle management of infrastructure.
Terraform Provider → A plugin for Terraform that makes a collection of related resources available.
Initialize the Terraform repo
Initializes the Terraform working directory, downloading any
necessary provider plugins.
Run Terraform plan
Terraform plan command creates an execution plan, which lets you
preview the changes that Terraform plans to make to your
infrastructure.
When Terraform creates a plan it → Reads the current state of any
already-existing remote objects to make sure that the Terraform
state is up-to-date.
Run Terraform apply
Create the Azure resources defined in your Terraform configuration.
Import the below repository into Azure DevOps for
Terraform configuration
GitHub - Ibrahimsi/Terraform-AzureDevOps-Sample
(OR) Create a file manually
Displays directory paths and (optionally) files in each
subdirectory: tree
main.tf: This is our main configuration file where we are going to
define our resource definition.
resource "azurerm_resource_group" "example" {
name = "${var.prefix}-rg"
location = var.location
}
resource "azurerm_virtual_network" "main" {
name = "${var.prefix}-network"
address_space = ["10.0.0.0/16"]
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
}
resource "azurerm_subnet" "internal" {
name = "internal"
resource_group_name = azurerm_resource_group.example.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.0.2.0/24"]
}
resource "azurerm_network_interface" "main" {
name = "${var.prefix}-nic"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
ip_configuration {
name = "testconfiguration1"
subnet_id = azurerm_subnet.internal.id
private_ip_address_allocation = "Dynamic"
}
}
resource "azurerm_virtual_machine" "main" {
name = "${var.prefix}-vm"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
network_interface_ids = [azurerm_network_interface.main.id]
vm_size = "Standard_DS1_v2"
# Uncomment this line to delete the OS disk automatically when deleting the VM
# delete_os_disk_on_termination = true
# Uncomment this line to delete the data disks automatically when deleting the VM
# delete_data_disks_on_termination = true
storage_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
storage_os_disk {
name = "myosdisk1"
caching = "ReadWrite"
create_option = "FromImage"
managed_disk_type = "Standard_LRS"
}
os_profile {
computer_name = "hostname"
admin_username = "testadmin"
admin_password = "Password1234!"
}
os_profile_linux_config {
disable_password_authentication = false
}
tags = {
environment = "staging"
}
}
provider.tf: Contains the terraform block, the backend definition, provider configurations, and aliases.
provider "azurerm" {
features {}
}
terraform.tfvars: Used to store variable definitions. This allows you to externalize your variable definitions and makes it easier to manage them, especially if you have a large number of variables or need to use the same variables in multiple environments.
location = "West Europe"
prefix = "demo"
variables.tf: Define the variables that must have values in order for
your Terraform code to validate and run. You can also define default
values for your variables in this file.
variable "prefix" {}
variable "location" {}
init: Initializes a working directory and downloads the necessary
provider plugins and modules and setting up the backend for storing
your infrastructure’s state.
terraform init
Reinitialize your working directory. Terraform has been successfully initialized; you may now begin working with Terraform.
Finding the latest azurerm provider version.
terraform.lock.hcl: Captures the versions of all the Terraform providers you're using. It is generated by Terraform when you run the terraform init command.
This file serves as a reference point across all executions, aiding in
the evaluation of compatible dependencies with the current
configuration.
terraform fmt
The terraform fmt command is used to rewrite Terraform configuration files to a canonical format and style. This command applies a subset of the Terraform language style conventions, along with other minor adjustments for readability.
terraform validate runs checks that verify whether a configuration is
syntactically valid and internally consistent, regardless of any
provided variables or existing state.
terraform validate
terraform plan command creates an execution plan, which lets
you preview the changes that Terraform plans to make to your
infrastructure.
When Terraform creates a plan it → Reads the current state of
any already-existing remote objects to make sure that the
Terraform state is up-to-date.
terraform plan
Find out how many resources will be created by piping the plan output through grep (for example, terraform plan | grep "Plan:").
Apply the Configuration
Create the Azure resources defined in your Terraform configuration.
Executes the actions proposed in a Terraform plan.
terraform apply (OR)
terraform apply --auto-approve
It asks for confirmation from the user before making any changes, unless it was explicitly told to skip approval. Finally, the resources are created. Check the Azure portal to verify that the resources were created.
This state is stored by default in a local file named "terraform.tfstate". Terraform uses state to determine which changes to make to your infrastructure. It is recommended to store the state remotely (for example, in an Azure Storage backend or Terraform Cloud) to version, encrypt, and securely share it with your team.
An Azure storage account contains all of your Azure Storage data
objects: blobs, files, queues, and tables. The storage account
provides a unique namespace for your Azure Storage data that is
accessible from anywhere in the world over HTTP or HTTPS.
Create a storage account.
A blob container organizes a set of blobs, similar to a directory in
a file system; here it hosts the tfstate file.
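If you prefer to script this step instead of clicking through the
portal, a hedged Terraform sketch using the names that appear later
in this guide (demo-rg, ibrahimsi, prod-tfstate) could look like the
block below; argument names follow the azurerm 3.x provider, and in
practice the state storage is usually created separately from the
configuration whose state it will hold.
resource "azurerm_resource_group" "state" {
  name     = "demo-rg"
  location = "West Europe"
}

resource "azurerm_storage_account" "state" {
  name                     = "ibrahimsi"
  resource_group_name      = azurerm_resource_group.state.name
  location                 = azurerm_resource_group.state.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "state" {
  name                  = "prod-tfstate"
  storage_account_name  = azurerm_storage_account.state.name
  container_access_type = "private"
}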
When you build infrastructure from a Terraform configuration, a state
file named terraform.tfstate is created automatically in the local
workspace directory.
This tfstate file holds information about the provisioned
infrastructure that Terraform manages.
Whenever you change the configuration, Terraform uses the state file
to determine which parts of your configuration are already created
and which parts need to be changed.
A backend.tf file defines where Terraform stores its state data.
Terraform uses persisted state data to keep track of the resources it
manages.
terraform {
  backend "azurerm" {
    resource_group_name  = "demo-rg"
    storage_account_name = "ibrahimsi"
    container_name       = "prod-tfstate"
    key                  = "prod.terraform.tfstate"
  }
}
Displays directory paths and (optionally) files in each subdirectory:
tree
Initialize again so the existing local state is migrated to the new
azurerm backend (terraform init -migrate-state).
Before the init, the terraform.tfstate file exists only locally and
the prod-tfstate container holds no files.
Run terraform init again to initialize the new backend:
terraform init
After the migration, the state file has moved to the remote backend.
Check the container in the Azure portal.
Terraform now refers to this remote state file every time you run its
commands.
Finally destroy the resources
terraform destroy (OR)
terraform destroy --auto-approve
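As an aside, if there are resources that should never be removed by
terraform destroy, a lifecycle block can guard them. This is a
general Terraform feature rather than something configured in this
demo, and the resource shown is only an example:
resource "azurerm_resource_group" "protected" {
  name     = "demo-rg"
  location = "West Europe"

  lifecycle {
    # Any plan that would delete this resource, including terraform
    # destroy, fails with an error instead of removing it.
    prevent_destroy = true
  }
}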
Azure DevOps CICD Pipeline
Hashicorp Terraform is an open-source IaC (Infrastructure-as-
Code) tool for configuring and deploying cloud infrastructure. It
codifies infrastructure in configuration files that describe the desired
state for your topology.
Terraform enables the management of any infrastructure — such as
public clouds, private clouds, and SaaS services — by using
Terraform providers.
Create a new project.
Import the repository →
https://github.com/Ibrahimsi/Terraform-AzureDevOps-Sample.git
The GitHub code is imported successfully.
Create a new file named terraform.tfvars and add the following
content:
location = "Canada Central"
prefix = "demo"
Azure Pipelines automatically builds and tests code projects.
It supports all major languages and project types and combines
continuous integration, continuous delivery, and continuous testing
to build, test, and deliver your code to any destination. Set up the
build pipeline.
The starter pipeline simplifies the setup of an entire continuous
integration (CI) and continuous delivery (CD) pipeline to Azure with
Azure DevOps. You can start with existing code or use one of the
provided sample applications. Select the starter pipeline.
Copy the pipeline code:
trigger:
- main

stages:
- stage: Build
  jobs:
  - job: Build
    pool:
      vmImage: 'ubuntu-latest'
    steps:
Enable the Terraform extension.
Terraform enables the definition, preview, and deployment of cloud
infrastructure.
Go to Organization settings → Extensions → Browse marketplace
Add the Terraform extension.
Install the Terraform extension at the organization level.
Proceed to the organization.
Finally, check whether the Terraform extension has been added or not.
Go to the Terraform_Pipeline project.
Add the Terraform tasks.
Pipeline variables → values that can be set and modified during a
pipeline run.
Authorize the subscription → this grants the pipeline's service
connection the right to deploy under a valid Azure subscription.
Confirm the subscription. The command is set to init mode.
Create a storage account in the Azure portal. Go to backend.tf and
confirm the storage account name.
The storage account provides a unique namespace for your Azure
Storage data that is accessible from anywhere in the world over HTTP
or HTTPS. Create the storage account.
Go to the resource → Create a new container
Fill out the Terraform tasks in the pipeline.
Add a display name for the task. Search for the Terraform task again
→ select the validate command.
The Tf validate task is added successfully.
Add another task → fmt
Terraform plan → Creates an execution plan, which lets you preview
the changes that Terraform plans to make to your infrastructure.
Tf plan pipeline code.
Archive build → it packages the overall build output.
YAML is a human-readable data serialization language that is often
used for writing configuration files.
Publish build artifacts → publishes the build output so it can later
be downloaded with the Download Build Artifacts task.
Artifacts are files created as part of a build process that often
contain metadata about that build's jobs, such as test results and
security scans.
The full pipeline code is below; six tasks are added to the Build
job:
1. Tf init
2. Tf validate
3. Tf fmt
4. Tf plan
5. Archive files
6. Publish build artifacts
trigger:
- main

stages:
- stage: Build
  jobs:
  - job: Build
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: TerraformTaskV4@4
      displayName: Tf init
      inputs:
        provider: 'azurerm'
        command: 'init'
        backendServiceArm: 'Pay-As-You-Go(f30deb63-a417-4fa4-afc1-813a7d3920bb)'
        backendAzureRmResourceGroupName: 'demo-resources'
        backendAzureRmStorageAccountName: 'ibrahimsi'
        backendAzureRmContainerName: 'prod-tfstate'
        backendAzureRmKey: 'prod.terraform.tfstate'
    - task: TerraformTaskV4@4
      displayName: Tf validate
      inputs:
        provider: 'azurerm'
        command: 'validate'
    - task: TerraformTaskV4@4
      displayName: Tf fmt
      inputs:
        provider: 'azurerm'
        command: 'custom'
        customCommand: 'fmt'
        outputTo: 'console'
        environmentServiceNameAzureRM: 'Pay-As-You-Go(f30deb63-a417-4fa4-afc1-813a7d3920bb)'
    - task: TerraformTaskV4@4
      displayName: Tf plan
      inputs:
        provider: 'azurerm'
        command: 'plan'
        commandOptions: '-out $(Build.SourcesDirectory)/tfplanfile'
        environmentServiceNameAzureRM: 'Pay-As-You-Go(f30deb63-a417-4fa4-afc1-813a7d3920bb)'
    - task: ArchiveFiles@2
      displayName: Archive files
      inputs:
        rootFolderOrFile: '$(Build.SourcesDirectory)/'
        includeRootFolder: false
        archiveType: 'zip'
        archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
        replaceExistingArchive: true
    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: '$(Build.BuildId)-build'
        publishLocation: 'Container'
Save and run the pipeline.
Build job → the pipeline generates artifacts out of the source code.
The run needs permission → click View → Permit → Permit access.
The job builds successfully.
Go to the release pipeline
A Release Pipeline consumes the Artifacts and conducts follow-up
actions within a multi-staging system.
New pipeline → Select empty job
Stage 1 → Deployment
Azure Artifacts enables developers to efficiently manage all their
dependencies from one place. Add an artifact.
Add build artifact
Add the trigger
Go to the deployment tasks.
Add another task.
Add the Terraform installer task.
Add another task → Extract files.
Modify the destination folder.
Another task → terraform init
Then terraform apply.
Save the settings.
Add one more stage → clone the existing stage.
Change the name.
Only one task is modified: the apply command becomes destroy.
Save the settings.
A deployment job is a special type of job. It’s a collection of steps to
run sequentially against the environment.
Pre-deployment approvals: users must manually sign off on a release
before it is deployed to a stage.
Post-deployment approvals: the team wants to ensure there are no
active issues in the work item or problem management system before
promoting the release to the next stage.
Add an approval before the destroy stage → click Pre-deployment
conditions.
Select the members who can approve.
Make a change in the Git repo and let it trigger the pipelines end to
end.
The pipeline starts automatically.
The build pipeline is running.
Successful.
Once the build pipeline finishes, the release pipeline starts
automatically; go to the release pipeline.
The job is running.
The release pipeline runs successfully.
Check the Azure portal to confirm whether the resources were created.
To destroy the resources, the release needs to be approved; once
approved, the resources are destroyed.
Approval needed → release pipeline
Successful.
Destroy → removes every resource managed by the Terraform
configuration.
The resources are deleted automatically in the Azure portal.
What are containers in Azure?
Azure containers are a popular and efficient way to deploy and manage applications
in the cloud. They offer a wide range of benefits, such as increased flexibility,
scalability, and cost-effectiveness, but also have some potential drawbacks that
should be considered before making a decision.
The definition of a container is:
A container is virtualization of the operating system.
It is an application packaged with all the files, configurations, and
dependencies needed to run it.
It is the next evolution of virtualization.
Just as virtual machines are virtualization of hardware, containers
are virtualization of the OS.
Containers run on a container engine that sits on top of the host OS.
You may be asking why containers were created. Good question.
Containers were created because virtual machines have several
disadvantages:
1. They use a lot of resources such as CPU, RAM, and disk space.
2. Every VM needs an OS license.
3. They are very slow to start up.
Then OS virtualization came along and changed the way we think about
the hardware and software underlying an application.
Now, let's look at the pros and cons of containers.
Pros of Azure Containers
Portability: One of the main advantages of Azure containers is their portability.
Containers can be easily moved between different cloud environments, which makes
them an excellent choice for companies looking to adopt a multi-cloud strategy or
for those who want to switch to a different cloud provider.
Scalability: Azure containers allow for easy scalability. Containers can be quickly
spun up or down based on demand, which means that businesses can easily adjust
their infrastructure to meet changing requirements.
Consistency: Azure containers ensure consistency across different environments,
which is particularly useful for development and testing purposes. Containers can be
created with specific configurations, libraries, and dependencies, and then moved to
other environments without any changes.
Resource Efficiency: Containers are lightweight and consume fewer resources than
traditional virtual machines. This means that they are more efficient in terms of
resource utilization, which can help reduce costs and improve performance.
Rapid Deployment: Azure containers can be deployed rapidly, which is particularly
useful for businesses looking to accelerate their time to market. Containers can be
created, configured, and deployed in a matter of minutes, which can help businesses
stay ahead of their competitors.
Cons of Azure Containers
Complexity: Although Azure containers offer a range of benefits, they can be more
complex to manage than traditional virtual machines. Containers require additional
tooling and orchestration to manage, which can be time-consuming and require
additional expertise.
Security: Containers are more susceptible to security threats than traditional virtual
machines. This is because containers share the same host operating system and
kernel, which means that if one container is compromised, other containers on the
same host could also be affected.
Data Persistence: Azure containers are designed to be stateless, which means that
they are not ideal for applications that require persistent data storage. Containers
can be configured to store data externally, but this adds an additional layer of
complexity to the application architecture.
Network Configuration: Azure containers require specific network configurations to
ensure that they can communicate with other containers and services. This can be
challenging for businesses that are not familiar with container networking or that
have complex network configurations.
Learning Curve: Finally, Azure containers have a steep learning curve. Businesses
that are new to containers will need to invest time and resources to understand how
containers work and how to manage them effectively.
Azure containers offer a range of benefits, including portability, scalability,
consistency, resource efficiency, and rapid deployment. However, they also
have some potential drawbacks, such as complexity, security, data persistence,
network configuration, and a steep learning curve. Before adopting Azure
containers, businesses should carefully consider these pros and cons and
evaluate whether they are a good fit for their specific use case.