Data + AI Professional
Workshop Series
Data Engineering Optimization
Best Practices
Irfan Elahi - Specialist Solutions Architect
©2024 Databricks Inc. — All rights reserved
Housekeeping
▪ This presentation will be recorded and we will share these materials after
the session.
▪ There are no hands-on components so you only need something to take
notes.
▪ Use the Q&A function to ask questions.
▪ Please fill out the survey at the end of the session so we can improve our
future sessions.
©2024 Databricks Inc. — All rights reserved
The Optimization
Mindset
©2024 Databricks Inc. — All rights reserved
Optimize Only When Necessary
1. Start with a goal in mind
- Cost target
- SLA / performance target
2. Tackle the easy things first
3. Spend time understanding the problem
4. Benchmark and iterate
- Keep track of results so you know you are making progress
5. Know when to stop
- Refer to step 1
©2024 Databricks Inc. — All rights reserved
Understanding the journey of a query
[Diagram: a query travels through three layers: Query, Compute, Data.]
©2024 Databricks Inc. — All rights reserved
Understanding the journey of a query
[Diagram: the Compute layer is split into the Query Processing Engine and the Compute Infra, sitting between the Query and the Data.]
©2024 Databricks Inc. — All rights reserved
Understanding the journey of a query
[Diagram: the same Query / Compute / Data stack, annotated with the optimization levers covered in this session: Result Cache, Workload Management, Instance Selection, Scaling, Disk Cache, and the Processing Engine itself.]
©2024 Databricks Inc. — All rights reserved
Performance Optimization - Framework
[Diagram: the performance optimization framework has three pillars: Foundational, Code, and Diagnosis.]
©2024 Databricks Inc. — All rights reserved
Foundational - Compute
©2024 Databricks Inc. — All rights reserved
Photon - Speeding up Data Processing
[Diagram: how a query flows through Photon]
● Client submits the query
● Driver node: query parsing; Catalyst performs query analysis, planning, optimization, and Spark → Photon plan conversion
● Execution framework: task scheduling, shuffle service (shared nothing)
● Execution: tasks run on the Photon engine in each executor
● Storage/data services: metadata caching service, auto-compaction, partition pruning
©2024 Databricks Inc. — All rights reserved
Where and When to Use Photon
Ingestion paths: Batch (COPY INTO, Auto Loader), Structured Streaming, Delta Live Tables
Supported data sources: ✅ Delta Lake ✅ Parquet ✅ JSON ✅ CSV ✅ AVRO ✅ XML ✅ Binary ✅ JDBC/ODBC
Supported APIs and methods: ✅ DataFrame ✅ SQL ✅ SQL UDFs
Not supported: ❌ RDDs ❌ Typed Datasets ❌ Java/Scala UDFs
Partial support: 🚧 Pandas and Python UDFs
Workloads that benefit the most
● Joins and aggregation heavy computations
● Delta Lake merge
● Reading/writing wide tables
● Decimal computations
● Delta Live Tables (DLT) and AutoLoader
● Update and delete workloads via deletion vectors (DVs)
©2024 Databricks Inc. — All rights reserved
Find Out Why Things Don’t Photonize
● Photon makes everything faster
● If a query falls out of Photon, figure out why!
○ collect_set? Use collect_list(distinct)
○ UDFs? RDDs? Avoid those wherever possible.
○ Non-photonizable source? We may have a preview for that!
● Getting a query to run entirely in Photon solves most performance
problems
©2024 Databricks Inc. — All rights reserved
Predictive I/O - Speeding up point queries
Let the compute engine determine the best way to fetch data
SELECT protoBase64 FROM query_profile_protos WHERE id LIKE '204ff749-dc88-4b0a%'
17x speedup
©2024 Databricks Inc. — All rights reserved
Spark 3.0 Adaptive Query Execution
▪ Adapts a query plan automatically at runtime based on accurate
metrics
▪ Capabilities:
▪ Sort Merge Join (SMJ) → Broadcast Hash Join (BHJ)
▪ Coalesce shuffle partitions
▪ Handles skew
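These AQE behaviours map to a handful of Spark confs. A minimal sketch for reference (AQE is already on by default in recent DBR versions), assuming the `spark` session a Databricks notebook provides:

```python
# AQE is on by default in recent DBR; these confs only make the knobs explicit.
# `spark` is the SparkSession a Databricks notebook provides.
spark.conf.set("spark.sql.adaptive.enabled", "true")                     # re-plan at runtime
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge tiny shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed partitions at join time
```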
©2024 Databricks Inc. — All rights reserved
Compute Sizing
A balance between query performance and concurrency requirement
● Vertical Scaling - Cluster Size (2XS ⇔ 4XL)
○ Larger cluster for larger queries and tables
● Horizontal Scaling - No. of Clusters (Min ⇔ Max #)
○ More clusters for more concurrent queries
● Monitor Query History to find the right fit
○ Too many queries in queue = more clusters
○ Queries taking too long = larger clusters
©2024 Databricks Inc. — All rights reserved
Compute - Architectural Considerations
Isolated Clusters & Warehouses to Avoid Resource Contention
● Ephemeral job compute
○ Jobs - isolated compute for ingestion + ETL jobs, can be sized/optimized for that workload, runs on a schedule
○ Only charged while the job is running
● Shared development clusters
○ All-purpose - auto-scale and auto-pause so resources are used only while teams are actively developing
○ Recommended to develop and test with a subset of the full dataset
● Shared SQL warehouse for ad-hoc analysis
○ SQL warehouse - auto-scale and auto-pause so resources are used only while teams are actively querying
○ Serverless available for instant startup and shutdown to reduce idle time
● Separate SQL warehouse for BI reporting
○ Size appropriately for BI needs; avoid contention with other processes
©2024 Databricks Inc. — All rights reserved
Serverless - Shifting the paradigm
How to fundamentally move your price-performance profile
● Serverless improves performance by increasing throughput
● The improved performance gives us room to reduce the warehouse size and let scaling optimise for cost
©2024 Databricks Inc. — All rights reserved
A Note on Classic Compute
● DBSQL Warehouses automatically take care of compute instance selection and
cluster configuration
● Classic compute clusters (Jobs, Interactive, etc.) still let you configure instances;
here is the TL;DR
○ Core:RAM Ratio - The most cores your budget allows, given enough memory
○ Processor Type - ARM-based chips can work quite well
○ Local Storage - The disk cache is useful for repeated data access
○ Driver Size - Don't overcomplicate it (4-8 cores with 16-32 GB RAM is usually enough)
○ Spot Availability - Stability is more important for long-running jobs
○ Auto Scaling - Achieve high cluster utilization and reduce overall cost
©2024 Databricks Inc. — All rights reserved
Foundational - Data
©2024 Databricks Inc. — All rights reserved
Delta Lake
Most performant modern open data format
©2024 Databricks Inc. — All rights reserved
Data Layout matters
Putting the right things together makes life easier
SELECT COUNT(*) FROM LEGOS WHERE COLOUR = 'RED'
How do you want to store your legos?
What about SELECT COUNT(*) FROM LEGOS WHERE SIZE = 'SMALL'?
©2024 Databricks Inc. — All rights reserved
Data Layout Rationale
Different ways to organise your data so that you don't read too many unnecessary files
SELECT * FROM deltalake_table WHERE part = 2 AND col = 6
Partition Pruning - directory layout:
  /path/to/deltalake_table/part=1/part_00001.parquet
  /path/to/deltalake_table/part=1/part_00002.parquet
  /path/to/deltalake_table/part=1/part_00003.parquet
  /path/to/deltalake_table/part=2/part_00001.parquet
  /path/to/deltalake_table/part=2/part_00002.parquet
  /path/to/deltalake_table/part=2/part_00003.parquet
  → only the part=2 directory needs to be read

File Skipping - per-file column statistics:
  file_name   col_min   col_max
  1.parquet   1         3
  2.parquet   4         7
  3.parquet   8         10
  → only 2.parquet (range 4-7) can contain col = 6
©2024 Databricks Inc. — All rights reserved
Data Layout - To partition or not partition
DO NOT partition unless you know why you are partitioning
● Over-partitioning is worse than no partitioning at all
○ small files kill performance
● Reasons to partition
○ Table size > 100TB
○ Isolating data for separate schemas (e.g. multiplexing)
○ Governance use cases where you commonly delete entire partitions of data
○ Physical boundary to isolate data is required
● Partition best practices
○ Keep partition size between 1GB and 1TB
○ Combine partition with Z-order
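A minimal sketch of these two practices together on a hypothetical `events` table (table, columns, and the Z-order key are illustrative), assuming the `spark` session a Databricks notebook provides:

```python
# Partition only on a coarse column (the ingest date) and Z-order within
# partitions on the column most often used in filters.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        event_id STRING, user_id STRING, event_date DATE, payload STRING
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")

spark.sql("OPTIMIZE events ZORDER BY (user_id)")  # co-locate rows on the common filter column
```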
©2024 Databricks Inc. — All rights reserved
Data Skipping and Delta Lake Stats
● Databricks Delta Lake collects stats about the first N columns
○ dataSkippingNumIndexedCols = 32
● These stats are used in queries
○ Metadata only queries: select max(col) from table
■ Queries just the Delta Log, doesn’t need to look at the files if col has stats
○ Allows us to skip files
■ Partition Pruning, Data Filters apply in that order
○ TimeStamp and String types aren’t always very useful
■ Precision/Truncation prevent exact matches, have to fall back to files sometimes
● Avoid collecting stats on long strings
○ Put them outside first 32 columns or collect stats on fewer columns
■ alter table change column col after col32
■ set spark.databricks.delta.properties.defaults.dataSkippingNumIndexedCols = 3
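The two commands above, made concrete as a hedged sketch on a hypothetical `events` table whose `payload` column holds long strings (all names are illustrative):

```python
# Collect stats only on the first 3 columns of this table...
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '3')")

# ...or move the long string column past the stats-indexed range.
spark.sql("ALTER TABLE events CHANGE COLUMN payload AFTER event_date")

# Workspace-wide default for newly created tables.
spark.conf.set("spark.databricks.delta.properties.defaults.dataSkippingNumIndexedCols", "3")
```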
©2024 Databricks Inc. — All rights reserved
Data layout - Clustering
Sort the data in ways that you will need it
Z-Ordering and Liquid Clustering both co-locate related values so that each file's
min/max statistics cover a tight, non-overlapping range, making file skipping effective:

  Unclustered                          Clustered (Z-Order / Liquid)
  file_name  col_min  col_max          file_name  col_min  col_max
  1.parquet  6        8                1.parquet  1        3
  2.parquet  3        10               2.parquet  4        7
  3.parquet  1        4                3.parquet  8        10
©2024 Databricks Inc. — All rights reserved
Liquid Clustering - No More partitions
● Fast
○ Faster writes and similar reads vs. well-tuned partitioned tables
● Self-tuning
○ Avoids over- and under-partitioning
● Incremental
○ Automatic partial clustering of new data
● Skew-resistant
○ Produces consistent file sizes and low write amplification
● Flexible
○ Want to change the clustering columns? No problem!
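A hedged sketch (table and columns are hypothetical; requires a DBR version that supports liquid clustering), assuming the notebook's `spark` session:

```python
# CLUSTER BY replaces PARTITIONED BY / ZORDER for new tables.
spark.sql("""
    CREATE TABLE IF NOT EXISTS trips (
        trip_id STRING, city STRING, pickup_ts TIMESTAMP, fare DOUBLE
    )
    USING DELTA
    CLUSTER BY (city, pickup_ts)
""")

spark.sql("ALTER TABLE trips CLUSTER BY (city)")  # clustering keys can be changed later
spark.sql("OPTIMIZE trips")                       # incrementally clusters newly written data
```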
©2024 Databricks Inc. — All rights reserved
Scenarios Benefiting from Liquid Clustering
● Tables often filtered by high cardinality columns.
● Tables with significant skew in data distribution.
● Tables that grow quickly and require maintenance and tuning
effort.
● Tables with concurrent write requirements.
● Tables with access patterns that change over time.
● Tables where a typical partition key could leave the table with
too many or too few partitions.
©2024 Databricks Inc. — All rights reserved
Data Layout - File Sizes
When it comes to performance, file size matters
                      Small files       Large files
Data read per query   Less              More
Number of files       More              Fewer
Rewrite cost          Cheaper           More expensive
delta.tuneFileSizesForRewrites
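A hedged sketch of the property above, plus an explicit size target, on a hypothetical `events` table (the value shown is illustrative, not a recommendation):

```python
# Let Delta bias toward smaller files on tables that are rewritten often (e.g. MERGE targets)...
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true')")

# ...or pin a target file size explicitly.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.targetFileSize' = '128mb')")
```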
©2024 Databricks Inc. — All rights reserved
Deletion Vector
Amortisation of rewrite costs
Without DVs, deleting or updating a single row (row 4 in the example) forces a full rewrite of the data file. With DVs, the original file is left untouched: a small deletion vector marks row 4 as removed, and for an update the new version of the row is written separately. Inserts behave the same with or without DVs.
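A hedged sketch of enabling deletion vectors on a hypothetical `events` table and purging the soft-deleted rows later:

```python
# DELETE / UPDATE / MERGE then mark rows in a deletion vector instead of rewriting files.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')")

spark.sql("DELETE FROM events WHERE event_date < '2023-01-01'")  # soft delete via DV
spark.sql("REORG TABLE events APPLY (PURGE)")                    # physically remove marked rows when convenient
```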
©2024 Databricks Inc. — All rights reserved
Predictive Optimization
Bringing it all together automatically
● Scheduling these optimizations can be tricky
● Mistakes happen when users forget to set up these processes
● Predictive Optimization automatically
determines which operations to execute
based on usage
● Prioritises high-return operations based on expected benefit
©2024 Databricks Inc. — All rights reserved
Code Optimization
©2024 Databricks Inc. — All rights reserved
Basics
1. In production jobs, avoid operations that trigger an action besides reading and
writing files, such as count(), display(), and collect().
2. Avoid operations that force all computation onto the driver node, such as
single-threaded Python/pandas/Scala. Use the pandas API on Spark instead to
distribute pandas-style code (see the sketch after this list).
3. Avoid Python UDFs, which execute row by row. Instead, use native PySpark
functions or pandas UDFs (vectorized UDFs).
4. Use DataFrames or Datasets instead of RDDs. RDDs cannot take advantage of the
cost-based optimizer.
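A minimal sketch of point 2, assuming the notebook's `spark` session and an illustrative path; the pandas API on Spark keeps pandas syntax but runs as distributed Spark jobs instead of on the driver:

```python
import pyspark.pandas as ps

# Reads in parallel across the cluster; nothing is collected to the driver.
psdf = ps.read_parquet("/mnt/raw/events")          # path is illustrative

# Familiar pandas syntax, executed as a distributed Spark job.
daily_counts = psdf["event_date"].value_counts()
print(daily_counts.head(10))
```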
©2024 Databricks Inc. — All rights reserved
Controlled Batch Size For Streaming
• Defaults are large and non-deterministic:
• 1000 files per micro-batch
• No limit to input batch size
• Optimal mini-batch size → Optimal cluster usage
• Suboptimal mini-batch size → Performance cliff
• Per-trigger settings:
• Kafka
• maxOffsetsPerTrigger
• Delta Lake and Auto Loader
• maxFilesPerTrigger
• maxBytesPerTrigger
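A hedged sketch of capping micro-batch size for an Auto Loader stream (paths, format, and values are illustrative); a Kafka source would use the maxOffsetsPerTrigger option instead:

```python
stream_df = (
    spark.readStream.format("cloudFiles")              # Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.maxFilesPerTrigger", 500)      # cap files per micro-batch
    .option("cloudFiles.maxBytesPerTrigger", "1g")     # and/or cap bytes per micro-batch
    .load("/mnt/raw/events")
)
```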
©2024 Databricks Inc. — All rights reserved
Broadcast Join
• The most performant type of join
• Distributes a small dataset across all worker
nodes to minimize shuffling and speed up query
execution.
• Triggered when the smaller table/DataFrame is below
spark.sql.autoBroadcastJoinThreshold (10 MB by default)
• Control the micro-batch size so the broadcast is triggered when
joining with the target table (e.g. during MERGE) in Structured Streaming
• Prerequisite for Dynamic File Pruning (DFP)
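A hedged sketch with illustrative table names and join key, assuming the notebook's `spark` session:

```python
from pyspark.sql.functions import broadcast

facts = spark.table("sales_facts")
dims = spark.table("store_dim")                    # small dimension table
joined = facts.join(broadcast(dims), "store_id")   # explicit hint forces a broadcast hash join

# Or raise the automatic threshold if the small side is slightly above 10 MB.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
```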
©2024 Databricks Inc. — All rights reserved
Dynamic File Pruning (DFP)
• Intelligently skips non-relevant data files
during selective joins, achieving up to 8x faster
performance
• Key Prerequisites:
• The join strategy is BROADCAST JOIN
• The join type is INNER or LEFT-SEMI
©2024 Databricks Inc. — All rights reserved
Incremental Processing via Streaming
▪ Consider streaming for all of your workloads:
▪ Incremental processing resulting in reduced latency and quick time to insight
▪ Built-in Checkpointing for exactly-once guarantees and fault tolerance
▪ Efficient resource utilization
▪ Moving to a CDC architecture pattern, where you only process
change data, will greatly reduce overall processing time
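A minimal sketch of incremental processing with Structured Streaming in a scheduled job (table names and the checkpoint path are illustrative); Trigger.AvailableNow processes only what is new since the last run and then stops:

```python
(
    spark.readStream.table("bronze_events")                    # incremental read of a Delta table
    .writeStream
    .option("checkpointLocation", "/mnt/chk/silver_events")    # exactly-once bookkeeping
    .trigger(availableNow=True)                                # process the backlog, then stop
    .toTable("silver_events")
)
```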
©2024 Databricks Inc. — All rights reserved
DLT vs Structured Streaming
Feature Structured Streaming Delta Live Tables
Autoloader ✅ ✅
Trigger Options ✅ ✅
Workflow Support ✅ ✅
Schema Evolution ✅ ✅
CDC with SCD Type 1 and 2 ❌ ✅
Data Quality Constraints + Monitoring ❌ ✅
Automatic Orchestration ❌ ✅
Concurrent Streaming Jobs on a Cluster ❌ (not recommended) ✅
Pipeline Observability ❌ ✅
Simplified Deployment + UI ❌ ✅
Enhanced Autoscaling ❌ ✅
Enzyme Runtime Engine ❌ ✅
©2024 Databricks Inc. — All rights reserved
Simple query is usually a fast query
Get to the results with the least amount of data and transformation
1. Push predicates down as close to the data source as possible
a. Select the least amount of columns and rows that you need
b. Align data layout with commonly used predicates (ZORDER, LIQUID)
c. Make sure data is right sized (OPTIMIZE)
2. Simplify how you join your tables
a. Join the smallest tables first / collect statistics for the optimizer to do it for you
b. Provide join hints if you can
c. Reduce unnecessary data movement, i.e. if you know your data layout and join keys you can choose the right join
strategy (sort-merge vs. shuffle hash vs. broadcast)
3. Simplify operations
a. Be careful about expensive operations (distinct, sort, window)
b. UDFs are powerful but they are not fast; use native functions as much as possible
©2024 Databricks Inc. — All rights reserved
Diagnosis
©2024 Databricks Inc. — All rights reserved
Performance 5S’s
Skew | Spill | Shuffle | Storage (small files) | Serialization
©2024 Databricks Inc. — All rights reserved
Common Performance Bottlenecks
Encountered with any big data or MPP system
Symptom Details
Skew An imbalance in the size of partitions
Spill The writing of temp files to disk due to a lack of memory
Shuffle The act of moving data between executors
Small Files A set of problems indicative of high overhead due to tiny files
Serialization The distribution of code segments across the cluster
©2024 Databricks Inc. — All rights reserved
Skew - Mitigation
● Repartition Data (comes with caveats)
● Enable Adaptive Query Execution (AQE) in Spark 3
(enabled by default from DBR 7.3+)
● Employ skew hints
● Salting
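A hedged sketch of the first three options, with illustrative table names and join key; salting (appending a random suffix to the hot key on both sides) is the manual fallback if these do not help:

```python
# AQE splits skewed shuffle partitions automatically (on by default in recent DBR).
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

orders = spark.table("orders")
customers = spark.table("customers")

# Databricks skew hint on the skewed join key (key name is illustrative).
joined = orders.hint("skew", "customer_id").join(customers, "customer_id")
```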
©2024 Databricks Inc. — All rights reserved
Spill - Mitigation
Option #1 - Allocate a cluster with more memory per worker
Tip: Larger, fewer nodes > Smaller, more nodes
Option #2 - In the case of skew, address that root cause first.
Option #3 - Decrease the size of each partition by increasing
the number of partitions
■ By managing spark.sql.shuffle.partitions
■ By explicitly repartitioning
■ By managing spark.sql.files.maxPartitionBytes
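A hedged sketch of option #3 (values and the repartition key are illustrative starting points, not recommendations):

```python
# More, smaller shuffle partitions: each task holds less data in memory.
spark.conf.set("spark.sql.shuffle.partitions", "800")

# Smaller input splits when reading files.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(64 * 1024 * 1024))

# Or repartition explicitly before the memory-hungry stage.
df = spark.table("orders").repartition(800, "order_date")
```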
©2024 Databricks Inc. — All rights reserved
Shuffle - Mitigation
TIP: Don’t get hung up on trying to remove every shuffle
● Shuffles are often a necessary evil. Focus on the [more] expensive
operations instead. Many shuffle operations are actually quite fast.
● Reduce network IO by using fewer and larger workers
● Reduce the amount of data being shuffled
■ Narrow your columns
■ Preemptively filter out unnecessary records
©2024 Databricks Inc. — All rights reserved
Storage (Tiny Files) - Mitigation
Make sure you constantly optimize and vacuum your delta tables
■ OPTIMIZE will compact and Z-order/cluster your files
■ VACUUM will delete old versions and clean up the metadata
(Predictive Optimization automates it)
Enable auto-optimize in all your tables unless there is a good
reason for not doing so (e.g. streaming low latency requirements)
spark.databricks.delta.optimizeWrite.enabled = true
spark.databricks.delta.autoCompact.enabled = true
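The same settings can also be applied per table; a hedged sketch on a hypothetical `events` table (the Z-order key is illustrative):

```python
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

spark.sql("OPTIMIZE events ZORDER BY (user_id)")  # compact small files and co-locate data
spark.sql("VACUUM events")                        # remove files older than the default retention (7 days)
```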
©2024 Databricks Inc. — All rights reserved
Serialization - Mitigation
● PySpark UDFs have significant serialization overhead between JVM
and Python interpreter
● They act like a "black box" and cannot be optimized by Spark's Catalyst optimizer,
leading to suboptimal execution
● Thus, wherever possible, don’t use UDFs!
● The native and SQL higher-order functions are very robust
● But if you have to…
■ Use Arrow Optimized Python UDFs
■ Use Vectorized UDFs aka Pandas UDFs
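A hedged sketch contrasting the two fallbacks (the Arrow-optimized form needs Spark 3.5 / a recent DBR; the conversion function is illustrative and would normally be replaced by a native expression):

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, udf
from pyspark.sql.types import DoubleType

# Arrow-optimized Python UDF: cheaper JVM <-> Python serialization (Spark 3.5+ / recent DBR).
@udf(DoubleType(), useArrow=True)
def to_fahrenheit_arrow(c: float) -> float:
    return c * 9.0 / 5.0 + 32.0

# Vectorized (pandas) UDF: operates on whole pandas Series at a time.
@pandas_udf(DoubleType())
def to_fahrenheit(c: pd.Series) -> pd.Series:
    return c * 9.0 / 5.0 + 32.0
```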
©2024 Databricks Inc. — All rights reserved
Diagnosing performance issues
Scheduling Time v. Running Time
● Same wall clock time != same performance
● Scheduling time has nothing to do with your code
○ Waiting for compute = Need running compute = Serverless / pre-started cluster
○ Waiting in queue = We need more concurrency = Increase max # of clusters
©2024 Databricks Inc. — All rights reserved
Diagnosing performance issues
Running Time Breakdown
● Same running time != Same time spent on execution
● Long "Optimizing Query & Pruning Files" time = improve statistics collection
©2024 Databricks Inc. — All rights reserved
Diagnosing performance issues
Execution Details
● Make sure Photon usage is close to 100%
● Does the number of rows read make sense? Did you read too much data?
● Disk cache means your data is already cached on local storage
● If the amount of data read is correct, are you reading too many files or partitions?
● Spill means your warehouse/cluster is too small, i.e. not enough RAM
©2024 Databricks Inc. — All rights reserved
Diagnosing performance issues
Understanding Query Profile
● Execution can be broken down to
individual operations within your
query
● The operation where the most time is spent is likely where you need to start
● It should tell you which part of the query is causing the problem
● Knowing what the problem is doesn't mean it is an easy fix
©2024 Databricks Inc. — All rights reserved
What now?
©2024 Databricks Inc. — All rights reserved
Key takeaways
Optimize only when necessary
● Know what you are optimizing towards
● Focus on the easy things first (Platform -> Data -> Query)
● Leverage the latest compute and features (e.g. Serverless compute, Photon, Predictive
Optimization, Liquid Clustering, etc.)
● Scale vertically or horizontally based on indicators (complexity, concurrency)
● Pivot to end-to-end incremental processing, via streaming, wherever possible
● Extensively use observability and monitoring tools to guide optimization efforts
● Know when to stop optimizing and start building more useful things
©2024 Databricks Inc. — All rights reserved
Thank you
©2024 Databricks Inc. — All rights reserved