The Democratization of Deep Learning
[Figure: Machine learning vs. deep learning, each classifying example images as CAR or NOT CAR]
Deep learning models can be trained to perform complicated tasks such as image or speech recognition and
to determine meaning from these inputs. A key advantage is that these models scale well with data: their
performance improves as the size of the training data increases.
[Figure: A neural network with an input layer, hidden layers of characteristics, and an output layer answering "Is this a giant panda, a red panda, or a raccoon?"]
Deep learning not only performs best with larger volumes of data, but also requires powerful hardware such as
graphics processing units (GPUs) to run efficiently. The deep learning market is expected to be worth $1.75
billion by the year 2022 [1]. Investment in this area is driven by the fact that 61% of enterprises with an innovation
strategy are applying AI to their data to find previously missed opportunities such as process improvements or
new revenue streams [2].
IMAGE CLASSIFICATION
VOICE RECOGNITION
This is the ability of a deep learning model to receive and interpret dictation or to understand
and carry out spoken commands. Models convert captured voice commands to text and then use
natural language processing to determine what is being said and in what context. This has delivered
major benefits to industries like automotive, where voice commands let drivers make phone calls
and adjust internal controls without taking their hands off the steering wheel, improving safety.
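To make that two-stage pipeline concrete, here is a minimal sketch that converts speech to text and then infers the command's intent. It assumes the pretrained Hugging Face transformers pipelines and a hypothetical audio file name; this guide does not prescribe a particular library.

# Stage 1: speech-to-text; stage 2: NLP to interpret the transcribed command.
# Assumes the Hugging Face transformers library; the file name is hypothetical.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition")
text = asr("captured_command.wav")["text"]  # convert captured audio to text

# Use zero-shot classification to map the transcript to a known command.
intent = pipeline("zero-shot-classification")
result = intent(text, candidate_labels=[
    "make a phone call", "adjust climate controls", "play music"])
print(text, "->", result["labels"][0])  # best-matching command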
ANOMALY DETECTION
This deep learning technique strives to recognize, out of millions of transactions, abnormal patterns
that do not match the behavior expected of a particular system. These applications can lead to the
discovery of attacks on financial networks, fraud in insurance filings or credit card purchases, and
even sensor readings in industrial facilities that signal a safety issue [4].
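One common way to implement this with deep learning is an autoencoder trained only on normal behavior, flagging anything it reconstructs poorly. A minimal Keras sketch follows, with randomly generated stand-in data and an illustrative feature count; the text above does not prescribe this particular approach.

# Autoencoder anomaly detection: train on normal transactions only, then
# flag transactions with high reconstruction error as potential anomalies.
import numpy as np
import tensorflow as tf

n_features = 30  # hypothetical number of numeric features per transaction
normal = np.random.rand(10000, n_features).astype("float32")  # stand-in data

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(8, activation="relu"),   # compressed representation
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstruct the input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=5, batch_size=256, verbose=0)

def anomaly_scores(x):
    # Mean squared reconstruction error per transaction.
    return np.mean((x - autoencoder.predict(x, verbose=0)) ** 2, axis=1)

threshold = np.quantile(anomaly_scores(normal), 0.99)  # flag the worst 1%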
RECOMMENDATION ENGINES
These models analyze user actions in order to provide recommendations based on user
behavior. Recommendation engines are critical components of e-commerce sites such
as Overstock.com, which uses a recommendation engine to suggest products a user is
likely to purchase next based on their shopping history. This greatly reduces friction
for the user and creates an efficient revenue stream for the company.
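A minimal sketch of the underlying idea, assuming Keras and randomly generated stand-in interaction data: learn an embedding for each user and each product, and score a pairing by the dot product of the two vectors.

# Embedding-based recommender: the model learns user and item vectors whose
# dot product predicts how strongly a user will respond to an item.
import numpy as np
import tensorflow as tf

n_users, n_items, dim = 1000, 500, 32          # illustrative sizes
users = np.random.randint(0, n_users, 10000)   # stand-in interaction history
items = np.random.randint(0, n_items, 10000)
signal = np.random.rand(10000).astype("float32")  # e.g. purchase signal

u_in = tf.keras.Input(shape=(), dtype="int32")
i_in = tf.keras.Input(shape=(), dtype="int32")
u_vec = tf.keras.layers.Embedding(n_users, dim)(u_in)  # learned user tastes
i_vec = tf.keras.layers.Embedding(n_items, dim)(i_in)  # learned item traits
score = tf.keras.layers.Dot(axes=1)([u_vec, i_vec])    # affinity score

model = tf.keras.Model([u_in, i_in], score)
model.compile(optimizer="adam", loss="mse")
model.fit([users, items], signal, epochs=3, batch_size=256, verbose=0)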
SENTIMENT ANALYSIS
This leverages deep learning techniques such as natural language processing,
text analysis, and computational linguistics to gain clear insight into customer opinion,
understand consumer sentiment, and measure the impact of marketing strategies [5].
A real-world application of this type of AI is deployed by Riot Games to better
understand and combat the abusive language that can occur during in-game experiences,
in order to improve the user experience.
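In its simplest form, sentiment analysis labels a piece of text as positive or negative. A minimal sketch follows, assuming the pretrained Hugging Face transformers sentiment pipeline rather than a custom-trained model; the example messages are invented.

# Score short chat messages as positive or negative.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
messages = ["gg, well played everyone", "you are the worst teammate ever"]
for msg in messages:
    label = sentiment(msg)[0]  # e.g. {"label": "NEGATIVE", "score": 0.99}
    print(msg, "->", label["label"])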
VIDEO ANALYSIS
Deep learning models have made it possible to process and evaluate vast streams of video
footage for a range of tasks, including threat detection at airports, banks, and sporting
events. Media companies like Viacom leverage video analysis to eliminate lag and maximize
the user experience.
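At its core, video analysis often reduces to sampling frames and running each through an image model. Here is a minimal sketch assuming OpenCV for decoding, a pretrained Keras MobileNetV2 classifier, and a hypothetical video file name.

# Sample roughly one frame per second and classify each sampled frame.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")
cap = cv2.VideoCapture("security_feed.mp4")  # hypothetical footage

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:  # assume ~30 fps, so one frame per second
        img = cv2.resize(frame, (224, 224))[:, :, ::-1]  # BGR -> RGB
        x = tf.keras.applications.mobilenet_v2.preprocess_input(
            img.astype("float32")[np.newaxis])
        preds = model.predict(x, verbose=0)
        top = tf.keras.applications.mobilenet_v2.decode_predictions(preds)[0][0]
        print(frame_idx, top[1], top[2])  # frame index, label, confidence
    frame_idx += 1
cap.release()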
A Deep Learning Workflow
A generalized workflow for building and training a deep learning model consists of steps that vary in complexity,
spanning the ingestion of data, through network architecture choice, to production [6]. A minimal code sketch of
these steps follows the list.
CREATE YOUR TRAINING DATA SET - Gather data of whatever types, and from whatever sources, are needed to
train the model; this may require additional effort to obtain labels or target variable values.
ANALYZE THE DATA - It is critical to clean and organize the data in order to eliminate errors and
discrepancies.
DESIGN YOUR ARCHITECTURE - The key is to understand the type of problem you are trying to solve and
then choose the right architecture for the job.
TUNE YOUR HYPERPARAMETERS - Getting the best results in deep learning requires experimenting with
different values for training hyperparameters.
TRAIN THE MODEL - In this step, you provide the data to the learning algorithm which performs optimization
routines to produce the best model it can.
EVALUATE PERFORMANCE - This validates the ability of the model to confidently perform forecasting and
estimation — the actual “thinking” of the model — against an unseen (“test”) dataset.
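Here is the promised sketch, mapping the six steps onto Keras; the data, architecture, and hyperparameter values are all illustrative stand-ins.

# 1-2. Create and analyze the training set (random stand-in for cleaned data).
import numpy as np
import tensorflow as tf

X = np.random.rand(5000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")   # labels / target variable
X_train, X_test = X[:4000], X[4000:]         # hold out an unseen test set
y_train, y_test = y[:4000], y[4000:]

# 3. Design an architecture suited to the problem (binary classification).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 4. Tune hyperparameters (learning rate, batch size, epochs, ...).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# 5. Train: the optimizer iterates over the data to produce the best model.
model.fit(X_train, y_train, epochs=10, batch_size=128, verbose=0)

# 6. Evaluate performance against the unseen ("test") dataset.
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {acc:.3f}")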
What is Powering Deep Learning?
If you were to start doing deep learning, which framework would you use? That question can be answered by
understanding the problem you are trying to solve. Most frameworks support different programming languages,
offer varying levels of architectural complexity, different degrees of performance, and a variety of deep
learning algorithms suitable for specific use cases. Here’s an overview of some of the common deep learning
frameworks available.
MXNET – a deep learning framework designed for both efficiency and flexibility.
It allows you to mix symbolic and imperative programming to maximize efficiency
and productivity.
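As a small illustration of that mix, the following sketch uses MXNet's Gluon API: the network is defined and called imperatively, and hybridize() then compiles it into a symbolic graph for efficiency. The layer sizes are illustrative.

# Define imperatively, then compile to a symbolic graph with hybridize().
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation="relu"),
        nn.Dense(10))
net.initialize()

net.hybridize()  # switch from line-by-line execution to a compiled graph
out = net(nd.random.uniform(shape=(1, 20)))  # same call, faster execution
print(out.shape)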
It needs to be noted that even with all of the available frameworks and a greater understanding of how to
pursue AI through machine learning and deep learning, there are still substantial limitations to the current
abilities of artificial intelligence. Comparing the capabilities of a deep learning application against those
of an actual human is going to be disappointing to say the least; it is currently not an apples-to-apples
comparison. One reason is nuance: a person can easily distinguish images of a toddler from images of an
infant, but a deep learning program will find this incredibly difficult to do with consistent accuracy. The
performance of any artificial intelligence will only be as good as the data it is fed, and if the data itself is
incorrect or incomplete, the results will simply be wrong. Processing a lot of data is easy, but feature
learning is very difficult [7].
But as deep learning algorithms evolve to recognize nuances and overcome their current limitations,
the future of AI is bright. What is most critical is for big data analytics platforms to handle the growing
complexity of these algorithms as well as the scale of the data required for better model training.
Additionally, each of these frameworks provides different strengths for different deep learning approaches,
along with its own unique challenges. For example, Caffe is a strong option for image classification but can
be resource intensive, and the framework itself is difficult to deploy [8]. Things can get very hard, and very
complicated, very quickly for data science and engineering teams.
Also, the effectiveness of deep learning, or artificial intelligence of any type, rests on the quality of the
infrastructure that powers it. The infrastructure should be viewed as a multiplier of the effectiveness of your
AI. Neural networks come in various architectures, and their performance is a function of the architecture
chosen, which can be challenging for traditional engineering teams to manage, let alone a data science team.
Also, the processing requirements of an AI infrastructure can be massive, requiring specialized and expensive
processors such as GPUs (graphics processing units) to perform the mathematical computations that power
deep learning models [9].
From a resource standpoint, training an accurate deep learning model can be extremely taxing. Parameters
need to be tweaked to create a great model, and this step can be manually intensive. This puts a lot of pressure
on a data science team, as the number of decisions required to develop successful deep learning models can
make the process incredibly time-consuming, and significant time and money can be wasted if poor decisions
are made [10]. Also, because deep learning models are complex, they require a significant amount of data to
train accurately.
Databricks offers a unified data analytics platform that allows organizations to build reliable, fast, and scalable
deep learning pipelines on Apache Spark, enabling data science teams to build, train, and deploy deep
learning applications with ease.
[Figure: The Databricks platform, spanning BI integrations and a collaborative data science workspace with support for libraries such as scikit-learn]
• Unified Infrastructure: Databricks offers a fully managed, serverless, cloud infrastructure that simplifies
operations, delivers elasticity, and enables greater cost control. Data engineers benefit from unified
workflows that simplify data preparation and ETL [11], and from an API that makes it easy to work with major
frameworks such as TensorFlow, Keras, PyTorch, MXNet, Caffe, and CNTK.
• Unified Workflows: A single platform for end-to-end workflow automation from data preparation to
exploration, modeling, and large-scale prediction. Databricks also offers high-level APIs to leverage
TensorFlow models more easily.
• Performance Optimized: Databricks offers unparalleled processing performance at scale for deep learning
powered by Databricks Runtime. Having the ability to crunch massive volumes of data and build highly
performant data pipelines allows for more accurate deep learning models.
• Model Management: Databricks natively integrates with MLflow, an open source framework for managing
the complete machine learning lifecycle, so that data scientists can easily track experiments, reproduce
results, and deploy models virtually anywhere.
• Collaborative Data Science: For data science teams, a collaborative workspace helps teams work better
together to interactively explore data and train deep learning models against real-time data sets, with the
flexibility of support for multiple programming languages. These are all critical elements for any organization
that is serious about AI.
Through the use of the Databricks unified data analytics platform, data teams can benefit from a fully
managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost
of ownership.
Conclusion
Artificial intelligence through deep learning will drive innovations in IT for the foreseeable future. The rise
of big data has helped bring this to reality, and the benefits are just beginning to be realized. AI is already
being explored across many industries, and as the technology improves it will have a major impact on how
real-world problems are solved. The opportunity to bring AI to the mainstream is here, and Databricks has
a vision for making deep learning more accessible to a broad range of users.
The Databricks unified analytics platform simplifies the integration of scalable deep learning into
organizational workflows for everyone from machine learning practitioners to business analysts, and it
helps fulfill the promise of AI.
Next Steps
Below are some resources to help you along your deep learning journey:
WEBINAR SERIES
Deep Learning Fundamentals
This webinar series covers the fundamentals of deep
learning with TensorFlow and Keras on Databricks.
ON-DEMAND WEBINAR
Simple Steps to Distributed
Deep Learning On Databricks
Watch how to easily migrate from single machine to distributed
deep learning with Databricks HorovodRunner.
ON-DEMAND WEBINAR
Simple Distributed Deep Learning
Model Inference
We provide a reference end-to-end pipeline for distributed
deep learning model inference using the latest features from
Apache Spark and Delta Lake.
CUSTOMER STORY:
Simplify Distributed
TensorFlow Training for Fast Image
Categorization at Starbucks
Get started with a free trial of Databricks
and start building deep learning applications today
START YOUR FREE TRIAL
REFERENCES
1. https://www.marketsandmarkets.com/Market-Reports/deep-learning-market-107369271.html
2. https://www.forbes.com/sites/gilpress/2016/07/20/artificial-intelligence-rapidly-adopted-by-enterprises-survey-says/#5da1818212da
3. http://www.bandt.com.au/marketing/three-ways-image-recognition-technology-will-change-retail-marketing-2017
4. http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1783&context=etd
5. http://www.oreilly.com/data/free/artificial-intelligence-now.csp
6. https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-learning.html
7. http://www.oreilly.com/data/free/files/what-is-artificial-intelligence.pdf
8. https://www.infoworld.com/article/3163525/analytics/review-the-best-frameworks-for-machine-learning-and-deep-learning.html?upd=1512493932276
9. https://www.forbes.com/sites/janakirammsv/2017/08/07/in-the-era-of-artificial-intelligence-gpus-are-the-new-cpus/2/#5118f7024efa
10. http://bytes.schibsted.com/deep-learning-changing-data-science-paradigms/
11. https://databricks.com/blog/2016/12/21/deep-learning-on-databricks.html
ABOUT DATABRICKS:
Databricks is the data and AI company. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks simplifies data and AI so data teams can collaborate
and innovate faster. More than five thousand organizations worldwide, including Comcast, Shell, Starbucks and Regeneron, rely on Databricks as a unified platform for massive-
scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics. Venture-backed and headquartered in San Francisco (with offices around
the globe), Databricks is on a mission to help data teams solve the world's toughest problems. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.
© Databricks 2020. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. Privacy Policy | Terms of Use