2019 Data & AI Trends & Challenges
Landscape
It has been another intense year in the world of data, full of excitement but also
complexity.
A few years ago, the discussion around “Big Data” was mostly a technical one, centered
around the emergence of a new generation of tools to collect, process and analyze
massive amounts of data. Many of those technologies are now well understood, and
deployed at scale. In addition, over the last couple of years in particular, we’ve started
adding layers of intelligence through data science, machine learning and AI into many
applications, which are now increasingly running in production in all sorts of consumer
and B2B products.
As those technologies continue to both improve and spread beyond the initial group of
early adopters (FAANG and startups) into the broader economy and world, the
discussion is shifting from the purely technical into a necessary conversation around
impact on our economies, societies and lives.
We’re just starting to truly get a sense of the nature of the disruption ahead.
In a world where data-driven automation becomes the rule (automated products,
automated cars, automated enterprises), what is the new nature of work? How do we
handle the social impact? How do we think about privacy, security, freedom?
Meanwhile, the underlying technologies continue to evolve at a rapid pace, with an ever
vibrant ecosystem of startups, products and projects, heralding perhaps even more
profound changes ahead. In that ecosystem, the year was characterized by the early
innings of a long expected consolidation, and perhaps a changing of the guard from one
era to another as early technologies are starting to give way to the next generation.
To try and make sense of it all, this is our sixth landscape and “state of the union” of
the data and AI ecosystem. For anyone interested in tracking the evolution, here are
the prior versions: 2012, 2014, 2016, 2017 and 2018.
Worth noting: as the term “Big Data” has now entered the museum of once-hot
buzzwords, this year the chart will just be the “Data & AI Landscape”.
Also, to make the reading more digestible, we’ll break down the post into two parts:
Part I (this post) will include a few introductory thoughts on the rapidly evolving
context around data privacy and regulation, which will have a profound impact on what
can/cannot be done with data technologies; it will also include the landscape itself.
Part II will include a roundup of key trends on data infrastructure, analytics and
ML/AI.
In 2018, we noted how the data world had started to reveal some darker, scarier
undertones, in the wake of the Cambridge Analytica scandal in particular.
This trend continued to develop in 2019. There were more data breaches,
more privacy scandals. More stories of the surveillance state in China
(including this report on a Muslim town in Northwest China). More freaky examples
of AI deepfakes, for which we are very unprepared.
Certainly, the debate around the dangers of AI, with all its sci-fi connotations, had
captured imaginations already, and this year has seen more initiatives around thinking
through those issues, such as the launch of Fei Fei Li’s Institute for Human-Centered
Artificial Intelligence.
But up until recently, questions around data ownership, privacy and security were met,
for almost everyone but a vocal minority, with a resounding yawn.
Perhaps more than ever, privacy issues jumped to the forefront of public
debate in 2019 and are now front, left and center. The fact that many of those issues
were related to Facebook, a service known to billions, probably played an important
role in sensitizing a much broader group of people around the world to the severity of
the issues.
The data privacy landscape is also shifting, as governments are increasingly getting
involved.
• GDPR, the European data protection and privacy regulation, came into effect in May
2018, and since then a few high profile fines have been announced including a €50
million fine issued to Google in January 2019 by the French data protection regulator
and a £500,000 fine issued to Facebook in October 2018 by the UK’s Information
Commissioner’s Office.
• The California Consumer Privacy Act (CCPA) will become effective on January 1, 2020.
• New York’s privacy bill is “even bolder” than California’s.
• San Francisco just voted to ban the use of facial recognition by city agencies.
• Illinois moved to regulate the use of AI in video hiring interviews.
Yet harsher government actions could take place. For starters, Facebook is likely to
be fined up to $5B by the FTC over privacy issues. Perhaps most importantly, there
have been increasing calls to break up the largest Internet franchises — too much
power, too much data and not enough privacy. The clearest target has been Facebook
(see this well- publicized opinion piece by one of its founders, Chris Hughes), but the
discussion has included others as well (a proposal from presidential candidate
Elizabeth Warren targets Google and Amazon).
Big Tech was already under pressure from within its own ranks. Employees at
Google, Amazon and Microsoft protested against the commercialization of their face
recognition technology. Google relented. Amazon did not – some activist shareholders
and employees tried to put a ban into effect, but were defeated.
For the FAANGs, privacy has become a new battleground, forcing their leaders to take
much more of a public stance on the issue:
• Tim Cook, CEO of Apple, warned us about the “weaponization of data” which is leading
us into a “data industrial complex.”
• Sundar Pichai, CEO of Google, took a public stand on the issue in the NY Times.
• Mark Zuckerberg, CEO of Facebook, vowed to turn Facebook into a privacy-focused
messaging and social networking platform.
To what extent such statements should be taken at face value, of course, is anyone's guess, and probably depends on the specific company and leader.
The debate around the impact of data and AI on privacy and society is obviously hugely
important, and it is fundamentally healthy that it has become much more central over
the last year or so.
While it is impossible in 2019 to ignore the broader questions of privacy, security and
regulation around data and AI, the ecosystem of data technologies and products is as
exciting (and full!) as ever.
The ecosystem is also evolving in some interesting ways, as some pioneering
technologies such as Hadoop may be on their way out, replaced by cloud computing
and Kubernetes, and entire segments, such as Business Intelligence, seem to be
rapidly consolidating.
We’ll dig into those various trends in some detail, but first, here’s our 2019 Data & AI
Landscape:
Some key resources:
• Yes, you can zoom! The image and all logos are very high-res, so you can navigate
the landscape in detail by zooming. Works very well on mobile, too!
• This year, my FirstMark colleague Lisa Xu provided immense help with the
landscape.
• We’ve detailed some of our methodology in the notes at the end of this post.
• Thoughts and suggestions welcome – please use the comment section of this post. We'll probably publish two or three revisions of the chart before it's final.
The last year (since our 2018 landscape) has been active from an exit perspective.
Several companies on the landscape went public. Crowdstrike (NASDAQ:CRWD) and
Elastic (NYSE:ESTC) reached big valuations at IPO time – $7B and $5B, respectively.
Other IPOs included PagerDuty ($1.8B), Anaplan ($1.8B), and Domo ($500M).
Some very large acquisitions occurred in the last year, including Qualtrics (acquired by
SAP for $8B), Medidata (acquired post-IPO by Dassault for $5.8B), Hortonworks
($5.2B merger with Cloudera), Imperva (acquired by Thoma Bravo for $2.1B),
AppNexus (acquired by AT&T for up to $2B), Cylance (acquired by BlackBerry for
$1.4B), Datorama (acquired by Salesforce for $800M), Treasure Data (acquired by
Arm for $600M), Attunity (acquired post-IPO by Qlik for $560M), Dynamic Yield
(acquired by McDonald’s for $300M), and Figure Eight (acquired by Appen for
$300M).
Many other companies on the 2018 landscape were acquired for smaller amounts:
Alooma (Google), Bonsai (Microsoft), Euclid Analytics (WeWork), Sailthru (Campaign
Monitor), Data Artisans (Alibaba), GRIDSMART (Cubic), Drawbridge (LinkedIn),
Citus Data (Microsoft), Quandl (NASDAQ), Connotate (import.io), Datafox (Oracle),
Market Track (Vista Equity Partners), Lattice Engines (Dun & Bradstreet), Blue Yonder
(JDA Software), SimpleReach (Nativo).
Also worth noting, the AI acqui-hire by large Internet companies, a fixture of 2016-
2017, is not completely dead: Twitter acquired Fabula AI to strengthen its machine
learning expertise, for example.
On the investment front, Big Data and AI startups continued to see big financing
rounds. Investments in China were not quite as oversized as last year, when there were
multiple companies that raised over a billion dollars. Chinese companies that raised
large rounds this year included facial recognition company Face++ ($750M Series D),
AI chip maker Horizon Robotics ($600M Series B), fleet management company G7
($320M Series F), and online tutoring platform Yuanfudao ($300M Series F).
In the US, huge investments went into autonomous vehicle companies, including
Cruise ($1.9B across 2 rounds in 2018 and 2019), Nuro ($940M Series B), and Aurora
($600M Series B). RPA companies also saw massive rounds: UiPath ($800M across 2
rounds in 2018 and 2019) and Automation Anywhere ($550M across 2 rounds in
2018).
Other major rounds of US companies on the landscape include Verily Life Sciences
($1B private equity round), Cambridge Mobile Telematics ($500M), Clover Health
($500M Series E), Veeam Software ($500M), Snowflake Computing ($450M Series
F), Compass ($400M Series F), Zymergen ($400M Series C), Dataminr ($392M Series
E), Lemonade ($400M Series D), Rubrik ($260M Series E), Databricks ($250M Series
E), and MediaMath ($225M Series D).
Part II: Major Trends in the 2019 Data
& AI Landscape
Part I of the 2019 Data & AI Landscape covered issues around the societal impact of
data and AI, and included the landscape chart itself. In this Part II, we’re going to dive
into some of the main industry trends in data and AI.
The data and AI ecosystem continues to be one of the most exciting areas of
technology. Not only does it have its own explosive momentum, but it also powers
and accelerates innovation in many other areas (consumer applications, gaming,
transportation, etc). As such, its overall impact is immense, and goes much
beyond the technical discussions below.
Of course, no meaningful trend unfolds over the course of just one year, and many of
the following have been years in the making. We'll focus the discussion on trends that
we have seen particularly accelerating in 2019, or gaining rapid prominence in industry
conversations.
We will loosely follow the order of the landscape, from left to right: infrastructure,
analytics and applications.
INFRASTRUCTURE TRENDS
The data infrastructure world continues its own rapid evolution. The main arc here,
which has been playing out for years but seems to be accelerating, is a three-phase transition from Hadoop to cloud services to a hybrid/Kubernetes environment.
Hadoop is very much the “OG” of the Big Data world, dating back to an October
2003 paper. A framework for distributed storage and processing of massive amounts
of data using a network of computers, it played an absolutely central role in the
explosion of the data ecosystem.
Over the last few years, however, it has become a bit of a sport among industry watchers
to pronounce Hadoop dead. This trend accelerated further this year, as Hadoop
vendors ran into all sorts of trouble. MapR has been on the brink of shutting down
and may have found a buyer at the time of writing. Cloudera and Hortonworks, fresh off their $5.2B merger, had a rough day in June when the stock plummeted 40% on disappointing quarterly earnings. Cloudera has announced a variety of cloud and hybrid products, but they have not launched yet.
However, it is unlikely that Hadoop is going to go away anytime soon. Its adoption may
slow down, but the sheer magnitude of its deployment across enterprises will give it
inertia and staying power for years to come.
While cloud usage deepens, customers are beginning to balk at costs. In board
rooms all around the world, executives have suddenly taken notice of a line item that
used to be small and has now snowballed very rapidly: their cloud bill. The cloud does
offer agility, but it can often come at a high price, particularly if customers take their
eye off the meter or fail to accurately forecast their computing needs. There are many
stories of AWS customers like Adobe and Capital One that saw their bill grow 60%+
over just one year between 2017 and 2018, to well over $200M.
Costs, as well as concerns over vendor lock-in, have precipitated the evolution towards
a hybrid approach, involving a combination of public cloud, private cloud and on-
prem. Faced with a myriad of options, enterprises will increasingly select the best
tool for the job to optimize performance and economics. As cloud providers
more aggressively differentiate themselves, enterprises are adapting with multi-
cloud strategies that leverage what each cloud provider is best at. And in some cases,
the best approach is to keep (or even repatriate) some workloads on-premises in order to optimize economics, especially for non-dynamic workloads.
Interestingly, cloud providers are adapting to the reality that enterprise computing will
occur in a mix of environments by providing tools such as AWS Outposts, which allows customers to run compute and storage on-premises and seamlessly integrate those on-premises workloads with the rest of their applications in the AWS cloud.
In this new multi-cloud and hybrid cloud era, the rising superstar is
undoubtedly Kubernetes. A project for managing containerized workloads and
services open sourced by Google in 2014, Kubernetes is experiencing the same fervor
as Hadoop did a few years ago, with 8,000 attendees at its KubeCon event and a never-ending stream of blog posts and podcasts. Many analysts believe that Red Hat's
prominence in the Kubernetes world largely contributed to its massive acquisition by
IBM for $34B. The promise of Kubernetes is very much to help enterprises run their
workloads across their own datacenter and private cloud, as well as one or several
public clouds.
As an orchestration framework that's particularly adept at managing complex, hybrid environments, Kubernetes is also becoming an increasingly attractive option for machine learning. Kubernetes gives data scientists the flexibility to choose whichever language, machine learning library or framework they prefer, and to train and scale models with comparatively rapid iteration and strong reproducibility, all without having to be infrastructure experts and with the same infrastructure serving multiple users (more here). Kubeflow, a machine learning toolkit for Kubernetes, has
been gaining rapid momentum.
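To make this more concrete, here is a minimal sketch of submitting a containerized training job to a Kubernetes cluster using the official Python client. The image name, command and hyperparameters are hypothetical placeholders; a real setup (whether via Kubeflow or not) would also specify resource requests, GPU scheduling and data volumes.

```python
# Minimal sketch: submit a containerized model-training job to Kubernetes.
# Assumes a local kubeconfig; the image and arguments are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # load cluster credentials from ~/.kube/config

container = client.V1Container(
    name="train",
    image="acme/model-train:latest",           # hypothetical training image
    command=["python", "train.py"],
    args=["--epochs", "20", "--lr", "0.001"],  # hypothetical hyperparameters
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="model-train-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,  # retry the pod at most twice on failure
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

The same few lines of code work whether the underlying nodes sit in a private datacenter or in one or several public clouds, which is precisely the appeal described above.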
Kubernetes is still relatively nascent, but interestingly, the above could signal an
evolution away from the cloud machine learning services, as data scientists may prefer
the overall flexibility and controllability of Kubernetes. We could be entering a third
paradigm shift for data science and ML infrastructure, from Hadoop (up until
2017?) to data cloud services (2017-2019) to a world dominated by Kubernetes and
next-generation data warehouses like Snowflake (2019-?).
Serverless is another attempt at simplifying data infrastructure, albeit from a different angle. This
execution model enables users to write and deploy code without the hassle of worrying
about the underlying infrastructure. The cloud provider handles all backend services
and the customer is charged based on what they actually use. Serverless has certainly
been a key emerging topic in the last couple of years, and this is another new category
we’ve added to this year’s Data & AI Landscape. However, the applicability of serverless
to machine learning and data science is still very much a work in progress, with
companies like Algorithmia and Iguazio/Nuclio being early entrants.
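As an illustration of the execution model, here is a hedged sketch of a serverless inference function written as an AWS Lambda-style handler. The S3 bucket, model artifact and feature names are hypothetical, and in practice cold starts and model size are a big part of why serverless ML is still a work in progress.

```python
# Sketch of a serverless model-serving function (AWS Lambda-style handler).
# The S3 bucket, model artifact and feature list below are hypothetical.
import json
import pickle

import boto3

s3 = boto3.client("s3")

# Load the model once per container, outside the handler, to amortize cold starts.
_obj = s3.get_object(Bucket="acme-models", Key="churn_model.pkl")
MODEL = pickle.loads(_obj["Body"].read())

FEATURES = ["tenure_months", "monthly_spend", "support_tickets"]  # hypothetical

def handler(event, context):
    """Score a single record passed as a JSON body and return the prediction."""
    payload = json.loads(event["body"])
    row = [[payload[f] for f in FEATURES]]
    churn_probability = MODEL.predict_proba(row)[0][1]
    return {
        "statusCode": 200,
        "body": json.dumps({"churn_probability": float(churn_probability)}),
    }
```

The cloud provider provisions, scales and bills the compute behind this function automatically; the user never touches a server.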
In a world where some data lives in a data warehouse, some in a data lake, some in
various other sources, across on-prem, private cloud and public cloud, how do you
find, curate, control and trace data? Those efforts take various related forms and
names, including data querying, data governance, data cataloging and data lineage, all
of which are gaining increasing importance and prominence.
Querying data across a hybrid environment is its own challenge, with solutions that
fall within the general trend of separating storage and compute (see
this video from Starburst Data, a company offering an enterprise version of SQL query
engine Presto, from our Data Driven NYC event).
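For a sense of what federated querying looks like in practice, here is a minimal sketch using the presto-python-client package against a Presto endpoint. The host, catalogs, schemas and tables are hypothetical placeholders; the point is that one SQL engine can join data sitting in a data lake with data sitting in an operational database.

```python
# Sketch: query across heterogeneous sources through a single Presto endpoint.
# Host, catalogs, schemas and tables below are hypothetical placeholders.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.internal.example.com",
    port=8080,
    user="analyst",
    catalog="hive",     # e.g., a data lake catalog
    schema="default",
)

cur = conn.cursor()
# Join events stored in the data lake (hive) with customers in an RDBMS (postgresql).
cur.execute("""
    SELECT c.segment, count(*) AS events
    FROM hive.web.events e
    JOIN postgresql.public.customers c ON e.customer_id = c.id
    GROUP BY c.segment
""")
for segment, events in cur.fetchall():
    print(segment, events)
```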
Data governance is another area that’s rapidly becoming top of mind in the
enterprise. The general idea of data governance is to manage one's data and make sure that it's of high quality throughout its lifecycle. It touches on areas such as data availability, integrity, usability, consistency and security. Notably, in early
2019, Collibra raised a $100M round at over a $1B valuation.
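On the data quality side of governance, even simple automated checks go a long way. Below is a hedged sketch of the kind of rule-based validation a governance process might run before a dataset is published internally; the column names, thresholds and input file are hypothetical.

```python
# Sketch: lightweight data-quality checks of the kind a governance process might run.
# Column names, thresholds and the input file are hypothetical.
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Return a dict mapping each named check to a pass/fail boolean."""
    return {
        "no_duplicate_ids": df["customer_id"].is_unique,
        "email_mostly_present": df["email"].notna().mean() >= 0.95,
        "signup_dates_not_in_future": (df["signup_date"] <= pd.Timestamp.today()).all(),
        "revenue_non_negative": (df["revenue"] >= 0).all(),
    }

df = pd.read_csv("customers.csv", parse_dates=["signup_date"])
results = run_quality_checks(df)
failed = [name for name, ok in results.items() if not ok]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```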
Finally, data lineage is perhaps the most recent category of data management to
emerge. Data lineage is meant to capture the “journey of data” across the enterprise. It
helps companies figure out how data was gathered, and how it was modified and shared
across its lifecycle. The growth of this segment is driven by a number of factors
including the increasing importance of compliance, privacy and ethics, as well as the
need for reproducibility and transparency of machine learning pipelines and models.
Here’s a good podcast on the topic from O’Reilly.
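A hedged sketch of the core idea: every transformation step records where its inputs came from and what it produced, so the "journey of data" can be reconstructed later. Real lineage tools capture this automatically at the query or pipeline level; the structure and values below are purely illustrative.

```python
# Sketch: the minimal information a lineage record captures for one pipeline step.
# Field values are illustrative; real tools collect this automatically.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    step_name: str
    inputs: list          # upstream datasets this step read
    output: str           # dataset this step produced
    transformation: str   # human-readable description (or the SQL/code itself)
    run_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = LineageRecord(
    step_name="build_customer_features",
    inputs=["s3://lake/raw/events", "postgres://crm/customers"],
    output="s3://lake/features/customer_features_v3",
    transformation="joined raw events to CRM customers, aggregated 30-day activity",
)
print(record)
```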
The final key trend that has been accelerating this year is the continued emergence of
an AI-specific infrastructure stack.
The need to manage AI pipelines and models has given rise to the rapidly growing
MLOps (or AIOps) category. To acknowledge this new-ish trend, we have added two
new boxes to this year’s Landscape, one under Infrastructure (with various early stage
startups including Algorithmia, Spell, Weights & Biases, etc.) and one under Open
Source (with a variety of projects, typically fairly early as well, including Pachyderm,
Seldon, Snorkel, MLeap, etc.).
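As one concrete example of what this category does, here is a hedged sketch of experiment tracking with MLflow, one of the open source projects in this space. The dataset, parameters, metric and model below are placeholders; the value of the pattern is that every training run becomes reproducible and comparable.

```python
# Sketch: tracking a training run with MLflow so it can be reproduced and compared.
# The dataset, parameters and model are placeholders for a real pipeline.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    mlflow.log_params(params)                 # what was tried
    mlflow.log_metric("test_auc", auc)        # how well it did
    mlflow.sklearn.log_model(model, "model")  # the resulting artifact
```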
AI is having a profound impact on infrastructure even at the lower levels of the stack,
with the rise of GPU databases and the birth of a new generation of AI chips
(Graphcore, Cerebras, etc.). AI may be forcing us to rethink the entire nature of
compute.
ANALYTICS TRENDS
In business intelligence, the unmistakable trend of the last few months has been
the burst of consolidation activity that we mentioned earlier in this post, with the
acquisitions of Tableau, Looker, Zoomdata and Clearstory, as well as the merger
between SiSense and Periscope (Harry Glaser, CEO of Periscope Data, had spoken at Data
Driven NYC last year).
As BI consolidates, the heat continues to increase in the data science and machine
learning platform segments. The deployment of ML/AI in the enterprise is a
mega-trend that is still in its early innings, and various players are rushing to build
the platform of choice.
For most companies in the space, the clear goal is to facilitate the democratization of
ML/AI, making its benefits accessible to larger groups of users and companies, in a
context where the ongoing talent shortage in ML/AI continues to be a major
bottleneck to broad adoption. However, different players have different strategies.
One approach is AutoML. It involves automating entire parts of the machine learning
lifecycle, including some of the most tedious ones. Depending on the product, AutoML
will handle anything from feature generation and engineering to algorithm selection, model training, deployment and monitoring. DataRobot, an AutoML specialist, raised a $100M Series D since our 2018 Landscape (and reportedly more since then).
Other companies in the space, such as Dataiku, H2O and RapidMiner, offer platforms that include AutoML but go broader in scope. Dataiku, for example, raised a large $101M Series C since our 2018 Landscape, with an overall philosophy of empowering entire data teams (both data scientists and data analysts) and abstracting away much of the complexity and tediousness involved in handling the entire lifecycle of data (for a great overview, see this video of a presentation by Florian Douetteau, CEO of Dataiku) [Disclaimer: FirstMark is an investor in Dataiku].
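For a sense of what the AutoML promise looks like from the user's side, here is a hedged sketch using H2O's open source AutoML (one of the tools mentioned above). The input file and target column are hypothetical; the library takes care of algorithm selection, training and leaderboard ranking within the given time budget.

```python
# Sketch: automated model search with H2O AutoML.
# The input file and target column are hypothetical placeholders.
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts (or connects to) a local H2O cluster

train = h2o.import_file("customer_training_data.csv")
target = "churned"
features = [c for c in train.columns if c != target]
train[target] = train[target].asfactor()  # treat the target as categorical

# Let AutoML try multiple algorithms and ensembles within a 10-minute budget.
aml = H2OAutoML(max_runtime_secs=600, seed=1)
aml.train(x=features, y=target, training_frame=train)

print(aml.leaderboard.head())  # candidate models ranked by cross-validated performance
best_model = aml.leader        # best model, ready for prediction or export
```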
The cloud providers are of course active, with Microsoft’s Learning Studio, Google’s
Cloud AutoML and AWS Sagemaker. Despite the might of the cloud providers, those
products are still reasonably narrow in their scope – generally hard to use and largely
targeting very technical, advanced users. They’re also still very much
nascent. Sagemaker, Amazon’s cloud machine learning platform, reportedly had
a slow start in 2018, with only $11M in sales to the commercial sector.
Some cloud providers are actively partnering with pure play players in the space:
Microsoft participated in the $250M Series E of Databricks, perhaps a prelude to a
future acquisition.
We had covered the world of AI research in a previous post: Frontier AI: How far are
we from artificial “general” intelligence, really?.
For more, see two great reports that just came out: State of AI Report 2019 by Nathan
Benaich and The State of AI: Divergence by MMC Ventures.
APPLICATION TRENDS
As we complete our journey through the 2019 landscape from the left to the right of the chart, here are a couple of key trends to highlight in applications:
At this stage, we are probably 3 or 4 years into a journey of trying to build ML/AI
applications for the enterprise.
There were certainly some awkward product attempts (first generation chatbots) and
some big marketing claims well ahead of reality, especially from older companies
trying to retrofit ML/AI into existing products.
But, bit by bit, we’ve entered the deployment phase of ML/AI in the enterprise,
going from curiosity and experimentation to actual use in production. The trend for
the next few years seems clear: take a given problem, see if ML/AI (more often than
not, deep learning, or a variation thereof) can make a difference, and if so, build an AI
application to address the problem more effectively.
This deployment phase will occur in a variety of ways. Some products will be built and
deployed by internal teams using the enterprise AI platforms mentioned above. Others
will be full-stack products with embedded AI, offered by various vendors, where the
AI part might be largely invisible to the customer. Yet others will be provided by
vendors offering a mix of products and services (for an example of this approach, see
this talk by Jean-Francois Gagne, CEO of Element AI).
Certainly, it is still very much early days. Internal teams often started with discrete
projects addressing one use case (e.g., churn prediction), and are starting to expand to
other problems. Many startups building ML/AI applications are still learning about the
challenges of going from R&D mode to a fully scaled out operation (I wrote a few
thoughts on the topic in this earlier blog post: Scaling AI Startups).
There is a futuristic world where enterprises become not only fully automated
organizations, but eventually also self-healing and autonomous, a topic which we had
explored in our presentation on AI and blockchain last year.
However, we’re far from that stage, and today’s reality is largely focused on RPA. This
is a red hot category, with leaders such as UiPath and Automation Anywhere
growing very fast and raising mega-rounds, as mentioned above.
RPA, short for Robotic Process Automation (although, perhaps disappointingly, it does
not leverage any actual robot), involves taking generally very simple workflows,
typically manual (performed by humans) and repetitive, and replacing them with software. A lot of RPA takes place in back office functions (e.g., invoice processing).
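To illustrate how rules-based this typically is, here is a hedged sketch of an invoice-processing automation in plain Python: it scans a folder of text invoices, extracts a couple of fields with regular expressions, and appends them to a ledger. The folder layout and invoice format are hypothetical, and commercial RPA tools achieve the equivalent by scripting existing GUIs rather than through code.

```python
# Sketch: a rules-based "back office" automation in the spirit of RPA.
# The folder layout and invoice format are hypothetical.
import csv
import re
from pathlib import Path

INBOX = Path("invoices/inbox")
PROCESSED = Path("invoices/processed")
LEDGER = Path("invoices/ledger.csv")

INVOICE_NO = re.compile(r"Invoice\s*#\s*(\d+)")
TOTAL = re.compile(r"Total:\s*\$([\d,]+\.\d{2})")

INBOX.mkdir(parents=True, exist_ok=True)
PROCESSED.mkdir(parents=True, exist_ok=True)

with LEDGER.open("a", newline="") as ledger:
    writer = csv.writer(ledger)
    for invoice in INBOX.glob("*.txt"):
        text = invoice.read_text()
        number = INVOICE_NO.search(text)
        total = TOTAL.search(text)
        if number and total:
            # Record the extracted fields, then move the file out of the inbox.
            writer.writerow([number.group(1), total.group(1).replace(",", "")])
            invoice.rename(PROCESSED / invoice.name)
        else:
            print(f"Could not parse {invoice.name}; leaving it for a human.")
```

Nothing in this workflow learns or adapts, which is exactly the point made below about RPA being more automation than intelligence.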
There are perhaps reasons to be cynical about RPA. Some consider it to be a largely unintelligent “band aid”, or a stopgap measure of sorts – take an inefficient workflow performed by humans, and just have the machine do it. From that perspective, RPA may simply be creating the next level of technical debt, and it is unclear what happens to automated RPA functions as the environment around them changes, other than creating the need for more RPA to reconfigure the old task to its new environment.
RPA, at this stage at least, is more about automation than intelligence, more about
rules-based solutions than AI (although several RPA vendors tout their AI capabilities in marketing materials).
It will be particularly interesting to observe those spaces in the next few years, and it is
possible that RPA and intelligent automation will merge, either through M&A or
through the launch of new homegrown products, unless the latter progresses so rapidly that it limits the need for the former.
____________________
NOTES:
1) As in every year, we couldn't possibly fit all the companies we wanted on the chart. While
the general philosophy of the chart is to be as inclusive as possible, we ended up having
to be somewhat selective. Our methodology is certainly imperfect, but in a nutshell,
here are the main criteria:
• Everything being equal, we gave priority to companies that have reached some level of
market significance. This is a reasonably easy exercise for large tech companies. For
growing startups, considering the limited amounts of data available, we often used
venture capital financings as a proxy for underlying market traction (again, probably
imperfect). So everything else being equal, we tend to feature startups that have raised
larger amounts, typically Series A and beyond.
• Occasionally, we made editorial decisions to include earlier stage startups when we
thought they were particularly interesting.
• On the application front, we gave priority to companies that explicitly leverage Big
Data, machine learning and AI as a key component or differentiator of their offering.
It is a tricky exercise at a time when companies are increasingly crafting their
marketing around an AI message, but we did our best.
• This year as in previous years, we removed a number of companies. One key reason for
removal is that the company was acquired and not run by the acquirer as an independent company. In some select cases, we left the acquired company as is in the
chart when we felt that the brand would be preserved as a reasonably separate offering
from that of the acquiring company.
3) As we get a lot of requests every year: feel free to use the chart in books, conferences,
presentations, etc – two obvious asks: (i) do not alter/edit the chart and (ii) please
provide clear attribution (Matt Turck, Lisa Xu and FirstMark Capital).