Looking at the New Features of
Apache NiFi
Timothy Spann
Principal Developer Advocate
Sunday October 8, 2023
4:10 PM - 4:50 PM
Room 102
Slides, Code, Articles and More…
FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://medium.com/@tspann
https://github.com/tspannhw
Apache NiFi x Apache Kafka x Apache Flink
© 2023 Cloudera, Inc. All rights reserved.
Future of Data - New York + Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python, Java,
AI, ML, LLM and Open Source friends.
https://bit.ly/32dAJft
My Talk List
Utilizing Real-Time Transit Data for Travel Optimization
Let’s Monitor the Conditions at the Conference
Agenda
Apache NiFi has gained a lot of new features, processors and best practices
in the last year or so.
I will walk through building flows using the latest tips, techniques and processors.
I will build and change a number of data flows using the latest NiFi version and point out
gotchas and some never-dos. The deck will act as a take-away with notes, tips and
guides to what we covered.
===> Any NiFi 1.23+ or in-progress 2.0 features people want to see?
Records
New Excel record reader (ExcelReader)
AmazonGlueSchemaRegistry
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
New to 2023 Processors
GenerateRecord
GetAsanaObject
PutSalesforceObject
QuerySalesforceObject
PutIoTDBRecord
QueryIoTDBRecord
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
ListGoogleDrive
FetchGoogleDrive
PutGoogleDrive
PutBoxFile
ListBoxFile
FetchBoxFile
PutDropbox
DecryptContent
DecryptContentCompatibility
New to 2023 Processors
ExtractRecordSchema
RemoveRecordField
VerifyContentMAC
TriggerHiveMetaStoreEvent
“count” function added to RecordPath
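As a rough illustration of what the new "count" function reports, the plain-Python sketch below mimics counting the values that a path along the lines of count(/orders[*]) would select. It does not use the RecordPath engine; the record layout and the count_selected helper are invented for this example, and treating a scalar field as one value is an assumption of the sketch.

```python
def count_selected(record, field):
    """Count the values a path like /<field>[*] would select (illustrative only)."""
    value = record.get(field)
    if value is None:
        return 0  # missing or null field selects nothing
    if isinstance(value, (list, tuple)):
        return len(value)  # one selected value per array element
    return 1  # assumption: a scalar field counts as a single value


# Hypothetical record with an array field.
record = {"id": 17, "orders": [{"sku": "a"}, {"sku": "b"}, {"sku": "c"}]}
print(count_selected(record, "orders"))
```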
AWS ML Service Processors
https://github.com/tspannhw/FLaNK-AWSML
AWS Translate
Deprecating for Removal
Deprecate Lua and Ruby Script Engines
Deprecate ECMAScript Script Engine
Deprecate the Ambari Reporting Task
Deprecate Kafka 1.x components and 2.0 components
XML Templates
Variables
See:
https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features
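Since Variables are on the removal list, property values that reference them with ${name} need to move to Parameters, which are referenced with #{name}. A minimal sketch of that rewrite is below. The helper name, the variable names and the JDBC URL are hypothetical, and the script only touches plain ${name} references for names you pass in, because ${...} is also used for FlowFile attributes and Expression Language functions, which must stay as they are.

```python
import re

# Matches plain ${name} references only; anything with a ':' (an Expression
# Language function call such as ${filename:toUpper()}) is left untouched.
PLAIN_REFERENCE = re.compile(r"\$\{([^${}:]+)\}")


def variables_to_parameters(value, variable_names):
    """Rewrite ${name} to #{name} for names known to be variables."""
    def swap(match):
        name = match.group(1).strip()
        if name in variable_names:
            return "#{" + name + "}"
        return match.group(0)  # attribute reference or unknown name: keep as is

    return PLAIN_REFERENCE.sub(swap, value)


# Hypothetical variable names and property value.
known_variables = {"db.host", "db.port"}
print(variables_to_parameters(
    "jdbc:postgresql://${db.host}:${db.port}/sales", known_variables))
print(variables_to_parameters("${filename}", known_variables))
```

Anything the helper leaves unchanged, such as a variable used inside a function call, still needs a manual review.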
Start Using
ExecuteStateless -> run your stateless flows right in a regular NiFi cluster
Parameters
JSON Flow Serialization
Records everywhere
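One practical upside of JSON flow serialization is that a flow can be inspected with ordinary tools. The sketch below tallies processor types in a gzip-compressed JSON flow file. It assumes the layout has a top-level "rootGroup" whose groups carry "processors" and nested "processGroups" lists; verify the keys against your own file. The sample flow and its type names are made up so the example runs without a NiFi install.

```python
import gzip
import json
import os
import tempfile
from collections import Counter


def load_flow(path):
    """Read a gzip-compressed JSON flow definition into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def count_processor_types(group, counts=None):
    """Recursively tally processor types in a process group dict."""
    counts = Counter() if counts is None else counts
    for processor in group.get("processors", []):
        counts[processor.get("type", "unknown")] += 1
    for child in group.get("processGroups", []):
        count_processor_types(child, counts)
    return counts


# Made-up miniature flow, written to a temp file so the sketch runs anywhere.
sample_flow = {
    "rootGroup": {
        "processors": [{"name": "Make data", "type": "example.GenerateRecord"}],
        "processGroups": [{
            "processors": [
                {"name": "Make more data", "type": "example.GenerateRecord"},
                {"name": "Drop a field", "type": "example.RemoveRecordField"},
            ],
            "processGroups": [],
        }],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flow.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(sample_flow, handle)
    counts = count_processor_types(load_flow(path).get("rootGroup", {}))

for processor_type, total in counts.most_common():
    print(total, processor_type)
```

Against a real install you would point load_flow at the flow.json.gz in the NiFi conf directory instead of the temp file.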
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
NiFi 2.0 Coming
● Python Integration
● Parameters
● JDK 17, maybe JDK 21+
● JSON Flow Serialization
● Rules Engine for Development Assistance
● Run Process Group as Stateless
● flow.json.gz
https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals
Thanks to Pierre!
Python as First Class (NIFI-11241)
Graphical UI with custom Python based extensions
NEW in NiFi 2.0
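To give a feel for a Python-based extension, here is a minimal sketch of a FlowFile transform processor in the shape used by the NiFi 2.0 Python API as I understand it; check the Python Developer's Guide for your build before relying on it. The UppercaseText class, its description and the text.length attribute are invented for illustration. The fallback classes exist only so the file can also be run outside NiFi, where the nifiapi module is not available.

```python
try:
    # Available when the file is loaded by NiFi 2.0 as a Python extension.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Outside NiFi: minimal stand-ins so the sketch can be run and tested.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class UppercaseText(FlowFileTransform):
    """Illustrative processor: uppercases the FlowFile text."""

    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Uppercases the FlowFile text (illustrative only)."
        tags = ["example", "text"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, transform it, and route to success.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper(),
            attributes={"text.length": str(len(text))},
        )
```

In a NiFi 2.0 install, a file like this would go in the Python extensions directory so the processor shows up in the UI alongside the Java ones.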
Apache NiFi in a few numbers
A very active project with a dynamic community & comparison with ACEU 2019
2800+ members on the Slack channel (535+ - 4 years ago)
475+ contributors on GitHub across the repositories (260+ - 4 years ago)
65 committers in the Apache NiFi community (45 - 4 years ago)
Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10 - 4 years ago)
14M+ Docker pulls of the Apache NiFi image (1M+ - 4 years ago)
NiFi Deploy Options from Open Source to Managed
Cloudera Edge Flow Manager (Command & Control of MiNiFi Agents)
MiNiFi C++ (small footprint)
MiNiFi Java (headless version of NiFi)
NiFi Registry
Cloudera NiFi for Kafka Connect
NiFi in Cloudera DataFlow Functions
Cloudera DataFlow
Stateless NiFi
NiFi 2.0 is coming…
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
- First-class citizen Python API
- Rules Engine
- NiFi Stateless at Process Group level
- Java 21 (virtual threads, perf improvements, etc)
https://medium.com/@george.vetticaden/accelerating-ai-data-pipelines-building-an-evernote-chatbot-with-apache-nifi-2-0-and-generative-ai-9d977466ff4c
Closing the gap between data engineers and data scientists…
- Export documentation (Sharepoint, OCR) to build the knowledge base powering your chatbot
- Scrape the internet (Sitemap) to build the knowledge base powering your chatbot
- Real-time streaming ingest of Slack to build the knowledge base powering your chatbot
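The sitemap route, for instance, starts with pulling the page URLs out of a sitemap document before fetching and indexing them. Below is a small standalone sketch of that first step in plain Python, not a NiFi processor. The sample XML and example.com URLs are placeholders, and the helper name is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the page URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Placeholder sitemap for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/faq</loc></url>
</urlset>"""

for url in sitemap_urls(SAMPLE):
    print(url)
```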
DEMO
THANK YOU

CoC23: Looking at the New Features of Apache NiFi
