
Build a RAG-Powered Application

with a Human Voice Interface


GTC Spring 2024 – SE62869 – Thursday, March 21, 2024 @ 8:00 AM – 12:00 PM PDT

Ruchi Gupta, VP Product Management, Kore.ai


Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi
Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE
Dan Lesovodski, VP of AI, Data Monsters
Annie Surla, Developer Advocate Engineer, NVIDIA
Sven Chilton, Deep Learning Developer Advocate, NVIDIA
Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA
Agenda

8:00 AM – 9:30 AM
Hands-On: Building Voice & RAG-Powered Applications (NVIDIA)
• Retrieval Augmented Generation
• Thinking about Multimodality
• The Basics of Speech AI

10:00 AM – 11:30 AM
Real-World Implementations:
• 10:00 AM – 10:30 AM: Powering CX with AI (Kore.ai)
• 10:30 AM – 11:00 AM: Transforming UX with RAG-Powered Human Voice Interfaces (Quantiphi)
• 11:00 AM – 11:30 AM: Enhancing CX with GenAI-Based Virtual Assistant (HPE & Data Monsters)

11:30 AM – 12:00 PM
Q&A Panel
Hands-On:
Building Voice & RAG Powered Applications

Annie Surla, Developer Advocate Engineer, NVIDIA


Sven Chilton, DL Developer Advocate, NVIDIA
Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA
Retrieval Augmented Generation

Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA


Why RAG, Why Now, and Why It Is So Exciting
RAG is finally starting to deliver on the promise of AI, unlocking the value of ALL enterprise data
Long, Long Ago…
…. November 30, 2022
… But Wait!

The problems with LLMs alone:
• Unknown data sources
• Special terminology
• Prompt engineering
• Outdated knowledge
• Hallucinations
• Difficult to tune
We Can Rebuild
We have the technology
The Depths of the Tech Debt…

API Sprawl
One Framework to Bring Them All …
… and in the darkness bind them.

All we have to do is:

Give the model a clear and concise answer in the question! 🤔

Give the model a clear and concise answer in the question? 😕
Can’t an LLM Do It? Well… Yeah!
Multi-query prompting
Can’t an LLM do it? Well… Yeah!
Hypothetical Document Embedding (HyDE)
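Both ideas are easy to sketch. Below is a minimal Python illustration; generate(prompt) is a hypothetical stand-in for whatever chat-completion endpoint you use, and retrieval with the rewritten queries or the hypothetical document happens downstream in your vector store.

def generate(prompt: str) -> str:
    """Hypothetical LLM call; swap in your chat-completion client."""
    raise NotImplementedError

def multi_query(question: str, n: int = 3) -> list[str]:
    # Multi-query prompting: rephrase the question several ways, then
    # retrieve with every variant and merge the results.
    prompt = (f"Rewrite the following question {n} different ways, "
              f"one per line, preserving its meaning:\n{question}")
    return [q.strip() for q in generate(prompt).splitlines() if q.strip()]

def hyde(question: str) -> str:
    # HyDE: embed a hypothetical *answer* instead of the question, since
    # answers tend to sit closer to relevant passages in embedding space.
    prompt = f"Write a short passage that plausibly answers:\n{question}"
    return generate(prompt)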
Can’t an LLM Do It?
Engineering Design Spectrum
Two axes: Creativity vs. Consistency, and High Accuracy vs. Low Latency.

• Creativity + High Accuracy: Philosopher. Answers must be correct. Answers will require some assumptions to be made and tested. Reasonable minds may disagree. Moderate risk tolerance.
• Creativity + Low Latency: Brainstorming. Wrong answers are welcomed. Large logical leaps required. High risk tolerance.
• Consistency + High Accuracy: Subject Matter Experts. Answers must be correct. Answers are needed soon, but not now. Low risk tolerance.
• Consistency + Low Latency: Assistants and Co-Pilots. Wrong answers are tolerated. Repeatable and reliable answers are required. Moderate risk tolerance.
How to Get Started with RAG
www.nvidia.com/generative-ai-chatbots

• Evaluate Models in NVIDIA API Catalog: ai.nvidia.com
• Try NVIDIA Developer RAG Examples in GitHub: nvidia.github.io/GenerativeAIExamples
• Apply for a free NVIDIA LaunchPad Trial: www.nvidia.com/launchpad
RAG Talks, Special Events & More at GTC 2024
Learn how Retrieval Augmented Generation (RAG) is Transforming Generative AI

Talks:
• Beyond RAG Basics: Building Agents, Co-Pilots, Assistants, and More!, NVIDIA
• Practical Strategies for Building Enterprise Applications Powered by LLMs, NVIDIA
• Financial Knowledge Graphs for Retrieval Augmented Generation, BlackRock
• New Offerings from NVIDIA to Overcome the Complexities of Generative AI, NVIDIA
• A Guide to Building Safe Generative AI Copilots that Improve Productivity and Protect
Company Data, NVIDIA
• The Future of AI Chatbots: How Retrieval-Augmented Generation is Changing the Game,
Blackrock, NVIDIA, ServiceNow
• Accelerating Enterprise: Tools and Techniques for Next Generation AI Deployment, NVIDIA
• Perform High-Efficiency Search, Improve Data Freshness, and Increase Recall with GPU-
Accelerated Vector Search and RAG Workflows, Zilliz, NVIDIA
• Re-Imagine Service Assurance Chatbots With LLMs and RAG, Tata Consultancy Services
(TCS)

Special Events (SE):


• Best Practices for Building LLM RAG Using NVIDIA AI [SE]
• Build a RAG-Powered Application With a Human Voice Interface [SE], NVIDIA, Quantiphi,
Kore.ai, HPE, Data Monsters
March 18–21 | www.nvidia.com/gtc | #GTC24
Connect with Experts (CWE):
• Building Generative AI Applications With Retrieval Augmented Generation [CWE]

Deep Learning Institute Workshops (DLI):


• Large-Scale Production Deployment of RAG Pipelines [DLI]
• Building a GPU-Accelerated Retrieval Augmented Generation (RAG) Pipeline [DLI]
• Streamlining Enterprise Data Operations with Multimodal RAG and LangChain [DLI]
Thinking About Multimodality

Annie Surla, Developer Advocate Engineer, NVIDIA


Multimodal Data in the Wild!
Multimodal RAG - What Can it Do?

Example question: "What deep learning models were used to benchmark the relative performance of H100?"

Flow: Text Query → Text Embedding Model → Query Vector → Knowledge Base (Images and Text) → Similar Chunks → Large Language Model → Answer
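The flow above fits in a few lines. A minimal sketch with the sentence-transformers package; the model name, the toy knowledge base, and the final LLM step are illustrative assumptions, not details from the deck.

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

knowledge_base = [
    "MLPerf Inference v3.0 benchmarked RNN-T, 3D U-Net, Mask R-CNN, "
    "ResNet-50 v1.5, RetinaNet, and BERT.",
    "NVIDIA L4 is based on the Ada Lovelace architecture.",
]
kb_vectors = embedder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = kb_vectors @ q                 # cosine similarity (unit-norm vectors)
    top = np.argsort(scores)[::-1][:k]      # most similar chunks first
    return [knowledge_base[i] for i in top]

context = retrieve("What models were used to benchmark H100?")
# feed `context` plus the question to the LLM of your choice for the answer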
Why is Multimodality Hard?
Multimodal RAG - Approaches for Multimodal Retrieval

1. Transform all modalities into a single vector space: Images and Text Documents → Multimodal Embedding Model → one VectorStore.

2. Keep different modalities in different VectorStores: Images → Image Embedding Model → image VectorStore; Text Documents → Text Embedding Model → text VectorStore.

3. Ground all modalities into one (text): transform Images into text, then Images and Text Documents → Text Embedding Model → one VectorStore.
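The third approach is the simplest to prototype, since one text embedding model serves everything. A sketch, assuming a caption(image_path) helper backed by any captioning or chart-description MLLM (the helper is hypothetical, not a specific library call):

def caption(image_path: str) -> str:
    """Hypothetical: run an image-captioning / chart-description model."""
    raise NotImplementedError

def build_text_corpus(image_paths: list[str], documents: list[str]) -> list[str]:
    # Images become text descriptions, so a single text embedding model
    # and a single VectorStore can index both modalities together.
    return [caption(p) for p in image_paths] + documents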
Multimodal RAG - Preprocessing Workflow

Images: classify each image. Is this a chart/plot?
• Yes → extract the image description and a linearized table as text (stored as metadata), and generate a chart/plot summary (stored as a chunk).
• No → generate an image description.

Webpages and PDFs: extract clean text (including tables) and figures, apply custom logic for structured JSON, run LLM data augmentation, then pass everything through a text splitter with metadata to produce chunks.

All chunks → Embedding Model → Vector Store.
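The image branch of this workflow reduces to a small routing function. A sketch, where the chart classifier, the DePlot-style linearizer, the captioner, and the summarizing LLM are all assumed components; each chunk keeps the source image in its metadata so it can be surfaced again at answer time.

def is_chart(image_path: str) -> bool:
    raise NotImplementedError   # assumed chart/plot classifier

def linearize_chart(image_path: str) -> str:
    raise NotImplementedError   # assumed DePlot-style chart-to-table model

def describe_image(image_path: str) -> str:
    raise NotImplementedError   # assumed image-captioning model

def summarize(text: str) -> str:
    raise NotImplementedError   # assumed LLM call

def preprocess_image(image_path: str) -> dict:
    if is_chart(image_path):
        # Chart/plot: keep the linearized table and image as metadata and
        # embed an LLM-written summary as the chunk text.
        table = linearize_chart(image_path)
        return {"text": summarize(table),
                "metadata": {"image": image_path, "table": table}}
    # Any other image: the generated description becomes the chunk text.
    return {"text": describe_image(image_path),
            "metadata": {"image": image_path}}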
Multimodal RAG - Preprocessing Workflow (example: generated image description)

"This image is a bar chart comparing the relative performance of different NVIDIA GPU accelerators across various machine learning models and tasks. The chart has six categories represented by the machine learning models or tasks, which are RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, and BERT."
Multimodal RAG - Preprocessing Workflow (example: linearized table and chart summary)

Linearized table as text (stored as metadata):
TITLE | Relative Performance – Per Accelerator<0x0A>Higher is Better | NVIDIA A100 (v2.1 - Available) | NVIDIA H100 (v2.1 - Preview) | NVIDIA H100 (v3.0 - Available)<0x0A>RNN-T | 1.0 | 1.7 | 1.8 | 1.8<0x0A>3D U-Net | 1.0 | 1.8 | 1.8 | 1.97<0x0A>Mask R-CNN | 1.0 | 1.97 | 1.97 | 2.09<0x0A>ResNet-50 v1.5 | 1.0 | 1.95 | 2.07 | 2.10<0x0A>RetinaNet | 1.0 | 2.2 | 2.22 | 2.28<0x0A>BERT | 1.0 | 2.65 | 3.07 | 3.08

Chart/plot summary (stored as a chunk):
"This is a comparison of the relative performance of various accelerators from NVIDIA (A100 v2.1, H100 v2.1 preview, H100 v3.0 available) using different models (RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, BERT). Higher values indicate better performance. The A100 has a baseline score of 1.0 for all models. The H100 v2.1 preview and v3.0 available perform 1.7-2.2x and 1.8-3.08x better than the A100, respectively, depending on the model used."
Multimodal RAG - Preprocessing Workflow (example: generated description for a diagram)

"The image depicts a network with three servers, each with a single node. Each server has a single disk drive, a single network card, and a single CPU. The servers are connected to a central server, which is connected to the network via a cable. The cable connects the central server to the servers, and the cable connects each server to its respective network card. The network card has a storage capacity of 1 GB, and each server has an additional 1 GB of storage. The central server has two network interfaces, one for the server and the other for the network card and storage."
Multimodal RAG - Preprocessing Workflow (example: clean text extracted from a webpage)

'''Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI

NVIDIA L4 Tensor Core GPU vaults ahead

In MLPerf Inference v3.0, NVIDIA made the debut submission of the NVIDIA L4 Tensor Core GPU. Based on the new NVIDIA Ada Lovelace architecture, L4 is the successor to the popular NVIDIA T4 Tensor Core GPU, delivering significant improvements for AI, video, and graphics in the same single-slot, low-profile PCIe form factor.

NVIDIA Ada Lovelace architecture incorporates 4th Gen Tensor Cores with FP8, enabling excellent inference performance even at high accuracy. In MLPerf Inference v3.0, L4 delivered up to 3x more performance than T4 at 99.9% of the reference (FP32) accuracy of BERT—the highest BERT accuracy level tested in MLPerf Inference v3.0.

Figure 2. MLPerf Inference performance comparison between NVIDIA L4 and NVIDIA T4 GPUs

Per-accelerator throughput is not a primary metric of MLPerf Inference. MLPerf Inference v3.0: Datacenter Closed. Inference speedups calculated by dividing the inference throughputs reported in MLPerf Inference v0.7 result ID 0.7-113 by the number of accelerators to calculate T4 Tensor Core GPU per-accelerator throughput, and calculating the ratios of the inference performance of the L4 Tensor Core GPU in 3.0-0123 (Preview) by the calculated per-accelerator throughput of T4. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

The NVIDIA L4 also incorporates a large L2 cache, providing additional opportunity to increase performance and energy efficiency. In the NVIDIA MLPerf Inference v3.0 submission, two key software optimizations were implemented to take advantage of the larger L2 cache: cache residency and persistent cache management.

The larger L2 cache on L4 enabled the MLPerf workloads to run entirely within the cache. The L2 cache provides higher bandwidth at lower power than GDDR memory, so the significant reduction in GDDR accesses helped to both increase performance and reduce energy use.

Up to 1.4x higher performance was observed when batch sizes were optimized to enable the workloads to fit entirely within the L2 cache, compared to the performance when batch sizes were set to maximum capacity.

L2 cache persistence

Another optimization used the feature first introduced in the NVIDIA Ampere architecture. This enables developers, with a single call to TensorRT, to tag a subset of the L2 cache so that it can be prioritized for retention (that is, scheduled to be evicted last). This feature is especially useful for inference when working under a regime of residency, as developers can target the memory being reused for layer activations across model execution, dramatically reducing GDDR write bandwidth usage.'''
Multimodal RAG - Inference Workflow

Flow: Slack User Interface → User Query → Embedding Model → VectorStore → Retrieved Chunks → LLM → Final Response.
Retrieved chunks come in three forms: an Image (routed through a VQA MLLM to produce a VQA answer), Text from a Chart/Plot, or Plain Text.
Multimodal RAG - Inference Workflow (example: retrieved chart summary and plain-text chunk)

Text from chart/plot: "This is a comparison of the relative performance of various accelerators from NVIDIA (A100 v2.1, H100 v2.1 preview, H100 v3.0 available) using different models (RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, BERT). Higher values indicate better performance. The A100 has a baseline score of 1.0 for all models. The H100 v2.1 preview and v3.0 available perform 1.7-2.2x and 1.8-3.08x better than the A100, respectively, depending on the model used."

Plain text: '''Breaking MLPerf Training Records with NVIDIA H100 GPUs

3D U-Net

NVIDIA submitted results on 432 NVIDIA H100 Tensor Core GPUs, achieving a new record for the benchmark of 0.82 minutes (49 seconds) to train. Per-accelerator performance on H100 also improved by 8.2% compared to the prior round.

To achieve excellent performance at scale, a faster GroupBatchNorm kernel was one key optimization.

In our largest scale 3D U-Net submission, the instance normalization operation in the neural network needs to perform a reduction of the tensor mean and variance across four GPUs. By using a faster GroupBatchNorm kernel to implement instance normalization, we delivered a 1.5% performance increase.'''
Multimodal RAG - Inference Workflow (example: retrieved image routed through VQA)

"The image depicts a network with three servers, each with a single node. Each server has a single disk drive, a single network card, and a single CPU. The servers are connected to a central server, which is connected to the network via a cable. The cable connects the central server to the servers, and the cable connects each server to its respective network card. The network card has a storage capacity of 1 GB, and each server has an additional 1 GB of storage. The central server has two network interfaces, one for the server and the other for the network card and storage."
Multimodal RAG - Inference Workflow (example: retrieved linearized chart table)

TITLE | Relative Performance – Per Accelerator<0x0A>Higher is Better | NVIDIA A100 (v2.1 - Available) | NVIDIA H100 (v2.1 - Preview) | NVIDIA H100 (v3.0 - Available)<0x0A>RNN-T | 1.0 | 1.7 | 1.8 | 1.8<0x0A>3D U-Net | 1.0 | 1.8 | 1.8 | 1.97<0x0A>Mask R-CNN | 1.0 | 1.97 | 1.97 | 2.09<0x0A>ResNet-50 v1.5 | 1.0 | 1.95 | 2.07 | 2.10<0x0A>RetinaNet | 1.0 | 2.2 | 2.22 | 2.28<0x0A>BERT | 1.0 | 2.65 | 3.07 | 3.08
Multimodal RAG - Inference Workflow

The Embedding Model and VectorStore together form the Retriever: Slack User Interface → User Query → Retriever → Retrieved Chunks (Image / Text from Chart/Plot / Plain Text) → LLM → Final Response, with image chunks answered via the VQA MLLM.
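The routing at answer time is equally small. A sketch consistent with the preprocessing sketch earlier (chunks are {"text": ..., "metadata": ...} dicts); vqa and llm are assumed model calls:

def vqa(image_path: str, question: str) -> str:
    raise NotImplementedError   # assumed VQA-capable MLLM call

def llm(prompt: str) -> str:
    raise NotImplementedError   # assumed text LLM call

def answer(question: str, retrieved: list[dict]) -> str:
    parts = []
    for chunk in retrieved:
        meta = chunk.get("metadata", {})
        if "image" in meta and "table" not in meta:
            parts.append(vqa(meta["image"], question))  # image -> VQA answer
        else:
            parts.append(chunk["text"])  # plain text or chart summary/table
    context = "\n\n".join(parts)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")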
Multimodal RAG - Demo
The Basics of Speech AI

Sven Chilton, Deep Learning Developer Advocate, NVIDIA


What is Speech AI?
Any machine learning system or application involving audio data representing human speech.

Automatic Speech Recognition (ASR), aka Speech-to-Text (STT, S2T):
audio → ASR Model → "good morning"

Text-to-Speech (TTS), aka Speech Synthesis:
"good morning" → TTS Model → audio

Neural Machine Translation (NMT), e.g., Spanish speech to English speech:
Spanish speech → ASR Model → "buenos días" → Text Translation Model → "good morning" → TTS Model → English speech
How Do Computers Perceive Speech?
[Figure: audio waveform signals and (mel) spectrograms for three cases: a pure tone, singing a note, and speaking; the speech plots use a different scale than the top two.]
Outline of a Typical Speech-To-Text System

Mel-Spectrogram → Encoder → Speech Representation Vectors → Decoder → Transcription ("good morning")

• Encoder (neural network): turns the complex speech signal into a compact vector representation.
• Decoder (neural network): turns speech vectors into text (tokens).

STT is often called ASR (Automatic Speech Recognition).
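The encoder's input is exactly the mel-spectrogram pictured earlier. A minimal computation with torchaudio; the window, hop, and mel-band values are typical ASR defaults (assumed here), not numbers from the deck.

import torchaudio

waveform, sample_rate = torchaudio.load("good_morning.wav")  # any speech clip
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=400,        # ~25 ms analysis window at 16 kHz
    hop_length=160,   # ~10 ms hop between frames
    n_mels=80,        # 80 mel bands, common for ASR encoders
)(waveform)
print(mel.shape)      # (channels, n_mels, frames): the encoder's input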
Canary Model Demo - Multilingual Transcription & Translation
Text-To-Speech: Controllable Audio Synthesis

Inputs: transcription ("cat") and a speaker embedding → TTS model → speech signal.

Controllable features: F0 (pitch), emotion, language, speech rate.

Controlling word-level emphasis at inference:
"The food here isn't that bad." vs. "The food here isn't *that* bad."
The Inefficiency of Adding New Speakers in TTS

• Most models operate more like a switch
• Incorporating new speakers typically requires retraining and a substantial amount of data
• Figuring out new speaker embeddings for novel speakers is challenging

Transcription ("cat") + speaker embedding ("Alice" or "Bob") → TTS model → speech signal of Alice or Bob
Reference-Based Text-To-Speech: Handling Novel Samples

• Reformulate the model to condition on short reference samples
• How do we get the model to generalize to novel reference samples?

Transcription ("cat") + reference speech signal → TTS model → speech signal
P-Flow: TTS with 3 Seconds of Reference Audio

• The model learns to fill in the blank
• Reconstruction is also provided with temporally aligned text to enable text-to-speech
• 1st place in the LIMMITS 2024 Multilingual Zero-Shot TTS track!

Examples: input prompt → synthesized speech (Kannada); input prompt → synthesized speech (English)

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting, Sungwon Kim et al., NeurIPS 2023
A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing, He Bai et al., ICML 2022
What is Neural Machine Translation?

Historically, "NMT" = text-to-text translation: "buenos días" → Text Translation Model → "good morning".

If you want to translate speech, cascade a speech-to-text transcription model and a text-to-text translation model: Spanish speech → ASR Model → "buenos días" → Text Translation Model → "good morning".

Recently: development of end-to-end speech-to-text translation (AST) models: Spanish speech → Automatic Speech Translation model → "good morning".

ASR – Automatic Speech Recognition
Speech & Text Translations are Attention-Based Encoder-Decoder Models
[Diagram: an attention-based encoder-decoder text translation model]

Historically, such text translation models were trained first, and then the techniques were applied to speech translation.
Example of RAG with Speech AI

LLM – Large Language Model | RAG – Retrieval-Augmented Generation


How to Get Started with NVIDIA Riva
Make your conversational applications talk in many languages

• Experience Riva APIs through the NVIDIA API Catalog*
• Enroll to prototype, test & deploy your own app on NVIDIA LaunchPad
• Contact NVIDIA AI Enterprise
• See Riva in action

Webpage: nvidia.com/en-us/ai-data-science/products/riva/
Documentation: docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html
GitHub: github.com/nvidia-riva
Contact Us: nvidia.com/en-us/data-center/products/ai-enterprise/contact-sales/
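For a first experiment, the Python client is a few lines. A sketch using the nvidia-riva-client package against a running Riva server; the server URI and voice name are assumptions for illustration, so check the documentation linked above for the options your deployment actually exposes.

import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # assumed local Riva server

# ASR: offline transcription of a WAV file
asr = riva.client.ASRService(auth)
config = riva.client.RecognitionConfig(
    language_code="en-US",
    enable_automatic_punctuation=True,
)
with open("query.wav", "rb") as f:
    response = asr.offline_recognize(f.read(), config)

# TTS: synthesize a reply (voice name is an assumed default)
tts = riva.client.SpeechSynthesisService(auth)
audio = tts.synthesize("good morning",
                       voice_name="English-US.Female-1",
                       language_code="en-US")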
Speech & Translation AI Talks, Special Event & More at GTC 2024
Join to learn the latest speech & translation AI achievements & how to use them with GenAI-based conversational applications

All Speech & Translation AI GTC’24 Events


Talks:
• Speech AI Demystified, NVIDIA
• Speaking in Every Language: A Quick Start Guide to TTS Models for Accented, Multilingual
Communication, NVIDIA
• Mastering Speech for Multilingual Multimedia Transformation, OVHcloud, NVIDIA
• Adapting Conformer-Based ASR Models for Conversations Over the Phone, PolyAI
• Secure AI-Driven Translation in Video Conferencing, Pexip
• Behind the Scenes of Running a Conversational Character in a 3D Scene, Convai Technologies

Special Events (SE):


• Build Speech AI for Multilingual Multimedia Transformation [SE], HPE, Data Monsters, Kore.ai,
Quantiphi, NVIDIA

Connect with Experts (CWE):


• Multi-Speaker ASR with NVIDIA NeMo Toolkit: Training & Inference [CWE]

Deep Learning Institute Workshops (DLI):


• Talk to Your Data in Your Native Language [DLI]
Real-World Voice & RAG Implementations

Ruchi Gupta, VP Product Management, Kore.ai


Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi
Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE
Dan Lesovodski, VP of AI, Data Monsters
Powering CX with AI

Ruchi Gupta, VP Product Management, Kore.ai


Kore.ai – Powering Customer Experiences with AI
Overview and Speech and RAG Enabled Applications

PRESENTED BY

Ruchi Gupta



Disclaimer

The following information is being shared in order to outline some of our current product plans, but like everything else in life, even the best-laid plans can change.
We are hopeful that the following can shed some light on our roadmap, but it's important to understand that it is being shared for informational purposes only and is not a binding commitment.
The development, release, and timing of any products, features, or functionality remain at the sole discretion of Kore and are subject to change.



Agenda

• About Kore.ai
• CX Speech AI Use Case
• CX GenAI RAG Use Case


Partnering with Businesses to Put AI to Work
Kore.ai is your partner in realizing the value from AI responsibly, delivering an intelligent, secure enterprise solution that enables a human-like conversational experience to automate interactions.

Key Highlights
• 400+ global customers
• Billions of conversations
• 45-75% automation rates
• 200M+ enterprise consumers served
• $1B+ cost reduction
• 2M+ employees use Kore.ai IVAs
• Compliant: GDPR, HIPAA, PCI DSS, SOC2, FedRAMP Ready, TLS 1.2, AES 256

Automating and Optimizing Global Enterprise Customer and Employee Experiences

Source: Gartner (Nov-2021). Note: Fiscal year end is 3/31. Financial metrics as of FY ending March 2023. Rule of 100+ calculated as 2024 ARR growth + 2024 EBITDA margin.
Kore.ai Named Leader, Again!
2023 Conversational AI Gartner Magic Quadrant

• Market understanding: Kore.ai has an excellent understanding of the major enterprise use cases and their requirements
• The company's Customer Experience and Employee Experience solutions are among the sharpest found in this research
• Product capabilities: an extensive feature set and capabilities that are available to non-developers and non-data scientists in no-code tooling
• Easier to operationalize by a variety of users across different business units
• Innovation and excellence: a large development organization allows Kore.ai to stay up to date with R&D trends and emerging demand for capabilities
• Comprehensive toolkit for NLP, as well as a complete portfolio of connectors and integrations with channels and leading back-end systems


Automation First, Experience Optimization Platform

CONVERSATIONAL AI: Virtual Assistant Builder, Voice AI, Multi-Engine NLU, AI Assisted Dialogs, Data AI
ANSWERS: Knowledge AI w/LLM, Document AI, answers from structured and unstructured data, auto fulfillment
CONTACT CENTER: AI Routing & Experience Flows (Virtual Agent – User), Agent Desktop, Agent Monitoring & Reporting, integrations to CRMs and other apps, Case Management, Workforce Management
AGENT AI: Automatic Summarization, real-time AI suggestions, AI Assisted Conversations, Agent Coaching
INSIGHTS AI: Conversational Insights, Intent Discovery, Agent-Customer Interaction Insights, Virtual Agent Performance Insights, Agent Performance Insights, Contact Center Quality Management Insights

Platform Services: life cycle management (versioning, publish, collaboration), multi-lingual support, pre-built integrations, generative AI, campaign management, omnichannel

Enterprise Services: role-based access controls, maker-checker process, audit logs, security, compliance (SOC-2, PCI, FedRAMP, HIPAA, GDPR), cloud and on-premise deployments, scalability, integrations, authentication, and authorization
Traditional Contact Centers vs. AI-Native Contact Centers

• Agents handle all inquiries | Uses AI to handle & direct all inquiries
• Agents work in shifts | 24/7 availability
• Inconsistent experiences from one agent to another (depending on experience) | Consistent experiences across all channels (voice, SMS, chat, social)
• "Robotic" interactions | "Human-like" interactions using LLMs, NLP, and machine learning (ML)
• Time-consuming manual database queries & additional hold times | Scans multiple databases in seconds to provide more personalized service
• Traditional IVR routing | Intelligent routing & queueing


Multi-Modal CX Speech AI Use Case

• Kore Voice Gateway integration for voice recognition (STT)
• TTS integration for playing back responses to customers
• Agent AI leveraging STT to provide next-best responses to the agent in real time
• Translation use case where multilingual queries coming in from customers are handled by English-speaking agents via a multi-modal contact center


Agent Desktop



Speech AI Demo Video



Multi-Modal CX Speech AI Use Case

Use Case
• Multi-modal interactions where the customer interacts on a non-English voice channel and agents respond on an English chat channel. Vertical: Healthcare

Business Problem / Pain Points
• Voice agents not available during off-peak hours; only chat agents would be accessible outside of working hours, for both voice calls and digital interactions
• Cost of a voice interaction > cost of a chat interaction
• Risk of losing customers, revenue, etc.

Metrics
• Number of interactions serviced (trial phase: a few hundred per month for after-hours interactions)
• Savings for the customer from not needing to staff voice agents after hours
• Response processing time from agent send to playback to the customer

Challenges and Enhancements
• Avoid cross talk: the agent experience is built so that during the user input phase, agent input is paused (send button greyed out)
• Restrictions on concurrency and transfer
• Future: real-time experience when the agent is typing, using filler audio (other than hold music)
• Human-assisted bot experience


Voice Preference Selection Flexibility

STT: automated speech recognition engines to best suit your use cases.

TTS: voice preferences to personalize the ASR engine and the voice that plays for your text-to-speech conversions.

Configuration
• Navigate to the Contact Centre from the left menu within the desired App
• Select the Languages and Speech tab within the configurations section in the menu
• Click on Voice Preferences and click the drop-down menu under Automated Speech


XO Platform – Generative AI Capabilities

10X Faster Development:
• Suggest use cases
• Generate training utterances
• Generate conversation flows
• Generate test cases
• Suggest flow improvements

Improve IVA Quality and Agent Productivity:
• Playbooks and coaching
• Suggest contextual responses
• Retrieve relevant knowledge and agent information
• Co-pilot for drafting responses
• Summarize conversations

Deliver Human-Like Experiences:
• Handle multiple languages
• Zero-shot and few-shot intent recognition, intent resolver and disambiguation
• Dynamic paraphrasing
• Handle complex co-referencing
• Retrieval augmented generation
• Generate personalized responses

Generate Intelligent Insights:
• Discover new topics or intents
• Generate conversational insights
• Generate agent performance insights
• Generate quality management insights


XO Platform - LLM and GenAI Framework

Model Library
• Prebuilt LLM Integrations: support for commercial (OpenAI, Azure OpenAI, Anthropic, etc.) and community models
• Custom LLM Integration (BYO): integrate with any in-house, commercial, or open source models hosted internally or externally
• XO GPT: Kore.ai fine-tuned language models to deliver human-like conversational experiences

Prompt Engineering
• Prompt Library: prebuilt templates for all models and support to write custom prompts using dynamic values
• Prompt Fine-tuning & Testing: fine-tune the prompts by testing against the models; support for pre- and post-processors
• Prompt Versioning: version the prompts within the platform

Guardrails
• Anonymization of PII/Sensitive Info: secure access to models by anonymizing the PII and sensitive information in requests
• Enterprise Content Filtering Rules: scan the request and response text for harmful, biased, or toxic content, and design fallback flows
• Fact Checking: fact-check the responses to ensure that the content is grounded in enterprise context

Monitoring
• Model Performance Analytics: measure a model's performance using metrics like correctness, faithfulness, relevancy, etc.
• Usage Analytics: track response times, token usage, and failure rates across all modules and features


CX GenAI RAG Use Case

Virtual Assistant:
Helps to provide conversational employee assistance and provides intelligent search assistance to find the most suitable answers from the articles.

Search Assist Platform for Intelligent Data Indexing:
Knowledge articles, contents, and documents can be ingested into the Search Assist platform in multiple ways to enable intelligent indexing of data.

Dynamic Answer Generation with a Generative Model:
Harnessing the power of a generative model to dynamically produce responses based on matched content to provide dynamic answer(s).

These conversation features can be voice-enabled using the Kore Smart Assist Voice platform.


GenAI RAG Demo Video



GenAI - Intelligent Knowledge Base Exploration

User input: "Is a pug allowed in hold?"

1. Answers Lookup: check curated/cached answers first; present the answer when the match is at 75% confidence.
2. Otherwise, search all content: Kore LLM-powered vector search results plus a traditional index search.
3. Summarize with LLM (OpenAI; the slow step): present a summarized answer with links to sources, with LLM judgement for an accurate answer; fall back to links to documents when summarization is not feasible.
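The gating logic reduces to a few lines. A sketch, where the 0.75 threshold mirrors the 75% confidence gate on the slide, and the lookup, search, and summarization calls are assumed stubs rather than Kore.ai APIs:

CONFIDENCE_THRESHOLD = 0.75

def lookup_cached(query: str) -> tuple[str | None, float]:
    raise NotImplementedError   # curated/cached answers plus a match score

def search_all_content(query: str) -> list[str]:
    raise NotImplementedError   # LLM-powered vector search + traditional index

def summarize_with_llm(query: str, passages: list[str]) -> str:
    raise NotImplementedError   # the slow step (OpenAI on the slide)

def answer(query: str) -> str:
    cached, score = lookup_cached(query)
    if cached is not None and score >= CONFIDENCE_THRESHOLD:
        return cached                               # present curated answer
    passages = search_all_content(query)            # fall back to full search
    return summarize_with_llm(query, passages)      # answer with source links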


CX GenAI RAG Use Case

Use Case
• Empower agents with intelligent search capabilities, aiding in the swift retrieval of pertinent answers, documents, and articles based on user input. Vertical: Travel and Hospitality (Airline)

Business Problem / Pain Points
• Multiple data sources (SharePoint), and articles sometimes carry conflicting information
• Long hold times cause customer dissatisfaction
• Risk of losing customers, revenue, etc.

Metrics
• Reduces AHT by 30 seconds on average (vs. a target of 15-20 second reduction); ranges from 10 seconds for calls to 85 seconds for chat
• Savings of 100K for every second of AHT reduction
• Faster onboarding of newer agents
• Pilot rolled out with ~50 agents; production anticipated for 1400+ agents
• Average prompt was 8000 tokens and average completion was 500 tokens

Challenges and Enhancements
• Cache implementation for answers to reduce response time significantly


Thank You
Visit Kore.ai for more information

US Office: 7380 West Sand Lake Road, Suite 390, Orlando, FL 32819
India Office: 12th Floor, E-Park, Plot No. 1, Hitech City Road, Kondapur, Hyderabad 500084
UK Office: Alpha House, 100 Borough High Street, London, UK SE1 1LB
Mail Us: [email protected]


Transforming User Experience with
RAG-Powered Human Voice Interfaces

Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi


AGENDA

1. Introduction to Quantiphi
2. Speech AI & Retrieval Augmented Generation
1. Case Study
2. Considerations & Challenges
3. Demo
3. Summary & Call-to-Action
4. Q&A
01
Introduction to Quantiphi



Quantiphi Factsheet

GEOGRAPHIC REACH: Boston, Chicago, Princeton, San Jose, Toronto, London, Singapore, Mumbai, Bangalore, Trivandrum, Netherlands

INDUSTRIES: BFSI, Telecom, Manufacturing, Public Sector, CPG/Retail, M&E & Gaming, Healthcare & Lifesciences, Energy

HIGHLIGHTS
• 70+ F500 clients
• >65% 3-year revenue CAGR
• 42% revenue share from F500+G500
• 40% revenue share from top 10 clients
• 2013 year of inception
• 250+ R&D team
• 3500+ professionals
• 11 patents filed
• 450+ DLI certifications

EXPERTISE: Generative AI, Digital Twins, Conversational AI & Digital Avatars, Document AI, Computer Vision, Marketing Analytics, Medical Imaging, Platform Modernization, Data Analytics, Infrastructure, Embedded Edge, Visualization

PARTNER DIFFERENTIATION: 3x Americas Service Delivery Partner; NVAIE Solution Provider; Preferred Partner, Embedded Edge; Preferred Partner, Visualization

STRATEGIC PARTNERSHIPS: Google Cloud Premier Partner; Oracle Service Accelerator; Premier Global Consulting Partner
Differentiated Mentions

PRESS RELEASES
• GTC 2023 Keynote Mention

ANALYST RECOGNITIONS
• Leader in the 2023 IDC MarketScape: Worldwide AI Services 2023 Vendor Assessment
• Leader in the 2023 ISG Analytics Services Report for both the US & Europe
• Frost & Sullivan's North America Competitive Strategy Leadership Award


02
Speech AI & Retrieval
Augmented Generation



In Case You Missed It!

Architecting Enterprise AI Success with RAGs and LLMs
Lessons from the first 12 months of building Generative AI solutions
Speaker: Siddharth Kothwal, Global Head - NVIDIA Practice, Quantiphi
Available on-demand in 24 hours


2.1
Case Study



Voice-Enabled Generative AI Enterprise Search System
A Speech AI and RAG-enabled enterprise semantic search system transforming maintenance operations for one of the leading F500 manufacturers, empowering maintenance technicians to seamlessly navigate vast data corpora, reducing downtime and improving operational efficiency.

PROBLEM STATEMENT
Suboptimal maintenance operations and low technician productivity due to the inefficient retrieval of relevant information from a vast corpus of repair manuals:
• Information overload
• High turnaround time (TAT)
• Limited scalability of the manual process
• Reduced productivity

QUANTIPHI'S SOLUTION
End-to-end semantic search pipeline with a real-time conversational agent to retrieve information and address technician queries, improving maintenance operations efficiency:
• Refined search with enhanced accuracy
• Faster information retrieval
• Unlocked scalability potential
• Improved operational efficiency
Speech AI & Retrieval Augmented Generation High-Level Process Flow
How does it all come together?

1. Speech Recognition: the user's audio input goes through Riva ASR to produce a text query, with intent identification and slot classification.
2. Dialogue Management: a dialogue manager (ACE Agent) behind the web application routes the question and returns the response.
3. Retrieval Augmented Generation by LLMs: documents from the data sources are indexed into a vector DB; the NeMo Retriever fetches relevant passages for the question, and a NeMo LLM generator produces the answer.
4. Text-to-Speech: the answer is synthesized with Riva TTS and played back as audio.
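One voice turn through this pipeline can be expressed as a single function. A sketch whose component names follow the slide (Riva ASR, retriever over a vector DB, NeMo LLM, Riva TTS) but whose call signatures are assumptions:

def riva_asr(audio: bytes) -> str:
    raise NotImplementedError   # speech recognition

def classify_intent(text: str) -> str:
    raise NotImplementedError   # intent identification / slot classification

def retrieve(query: str, k: int = 4) -> list[str]:
    raise NotImplementedError   # retriever over the indexed vector DB

def nemo_llm(question: str, context: list[str]) -> str:
    raise NotImplementedError   # retrieval-augmented answer generation

def riva_tts(text: str) -> bytes:
    raise NotImplementedError   # speech synthesis

def handle_turn(audio_in: bytes) -> bytes:
    query = riva_asr(audio_in)
    if classify_intent(query) == "question":          # dialogue-manager gate
        reply = nemo_llm(query, retrieve(query))
    else:
        reply = "Could you rephrase that as a maintenance question?"
    return riva_tts(reply)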
2.2
Speech AI + RAG:
Considerations & Challenges



Why is Implementing a Speech AI & RAG-powered System Complicated?

When everything works (happy path):
• User's spoken query: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• ASR text: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• Intent identification and slot classification route the query, and the retriever fetches precise information on electrical connections for the 2359-RFID module.
• LLM text response: "Check for any loose connections, ensure power supply stability and data cables are intact; address issues by replacing damaged components and updating firmware as needed."
• TTS plays the same response back to the technician.
Why is Implementing a Speech AI & RAG-powered System Complicated?

When ASR mishears the query:
• User's spoken query: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• ASR text: "Inspect 2359-rapid module for animals, focusing on electrical connections. Provide immediate solutions for any identified issues."
• The retriever fetches inaccurate information, misunderstanding the maintenance query as being about wildlife animal interactions.
• LLM text response: "In the realm of fauna, the notion of a 'rapid module' is non-existent, as sentient entities lack inherent electrical connections."
Automatic Speech Recognition (ASR) Examples

Example 1 (domain-specific language):
• Ground truth: "Determine the isentropic efficiency of the centrifugal compressor, accounting for polytropic efficiency, adiabatic efficiency, and impeller tip clearance."
• ASR model: "Determine the ice entropic efficiency of the centrifugal compressor, accounting for poly topic efficiency, audio attic efficiency, and impeller tip clearance."

Example 2 (proper nouns and punctuation):
• Ground truth: "How to integrate Industry 4.0 concepts with Siemens PLM software for streamlined data analytics in smart manufacturing."
• ASR model: "How to integrate industry four point zero concepts with Siemels PNM software for streamlined data analytics in smart manufacturing."

Example 3 (hastily spoken named entities):
• Ground truth: "Initiate the diagnostics for the HVAC system, inspecting the pneumatic actuators and replace the malfunctioning valve, now what's next?"
• ASR model: "Initiate the die Agnostic for the Hvic system, inspecting the new actuates, and replace the malfunctioning vow, now what's next?"

Effects on RAG: ASR inaccuracies can significantly disrupt the RAG system by retrieving improper information, thereby compromising the accuracy and integrity of the generated response.
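Word error rate (WER) is the standard way to quantify failures like these. A dependency-free implementation via word-level edit distance:

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("inspect the RFID module", "inspect the rapid module"))  # 0.25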


Deconstructing ASR: Why is it a Hard Problem?


Deconstructing RAG - ASR: Why is it a Hard Problem?
Key Considerations and Challenges: How does ASR output affect RAG output?

1. Acoustic Model (the way the speaker pronounces the words): transcription errors can introduce inaccuracies in the retrieved information, ultimately affecting the quality of generated responses.
2. Language Model (distinguishes similar-sounding words): struggles in distinguishing similar-sounding words and varied accents can influence semantic understanding of the input text.
3. Punctuation Model (grammatical construction of a sentence): fast-paced audio streams affect transcriptions, reducing input coherence and output clarity.
4. Named Entity Recognition (key names or elements in the audio): difficulties with domain-specific phrases misinterpret key audio details, leading to misinterpreted information being passed to RAG.


How to Solve the ASR Problem
Reducing Word Error Rate:

• Word Boosting: customizations to help the ASR engine recognize specific words of interest.
• Text Normalization: address tokens where the spoken form differs from the written form.
• Language Model Fine-Tuning: retraining with domain-specific text data to identify niche phrases.
• Voice Activity Detection: distinguishes parts of speech with an active speaker from those without.
• Acoustic Model Fine-Tuning: retraining with domain-specific audio data to catch different accents.
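Word boosting, for example, is a one-call customization in the nvidia-riva-client package: it biases the decoder toward domain terms so strings like "2359-RFID" survive transcription. The boost score here is just an illustrative tuning knob, and the helper is the one shipped with the Riva client examples, so verify it against your client version.

import riva.client

config = riva.client.RecognitionConfig(language_code="en-US")
riva.client.add_word_boosting_to_config(
    config,
    boosted_lm_words=["2359-RFID", "isentropic", "Siemens PLM", "HVAC"],
    boosted_lm_score=20.0,   # higher = stronger bias toward these words
)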


Text to Speech (TTS) Examples

Example 1 (alphanumerics & symbols):
• LLM output: "Initialize security protocol 5489-A to fortify data defenses, implementing a multi-layered firewall 🚧"
• TTS model: "Initialize security protocol five four eight nine dash A to forty five data defenses, implementing a multi-layered firewall"
• Ground truth: "Initialize security protocol fifty four eighty nine A to fortify data defenses, implementing a multi-layered firewall"

Example 2 (punctuation & notations):
• LLM output: "Maintenance log entry: 2024-03-13 08:00 - Machine #2 inspection completed"
• TTS model: "Maintenance log entry colon twenty twenty-four zero three thirteen zero eight zero zero dash Machine hashtag sign two inspection completed"
• Ground truth: "Maintenance log entry on the thirteenth of March twenty twenty-four at 8 am for Machine number 2 inspection completed"

Example 3 (regional accent variations):
• LLM output: "Look out for misconfigured network settings that may affect connectivity"
• TTS model: "Look out for miss kuhn-fig-yerd network settings that may affect connectivity"
• Ground truth: "Look out for misconfigured network settings that may affect connectivity"


Deconstructing RAG - TTS: Why is it a Hard Problem?
Key Considerations and Challenges: How does RAG output affect TTS?

1. Domain Adaptation (synthesized speech relevance in specialized domains): TTS fails to capture domain-specific vocabulary, terminology/jargon, or linguistic variations.
2. Coherence and Naturalness (cohesiveness in generated responses for proper TTS outputs): the disparity between written and spoken content poses challenges for the TTS model in producing speech that is both natural and coherent.
3. Latency and Real-Time Processing (timely and seamless conversion of TTS from RAG output): computationally intensive or time-consuming LLM response generation can result in delays in generating the TTS output.
4. Handling Special Characters (ensuring accurate encoding and representation of alphanumerics, emojis, and symbols): the lack of direct phonetic representations or standard pronunciation rules for alphanumerics and emojis/symbols poses challenges to accurately rendering speech.


How to Solve the TTS Problem
Reducing Speech Error Rate:

• Prompt Engineering: carefully designing and formulating prompts can significantly reduce the need for pre-processing before TTS.
• ARPABET and IPA Tuning: manually updating ARPABET and IPA files for domain-specific words significantly boosts performance.
• Prosody Modeling: incorporating prosody (intonation, rhythm, and stress patterns) into the TTS model can enhance the naturalness of generated speech.
• Model Fine-Tuning: fine-tuning the model on specific TTS-related data can help adapt it to the nuances of the TTS task.
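One of the cheapest wins above is normalizing the LLM's text before it reaches the TTS model: expanding alphanumerics and stripping symbols or emojis that have no spoken form. A dependency-free sketch (the digit map and replacement rules are illustrative, not a production normalizer):

import re

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize_for_tts(text: str) -> str:
    text = text.replace("#", " number ").replace("-", " dash ")
    text = re.sub(r"[^\w\s.,?!]", "", text)   # drop emojis and stray symbols
    # spell out digit runs: "5489" -> "five four eight nine"
    text = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", text)
    return " ".join(text.split())             # collapse extra whitespace

print(normalize_for_tts("Initialize security protocol 5489-A 🚧"))
# -> "Initialize security protocol five four eight nine dash A"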


2.3
Demo



03
Summary



Summary - Building and Deploying Speech AI & RAG Applications

• Fine-tuning the ASR model for domain-specific understanding
• Adapting document chunking for different types and modalities
• Fine-tuning the retriever and re-ranker models
• Optimizing document retrieval accuracy with sparse & dense methods
• Augmenting data and fine-tuning the TTS model on domain-specific data
• Establishing custom platform metrics for scalability, low latency, and high throughput


3x Americas Service Delivery Partner of the Year

Discover Innovation Firsthand! Join Us At:
• Booth #1513, AI CoE Pavilion
• Booth #G129, Generative AI Pavilion

Contact Us
• Ravi Teja Konkimalla, Senior Solution Architect, +1 267 206 4642
• Akshaya Save, Manager, Strategic Alliances, +91 98692 41489

Scan to accelerate your AI journey


Enhancing CX with GenAI-Based Virtual Assistant

Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE


Dan Lesovodski, VP of AI, Data Monsters
Enhancing Customer Experience
with GenAI-Based Virtual Assistant
Ka Wai Leung, HPE
Dan Lesovodski, Data Monsters
HPE Customer Innovation Centers and Demo Portal

As-a-service worldwide platform
• 100+ live | 400+ recorded demos on demand
• 34,897 live + recorded sessions in FY23
• HPE GreenLake demos
• HPE and Partner content
• Workload-based use cases

Locations: Silicon Valley CIC, Houston CIC (HQ), New York CIC, London CIC, Geneva CIC, HPE Digital Life Garage (Dubai), Singapore CIC

https://www.hpe.com/us/en/about/virtual-customer-innovation-center.html
HPE Use Cases and Needs

• Showcase HPE GenAI capabilities for demo center visitors.


• Provide “cool” interactive demos and not just PPTs and
chatbots.
• Speech and video are more engaging than typing long
questions to chatbots.
• Take customers through all demos virtually and answer
questions.
• Increase HPE staff efficiency and productivity.
• Provide detailed, accurate, and relevant answers for HPE.

Virtual Assistant (Avatar) Solution

The goal of this demo is to create a virtual avatar for HPE demo centers that can interact with customers and provide relevant information about HPE solutions and corporate information.

The Team
• HPE: data management, infrastructure, and platform architecture
• NVIDIA: NVIDIA® Avatar Cloud Engine (ACE), NVIDIA Riva, and NVIDIA GPUs
• Data Monsters: software architecture and system integrations
Data Monsters Elite AI Professional Service

• Years of experience in the data science and engineering market
• A team comprising 70+ engineers and 11 Ph.D. holders
• Projects including those for Fortune 500 companies
Broad Use Cases

Customer service
• Problem: limited employee knowledge; time-consuming; limited working hours
• Solution: avatars with corporate knowledge; single point of contact; 24/7 availability

Sales
• Problem: inactive sales reps; failure to follow processes; limited working hours; slow access to customer data
• Solution: 24/7, always-proactive avatars; following processes; immediate access to customer and corporate data

Customer experience
• Problem: difficult to design and scale; poor customer experience; low upsell/cross-sell; clients choosing competitors
• Solution: avatars with a centralized control panel; developing and scaling customer experience

PR
• Problem: difficulty attracting attention; weak company positioning
• Solution: new interactive technology attracts attention; positions a company as innovative

Trade shows
• Problem: difficulty attracting attention; sales reps get tired/walk away; limited knowledge of product
• Solution: avatars attract attention; access to corporate knowledge; 24/7 info at your booth

Target industries
• Financial services: including banks, insurance, and brokerage firms
• Telecommunications
• Retail
• Hospitality and food service
• Transportation: airports and train stations
• Real estate
• Business centers
• Healthcare: hospitals and clinics
• Marketing agencies
Key Software Modules

• HPE MLDM (ML Data Management, Pachyderm) used for private data logging and data audit, with data sets logged in the training loop
• HPE MLDE (ML Development Environment, Determined AI) for distributed learning (future phase); NVIDIA NeMo™ can be used for AI model training
• NVIDIA Riva ASR/TTS used for speech processing
• NVIDIA ACE used for digital avatar creation
• NVIDIA Omniverse™ Audio2Text used for the avatar visualization process (picture rendering and lip-synching)
• LangChain used for LLM + NVIDIA orchestration over HPE private docs
• Qdrant used for the vector database
Solution Architecture

• Data extraction layer: HPE MLDM private docs repository → data processing → embeddings → Qdrant vector DB.
• Dialog management: NVIDIA Riva ASR/STT, NVIDIA ACE, and NVIDIA Omniverse handle the conversational front end, alongside Botmaker, mobile remote control, and CV personalization.
• LLM module: LangChain orchestrates a LLAMA2 model served on NVIDIA Triton™, behind guardrails and role-based access; a scenario check ("Is it a scenario?") routes between scripted event content updates and the RAG LLM.
• Operations: LLM tests and monitoring (W&B for monitoring, PromptBench for tests), custom voice, enterprise lexicons tuning, and automated deployment scripts.
• Runs on HPE servers, storage, and networking.
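The Qdrant piece of the diagram is small in code. A sketch with the qdrant-client package in embedded (":memory:") mode; the collection name, vector size, and payload are illustrative, and a real deployment would point at the Qdrant server instead.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")   # swap for the server URL in production
client.create_collection(
    collection_name="hpe_private_docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="hpe_private_docs",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                        payload={"text": "HPE GreenLake demo notes"})],
)
hits = client.search(collection_name="hpe_private_docs",
                     query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)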
On-Premises Deployment: 5x Faster

Hardware setup: SaaS-based commercial LLM | Public cloud (T4 and V100 GPUs) | HPE on-premises (A10 and A100 servers)
Response time: 10-second delay | 5-second delay | 1-second delay

On-premises configuration:
• NVIDIA ACE & NVIDIA Riva server: 1x HPE ProLiant DL380 server with 3x NVIDIA A10 GPUs
• LLM server: 1x HPE ProLiant DL380 server with 2x NVIDIA A100 GPUs

Lower deployment cost vs. public cloud with dedicated CPU+GPU instances
"One-Stop Shop" for NVIDIA AI Workloads with HPE

Build: develops and deploys your AI solution end to end; develop, train, and operationalize the AI models for customer use cases.

Optimize: operationalizes existing deployments/platforms; reviews AI models and identifies improvements needed for production readiness; curates design and implementation for vertical use cases.

Integrate: integrates analytics with existing applications; architects the solution with desired partners; integrates solutions with your partners; applies reference architectures and AI frameworks to provide solution integrity.

HPE (services, software, infrastructure, HPE GreenLake cloud)
Thank you
Ka Wai Leung: [email protected]
Dan Lesovodski: [email protected]

© 2024 Hewlett Packard Enterprise Development LP


THANK YOU!
