GTC'24 Special Event - Build A RAG-powered Application With A Human Voice Interface (SE62869) - Deck - FINAL - 1714408879420001sjpp
Real-World Implementations:
10:00 AM – 10:30 AM  Powering CX with AI (Kore.ai)
10:30 AM – 11:00 AM  Transforming UX with RAG-Powered Human Voice Interfaces (Quantiphi)
11:00 AM – 11:30 AM  Enhancing CX with GenAI-Based Virtual Assistant (HPE & Data Monsters)
11:30 AM – 12:00 PM  Q&A Panel

Hands-On: Building Voice & RAG-Powered Applications (10:30 AM – 11:30 AM)
[Word cloud of RAG pain points: data sources, specialized terminology, prompt engineering, outdated knowledge, hallucination, evaluation, difficult to tune]
We Can Rebuild
We have the technology
The Depths of the Tech Debt…
API Sprawl
One Framework to Bring Them All …
… and in the darkness bind them.
🤔
Give the model a clear and concise answer to the question?
😕
Can’t an LLM Do It? Well… Yeah!
Multi-query prompting
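As a minimal sketch of multi-query prompting: generate several rephrasings of the user question, retrieve for each variant, and merge the de-duplicated results. The `rephrase_fn` stands in for an LLM call, and the toy corpus and function names are illustrative assumptions.

```python
# Sketch of multi-query prompting: rephrase the user's question several
# ways, retrieve for each variant, and merge the results across variants.

def multi_query_retrieve(question, rephrase_fn, retrieve_fn, n_variants=3):
    queries = [question] + rephrase_fn(question, n_variants)
    seen, merged = set(), []
    for q in queries:
        for doc_id, text in retrieve_fn(q):
            if doc_id not in seen:          # de-duplicate across variants
                seen.add(doc_id)
                merged.append((doc_id, text))
    return merged

# Toy stand-ins for demonstration; swap in a real LLM and retriever.
def rephrase_fn(question, n):
    return [f"{question} (variant {i})" for i in range(n)]

def retrieve_fn(query):
    corpus = {1: "GPU specs", 2: "pricing", 3: "benchmarks"}
    k = (len(query) % 3) + 1                # pretend variants overlap partially
    return [(i, corpus[i]) for i in list(corpus)[:k]]

results = multi_query_retrieve("What GPUs power RAG?", rephrase_fn, retrieve_fn)
```

The merge step matters: different phrasings surface different chunks, and the union typically improves recall over a single query.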
Can’t an LLM do it? Well… Yeah!
Hypothetical Document Embedding (HyDE)
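A toy sketch of HyDE: instead of embedding the raw question, an LLM writes a hypothetical answer document, and retrieval runs on that document's embedding. The bag-of-words embedding and `fake_llm` below are stand-ins for real models; names are assumptions, not a specific library's API.

```python
# Sketch of Hypothetical Document Embedding (HyDE): embed an LLM-written
# pseudo-answer rather than the question, then retrieve nearby real chunks.

import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())    # toy bag-of-words "embedding"

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(question, generate_fn, corpus, top_k=2):
    hypothetical = generate_fn(question)    # LLM-written hypothetical answer
    qv = embed(hypothetical)
    return sorted(corpus, key=lambda doc: cosine(qv, embed(doc)), reverse=True)[:top_k]

corpus = [
    "The H100 GPU accelerates transformer training with FP8.",
    "Our cafeteria serves lunch from noon to two.",
    "Vector databases store embeddings for similarity search.",
]
fake_llm = lambda q: "The H100 GPU accelerates training of transformer models."
top = hyde_retrieve("Which GPU speeds up transformers?", fake_llm, corpus)
```

The hypothetical answer shares vocabulary with the relevant document even when the question does not, which is the core intuition behind HyDE.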
Can’t an LLM Do It?
Engineering Design Spectrum (Creativity ↔ Consistency)
• Brainstorming (creativity end): Wrong answers are welcomed. Large logical leaps required. High risk tolerance.
• Philosopher: Answers will require some assumptions to be made and tested. Reasonable minds may disagree. Moderate risk tolerance.
• Consistency end: Answers must be correct.
How to Get Started with RAG
www.nvidia.com/generative-ai-chatbots
Talks:
• Beyond RAG Basics: Building Agents, Co-Pilots, Assistants, and More!, NVIDIA
• Practical Strategies for Building Enterprise Applications Powered by LLMs, NVIDIA
• Financial Knowledge Graphs for Retrieval Augmented Generation, BlackRock
• New Offerings from NVIDIA to Overcome the Complexities of Generative AI, NVIDIA
• A Guide to Building Safe Generative AI Copilots that Improve Productivity and Protect
Company Data, NVIDIA
• The Future of AI Chatbots: How Retrieval-Augmented Generation is Changing the Game,
BlackRock, NVIDIA, ServiceNow
• Accelerating Enterprise: Tools and Techniques for Next Generation AI Deployment, NVIDIA
• Perform High-Efficiency Search, Improve Data Freshness, and Increase Recall with GPU-
Accelerated Vector Search and RAG Workflows, Zilliz, NVIDIA
• Re-Imagine Service Assurance Chatbots With LLMs and RAG, Tata Consultancy Services
(TCS)
[RAG retrieval diagram: Text Query → Text Embedding Model → Query Vector → similarity search over Knowledge Base (Images and Text) → Similar Chunks → Large Language Model → Answer]
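The retrieval step in this diagram can be sketched in a few lines. The random vectors below stand in for a real embedding model's output; the chunk texts are made up.

```python
# Minimal sketch of retrieval: compare a query vector against precomputed
# chunk vectors and return the most similar chunks.

import numpy as np

rng = np.random.default_rng(0)
chunk_texts = ["chunk about GPUs", "chunk about pricing", "chunk about RAG"]
chunk_vecs = rng.normal(size=(3, 8))
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def top_k_chunks(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    sims = chunk_vecs @ q                      # cosine similarity per chunk
    order = np.argsort(sims)[::-1][:k]
    return [(chunk_texts[i], float(sims[i])) for i in order]

hits = top_k_chunks(rng.normal(size=8))
```

Because the stored vectors are pre-normalized, a single matrix-vector product gives cosine similarity for every chunk at once.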
Why is Multimodality Hard?
Multimodal RAG - Approaches for Multimodal Retrieval
1. Transform all modalities into a single vector space: images and text documents go through one embedding model into a multimodal VectorStore.
2. Ground all modalities into one modality: transform images (and other modalities) into text before embedding.
Multimodal RAG - Preprocessing Workflow
[Diagram: Webpages → extract texts → clean (LLM data augmentation) → text splitter → chunks & metadata. PDFs → extract text (includes tables) and figures → custom logic → chart/plot summary as structured JSON (stored as a chunk). Non-chart images → no image description. All chunks → embedding model → vector store.]
[Diagram detail: for each extracted image, ask "Is this a Chart/Plot?"; if yes, generate a chart description that is stored as the image description plus metadata. Example generated description: "This image is a bar chart comparing the relative performance of different NVIDIA GPU accelerators across various machine learning models and tasks. The chart has six categories represented by the machine learning models or tasks, which are RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, and BERT."]
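The preprocessing workflow above can be sketched as a text splitter that attaches metadata to each chunk, plus a chart/plot summary stored as a structured-JSON chunk. Chunk sizes, metadata fields, and the summary text are illustrative assumptions, not the deck's actual implementation.

```python
# Sketch of preprocessing: split extracted text into overlapping chunks
# with metadata, and store a chart summary as its own JSON chunk.

import json

def split_text(text, size=200, overlap=50):
    chunks, start = [], 0
    while start < len(text):
        chunks.append({"text": text[start:start + size],
                       "meta": {"source": "webpage", "offset": start}})
        start += size - overlap
    return chunks

def chart_chunk(summary, figure_id):
    payload = {"type": "chart_summary", "figure_id": figure_id, "summary": summary}
    return {"text": json.dumps(payload), "meta": {"source": "pdf_figure"}}

doc = "NVIDIA submitted results on 432 H100 GPUs... " * 20
chunks = split_text(doc)
chunks.append(chart_chunk("Bar chart comparing GPU performance across models", "fig-1"))
```

Storing the chart summary as just another chunk is what lets plain vector search retrieve figures alongside text.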
Multimodal RAG - Inference Workflow
[Diagram: retrieved image → VQA via MLLM → VQA answer; retrieved plain-text chunks passed to the LLM alongside.]

Example VQA answer (chart image): "This is a comparison of the relative performance of various accelerators from NVIDIA (A100 v2.1, H100 v2.1 preview, H100 v3.0 available) using different models (RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, BERT). Higher values indicate better performance. The A100 has a baseline score of 1.0 for all models. The H100 v2.1 preview and v3.0 available perform 1.7-2.2x and 1.8-3.08x better than the A100, respectively, depending on the model used."

Example plain-text chunk: "Breaking MLPerf Training Records with NVIDIA H100 GPUs. 3D U-Net. NVIDIA submitted results on 432 NVIDIA H100 Tensor Core GPUs, achieving a new record for the benchmark of 0.82 minutes (49 seconds) to train. Per-accelerator performance on H100 also improved by 8.2% compared to the prior round. To achieve excellent performance at scale, a faster GroupBatchNorm kernel was one key optimization. In our largest scale 3D U-Net submission, the instance normalization operation in the neural network needs to perform a reduction of the tensor mean and variance across four GPUs. By using a faster GroupBatchNorm kernel to implement instance normalization, we delivered a 1.5% performance increase."
Example image description: "The image depicts a network with three servers, each with a single node. Each server has a single disk drive, a single network card, and a single CPU. The servers are connected to a central server, which is connected to the network via a cable. The cable connects the central server to the servers, and the cable connects each server to its respective network card. The network card has a storage capacity of 1 GB, and each server has an additional 1 GB of storage. The central server has two network interfaces, one for the server and the other for the network card and storage."
Multimodal RAG - Inference Workflow
[Diagram: Slack user interface → user query → embedding model → retriever → VectorStore → retrieved chunks (text and chart/plot descriptions) → LLM → final response. Images are answered via VQA with the MLLM.]
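The final LLM call can be grounded by assembling the retrieved text chunks and chart descriptions into one prompt. The `build_prompt` helper and its inputs below are hypothetical, shown only to make the assembly step concrete.

```python
# Sketch of the inference step: combine the user query with retrieved
# text chunks and chart descriptions into a single grounded prompt.

def build_prompt(query, text_chunks, chart_descriptions):
    context = "\n\n".join(
        ["Context:"] + text_chunks +
        [f"[Chart] {d}" for d in chart_descriptions]
    )
    return f"{context}\n\nQuestion: {query}\nAnswer using only the context above."

prompt = build_prompt(
    "How much faster is H100 than A100?",
    ["H100 v3.0 performs 1.8-3.08x better than A100 depending on the model."],
    ["Bar chart of relative GPU performance across six ML benchmarks."],
)
```

Tagging chart-derived context (here with a "[Chart]" prefix) lets the model attribute its answer to the right modality.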
Multimodal RAG - Demo
The Basics of Speech AI
[Pipeline: audio → ASR Model → "buenos días" → Translation Model → "good morning" → TTS Model → audio]
How Do Computers Perceive Speech?
[Figure: audio waveform signals and their (mel) spectrograms for a pure tone, a sung note, and speech]
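A toy version of the waveform-to-spectrogram step: frame the signal, window each frame, and take the FFT magnitude per frame. Real speech front-ends add a mel filterbank and log scaling on top of this; the parameters here are illustrative.

```python
# Toy spectrogram: a pure tone concentrates energy in one frequency bin.

import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    frames = []
    window = np.hanning(frame_len)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # magnitude spectrum
    return np.array(frames)                          # (time, freq) matrix

t = np.linspace(0, 1, 8000, endpoint=False)          # 1 s at 8 kHz
pure_tone = np.sin(2 * np.pi * 440 * t)              # 440 Hz sine
spec = spectrogram(pure_tone)
peak_bin = int(spec.mean(axis=0).argmax())           # ~440 Hz / 31.25 Hz per bin
```

Speech and singing spread energy across many bins over time, which is why the spectrogram is a far richer input for neural networks than the raw waveform.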
Speech → Neural Network → Representation Vectors
These representations capture attributes such as emotion, language, and speech rate.
Example utterance: "The food here isn't that bad."
The Inefficiency of Adding New Speakers in TTS
[Conventional TTS: training requires paired data for each speaker, e.g. the speech signal of Alice or Bob plus its transcription ("cat").]

P-Flow: TTS with 3 Seconds of Reference Audio
[P-Flow conditions the TTS model on a speech signal, its transcription ("cat"), and a short reference speech signal, so a new voice can be cloned from seconds of audio.]
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting, Sungwon Kim et al., NeurIPS 2023
A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing, He Bai et al. ICML 2022
What is Neural Machine Translation?
Historically, "NMT" meant text-to-text translation: "buenos días" → text translation model → "good morning".
Recently: development of end-to-end speech-to-text translation (AST) models: Spanish speech → automatic speech translation model → "good morning".
Experience Riva APIs:
• See Riva in action through the NVIDIA API Catalog*
• Enroll to prototype, test & deploy your own app on NVIDIA LaunchPad
• Contact NVIDIA AI Enterprise
Webpage: nvidia.com/en-us/ai-data-science/products/riva/
Documentation: docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html
GitHub: github.com/nvidia-riva
Contact Us: nvidia.com/en-us/data-center/products/ai-enterprise/contact-sales/
Speech & Translation AI Talks, Special Event & More at GTC 2024
Join to learn the latest speech & translation AI achievements & how to use them with GenAI-based conversational applications
PRESENTED BY
Ruchi Gupta
The following information is being shared in order to outline some of our current product plans, but like everything else in life, even the best-laid plans get put to rest.
We are hopeful that the following can shed some light on our roadmap, but it's important to understand that it is being shared for informational purposes only and is not a binding commitment.
The development, release, and timing of any products, features, or functionality remain at the sole discretion of Kore and are subject to change.
• About Kore.ai
• CX Speech AI Use Case
• CX GenAI RAG Use Case
Key Highlights
© 2023 Kore.ai. All Rights Reserved. Source: Gartner (Nov-2021). Note: Fiscal year end is 3/31. Financial metrics as of FY ending March 2023. Rule of 100+ calculated as 2024 ARR growth + 2024 EBITDA margin.
Kore.ai Named Leader, Again!
2023 Conversational AI Gartner Magic Quadrant
Platform Services
Life cycle management (Versioning, Publish, Collaboration), Multi-lingual Support, Pre-built Integrations, Generative AI, Campaign Management, Omni-channel
Enterprise services
Role-based Access Controls, Maker checker process, Audit Logs
Security, Compliance (SOC-2, PCI, FedRAMP, HIPAA, GDPR) Cloud and On-Premise Deployments, Scalability, Integrations, Authentication, and
Authorization
Traditional Contact Centers vs. AI-Native Contact Centers

Traditional | AI-Native
Agents handle all inquiries | Uses AI to handle & direct all inquiries
Agents work in shifts | 24/7 availability
Inconsistent experiences from one agent to another (depending on experience) | Consistent experiences across all channels (Voice, SMS, Chat, Social)
"Robotic" interactions | "Human-like" interactions using LLMs, NLP, and machine learning (ML)
Metrics
● # of interactions served (trial phase: a few hundred per month for after-hours interactions)
● Savings for the customer: no need to staff voice agents after hours
● Response processing time from agent send to playback to the customer
STT: automated speech recognition engines to best suit your use cases
TTS: voice preferences to personalize the ASR engine and the voice that plays for your text-to-speech conversions
Configuration
• Navigate to the Contact Centre from the left menu
within the desired App
• Select the Languages and Speech tab within the
configurations section in the menu
• Click on Voice Preferences and click the drop-down
menu under Automated Speech
10X Faster:
• Generate training utterances
• Suggest contextual responses
• Summarize conversations
Prompt Engineering | Guardrails | Monitoring

Answers Lookup → Present Answer
● Curated/cached summarized answer with confidence
● Links to source documents
● LLM (OpenAI) used when summarization is needed
Metrics
● Reduces AHT by 30 sec on average (vs. a target of 15-20 sec reduction); ranges from 10 sec for calls to 85 sec for chats
● Savings of 100k for every second of reduction in AHT
● Faster onboarding of newer agents
● Pilot rolled out with ~50 agents; production anticipated for 1,400+ agents
● Average prompt was 8,000 tokens and average completion was 500 tokens
1. Introduction to Quantiphi
2. Speech AI & Retrieval Augmented Generation
1. Case Study
2. Considerations & Challenges
3. Demo
3. Summary & Call-to-Action
4. Q&A
01
Introduction to Quantiphi
Offices: Chicago, Mumbai, Princeton, Bangalore
Industries: BFSI, Telecom, Manufacturing, Public Sector
3500+ Professionals | 450+ DLI Certifications | 11 Patents filed | 40% Rev share of Top 10 clients
Partnerships: Google Cloud Premier Partner; Oracle Premier Global Consulting Partner; Service Accelerator
© 2024 Confidential & Proprietary
Differentiated Mentions
PRESS RELEASES
ANALYST RECOGNITIONS
Problem: Suboptimal maintenance operations and low technician productivity due to the inefficient retrieval of relevant information from a vast corpus of repair manuals.
Solution: End-to-end Semantic Search pipeline with a real-time Conversational Agent to retrieve information and address technician queries, improving maintenance operations efficiency.
ALL INFORMATION IN THIS PRESENTATION IS CONFIDENTIAL. PLEASE DO NOT SHARE IT WITHOUT PRIOR CONSENT FROM QUANTIPHI.
© 2024 Confidential & Proprietary
Speech AI & Retrieval Augmented Generation: High-Level Process Flow
How does it all come together?
[Diagram: the user speaks to a web application. Audio input → Riva ASR → intent classification & slot identification → dialogue manager (NeMo ACE Agent) → question → retriever → Vector DB (built by document indexing of the data sources) → answer → NeMo LLM → response generator → Riva TTS → audio output back to the user.]
Example:
Speech recognition output: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
Intent classification & slot identification: intent = retrieve precise information; slots = 2359-RFID module, electrical connections, anomalies.
The structured query then flows into retrieval-augmented generation by LLMs.
TTS
Text response: "In the realm of fauna, the notion of a 'rapid module' is non-existent, as sentient entities lack inherent electrical connections."
Automatic Speech Recognition (ASR) Examples

Ground Truth: "Determine the isentropic efficiency of the centrifugal compressor, accounting for polytropic efficiency, adiabatic efficiency, and impeller tip clearance"
ASR Model: "Determine the ice entropic efficiency of the centrifugal compressor, accounting for poly topic efficiency, audio attic efficiency, and impeller tip clearance"

Ground Truth: "How to integrate Industry 4.0 concepts with Siemens PLM software for streamlined data analytics in smart manufacturing"
ASR Model: "How to integrate industry four point zero concepts with Siemels PNM software for streamlined data analytics in smart manufacturing"

Ground Truth: "Initiate the diagnostics for the HVAC system, inspecting the pneumatic actuators and replace the malfunctioning valve, now what's next?"
ASR Model: "Initiate the die Agnostic for the Hvic system, inspecting the new actuates, and replace the malfunctioning vow, now what's next?"
Effects on RAG: ASR inaccuracies can significantly disrupt the RAG system by retrieving improper information, thereby compromising the accuracy and integrity of the generated response.
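The severity of such errors is usually quantified as word error rate (WER): the word-level edit distance between the reference and the ASR hypothesis, divided by the reference length. A minimal implementation:

```python
# Word error rate via dynamic-programming edit distance over words.

def wer(reference, hypothesis):
    r, h = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(r)][len(h)] / len(r)

# Two substitutions out of four reference words -> WER 0.5.
score = wer("inspect the pneumatic actuators", "inspect the new actuates")
```

Even a modest WER can be fatal for RAG when the errors land on the domain terms ("pneumatic", "HVAC") that retrieval keys on.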
1. Acoustic Model (how the speaker pronounces the words): Transcription errors can introduce inaccuracies in the retrieved information, ultimately affecting the quality of generated responses.
2. Language Model (distinguishes similar-sounding words): Struggles in distinguishing similar-sounding words and varied accents can influence the semantic understanding of the input text.
3. Punctuation Model (grammatical construction of a sentence): Fast-paced audio streams affect transcriptions, reducing input coherence and output clarity.
4. Named Entity Recognition (key name or element from the audio): Difficulties with domain-specific phrases misinterpret key audio details, leading to misinterpreted information being passed to RAG.
Word Boosting
Customizations to help the ASR engine recognize specific words
of interest
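Conceptually, word boosting biases the decoder toward terms of interest. The toy rescorer below adds a bonus for boosted words when choosing among candidate transcripts; the scores, boost value, and candidates are made up for illustration (production systems such as Riva apply boosting inside the decoder rather than as a post-hoc rescore).

```python
# Toy word boosting: prefer n-best hypotheses that contain domain terms.

def rescore(nbest, boosted_words, boost=2.0):
    def score(item):
        text, acoustic_score = item
        bonus = sum(boost for w in text.lower().split() if w in boosted_words)
        return acoustic_score + bonus
    return max(nbest, key=score)

nbest = [("initiate the die agnostic for the hvic system", -4.0),
         ("initiate the diagnostics for the hvac system", -4.5)]
best, _ = rescore(nbest, {"diagnostics", "hvac"})
```

Without the boost, the acoustically favored (but wrong) first hypothesis would win; with it, the domain-correct transcript is selected.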
Text Normalization
Address tokens where the spoken form differs from the written form
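A minimal sketch of such normalization, expanding digits and a few symbols into their spoken forms. The token tables are illustrative; real normalizers (e.g. NeMo's) handle dates, currencies, abbreviations, and much more.

```python
# Toy TTS text normalization: written form -> spoken form.

import re

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
SYMBOLS = {"#": " hashtag ", "-": " dash ", ":": " colon "}

def normalize(text):
    for sym, spoken in SYMBOLS.items():
        text = text.replace(sym, spoken)
    text = re.sub(r"\d", lambda m: " " + DIGITS[m.group()] + " ", text)
    return " ".join(text.split())              # collapse extra whitespace

spoken = normalize("log entry: 2024-03-13 #2")
```

This mirrors the maintenance-log example on the next slide: the synthesizer receives "colon", "dash", and spelled-out digits instead of raw punctuation.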
TTS Model examples:
• Produce speech for alphanumerics & symbols: "Initialize security protocol five four eight nine dash A to forty five data defenses, implementing a multi-layered firewall"
• Produce speech with punctuation & notations: "Maintenance log entry colon twenty twenty-four zero three thirteen zero eight zero zero dash Machine hashtag sign two inspection completed"
• Produce speech with regional accent variations: "Look out for miss kuhn-fig-yerd network settings that may affect connectivity"
Domain Adaptation
1. Synthesized speech relevance in specialized domains: TTS fails to capture domain-specific vocabulary, terminology/jargon, or linguistic variations.
4. Handling special characters: Ensuring accurate encoding and representation of alphanumerics, emojis, and symbols. The lack of direct phonetic representations or standard pronunciation rules for alphanumerics and emojis/symbols poses challenges to accurately render speech.
Prompt Engineering
Carefully designing and formulating prompts can significantly
reduce the need for pre-processing before TTS.
Model Fine-tuning
Fine-tuning the language model on specific TTS-related data
can help adapt the model to the nuances of the TTS task
Contact Us
Customer Innovation Centers: Silicon Valley, New York, London, Geneva, Singapore
34,897 live + recorded sessions in FY23
https://www.hpe.com/us/en/about/virtual-customer-innovation-center.html
HPE Use Cases and Needs
Virtual Assistant (Avatar) Solution
The Team
The goal of this demo is to create a virtual avatar for HPE demo
centers that can interact with customers and provide relevant
information about HPE solutions and corporate information.
Software architecture and system integrations
Data Monsters Elite AI Professional Service
Broad Use Cases

Customer service. Problem: limited employee knowledge; time-consuming; limited working hours. Solution: avatars with corporate knowledge; a single point of contact; 24/7 availability.
Sales. Problem: inactive sales reps; failure to follow processes; limited working hours; slow access to customer data. Solution: 24/7, always-proactive avatars; following processes; immediate access to customer and corporate data.
Customer experience. Problem: difficult to design and scale; poor customer experience; low upsell/cross-sell; clients choosing competitors. Solution: avatars with a centralized control panel; developing and scaling customer experience.
PR. Problem: difficulty attracting attention; weak company positioning. Solution: new interactive technology attracts attention; positions a company as innovative.
Trade shows. Problem: difficulty attracting attention; sales reps get tired/walk away; limited knowledge of product. Solution: avatars attract attention; access to corporate knowledge; 24/7 info at your booth.
Target industries
• Financial services: Including banks, insurance, and brokerage firms
• Telecommunications
• Retail
• Hospitality and food service
• Transportation: Airports and train stations
• Real estate
• Business centers
• Healthcare: Hospitals and clinics
• Marketing agencies
Key Software Modules
Solution Architecture Data extraction layer
On-Premises Deployment: 5x Faster
“One-Stop Shop” for NVIDIA AI Workloads with HPE
HPE
(Services, software, infrastructure, HPE GreenLake cloud)
Thank you
Ka Wai Leung: [email protected]
Dan Lesovodski: [email protected]