
Build a RAG-Powered Application

with a Human Voice Interface


GTC Spring 2024 – SE62869 – Thursday, March 21, 2024 @ 8:00 AM – 12:00 PM PDT

Ruchi Gupta, VP Product Management, Kore.ai


Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi
Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE
Dan Lesovodski, VP of AI, Data Monsters
Annie Surla, Developer Advocate Engineer, NVIDIA
Sven Chilton, Deep Learning Developer Advocate, NVIDIA
Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA
Agenda

8:00 AM – 9:30 AM
Hands-On: Building Voice & RAG-Powered Applications (NVIDIA)
• Retrieval Augmented Generation
• Thinking about Multimodality
• The Basics of Speech AI

10:00 AM – 11:30 AM
Real-World Implementations:
• 10:00 AM – 10:30 AM: Powering CX with AI (Kore.ai)
• 10:30 AM – 11:00 AM: Transforming UX with RAG-Powered Human Voice Interfaces (Quantiphi)
• 11:00 AM – 11:30 AM: Enhancing CX with GenAI-Based Virtual Assistant (HPE & Data Monsters)

11:30 AM – 12:00 PM
Q&A Panel
Hands-On:
Building Voice & RAG Powered Applications

Annie Surla, Developer Advocate Engineer, NVIDIA


Sven Chilton, DL Developer Advocate, NVIDIA
Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA
Retrieval Augmented Generation

Ryan Kraus, Sr. Technical Marketing Engineer, NVIDIA


Why RAG, Why Now, and Why It Is So Exciting
RAG is finally starting to deliver on the promise of AI, unlocking the value of ALL enterprise data
Long, Long Ago…
…. November 30, 2022
… But Wait!

The problems with LLMs alone:
• Unknown data sources
• Special terminology
• Prompt engineering
• Outdated knowledge
• Hallucinations
• Difficult to tune
We Can Rebuild
We have the technology
The Depths of the Tech Debt…

API Sprawl
One Framework to Bring Them All …
… and in the darkness bind them.

All we have to do is:

Give the model a clear and concise answer in the question! 🤔

Give the model a clear and concise answer in the question? 😕
Can’t an LLM Do It? Well… Yeah!
Multi-query prompting
Can’t an LLM do it? Well… Yeah!
Hypothetical Document Embedding (HyDE)
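Both ideas are easy to sketch. Below is a minimal Python illustration; generate(prompt) is a hypothetical stand-in for whatever chat-completion endpoint you use, and retrieval with the rewritten queries or the hypothetical document happens downstream in your vector store.

def generate(prompt: str) -> str:
    """Hypothetical LLM call; swap in your chat-completion client."""
    raise NotImplementedError

def multi_query(question: str, n: int = 3) -> list[str]:
    # Multi-query prompting: rephrase the question several ways, then
    # retrieve with every variant and merge the results.
    prompt = (f"Rewrite the following question {n} different ways, "
              f"one per line, preserving its meaning:\n{question}")
    return [q.strip() for q in generate(prompt).splitlines() if q.strip()]

def hyde(question: str) -> str:
    # HyDE: embed a hypothetical *answer* instead of the question, since
    # answers tend to sit closer to relevant passages in embedding space.
    prompt = f"Write a short passage that plausibly answers:\n{question}"
    return generate(prompt)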
Can’t an LLM Do It?
Engineering Design Spectrum
Two axes: Creativity vs. Consistency, and High Accuracy vs. Low Latency.

• Creativity + High Accuracy: Philosopher. Answers must be correct. Answers will require some assumptions to be made and tested. Reasonable minds may disagree. Moderate risk tolerance.
• Creativity + Low Latency: Brainstorming. Wrong answers are welcomed. Large logical leaps required. High risk tolerance.
• Consistency + High Accuracy: Subject Matter Experts. Answers must be correct. Answers are needed soon, but not now. Low risk tolerance.
• Consistency + Low Latency: Assistants and Co-Pilots. Wrong answers are tolerated. Repeatable and reliable answers are required. Moderate risk tolerance.
How to Get Started with RAG
www.nvidia.com/generative-ai-chatbots

• Evaluate Models in NVIDIA API Catalog: ai.nvidia.com
• Try NVIDIA Developer RAG Examples in GitHub: nvidia.github.io/GenerativeAIExamples
• Apply for a free NVIDIA LaunchPad Trial: www.nvidia.com/launchpad
RAG Talks, Special Events & More at GTC 2024
Learn how Retrieval Augmented Generation (RAG) is Transforming Generative AI

Talks:
• Beyond RAG Basics: Building Agents, Co-Pilots, Assistants, and More!, NVIDIA
• Practical Strategies for Building Enterprise Applications Powered by LLMs, NVIDIA
• Financial Knowledge Graphs for Retrieval Augmented Generation, BlackRock
• New Offerings from NVIDIA to Overcome the Complexities of Generative AI, NVIDIA
• A Guide to Building Safe Generative AI Copilots that Improve Productivity and Protect
Company Data, NVIDIA
• The Future of AI Chatbots: How Retrieval-Augmented Generation is Changing the Game,
Blackrock, NVIDIA, ServiceNow
• Accelerating Enterprise: Tools and Techniques for Next Generation AI Deployment, NVIDIA
• Perform High-Efficiency Search, Improve Data Freshness, and Increase Recall with GPU-
Accelerated Vector Search and RAG Workflows, Zilliz, NVIDIA
• Re-Imagine Service Assurance Chatbots With LLMs and RAG, Tata Consultancy Services
(TCS)

Special Events (SE):


• Best Practices for Building LLM RAG Using NVIDIA AI [SE]
• Build a RAG-Powered Application With a Human Voice Interface [SE], NVIDIA, Quantiphi,
Kore.ai, HPE, Data Monsters
March 18–21 | www.nvidia.com/gtc | #GTC24
Connect with Experts (CWE):
• Building Generative AI Applications With Retrieval Augmented Generation [CWE]

Deep Learning Institute Workshops (DLI):


• Large-Scale Production Deployment of RAG Pipelines [DLI]
• Building a GPU-Accelerated Retrieval Augmented Generation (RAG) Pipeline [DLI]
• Streamlining Enterprise Data Operations with Multimodal RAG and LangChain [DLI]
Thinking About Multimodality

Annie Surla, Developer Advocate Engineer, NVIDIA


Multimodal Data in the Wild!
Multimodal RAG - What Can it Do?

Example question: "What deep learning models were used to benchmark the relative performance of H100?"

Flow: Text Query → Text Embedding Model → Query Vector → Knowledge Base (Images and Text) → Similar Chunks → Large Language Model → Answer
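The flow above fits in a few lines. A minimal sketch with the sentence-transformers package; the model name, the toy knowledge base, and the final LLM step are illustrative assumptions, not details from the deck.

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

knowledge_base = [
    "MLPerf Inference v3.0 benchmarked RNN-T, 3D U-Net, Mask R-CNN, "
    "ResNet-50 v1.5, RetinaNet, and BERT.",
    "NVIDIA L4 is based on the Ada Lovelace architecture.",
]
kb_vectors = embedder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = kb_vectors @ q                 # cosine similarity (unit-norm vectors)
    top = np.argsort(scores)[::-1][:k]      # most similar chunks first
    return [knowledge_base[i] for i in top]

context = retrieve("What models were used to benchmark H100?")
# feed `context` plus the question to the LLM of your choice for the answer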
Why is Multimodality Hard?
Multimodal RAG - Approaches for Multimodal Retrieval

1. Transform all modalities into a single vector space: Images and Text Documents → Multimodal Embedding Model → one VectorStore.

2. Keep different modalities in different VectorStores: Images → Image Embedding Model → image VectorStore; Text Documents → Text Embedding Model → text VectorStore.

3. Ground all modalities into one (text): transform Images into text, then Images and Text Documents → Text Embedding Model → one VectorStore.
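The third approach is the simplest to prototype, since one text embedding model serves everything. A sketch, assuming a caption(image_path) helper backed by any captioning or chart-description MLLM (the helper is hypothetical, not a specific library call):

def caption(image_path: str) -> str:
    """Hypothetical: run an image-captioning / chart-description model."""
    raise NotImplementedError

def build_text_corpus(image_paths: list[str], documents: list[str]) -> list[str]:
    # Images become text descriptions, so a single text embedding model
    # and a single VectorStore can index both modalities together.
    return [caption(p) for p in image_paths] + documents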
Multimodal RAG - Preprocessing Workflow

Images: classify each image. Is this a chart/plot?
• Yes → extract the image description and a linearized table as text (stored as metadata), and generate a chart/plot summary (stored as a chunk).
• No → generate an image description.

Webpages and PDFs: extract clean text (including tables) and figures, apply custom logic for structured JSON, run LLM data augmentation, then pass everything through a text splitter with metadata to produce chunks.

All chunks → Embedding Model → Vector Store.
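The image branch of this workflow reduces to a small routing function. A sketch, where the chart classifier, the DePlot-style linearizer, the captioner, and the summarizing LLM are all assumed components; each chunk keeps the source image in its metadata so it can be surfaced again at answer time.

def is_chart(image_path: str) -> bool:
    raise NotImplementedError   # assumed chart/plot classifier

def linearize_chart(image_path: str) -> str:
    raise NotImplementedError   # assumed DePlot-style chart-to-table model

def describe_image(image_path: str) -> str:
    raise NotImplementedError   # assumed image-captioning model

def summarize(text: str) -> str:
    raise NotImplementedError   # assumed LLM call

def preprocess_image(image_path: str) -> dict:
    if is_chart(image_path):
        # Chart/plot: keep the linearized table and image as metadata and
        # embed an LLM-written summary as the chunk text.
        table = linearize_chart(image_path)
        return {"text": summarize(table),
                "metadata": {"image": image_path, "table": table}}
    # Any other image: the generated description becomes the chunk text.
    return {"text": describe_image(image_path),
            "metadata": {"image": image_path}}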
Multimodal RAG - Preprocessing Workflow (example: generated image description)

"This image is a bar chart comparing the relative performance of different NVIDIA GPU accelerators across various machine learning models and tasks. The chart has six categories represented by the machine learning models or tasks, which are RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, and BERT."
Multimodal RAG - Preprocessing Workflow (example: linearized table and chart summary)

Linearized table as text (stored as metadata):
TITLE | Relative Performance – Per Accelerator<0x0A>Higher is Better | NVIDIA A100 (v2.1 - Available) | NVIDIA H100 (v2.1 - Preview) | NVIDIA H100 (v3.0 - Available)<0x0A>RNN-T | 1.0 | 1.7 | 1.8 | 1.8<0x0A>3D U-Net | 1.0 | 1.8 | 1.8 | 1.97<0x0A>Mask R-CNN | 1.0 | 1.97 | 1.97 | 2.09<0x0A>ResNet-50 v1.5 | 1.0 | 1.95 | 2.07 | 2.10<0x0A>RetinaNet | 1.0 | 2.2 | 2.22 | 2.28<0x0A>BERT | 1.0 | 2.65 | 3.07 | 3.08

Chart/plot summary (stored as a chunk):
"This is a comparison of the relative performance of various accelerators from NVIDIA (A100 v2.1, H100 v2.1 preview, H100 v3.0 available) using different models (RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, BERT). Higher values indicate better performance. The A100 has a baseline score of 1.0 for all models. The H100 v2.1 preview and v3.0 available perform 1.7-2.2x and 1.8-3.08x better than the A100, respectively, depending on the model used."
Multimodal RAG - Preprocessing Workflow (example: generated description for a diagram)

"The image depicts a network with three servers, each with a single node. Each server has a single disk drive, a single network card, and a single CPU. The servers are connected to a central server, which is connected to the network via a cable. The cable connects the central server to the servers, and the cable connects each server to its respective network card. The network card has a storage capacity of 1 GB, and each server has an additional 1 GB of storage. The central server has two network interfaces, one for the server and the other for the network card and storage."
Multimodal RAG - Preprocessing Workflow (example: clean text extracted from a webpage)

'''Setting New Records in MLPerf Inference v3.0 with Full-Stack Optimizations for AI

NVIDIA L4 Tensor Core GPU vaults ahead

In MLPerf Inference v3.0, NVIDIA made the debut submission of the NVIDIA L4 Tensor Core GPU. Based on the new NVIDIA Ada Lovelace architecture, L4 is the successor to the popular NVIDIA T4 Tensor Core GPU, delivering significant improvements for AI, video, and graphics in the same single-slot, low-profile PCIe form factor.

NVIDIA Ada Lovelace architecture incorporates 4th Gen Tensor Cores with FP8, enabling excellent inference performance even at high accuracy. In MLPerf Inference v3.0, L4 delivered up to 3x more performance than T4 at 99.9% of the reference (FP32) accuracy of BERT—the highest BERT accuracy level tested in MLPerf Inference v3.0.

Figure 2. MLPerf Inference performance comparison between NVIDIA L4 and NVIDIA T4 GPUs

Per-accelerator throughput is not a primary metric of MLPerf Inference. MLPerf Inference v3.0: Datacenter Closed. Inference speedups calculated by dividing the inference throughputs reported in MLPerf Inference v0.7 result ID 0.7-113 by the number of accelerators to calculate T4 Tensor Core GPU per-accelerator throughput, and calculating the ratios of the inference performance of the L4 Tensor Core GPU in 3.0-0123 (Preview) by the calculated per-accelerator throughput of T4. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

The NVIDIA L4 also incorporates a large L2 cache, providing additional opportunity to increase performance and energy efficiency. In the NVIDIA MLPerf Inference v3.0 submission, two key software optimizations were implemented to take advantage of the larger L2 cache: cache residency and persistent cache management.

The larger L2 cache on L4 enabled the MLPerf workloads to run entirely within the cache. The L2 cache provides higher bandwidth at lower power than GDDR memory, so the significant reduction in GDDR accesses helped to both increase performance and reduce energy use.

Up to 1.4x higher performance was observed when batch sizes were optimized to enable the workloads to fit entirely within the L2 cache, compared to the performance when batch sizes were set to maximum capacity.

L2 cache persistence

Another optimization used the feature first introduced in the NVIDIA Ampere architecture. This enables developers, with a single call to TensorRT, to tag a subset of the L2 cache so that it can be prioritized for retention (that is, scheduled to be evicted last). This feature is especially useful for inference when working under a regime of residency, as developers can target the memory being reused for layer activations across model execution, dramatically reducing GDDR write bandwidth usage.'''
Multimodal RAG - Inference Workflow

Flow: Slack User Interface → User Query → Embedding Model → VectorStore → Retrieved Chunks → LLM → Final Response.
Retrieved chunks come in three forms: an Image (routed through a VQA MLLM to produce a VQA answer), Text from a Chart/Plot, or Plain Text.
Multimodal RAG - Inference Workflow (example: retrieved chart summary and plain-text chunk)

Text from chart/plot: "This is a comparison of the relative performance of various accelerators from NVIDIA (A100 v2.1, H100 v2.1 preview, H100 v3.0 available) using different models (RNN-T, 3D U-Net, Mask R-CNN, ResNet-50 v1.5, RetinaNet, BERT). Higher values indicate better performance. The A100 has a baseline score of 1.0 for all models. The H100 v2.1 preview and v3.0 available perform 1.7-2.2x and 1.8-3.08x better than the A100, respectively, depending on the model used."

Plain text: '''Breaking MLPerf Training Records with NVIDIA H100 GPUs

3D U-Net

NVIDIA submitted results on 432 NVIDIA H100 Tensor Core GPUs, achieving a new record for the benchmark of 0.82 minutes (49 seconds) to train. Per-accelerator performance on H100 also improved by 8.2% compared to the prior round.

To achieve excellent performance at scale, a faster GroupBatchNorm kernel was one key optimization.

In our largest scale 3D U-Net submission, the instance normalization operation in the neural network needs to perform a reduction of the tensor mean and variance across four GPUs. By using a faster GroupBatchNorm kernel to implement instance normalization, we delivered a 1.5% performance increase.'''
Multimodal RAG - Inference Workflow (example: retrieved image routed through VQA)

"The image depicts a network with three servers, each with a single node. Each server has a single disk drive, a single network card, and a single CPU. The servers are connected to a central server, which is connected to the network via a cable. The cable connects the central server to the servers, and the cable connects each server to its respective network card. The network card has a storage capacity of 1 GB, and each server has an additional 1 GB of storage. The central server has two network interfaces, one for the server and the other for the network card and storage."
Multimodal RAG - Inference Workflow (example: retrieved linearized chart table)

TITLE | Relative Performance – Per Accelerator<0x0A>Higher is Better | NVIDIA A100 (v2.1 - Available) | NVIDIA H100 (v2.1 - Preview) | NVIDIA H100 (v3.0 - Available)<0x0A>RNN-T | 1.0 | 1.7 | 1.8 | 1.8<0x0A>3D U-Net | 1.0 | 1.8 | 1.8 | 1.97<0x0A>Mask R-CNN | 1.0 | 1.97 | 1.97 | 2.09<0x0A>ResNet-50 v1.5 | 1.0 | 1.95 | 2.07 | 2.10<0x0A>RetinaNet | 1.0 | 2.2 | 2.22 | 2.28<0x0A>BERT | 1.0 | 2.65 | 3.07 | 3.08
Multimodal RAG - Inference Workflow

The Embedding Model and VectorStore together form the Retriever: Slack User Interface → User Query → Retriever → Retrieved Chunks (Image / Text from Chart/Plot / Plain Text) → LLM → Final Response, with image chunks answered via the VQA MLLM.
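The routing at answer time is equally small. A sketch consistent with the preprocessing sketch earlier (chunks are {"text": ..., "metadata": ...} dicts); vqa and llm are assumed model calls:

def vqa(image_path: str, question: str) -> str:
    raise NotImplementedError   # assumed VQA-capable MLLM call

def llm(prompt: str) -> str:
    raise NotImplementedError   # assumed text LLM call

def answer(question: str, retrieved: list[dict]) -> str:
    parts = []
    for chunk in retrieved:
        meta = chunk.get("metadata", {})
        if "image" in meta and "table" not in meta:
            parts.append(vqa(meta["image"], question))  # image -> VQA answer
        else:
            parts.append(chunk["text"])  # plain text or chart summary/table
    context = "\n\n".join(parts)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")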
Multimodal RAG - Demo
The Basics of Speech AI

Sven Chilton, Deep Learning Developer Advocate, NVIDIA


What is Speech AI?
Any machine learning system or application involving audio data representing human speech.

Automatic Speech Recognition (ASR), aka Speech-to-Text (STT, S2T):
audio → ASR Model → "good morning"

Text-to-Speech (TTS), aka Speech Synthesis:
"good morning" → TTS Model → audio

Neural Machine Translation (NMT), e.g., Spanish speech to English speech:
Spanish speech → ASR Model → "buenos días" → Text Translation Model → "good morning" → TTS Model → English speech
How Do Computers Perceive Speech?
[Figure: audio waveform signals and (mel) spectrograms for three cases: a pure tone, singing a note, and speaking; the speech plots use a different scale than the top two.]
Outline of a Typical Speech-To-Text System

Mel-Spectrogram → Encoder → Speech Representation Vectors → Decoder → Transcription ("good morning")

• Encoder (neural network): turns the complex speech signal into a compact vector representation.
• Decoder (neural network): turns speech vectors into text (tokens).

STT is often called ASR (Automatic Speech Recognition).
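The encoder's input is exactly the mel-spectrogram pictured earlier. A minimal computation with torchaudio; the window, hop, and mel-band values are typical ASR defaults (assumed here), not numbers from the deck.

import torchaudio

waveform, sample_rate = torchaudio.load("good_morning.wav")  # any speech clip
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=400,        # ~25 ms analysis window at 16 kHz
    hop_length=160,   # ~10 ms hop between frames
    n_mels=80,        # 80 mel bands, common for ASR encoders
)(waveform)
print(mel.shape)      # (channels, n_mels, frames): the encoder's input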
Canary Model Demo - Multilingual Transcription & Translation
Text-To-Speech: Controllable Audio Synthesis

Inputs: transcription ("cat") and a speaker embedding → TTS model → speech signal.

Controllable features: F0 (pitch), emotion, language, speech rate.

Controlling word-level emphasis at inference:
"The food here isn't that bad." vs. "The food here isn't *that* bad."
The Inefficiency of Adding New Speakers in TTS

• Most models operate more like a switch
• Incorporating new speakers typically requires retraining and a substantial amount of data
• Figuring out new speaker embeddings for novel speakers is challenging

Transcription ("cat") + speaker embedding ("Alice" or "Bob") → TTS model → speech signal of Alice or Bob
Reference-Based Text-To-Speech: Handling Novel Samples

• Reformulate the model to condition on short reference samples
• How do we get the model to generalize to novel reference samples?

Transcription ("cat") + reference speech signal → TTS model → speech signal
P-Flow: TTS with 3 Seconds of Reference Audio

• The model learns to fill in the blank
• Reconstruction is also provided with temporally aligned text to enable text-to-speech
• 1st place in the LIMMITS 2024 Multilingual Zero-Shot TTS track!

Examples: input prompt → synthesized speech (Kannada); input prompt → synthesized speech (English)

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting, Sungwon Kim et al., NeurIPS 2023
A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing, He Bai et al., ICML 2022
What is Neural Machine Translation?

Historically, "NMT" = text-to-text translation: "buenos días" → Text Translation Model → "good morning".

If you want to translate speech, cascade a speech-to-text transcription model and a text-to-text translation model: Spanish speech → ASR Model → "buenos días" → Text Translation Model → "good morning".

Recently: development of end-to-end speech-to-text translation (AST) models: Spanish speech → Automatic Speech Translation model → "good morning".

ASR – Automatic Speech Recognition
Speech & Text Translations are Attention-Based Encoder-Decoder Models
[Diagram: an attention-based encoder-decoder text translation model]

Historically, such text translation models were trained first, and then the techniques were applied to speech translation.
Example of RAG with Speech AI

LLM – Large Language Model | RAG – Retrieval-Augmented Generation


How to Get Started with NVIDIA Riva
Make your conversational applications talk in many languages

• Experience Riva APIs through the NVIDIA API Catalog*
• Enroll to prototype, test & deploy your own app on NVIDIA LaunchPad
• Contact NVIDIA AI Enterprise
• See Riva in action

Webpage: nvidia.com/en-us/ai-data-science/products/riva/
Documentation: docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html
GitHub: github.com/nvidia-riva
Contact Us: nvidia.com/en-us/data-center/products/ai-enterprise/contact-sales/
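For a first experiment, the Python client is a few lines. A sketch using the nvidia-riva-client package against a running Riva server; the server URI and voice name are assumptions for illustration, so check the documentation linked above for the options your deployment actually exposes.

import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # assumed local Riva server

# ASR: offline transcription of a WAV file
asr = riva.client.ASRService(auth)
config = riva.client.RecognitionConfig(
    language_code="en-US",
    enable_automatic_punctuation=True,
)
with open("query.wav", "rb") as f:
    response = asr.offline_recognize(f.read(), config)

# TTS: synthesize a reply (voice name is an assumed default)
tts = riva.client.SpeechSynthesisService(auth)
audio = tts.synthesize("good morning",
                       voice_name="English-US.Female-1",
                       language_code="en-US")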
Speech & Translation AI Talks, Special Event & More at GTC 2024
Join to learn the latest speech & translation AI achievements & how to use them with GenAI-based conversational applications

All Speech & Translation AI GTC’24 Events


Talks:
• Speech AI Demystified, NVIDIA
• Speaking in Every Language: A Quick Start Guide to TTS Models for Accented, Multilingual
Communication, NVIDIA
• Mastering Speech for Multilingual Multimedia Transformation, OVHcloud, NVIDIA
• Adapting Conformer-Based ASR Models for Conversations Over the Phone, PolyAI
• Secure AI-Driven Translation in Video Conferencing, Pexip
• Behind the Scenes of Running a Conversational Character in a 3D Scene, Convai Technologies

Special Events (SE):


• Build Speech AI for Multilingual Multimedia Transformation [SE], HPE, Data Monsters, Kore.ai,
Quantiphi, NVIDIA

Connect with Experts (CWE):


• Multi-Speaker ASR with NVIDIA NeMo Toolkit: Training & Inference [CWE]

Deep Learning Institute Workshops (DLI):


• Talk to Your Data in Your Native Language [DLI]
Real-World Voice & RAG Implementations

Ruchi Gupta, VP Product Management, Kore.ai


Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi
Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE
Dan Lesovodski, VP of AI, Data Monsters
Powering CX with AI

Ruchi Gupta, VP Product Management, Kore.ai


Kore.ai – Powering Customer Experiences with AI
Overview and Speech and RAG Enabled Applications

PRESENTED BY

Ruchi Gupta



Disclaimer

The following information is being shared in order to outline some of our current product plans, but like everything else in life, even the best-laid plans can change.
We are hopeful that the following can shed some light on our roadmap, but it's important to understand that it is being shared for informational purposes only and is not a binding commitment.
The development, release, and timing of any products, features, or functionality remain at the sole discretion of Kore and are subject to change.



Agenda

• About Kore.ai
• CX Speech AI Use Case
• CX GenAI RAG Use Case


Partnering with Businesses to Put AI to Work
Kore.ai is your partner in realizing the value from AI responsibly, delivering an intelligent, secure enterprise solution that enables a human-like conversational experience to automate interactions.

Key Highlights
• 400+ global customers
• Billions of conversations
• 45-75% automation rates
• 200M+ enterprise consumers served
• $1B+ cost reduction
• 2M+ employees use Kore.ai IVAs
• Compliant: GDPR, HIPAA, PCI DSS, SOC2, FedRAMP Ready, TLS 1.2, AES 256

Automating and Optimizing Global Enterprise Customer and Employee Experiences

Source: Gartner (Nov-2021). Note: Fiscal year end is 3/31. Financial metrics as of FY ending March 2023. Rule of 100+ calculated as 2024 ARR growth + 2024 EBITDA margin.
Kore.ai Named Leader, Again!
2023 Conversational AI Gartner Magic Quadrant

• Market understanding: Kore.ai has an excellent understanding of the major enterprise use cases and their requirements
• The company's Customer Experience and Employee Experience solutions are among the sharpest found in this research
• Product capabilities: an extensive feature set and capabilities that are available to non-developers and non-data scientists in no-code tooling
• Easier to operationalize by a variety of users across different business units
• Innovation and excellence: a large development organization allows Kore.ai to stay up to date with R&D trends and emerging demand for capabilities
• Comprehensive toolkit for NLP, as well as a complete portfolio of connectors and integrations with channels and leading back-end systems


Automation First, Experience Optimization Platform

CONVERSATIONAL AI: Virtual Assistant Builder, Voice AI, Multi-Engine NLU, AI Assisted Dialogs, Data AI
ANSWERS: Knowledge AI w/LLM, Document AI, answers from structured and unstructured data, auto fulfillment
CONTACT CENTER: AI Routing & Experience Flows (Virtual Agent – User), Agent Desktop, Agent Monitoring & Reporting, integrations to CRMs and other apps, Case Management, Workforce Management
AGENT AI: Automatic Summarization, real-time AI suggestions, AI Assisted Conversations, Agent Coaching
INSIGHTS AI: Conversational Insights, Intent Discovery, Agent-Customer Interaction Insights, Virtual Agent Performance Insights, Agent Performance Insights, Contact Center Quality Management Insights

Platform Services: life cycle management (versioning, publish, collaboration), multi-lingual support, pre-built integrations, generative AI, campaign management, omnichannel

Enterprise Services: role-based access controls, maker-checker process, audit logs, security, compliance (SOC-2, PCI, FedRAMP, HIPAA, GDPR), cloud and on-premise deployments, scalability, integrations, authentication, and authorization
Traditional Contact Centers vs. AI-Native Contact Centers

• Agents handle all inquiries | Uses AI to handle & direct all inquiries
• Agents work in shifts | 24/7 availability
• Inconsistent experiences from one agent to another (depending on experience) | Consistent experiences across all channels (voice, SMS, chat, social)
• "Robotic" interactions | "Human-like" interactions using LLMs, NLP, and machine learning (ML)
• Time-consuming manual database queries & additional hold times | Scans multiple databases in seconds to provide more personalized service
• Traditional IVR routing | Intelligent routing & queueing


Multi-Modal CX Speech AI Use Case

• Kore Voice Gateway integration for voice recognition (STT)
• TTS integration for playing back responses to customers
• Agent AI leveraging STT to provide next-best responses to the agent in real time
• Translation use case where multilingual queries coming in from customers are handled by English-speaking agents via a multi-modal contact center


Agent Desktop



Speech AI Demo Video



Multi-Modal CX Speech AI Use Case

Use Case
• Multi-modal interactions where the customer interacts on a non-English voice channel and agents respond on an English chat channel. Vertical: Healthcare

Business Problem / Pain Points
• Voice agents not available during off-peak hours; only chat agents would be accessible outside of working hours, for both voice calls and digital interactions
• Cost of a voice interaction > cost of a chat interaction
• Risk of losing customers, revenue, etc.

Metrics
• Number of interactions serviced (trial phase: a few hundred per month for after-hours interactions)
• Savings for the customer from not needing to staff voice agents after hours
• Response processing time from agent send to playback to the customer

Challenges and Enhancements
• Avoid cross talk: the agent experience is built so that during the user input phase, agent input is paused (send button greyed out)
• Restrictions on concurrency and transfer
• Future: real-time experience when the agent is typing, using filler audio (other than hold music)
• Human-assisted bot experience


Voice Preference Selection Flexibility

STT: automated speech recognition engines to best suit your use cases.

TTS: voice preferences to personalize the ASR engine and the voice that plays for your text-to-speech conversions.

Configuration
• Navigate to the Contact Centre from the left menu within the desired App
• Select the Languages and Speech tab within the configurations section in the menu
• Click on Voice Preferences and click the drop-down menu under Automated Speech


XO Platform – Generative AI Capabilities

10X Faster Development:
• Suggest use cases
• Generate training utterances
• Generate conversation flows
• Generate test cases
• Suggest flow improvements

Improve IVA Quality and Agent Productivity:
• Playbooks and coaching
• Suggest contextual responses
• Retrieve relevant knowledge and agent information
• Co-pilot for drafting responses
• Summarize conversations

Deliver Human-Like Experiences:
• Handle multiple languages
• Zero-shot and few-shot intent recognition, intent resolver and disambiguation
• Dynamic paraphrasing
• Handle complex co-referencing
• Retrieval augmented generation
• Generate personalized responses

Generate Intelligent Insights:
• Discover new topics or intents
• Generate conversational insights
• Generate agent performance insights
• Generate quality management insights


XO Platform - LLM and GenAI Framework

Model Library
• Prebuilt LLM Integrations: support for commercial (OpenAI, Azure OpenAI, Anthropic, etc.) and community models
• Custom LLM Integration (BYO): integrate with any in-house, commercial, or open source models hosted internally or externally
• XO GPT: Kore.ai fine-tuned language models to deliver human-like conversational experiences

Prompt Engineering
• Prompt Library: prebuilt templates for all models and support to write custom prompts using dynamic values
• Prompt Fine-tuning & Testing: fine-tune the prompts by testing against the models; support for pre- and post-processors
• Prompt Versioning: version the prompts within the platform

Guardrails
• Anonymization of PII/Sensitive Info: secure access to models by anonymizing the PII and sensitive information in requests
• Enterprise Content Filtering Rules: scan the request and response text for harmful, biased, or toxic content, and design fallback flows
• Fact Checking: fact-check the responses to ensure that the content is grounded in enterprise context

Monitoring
• Model Performance Analytics: measure a model's performance using metrics like correctness, faithfulness, relevancy, etc.
• Usage Analytics: track response times, token usage, and failure rates across all modules and features


CX GenAI RAG Use Case

Virtual Assistant:
Helps to provide conversational employee assistance and provides intelligent search assistance to find the most suitable answers from the articles.

Search Assist Platform for Intelligent Data Indexing:
Knowledge articles, contents, and documents can be ingested into the Search Assist platform in multiple ways to enable intelligent indexing of data.

Dynamic Answer Generation with a Generative Model:
Harnessing the power of a generative model to dynamically produce responses based on matched content to provide dynamic answer(s).

These conversation features can be voice-enabled using the Kore Smart Assist Voice platform.


GenAI RAG Demo Video



GenAI - Intelligent Knowledge Base Exploration

User input: "Is a pug allowed in hold?"

1. Answers Lookup: check curated/cached answers first; present the answer when the match is at 75% confidence.
2. Otherwise, search all content: Kore LLM-powered vector search results plus a traditional index search.
3. Summarize with LLM (OpenAI; the slow step): present a summarized answer with links to sources, with LLM judgement for an accurate answer; fall back to links to documents when summarization is not feasible.
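The gating logic reduces to a few lines. A sketch, where the 0.75 threshold mirrors the 75% confidence gate on the slide, and the lookup, search, and summarization calls are assumed stubs rather than Kore.ai APIs:

CONFIDENCE_THRESHOLD = 0.75

def lookup_cached(query: str) -> tuple[str | None, float]:
    raise NotImplementedError   # curated/cached answers plus a match score

def search_all_content(query: str) -> list[str]:
    raise NotImplementedError   # LLM-powered vector search + traditional index

def summarize_with_llm(query: str, passages: list[str]) -> str:
    raise NotImplementedError   # the slow step (OpenAI on the slide)

def answer(query: str) -> str:
    cached, score = lookup_cached(query)
    if cached is not None and score >= CONFIDENCE_THRESHOLD:
        return cached                               # present curated answer
    passages = search_all_content(query)            # fall back to full search
    return summarize_with_llm(query, passages)      # answer with source links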


CX GenAI RAG Use Case

Use Case
• Empower agents with intelligent search capabilities, aiding in the swift retrieval of pertinent answers, documents, and articles based on user input. Vertical: Travel and Hospitality (Airline)

Business Problem / Pain Points
• Multiple data sources (SharePoint), and articles sometimes carry conflicting information
• Long hold times cause customer dissatisfaction
• Risk of losing customers, revenue, etc.

Metrics
• Reduces AHT by 30 seconds on average (vs. a target of 15-20 second reduction); ranges from 10 seconds for calls to 85 seconds for chat
• Savings of 100K for every second of AHT reduction
• Faster onboarding of newer agents
• Pilot rolled out with ~50 agents; production anticipated for 1400+ agents
• Average prompt was 8000 tokens and average completion was 500 tokens

Challenges and Enhancements
• Cache implementation for answers to reduce response time significantly


Thank You
Visit Kore.ai for more information

US Office: 7380 West Sand Lake Road, Suite 390, Orlando, FL 32819
India Office: 12th Floor, E-Park, Plot No. 1, Hitech City Road, Kondapur, Hyderabad 500084
UK Office: Alpha House, 100 Borough High Street, London, UK SE1 1LB
Mail Us: [email protected]


Transforming User Experience with
RAG-Powered Human Voice Interfaces

Ravi Teja Konkimalla, Sr. Solution Architect, ML, Quantiphi


AGENDA

1. Introduction to Quantiphi
2. Speech AI & Retrieval Augmented Generation
1. Case Study
2. Considerations & Challenges
3. Demo
3. Summary & Call-to-Action
4. Q&A
01
Introduction to Quantiphi



Quantiphi Factsheet

GEOGRAPHIC REACH: Boston, Chicago, Princeton, San Jose, Toronto, London, Singapore, Mumbai, Bangalore, Trivandrum, Netherlands

INDUSTRIES: BFSI, Telecom, Manufacturing, Public Sector, CPG/Retail, M&E & Gaming, Healthcare & Lifesciences, Energy

HIGHLIGHTS
• 70+ F500 clients
• >65% 3-year revenue CAGR
• 42% revenue share from F500+G500
• 40% revenue share from top 10 clients
• 2013 year of inception
• 250+ R&D team
• 3500+ professionals
• 11 patents filed
• 450+ DLI certifications

EXPERTISE: Generative AI, Digital Twins, Conversational AI & Digital Avatars, Document AI, Computer Vision, Marketing Analytics, Medical Imaging, Platform Modernization, Data Analytics, Infrastructure, Embedded Edge, Visualization

PARTNER DIFFERENTIATION: 3x Americas Service Delivery Partner; NVAIE Solution Provider; Preferred Partner, Embedded Edge; Preferred Partner, Visualization

STRATEGIC PARTNERSHIPS: Google Cloud Premier Partner; Oracle Service Accelerator; Premier Global Consulting Partner
Differentiated Mentions

PRESS RELEASES
• GTC 2023 Keynote Mention

ANALYST RECOGNITIONS
• Leader in the 2023 IDC MarketScape: Worldwide AI Services 2023 Vendor Assessment
• Leader in the 2023 ISG Analytics Services Report for both the US & Europe
• Frost & Sullivan's North America Competitive Strategy Leadership Award


02
Speech AI & Retrieval
Augmented Generation



In Case You Missed It!

Architecting Enterprise AI Success with RAGs and LLMs
Lessons from the first 12 months of building Generative AI solutions
Speaker: Siddharth Kothwal, Global Head - NVIDIA Practice, Quantiphi
Available on-demand in 24 hours


2.1
Case Study



Voice-Enabled Generative AI Enterprise Search System
A Speech AI and RAG-enabled enterprise semantic search system transforming maintenance operations for one of the leading F500 manufacturers, empowering maintenance technicians to seamlessly navigate vast data corpora, reducing downtime and improving operational efficiency.

PROBLEM STATEMENT
Suboptimal maintenance operations and low technician productivity due to the inefficient retrieval of relevant information from a vast corpus of repair manuals:
• Information overload
• High turnaround time (TAT)
• Limited scalability of the manual process
• Reduced productivity

QUANTIPHI'S SOLUTION
End-to-end semantic search pipeline with a real-time conversational agent to retrieve information and address technician queries, improving maintenance operations efficiency:
• Refined search with enhanced accuracy
• Faster information retrieval
• Unlocked scalability potential
• Improved operational efficiency
Speech AI & Retrieval Augmented Generation High-Level Process Flow
How does it all come together?

1. Speech Recognition: the user's audio input goes through Riva ASR to produce a text query, with intent identification and slot classification.
2. Dialogue Management: a dialogue manager (ACE Agent) behind the web application routes the question and returns the response.
3. Retrieval Augmented Generation by LLMs: documents from the data sources are indexed into a vector DB; the NeMo Retriever fetches relevant passages for the question, and a NeMo LLM generator produces the answer.
4. Text-to-Speech: the answer is synthesized with Riva TTS and played back as audio.
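One voice turn through this pipeline can be expressed as a single function. A sketch whose component names follow the slide (Riva ASR, retriever over a vector DB, NeMo LLM, Riva TTS) but whose call signatures are assumptions:

def riva_asr(audio: bytes) -> str:
    raise NotImplementedError   # speech recognition

def classify_intent(text: str) -> str:
    raise NotImplementedError   # intent identification / slot classification

def retrieve(query: str, k: int = 4) -> list[str]:
    raise NotImplementedError   # retriever over the indexed vector DB

def nemo_llm(question: str, context: list[str]) -> str:
    raise NotImplementedError   # retrieval-augmented answer generation

def riva_tts(text: str) -> bytes:
    raise NotImplementedError   # speech synthesis

def handle_turn(audio_in: bytes) -> bytes:
    query = riva_asr(audio_in)
    if classify_intent(query) == "question":          # dialogue-manager gate
        reply = nemo_llm(query, retrieve(query))
    else:
        reply = "Could you rephrase that as a maintenance question?"
    return riva_tts(reply)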
2.2
Speech AI + RAG:
Considerations & Challenges



Why is Implementing a Speech AI & RAG-powered System Complicated?

When everything works (happy path):
• User's spoken query: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• ASR text: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• Intent identification and slot classification route the query, and the retriever fetches precise information on electrical connections for the 2359-RFID module.
• LLM text response: "Check for any loose connections, ensure power supply stability and data cables are intact; address issues by replacing damaged components and updating firmware as needed."
• TTS plays the same response back to the technician.
Why is Implementing a Speech AI & RAG-powered System Complicated?

When ASR mishears the query:
• User's spoken query: "Inspect 2359-RFID module for anomalies, focusing on electrical connections. Provide immediate solutions for any identified issues."
• ASR text: "Inspect 2359-rapid module for animals, focusing on electrical connections. Provide immediate solutions for any identified issues."
• The retriever fetches inaccurate information, misunderstanding the maintenance query as being about wildlife animal interactions.
• LLM text response: "In the realm of fauna, the notion of a 'rapid module' is non-existent, as sentient entities lack inherent electrical connections."
Automatic Speech Recognition (ASR) Examples

Example 1 (domain-specific language):
• Ground truth: "Determine the isentropic efficiency of the centrifugal compressor, accounting for polytropic efficiency, adiabatic efficiency, and impeller tip clearance."
• ASR model: "Determine the ice entropic efficiency of the centrifugal compressor, accounting for poly topic efficiency, audio attic efficiency, and impeller tip clearance."

Example 2 (proper nouns and punctuation):
• Ground truth: "How to integrate Industry 4.0 concepts with Siemens PLM software for streamlined data analytics in smart manufacturing."
• ASR model: "How to integrate industry four point zero concepts with Siemels PNM software for streamlined data analytics in smart manufacturing."

Example 3 (hastily spoken named entities):
• Ground truth: "Initiate the diagnostics for the HVAC system, inspecting the pneumatic actuators and replace the malfunctioning valve, now what's next?"
• ASR model: "Initiate the die Agnostic for the Hvic system, inspecting the new actuates, and replace the malfunctioning vow, now what's next?"

Effects on RAG: ASR inaccuracies can significantly disrupt the RAG system by retrieving improper information, thereby compromising the accuracy and integrity of the generated response.
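Word error rate (WER) is the standard way to quantify failures like these. A dependency-free implementation via word-level edit distance:

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("inspect the RFID module", "inspect the rapid module"))  # 0.25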


Deconstructing ASR: Why is it a Hard Problem?


Deconstructing RAG - ASR: Why is it a Hard Problem?
Key Considerations and Challenges: How does ASR output affect RAG output?

1. Acoustic Model (the way the speaker pronounces the words): transcription errors can introduce inaccuracies in the retrieved information, ultimately affecting the quality of generated responses.
2. Language Model (distinguishes similar-sounding words): struggles in distinguishing similar-sounding words and varied accents can influence semantic understanding of the input text.
3. Punctuation Model (grammatical construction of a sentence): fast-paced audio streams affect transcriptions, reducing input coherence and output clarity.
4. Named Entity Recognition (key names or elements in the audio): difficulties with domain-specific phrases misinterpret key audio details, leading to misinterpreted information being passed to RAG.


How to Solve the ASR Problem
Reducing Word Error Rate:

• Word Boosting: customizations to help the ASR engine recognize specific words of interest.
• Text Normalization: address tokens where the spoken form differs from the written form.
• Language Model Fine-Tuning: retraining with domain-specific text data to identify niche phrases.
• Voice Activity Detection: distinguishes parts of speech with an active speaker from those without.
• Acoustic Model Fine-Tuning: retraining with domain-specific audio data to catch different accents.
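Word boosting, for example, is a one-call customization in the nvidia-riva-client package: it biases the decoder toward domain terms so strings like "2359-RFID" survive transcription. The boost score here is just an illustrative tuning knob, and the helper is the one shipped with the Riva client examples, so verify it against your client version.

import riva.client

config = riva.client.RecognitionConfig(language_code="en-US")
riva.client.add_word_boosting_to_config(
    config,
    boosted_lm_words=["2359-RFID", "isentropic", "Siemens PLM", "HVAC"],
    boosted_lm_score=20.0,   # higher = stronger bias toward these words
)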


Text to Speech (TTS) Examples

Example 1 (alphanumerics & symbols):
• LLM output: "Initialize security protocol 5489-A to fortify data defenses, implementing a multi-layered firewall 🚧"
• TTS model: "Initialize security protocol five four eight nine dash A to forty five data defenses, implementing a multi-layered firewall"
• Ground truth: "Initialize security protocol fifty four eighty nine A to fortify data defenses, implementing a multi-layered firewall"

Example 2 (punctuation & notations):
• LLM output: "Maintenance log entry: 2024-03-13 08:00 - Machine #2 inspection completed"
• TTS model: "Maintenance log entry colon twenty twenty-four zero three thirteen zero eight zero zero dash Machine hashtag sign two inspection completed"
• Ground truth: "Maintenance log entry on the thirteenth of March twenty twenty-four at 8 am for Machine number 2 inspection completed"

Example 3 (regional accent variations):
• LLM output: "Look out for misconfigured network settings that may affect connectivity"
• TTS model: "Look out for miss kuhn-fig-yerd network settings that may affect connectivity"
• Ground truth: "Look out for misconfigured network settings that may affect connectivity"


Deconstructing RAG - TTS: Why is it a Hard Problem?
Key Considerations and Challenges: How does RAG output affect TTS?

1. Domain Adaptation (synthesized speech relevance in specialized domains): TTS fails to capture domain-specific vocabulary, terminology/jargon, or linguistic variations.
2. Coherence and Naturalness (cohesiveness in generated responses for proper TTS outputs): the disparity between written and spoken content poses challenges for the TTS model in producing speech that is both natural and coherent.
3. Latency and Real-Time Processing (timely and seamless conversion of TTS from RAG output): computationally intensive or time-consuming LLM response generation can result in delays in generating the TTS output.
4. Handling Special Characters (ensuring accurate encoding and representation of alphanumerics, emojis, and symbols): the lack of direct phonetic representations or standard pronunciation rules for alphanumerics and emojis/symbols poses challenges to accurately rendering speech.


How to Solve the TTS Problem
Reducing Speech Error Rate:

• Prompt Engineering: carefully designing and formulating prompts can significantly reduce the need for pre-processing before TTS.
• ARPABET and IPA Tuning: manually updating ARPABET and IPA files for domain-specific words significantly boosts performance.
• Prosody Modeling: incorporating prosody (intonation, rhythm, and stress patterns) into the TTS model can enhance the naturalness of generated speech.
• Model Fine-Tuning: fine-tuning the model on specific TTS-related data can help adapt it to the nuances of the TTS task.
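One of the cheapest wins above is normalizing the LLM's text before it reaches the TTS model: expanding alphanumerics and stripping symbols or emojis that have no spoken form. A dependency-free sketch (the digit map and replacement rules are illustrative, not a production normalizer):

import re

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize_for_tts(text: str) -> str:
    text = text.replace("#", " number ").replace("-", " dash ")
    text = re.sub(r"[^\w\s.,?!]", "", text)   # drop emojis and stray symbols
    # spell out digit runs: "5489" -> "five four eight nine"
    text = re.sub(r"\d", lambda m: f" {DIGITS[m.group()]} ", text)
    return " ".join(text.split())             # collapse extra whitespace

print(normalize_for_tts("Initialize security protocol 5489-A 🚧"))
# -> "Initialize security protocol five four eight nine dash A"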


2.3
Demo



03
Summary



Summary - Building and Deploying Speech AI & RAG Applications

• Fine-tuning the ASR model for domain-specific understanding
• Adapting document chunking for different types and modalities
• Fine-tuning the retriever and re-ranker models
• Optimizing document retrieval accuracy with sparse & dense methods
• Augmenting data and fine-tuning the TTS model on domain-specific data
• Establishing custom platform metrics for scalability, low latency, and high throughput


3x Americas Service Delivery Partner of the Year

Discover Innovation Firsthand! Join Us At:
• Booth #1513, AI CoE Pavilion
• Booth #G129, Generative AI Pavilion

Contact Us
• Ravi Teja Konkimalla, Senior Solution Architect, +1 267 206 4642
• Akshaya Save, Manager, Strategic Alliances, +91 98692 41489

Scan to accelerate your AI journey


Enhancing CX with GenAI-Based Virtual Assistant

Ka Wai Leung, Hybrid Cloud Alliance Manager, HPE


Dan Lesovodski, VP of AI, Data Monsters
Enhancing Customer Experience
with GenAI-Based Virtual Assistant
Ka Wai Leung, HPE
Dan Lesovodski, Data Monsters
HPE Customer Innovation Centers and Demo Portal

As-a-service worldwide platform
• 100+ live | 400+ recorded demos on demand
• 34,897 live + recorded sessions in FY23
• HPE GreenLake demos
• HPE and Partner content
• Workload-based use cases

Locations: Silicon Valley CIC, Houston CIC (HQ), New York CIC, London CIC, Geneva CIC, HPE Digital Life Garage (Dubai), Singapore CIC

https://www.hpe.com/us/en/about/virtual-customer-innovation-center.html
HPE Use Cases and Needs

• Showcase HPE GenAI capabilities for demo center visitors.


• Provide “cool” interactive demos and not just PPTs and
chatbots.
• Speech and video are more engaging than typing long
questions to chatbots.
• Take customers through all demos virtually and answer
questions.
• Increase HPE staff efficiency and productivity.
• Provide detailed, accurate, and relevant answers for HPE.

Virtual Assistant (Avatar) Solution

The goal of this demo is to create a virtual avatar for HPE demo centers that can interact with customers and provide relevant information about HPE solutions and corporate information.

The Team
• HPE: data management, infrastructure, and platform architecture
• NVIDIA: NVIDIA® Avatar Cloud Engine (ACE), NVIDIA Riva, and NVIDIA GPUs
• Data Monsters: software architecture and system integrations
Data Monsters Elite AI Professional Service

• Years of experience in the data science and engineering market
• A team comprising 70+ engineers and 11 Ph.D. holders
• Projects including those for Fortune 500 companies
Broad Use Cases

Customer service
• Problem: limited employee knowledge; time-consuming; limited working hours
• Solution: avatars with corporate knowledge; single point of contact; 24/7 availability

Sales
• Problem: inactive sales reps; failure to follow processes; limited working hours; slow access to customer data
• Solution: 24/7, always-proactive avatars; following processes; immediate access to customer and corporate data

Customer experience
• Problem: difficult to design and scale; poor customer experience; low upsell/cross-sell; clients choosing competitors
• Solution: avatars with a centralized control panel; developing and scaling customer experience

PR
• Problem: difficulty attracting attention; weak company positioning
• Solution: new interactive technology attracts attention; positions a company as innovative

Trade shows
• Problem: difficulty attracting attention; sales reps get tired/walk away; limited knowledge of product
• Solution: avatars attract attention; access to corporate knowledge; 24/7 info at your booth

Target industries
• Financial services: including banks, insurance, and brokerage firms
• Telecommunications
• Retail
• Hospitality and food service
• Transportation: airports and train stations
• Real estate
• Business centers
• Healthcare: hospitals and clinics
• Marketing agencies
Key Software Modules

• HPE MLDM (ML Data Management, Pachyderm) used for private data logging and data audit, with data sets logged in the training loop
• HPE MLDE (ML Development Environment, Determined AI) for distributed learning (future phase); NVIDIA NeMo™ can be used for AI model training
• NVIDIA Riva ASR/TTS used for speech processing
• NVIDIA ACE used for digital avatar creation
• NVIDIA Omniverse™ Audio2Text used for the avatar visualization process (picture rendering and lip-synching)
• LangChain used for LLM + NVIDIA orchestration over HPE private docs
• Qdrant used for the vector database
Solution Architecture

• Data extraction layer: HPE MLDM private docs repository → data processing → embeddings → Qdrant vector DB.
• Dialog management: NVIDIA Riva ASR/STT, NVIDIA ACE, and NVIDIA Omniverse handle the conversational front end, alongside Botmaker, mobile remote control, and CV personalization.
• LLM module: LangChain orchestrates a LLAMA2 model served on NVIDIA Triton™, behind guardrails and role-based access; a scenario check ("Is it a scenario?") routes between scripted event content updates and the RAG LLM.
• Operations: LLM tests and monitoring (W&B for monitoring, PromptBench for tests), custom voice, enterprise lexicons tuning, and automated deployment scripts.
• Runs on HPE servers, storage, and networking.
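The Qdrant piece of the diagram is small in code. A sketch with the qdrant-client package in embedded (":memory:") mode; the collection name, vector size, and payload are illustrative, and a real deployment would point at the Qdrant server instead.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")   # swap for the server URL in production
client.create_collection(
    collection_name="hpe_private_docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="hpe_private_docs",
    points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4],
                        payload={"text": "HPE GreenLake demo notes"})],
)
hits = client.search(collection_name="hpe_private_docs",
                     query_vector=[0.1, 0.2, 0.3, 0.4], limit=1)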
On-Premises Deployment: 5x Faster

Hardware setup: SaaS-based commercial LLM | Public cloud (T4 and V100 GPUs) | HPE on-premises (A10 and A100 servers)
Response time: 10-second delay | 5-second delay | 1-second delay

On-premises configuration:
• NVIDIA ACE & NVIDIA Riva server: 1x HPE ProLiant DL380 server with 3x NVIDIA A10 GPUs
• LLM server: 1x HPE ProLiant DL380 server with 2x NVIDIA A100 GPUs

Lower deployment cost vs. public cloud with dedicated CPU+GPU instances
"One-Stop Shop" for NVIDIA AI Workloads with HPE

Build: develops and deploys your AI solution end to end; develop, train, and operationalize the AI models for customer use cases.

Optimize: operationalizes existing deployments/platforms; reviews AI models and identifies improvements needed for production readiness; curates design and implementation for vertical use cases.

Integrate: integrates analytics with existing applications; architects the solution with desired partners; integrates solutions with your partners; applies reference architectures and AI frameworks to provide solution integrity.

HPE (services, software, infrastructure, HPE GreenLake cloud)
Thank you
Ka Wai Leung: [email protected]
Dan Lesovodski: [email protected]

© 2024 Hewlett Packard Enterprise Development LP


THANK YOU!
