A Rudimentary Voice Authentication System with Mobile Deployment



Machine Learning, Programming

For the group project component of my Android development course in university, our team built and deployed an authentication system that authenticates via a speaker’s voice profile.


With face masks now being the norm amidst this Covid-19 season, an authentication system relying on a person’s voice profile might be more useful than systems relying on facial recognition.


Overcoming facial recognition systems by covering half my face (Photo by Arisa Chattasa on Unsplash)

In this short article, I will describe the different parts of the voice authentication system and some design choices we made along the way.


Here is an overview of the article:


  • Voice-Auth Service Overview
  • User Registration Overview
  • User Authentication Overview
  • Challenges and Design Decisions
  • Demo Video

Most of the details will be about the high-level architecture and mobile app deployment.


Details about the Deep Learning model can be found in my other article here.


Voice-Auth Service Overview

The voice authentication system consists of a few main components:


Mobile App / Client — A Mobile app that provides an authentication service. Think of this authentication service as something similar to the “password lock” or “pattern lock” service on your Android phone, except that the unlocking is done by speaking into the phone’s mic. This could theoretically be modified for use on top of any other mobile application needing an authentication function.


Voice Authentication Server — A web server that provides voice-based authentication. The web server hosts the Deep Learning (DL) model that gives the system its voice verification abilities. The DL model works by determining whether or not two input voice recordings are from the same person.


Voice Authentication Deep Learning Model — As with many other classification problems involving complex inputs (like a voice audio signal), this one is solved with Deep Learning. The Deep Learning (DL) model is trained offline and then deployed to the web server, which means it can be re-trained and updated on the web server at any time. More details of the DL model can be found in my other article here.


User Registration Overview

As with all authentication services, a “password” for a given user needs to be registered with the system first.


For our system, the user first registers a profile and then provides a voice sample to be used as a reference during authentication later on.


  1. User profile registration (black)

  2. User voice reference capture (red)

[Figure: user registration flow]

The user registers a new profile on the Android app, providing some basic personal information (username, etc.), and the profile is saved to a Firebase database. The Android app then prompts the user to submit a voice sample (the reference sample), which is saved to Firebase Storage (file storage on Firebase).


User Authentication Overview

As with all authentication services, a “password” is provided during authentication and the service checks whether the given password matches the stored reference password previously set by the user.


For our system, the user “logs in” to his registered profile and provides a live voice sample for authentication. The system compares this live voice sample against the previously provided reference voice sample and determines whether or not these two voice samples come from the same person.


  1. User profile and voice reference retrieval (black)

  2. User live voice capture and authentication (red)

[Figure: user authentication flow]

The user “logs in” to his registered profile by providing his username on the Android app, and the app checks for the existence of the user on the Firebase database. The profile’s reference voice sample is then downloaded from Firebase Storage, and the user is prompted to provide a live voice sample. The Android app then passes both the reference and the live voice samples to the web server, where the DL model compares these two voice samples and determines whether or not they came from the same person. The positive or negative result from the DL model is then returned to the Android app.

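One common way to frame the comparison step the server performs is to map each recording to a fixed-length embedding and accept only when a similarity score clears a threshold. Here is a minimal NumPy sketch of that idea; the function names, the embedding inputs, and the 0.8 threshold are illustrative assumptions, not the project's actual code (the real DL model is described in the linked article).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length voice embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_speaker(reference_embedding, live_embedding, threshold=0.8):
    """Accept the live sample only if it is close enough to the reference.

    The threshold here is an arbitrary illustrative value, not a tuned one.
    """
    return cosine_similarity(reference_embedding, live_embedding) >= threshold
```

A stricter threshold trades more false rejections for fewer false acceptances; tuning it is a product decision as much as a modeling one.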

Challenges and Design Decisions

No software engineering project is free from challenges, and compromises are always made to balance out various objectives.


Choice of PyTorch over TensorFlow Lite

In the initial stages of the project, I actually started out building the DL model in Keras (TensorFlow). We soon uncovered the difficulty in deploying our TensorFlow Lite model in the Android environment. All the tutorials we saw online seemed to use the pre-trained TensorFlow Lite models provided by Google and we did not see any tutorials deploying a custom-built model. I also feared the dreaded situation of getting stuck due to unavailable opcodes.


PyTorch, on the other hand, showed how to trace a given model right off the bat on their website. Granted, tracing has some limitations, but it works when data flow in the model is simple (in the sense that there is no control flow) and you stick to PyTorch tensors and modules.

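The tracing workflow can be sketched as follows. The model here is a deliberately tiny stand-in, not our actual verifier; tracing records the ops executed on the example input, which is exactly why it only captures models without data-dependent control flow.

```python
import torch
import torch.nn as nn

# A tiny stand-in model (not the project's verifier): a single linear layer
# whose forward pass has no data-dependent control flow, so tracing is safe.
class TinyVerifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=1)

model = TinyVerifier().eval()
example = torch.randn(1, 16)

traced = torch.jit.trace(model, example)  # run the example input and record the ops
traced.save("verifier.pt")                # this file can then be loaded from Android
```

The saved TorchScript file is what the mobile runtime loads; any Python-side preprocessing still has to be reproduced on the client, which is part of what pushed us toward a web service.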

I focused on conceptualizing the high-level architecture of my DL model and quickly trained one to test it on an Android environment. The fact that the basic model worked in the Android environment gave me the confidence to proceed with investing more time and mental energy to improve the model performance (while adhering to the high-level architecture).


As long as the trained model could run on Android, I could focus on the following as the impact on the PyTorch scripting process was minimal (or none at all):


  • Playing with the learning rate
  • Stacking more layers in the classifier
  • Playing with the activation functions
  • Tweaking the data sampling method
  • Using different base models for my encoder (transfer learning)
  • etc.

Why a Web Service?

In our original design, the team wanted to build a fully native Android app to perform voice authentication.


Lack of Audio Signal Facilities in Android — Android can handle, read from, and play a multitude of media files and file formats. Android can store media input from the phone into a variety of file formats as well.


The one critical thing that I needed, which Android did not provide, was to convert an audio file into an audio signal or byte stream. It didn’t help that the Javax Sound audio processing library was not available in the Android Java subset.


After scrolling through endless websites on how to parse .wav files and how to manage sampling rates, we decided that this was not worth our time with the project deadline looming.


Lack of Signal Processing Libraries in Java — While building the data preprocessing pipeline for the DL model, I relied heavily on the Python LibROSA (Librosa) library. Librosa automatically handles many audio processing tasks, like automatic downsampling or upsampling to the target frequency (critical as the DL model analyzed the audio spectrogram) and the creation of the melfilterbanks and the melspectrograms.


We wanted to use the Chaquopy library to automatically convert our Python code, which used Librosa, into a Java-compatible form, but the Librosa library was not properly supported by Chaquopy (NumPy is supported, but I think SciPy is not fully supported).


While we did find GitHub libraries that manually recreate “Librosa-like” functions in pure Java, the lack of good signal processing libraries in Java still forced us to manually handle the signal processing steps.


Web Service in Python — Ultimately, we abandoned the plan to deploy our model in the Android environment altogether.


Instead, we changed our approach and hosted the DL model on a web server powered by Flask. Since we could work in a Python environment, wrapping the DL model into a web service was very straightforward, and we focused on making our Android app interface with this web service instead. Managing files on Firebase Storage and on local Android file storage is another challenge in and of itself, but a more manageable one.

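Wrapping the model this way really is straightforward. A minimal Flask sketch, where the route name and the `compare_voices` stub are illustrative assumptions standing in for the real DL model call:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def compare_voices(reference_bytes, live_bytes):
    """Stand-in for the DL model call: the real service would decode both
    recordings, preprocess them, and run the verification model."""
    return len(reference_bytes) > 0 and len(live_bytes) > 0  # placeholder logic

@app.route("/authenticate", methods=["POST"])
def authenticate():
    # the Android app uploads both audio files in one multipart POST
    reference = request.files["reference"].read()
    live = request.files["live"].read()
    return jsonify({"match": compare_voices(reference, live)})
```

The Android client then only needs to perform a multipart upload and parse a small JSON response, which is far easier than reimplementing audio preprocessing on-device.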

Because of these challenges, we were forced to decouple our voice authentication service and our Android authentication app, resulting in this architecture.


Model Size Limitations

The DL model was trained and hosted on my local machine, which has a GPU with 3 GB of VRAM. While this was enough to get the model trained and hosted for prediction, the size of the base model that I could use was limited.


Since we initially wanted to deploy the voice authentication DL model on the mobile phone itself, we started out with the most compact image model, MobileNetV2, a model created by Google and intended for use in resource-limited environments.


When we decided to host the DL model as a web service instead, I changed the base model to DenseNet121, the largest one that could fit on my (small) GPU. The more powerful base model improved the classification performance of the voice authentication DL model significantly, but I was ultimately limited by the size of the GPU VRAM. Even larger models like ResNet or ResNeXt could not be used, unfortunately.


Demo Video

Here is a live demo that we recorded (pardon our Singaporean accents😄). Enjoy!


Credits to my team who put in an incredible amount of work and made the seemingly impossible possible: Ng Qing Hui, Gabriel Sim, He Yicheng


Translated from: https://medium.com/towards-artificial-intelligence/a-rudimentary-voice-authentication-system-with-mobile-deployment-1d41f5baa319
