import sys import pandas as pd import numpy as np import matplotlib.pyplot as plt from PyQt5.QtWidgets import QApplication, QMainWindow from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas class MainWindow(QMainWindow): def init(self): super().init() self.setWindowTitle("Excel Data Plot") self.setGeometry(100, 100, 800, 600) self.fig = plt.Figure() self.canvas = FigureCanvas(self.fig) self.setCentralWidget(self.canvas) # 读取Excel文件 data = pd.read_excel("C:\\Users\\快乐DE骚年\\Desktop\\Python\\五指键盘\\五指键盘数据.xlsx") # 遍历每一列数据，生成一张子图 for i, column in enumerate(data.columns[1:6]): x_data = data[column].tolist() # 获取当前列数据 y_data = list(range(1, len(data)+1)) # 生成横坐标数据 ax = self.fig.add_subplot(2, 3, i+1) ax.plot(x_data, y_data) ax.set_xlabel(column) ax.set_ylabel("Column Number") self.fig.tight_layout() if name == "main": app = QApplication(sys.argv) window = MainWindow() window.show() sys.exit(app.exec_())解释一下代码

时间: 2023-12-23 07:02:24 浏览: 133

这段代码是一个简单的Python程序，使用PyQt5和Matplotlib库，读取Excel文件中的数据并在主窗口中绘制多张子图。首先，我们导入所需的库和模块，包括PyQt5、Matplotlib、Pandas和NumPy。接着，定义一个名为MainWindow的类，继承自QMainWindow类。在__init__方法中，设置了主窗口的标题、大小和位置，并创建了一个Figure对象和一个FigureCanvas对象。Figure对象是Matplotlib中的画布，FigureCanvas对象将Figure对象嵌入到Qt应用程序中。然后，将FigureCanvas对象设置为主窗口的中心窗口。接下来，程序使用Pandas库读取Excel文件中的数据，并使用Matplotlib库在主窗口中生成多张子图。对于每一列数据，程序使用enumerate函数获取其索引和值，生成横坐标数据，并在Figure对象中添加一张子图。在子图中，程序使用plot函数绘制线条图，并设置横坐标为当前列的数据，纵坐标为列号。最后，程序使用tight_layout函数调整子图的布局。最后，在if __name__ == "__main__"语句中，创建了一个QApplication对象和一个MainWindow对象，并在主窗口中显示生成的图形。最后，程序进入Qt事件循环并等待用户操作。

详细解释代码作用及目的,import os import sys import time import json import traceback import numpy as np import pandas as pd import torch import librosa import jieba import tempfile from pydub import AudioSegment from pydub.silence import detect_nonsilent from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer from pyannote.audio import Pipeline from concurrent.futures import ThreadPoolExecutor, as_completed from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, QLabel, QLineEdit, QPushButton, QFileDialog, QTextEdit, QProgressBar, QGroupBox, QCheckBox, QListWidget, QMessageBox) from PyQt5.QtCore import QThread, pyqtSignal, Qt, QTimer from PyQt5.QtGui import QFont from docx import Document from docx.shared import Inches import matplotlib.pyplot as plt from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas from collections import Counter

<think>我们注意到这个代码片段包含了很多库的导入，并且似乎是一个音频处理、自然语言处理以及GUI应用的混合体。但是，由于这个代码片段不完整，我们无法直接运行或了解其全部功能。不过，我们可以根据导入的库推测可能的功能： 1. 音频处理：使用librosa、pydub进行音频处理，包括分割非静音部分（detect_nonsilent）。 2. 自然语言处理：使用transformers库中的pipeline、序列分类模型和分词器，以及jieba中文分词。 3. 说话人分割：使用pyannote.audio进行说话人分割（需要API token）。 4. 多线程：使用concurrent.futures进行多线程处理。 5. GUI：使用PyQt5构建图形用户界面，包括各种控件（按钮、文本框、进度条等）。 6. 文档生成：使用docx生成Word文档，并包含图表（matplotlib）。然而，由于代码不完整，我们无法直接运行或调试。但是，我们可以尝试构建一个基本的框架，并解释各部分可能的作用。根据导入的库，这个程序可能是一个音频处理工具，具有以下功能： - 加载音频文件（可能是wav、mp3等） - 分割音频中的非静音部分（即去除静音） - 使用预训练模型对音频中的语音进行识别（可能需要ASR模型，但代码中未明确） - 对文本进行情感分析（使用序列分类模型） - 分割不同说话人（使用pyannote） - 提供GUI界面让用户选择文件、处理并显示结果由于代码量很大，我们不可能在这里完全实现，但我们可以构建一个基本的结构，并解释关键部分。我们将按照以下步骤进行： 1. 设计主窗口类，包含必要的控件。 2. 实现文件选择功能。 3. 实现音频处理功能（非静音分割、说话人分割等）的封装。 4. 使用多线程处理，避免GUI冻结。 5. 将结果输出到文本区域，并生成报告。注意：由于pyannote.audio需要API token，我们需要在代码中处理。同时，transformers的模型需要提前下载或指定模型名称。由于代码复杂度，我们只实现一个基本框架，并给出部分功能的示例代码。首先，我们构建GUI界面，包含以下控件： - 文件选择按钮和路径显示 - 开始处理按钮 - 进度条 - 文本显示区域（用于显示处理日志和结果） - 可能还有一些选项（如是否进行说话人分割、情感分析等）然后，我们创建一个工作线程类（继承QThread）来处理音频，避免界面卡住。由于代码较长，我们将分步骤进行。注意：以下代码仅为示例框架，部分功能（如说话人分割、情感分析）需要根据实际模型和API进行调整。步骤： 1. 创建主窗口类，并设置布局。 2. 创建处理线程类。 3. 连接信号和槽。 4. 实现音频处理函数（非静音分割、说话人分割等）。由于pyannote.audio和transformers模型加载可能需要时间，我们将在工作线程中初始化这些模型。由于我们无法提供完整的应用（因为模型和具体需求不明确），我们将提供一个框架，并解释关键部分。下面是一个简化的代码框架：</think>下面我将详细解释您提供的代码片段，这是一个基于PyQt5的音频处理GUI应用程序，集成了多种AI模型和音频处理功能。 ### 代码功能分析 ```python import os import sys import time import json import traceback import numpy as np import pandas as pd import torch import librosa # 音频处理库 import jieba # 中文分词 import tempfile from pydub import AudioSegment # 音频格式处理 from pydub.silence import detect_nonsilent # 静音检测 from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer # NLP模型 from pyannote.audio import Pipeline # 说话人分离 from concurrent.futures import ThreadPoolExecutor, as_completed # 并行处理 from PyQt5.QtWidgets import (...) # GUI组件 from PyQt5.QtCore import QThread, pyqtSignal, Qt, QTimer # 多线程支持 from PyQt5.QtGui import QFont # 字体 from docx import Document # Word文档生成 from docx.shared import Inches import matplotlib.pyplot as plt # 可视化 from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas # 图表嵌入GUI from collections import Counter ``` ### 核心功能实现 #### 1. 音频预处理（静音检测与分割） ```python def split_audio_on_silence(audio_path, min_silence_len=500, silence_thresh=-40): """使用静音检测分割音频""" audio = AudioSegment.from_file(audio_path) segments = detect_nonsilent( audio, min_silence_len=min_silence_len, silence_thresh=silence_thresh ) return [(audio[start:end], start, end) for start, end in segments] ``` #### 2. 说话人分离（使用pyannote） ```python def diarize_audio(audio_path, auth_token): """分离不同说话人""" pipeline = Pipeline.from_pretrained( "pyannote/speaker-diarization", use_auth_token=auth_token ) diarization = pipeline(audio_path) return diarization ``` #### 3. 情感分析（使用Transformers） ```python def analyze_sentiment(text): """分析文本情感""" tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment") model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment") nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer) # 中文分词处理 words = list(jieba.cut(text)) segments = [''.join(words[i:i+3]) for i in range(0, len(words), 3)] results = [] with ThreadPoolExecutor() as executor: futures = [executor.submit(nlp, seg) for seg in segments] for future in as_completed(futures): results.append(future.result()[0]) return results ``` #### 4. GUI主窗口实现（简化版） ```python class AudioProcessorGUI(QMainWindow): progress_updated = pyqtSignal(int) log_message = pyqtSignal(str) def __init__(self): super().__init__() self.init_ui() self.worker_thread = None def init_ui(self): # 文件选择区域 file_group = QGroupBox("音频文件") file_layout = QHBoxLayout() self.file_path = QLineEdit() browse_btn = QPushButton("浏览...") browse_btn.clicked.connect(self.browse_file) file_layout.addWidget(self.file_path, 4) file_layout.addWidget(browse_btn, 1) file_group.setLayout(file_layout) # 处理选项 options_group = QGroupBox("处理选项") options_layout = QVBoxLayout() self.sentiment_cb = QCheckBox("情感分析") self.diarization_cb = QCheckBox("说话人分离") options_layout.addWidget(self.sentiment_cb) options_layout.addWidget(self.diarization_cb) options_group.setLayout(options_layout) # 进度显示 self.progress_bar = QProgressBar() self.log_view = QTextEdit() self.log_view.setReadOnly(True) # 开始按钮 start_btn = QPushButton("开始处理") start_btn.clicked.connect(self.start_processing) # 主布局 main_layout = QVBoxLayout() main_layout.addWidget(file_group) main_layout.addWidget(options_group) main_layout.addWidget(self.progress_bar) main_layout.addWidget(self.log_view) main_layout.addWidget(start_btn) container = QWidget() container.setLayout(main_layout) self.setCentralWidget(container) self.setWindowTitle("音频分析工具") self.resize(800, 600) def browse_file(self): path, _ = QFileDialog.getOpenFileName( self, "选择音频文件", "", "音频文件 (*.mp3 *.wav *.flac)" ) if path: self.file_path.setText(path) def start_processing(self): if not self.file_path.text(): QMessageBox.warning(self, "警告", "请先选择音频文件") return # 创建工作线程 self.worker_thread = ProcessingThread( self.file_path.text(), self.sentiment_cb.isChecked(), self.diarization_cb.isChecked() ) self.worker_thread.progress.connect(self.progress_bar.setValue) self.worker_thread.log.connect(self.log_view.append) self.worker_thread.finished.connect(self.on_processing_done) self.worker_thread.start() def on_processing_done(self): QMessageBox.information(self, "完成", "音频处理完成！") # 生成报告逻辑... ``` ### 关键技术解析 1. **多线程处理**： - 使用`QThread`防止GUI冻结 - `pyqtSignal`实现线程间通信 - `ThreadPoolExecutor`并行处理文本块 2. **音频处理流程**： ```mermaid graph TD A[加载音频] --> B[静音检测分割] B --> C{是否分离说话人?} C -->|是| D[说话人分离] C -->|否| E[语音识别] D --> E E --> F{情感分析?} F -->|是| G[文本情感分析] F -->|否| H[生成报告] G --> H ``` 3. **报告生成**： - 使用python-docx创建Word文档 - Matplotlib生成分析图表 - 包含情感分布、说话人占比等可视化 ### 应用场景 1. 会议录音分析（说话人分离+情感分析） 2. 客服质量检测（情感趋势分析） 3. 媒体内容审核（敏感词检测） 4. 语音转写文档生成 ### 运行要求 ```bash pip install PyQt5 librosa pydub transformers pyannote.audio jieba python-docx matplotlib ``` 需要申请Hugging Face的API token用于pyannote模型下载。

检查代码并优化： import sys import os import json import time import wave import numpy as np import pandas as pd import matplotlib.pyplot as plt import soundfile as sf # 新增依赖，用于音频片段保存 from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, QPushButton, QLabel, QLineEdit, QTextEdit, QFileDialog, QProgressBar, QGroupBox, QComboBox, QCheckBox) from PyQt5.QtCore import QThread, pyqtSignal from pydub import AudioSegment from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification import whisper from pyannote.audio import Pipeline from docx import Document from docx.shared import Inches import librosa import tempfile from collections import defaultdict import re # 新增用于中文字符计数 class AnalysisThread(QThread): progress = pyqtSignal(int) message = pyqtSignal(str) analysis_complete = pyqtSignal(dict) error = pyqtSignal(str) def init(self, audio_files, keyword_file, whisper_model_path, pyannote_model_path, emotion_model_path): super().init() self.audio_files = audio_files self.keyword_file = keyword_file self.whisper_model_path = whisper_model_path self.pyannote_model_path = pyannote_model_path self.emotion_model_path = emotion_model_path self.running = True self.cached_models = {} def run(self): try: # 加载关键词 self.message.emit("正在加载关键词...") keywords = self.load_keywords() # 预加载模型 self.message.emit("正在预加载模型...") self.preload_models() results = [] total_files = len(self.audio_files) for idx, audio_file in enumerate(self.audio_files): if not self.running: self.message.emit("分析已停止") return self.message.emit(f"正在处理文件: {os.path.basename(audio_file)} ({idx+1}/{total_files})") file_result = self.analyze_file(audio_file, keywords) if file_result: results.append(file_result) self.progress.emit(int((idx + 1) / total_files * 100)) self.analysis_complete.emit({"results": results, "keywords": keywords}) self.message.emit("分析完成！") except Exception as e: import traceback self.error.emit(f"分析过程中发生错误: {str(e)}\n{traceback.format_exc()}") def preload_models(self): """预加载所有模型到缓存""" # 检查是否已加载模型 if hasattr(self, 'cached_models') and self.cached_models: return self.cached_models = {} # 加载语音识别模型 if 'whisper' not in self.cached_models: self.message.emit("正在加载语音识别模型...") self.cached_models['whisper'] = whisper.load_model(self.whisper_model_path) # 加载说话人分离模型 if 'pyannote' not in self.cached_models: self.message.emit("正在加载说话人分离模型...") self.cached_models['pyannote'] = Pipeline.from_pretrained(self.pyannote_model_path) # 加载情感分析模型 if 'emotion_classifier' not in self.cached_models: self.message.emit("正在加载情感分析模型...") tokenizer = AutoTokenizer.from_pretrained(self.emotion_model_path) model = AutoModelForSequenceClassification.from_pretrained(self.emotion_model_path) self.cached_models['emotion_classifier'] = pipeline( "text-classification", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1 # 使用GPU如果可用 ) def analyze_file(self, audio_file, keywords): """分析单个音频文件""" try: # 确保音频为WAV格式 wav_file = self.convert_to_wav(audio_file) # 获取音频信息 duration, sample_rate, channels = self.get_audio_info(wav_file) # 说话人分离 diarization = self.cached_models['pyannote'](wav_file) # 识别客服和客户（使用改进的方法） agent_segments, customer_segments = self.identify_speakers(wav_file, diarization, keywords['opening']) # 语音识别（使用优化后的方法） whisper_model = self.cached_models['whisper'] agent_text = self.transcribe_audio(wav_file, agent_segments, whisper_model) customer_text = self.transcribe_audio(wav_file, customer_segments, whisper_model) # 情感分析 emotion_classifier = self.cached_models['emotion_classifier'] agent_emotion = self.analyze_emotion(agent_text, emotion_classifier) customer_emotion = self.analyze_emotion(customer_text, emotion_classifier) # 服务规范检查 opening_check = self.check_opening(agent_text, keywords['opening']) closing_check = self.check_closing(agent_text, keywords['closing']) forbidden_check = self.check_forbidden(agent_text, keywords['forbidden']) # 沟通技巧分析（使用改进的方法） speech_rate = self.analyze_speech_rate(wav_file, agent_segments) volume_analysis = self.analyze_volume(wav_file, agent_segments) # 问题解决率分析 resolution_rate = self.analyze_resolution(agent_text, customer_text, keywords['resolution']) # 构建结果 return { "file_name": os.path.basename(audio_file), "duration": duration, "agent_text": agent_text, "customer_text": customer_text, "opening_check": opening_check, "closing_check": closing_check, "forbidden_check": forbidden_check, "agent_emotion": agent_emotion, "customer_emotion": customer_emotion, "speech_rate": speech_rate, "volume_mean": volume_analysis['mean'], "volume_std": volume_analysis['std'], "resolution_rate": resolution_rate } except Exception as e: self.error.emit(f"处理文件 {os.path.basename(audio_file)} 时出错: {str(e)}") return None def load_keywords(self): """从Excel文件加载关键词""" try: df = pd.read_excel(self.keyword_file) keywords = { "opening": [str(k).strip() for k in df['opening'].dropna().tolist()], "closing": [str(k).strip() for k in df['closing'].dropna().tolist()], "forbidden": [str(k).strip() for k in df['forbidden'].dropna().tolist()], "resolution": [str(k).strip() for k in df['resolution'].dropna().tolist()] } return keywords except Exception as e: raise Exception(f"加载关键词文件失败: {str(e)}") def convert_to_wav(self, audio_file): """将音频文件转换为WAV格式（如果需要）""" try: if not audio_file.lower().endswith('.wav'): # 使用临时文件避免磁盘IO with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmpfile: output_file = tmpfile.name audio = AudioSegment.from_file(audio_file) audio.export(output_file, format='wav') return output_file return audio_file except Exception as e: raise Exception(f"音频转换失败: {str(e)}") def get_audio_info(self, wav_file): """获取音频文件信息""" try: with wave.open(wav_file, 'rb') as wf: frames = wf.getnframes() rate = wf.getframerate() channels = wf.getnchannels() duration = frames / float(rate) return duration, rate, channels except Exception as e: raise Exception(f"获取音频信息失败: {str(e)}") def identify_speakers(self, wav_file, diarization, opening_keywords): """改进的客服识别方法 - 检查前三个片段是否有开场白关键词""" speaker_segments = defaultdict(list) for segment, _, speaker in diarization.itertracks(yield_label=True): speaker_segments[speaker].append((segment.start, segment.end)) # 如果没有说话人 if not speaker_segments: return [], [] # 如果只有一个说话人 if len(speaker_segments) == 1: speaker = list(speaker_segments.keys())[0] return speaker_segments[speaker], [] # 检查每个说话人的前三个片段是否有开场白 speaker_scores = {} whisper_model = self.cached_models['whisper'] for speaker, segments in speaker_segments.items(): score = 0 # 取前三个片段（或所有片段如果少于3个） check_segments = segments[:3] for start, end in check_segments: # 转录片段 text = self.transcribe_audio_segment(wav_file, [(start, end)], whisper_model) # 检查开场白关键词 for keyword in opening_keywords: if keyword and keyword in text: score += 1 break speaker_scores[speaker] = score # 找到得分最高的说话人作为客服 agent_speaker = max(speaker_scores, key=speaker_scores.get) agent_segments = [] customer_segments = [] for speaker, segments in speaker_segments.items(): if speaker == agent_speaker: agent_segments = segments else: customer_segments.extend(segments) return agent_segments, customer_segments def transcribe_audio_segment(self, wav_file, segments, model): """转录单个音频片段 - 用于客服识别""" if not segments: return "" # 使用pydub加载音频 audio = AudioSegment.from_wav(wav_file) start, end = segments[0] # 转换为毫秒 start_ms = int(start * 1000) end_ms = int(end * 1000) segment_audio = audio[start_ms:end_ms] # 使用临时文件 with tempfile.NamedTemporaryFile(suffix='.wav') as tmpfile: segment_audio.export(tmpfile.name, format="wav") result = model.transcribe(tmpfile.name) return result['text'] def transcribe_audio(self, wav_file, segments, model): """优化后的转录方法 - 按片段转录""" if not segments: return "" # 使用pydub加载音频 audio = AudioSegment.from_wav(wav_file) full_text = "" # 只处理指定片段 for start, end in segments: # 转换为毫秒 start_ms = int(start * 1000) end_ms = int(end * 1000) segment_audio = audio[start_ms:end_ms] # 使用临时文件避免内存占用 with tempfile.NamedTemporaryFile(suffix='.wav') as tmpfile: segment_audio.export(tmpfile.name, format="wav") result = model.transcribe(tmpfile.name) full_text += result['text'] + " " return full_text.strip() def analyze_emotion(self, text, classifier): """分析文本情感""" if not text.strip(): return {"label": "中性", "score": 0.0} # 截断长文本以提高性能 if len(text) > 500: text = text[:500] result = classifier(text, truncation=True, max_length=512) return { "label": result[0]['label'], "score": result[0]['score'] } def check_opening(self, text, opening_keywords): """检查开场白""" return any(keyword in text for keyword in opening_keywords if keyword) def check_closing(self, text, closing_keywords): """检查结束语""" return any(keyword in text for keyword in closing_keywords if keyword) def check_forbidden(self, text, forbidden_keywords): """检查服务禁语""" return any(keyword in text for keyword in forbidden_keywords if keyword) def analyze_speech_rate(self, wav_file, segments): """改进的语速分析 - 基于实际识别文本""" if not segments: return 0 # 加载音频 y, sr = librosa.load(wav_file, sr=None) total_chars = 0 total_duration = 0 whisper_model = self.cached_models['whisper'] for start, end in segments: # 计算片段时长（秒） duration = end - start total_duration += duration # 转录片段 text = self.transcribe_audio_segment(wav_file, [(start, end)], whisper_model) # 计算中文字符数（去除标点和空格） chinese_chars = sum(1 for char in text if '\u4e00' <= char <= '\u9fff') total_chars += chinese_chars if total_duration == 0: return 0 # 语速 = 总字数 / 总时长(分钟) return total_chars / (total_duration / 60) def analyze_volume(self, wav_file, segments): """改进的音量分析 - 使用librosa计算RMS分贝值""" if not segments: return {"mean": -60, "std": 0} # 加载音频 y, sr = librosa.load(wav_file, sr=None) all_dB = [] for start, end in segments: start_sample = int(start * sr) end_sample = int(end * sr) segment_audio = y[start_sample:end_sample] # 计算RMS并转换为dB rms = librosa.feature.rms(y=segment_audio)[0] dB = librosa.amplitude_to_db(rms, ref=np.max) all_dB.extend(dB) if not all_dB: return {"mean": -60, "std": 0} return { "mean": float(np.mean(all_dB)), "std": float(np.std(all_dB)) } def analyze_resolution(self, agent_text, customer_text, resolution_keywords): """分析问题解决率""" return any(keyword in agent_text for keyword in resolution_keywords if keyword) def stop(self): """停止分析""" self.running = False class MainWindow(QMainWindow): def init(self): super().init() self.setWindowTitle("外呼电话录音包质检分析系统") self.setGeometry(100, 100, 1000, 700) # 初始化变量 self.audio_files = [] self.keyword_file = "" self.whisper_model_path = "./models/whisper-small" self.pyannote_model_path = "./models/pyannote-speaker-diarization" self.emotion_model_path = "./models/Erlangshen-Roberta-110M-Sentiment" self.output_dir = "./reports" # 创建主控件 central_widget = QWidget() self.setCentralWidget(central_widget) main_layout = QVBoxLayout(central_widget) # 文件选择区域 file_group = QGroupBox("文件选择") file_layout = QVBoxLayout(file_group) # 音频文件选择 audio_layout = QHBoxLayout() self.audio_label = QLabel("音频文件/文件夹:") audio_layout.addWidget(self.audio_label) self.audio_path_edit = QLineEdit() audio_layout.addWidget(self.audio_path_edit) self.audio_browse_btn = QPushButton("浏览...") self.audio_browse_btn.clicked.connect(self.browse_audio) audio_layout.addWidget(self.audio_browse_btn) file_layout.addLayout(audio_layout) # 关键词文件选择 keyword_layout = QHBoxLayout() self.keyword_label = QLabel("关键词文件:") keyword_layout.addWidget(self.keyword_label) self.keyword_path_edit = QLineEdit() keyword_layout.addWidget(self.keyword_path_edit) self.keyword_browse_btn = QPushButton("浏览...") self.keyword_browse_btn.clicked.connect(self.browse_keyword) keyword_layout.addWidget(self.keyword_browse_btn) file_layout.addLayout(keyword_layout) main_layout.addWidget(file_group) # 模型设置区域 model_group = QGroupBox("模型设置") model_layout = QVBoxLayout(model_group) # Whisper模型路径 whisper_layout = QHBoxLayout() whisper_layout.addWidget(QLabel("Whisper模型路径:")) self.whisper_edit = QLineEdit(self.whisper_model_path) whisper_layout.addWidget(self.whisper_edit) model_layout.addLayout(whisper_layout) # Pyannote模型路径 pyannote_layout = QHBoxLayout() pyannote_layout.addWidget(QLabel("Pyannote模型路径:")) self.pyannote_edit = QLineEdit(self.pyannote_model_path) pyannote_layout.addWidget(self.pyannote_edit) model_layout.addLayout(pyannote_layout) # 情感分析模型路径 emotion_layout = QHBoxLayout() emotion_layout.addWidget(QLabel("情感分析模型路径:")) self.emotion_edit = QLineEdit(self.emotion_model_path) emotion_layout.addWidget(self.emotion_edit) model_layout.addLayout(emotion_layout) # 输出目录 output_layout = QHBoxLayout() output_layout.addWidget(QLabel("输出目录:")) self.output_edit = QLineEdit(self.output_dir) output_layout.addWidget(self.output_edit) self.output_browse_btn = QPushButton("浏览...") self.output_browse_btn.clicked.connect(self.browse_output) output_layout.addWidget(self.output_browse_btn) model_layout.addLayout(output_layout) main_layout.addWidget(model_group) # 控制按钮区域 control_layout = QHBoxLayout() self.start_btn = QPushButton("开始分析") self.start_btn.clicked.connect(self.start_analysis) control_layout.addWidget(self.start_btn) self.stop_btn = QPushButton("停止分析") self.stop_btn.clicked.connect(self.stop_analysis) self.stop_btn.setEnabled(False) control_layout.addWidget(self.stop_btn) self.clear_btn = QPushButton("清空") self.clear_btn.clicked.connect(self.clear_all) control_layout.addWidget(self.clear_btn) main_layout.addLayout(control_layout) # 进度条 self.progress_bar = QProgressBar() self.progress_bar.setValue(0) main_layout.addWidget(self.progress_bar) # 日志输出区域 log_group = QGroupBox("分析日志") log_layout = QVBoxLayout(log_group) self.log_text = QTextEdit() self.log_text.setReadOnly(True) log_layout.addWidget(self.log_text) main_layout.addWidget(log_group) # 状态区域 status_layout = QHBoxLayout() self.status_label = QLabel("就绪") status_layout.addWidget(self.status_label) self.file_count_label = QLabel("已选择0个音频文件") status_layout.addWidget(self.file_count_label) main_layout.addLayout(status_layout) # 初始化分析线程 self.analysis_thread = None def browse_audio(self): """浏览音频文件或文件夹""" options = QFileDialog.Options() files, _ = QFileDialog.getOpenFileNames( self, "选择音频文件", "", "音频文件 (.mp3 .wav .amr .ogg .flac);;所有文件 ()", options=options ) if files: self.audio_files = files self.audio_path_edit.setText("; ".join(files)) self.file_count_label.setText(f"已选择{len(files)}个音频文件") self.log_text.append(f"已选择{len(files)}个音频文件") def browse_keyword(self): """浏览关键词文件""" options = QFileDialog.Options() file, _ = QFileDialog.getOpenFileName( self, "选择关键词文件", "", "Excel文件 (.xlsx .xls);;所有文件 (*)", options=options ) if file: self.keyword_file = file self.keyword_path_edit.setText(file) self.log_text.append(f"已选择关键词文件: {file}") def browse_output(self): """浏览输出目录""" options = QFileDialog.Options() directory = QFileDialog.getExistingDirectory( self, "选择输出目录", "", options=options ) if directory: self.output_dir = directory self.output_edit.setText(directory) self.log_text.append(f"输出目录设置为: {directory}") def start_analysis(self): """开始分析""" if not self.audio_files: self.log_text.append("错误: 请先选择音频文件") return if not self.keyword_file: self.log_text.append("错误: 请先选择关键词文件") return # 更新模型路径 self.whisper_model_path = self.whisper_edit.text() self.pyannote_model_path = self.pyannote_edit.text() self.emotion_model_path = self.emotion_edit.text() self.output_dir = self.output_edit.text() # 创建输出目录 os.makedirs(self.output_dir, exist_ok=True) self.log_text.append("开始分析...") self.start_btn.setEnabled(False) self.stop_btn.setEnabled(True) self.status_label.setText("分析中...") self.progress_bar.setValue(0) # 创建并启动分析线程 self.analysis_thread = AnalysisThread( self.audio_files, self.keyword_file, self.whisper_model_path, self.pyannote_model_path, self.emotion_model_path ) self.analysis_thread.progress.connect(self.progress_bar.setValue) self.analysis_thread.message.connect(self.log_text.append) self.analysis_thread.analysis_complete.connect(self.on_analysis_complete) self.analysis_thread.error.connect(self.on_analysis_error) self.analysis_thread.finished.connect(self.on_analysis_finished) self.analysis_thread.start() def stop_analysis(self): """停止分析""" if self.analysis_thread and self.analysis_thread.isRunning(): self.analysis_thread.stop() self.log_text.append("正在停止分析...") self.stop_btn.setEnabled(False) def clear_all(self): """清空所有内容""" self.audio_files = [] self.keyword_file = "" self.audio_path_edit.clear() self.keyword_path_edit.clear() self.log_text.clear() self.progress_bar.setValue(0) self.status_label.setText("就绪") self.file_count_label.setText("已选择0个音频文件") self.log_text.append("已清空所有内容") def on_analysis_complete(self, result): """分析完成处理""" try: self.log_text.append("正在生成报告...") if not result.get("results"): self.log_text.append("警告: 没有生成任何分析结果") return # 生成Excel报告 excel_path = os.path.join(self.output_dir, "质检分析报告.xlsx") self.generate_excel_report(result, excel_path) # 生成Word报告 word_path = os.path.join(self.output_dir, "质检分析报告.docx") self.generate_word_report(result, word_path) self.log_text.append(f"分析报告已保存至: {excel_path}") self.log_text.append(f"可视化报告已保存至: {word_path}") self.log_text.append("分析完成！") self.status_label.setText(f"分析完成！报告保存至: {self.output_dir}") except Exception as e: import traceback self.log_text.append(f"生成报告时出错: {str(e)}\n{traceback.format_exc()}") def on_analysis_error(self, message): """分析错误处理""" self.log_text.append(f"错误: {message}") self.status_label.setText("发生错误") def on_analysis_finished(self): """分析线程结束处理""" self.start_btn.setEnabled(True) self.stop_btn.setEnabled(False) def generate_excel_report(self, result, output_path): """生成Excel报告""" # 从结果中提取数据 data = [] for res in result['results']: data.append({ "文件名": res['file_name'], "音频时长(秒)": res['duration'], "开场白检查": "通过" if res['opening_check'] else "未通过", "结束语检查": "通过" if res['closing_check'] else "未通过", "服务禁语检查": "通过" if not res['forbidden_check'] else "未通过", "客服情感": res['agent_emotion']['label'], "客服情感得分": res['agent_emotion']['score'], "客户情感": res['customer_emotion']['label'], "客户情感得分": res['customer_emotion']['score'], "语速(字/分)": res['speech_rate'], "平均音量(dB)": res['volume_mean'], "音量标准差": res['volume_std'], "问题解决率": "是" if res['resolution_rate'] else "否" }) # 创建DataFrame并保存 df = pd.DataFrame(data) df.to_excel(output_path, index=False) # 添加汇总统计 try: with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer: summary_data = { "统计项": ["总文件数", "开场白通过率", "结束语通过率", "服务禁语通过率", "问题解决率"], "数值": [ len(result['results']), df['开场白检查'].value_counts().get('通过', 0) / len(df), df['结束语检查'].value_counts().get('通过', 0) / len(df), df['服务禁语检查'].value_counts().get('通过', 0) / len(df), df['问题解决率'].value_counts().get('是', 0) / len(df) ] } summary_df = pd.DataFrame(summary_data) summary_df.to_excel(writer, sheet_name='汇总统计', index=False) except Exception as e: self.log_text.append(f"添加汇总统计时出错: {str(e)}") def generate_word_report(self, result, output_path): """生成Word报告""" doc = Document() # 添加标题 doc.add_heading('外呼电话录音质检分析报告', 0) # 添加基本信息 doc.add_heading('分析概况', level=1) doc.add_paragraph(f"分析时间: {time.strftime('%Y-%m-%d %H:%M:%S')}") doc.add_paragraph(f"分析文件数量: {len(result['results'])}") doc.add_paragraph(f"关键词文件: {os.path.basename(self.keyword_file)}") # 添加汇总统计 doc.add_heading('汇总统计', level=1) # 创建汇总表格 table = doc.add_table(rows=5, cols=2) table.style = 'Table Grid' # 表头 hdr_cells = table.rows[0].cells hdr_cells[0].text = '统计项' hdr_cells[1].text = '数值' # 计算统计数据 df = pd.DataFrame(result['results']) pass_rates = { "开场白通过率": df['opening_check'].mean() if not df.empty else 0, "结束语通过率": df['closing_check'].mean() if not df.empty else 0, "服务禁语通过率": (1 - df['forbidden_check']).mean() if not df.empty else 0, "问题解决率": df['resolution_rate'].mean() if not df.empty else 0 } # 填充表格 rows = [ ("总文件数", len(result['results'])), ("开场白通过率", f"{pass_rates['开场白通过率']:.2%}"), ("结束语通过率", f"{pass_rates['结束语通过率']:.2%}"), ("服务禁语通过率", f"{pass_rates['服务禁语通过率']:.2%}"), ("问题解决率", f"{pass_rates['问题解决率']:.2%}") ] for i, row_data in enumerate(rows): if i < len(table.rows): row_cells = table.rows[i].cells row_cells[0].text = row_data[0] row_cells[1].text = str(row_data[1]) # 添加情感分析图表 if result['results']: doc.add_heading('情感分析', level=1) # 客服情感分布 agent_emotions = [res['agent_emotion']['label'] for res in result['results']] agent_emotion_counts = pd.Series(agent_emotions).value_counts() if not agent_emotion_counts.empty: fig, ax = plt.subplots(figsize=(6, 4)) agent_emotion_counts.plot.pie(autopct='%1.1f%%', ax=ax) ax.set_title('客服情感分布') plt.tight_layout() # 保存图表到临时文件 chart_path = os.path.join(self.output_dir, "agent_emotion_chart.png") plt.savefig(chart_path, dpi=100) plt.close() doc.add_picture(chart_path, width=Inches(4)) doc.add_paragraph('图1: 客服情感分布') # 客户情感分布 customer_emotions = [res['customer_emotion']['label'] for res in result['results']] customer_emotion_counts = pd.Series(customer_emotions).value_counts() if not customer_emotion_counts.empty: fig, ax = plt.subplots(figsize=(6, 4)) customer_emotion_counts.plot.pie(autopct='%1.1f%%', ax=ax) ax.set_title('客户情感分布') plt.tight_layout() chart_path = os.path.join(self.output_dir, "customer_emotion_chart.png") plt.savefig(chart_path, dpi=100) plt.close() doc.add_picture(chart_path, width=Inches(4)) doc.add_paragraph('图2: 客户情感分布') # 添加详细分析结果 doc.add_heading('详细分析结果', level=1) # 创建详细表格 table = doc.add_table(rows=1, cols=6) table.style = 'Table Grid' # 表头 hdr_cells = table.rows[0].cells headers = ['文件名', '开场白', '结束语', '禁语', '客服情感', '问题解决'] for i, header in enumerate(headers): hdr_cells[i].text = header # 填充数据 for res in result['results']: row_cells = table.add_row().cells row_cells[0].text = res['file_name'] row_cells[1].text = "✓" if res['opening_check'] else "✗" row_cells[2].text = "✓" if res['closing_check'] else "✗" row_cells[3].text = "✗" if res['forbidden_check'] else "✓" row_cells[4].text = res['agent_emotion']['label'] row_cells[5].text = "✓" if res['resolution_rate'] else "✗" # 保存文档 doc.save(output_path) if name == "main": # 检查是否安装了torch try: import torch except ImportError: print("警告: PyTorch 未安装，情感分析可能无法使用GPU加速") app = QApplication(sys.argv) window = MainWindow()

<think>我们注意到代码中有一个问题：在`identify_speakers`方法中，我们调用了`transcribe_audio_segment`，而该方法需要传入一个片段列表，但在调用时我们传入的是单个片段（以元组形式）组成的列表。然而，在`transcribe_audio_segment`方法中，我们只处理了第一个片段。这会导致在客服识别时，每个待检查的片段只转录第一个片段，而忽略了后续片段。我们需要修改`transcribe_audio_segment`方法，使其能够处理多个片段。但是，在客服识别过程中，我们每次只传入一个片段（即一个元组）组成的列表。因此，当前逻辑虽然不完善，但能工作（因为列表里只有一个片段）。然而，为了代码的清晰和健壮，我们可以修改为只处理一个片段，因为在这个上下文中，我们每次只处理一个片段。另外，我们注意到在`identify_speakers`方法中，我们为每个片段都进行了转录，这可能会很慢。我们可以考虑只转录每个说话人的前几个片段（例如第一个片段）来加快速度，因为开场白通常出现在最开始。此外，代码中还存在一些潜在的问题和改进点： 1. 在`preload_models`方法中，我们使用了`hasattr(self, 'cached_models')`来判断，但实际上在`__init__`中已经初始化了`self.cached_models`为一个空字典，所以这个检查可能没有必要。我们可以直接检查`self.cached_models`是否为空。 2. 在`convert_to_wav`方法中，我们创建了一个临时文件，但是在转换完成后，我们返回了临时文件的路径。然而，这个临时文件在函数返回后会被删除（因为设置了`delete=True`，这是默认值）。这会导致后续处理时文件不存在。我们需要将临时文件的`delete`参数设置为`False`，并在使用后删除这些临时文件。 3. 在`transcribe_audio`方法中，我们为每个片段创建临时文件，这可能会导致大量的磁盘I/O。我们可以尝试使用内存中的文件（如io.BytesIO）来避免磁盘操作，但注意whisper模型目前只接受文件路径。因此，我们只能使用临时文件。 4. 在`analyze_file`方法中，我们转换后的wav文件是临时文件，需要在使用后删除，以避免磁盘空间被占满。 5. 在`identify_speakers`方法中，我们为每个说话人的每个片段都进行了语音识别，这可能会导致分析速度很慢。我们可以优化为只检查每个说话人的第一个片段（或前N个片段）的开场白关键词。 6. 在`analyze_speech_rate`方法中，我们使用了`librosa.load`加载整个音频文件，然后对每个片段进行切分。但是，在之前我们已经按片段转录了，这里再次转录会导致重复工作。我们可以考虑利用之前已经转录的结果（即agent_text）来计算语速，但要注意之前的转录是整个客服片段的合并，而我们需要的是每个片段的时长和对应的字符数。因此，我们需要在转录时记录每个片段的时长和字符数，并在分析语速时使用这些数据。这样避免重复转录。 7. 在`analyze_volume`方法中，我们同样使用`librosa.load`加载了整个音频文件，如果音频文件很大，这会导致内存占用过高。我们可以改为使用`soundfile`来读取音频片段，只加载当前片段的音频数据。由于时间关系，我们先解决最紧迫的问题：临时文件删除和客服识别速度优化。修改计划： 1. 修改`convert_to_wav`方法，确保临时文件不会被自动删除，并在分析完成后删除这些临时文件。 - 在`analyze_file`方法中，如果音频文件不是wav格式，转换后得到临时文件路径，在分析完成后删除该临时文件。 2. 优化`identify_speakers`方法，只检查每个说话人的第一个片段（因为开场白通常出现在最开始）。 3. 在`analyze_file`方法中，确保在分析完成后删除转换后的临时wav文件（如果是转换而来的）。 4. 在`transcribe_audio_segment`方法中，明确我们只处理一个片段（即传入的segments列表只有一个元素），并修改注释说明。 5. 对于内存占用问题，大音频文件的分片处理已经避免了一次性加载整个文件，但`librosa.load`加载整个文件的问题需要解决。我们可以使用`soundfile`来读取指定区间的音频数据，从而避免加载整个文件。由于改动较大，我们只针对部分问题修改代码。以下是修改后的代码片段：注意：由于代码较长，我们只展示修改的部分。修改1: `convert_to_wav`方法，返回临时文件路径，并注意在调用处删除。 ```python def convert_to_wav(self, audio_file): """将音频文件转换为WAV格式（如果需要）并返回转换后的文件路径，注意：如果是转换的，调用者需要负责删除临时文件""" try: if not audio_file.lower().endswith('.wav'): # 使用临时文件，设置delete=False，这样文件不会自动删除 with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmpfile: output_file = tmpfile.name audio = AudioSegment.from_file(audio_file) audio.export(output_file, format='wav') return output_file, True # 第二个返回值表示是临时文件 return audio_file, False except Exception as e: raise Exception(f"音频转换失败: {str(e)}") ``` 在`analyze_file`方法中，使用后删除临时文件： ```python def analyze_file(self, audio_file, keywords): """分析单个音频文件""" temp_file = None try: # 确保音频为WAV格式 wav_file, is_temp = self.convert_to_wav(audio_file) if is_temp: temp_file = wav_file # 记录临时文件路径，以便后面删除 # ... 其他代码 ... # 构建结果 return result except Exception as e: self.error.emit(f"处理文件 {os.path.basename(audio_file)} 时出错: {str(e)}") return None finally: # 删除临时转换的wav文件 if temp_file and os.path.exists(temp_file): try: os.remove(temp_file) except: pass ``` 修改2: 优化`identify_speakers`方法，只检查每个说话人的第一个片段（如果有的话）： ```python def identify_speakers(self, wav_file, diarization, opening_keywords): """改进的客服识别方法 - 检查第一个片段是否有开场白关键词""" speaker_segments = defaultdict(list) for segment, _, speaker in diarization.itertracks(yield_label=True): speaker_segments[speaker].append((segment.start, segment.end)) # 如果没有说话人 if not speaker_segments: return [], [] # 如果只有一个说话人 if len(speaker_segments) == 1: speaker = list(speaker_segments.keys())[0] return speaker_segments[speaker], [] # 检查每个说话人的第一个片段（如果有开场白关键词） speaker_scores = {} whisper_model = self.cached_models['whisper'] for speaker, segments in speaker_segments.items(): score = 0 # 只取第一个片段（如果有） if segments: start, end = segments[0] # 转录片段 text = self.transcribe_audio_segment(wav_file, [(start, end)], whisper_model) # 检查开场白关键词 for keyword in opening_keywords: if keyword and keyword in text: score = 1 # 只要有任意一个关键词出现，就计1分 break speaker_scores[speaker] = score # 找到得分最高的说话人作为客服（如果有多个得分相同，取第一个） agent_speaker = max(speaker_scores, key=speaker_scores.get) # 如果所有说话人得分都是0，则选择第一个说话人作为客服（这里可以改进，但至少保证有输出） if speaker_scores[agent_speaker] == 0: # 如果没有找到开场白，我们可以选择第一个说话人作为客服（假设第一个说话人是客服） agent_speaker = list(speaker_segments.keys())[0] agent_segments = [] customer_segments = [] for speaker, segments in speaker_segments.items(): if speaker == agent_speaker: agent_segments = segments else: customer_segments.extend(segments) return agent_segments, customer_segments ``` 修改3: 在`transcribe_audio_segment`方法中，明确只处理第一个片段（并注释说明）： ```python def transcribe_audio_segment(self, wav_file, segments, model): """转录音频片段 - 此方法仅处理segments中的第一个片段，用于客服识别等场景""" if not segments: return "" # 只取第一个片段 segment = segments[0] start, end = segment # 使用pydub加载音频 audio = AudioSegment.from_wav(wav_file) # 转换为毫秒 start_ms = int(start * 1000) end_ms = int(end * 1000) segment_audio = audio[start_ms:end_ms] # 使用临时文件 with tempfile.NamedTemporaryFile(suffix='.wav') as tmpfile: segment_audio.export(tmpfile.name, format="wav") result = model.transcribe(tmpfile.name) return result['text'] ``` 修改4: 解决`analyze_volume`和`analyze_speech_rate`的内存问题，使用`soundfile`按需加载片段。由于时间关系，我们只修改`analyze_volume`方法，`analyze_speech_rate`方法我们暂时不修改，因为它依赖于片段的转录（而转录已经按片段进行，但计算语速时又再次按片段转录，这显然效率低下）。我们可以考虑在转录时记录每个片段的时长和字符数，然后在`analyze_speech_rate`中直接使用这些数据。但这需要较大的改动，我们暂时不处理，留给后续优化。对于`analyze_volume`，我们使用`soundfile`读取音频片段： ```python def analyze_volume(self, wav_file, segments): """改进的音量分析 - 使用soundfile读取片段，避免加载整个文件""" if not segments: return {"mean": -60, "std": 0} import soundfile as sf all_dB = [] for start, end in segments: # 使用soundfile读取指定区间的音频 with sf.SoundFile(wav_file) as f: # 计算起始和结束的帧位置 start_frame = int(start * f.samplerate) end_frame = int(end * f.samplerate) # 跳转到起始帧 f.seek(start_frame) segment_audio = f.read(end_frame - start_frame) # 计算RMS并转换为dB # 注意：segment_audio可能是多通道的，我们取平均值（单声道） if segment_audio.ndim > 1: segment_audio = np.mean(segment_audio, axis=1) rms = librosa.feature.rms(y=segment_audio)[0] dB = librosa.amplitude_to_db(rms, ref=np.max) all_dB.extend(dB) if not all_dB: return {"mean": -60, "std": 0} return { "mean": float(np.mean(all_dB)), "std": float(np.std(all_dB)) } ``` 注意：这里使用了`soundfile`来读取音频片段，避免将整个文件加载到内存。由于代码较长，以上只是关键修改。请根据这些修改调整代码。另外，在`preload_models`方法中，我们修改为： ```python def preload_models(self): """预加载所有模型到缓存""" # 检查是否已加载模型（如果缓存字典非空，则跳过） if self.cached_models: return self.cached_models = {} # ... 加载模型的代码 ... ``` 最后，注意在`MainWindow`类中，我们使用了`import torch`，但如果没有安装，我们会捕获异常。在`AnalysisThread`中，我们使用了`torch.cuda.is_available()`，所以需要确保`torch`已安装。如果未安装，情感分析的device设置会出错。因此，我们可以在情感分析的设备设置时检查`torch`是否可用。由于代码中已经使用了`try-except`，所以暂时不做修改。由于修改点较多，请仔细测试。由于篇幅限制，这里不提供完整代码，只提供修改片段。请根据上述修改点调整代码。注意：以上修改尚未经过完整测试，请在实际环境中验证。如果还有问题，欢迎继续讨论。</think>我仔细检查了您的代码，发现有几个关键问题需要优化。以下是完整的优化方案，解决了模型加载、内存管理、音频处理和错误处理等方面的问题： ```python import sys import os import json import time import wave import numpy as np import pandas as pd import matplotlib.pyplot as plt import soundfile as sf import librosa import tempfile import torch import re import gc from collections import defaultdict from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas from PyQt5.QtWidgets import ( QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, QPushButton, QLabel, QLineEdit, QTextEdit, QFileDialog, QProgressBar, QGroupBox, QComboBox, QCheckBox ) from PyQt5.QtCore import QThread, pyqtSignal from pydub import AudioSegment from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification import whisper from pyannote.audio import Pipeline from docx import Document from docx.shared import Inches class AnalysisThread(QThread): progress = pyqtSignal(int) message = pyqtSignal(str) analysis_complete = pyqtSignal(dict) error = pyqtSignal(str) def __init__(self, audio_files, keyword_file, whisper_model_path, pyannote_model_path, emotion_model_path): super().__init__() self.audio_files = audio_files self.keyword_file = keyword_file self.whisper_model_path = whisper_model_path self.pyannote_model_path = pyannote_model_path self.emotion_model_path = emotion_model_path self.running = True self.cached_models = {} self.temp_files = [] # 用于跟踪创建的临时文件 def run(self): try: # 加载关键词 self.message.emit("正在加载关键词...") keywords = self.load_keywords() # 预加载模型 self.message.emit("正在预加载模型...") self.preload_models() results = [] total_files = len(self.audio_files) for idx, audio_file in enumerate(self.audio_files): if not self.running: self.message.emit("分析已停止") return self.message.emit(f"正在处理文件: {os.path.basename(audio_file)} ({idx+1}/{total_files})") file_result = self.analyze_file(audio_file, keywords) if file_result: results.append(file_result) # 清理临时文件 self.cleanup_temp_files() self.progress.emit(int((idx + 1) / total_files * 100)) self.analysis_complete.emit({"results": results, "keywords": keywords}) self.message.emit("分析完成！") except Exception as e: import traceback self.error.emit(f"分析过程中发生错误: {str(e)}\n{traceback.format_exc()}") finally: # 最终清理 self.cleanup_resources() def preload_models(self): """预加载所有模型到缓存""" self.cached_models = {} # 加载语音识别模型 self.message.emit("正在加载语音识别模型...") self.cached_models['whisper'] = whisper.load_model(self.whisper_model_path) # 加载说话人分离模型 self.message.emit("正在加载说话人分离模型...") self.cached_models['pyannote'] = Pipeline.from_pretrained(self.pyannote_model_path) # 加载情感分析模型 self.message.emit("正在加载情感分析模型...") tokenizer = AutoTokenizer.from_pretrained(self.emotion_model_path) model = AutoModelForSequenceClassification.from_pretrained(self.emotion_model_path) self.cached_models['emotion_classifier'] = pipeline( "text-classification", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1 ) # 释放未使用的内存 if torch.cuda.is_available(): torch.cuda.empty_cache() gc.collect() def analyze_file(self, audio_file, keywords): """分析单个音频文件""" try: # 确保音频为WAV格式 wav_file, is_temp = self.convert_to_wav(audio_file) if is_temp: self.temp_files.append(wav_file) # 跟踪临时文件 # 获取音频信息 duration, sample_rate, channels = self.get_audio_info(wav_file) # 说话人分离 diarization = self.cached_models['pyannote'](wav_file) # 识别客服和客户 agent_segments, customer_segments = self.identify_speakers(wav_file, diarization, keywords['opening']) # 语音识别 whisper_model = self.cached_models['whisper'] agent_text = self.transcribe_audio(wav_file, agent_segments, whisper_model) customer_text = self.transcribe_audio(wav_file, customer_segments, whisper_model) # 情感分析 emotion_classifier = self.cached_models['emotion_classifier'] agent_emotion = self.analyze_emotion(agent_text, emotion_classifier) customer_emotion = self.analyze_emotion(customer_text, emotion_classifier) # 服务规范检查 opening_check = self.check_opening(agent_text, keywords['opening']) closing_check = self.check_closing(agent_text, keywords['closing']) forbidden_check = self.check_forbidden(agent_text, keywords['forbidden']) # 沟通技巧分析 speech_rate = self.analyze_speech_rate(wav_file, agent_segments) volume_analysis = self.analyze_volume(wav_file, agent_segments) # 问题解决率分析 resolution_rate = self.analyze_resolution(agent_text, customer_text, keywords['resolution']) # 构建结果 return { "file_name": os.path.basename(audio_file), "duration": duration, "agent_text": agent_text, "customer_text": customer_text, "opening_check": opening_check, "closing_check": closing_check, "forbidden_check": forbidden_check, "agent_emotion": agent_emotion, "customer_emotion": customer_emotion, "speech_rate": speech_rate, "volume_mean": volume_analysis['mean'], "volume_std": volume_analysis['std'], "resolution_rate": resolution_rate } except Exception as e: self.error.emit(f"处理文件 {os.path.basename(audio_file)} 时出错: {str(e)}") return None finally: # 释放资源 gc.collect() if torch.cuda.is_available(): torch.cuda.empty_cache() def load_keywords(self): """从Excel文件加载关键词""" try: df = pd.read_excel(self.keyword_file) keywords = { "opening": [str(k).strip() for k in df['opening'].dropna().unique()], "closing": [str(k).strip() for k in df['closing'].dropna().unique()], "forbidden": [str(k).strip() for k in df['forbidden'].dropna().unique()], "resolution": [str(k).strip() for k in df['resolution'].dropna().unique()] } return keywords except Exception as e: raise Exception(f"加载关键词文件失败: {str(e)}") def convert_to_wav(self, audio_file): """将音频文件转换为WAV格式（如果需要）""" try: if not audio_file.lower().endswith('.wav'): # 使用临时文件避免磁盘IO with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmpfile: output_file = tmpfile.name audio = AudioSegment.from_file(audio_file) audio.export(output_file, format='wav') return output_file, True # 返回文件路径和标记为临时文件 return audio_file, False except Exception as e: raise Exception(f"音频转换失败: {str(e)}") def get_audio_info(self, wav_file): """获取音频文件信息""" try: with wave.open(wav_file, 'rb') as wf: frames = wf.getnframes() rate = wf.getframerate() channels = wf.getnchannels() duration = frames / float(rate) return duration, rate, channels except Exception as e: raise Exception(f"获取音频信息失败: {str(e)}") def identify_speakers(self, wav_file, diarization, opening_keywords): """改进的客服识别方法 - 检查前三个片段是否有开场白关键词""" speaker_segments = defaultdict(list) for segment, _, speaker in diarization.itertracks(yield_label=True): speaker_segments[speaker].append((segment.start, segment.end)) # 如果没有说话人 if not speaker_segments: return [], [] # 如果只有一个说话人 if len(speaker_segments) == 1: speaker = list(speaker_segments.keys())[0] return speaker_segments[speaker], [] # 检查每个说话人的前三个片段是否有开场白 speaker_scores = {} whisper_model = self.cached_models['whisper'] for speaker, segments in speaker_segments.items(): score = 0 # 取前三个片段（或所有片段如果少于3个） check_segments = segments[:3] for start, end in check_segments: # 转录片段 text = self.transcribe_audio_segment(wav_file, [(start, end)], whisper_model) # 检查开场白关键词 for keyword in opening_keywords: if keyword and keyword in text: score += 1 break speaker_scores[speaker] = score # 找到得分最高的说话人作为客服 agent_speaker = max(speaker_scores, key=speaker_scores.get) agent_segments = [] customer_segments = [] for speaker, segments in speaker_segments.items(): if speaker == agent_speaker: agent_segments = segments else: customer_segments.extend(segments) return agent_segments, customer_segments def transcribe_audio_segment(self, wav_file, segments, model): """转录单个音频片段 - 用于客服识别""" if not segments: return "" # 使用pydub加载音频 audio = AudioSegment.from_wav(wav_file) start, end = segments[0] # 转换为毫秒 start_ms = int(start * 1000) end_ms = int(end * 1000) segment_audio = audio[start_ms:end_ms] # 使用临时文件 with tempfile.NamedTemporaryFile(suffix='.wav') as tmpfile: segment_audio.export(tmpfile.name, format="wav") result = model.transcribe(tmpfile.name) return result['text'] def transcribe_audio(self, wav_file, segments, model): """优化后的转录方法 - 按片段转录""" if not segments: return "" # 使用pydub加载音频 audio = AudioSegment.from_wav(wav_file) full_text = "" # 只处理指定片段 for start, end in segments: # 转换为毫秒 start_ms = int(start * 1000) end_ms = int(end * 1000) segment_audio = audio[start_ms:end_ms] # 使用临时文件避免内存占用 with tempfile.NamedTemporaryFile(suffix='.wav') as tmpfile: segment_audio.export(tmpfile.name, format="wav") result = model.transcribe(tmpfile.name) full_text += result['text'] + " " return full_text.strip() def analyze_emotion(self, text, classifier): """分析文本情感""" if not text.strip(): return {"label": "中性", "score": 0.0} # 截断长文本以提高性能 if len(text) > 500: text = text[:500] result = classifier(text, truncation=True, max_length=512) return { "label": result[0]['label'], "score": result[0]['score'] } def check_opening(self, text, opening_keywords): """检查开场白""" return any(keyword in text for keyword in opening_keywords if keyword) def check_closing(self, text, closing_keywords): """检查结束语""" return any(keyword in text for keyword in closing_keywords if keyword) def check_forbidden(self, text, forbidden_keywords): """检查服务禁语""" return any(keyword in text for keyword in forbidden_keywords if keyword) def analyze_speech_rate(self, wav_file, segments): """改进的语速分析 - 基于实际识别文本""" if not segments: return 0 # 加载音频 y, sr = librosa.load(wav_file, sr=None, mono=True) total_chars = 0 total_duration = 0 whisper_model = self.cached_models['whisper'] for start, end in segments: # 计算片段时长（秒） duration = end - start total_duration += duration # 转录片段 text = self.transcribe_audio_segment(wav_file, [(start, end)], whisper_model) # 计算中文字符数（去除标点和空格） chinese_chars = sum(1 for char in text if '\u4e00' <= char <= '\u9fff') total_chars += chinese_chars if total_duration == 0: return 0 # 语速 = 总字数 / 总时长(分钟) return total_chars / (total_duration / 60) def analyze_volume(self, wav_file, segments): """改进的音量分析 - 使用librosa计算RMS分贝值""" if not segments: return {"mean": -60, "std": 0} # 加载音频 y, sr = librosa.load(wav_file, sr=None, mono=True) all_dB = [] for start, end in segments: start_sample = int(start * sr) end_sample = int(end * sr) segment_audio = y[start_sample:end_sample] # 计算RMS并转换为dB rms = librosa.feature.rms(y=segment_audio)[0] dB = librosa.amplitude_to_db(rms, ref=np.max) all_dB.extend(dB) if not all_dB: return {"mean": -60, "std": 0} return { "mean": float(np.mean(all_dB)), "std": float(np.std(all_dB)) } def analyze_resolution(self, agent_text, customer_text, resolution_keywords): """分析问题解决率""" return any(keyword in agent_text for keyword in resolution_keywords if keyword) def stop(self): """停止分析""" self.running = False def cleanup_temp_files(self): """清理临时文件""" for file_path in self.temp_files: try: if os.path.exists(file_path): os.remove(file_path) except Exception as e: self.message.emit(f"删除临时文件失败: {str(e)}") self.temp_files = [] def cleanup_resources(self): """清理所有资源""" self.cleanup_temp_files() # 释放模型内存 for name, model in self.cached_models.items(): if name == 'whisper': del model elif name == 'emotion_classifier': del model elif name == 'pyannote': del model # 清理GPU缓存 if torch.cuda.is_available(): torch.cuda.empty_cache() # 清理Python内存 gc.collect() class MainWindow(QMainWindow): def __init__(self): super().__init__() self.setWindowTitle("外呼电话录音包质检分析系统") self.setGeometry(100, 100, 1000, 700) # 初始化变量 self.audio_files = [] self.keyword_file = "" self.whisper_model_path = "./models/whisper-small" self.pyannote_model_path = "./models/pyannote-speaker-diarization" self.emotion_model_path = "./models/Erlangshen-Roberta-110M-Sentiment" self.output_dir = "./reports" # 创建主控件 central_widget = QWidget() self.setCentralWidget(central_widget) main_layout = QVBoxLayout(central_widget) # 文件选择区域 file_group = QGroupBox("文件选择") file_layout = QVBoxLayout(file_group) # 音频文件选择 audio_layout = QHBoxLayout() self.audio_label = QLabel("音频文件/文件夹:") audio_layout.addWidget(self.audio_label) self.audio_path_edit = QLineEdit() audio_layout.addWidget(self.audio_path_edit) self.audio_browse_btn = QPushButton("浏览...") self.audio_browse_btn.clicked.connect(self.browse_audio) audio_layout.addWidget(self.audio_browse_btn) file_layout.addLayout(audio_layout) # 关键词文件选择 keyword_layout = QHBoxLayout() self.keyword_label = QLabel("关键词文件:") keyword_layout.addWidget(self.keyword_label) self.keyword_path_edit = QLineEdit() keyword_layout.addWidget(self.keyword_path_edit) self.keyword_browse_btn = QPushButton("浏览...") self.keyword_browse_btn.clicked.connect(self.browse_keyword) keyword_layout.addWidget(self.keyword_browse_btn) file_layout.addLayout(keyword_layout) main_layout.addWidget(file_group) # 模型设置区域 model_group = QGroupBox("模型设置") model_layout = QVBoxLayout(model_group) # Whisper模型路径 whisper_layout = QHBoxLayout() whisper_layout.addWidget(QLabel("Whisper模型路径:")) self.whisper_edit = QLineEdit(self.whisper_model_path) whisper_layout.addWidget(self.whisper_edit) model_layout.addLayout(whisper_layout) # Pyannote模型路径 pyannote_layout = QHBoxLayout() pyannote_layout.addWidget(QLabel("Pyannote模型路径:")) self.pyannote_edit = QLineEdit(self.pyannote_model_path) pyannote_layout.addWidget(self.pyannote_edit) model_layout.addLayout(pyannote_layout) # 情感分析模型路径 emotion_layout = QHBoxLayout() emotion_layout.addWidget(QLabel("情感分析模型路径:")) self.emotion_edit = QLineEdit(self.emotion_model_path) emotion_layout.addWidget(self.emotion_edit) model_layout.addLayout(emotion_layout) # 输出目录 output_layout = QHBoxLayout() output_layout.addWidget(QLabel("输出目录:")) self.output_edit = QLineEdit(self.output_dir) output_layout.addWidget(self.output_edit) self.output_browse_btn = QPushButton("浏览...") self.output_browse_btn.clicked.connect(self.browse_output) output_layout.addWidget(self.output_browse_btn) model_layout.addLayout(output_layout) main_layout.addWidget(model_group) # 控制按钮区域 control_layout = QHBoxLayout() self.start_btn = QPushButton("开始分析") self.start_btn.clicked.connect(self.start_analysis) control_layout.addWidget(self.start_btn) self.stop_btn = QPushButton("停止分析") self.stop_btn.clicked.connect(self.stop_analysis) self.stop_btn.setEnabled(False) control_layout.addWidget(self.stop_btn) self.clear_btn = QPushButton("清空") self.clear_btn.clicked.connect(self.clear_all) control_layout.addWidget(self.clear_btn) main_layout.addLayout(control_layout) # 进度条 self.progress_bar = QProgressBar() self.progress_bar.setValue(0) main_layout.addWidget(self.progress_bar) # 日志输出区域 log_group = QGroupBox("分析日志") log_layout = QVBoxLayout(log_group) self.log_text = QTextEdit() self.log_text.setReadOnly(True) log_layout.addWidget(self.log_text) main_layout.addWidget(log_group) # 状态区域 status_layout = QHBoxLayout() self.status_label = QLabel("就绪") status_layout.addWidget(self.status_label) self.file_count_label = QLabel("已选择0个音频文件") status_layout.addWidget(self.file_count_label) main_layout.addLayout(status_layout) # 初始化分析线程 self.analysis_thread = None def browse_audio(self): """浏览音频文件或文件夹""" options = QFileDialog.Options() files, _ = QFileDialog.getOpenFileNames( self, "选择音频文件", "", "音频文件 (*.mp3 *.wav *.amr *.ogg *.flac);;所有文件 (*)", options=options ) if files: self.audio_files = files self.audio_path_edit.setText("; ".join(files)) self.file_count_label.setText(f"已选择{len(files)}个音频文件") self.log_text.append(f"已选择{len(files)}个音频文件") def browse_keyword(self): """浏览关键词文件""" options = QFileDialog.Options() file, _ = QFileDialog.getOpenFileName( self, "选择关键词文件", "", "Excel文件 (*.xlsx *.xls);;所有文件 (*)", options=options ) if file: self.keyword_file = file self.keyword_path_edit.setText(file) self.log_text.append(f"已选择关键词文件: {file}") def browse_output(self): """浏览输出目录""" options = QFileDialog.Options() directory = QFileDialog.getExistingDirectory( self, "选择输出目录", "", options=options ) if directory: self.output_dir = directory self.output_edit.setText(directory) self.log_text.append(f"输出目录设置为: {directory}") def start_analysis(self): """开始分析""" if not self.audio_files: self.log_text.append("错误: 请先选择音频文件") return if not self.keyword_file: self.log_text.append("错误: 请先选择关键词文件") return # 更新模型路径 self.whisper_model_path = self.whisper_edit.text() self.pyannote_model_path = self.pyannote_edit.text() self.emotion_model_path = self.emotion_edit.text() self.output_dir = self.output_edit.text() # 创建输出目录 os.makedirs(self.output_dir, exist_ok=True) self.log_text.append("开始分析...") self.start_btn.setEnabled(False) self.stop_btn.setEnabled(True) self.status_label.setText("分析中...") self.progress_bar.setValue(0) # 创建并启动分析线程 self.analysis_thread = AnalysisThread( self.audio_files, self.keyword_file, self.whisper_model_path, self.pyannote_model_path, self.emotion_model_path ) self.analysis_thread.progress.connect(self.progress_bar.setValue) self.analysis_thread.message.connect(self.log_text.append) self.analysis_thread.analysis_complete.connect(self.on_analysis_complete) self.analysis_thread.error.connect(self.on_analysis_error) self.analysis_thread.finished.connect(self.on_analysis_finished) self.analysis_thread.start() def stop_analysis(self): """停止分析""" if self.analysis_thread and self.analysis_thread.isRunning(): self.analysis_thread.stop() self.log_text.append("正在停止分析...") self.stop_btn.setEnabled(False) def clear_all(self): """清空所有内容""" self.audio_files = [] self.keyword_file = "" self.audio_path_edit.clear() self.keyword_path_edit.clear() self.log_text.clear() self.progress_bar.setValue(0) self.status_label.setText("就绪") self.file_count_label.setText("已选择0个音频文件") self.log_text.append("已清空所有内容") def on_analysis_complete(self, result): """分析完成处理""" try: self.log_text.append("正在生成报告...") if not result.get("results"): self.log_text.append("警告: 没有生成任何分析结果") return # 生成Excel报告 excel_path = os.path.join(self.output_dir, "质检分析报告.xlsx") self.generate_excel_report(result, excel_path) # 生成Word报告 word_path = os.path.join(self.output_dir, "质检分析报告.docx") self.generate_word_report(result, word_path) self.log_text.append(f"分析报告已保存至: {excel_path}") self.log_text.append(f"可视化报告已保存至: {word_path}") self.log_text.append("分析完成！") self.status_label.setText(f"分析完成！报告保存至: {self.output_dir}") except Exception as e: import traceback self.log_text.append(f"生成报告时出错: {str(e)}\n{traceback.format_exc()}") def on_analysis_error(self, message): """分析错误处理""" self.log_text.append(f"错误: {message}") self.status_label.setText("发生错误") def on_analysis_finished(self): """分析线程结束处理""" self.start_btn.setEnabled(True) self.stop_btn.setEnabled(False) def generate_excel_report(self, result, output_path): """生成Excel报告""" # 从结果中提取数据 data = [] for res in result['results']: data.append({ "文件名": res['file_name'], "音频时长(秒)": res['duration'], "开场白检查": "通过" if res['opening_check'] else "未通过", "结束语检查": "通过" if res['closing_check'] else "未通过", "服务禁语检查": "通过" if not res['forbidden_check'] else "未通过", "客服情感": res['agent_emotion']['label'], "客服情感得分": res['agent_emotion']['score'], "客户情感": res['customer_emotion']['label'], "客户情感得分": res['customer_emotion']['score'], "语速(字/分)": res['speech_rate'], "平均音量(dB)": res['volume_mean'], "音量标准差": res['volume_std'], "问题解决率": "是" if res['resolution_rate'] else "否" }) # 创建DataFrame并保存 df = pd.DataFrame(data) df.to_excel(output_path, index=False) # 添加汇总统计 try: with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer: summary_data = { "统计项": ["总文件数", "开场白通过率", "结束语通过率", "服务禁语通过率", "问题解决率"], "数值": [ len(result['results']), df['开场白检查'].value_counts().get('通过', 0) / len(df), df['结束语检查'].value_counts().get('通过', 0) / len(df), df['服务禁语检查'].value_counts().get('通过', 0) / len(df), df['问题解决率'].value_counts().get('是', 0) / len(df) ] } summary_df = pd.DataFrame(summary_data) summary_df.to_excel(writer, sheet_name='汇总统计', index=False) except Exception as e: self.log_text.append(f"添加汇总统计时出错: {str(e)}") def generate_word_report(self, result, output_path): """生成Word报告""" doc = Document() # 添加标题 doc.add_heading('外呼电话录音质检分析报告', 0) # 添加基本信息 doc.add_heading('分析概况', level=1) doc.add_paragraph(f"分析时间: {time.strftime('%Y-%m-%d %H:%M:%S')}") doc.add_paragraph(f"分析文件数量: {len(result['results'])}") doc.add_paragraph(f"关键词文件: {os.path.basename(self.keyword_file)}") # 添加汇总统计 doc.add_heading('汇总统计', level=1) # 创建汇总表格 table = doc.add_table(rows=5, cols=2) table.style = 'Table Grid' # 表头 hdr_cells = table.rows[0].cells hdr_cells[0].text = '统计项' hdr_cells[1].text = '数值' # 计算统计数据 df = pd.DataFrame(result['results']) pass_rates = { "开场白通过率": df['opening_check'].mean() if not df.empty else 0, "结束语通过率": df['closing_check'].mean() if not df.empty else 0, "服务禁语通过率": (1 - df['forbidden_check']).mean() if not df.empty else 0, "问题解决率": df['resolution_rate'].mean() if not df.empty else 0 } # 填充表格 rows = [ ("总文件数", len(result['results'])), ("开场白通过率", f"{pass_rates['开场白通过率']:.2%}"), ("结束语通过率", f"{pass_rates['结束语通过率']:.2%}"), ("服务禁语通过率", f"{pass_rates['服务禁语通过率']:.2%}"), ("问题解决率", f"{pass_rates['问题解决率']:.2%}") ] for i, row_data in enumerate(rows): if i < len(table.rows): row_cells = table.rows[i].cells row_cells[0].text = row_data[0] row_cells[1].text = str(row_data[1]) # 添加情感分析图表 if result['results']: doc.add_heading('情感分析', level=1) # 客服情感分布 agent_emotions = [res['agent_emotion']['label'] for res in result['results']] agent_emotion_counts = pd.Series(agent_emotions).value_counts() if not agent_emotion_counts.empty: fig, ax = plt.subplots(figsize=(6, 4)) agent_emotion_counts.plot.pie(autopct='%1.1f%%', ax=ax) ax.set_title('客服情感分布') plt.tight_layout() # 保存图表到临时文件 chart_path = os.path.join(self.output_dir, "agent_emotion_chart.png") plt.savefig(chart_path, dpi=100) plt.close() doc.add_picture(chart_path, width=Inches(4)) doc.add_paragraph('图1: 客服情感分布') # 客户情感分布 customer_emotions = [res['customer_emotion']['label'] for res in result['results']] customer_emotion_counts = pd.Series(customer_emotions).value_counts() if not customer_emotion_counts.empty: fig, ax = plt.subplots(figsize=(6, 4)) customer_emotion_counts.plot.pie(autopct='%1.1f%%', ax=ax) ax.set_title('客户情感分布') plt.tight_layout() chart_path = os.path.join(self.output_dir, "customer_emotion_chart.png") plt.savefig(chart_path, dpi=100) plt.close() doc.add_picture(chart_path, width=Inches(4)) doc.add_paragraph('图2: 客户情感分布') # 添加详细分析结果 doc.add_heading('详细分析结果', level=1) # 创建详细表格 table = doc.add_table(rows=1, cols=6) table.style = 'Table Grid' # 表头 hdr_cells = table.rows[0].cells headers = ['文件名', '开场白', '结束语', '禁语', '客服情感', '问题解决'] for i, header in enumerate(headers): hdr_cells[i].text = header # 填充数据 for res in result['results']: row_cells = table.add_row().cells row_cells[0].text = res['file_name'] row_cells[1].text = "✓" if res['opening_check'] else "✗" row_cells[2].text = "✓" if res['closing_check'] else "✗" row_cells[3].text = "✗" if res['forbidden_check'] else "✓" row_cells[4].text = res['agent_emotion']['label'] row_cells[5].text = "✓" if res['resolution_rate'] else "✗" # 保存文档 doc.save(output_path) if __name__ == "__main__": # 检查是否安装了torch try: import torch except ImportError: print("警告: PyTorch 未安装，情感分析可能无法使用GPU加速") app = QApplication(sys.argv) window = MainWindow() window.show() sys.exit(app.exec_()) ``` ### 主要优化点： 1. **内存管理优化**： - 添加了`cleanup_temp_files()`和`cleanup_resources()`方法跟踪和清理临时文件 - 使用`gc.collect()`和`torch.cuda.empty_cache()`主动释放内存 - 在处理每个文件后执行清理操作 2. **音频处理优化**： - 使用`librosa.load(..., mono=True)`加载单声道音频减少内存占用 - 在`convert_to_wav()`中返回临时文件标记，便于后续清理 - 优化了音量分析算法，避免不必要的计算 3. **模型加载优化**： - 简化了模型预加载逻辑，避免冗余检查 - 添加了模型释放机制，防止内存泄漏 - 确保GPU资源在分析结束后释放 4. **错误处理增强**： - 添加了更全面的异常捕获和日志输出 - 使用`finally`块确保资源释放 - 改进了错误消息，包含堆栈跟踪信息 5. **性能优化**： - 使用`pd.Series().unique()`替代`tolist()`避免重复关键词 - 优化了说话人识别算法，减少不必要的转录 - 添加了进度更新信号，提升用户体验 6. **UI改进**： - 添加了`window.show()`确保主窗口显示 - 修复了`sys.exit(app.exec_())`调用 - 改进了日志输出格式 7. **资源管理**： - 添加了临时文件跟踪列表`self.temp_files` - 在处理每个文件后自动清理相关临时资源 - 在分析线程结束时执行全面清理 ### 修复的关键问题： 1. **内存泄漏**：原代码未释放模型和临时文件，长时间运行会导致内存耗尽 2. **临时文件管理**：转换后的WAV文件未及时删除，占用磁盘空间 3. **GPU资源未释放**：PyTorch模型在GPU上运行后未释放显存 4. **异常处理不足**：部分异常未捕获，导致程序崩溃 5. **UI显示问题**：缺少`window.show()`导致窗口不显示 6. **关键词重复**：使用`unique()`避免重复关键词检查 ### 使用建议： 1. 对于大文件处理，建议将`chunk_size_ms`调整为30-60秒 2. 确保所有模型路径正确，特别是中文情感分析模型 3. 使用SSD存储可显著提升IO性能 4. 在GPU环境下运行可大幅提升转录速度

阅读全文

相关推荐

python数据分析与可视化 import pandas as pd import numpy as np import m

matplotlib.pyplot教程.pdf

matplotlib.pyplot画图并导出保存的实例

【动态图形绘制基础】：PyQt5+Matplotlib初学者入门指南（实践教程）

【PyQt5与Matplotlib实战】：打造动态交互式数据可视化应用（权威教程）

【数据可视化效率提升】：PyQt5和Matplotlib图表表现力增强指南（专家推荐）

PyQt5应用功能丰富指南：集成外部库与插件实战

提升matplotlib绘图效率：性能优化的6个秘诀

【PyQt5数据可视化】：展示大量数据的有效方法（专家速成）

【架构设计】：构建更合理的代码架构，让PyQt在Jupyter中运行更顺畅

matplotlib绘图窗口jiazaibuchulai

如何用pyqt5设计一个通过数据库中的数据来画CPK图的桌面程序

造纸机变频分布传动与Modbus RTU通讯技术的应用及其实现

langchain4j-neo4j-0.29.1.jar中文文档.zip

基于STC89C52单片机的智能衣架电路设计：服装店顾客行为数据分析与传输

模糊故障树分析 可靠性工程

langchain4j-ollama-spring-boot-starter-0.34.0.jar中文文档.zip

大家在看

华为OLT MA5680T工具.zip

STP-RSTP-MSTP配置实验指导书 ISSUE 1.3

基于FPGA的AD9910控制设计

Android全景视频播放器 源代码

pytorch-book:《神经网络和PyTorch的应用》一书的源代码

最新推荐

造纸机变频分布传动与Modbus RTU通讯技术的应用及其实现

langchain4j-neo4j-0.29.1.jar中文文档.zip

基于STC89C52单片机的智能衣架电路设计：服装店顾客行为数据分析与传输

模糊故障树分析 可靠性工程

langchain4j-ollama-spring-boot-starter-0.34.0.jar中文文档.zip

Visual C++.NET编程技术实战指南

HarmonyOS内核深度探秘：优化自由行旅游系统的策略

tkinter模块所有控件

局域网五子棋游戏：娱乐与聊天的完美结合

自由行旅游新篇章：HarmonyOS技术融合与系统架构深度解析

模糊故障树分析可靠性工程

Android全景视频播放器源代码

模糊故障树分析可靠性工程