
What does `from scipy.stats import threshold` mean?

`from scipy.stats import threshold` is a line of Python code that imports the `threshold` function from `scipy.stats`, SciPy's statistics module. In old SciPy releases, `threshold(a, threshmin=None, threshmax=None, newval=0)` clipped an array: every element below `threshmin` or above `threshmax` was replaced with `newval` (0 by default), while values inside the interval were left unchanged. The function has since been deprecated and removed, so this import only works with very old SciPy versions; in current code the same effect is obtained with NumPy, for example `np.where` or boolean indexing.
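For illustration, the snippet below reproduces what the removed function did using plain NumPy; it is a sketch of the old documented behaviour, not a call into current SciPy:

```python
import numpy as np

data = np.array([1.0, 4.0, 7.0, 12.0])

# The removed scipy.stats.threshold(data, threshmin=3, threshmax=10, newval=0)
# replaced every element outside [3, 10] with 0; np.where does the same thing.
clipped = np.where((data < 3) | (data > 10), 0.0, data)
print(clipped)  # values outside [3, 10] become 0
```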
Related questions

ImportError: cannot import name 'threshold' from 'scipy.stats'

`ImportError: cannot import name 'threshold' from 'scipy.stats'` means that `scipy.stats` no longer provides a `threshold` function: it was deprecated and then removed from SciPy (around version 0.17), so the import fails on any reasonably recent installation. Upgrading SciPy will therefore not fix the error. Check which version you have with
```
pip show scipy
```
and replace the call with a NumPy equivalent (`np.clip`, `np.where`, or boolean indexing); only environments pinned to a pre-0.17 SciPy still ship the function.
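If legacy code relies on the removed function, a small replacement avoids pinning an old SciPy. This is a sketch based on the old signature `threshold(a, threshmin=None, threshmax=None, newval=0)`:

```python
import numpy as np

def threshold(a, threshmin=None, threshmax=None, newval=0):
    """Replacement for the removed scipy.stats.threshold."""
    a = np.array(a, copy=True)
    mask = np.zeros(a.shape, dtype=bool)
    if threshmin is not None:
        mask |= a < threshmin
    if threshmax is not None:
        mask |= a > threshmax
    a[mask] = newval      # values outside [threshmin, threshmax] are replaced
    return a

print(threshold([1, 4, 7, 12], threshmin=3, threshmax=10))  # 1 and 12 become 0
```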

Please explain this code and check whether there are any problems:

```python
import numpy as np
import pandas as pd
import math
# import json
# import os
import sys
import ast
# from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr
from IPython.display import display
# from PIL import Image
# from matplotlib.path import Path
import argparse
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
# from scipy.spatial import ConvexHull
# from matplotlib.patches import Patch

##########
# spotsize = 1.0  ## point size for plotting
# sample = "WMQ-584-mHeart"
# c2lpropfile = "/data1/zhaoshutao/projectworkspace/cell2location_20240715/heart_eyeball_c2l/heart_eyeball_c2l_max/WMQ-584-mHeart/1000/cell2location_map/st_cell2location_res.csv"
# ctfreq_threshold = 0.5  # cell-abundance threshold, used to filter cell abundance (applied to the converted actual abundances)
# # location to save the results
# savepath = "/data1/zhaoshutao/projectworkspace/nucleus_recognition/sandiantu_20240716_c2l_card_cyto/20240830_knn_res_n_neighbors/c2l_max_no_problem"
##########

## command-line arguments
parser = argparse.ArgumentParser(description='')
parser.add_argument('--Sample', '-s', type=str, help='Samplename', required=True)
parser.add_argument('--c2lpropfile', '-j', type=str, help='', required=True)
parser.add_argument('--Savepath', '-o', type=str, help='savepath', required=True)
parser.add_argument('--spotsize', '-r', type=float, default=1.0, help='spotsize')
parser.add_argument('--ctfreqthreshold', '-c', type=float, default=0.1, help='rare_ctfreq_threshold')
parser.add_argument('--KNN_neighbors', '-k', type=int, default=5, help='KNN_n_neighbors')
args = parser.parse_args()

sample = args.Sample
savepath = args.Savepath
spotsize = args.spotsize
c2lpropfile = args.c2lpropfile
ctfreq_threshold = args.ctfreqthreshold
KNN_neighbors = args.KNN_neighbors
##########
```

Here is a step-by-step explanation of the code and a check for problems:

---

### **I. Overview**

The script is part of a spatial-transcriptomics analysis pipeline. Its main tasks are to:
- read a cell2location cell-abundance result file,
- classify cell types with a K-nearest-neighbours (KNN) model,
- take input/output paths and analysis parameters from the command line,
- prepare for data standardization and plotting.

---

### **II. Section-by-section explanation**

#### **1. Imports**
```python
import numpy as np
import pandas as pd
import math
import sys
import ast
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr
from IPython.display import display
import argparse
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
```
- **Reasonable**: the scientific-computing (numpy, pandas), plotting (matplotlib), statistics (scipy.stats) and machine-learning (sklearn) imports all make sense.
- **Issue**: some of the commented-out libraries (e.g. `os`, `json`) are needed for later suggestions such as creating directories; uncomment them as required, and delete the ones that are no longer needed to keep the code tidy.

---

#### **2. Command-line argument parsing**
```python
parser = argparse.ArgumentParser(description='')
parser.add_argument('--Sample', '-s', type=str, required=True)
parser.add_argument('--c2lpropfile', '-j', type=str, required=True)
parser.add_argument('--Savepath', '-o', type=str, required=True)
parser.add_argument('--spotsize', '-r', type=float, default=1.0)
parser.add_argument('--ctfreqthreshold', '-c', type=float, default=0.1)
parser.add_argument('--KNN_neighbors', '-k', type=int, default=5)
args = parser.parse_args()
```
- **Purpose**: the sample name, input file path, output path and algorithm parameters are supplied dynamically on the command line.
- **Issues**:
  - No check that the input file exists (e.g. `os.path.isfile(args.c2lpropfile)`).
  - The output directory is never created (add `os.makedirs(savepath, exist_ok=True)`).

---

#### **3. Parameter assignment**
```python
sample = args.Sample
savepath = args.Savepath
spotsize = args.spotsize
c2lpropfile = args.c2lpropfile
ctfreq_threshold = args.ctfreqthreshold
KNN_neighbors = args.KNN_neighbors
```
- **Reasonable**: argparse stores `--Sample` as `args.Sample`, so these assignments are correct and the variable names are clear.
- **Potential issue**: `ctfreq_threshold` is never used in the code shown; confirm the filtering logic is not missing.

---

#### **4. Summary of potential problems**
1. **Missing path handling**:
   - The input file is never checked for existence.
   - The output directory is not created automatically (needs `os.makedirs`).
2. **Incomplete preprocessing**:
   - `StandardScaler` is imported but never called, so KNN may run on unscaled data.
3. **Unused threshold parameter**:
   - `ctfreq_threshold` is defined but never applied to filter the data.
4. **No error handling**:
   - File-read failures, empty data and invalid input formats are not handled.

---

### **III. Suggested improvements**

1. **Check and create paths**:
```python
import os
if not os.path.isfile(c2lpropfile):
    raise FileNotFoundError(f"Input file {c2lpropfile} does not exist.")
os.makedirs(savepath, exist_ok=True)
```
2. **Standardize the data explicitly**:
```python
scaler = StandardScaler()
scaled_data = scaler.fit_transform(raw_data)  # assuming raw_data holds the input features
```
3. **Apply the cell-abundance threshold**:
```python
filtered_data = data[data['cell_freq'] > ctfreq_threshold]
```
4. **Catch read errors**:
```python
try:
    df = pd.read_csv(c2lpropfile)
except Exception as e:
    print(f"Error reading file: {e}")
    sys.exit(1)
```

---

### **IV. Summary**
- **Strengths**: modular parameter handling and a sensible choice of libraries.
- **To improve**: robustness (path checks, exception handling) and the missing preprocessing/filtering logic.
- **Risk**: running the script as-is may crash because of missing files or directories; apply the suggestions above before use.
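For reference, the sketch below shows one way the parsed parameters could feed the imported `KNeighborsClassifier` once the abundance table has been read and filtered. It assumes the argparse block above has already run, and the file layout (a `cell_type` label column next to the abundance columns) is a guess, since the downstream part of the script was not provided:

```python
# Hypothetical continuation; column names and file layout are assumptions.
import os
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

os.makedirs(savepath, exist_ok=True)           # create the output directory
df = pd.read_csv(c2lpropfile, index_col=0)     # cell2location abundance table

# Zero out abundances below the threshold (assumed filtering logic).
features = df.drop(columns=["cell_type"])      # 'cell_type' label column is assumed
features = features.where(features >= ctfreq_threshold, 0.0)

# Standardize before KNN, since KNN is sensitive to feature scale.
X = StandardScaler().fit_transform(features)
y = df["cell_type"]

knn = KNeighborsClassifier(n_neighbors=KNN_neighbors)
knn.fit(X, y)
```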

strError = f"保存样本图像失败: {hex(ret)}" QMessageBox.warning(mainWindow, "错误", strError, QMessageBox.Ok) else: success_msg = f"标准样本已保存至:\n{file_path}" QMessageBox.information(mainWindow, "成功", success_msg, QMessageBox.Ok) # 更新当前样本路径 current_sample_path = file_path update_sample_display() # 提取并保存样本特征 sample_img = cv2.imread(file_path) if sample_img is not None: sample_features[file_path] = extract_features(sample_img) logging.info(f"样本特征已提取并保存: {file_path}") # 保存当前目录 settings.setValue("last_save_dir", os.path.dirname(file_path)) except Exception as e: error_msg = f"保存图像时发生错误: {str(e)}" QMessageBox.critical(mainWindow, "异常错误", error_msg, QMessageBox.Ok) logging.exception("保存样本图像时发生异常") # 预览当前样本 def preview_sample(): global current_sample_path if not current_sample_path or not os.path.exists(current_sample_path): QMessageBox.warning(mainWindow, "错误", "请先设置有效的标准样本图像!", QMessageBox.Ok) return try: # 使用安全方法读取图像 img_data = np.fromfile(current_sample_path, dtype=np.uint8) sample_img = cv2.imdecode(img_data, cv2.IMREAD_COLOR) if sample_img is None: raise Exception("无法加载图像") # 显示特征信息 if current_sample_path in sample_features: features = sample_features[current_sample_path] edge_density = features.get('edge_density', 0) feature_info = f"边缘密度: {edge_density:.4f}\n特征点: {len(features.get('multi_scale', []))}" cv2.putText(sample_img, feature_info, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2) cv2.imshow("标准样本预览", sample_img) cv2.waitKey(0) cv2.destroyAllWindows() except Exception as e: QMessageBox.warning(mainWindow, "错误", f"预览样本失败: {str(e)}", QMessageBox.Ok) # 更新样本路径显示 def update_sample_display(): global current_sample_path if current_sample_path: ui.lblSamplePath.setText(f"当前样本: {os.path.basename(current_sample_path)}") ui.lblSamplePath.setToolTip(current_sample_path) ui.bnPreviewSample.setEnabled(True) else: ui.lblSamplePath.setText("当前样本: 未设置样本") ui.bnPreviewSample.setEnabled(False) # 更新历史记录显示 def update_history_display(): global detection_history ui.cbHistory.clear() for i, result in enumerate(detection_history[-10:]): # 显示最近10条记录 timestamp = result['timestamp'].strftime("%H:%M:%S") status = "合格" if result['qualified'] else "不合格" ratio = f"{result['diff_ratio']*100:.2f}%" defects = len(result.get('defects', [])) ui.cbHistory.addItem(f"[{timestamp}] {status} - 差异: {ratio} - 缺陷: {defects}") # ... [其余原有函数保持不变,如设备枚举、相机操作等] ... if __name__ == "__main__": # ... [原有初始化代码] ... 
# 初始化UI app = QApplication(sys.argv) mainWindow = QMainWindow() ui = Ui_MainWindow() ui.setupUi(mainWindow) # 扩大主窗口尺寸 mainWindow.resize(1400, 900) # 更宽的界面 # 创建工具栏 toolbar = mainWindow.addToolBar("检测工具") # 添加检测按钮 ui.bnCheckPrint = QPushButton("检测印花质量") toolbar.addWidget(ui.bnCheckPrint) # 添加保存样本按钮 ui.bnSaveSample = QPushButton("保存标准样本") toolbar.addWidget(ui.bnSaveSample) # 添加预览样本按钮 ui.bnPreviewSample = QPushButton("预览样本") toolbar.addWidget(ui.bnPreviewSample) # 添加历史记录下拉框 ui.cbHistory = QComboBox() ui.cbHistory.setMinimumWidth(350) toolbar.addWidget(QLabel("历史记录:")) toolbar.addWidget(ui.cbHistory) # 添加当前样本显示标签 ui.lblSamplePath = QLabel("当前样本: 未设置样本") status_bar = mainWindow.statusBar() status_bar.addPermanentWidget(ui.lblSamplePath) # === 增强版差异度调整面板 === # 创建右侧面板容器 right_panel = QWidget() right_layout = QVBoxLayout(right_panel) right_layout.setContentsMargins(10, 10, 10, 10) # 创建差异度调整组 diff_group = QGroupBox("检测参数设置") diff_layout = QVBoxLayout(diff_group) # 差异度阈值控制 ui.lblDiffThreshold = QLabel("差异度阈值 (0-100%):") ui.sliderDiffThreshold = QSlider(Qt.Horizontal) ui.sliderDiffThreshold.setRange(0, 100) # 0-100% ui.sliderDiffThreshold.setValue(int(app_config.get("threshold", 0.05) * 100)) ui.lblDiffValue = QLabel(f"{ui.sliderDiffThreshold.value()}%") # 当前差异度显示 ui.lblCurrentDiff = QLabel("当前差异度: -") ui.lblCurrentDiff.setStyleSheet("font-size: 14px; font-weight: bold;") # 缺陷数量显示 ui.lblDefectCount = QLabel("缺陷数量: -") ui.lblDefectCount.setStyleSheet("font-size: 14px;") # 差异度状态指示器 ui.lblDiffStatus = QLabel("状态: 未检测") ui.lblDiffStatus.setStyleSheet("font-size: 12px;") # 添加分隔线 diff_layout.addWidget(QLabel(" ")) diff_layout.addWidget(QLabel("缺陷检测参数")) # 最小缺陷面积 ui.lblMinArea = QLabel("最小缺陷面积:") ui.sliderMinArea = QSlider(Qt.Horizontal) ui.sliderMinArea.setRange(10, 500) ui.sliderMinArea.setValue(app_config.get("min_defect_area", 50)) ui.lblMinAreaValue = QLabel(f"{ui.sliderMinArea.value()} px²") # 最大缺陷面积 ui.lblMaxArea = QLabel("最大缺陷面积:") ui.sliderMaxArea = QSlider(Qt.Horizontal) ui.sliderMaxArea.setRange(1000, 10000) ui.sliderMaxArea.setValue(app_config.get("max_defect_area", 5000)) ui.lblMaxAreaValue = QLabel(f"{ui.sliderMaxArea.value()} px²") # 颜色敏感度 ui.lblColorSensitivity = QLabel("颜色敏感度:") ui.sliderColorSensitivity = QSlider(Qt.Horizontal) ui.sliderColorSensitivity.setRange(0, 100) ui.sliderColorSensitivity.setValue(int(app_config.get("color_sensitivity", 0.8) * 100)) ui.lblColorSensitivityValue = QLabel(f"{ui.sliderColorSensitivity.value()}%") # 添加分隔线 diff_layout.addWidget(QLabel(" ")) diff_layout.addWidget(QLabel("高级分析选项")) # 纹理分析开关 ui.chkTextureAnalysis = QCheckBox("启用纹理分析") ui.chkTextureAnalysis.setChecked(app_config.get("texture_analysis", True)) # 边缘检测开关 ui.chkEdgeDetection = QCheckBox("启用边缘检测") ui.chkEdgeDetection.setChecked(app_config.get("edge_detection", True)) # 自适应阈值开关 ui.chkAdaptiveThreshold = QCheckBox("使用自适应阈值") ui.chkAdaptiveThreshold.setChecked(app_config.get("adaptive_threshold", True)) # 布局控件 diff_layout.addWidget(ui.lblDiffThreshold) diff_layout.addWidget(ui.sliderDiffThreshold) diff_layout.addWidget(ui.lblDiffValue) diff_layout.addWidget(ui.lblCurrentDiff) diff_layout.addWidget(ui.lblDefectCount) diff_layout.addWidget(ui.lblDiffStatus) diff_layout.addWidget(ui.lblMinArea) diff_layout.addWidget(ui.sliderMinArea) diff_layout.addWidget(ui.lblMinAreaValue) diff_layout.addWidget(ui.lblMaxArea) diff_layout.addWidget(ui.sliderMaxArea) diff_layout.addWidget(ui.lblMaxAreaValue) diff_layout.addWidget(ui.lblColorSensitivity) diff_layout.addWidget(ui.sliderColorSensitivity) 
diff_layout.addWidget(ui.lblColorSensitivityValue) diff_layout.addWidget(ui.chkTextureAnalysis) diff_layout.addWidget(ui.chkEdgeDetection) diff_layout.addWidget(ui.chkAdaptiveThreshold) # 添加拉伸项使控件靠上 diff_layout.addStretch(1) # 创建停靠窗口 dock = QDockWidget("检测参数面板", mainWindow) dock.setWidget(right_panel) dock.setFeatures(QDockWidget.DockWidgetMovable | QDockWidget.DockWidgetFloatable) mainWindow.addDockWidget(Qt.RightDockWidgetArea, dock) # === 连接信号 === # 差异度阈值滑块 ui.sliderDiffThreshold.valueChanged.connect(update_diff_threshold) # 最小缺陷面积滑块 ui.sliderMinArea.valueChanged.connect(update_min_defect_area) # 最大缺陷面积滑块 ui.sliderMaxArea.valueChanged.connect(update_max_defect_area) # 颜色敏感度滑块 ui.sliderColorSensitivity.valueChanged.connect(update_color_sensitivity) # 分析选项复选框 ui.chkTextureAnalysis.stateChanged.connect(toggle_texture_analysis) ui.chkEdgeDetection.stateChanged.connect(toggle_edge_detection) ui.chkAdaptiveThreshold.stateChanged.connect(toggle_adaptive_threshold) # 绑定按钮事件 ui.bnCheckPrint.clicked.connect(check_print) ui.bnSaveSample.clicked.connect(save_sample_image) ui.bnPreviewSample.clicked.connect(preview_sample) # ... [其余原有绑定代码] ... # 显示主窗口 mainWindow.show() # 执行应用 app.exec_() # 关闭设备 close_device() # ch:反初始化SDK | en: finalize SDK MvCamera.MV_CC_Finalize() sys.exit() 这个程序运行后出现了查找不到打不开相机同时原有的差异度检查还有新增的功能都看不到了
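从贴出的代码看,`# ... [原有初始化代码] ...` 这一段被占位注释替换掉了:如果设备枚举、打开相机以及 `app_config`、`sample_features`、`detection_history` 等全局变量的初始化原本写在这一段里,那么相机自然"查找不到、打不开";同时,构建参数面板时多处调用 `app_config.get(...)`,若 `app_config` 尚未定义,会在建面板的过程中抛出 NameError,后面的控件创建和信号绑定都不会执行,这正好对应"差异度检查和新增功能都看不到"的现象。建议先看日志里有没有异常,并保证下面这个启动顺序(`load_config()`、`enum_devices()` 为示意性的占位函数名,请换成工程里真正的实现;`MV_CC_Initialize` 是否存在取决于所用 MVS SDK 版本):

```
if __name__ == "__main__":
    # 1. 先初始化相机 SDK(与文末的 MV_CC_Finalize 配对;老版本 SDK 没有该接口则跳过)
    MvCamera.MV_CC_Initialize()

    # 2. 在构建任何 UI 控件之前加载配置和全局状态,保证 app_config 一定已存在
    app_config = load_config()        # 占位:请换成工程中实际的配置读取函数
    sample_features = {}
    detection_history = []
    current_sample_path = ""

    # 3. 创建应用和主窗口(原有代码不变)
    app = QApplication(sys.argv)
    mainWindow = QMainWindow()
    ui = Ui_MainWindow()
    ui.setupUi(mainWindow)

    # 4. 再执行问题中贴出的工具栏 / 参数面板 / 信号绑定代码
    #    ……(保持原样)……

    # 5. 枚举并打开设备的代码不能省略,否则必然"查找不到相机"
    enum_devices()                    # 占位:原工程中枚举设备并填充设备下拉框的函数
    mainWindow.show()
    app.exec_()

    close_device()
    MvCamera.MV_CC_Finalize()
    sys.exit()
```
按这个顺序排查后,如果相机仍打不开,再确认设备是否被其他程序占用、驱动和 SDK 版本是否匹配。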

class DistanceAnalyzer: def __init__(self, max_history=1000): # 初始化统计数据存储 self.distance_history = deque(maxlen=max_history) # 历史距离数据 self.filter_ratios = deque(maxlen=max_history) # 历史过滤比例 self.frame_count = 0 # 处理帧数计数器 self.stats_data = [] # 统计信息存储 def analyze_distances(self, distances, min_threshold, max_threshold): """ 分析距离分布并计算过滤比例 :param distances: 当前帧的距离数组 :param min_threshold: 最小距离阈值 :param max_threshold: 最大距离阈值 :return: 过滤比例和统计信息字典 """ # 更新历史数据 self.distance_history.extend(distances) self.frame_count += 1 # 转换为NumPy数组 dist_array = np.array(distances) # 基本统计量 mean = np.mean(dist_array) median = np.median(dist_array) std = np.std(dist_array) min_val = np.min(dist_array) max_val = np.max(dist_array) # 百分位数 percentiles = np.percentile(dist_array, [5, 25, 50, 75, 95]) # 过滤比例计算 filter_flag = (dist_array >= min_threshold) & (dist_array <= max_threshold) filtered_count = np.sum(filter_flag) total_count = len(dist_array) filter_ratio = filtered_count / total_count if total_count > 0 else 1.0 self.filter_ratios.append(filter_ratio) # 距离分布直方图数据 hist, bin_edges = np.histogram(dist_array, bins=20, range=(0, max(100, max_val))) # 创建统计信息字典 stats_info = { 'frame': self.frame_count, 'total_points': total_count, 'filtered_points': filtered_count, 'filter_ratio': filter_ratio, 'mean_distance': mean, 'median_distance': median, 'std_distance': std, 'min_distance': min_val, 'max_distance': max_val, 'percentiles': { '5th': percentiles[0], '25th': percentiles[1], '50th': percentiles[2], '75th': percentiles[3], '95th': percentiles[4] }, 'histogram': { 'counts': hist.tolist(), 'bins': bin_edges.tolist() }, 'thresholds': { 'min': min_threshold, 'max': max_threshold } } # 保存统计信息 self.stats_data.append(stats_info) return filter_ratio, stats_info def print_current_stats(self, stats_info): """打印当前帧的统计信息""" print("\n" + "=" * 50) print(f"帧 #{stats_info['frame']} 距离分布统计") print("=" * 50) print(f"总点数: {stats_info['total_points']}") print(f"过滤后点数: {stats_info['filtered_points']}") print(f"过滤比例: {stats_info['filter_ratio'] * 100:.2f}%") print(f"平均距离: {stats_info['mean_distance']:.2f}m") print(f"中位数距离: {stats_info['median_distance']:.2f}m") print(f"标准差: {stats_info['std_distance']:.2f}m") print(f"最小距离: {stats_info['min_distance']:.2f}m") print(f"最大距离: {stats_info['max_distance']:.2f}m") print(f"距离阈值: {stats_info['thresholds']['min']:.2f}m - {stats_info['thresholds']['max']:.2f}m") print(f"5%分位数: {stats_info['percentiles']['5th']:.2f}m") print(f"95%分位数: {stats_info['percentiles']['95th']:.2f}m") def visualize_distribution(self, stats_info): """可视化当前帧的距离分布""" plt.figure(figsize=(15, 10)) # 直方图 plt.subplot(2, 2, 1) bins = stats_info['histogram']['bins'] counts = stats_info['histogram']['counts'] plt.bar(bins[:-1], counts, width=np.diff(bins), align='edge', alpha=0.7) plt.axvline(stats_info['thresholds']['min'], color='r', linestyle='--', label=f'最小阈值: {stats_info["thresholds"]["min"]:.1f}m') plt.axvline(stats_info['thresholds']['max'], color='g', linestyle='--', label=f'最大阈值: {stats_info["thresholds"]["max"]:.1f}m') plt.axvline(stats_info['mean_distance'], color='b', linestyle=':', label=f'平均值: {stats_info["mean_distance"]:.1f}m') plt.xlabel('距离 (米)') plt.ylabel('点数') plt.title(f'帧 #{stats_info["frame"]} 距离分布') plt.legend() # 箱线图 plt.subplot(2, 2, 2) sns.boxplot(x=np.array(self.distance_history)) plt.axvline(stats_info['thresholds']['min'], color='r', linestyle='--') plt.axvline(stats_info['thresholds']['max'], color='g', linestyle='--') plt.xlabel('距离 (米)') plt.title('距离分布箱线图') # 过滤比例趋势 plt.subplot(2, 2, 3) frames = 
[s['frame'] for s in self.stats_data] ratios = [s['filter_ratio'] for s in self.stats_data] plt.plot(frames, ratios, 'bo-', label='过滤比例') plt.axhline(0.9, color='r', linestyle='--', label='目标比例(90%)') plt.xlabel('帧号') plt.ylabel('过滤比例') plt.title('过滤比例趋势') plt.legend() plt.ylim(0, 1.1) # 距离阈值变化 plt.subplot(2, 2, 4) min_ths = [s['thresholds']['min'] for s in self.stats_data] max_ths = [s['thresholds']['max'] for s in self.stats_data] plt.plot(frames, min_ths, 'r-', label='最小阈值') plt.plot(frames, max_ths, 'g-', label='最大阈值') plt.xlabel('帧号') plt.ylabel('阈值 (米)') plt.title('距离阈值变化趋势') plt.legend() plt.tight_layout() plt.show() def generate_report(self, last_n_frames=100): """生成综合统计报告""" if not self.stats_data: return "无统计数据" # 获取最近的统计数据 recent_data = self.stats_data[-last_n_frames:] # 创建DataFrame用于分析 df = pd.DataFrame(recent_data) # 计算整体统计量 total_points = df['total_points'].sum() filtered_points = df['filtered_points'].sum() overall_ratio = filtered_points / total_points if total_points > 0 else 0 # 距离统计 all_distances = np.array(list(self.distance_history)) distance_stats = { 'mean': np.mean(all_distances), 'median': np.median(all_distances), 'std': np.std(all_distances), 'min': np.min(all_distances), 'max': np.max(all_distances) } # 阈值统计 min_th_mean = df['thresholds'].apply(lambda x: x['min']).mean() max_th_mean = df['thresholds'].apply(lambda x: x['max']).mean() # 打印报告 report = f"\n{'=' * 50}\n距离分析报告 (最近{last_n_frames}帧)\n{'=' * 50}" report += f"\n总点数: {total_points}" report += f"\n过滤后点数: {filtered_points}" report += f"\n整体过滤比例: {overall_ratio * 100:.2f}%" report += f"\n平均过滤比例: {df['filter_ratio'].mean() * 100:.2f}%" report += f"\n平均距离: {distance_stats['mean']:.2f}m ± {distance_stats['std']:.2f}m" report += f"\n距离范围: {distance_stats['min']:.2f}m - {distance_stats['max']:.2f}m" report += f"\n平均阈值范围: {min_th_mean:.2f}m - {max_th_mean:.2f}m" report += f"\n过滤比例标准差: {df['filter_ratio'].std() * 100:.2f}%" # 识别异常帧 low_ratio_frames = df[df['filter_ratio'] < 0.7] if not low_ratio_frames.empty: report += "\n\n警告: 以下帧过滤比例低于70%:" for _, row in low_ratio_frames.iterrows(): report += f"\n - 帧 #{row['frame']}: {row['filter_ratio'] * 100:.1f}%" return report把这个类整合进车辆过滤函数中
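下面给出把 DistanceAnalyzer 挂接到车辆点云过滤函数里的一个最小示意(`filter_vehicle_points` 的函数名、`points` 以传感器原点为参考计算距离等均为假设,请按实际工程中过滤函数的签名替换;DistanceAnalyzer 本身还需要 `from collections import deque` 以及 numpy / pandas / matplotlib / seaborn 的导入):

```
import numpy as np

# 全局只创建一个分析器实例,跨帧累积统计数据
distance_analyzer = DistanceAnalyzer(max_history=1000)

def filter_vehicle_points(points, min_threshold, max_threshold,
                          report_every=50, visualize=False):
    """示意:points 为 (N, 3) 的点云坐标,按到原点的距离做阈值过滤,并交给 DistanceAnalyzer 统计。"""
    points = np.asarray(points, dtype=np.float64)
    distances = np.linalg.norm(points, axis=1)          # 假设以传感器原点为距离参考

    # 1. 统计当前帧的距离分布与过滤比例
    filter_ratio, stats_info = distance_analyzer.analyze_distances(
        distances, min_threshold, max_threshold)

    # 2. 按阈值过滤点
    mask = (distances >= min_threshold) & (distances <= max_threshold)
    filtered_points = points[mask]

    # 3. 周期性输出统计 / 可视化,避免每帧都画图
    if distance_analyzer.frame_count % report_every == 0:
        distance_analyzer.print_current_stats(stats_info)
        if visualize:
            distance_analyzer.visualize_distribution(stats_info)

    return filtered_points, filter_ratio
```
处理完整段数据后,再调用一次 `distance_analyzer.generate_report()` 即可得到最近若干帧的汇总报告。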

【我现在已经使用jupyter notebook将数据预处理做好填充了缺失值以及处理异常值 import pandas as pd df = pd.read_csv("数据分析:train_data.csv") df.info() # 重复值 duplicate_rows = df.duplicated() print("\n重复行的总数:", duplicate_rows.sum()) # 重复值 import pandas as pd import matplotlib.pyplot as plt df = pd.read_csv("数据分析:train_data.csv") # 检测重复行 duplicate_rows = df.duplicated() num_duplicates = duplicate_rows.sum() num_non_duplicates = len(df) - num_duplicates # 准备可视化数据 categories = ['非重复行', '重复行'] values = [num_non_duplicates, num_duplicates] plt.rcParams['font.sans-serif']=['SimHei']#设置中文字体黑体 plt.rcParams['axes.unicode_minus']=False#正常显示负号 # 创建条形图 plt.figure(figsize=(10, 6)) bars = plt.bar(categories, values, color=['lightblue', 'red']) # 添加数据标签 for bar in bars: height = bar.get_height() plt.text(bar.get_x() + bar.get_width()/2., height + 0.1,f'{height}', ha='center', va='bottom', fontsize=10) # 添加标题和标签 plt.title('数据中重复行与非重复行数量对比', fontsize=16) plt.xlabel('行类型', fontsize=14) plt.ylabel('数量', fontsize=14) plt.tight_layout() plt.show() # 缺失值 import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # 设置中文字体 plt.rcParams['font.sans-serif']=['SimHei'] plt.rcParams['axes.unicode_minus']=False # 读取数据 df = pd.read_csv("数据分析:train_data.csv") # 计算每列缺失值数量 missing_counts = df.isnull().sum() # 计算每列缺失值占比 missing_ratios = (missing_counts / len(df)) * 100 # 创建缺失值统计表格 missing_stats = pd.DataFrame({'特征名称': missing_counts.index,'缺失值数量': missing_counts.values,'缺失值占比(%)': missing_ratios.round(2)}) # 按缺失值数量降序排列 missing_stats = missing_stats.sort_values('缺失值数量', ascending=False) print("缺失值统计结果:") print(missing_stats[missing_stats['缺失值数量'] > 0] if not missing_stats.empty else "无缺失值") # 仅选择有缺失值的列进行可视化 has_missing = missing_counts[missing_counts > 0] if not has_missing.empty: # 原始数据缺失值热力图 plt.figure(figsize=(12, 8)) sns.heatmap(df.isnull(), cbar=False, cmap='coolwarm', yticklabels=False) plt.title('原始数据缺失值分布图(红色表示缺失值)', fontsize=15) plt.tight_layout() plt.show() # 用-1填充缺失值 df_filled = df.fillna(-1) # 填充后缺失值热力图 plt.figure(figsize=(12, 8)) sns.heatmap(df_filled.isnull(), cbar=False, cmap='coolwarm', yticklabels=False) plt.title('填充后缺失值分布图(无缺失值)', fontsize=15) plt.tight_layout() plt.show() # 保存填充后的文件 df_filled.to_csv("train_data_填充缺失值.csv", index=False) print("已将缺失值用-1填充,并保存为 train_data_填充缺失值.csv") else: print("数据集中没有缺失值,无需填充") # 异常值 import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns from scipy import stats def setup_chinese_font(): chinese_fonts = ["SimHei", "WenQuanYi Micro Hei", "Heiti TC", "Microsoft YaHei", "Arial Unicode MS"] for font in chinese_fonts: try: plt.rcParams["font.family"] = [font, "sans-serif"] plt.rcParams["axes.unicode_minus"] = False # 解决负号显示问题 sns.set(font=font) return font except: continue plt.rcParams["font.family"] = ["sans-serif"] plt.rcParams["axes.unicode_minus"] = False return "默认字体" def anomaly_detection_and_handling(): setup_chinese_font() # 1. 数据加载与初步探索 print("\n=== 1. 
数据加载与初步探索 ===") # 定义所有数值型特征列 numeric_cols = ['Installation_Year', 'Operational_Hours', 'Temperature_C', 'Vibration_mms', 'Sound_dB', 'Oil_Level_pct', 'Coolant_Level_pct', 'Power_Consumption_kW', 'Last_Maintenance_Days_Ago', 'Maintenance_History_Count', 'Failure_History_Count', 'Error_Codes_Last_30_Days', 'Remaining_Useful_Life_days', 'Laser_Intensity', 'Hydraulic_Pressure_bar', 'Coolant_Flow_L_min', 'Heat_Index', 'AI_Override_Events'] # 定义使用-1填充的特殊列 special_cols = ['Laser_Intensity', 'Hydraulic_Pressure_bar', 'Coolant_Flow_L_min', 'Heat_Index'] # 定义需要特殊处理的特征列表 non_negative_features = ['Vibration_mms', 'Power_Consumption_kW','Maintenance_History_Count', 'Failure_History_Count', 'Error_Codes_Last_30_Days', 'AI_Override_Events','Remaining_Useful_Life_days', 'Last_Maintenance_Days_Ago'] # 读取数据 df = pd.read_csv("train_data_填充缺失值.csv", usecols=numeric_cols) print("\n=== 数据预处理 - 物理意义修正 ===") # 1. 安装年份修正 (>2025 → 2025) df['Installation_Year'] = np.where(df['Installation_Year'] > 2025, 2025, df['Installation_Year']) print(f"安装年份修正: >2025 → 2025,修正后范围: [{df['Installation_Year'].min()}, {df['Installation_Year'].max()}]") # 2. 非负特征负数修正 (负数 → 0) for col in non_negative_features: original_neg_count = (df[col] < 0).sum() df[col] = np.maximum(df[col], 0) print(f"{col} 负数修正: {original_neg_count}个负样本 → 0") # 3. 温度特征修正 (<19℃ → 20℃) original_low_temp_count = (df['Temperature_C'] < 19).sum() df['Temperature_C'] = np.where(df['Temperature_C'] < 19, 20, df['Temperature_C']) print(f"温度修正: {original_low_temp_count}个<19℃样本 → 20℃") # 采样用于可视化 sample_df = df.sample(frac=0.1, random_state=42) print("\n--- 数据基本信息 ---") info_df = pd.DataFrame({'总样本量': [df.shape[0]],'特征数量': [df.shape[1]],'采样可视化样本量': [sample_df.shape[0]]}) print(info_df) # 2. Z-score异常值检测 print("\n=== 2. Z-score异常值检测 (Z-score > 3) ===") # 计算Z-score统计指标 stats_df = pd.DataFrame() for col in numeric_cols: if col in special_cols: # 排除-1值 data = df[col].replace(-1, np.nan) else: data = df[col] # 计算Z-score相关统计量 col_stats = pd.DataFrame({'mean': [data.mean()],'std': [data.std()],'min': [data.min()],'max': [data.max()],'z_threshold': [3]}, index=[col]) # 特殊特征的边界设置 if col == 'Installation_Year': col_stats['lower_bound'] = 1995 col_stats['upper_bound'] = 2025 elif col in non_negative_features: col_stats['lower_bound'] = 0 # 非负特征下界为0 # 上界使用Z-score计算 col_stats['upper_bound'] = col_stats['mean'] + 3 * col_stats['std'] else: # 其他特征使用Z-score边界 col_stats['lower_bound'] = col_stats['mean'] - 3 * col_stats['std'] col_stats['upper_bound'] = col_stats['mean'] + 3 * col_stats['std'] # 合并到stats_df stats_df = pd.concat([stats_df, col_stats]) print("\n--- Z-score统计与异常值边界 ---") print(stats_df[['mean', 'std', 'min', 'max', 'z_threshold', 'lower_bound', 'upper_bound']]) # 检测异常值 outliers = {} for col in numeric_cols: lower = stats_df.loc[col, 'lower_bound'] upper = stats_df.loc[col, 'upper_bound'] if col in special_cols: # 排除-1的值 outliers[col] = df[(df[col] != -1) & ((df[col] < lower) | (df[col] > upper))] else: outliers[col] = df[(df[col] < lower) | (df[col] > upper)] # 异常值统计 outliers_stats = pd.DataFrame({'特征名称': numeric_cols,'异常值数量': [len(outliers[col]) for col in numeric_cols], '异常值比例(%)': [round(len(outliers[col])/len(df)*100, 2) for col in numeric_cols]}).sort_values('异常值比例(%)', ascending=False) print("\n--- 异常值统计结果 ---") print(outliers_stats[outliers_stats['异常值数量'] > 0]) # 3. 异常值可视化(处理前) print("\n=== 3. 
异常值可视化(处理前) ===") # 选择异常值比例最高的4个特征进行可视化 top_anomaly_features = outliers_stats[outliers_stats['异常值数量'] > 0]['特征名称'].head(4).tolist() # 箱线图 fig, axes = plt.subplots(2, 2, figsize=(16, 12)) axes = axes.flatten() for i, col in enumerate(top_anomaly_features): sns.boxplot(data=sample_df, x=col, ax=axes[i]) axes[i].set_title(f'{col} 箱线图(异常值标记)') axes[i].axvline(x=stats_df.loc[col, 'lower_bound'], color='r', linestyle='--', label='下界') axes[i].axvline(x=stats_df.loc[col, 'upper_bound'], color='g', linestyle='--', label='上界') axes[i].legend() plt.tight_layout() plt.show() # 直方图可视化 fig, axes = plt.subplots(2, 2, figsize=(16, 12)) axes = axes.flatten() for i, col in enumerate(top_anomaly_features): sns.histplot(data=sample_df, x=col, kde=True, ax=axes[i]) axes[i].set_title(f'{col} 分布直方图') axes[i].axvline(x=stats_df.loc[col, 'lower_bound'], color='r', linestyle='--') axes[i].axvline(x=stats_df.loc[col, 'upper_bound'], color='g', linestyle='--') plt.tight_layout() plt.show() # 4. 异常值处理 print("\n=== 4. 异常值处理 ===") df_processed = df.copy() # 不同特征采用不同处理策略 handling_strategies = { 'Power_Consumption_kW': 'capping', # 盖帽法 'Temperature_C': 'capping', 'Vibration_mms': 'median', # 中位数替换 'Installation_Year': 'capping' # 安装年份使用盖帽法 } # 应用处理策略 for col in top_anomaly_features: lower = stats_df.loc[col, 'lower_bound'] upper = stats_df.loc[col, 'upper_bound'] if col in handling_strategies and handling_strategies[col] == 'capping': # 盖帽法 df_processed[col] = df_processed[col].clip(lower, upper) strategy = "盖帽法" else: # 中位数替换 if col in special_cols: median = df_processed[df_processed[col] != -1][col].median() else: median = df_processed[col].median() df_processed[col] = np.where( (df_processed[col] < lower) | (df_processed[col] > upper), median, df_processed[col]) strategy = "中位数替换" print(f"\n--- {col} 处理完成 ---") result_df = pd.DataFrame({'处理策略': [strategy],'异常值数量': [len(outliers[col])], '处理后异常值数量': [len(df_processed[(df_processed[col] < lower) | (df_processed[col] > upper)])]}) print(result_df) # 5. 处理前后对比可视化 print("\n=== 5. 处理前后对比可视化 ===") # 采样处理后数据 processed_sample = df_processed.sample(frac=0.1, random_state=42) # 对比箱线图 fig, axes = plt.subplots(2, 2, figsize=(16, 12)) axes = axes.flatten() for i, col in enumerate(top_anomaly_features): # 原始数据 sns.boxplot(data=sample_df, x=col, ax=axes[i], color='lightblue', label='处理前') # 处理后数据 sns.boxplot(data=processed_sample, x=col, ax=axes[i], color='lightgreen', label='处理后') axes[i].set_title(f'{col} 处理前后对比') axes[i].legend() plt.tight_layout() plt.show() # 对比直方图 fig, axes = plt.subplots(2, 2, figsize=(16, 12)) axes = axes.flatten() for i, col in enumerate(top_anomaly_features): # 原始数据 sns.histplot(data=sample_df, x=col, kde=True, ax=axes[i], color='lightblue', label='处理前', alpha=0.5) # 处理后数据 sns.histplot(data=processed_sample, x=col, kde=True, ax=axes[i], color='lightgreen', label='处理后', alpha=0.5) axes[i].set_title(f'{col} 分布对比') axes[i].legend() plt.tight_layout() plt.show() # 6. 总结 print("\n=== 6. 异常值处理总结 ===") summary = pd.DataFrame({'特征名称': top_anomaly_features,'异常值数量': [len(outliers[col]) for col in top_anomaly_features], '处理策略': [ '盖帽法' if col in handling_strategies and handling_strategies[col] == 'capping' else '中位数替换' for col in top_anomaly_features], '处理效果': ['有效消除异常值' for _ in top_anomaly_features]}) print(summary) print("\n--- 后续建议 ---") print("1. 处理后的数据可直接用于模型训练") print("2. 建议补充设备类型特征,实现更精细化的异常值检测") print("3. 
对关键特征建立实时监控机制,及时发现新的异常值") # 保存处理后的数据 df_processed.to_csv('train_data_Zscore异常值处理后.csv', index=False) print("\n处理结果保存:") print("已保存处理后的数据至 'train_data_Zscore异常值处理后.csv'") if __name__ == "__main__": anomaly_detection_and_handling() 请帮我分别仔细地做出接下来的这个任务题目,每个步骤都做得完美详细,可视化的形式展示。】(如有环境问题请忽略,后续我会自己在运行代码中配置,请给我完整的代码) 题1: 使用机器编号、机器类型、运行小时数、温度、振动、声音、油位、冷却液位、功耗、距 上次维护天数、维护历史次数、故障历史次数、人工智能监控、过去30天错误编码等特征,构 建回归模型,预测机床设备在7天内是否会发生故障。要求输出模型准确率、召回率、F1值,并 分析前5个最重要的特征。
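针对题1先说明一点:题目写的是"回归模型",但要求输出准确率、召回率和 F1 值,这些是分类指标,因此下面按"7 天内是否发生故障"的二分类问题建模。示例中用 `Remaining_Useful_Life_days <= 7` 构造标签、只使用前面预处理代码中出现过的数值列;机器编号、机器类型等列在预处理输出文件里并不存在,需要按原始数据的真实列名另行合并。以下只是基于上述假设的示意实现:

```
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

df = pd.read_csv('train_data_Zscore异常值处理后.csv')

# 标签:假设"剩余使用寿命 <= 7 天"近似"7天内发生故障";若数据集本身带故障标签列,请直接使用该列
df['Failure_Within_7_Days'] = (df['Remaining_Useful_Life_days'] <= 7).astype(int)

# 特征:仅使用预处理代码中确实存在的数值列,机器编号/机器类型需按真实列名补充
feature_cols = ['Operational_Hours', 'Temperature_C', 'Vibration_mms', 'Sound_dB',
                'Oil_Level_pct', 'Coolant_Level_pct', 'Power_Consumption_kW',
                'Last_Maintenance_Days_Ago', 'Maintenance_History_Count',
                'Failure_History_Count', 'AI_Override_Events', 'Error_Codes_Last_30_Days']
X = df[feature_cols]
y = df['Failure_Within_7_Days']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"准确率: {accuracy_score(y_test, y_pred):.4f}")
print(f"召回率: {recall_score(y_test, y_pred):.4f}")
print(f"F1 值 : {f1_score(y_test, y_pred):.4f}")

# 前 5 个最重要特征及可视化
importances = pd.Series(model.feature_importances_, index=feature_cols).sort_values(ascending=False)
top5 = importances.head(5)
print("\n前5个最重要特征:")
print(top5)

plt.figure(figsize=(8, 5))
top5.sort_values().plot(kind='barh', color='steelblue')
plt.title('前5个最重要特征(随机森林重要性)')
plt.xlabel('特征重要性')
plt.tight_layout()
plt.show()
```
若后续补充了机器类型等类别特征,可先做 One-Hot 或类别编码后再并入 feature_cols。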

import numpy as np import matplotlib.pyplot as plt from pymatgen.io.vasp import Vasprun from pymatgen.core.structure import Structure from scipy.ndimage import gaussian_filter1d from scipy.spatial import cKDTree from tqdm import tqdm import matplotlib as mpl import warnings from collections import defaultdict import os import csv import argparse import multiprocessing from functools import partial import time import dill # 忽略可能的警告 warnings.filterwarnings("ignore", category=UserWarning) # 专业绘图设置 - 符合Journal of Chemical Physics要求 plt.style.use('seaborn-v0_8-whitegrid') mpl.rcParams.update({ 'font.family': 'serif', 'font.serif': ['Times New Roman', 'DejaVu Serif'], 'font.size': 12, 'axes.labelsize': 14, 'axes.titlesize': 16, 'xtick.labelsize': 12, 'ytick.labelsize': 12, 'figure.dpi': 600, 'savefig.dpi': 600, 'figure.figsize': (8, 6), 'lines.linewidth': 2.0, 'legend.fontsize': 10, 'legend.framealpha': 0.8, 'mathtext.default': 'regular', 'axes.linewidth': 1.5, 'xtick.major.width': 1.5, 'ytick.major.width': 1.5, 'xtick.major.size': 5, 'ytick.major.size': 5, }) def identify_atom_types(struct): """优化的原子类型识别函数,不再区分P=O和P-O""" # 1. 初始化数据结构 p_oxygens = {"P-O/P=O": [], "P-OH": []} # 合并P-O和P=O phosphate_hydrogens = [] water_oxygens = [] water_hydrogens = [] hydronium_oxygens = [] hydronium_hydrogens = [] fluoride_atoms = [i for i, site in enumerate(struct) if site.species_string == "F"] aluminum_atoms = [i for i, site in enumerate(struct) if site.species_string == "Al"] # 2. 构建全局KDTree all_coords = np.array([site.coords for site in struct]) kdtree = cKDTree(all_coords, boxsize=struct.lattice.abc) # 3. 识别磷酸基团 p_atoms = [i for i, site in enumerate(struct) if site.species_string == "P"] phosphate_oxygens = [] for p_idx in p_atoms: # 查找P周围的O原子 (距离 < 1.6Å) neighbors = kdtree.query_ball_point(all_coords[p_idx], r=1.6) p_o_indices = [idx for idx in neighbors if idx != p_idx and struct[idx].species_string == "O"] phosphate_oxygens.extend(p_o_indices) # 4. 识别所有H原子并确定归属 hydrogen_owners = {} h_atoms = [i for i, site in enumerate(struct) if site.species_string == "H"] for h_idx in h_atoms: neighbors = kdtree.query_ball_point(all_coords[h_idx], r=1.2) candidate_os = [idx for idx in neighbors if idx != h_idx and struct[idx].species_string == "O"] if not candidate_os: continue min_dist = float('inf') owner_o = None for o_idx in candidate_os: dist = struct.get_distance(h_idx, o_idx) if dist < min_dist: min_dist = dist owner_o = o_idx hydrogen_owners[h_idx] = owner_o # 5. 分类磷酸氧:带H的为P-OH,不带H的为P-O/P=O for o_idx in phosphate_oxygens: has_hydrogen = any(owner_o == o_idx for h_idx, owner_o in hydrogen_owners.items()) if has_hydrogen: p_oxygens["P-OH"].append(o_idx) else: p_oxygens["P-O/P=O"].append(o_idx) # 6. 识别水和水合氢离子 all_o_indices = [i for i, site in enumerate(struct) if site.species_string == "O"] non_phosphate_os = [o_idx for o_idx in all_o_indices if o_idx not in phosphate_oxygens] o_h_count = defaultdict(int) for h_idx, owner_o in hydrogen_owners.items(): o_h_count[owner_o] += 1 for o_idx in non_phosphate_os: h_count = o_h_count.get(o_idx, 0) attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] if h_count == 2: water_oxygens.append(o_idx) water_hydrogens.extend(attached_hs) elif h_count == 3: hydronium_oxygens.append(o_idx) hydronium_hydrogens.extend(attached_hs) # 7. 
识别磷酸基团的H原子 for o_idx in p_oxygens["P-OH"]: attached_hs = [h_idx for h_idx, owner_o in hydrogen_owners.items() if owner_o == o_idx] phosphate_hydrogens.extend(attached_hs) return { "phosphate_oxygens": p_oxygens, "phosphate_hydrogens": phosphate_hydrogens, "water_oxygens": water_oxygens, "water_hydrogens": water_hydrogens, "hydronium_oxygens": hydronium_oxygens, "hydronium_hydrogens": hydronium_hydrogens, "fluoride_atoms": fluoride_atoms, "aluminum_atoms": aluminum_atoms } def process_frame(struct, center_sel, target_sel, r_max, exclude_bonds, bond_threshold): """处理单帧结构计算""" atom_types = identify_atom_types(struct) centers = center_sel(atom_types) targets = target_sel(atom_types) if len(centers) == 0 or len(targets) == 0: return { "distances": np.array([], dtype=np.float64), "n_centers": 0, "n_targets": 0, "volume": struct.volume } center_coords = np.array([struct[i].coords for i in centers]) target_coords = np.array([struct[i].coords for i in targets]) lattice = struct.lattice kdtree = cKDTree(target_coords, boxsize=lattice.abc) k_val = min(50, len(targets)) if k_val == 0: return { "distances": np.array([], dtype=np.float64), "n_centers": len(centers), "n_targets": len(targets), "volume": struct.volume } try: query_result = kdtree.query(center_coords, k=k_val, distance_upper_bound=r_max) except Exception as e: print(f"KDTree query error: {str(e)}") return { "distances": np.array([], dtype=np.float64), "n_centers": len(centers), "n_targets": len(targets), "volume": struct.volume } if k_val == 1: if isinstance(query_result, tuple): distances, indices = query_result else: distances = query_result indices = np.zeros_like(distances, dtype=int) distances = np.atleast_1d(distances) indices = np.atleast_1d(indices) else: distances, indices = query_result valid_distances = [] for i in range(distances.shape[0]): center_idx = centers[i] for j in range(distances.shape[1]): dist = distances[i, j] if dist > r_max or np.isinf(dist): continue target_idx = targets[indices[i, j]] if exclude_bonds: actual_dist = struct.get_distance(center_idx, target_idx) if actual_dist < bond_threshold: continue valid_distances.append(dist) return { "distances": np.array(valid_distances, dtype=np.float64), "n_centers": len(centers), "n_targets": len(targets), "volume": struct.volume } def calculate_rdf_parallel(structures, center_sel, target_sel, r_max=8.0, bin_width=0.05, exclude_bonds=True, bond_threshold=1.3, workers=1): """并行计算径向分布函数""" bins = np.arange(0, r_max, bin_width) hist = np.zeros(len(bins) - 1) total_centers = 0 total_targets = 0 total_volume = 0 dill.settings['recurse'] = True func = partial(process_frame, center_sel=center_sel, target_sel=target_sel, r_max=r_max, exclude_bonds=exclude_bonds, bond_threshold=bond_threshold) with multiprocessing.Pool(processes=workers) as pool: results = [] for res in tqdm(pool.imap_unordered(func, structures), total=len(structures), desc="Calculating RDF"): results.append(res) n_frames = 0 for res in results: if res is None: continue n_frames += 1 valid_distances = res["distances"] n_centers = res["n_centers"] n_targets = res["n_targets"] volume = res["volume"] if len(valid_distances) > 0: hist += np.histogram(valid_distances, bins=bins)[0] total_centers += n_centers total_targets += n_targets total_volume += volume if n_frames == 0: r = bins[:-1] + bin_width/2 return r, np.zeros_like(r), {"position": None, "value": None} avg_density = total_targets / total_volume if total_volume > 0 else 0 r = bins[:-1] + bin_width/2 rdf = np.zeros_like(r) for i in range(len(hist)): r_lower = 
bins[i] r_upper = bins[i+1] shell_vol = 4/3 * np.pi * (r_upper**3 - r_lower**3) expected_count = shell_vol * avg_density * total_centers if expected_count > 1e-10: rdf[i] = hist[i] / expected_count else: rdf[i] = 0 # 使用高斯滤波替代Savitzky-Golay滤波 if len(rdf) > 10: rdf_smoothed = gaussian_filter1d(rdf, sigma=1.5) # 调整sigma控制平滑程度 else: rdf_smoothed = rdf # 计算主要峰值 peak_info = {} mask = (r >= 1.5) & (r <= 3.0) if np.any(mask) and np.any(rdf_smoothed[mask] > 0): peak_idx = np.argmax(rdf_smoothed[mask]) peak_pos = r[mask][peak_idx] peak_val = rdf_smoothed[mask][peak_idx] peak_info = {"position": peak_pos, "value": peak_val} else: peak_info = {"position": None, "value": None} return r, rdf_smoothed, peak_info # 定义选择器函数 def selector_phosphate_P_O_or_double(atom_types): """选择所有非羟基磷酸氧(P-O/P=O)""" return atom_types["phosphate_oxygens"]["P-O/P=O"] def selector_phosphate_P_OH(atom_types): return atom_types["phosphate_oxygens"]["P-OH"] def selector_phosphate_hydrogens(atom_types): return atom_types["phosphate_hydrogens"] def selector_water_only_hydrogens(atom_types): return atom_types["water_hydrogens"] def selector_hydronium_only_hydrogens(atom_types): return atom_types["hydronium_hydrogens"] def selector_water_only_oxygens(atom_types): return atom_types["water_oxygens"] def selector_hydronium_only_oxygens(atom_types): return atom_types["hydronium_oxygens"] def selector_fluoride_atoms(atom_types): return atom_types["fluoride_atoms"] def selector_aluminum_atoms(atom_types): return atom_types["aluminum_atoms"] def selector_all_phosphate_oxygens(atom_types): """选择所有磷酸氧(包括P-O/P=O和P-OH)""" return (atom_types["phosphate_oxygens"]["P-O/P=O"] + atom_types["phosphate_oxygens"]["P-OH"]) # 更新后的RDF分组配置 def get_rdf_groups(): """返回六张图的RDF分组配置(已优化)""" return { # 图1: Al的配位情况 "Al_Coordination": [ (selector_aluminum_atoms, selector_fluoride_atoms, "Al-F", "blue"), (selector_aluminum_atoms, selector_water_only_oxygens, "Al-Ow", "green"), (selector_aluminum_atoms, selector_all_phosphate_oxygens, "Al-Op", "red") ], # 图2: F与H形成的氢键 "F_Hydrogen_Bonding": [ (selector_fluoride_atoms, selector_water_only_hydrogens, "F-Hw", "lightblue"), (selector_fluoride_atoms, selector_hydronium_only_hydrogens, "F-Hh", "blue"), (selector_fluoride_atoms, selector_phosphate_hydrogens, "F-Hp", "darkblue") ], # 图3: 磷酸作为受体和供体(六组) "Phosphate_HBonding": [ # 受体部分(磷酸氧接受氢键) (selector_phosphate_P_O_or_double, selector_water_only_hydrogens, "P-O/P=O···Hw", "orange"), (selector_phosphate_P_O_or_double, selector_hydronium_only_hydrogens, "P-O/P=O···Hh", "red"), (selector_phosphate_P_OH, selector_water_only_hydrogens, "P-OH···Hw", "lightgreen"), (selector_phosphate_P_OH, selector_hydronium_only_hydrogens, "P-OH···Hh", "green"), # 供体部分(磷酸氢提供氢键) (selector_phosphate_hydrogens, selector_water_only_oxygens, "Hp···Ow", "lightblue"), (selector_phosphate_hydrogens, selector_hydronium_only_oxygens, "Hp···Oh", "blue") ], # 图4: 磷酸-水-水合氢离子交叉氢键 "Cross_Species_HBonding": [ (selector_phosphate_hydrogens, selector_water_only_oxygens, "Hp···Ow", "pink"), (selector_phosphate_hydrogens, selector_hydronium_only_oxygens, "Hp···Oh", "purple"), (selector_water_only_hydrogens, selector_all_phosphate_oxygens, "Hw···Op", "lightgreen"), (selector_water_only_hydrogens, selector_hydronium_only_oxygens, "Hw···Oh", "green"), (selector_hydronium_only_hydrogens, selector_water_only_oxygens, "Hh···Ow", "lightblue"), (selector_hydronium_only_hydrogens, selector_all_phosphate_oxygens, "Hh···Op", "blue") ], # 图5: 同类型分子内/间氢键 "Same_Species_HBonding": [ (selector_phosphate_hydrogens, 
selector_phosphate_P_O_or_double, "Hp···P-O/P=O", "red"), (selector_phosphate_hydrogens, selector_phosphate_P_OH, "Hp···P-OH", "orange"), (selector_water_only_hydrogens, selector_water_only_oxygens, "Hw···Ow", "lightblue"), (selector_hydronium_only_hydrogens, selector_hydronium_only_oxygens, "Hh···Oh", "blue") ], # 图6: O-O聚集分析 "O_O_Aggregation": [ (selector_all_phosphate_oxygens, selector_water_only_oxygens, "Op-Ow", "blue"), (selector_all_phosphate_oxygens, selector_hydronium_only_oxygens, "Op-Oh", "green"), (selector_all_phosphate_oxygens, selector_all_phosphate_oxygens, "Op-Op", "red"), (selector_water_only_oxygens, selector_hydronium_only_oxygens, "Ow-Oh", "purple"), (selector_water_only_oxygens, selector_water_only_oxygens, "Ow-Ow", "cyan"), (selector_hydronium_only_oxygens, selector_hydronium_only_oxygens, "Oh-Oh", "magenta") ] } def main(workers=1): # 定义要处理的体系 vasprun_files = { "System1": "vasprun1.xml", "System2": "vasprun2.xml", "System3": "vasprun3.xml", "System4": "vasprun4.xml" } # 获取RDF分组配置 rdf_groups = get_rdf_groups() # 标题映射 title_map = { "Al_Coordination": "Al Coordination Environment", "F_Hydrogen_Bonding": "F-H Hydrogen Bonding", "Phosphate_HBonding": "Phosphate H-bonding (Acceptor and Donor)", "Cross_Species_HBonding": "Cross H-bonding between Different Species", "Same_Species_HBonding": "Intra- and Inter-molecular H-bonding", "O_O_Aggregation": "O-O Aggregation Analysis" } # 存储所有数据 all_system_data = {} group_y_max = {group_name: 0 for group_name in list(rdf_groups.keys())} group_x_max = { "Al_Coordination": (0, 6), "F_Hydrogen_Bonding": (0, 6), "Phosphate_HBonding": (0, 6.0), "Cross_Species_HBonding": (0, 6.0), "Same_Species_HBonding": (0, 6.0), "O_O_Aggregation": (0, 6.0) } # 创建输出目录 os.makedirs("RDF_Plots", exist_ok=True) # 计算所有体系的所有RDF数据 for system_name, vasprun_file in vasprun_files.items(): print(f"\n{'='*50}") print(f"Processing {system_name}: {vasprun_file} with {workers} workers") print(f"{'='*50}") start_time = time.time() try: # 加载VASP结果 vr = Vasprun(vasprun_file, ionic_step_skip=5) structures = vr.structures print(f"Loaded {len(structures)} frames") # 存储体系数据 system_data = { "rdf_results": {}, "peak_infos": {} } # 计算所有RDF分组 for group_name, pairs in rdf_groups.items(): system_data["rdf_results"][group_name] = {} system_data["peak_infos"][group_name] = {} group_y_max_current = 0 for center_sel, target_sel, label, color in pairs: print(f"\nCalculating RDF for: {label}") try: r, rdf, peak_info = calculate_rdf_parallel( structures, center_sel, target_sel, r_max=10.0, exclude_bonds=True, bond_threshold=1.3, workers=workers ) system_data["rdf_results"][group_name][label] = (r, rdf, color) system_data["peak_infos"][group_name][label] = peak_info if len(rdf) > 0: current_max = np.max(rdf) if current_max > group_y_max_current: group_y_max_current = current_max if peak_info["position"] is not None: print(f" Peak for {label}: {peak_info['position']:.3f} Å (g(r) = {peak_info['value']:.2f})") else: print(f" No significant peak found for {label} in 1.5-3.0 Å range") except Exception as e: print(f"Error calculating RDF for {label}: {str(e)}") system_data["rdf_results"][group_name][label] = (np.array([]), np.array([]), color) system_data["peak_infos"][group_name][label] = {"position": None, "value": None} if group_y_max_current > group_y_max[group_name]: group_y_max[group_name] = group_y_max_current all_system_data[system_name] = system_data elapsed = time.time() - start_time print(f"\nCompleted processing for {system_name} in {elapsed:.2f} seconds") except Exception as e: 
print(f"Error processing {system_name}: {str(e)}") # 为每个分组添加余量 for group_name in group_y_max: group_y_max[group_name] = max(group_y_max[group_name] * 1.15, 3.0) # 确保最小值 # 第二步:生成符合期刊要求的图表 for system_name, system_data in all_system_data.items(): print(f"\nGenerating publication-quality plots for {system_name}") for group_name, group_data in system_data["rdf_results"].items(): fig, ax = plt.subplots(figsize=(8, 6)) # 设置坐标轴范围 xlim = group_x_max.get(group_name, (0, 6.0)) ylim = (0, group_y_max[group_name]) for label, (r, rdf, color) in group_data.items(): if len(r) > 0 and len(rdf) > 0: ax.plot(r, rdf, color=color, label=label, linewidth=2.0) ax.set_xlim(xlim) ax.set_ylim(ylim) # 期刊格式标签 ax.set_xlabel('Radial Distance (Å)', fontweight='bold') ax.set_ylabel('g(r)', fontweight='bold') # 添加体系名称到标题 ax.set_title(f"{system_name}: {title_map[group_name]}", fontsize=16, pad=15) # 精简图例 ncol = 3 if group_name == "Same_Species_HBonding" else 1 ax.legend(ncol=ncol, loc='best', framealpha=0.8, fontsize=10) # 添加氢键区域标记(除O-O聚集图外) if group_name != "O_O_Aggregation": ax.axvspan(1.5, 2.5, alpha=0.1, color='green', zorder=0) ax.text(1.7, ylim[1]*0.85, 'H-bond Region', fontsize=10) # 添加网格 ax.grid(True, linestyle='--', alpha=0.5) # 保存高分辨率图片 plt.tight_layout() filename = os.path.join("RDF_Plots", f"RDF_{system_name}_{group_name}.tiff") plt.savefig(filename, bbox_inches='tight', dpi=600, format='tiff') print(f"Saved publication plot: {filename}") plt.close() # 保存Origin兼容数据 for system_name, system_data in all_system_data.items(): save_origin_data(system_name, system_data) print("\nAll RDF analysis completed successfully!") def save_origin_data(system_name, system_data): """保存Origin兼容格式数据""" os.makedirs("Origin_Data", exist_ok=True) system_dir = os.path.join("Origin_Data", system_name) os.makedirs(system_dir, exist_ok=True) # 保存峰值信息 peak_info_path = os.path.join(system_dir, f"Peak_Positions_{system_name}.csv") with open(peak_info_path, 'w', newline='') as csvfile: writer = csv.writer(csvfile) writer.writerow(["Group", "Interaction", "Peak Position (A)", "g(r) Value"]) for group_name, peaks in system_data["peak_infos"].items(): for label, info in peaks.items(): if info["position"] is not None: writer.writerow([group_name, label, f"{info['position']:.3f}", f"{info['value']:.3f}"]) else: writer.writerow([group_name, label, "N/A", "N/A"]) print(f"Saved peak positions: {peak_info_path}") # 保存RDF数据 for group_name, group_results in system_data["rdf_results"].items(): group_dir = os.path.join(system_dir, group_name) os.makedirs(group_dir, exist_ok=True) for label, (r, rdf, color) in group_results.items(): if len(r) > 0 and len(rdf) > 0: safe_label = label.replace(" ", "_").replace("/", "_").replace("=", "_") safe_label = safe_label.replace("(", "").replace(")", "").replace("$", "") filename = f"RDF_{system_name}_{group_name}_{safe_label}.csv" filepath = os.path.join(group_dir, filename) with open(filepath, 'w', newline='') as csvfile: writer = csv.writer(csvfile) writer.writerow(["Distance (A)", "g(r)"]) for i in range(len(r)): writer.writerow([f"{r[i]:.6f}", f"{rdf[i]:.6f}"]) print(f"Saved Origin data: {filename}") if __name__ == "__main__": # 设置命令行参数 parser = argparse.ArgumentParser(description='Calculate RDF for VASP simulations') parser.add_argument('--workers', type=int, default=multiprocessing.cpu_count(), help=f'Number of parallel workers (default: {multiprocessing.cpu_count()})') args = parser.parse_args() print(f"Starting RDF analysis with {args.workers} workers...") main(workers=args.workers) 
以上代码计算的是 RDF,其计算逻辑中已经包含了供体氧与 H 之间成键的判别,而 RDF 的结果又进一步确定了氢键的 H···受体距离。接下来想对氢键的键角做一个粗略统计,得到各类氢键的键角分布。根据 RDF 结果,只需要统计九类氢键,对应的 H···受体距离截断分别为:P-O/P=O···Hw 2.375 Å、P-O/P=O···Hh 2.275 Å、P-O/P=O···Hp 2.175 Å、P-OH···Ow 2.275 Å、Hw···Ow 2.375 Å、Hh···Ow 2.225 Å、Hw···F 2.225 Å、Hh···F 2.175 Å、Hp···F 2.275 Å。请基于这些截断(供体与 H 之间成键的判别沿用上面代码的逻辑)绘制氢键键角大小的分布图,采用直方图与 KDE 曲线叠加的形式,并将氢键键角阈值设为 120°。
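下面是按上述九类氢键做 D-H···A 键角统计并绘制"直方图 + KDE"叠加图的一个示意实现:供体 O 与 H 的归属沿用前面 identify_atom_types 得到的 hydrogen_owners 结果,H···受体距离截断取题目给出的数值("P-OH···Ow"按 RDF 分组里 P-OH 氧作受体、对应 P-OH···Hw 理解,若命名有出入请自行对应),键角阈值取 120°,只保留角度不小于该阈值的样本;周期性边界用最小镜像近似处理(与前面 cKDTree 的 boxsize 做法一致,适用于正交晶胞):

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
from scipy.spatial import cKDTree

# 九类氢键对应的 H···受体 距离截断(Å),数值取自上文
HA_CUTOFFS = {
    "P-O/P=O···Hw": 2.375, "P-O/P=O···Hh": 2.275, "P-O/P=O···Hp": 2.175,
    "P-OH···Ow": 2.275,
    "Hw···Ow": 2.375, "Hh···Ow": 2.225,
    "Hw···F": 2.225, "Hh···F": 2.175, "Hp···F": 2.275,
}
ANGLE_THRESHOLD = 120.0  # 氢键键角阈值(D-H···A,顶点为 H)

def _mic_vector(struct, i, j):
    """返回 i→j 的最小镜像笛卡尔向量,粗略处理周期性边界。"""
    dfrac = struct[j].frac_coords - struct[i].frac_coords
    dfrac -= np.round(dfrac)
    return struct.lattice.get_cartesian_coords(dfrac)

def collect_hbond_angles(structures, owners_per_frame, h_indices_per_frame,
                         acceptor_indices_per_frame, cutoff):
    """统计一类氢键的 D-H···A 键角。
    owners_per_frame: 每帧 {H索引: 供体O索引} 字典(来自 hydrogen_owners)
    h_indices_per_frame / acceptor_indices_per_frame: 每帧的 H、受体原子索引列表"""
    angles = []
    for struct, owners, h_idxs, a_idxs in zip(structures, owners_per_frame,
                                              h_indices_per_frame, acceptor_indices_per_frame):
        if not h_idxs or not a_idxs:
            continue
        a_coords = np.array([struct[a].coords for a in a_idxs])
        tree = cKDTree(a_coords, boxsize=struct.lattice.abc)
        for h in h_idxs:
            d = owners.get(h)
            if d is None:
                continue
            for local_a in tree.query_ball_point(struct[h].coords, r=cutoff):
                a = a_idxs[local_a]
                if a == d:
                    continue
                v_hd = _mic_vector(struct, h, d)   # H→供体
                v_ha = _mic_vector(struct, h, a)   # H→受体
                cosang = np.dot(v_hd, v_ha) / (np.linalg.norm(v_hd) * np.linalg.norm(v_ha))
                ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
                if ang >= ANGLE_THRESHOLD:
                    angles.append(ang)
    return np.array(angles)

def plot_angle_distribution(angles, label):
    """直方图 + KDE 曲线叠加。"""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.hist(angles, bins=30, range=(ANGLE_THRESHOLD, 180), density=True,
            alpha=0.5, color='steelblue', label='Histogram')
    if len(angles) > 1:
        kde = gaussian_kde(angles)
        xs = np.linspace(ANGLE_THRESHOLD, 180, 300)
        ax.plot(xs, kde(xs), 'r-', lw=2, label='KDE')
    ax.set_xlabel('D-H···A angle (deg)')
    ax.set_ylabel('Probability density')
    ax.set_title(f'H-bond angle distribution: {label}')
    ax.legend()
    plt.tight_layout()
    plt.show()
```
对每一类氢键,把对应帧的 H 索引、受体索引和 hydrogen_owners 传入 collect_hbond_angles,再调用 plot_angle_distribution 即可得到该类氢键的键角分布图。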

检查并优化:import pandas as pd import numpy as np import lightgbm as lgb import gc import os import chardet from sklearn.model_selection import train_test_split from tqdm import tqdm import psutil from sklearn.metrics import log_loss, mean_absolute_error from scipy.sparse import hstack, csr_matrix, save_npz, load_npz import warnings warnings.filterwarnings('ignore') # 内存优化函数 - 增强版 def optimize_dtypes(df, downcast_int=True, downcast_float=True, category_threshold=0.5): """优化DataFrame的数据类型以减少内存占用""" if df.empty: return df # 转换整数列为最小可用类型 if downcast_int: int_cols = df.select_dtypes(include=['int']).columns for col in int_cols: df[col] = pd.to_numeric(df[col], downcast='integer') # 转换浮点列为float32 if downcast_float: float_cols = df.select_dtypes(include=['float']).columns for col in float_cols: # 优先转换为float32而不是downcast='float'以获得更好控制 df[col] = df[col].astype(np.float32) # 转换对象列为分类类型 obj_cols = df.select_dtypes(include=['object']).columns for col in obj_cols: num_unique = df[col].nunique() num_total = len(df) if num_unique / num_total < category_threshold: df[col] = df[col].astype('category') return df # 增强数据加载函数 def load_data_safely(file_path, usecols=None, dtype=None, chunksize=50000, verbose=True): """安全加载大型CSV文件,优化内存使用""" try: if not os.path.exists(file_path): print(f"⚠️ 文件不存在: {file_path}") return pd.DataFrame() # 自动检测编码 with open(file_path, 'rb') as f: result = chardet.detect(f.read(100000)) encoding = result['encoding'] if result['confidence'] > 0.7 else 'latin1' # 获取文件大小用于进度条 file_size = os.path.getsize(file_path) / (1024 ** 2) # MB desc = f"加载 {os.path.basename(file_path)} ({file_size:.1f}MB)" # 分批读取并优化内存 chunks = [] reader = pd.read_csv( file_path, encoding=encoding, usecols=usecols, dtype=dtype, chunksize=chunksize, low_memory=False ) for chunk in tqdm(reader, desc=desc, disable=not verbose): # 优化数据类型 chunk = optimize_dtypes(chunk) chunks.append(chunk) if chunks: result = pd.concat(chunks, ignore_index=True) # 再次整体优化 result = optimize_dtypes(result) return result return pd.DataFrame() except Exception as e: print(f"⚠️ 加载 {file_path} 失败: {str(e)}") return pd.DataFrame() # 稀疏矩阵转换函数 - 优化版 def to_sparse_matrix(df, columns, fillna='MISSING', dtype=np.int8): """将分类特征转换为稀疏矩阵表示""" from sklearn.preprocessing import OneHotEncoder # 预处理数据 sparse_data = df[columns].fillna(fillna).astype(str) # 使用OneHotEncoder替代get_dummies以获得更好性能 encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=True, dtype=dtype) sparse_matrix = encoder.fit_transform(sparse_data) return sparse_matrix, encoder # 增量训练函数 - 优化内存管理 def train_incremental(X, y, categorical_features, params, num_rounds=1000, chunk_size=100000): """分块增量训练模型以减少内存占用""" model = None callbacks = [lgb.early_stopping(stopping_rounds=50, verbose=0), lgb.log_evaluation(period=100)] for i in tqdm(range(0, len(X), chunk_size), desc="增量训练"): chunk_end = min(i + chunk_size, len(X)) # 使用视图避免复制数据 X_chunk = X.iloc[i:chunk_end] y_chunk = y.iloc[i:chunk_end] # 创建数据集后立即释放原始数据 train_data = lgb.Dataset( X_chunk, label=y_chunk, categorical_feature=categorical_features, free_raw_data=True # 训练后释放原始数据 ) if model is None: model = lgb.train( params, train_data, num_boost_round=num_rounds, callbacks=callbacks, keep_training_booster=True ) else: model = lgb.train( params, train_data, num_boost_round=num_rounds, init_model=model, callbacks=callbacks, keep_training_booster=True ) # 显式释放内存 del train_data, X_chunk, y_chunk gc.collect() return model # 历史数据加载函数 - 优化内存 def load_historical_data(days=32, verbose=True): """高效加载历史数据,支持分批处理""" see_list, click_list, play_list = [], [], 
[] for day in tqdm(range(1, days + 1), desc="加载历史数据", disable=not verbose): day_str = f"{day:02d}" # 加载曝光数据 - 仅加载必要列 see_path = f'see_{day_str}.csv' if os.path.exists(see_path): see = load_data_safely( see_path, usecols=['did', 'vid'], dtype={'did': 'category', 'vid': 'category'}, verbose=verbose ) if not see.empty: see_list.append(see) del see # 加载点击数据 - 优化日期处理 click_path = f'click_{day_str}.csv' if os.path.exists(click_path): click = load_data_safely( click_path, usecols=['did', 'vid', 'click_time'], dtype={'did': 'category', 'vid': 'category'}, verbose=verbose ) if not click.empty and 'click_time' in click.columns: # 直接解析日期为数值类型 click_dates = pd.to_datetime(click['click_time'], errors='coerce') click['date'] = click_dates.dt.strftime('%Y%m%d').astype('int32') click = click.drop(columns=['click_time']) click_list.append(click[['did', 'vid', 'date']]) del click, click_dates # 加载播放数据 play_path = f'playplus_{day_str}.csv' if os.path.exists(play_path): play = load_data_safely( play_path, usecols=['did', 'vid', 'play_time'], dtype={'did': 'category', 'vid': 'category', 'play_time': 'float32'}, verbose=verbose ) if not play.empty: play_list.append(play) del play gc.collect() # 使用concat时避免创建中间对象 return ( pd.concat(see_list, ignore_index=True, copy=False).drop_duplicates(['did', 'vid']) if see_list else pd.DataFrame(), pd.concat(click_list, ignore_index=True, copy=False).drop_duplicates(['did', 'vid']) if click_list else pd.DataFrame(), pd.concat(play_list, ignore_index=True, copy=False).drop_duplicates(['did', 'vid']) if play_list else pd.DataFrame() ) # 点击数据集构建 - 内存优化版 def build_click_dataset(hist_exposure, hist_click, sample_ratio=0.1, verbose=True): """构建点击数据集,包含负样本采样 - 内存优化版""" if hist_exposure.empty or hist_click.empty: print("⚠️ 历史曝光或点击数据为空,无法构建数据集") return pd.DataFrame() # 标记正样本 - 使用视图避免复制 pos_samples = hist_click[['did', 'vid']].copy() pos_samples['label'] = 1 # 创建曝光集索引用于高效查找 exposure_index = hist_exposure.set_index(['did', 'vid']).index # 分块处理负样本 neg_chunks = [] chunk_size = 500000 total_rows = len(hist_exposure) for start in tqdm(range(0, total_rows, chunk_size), desc="构建负样本", disable=not verbose): end = min(start + chunk_size, total_rows) chunk = hist_exposure.iloc[start:end] # 使用索引查找未点击的曝光 chunk['is_clicked'] = chunk.set_index(['did', 'vid']).index.isin(hist_click.set_index(['did', 'vid']).index) neg_chunk = chunk[~chunk['is_clicked']][['did', 'vid']] if not neg_chunk.empty and sample_ratio < 1.0: neg_chunk = neg_chunk.sample(frac=sample_ratio, random_state=42) neg_chunks.append(neg_chunk) del chunk, neg_chunk # 合并负样本 neg_samples = pd.concat(neg_chunks, ignore_index=True) neg_samples['label'] = 0 # 合并正负样本 click_data = pd.concat([pos_samples, neg_samples], ignore_index=True, copy=False) # 释放内存 del exposure_index, pos_samples, neg_samples, neg_chunks gc.collect() return click_data # 在合并操作前添加检查 if 'total' not in df.columns: print("警告:'total' 列不存在于 DataFrame 中") print("可用列名:", df.columns.tolist()) # 尝试找出可能的拼写错误 possible_matches = [col for col in df.columns if 'total' in col.lower()] if possible_matches: print("可能的匹配列:", possible_matches) # 特征工程函数 - 内存优化版 def add_click_features(df, did_features, vid_info, hist_click, hist_play, verbose=True): """添加关键特征,避免内存溢出 - 优化版""" if df.empty: return df # 1. 合并设备特征 - 仅选择必要列 if not did_features.empty and 'did' in did_features.columns: did_cols = ['did'] + [col for col in did_features.columns if col.startswith('f')] df = df.merge(did_features[did_cols], on='did', how='left') # 2. 
合并视频特征 - 仅选择必要列 if not vid_info.empty and 'vid' in vid_info.columns: vid_cols = ['vid', 'item_duration'] + [col for col in vid_info.columns if col in ['item_cid', 'item_type']] df = df.merge(vid_info[vid_cols], on='vid', how='left') # 3. 预聚合统计特征 - 减少重复计算 stats = {} # 用户行为统计 if not hist_click.empty: stats['user_click_count'] = hist_click.groupby('did').size().astype('int32') stats['video_click_count'] = hist_click.groupby('vid').size().astype('int32') if not hist_play.empty: stats['user_total_play'] = hist_play.groupby('did')['play_time'].sum().astype('float32') stats['avg_play_time'] = hist_play.groupby('vid')['play_time'].mean().astype('float32') # 4. 合并统计特征 for name, stat_df in tqdm(stats.items(), desc="添加统计特征", disable=not verbose): if name in df.columns: continue df = df.merge(stat_df.rename(name), how='left', left_on=name.split('_')[1], right_index=True) # 5. 填充缺失值 - 使用更高效的方法 fill_values = { 'user_click_count': 0, 'user_total_play': 0, 'video_click_count': df['video_click_count'].median() if 'video_click_count' in df else 0, 'avg_play_time': df['avg_play_time'].median() if 'avg_play_time' in df else 0, 'item_duration': df['item_duration'].median() if 'item_duration' in df else 30.0 } for col, default in fill_values.items(): if col in df: # 使用inplace填充减少内存分配 df[col].fillna(default, inplace=True) # 6. 添加时间特征 - 使用数值替代分类 if 'date' in df: # 直接计算数值特征,避免创建datetime对象 df['day_of_week'] = (df['date'] % 7).astype('int8') df['is_weekend'] = (df['day_of_week'] >= 5).astype('int8') df.drop(columns=['date'], inplace=True, errors='ignore') return df # 主处理流程 - 内存优化版 def main(): """主处理流程,包含完整的内存优化策略""" # 初始内存监控 start_mem = memory_monitor("初始内存") # 定义内存优化的数据类型 dtypes = { 'did': 'category', 'vid': 'category', 'play_time': 'float32' } # 加载核心数据 print("开始加载核心数据...") did_features = load_data_safely('did_features_table.csv', dtype=dtypes) vid_info = load_data_safely('vid_info_table.csv', dtype=dtypes) memory_monitor("加载核心数据后") # 加载历史数据 - 减少加载天数 print("开始加载历史数据...") hist_exposure, hist_click, hist_play = load_historical_data(days=14) # 减少到14天 memory_monitor("加载历史数据后") # 构建点击数据集 if not hist_exposure.empty and not hist_click.empty: print("构建点击数据集...") click_train_data = build_click_dataset(hist_exposure, hist_click, sample_ratio=0.1) # 立即释放不再需要的数据 del hist_exposure, hist_click gc.collect() else: print("⚠️ 无法构建点击数据集") click_train_data = pd.DataFrame() memory_monitor("构建点击数据集后") # 添加特征 - 使用增量方式 if not click_train_data.empty: print("开始构建点击特征...") click_train_data = add_click_features( click_train_data, did_features, vid_info, hist_click if 'hist_click' in locals() else pd.DataFrame(), hist_play ) else: print("⚠️ 点击数据集为空,跳过特征构建") # 立即释放内存 del hist_play gc.collect() memory_monitor("添加特征后") # 准备训练数据 - 使用视图避免复制 if not click_train_data.empty: cols_to_drop = ['did', 'vid', 'label'] if 'date' in click_train_data.columns: cols_to_drop.append('date') X = click_train_data.drop(columns=cols_to_drop, errors='ignore') y = click_train_data['label'] else: X, y = pd.DataFrame(), pd.Series(dtype='float32') print("⚠️ 点击训练数据为空") # 划分数据集 - 使用索引避免复制 if len(X) > 0: indices = np.arange(len(X)) train_idx, val_idx = train_test_split(indices, test_size=0.2, random_state=42, stratify=y) X_train, X_val = X.iloc[train_idx], X.iloc[val_idx] y_train, y_val = y.iloc[train_idx], y.iloc[val_idx] else: print("⚠️ 训练数据为空,无法进行模型训练") X_train, X_val, y_train, y_val = pd.DataFrame(), pd.DataFrame(), pd.Series(), pd.Series() # 释放click_train_data del click_train_data, X, y, indices gc.collect() memory_monitor("划分数据集后") # 训练模型参数 - 调整为更节省内存的参数 params = { 
'objective': 'binary', 'metric': 'binary_logloss', 'boosting_type': 'gbdt', 'num_leaves': 31, # 减少叶子节点数 'learning_rate': 0.05, 'feature_fraction': 0.7, # 减少特征使用比例 'bagging_fraction': 0.8, 'bagging_freq': 5, 'min_child_samples': 200, # 增加最小样本数 'verbosity': -1, 'max_depth': -1, # 避免过深 'seed': 42 } # 增量训练点击模型 if len(X_train) > 0: print("开始训练点击预测模型...") model_click = train_incremental(X_train, y_train, [], params, num_rounds=1000, chunk_size=100000) # 在验证集上评估 if len(X_val) > 0: # 分块预测避免内存峰值 chunk_size = 50000 val_preds = [] for i in range(0, len(X_val), chunk_size): chunk = X_val.iloc[i:i+chunk_size] val_preds.extend(model_click.predict(chunk)) val_logloss = log_loss(y_val, val_preds) print(f"验证集LogLoss: {val_logloss:.4f}") else: model_click = None print("⚠️ 训练数据为空,跳过点击预测模型训练") # 释放训练数据 del X_train, X_val, y_train, y_val gc.collect() memory_monitor("训练点击模型后") # 最终内存报告 end_mem = memory_monitor("处理完成") print(f"总内存消耗: {end_mem - start_mem:.2f} MB") # 内存监控函数 def memory_monitor(step_name=""): """监控内存使用情况""" process = psutil.Process(os.getpid()) mem_info = process.memory_info() print(f"{step_name} 内存使用: {mem_info.rss / (1024 ** 2):.2f} MB") return mem_info.rss / (1024 ** 2) # 返回MB if __name__ == "__main__": main()
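粘贴的这版代码里有几处比较明显的问题值得先修掉:① `# 在合并操作前添加检查` 那段 `if 'total' not in df.columns:` 看起来落在了模块顶层(缩进在粘贴时丢失,请核对),此处并没有名为 df 的变量,脚本一运行就会 NameError,应删除或移进真正持有 df 的函数;② `add_click_features` 第 4 步用 `left_on=name.split('_')[1]` 推断合并键,对 'user_click_count' 得到的是 'click'、对 'user_total_play' 得到的是 'total',都不是真实列名(这也解释了为什么会出现 'total' 列检查),应显式映射到 'did' 或 'vid';③ `main()` 在构建完点击数据集后就 `del hist_click`,随后又用 `hist_click if 'hist_click' in locals() else pd.DataFrame()` 传给特征函数,结果用户/视频点击统计永远为空,应把 del 挪到特征构建之后;④ `train_incremental` 加了 `lgb.early_stopping` 回调却没有传验证集,早停无法生效(多数版本会直接报错),建议传入 valid_sets 或去掉该回调。下面给出第 ② 点的修改示意(函数名 merge_stat_features 为新起的名字):

```
import pandas as pd

def merge_stat_features(df, stats):
    """把预聚合的统计 Series 合并进样本表:user_* 用 did 作键,video_*/avg_* 用 vid 作键。"""
    merge_key_map = {
        'user_click_count': 'did',
        'user_total_play': 'did',
        'video_click_count': 'vid',
        'avg_play_time': 'vid',
    }
    for name, stat_series in stats.items():
        key = merge_key_map.get(name)
        if key is None or name in df.columns:
            continue
        df = df.merge(stat_series.rename(name), how='left',
                      left_on=key, right_index=True)
    return df
```
其余几点按上面的说明调整调用顺序或回调参数即可,不需要新增代码。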

recommend-type

自由行旅游新篇章:HarmonyOS技术融合与系统架构深度解析

# 1. HarmonyOS技术概述 ## 1.1 HarmonyOS的起源与发展 HarmonyOS(鸿蒙操作系统)由华为公司开发,旨在构建全场景分布式OS,以应对不同设备间的互联问题。自从2019年首次发布以来,HarmonyOS迅速成长,并迅速应用于智能手机、平板、智能穿戴、车载设备等多种平台。该系
recommend-type

足底支撑相到达73%是什么问题

<think>好的,我现在要处理用户的问题:“足底支撑相达到73%可能的原因及影响”。首先,需要理解用户提到的“足底支撑相73%”具体指什么。足底支撑相通常指步态周期中脚与地面接触的时间占比,正常范围大约在60%-62%左右。如果达到73%,明显超出正常范围,可能意味着步态异常。 接下来,需要确定可能的原因。可能的原因包括生物力学异常,比如足弓异常(扁平足或高弓足)、踝关节活动度受限,或者肌肉力量不足,特别是小腿和足部肌肉。另外,神经系统疾病如脑卒中或帕金森病可能影响步态控制。骨骼关节问题如关节炎、髋膝关节病变也可能导致支撑时间延长。还有代偿性步态,比如因疼痛或受伤而改变步态模式。 然后是
recommend-type

宾馆预约系统开发与优化建议

宾馆预约系统是一个典型的在线服务应用,它允许用户通过互联网平台预定宾馆房间。这种系统通常包含多个模块,比如用户界面、房态管理、预订处理、支付处理和客户评价等。从技术层面来看,构建一个宾馆预约系统涉及到众多的IT知识和技术细节,下面将详细说明。 ### 标题知识点 - 宾馆预约系统 #### 1. 系统架构设计 宾馆预约系统作为一个完整的应用,首先需要进行系统架构设计,决定其采用的软件架构模式,如B/S架构或C/S架构。此外,系统设计还需要考虑扩展性、可用性、安全性和维护性。一般会采用三层架构,包括表示层、业务逻辑层和数据访问层。 #### 2. 前端开发 前端开发主要负责用户界面的设计与实现,包括用户注册、登录、房间搜索、预订流程、支付确认、用户反馈等功能的页面展示和交互设计。常用的前端技术栈有HTML, CSS, JavaScript, 以及各种前端框架如React, Vue.js或Angular。 #### 3. 后端开发 后端开发主要负责处理业务逻辑,包括用户管理、房间状态管理、订单处理等。后端技术包括但不限于Java (使用Spring Boot框架), Python (使用Django或Flask框架), PHP (使用Laravel框架)等。 #### 4. 数据库设计 数据库设计对系统的性能和可扩展性至关重要。宾馆预约系统可能需要设计的数据库表包括用户信息表、房间信息表、预订记录表、支付信息表等。常用的数据库系统有MySQL, PostgreSQL, MongoDB等。 #### 5. 网络安全 网络安全是宾馆预约系统的重要考虑因素,包括数据加密、用户认证授权、防止SQL注入、XSS攻击、CSRF攻击等。系统需要实现安全的认证机制,比如OAuth或JWT。 #### 6. 云服务和服务器部署 现代的宾馆预约系统可能部署在云平台上,如AWS, Azure, 腾讯云或阿里云。在云平台上,系统可以按需分配资源,提高系统的稳定性和弹性。 #### 7. 付款接口集成 支付模块需要集成第三方支付接口,如支付宝、微信支付、PayPal等,需要处理支付请求、支付状态确认、退款等业务。 #### 8. 接口设计与微服务 系统可能采用RESTful API或GraphQL等接口设计方式,提供服务的微服务化,以支持不同设备和服务的接入。 ### 描述知识点 - 这是我个人自己做的 请大家帮忙修改哦 #### 个人项目经验与团队合作 描述中的这句话暗示了该宾馆预约系统可能是由一个个人开发者创建的。个人开发和团队合作在软件开发流程中有着显著的不同。个人开发者需要关注的方面包括项目管理、需求分析、代码质量保证、测试和部署等。而在团队合作中,每个成员会承担不同的职责,需要有效的沟通和协作。 #### 用户反馈与迭代 描述还暗示了该系统目前处于需要外部反馈和修改的阶段。这表明系统可能还处于开发或测试阶段,需要通过用户的实际使用反馈来不断迭代改进。 ### 标签知识点 - 200 #### 未提供信息 “200”这个标签可能指的是HTTP状态码中表示请求成功(OK)的200状态码。但是,由于没有提供更多的上下文信息,无法进一步分析其在本例中的具体含义。 ### 压缩包子文件的文件名称列表知识点 - 1111 #### 文件命名与管理 “1111”这个文件名称可能是一个版本号、日期标记或者是一个简单的标识符。文件命名应当遵循一定的规则,以确保文件的可追溯性和管理的便利性。在软件开发过程中,合理组织文件和版本控制(如使用Git)是必不可少的。 综上所述,宾馆预约系统的开发是一项复杂的工程,它涉及前后端的开发、数据库设计、系统安全、接口设计等多个方面。开发者在开发过程中需要不断学习和应用各类IT知识,以确保系统能够安全、高效、稳定地运行。而对于个人开发项目,如何合理利用有限资源、高效地管理和优化项目过程也是至关重要的。
recommend-type

HarmonyOS在旅游领域的创新:揭秘最前沿应用实践

# 1. HarmonyOS旅游应用的市场前景分析 随着数字化转型的不断深入,旅游行业正面临着前所未有的变革。在这样的背景下,HarmonyOS作为一种新兴的操作系统,带来了全新的市场前景和机遇。本章将深入分析HarmonyOS在旅游应用领域的市场潜力、用户需求、以及技术创新对旅游体验的改善。 ## 1.1 市场需求与用户画像分析 旅游市场的需求持续增
recommend-type

数据架构师需要具备什么能力

<think>嗯,用户现在转向询问数据架构师的能力要求了,而且特别提到要参考站内引用内容。看来ta正在规划职业方向,可能是个技术岗位的进阶者或转行者。之前对话历史显示用户对技术细节有深度需求(比如C语言底层特性),这次提问虽然跨度大,但延续了“技术职业发展”这条主线。用户引用的三段材料其实很有意思:第一篇强调综合能力,第二篇突出编程基础,第三篇提到商业洞察力——这正好构成数据架构师的三个能力象限。用户给出的引用里埋了个关键矛盾:第二篇说“速成只能做码农”,第三篇说“需要持续学习”,暗示ta可能担心速成班的局限性。回应时得强调“扎实基础+持续成长”的平衡。技术层面需要覆盖三个维度:硬技能(数据库