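Preprocessing Code for the Case Western Reserve University Dataset
Date: 2025-04-30 18:42:51
### Python Preprocessing Code for the Case Western Reserve University Bearing Dataset
The following is a Python implementation of preprocessing for the Case Western Reserve University (CWRU) bearing dataset. It covers reading the `.mat` files, extracting the signal data, and converting it into a form suitable for later analysis.
#### 1. Import the required libraries
The preprocessing relies on a few common scientific computing libraries: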
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
```
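This snippet loads the third-party libraries needed for numerical work (NumPy), plotting (Matplotlib), and parsing MATLAB files (`scipy.io.loadmat`)[^4].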
---
#### 2. Load the raw MAT data file
The `scipy.io.loadmat()` function loads experimental data stored in `.mat` format. The following simple example reads a data file from a specific path:
```python
path = '0hp_load_48_KHz/7_OR3.mat'
data_dict = loadmat(path)
# Inspect the variable names (keys) stored in the dictionary
print(data_dict.keys())
```
Printing the keys confirms the name of the target variable, which the next step needs.
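
Reading the key off the printed output by hand becomes tedious when many files are processed. The sketch below looks the key up automatically. It assumes that variable names follow the pattern `X<number>_DE_time` (for example `X097_DE_time`); `find_signal_key` is a hypothetical helper, not part of SciPy, and the pattern should be adjusted if your files are named differently.

```python
import re


def find_signal_key(mat_data, suffix="_DE_time"):
    """Return the first key in a loadmat() dict that matches 'X<number><suffix>'.

    The naming pattern is an assumption; check it against data_dict.keys().
    """
    pattern = re.compile(r"^X\d+" + re.escape(suffix) + r"$")
    for key in mat_data:
        if pattern.match(key):
            return key
    raise KeyError(f"No variable ending in '{suffix}' found; keys: {list(mat_data)}")


# Stand-in dictionary that mimics the keys loadmat() returns (values omitted).
example = {
    "__header__": b"", "__version__": "1.0", "__globals__": [],
    "X097_DE_time": None, "X097_FE_time": None, "X097RPM": None,
}
print(find_signal_key(example))  # X097_DE_time
```

With a real file, `find_signal_key(data_dict)` could replace a hard-coded key name.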
---
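#### 3. Extract the vibration signal time series
Each of these `.mat` files normally contains an array holding the measured time-domain waveform samples. The code below retrieves that array and returns it as a one-dimensional vector for later use: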
```python
def extract_signal(mat_data, key='DE_time'):
    """
    Extract the vibration signal from a loaded .mat file.

    Args:
        mat_data (dict): Dictionary returned by scipy.io.loadmat().
        key (str): Name of the variable that holds the desired signal.

    Returns:
        ndarray: 1-D float32 array containing the vibration time series.
    """
    try:
        # loadmat() returns column vectors of shape (N, 1); flatten to 1-D.
        raw_signal = mat_data[key].flatten()
        return raw_signal.astype(np.float32)
    except KeyError:
        raise ValueError(f"The specified key '{key}' does not exist in this .mat file.")


# Example key only: use the name printed by data_dict.keys() for your file.
signal_array = extract_signal(data_dict, key='X097_DE_time')

plt.plot(signal_array[:1000])  # Plot the first 1000 samples to inspect the waveform
plt.title('Vibration Signal Waveform')
plt.xlabel('Sample Index')
plt.ylabel('Amplitude')
plt.show()
```
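The function takes the variable name as a parameter, so files that follow different naming conventions can be handled by the same code. If the requested field does not exist, it raises an error that tells the user to correct the key.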
---
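#### 4. Split the signal into windows
To train a machine learning model, the long continuous recording usually has to be cut into short fixed-length segments, each of which is fed to the network as an independent sample. The following sliding-window logic performs that split: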
```python
def sliding_window_splitter(time_series, window_size=1024, stride=512):
    """
    Split a long time series into (possibly overlapping) fixed-length windows.

    Args:
        time_series (ndarray): Continuous 1-D signal extracted earlier.
        window_size (int): Number of consecutive samples per window.
        stride (int): Number of samples the window advances between segments.

    Yields:
        ndarray: Successive windows of length window_size; a trailing
        remainder shorter than window_size is discarded.
    """
    start_idx = 0
    # Use <= so that a window ending exactly on the last sample is kept.
    while start_idx + window_size <= len(time_series):
        yield time_series[start_idx:start_idx + window_size]
        start_idx += stride


windowed_segments = list(sliding_window_splitter(signal_array))
num_windows = len(windowed_segments)
print(f"Total number of generated windows: {num_windows}")

# Plot the first five windows as stacked subplots.
for i, seg in enumerate(windowed_segments[:5]):
    plt.subplot(5, 1, i + 1)
    plt.plot(seg)
    plt.title(f'Segment #{i}')
plt.tight_layout()
plt.show()
```
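The window size and stride are parameters, so the amount of overlap can be tuned to the application; the defaults of 1024 and 512 give 50% overlap. Because the splitter is a generator, windows are produced one at a time rather than all at once.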
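
As an alternative to the Python loop, the same split can be sketched with NumPy's `sliding_window_view` (available in NumPy 1.20 and later), which returns all windows as one 2-D array. The helper names `window_count` and `split_windows` are illustrative, not part of the original code. For a signal of $N$ samples, window size $W$, and stride $S$, the number of complete windows is $\lfloor (N - W) / S \rfloor + 1$ when $N \ge W$.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_count(n_samples, window_size, stride):
    """Number of complete windows that fit in a series of n_samples points."""
    if n_samples < window_size:
        return 0
    return (n_samples - window_size) // stride + 1


def split_windows(time_series, window_size=1024, stride=512):
    """Return a 2-D array of shape (num_windows, window_size)."""
    time_series = np.asarray(time_series)
    if time_series.shape[0] < window_size:
        return np.empty((0, window_size), dtype=time_series.dtype)
    # Build every window with step 1 (a view, no copy), then keep every stride-th one.
    return sliding_window_view(time_series, window_size)[::stride]


demo = np.arange(10)
windows = split_windows(demo, window_size=4, stride=2)
print(windows.shape)                               # (4, 4)
print(window_count(len(demo), 4, 2))               # 4
```

The result is a read-only view of the input, so call `.copy()` on it before modifying windows in place. This sketch keeps a final window that ends exactly on the last sample.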
---
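#### 5. Normalization and label assignment
The final step rescales every window and assigns class labels. The code below standardizes each window to zero mean and unit standard deviation (z-score), which, unlike min-max scaling, does not confine values to a fixed interval such as [-1, +1]. Each window also receives the integer label of its class so that a supervised algorithm can tell the four bearing conditions apart.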
```python
labels_mapping = {
    'Normal': 0,
    'InnerRaceFault': 1,
    'OuterRaceFault': 2,
    'BallElementFault': 3
}


def normalize_and_label(segments_list, label_name):
    # Z-score each window independently: zero mean, unit standard deviation.
    normalized_segs = [(seg - seg.mean()) / seg.std() for seg in segments_list]
    # Every window cut from one file shares that file's class label.
    labels = [labels_mapping[label_name]] * len(normalized_segs)
    return np.array(normalized_segs), np.array(labels)


# Pass the label that matches the condition recorded in the loaded file.
norm_seg, lbls = normalize_and_label(windowed_segments, 'Normal')
print(norm_seg.shape, lbls.shape)
```
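
If a fixed range such as [-1, +1] is preferred over z-score standardization, per-window min-max scaling is one option. The following is a minimal sketch under that assumption; `scale_to_unit_range` and its `eps` parameter are hypothetical names introduced here for illustration.

```python
import numpy as np


def scale_to_unit_range(segments, eps=1e-12):
    """Linearly rescale each row of a 2-D array so its values span [-1, 1]."""
    segments = np.asarray(segments, dtype=np.float64)
    seg_min = segments.min(axis=1, keepdims=True)
    seg_max = segments.max(axis=1, keepdims=True)
    # Guard against division by zero; a constant window maps to all -1.
    span = np.maximum(seg_max - seg_min, eps)
    return 2.0 * (segments - seg_min) / span - 1.0


demo = np.array([[0.0, 5.0, 10.0],
                 [-2.0, 0.0, 2.0]])
scaled = scale_to_unit_range(demo)
print(scaled)
```

Min-max scaling is sensitive to single outlier samples within a window, whereas z-score scaling is less so; which one suits a given model is a design choice.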
That completes the pipeline. The snippets can be copied and run in sequence to verify the results.
---