Deep Learning Classification Models in Practice: From Environment Setup to Model Deployment - CSDN Blog

Article link: https://blog.csdn.net/qq_36431125/article/details/147232783

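Artificial intelligence is advancing rapidly. Deep learning classification models have become a core technology in image recognition, natural language processing, biomedicine and other fields. Newcomers entering deep learning need hands-on experience with the complete workflow, and so do developers who want to put a classification model into a real project.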

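This article is a step-by-step tutorial. It starts from environment setup and walks through an end-to-end deep learning classification project. The project covers dataset preparation, model construction, training and validation, testing and evaluation, and saving and loading models, with the goal of teaching the core methodology of model development.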

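Image classification is used as the example throughout. Let's start by setting up the development environment and then build our own deep learning classification model step by step. The following sections provide complete code examples with detailed comments, so you can learn how classification models are developed by working through them.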

1. Environment Setup

Platform: Windows 11

GPU: GeForce RTX 4070 (8GB)

CUDA: CUDA 11.7

CuDNN: CuDNN 11

IDE: PyCharm

Python: 3.8

numpy: 1.24.3

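tensorflow: 2.13.0 (note: TensorFlow 2.10 was the last release with GPU support on native Windows; for GPU training with 2.13, TensorFlow's install guide recommends WSL2)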

keras: 2.13.1

matplotlib: 3.7.5

scikit-learn: 1.3.2

scikit-plot: 0.3.7

Note: if scikit-learn (sklearn) or scikit-plot (scikitplot) is missing from your environment, install them with:

pip install scikit-learn
pip install scikit-plot

If the download is slow, you can use the Tsinghua University PyPI mirror:

pip install scikit-learn -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install scikit-plot -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

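If any other package is missing, install it with pip install followed by the package name. The code below also uses pandas, seaborn, tensorflow-hub and vit-keras.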

2. Building the Dataset

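With the programming environment configured, the next step is to build the dataset. Only then can we write the code and train and test the model. A classification dataset is simpler to build than an object detection or segmentation dataset. You just put images of the same class into the same folder, which gives one folder per class. The file structure of classification datasets is described in my blog post "Introduction to Computer Vision Datasets". Here I use classifying chickens, ducks and fish as an example to show in detail how to build a classification dataset.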

(1) Create a parent folder named dataset.

(2) Inside dataset, create the folders chicken, duck and fish.

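(3) Collect images of chickens, ducks and fish and put each image in its class folder. For this step, I searched Baidu Images for each of the three animals and downloaded the results into the corresponding folders.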
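The folder-creation steps above can also be scripted. The sketch below is my own illustration and not part of the original workflow. The function name make_class_folders is hypothetical, and the function only creates empty folders; collecting the images is still a manual step.

```python
from pathlib import Path


def make_class_folders(root, class_names):
    """Create root/<class_name> for every class name (hypothetical helper).

    Existing folders are left untouched, so the function is safe to re-run.
    Returns the list of class folder paths.
    """
    root = Path(root)
    folders = []
    for name in class_names:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # also creates `root` if missing
        folders.append(folder)
    return folders
```

For the example in the text, the call would be make_class_folders("dataset", ["chicken", "duck", "fish"]).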

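I only collected a few images here to demonstrate the process. They are not enough to actually train and test a model, and in general more images lead to a better-trained model. For the model in this article, I use a skin cancer classification dataset instead.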

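About the skin cancer in this dataset: skin cancer is the abnormal growth of skin cells. It most often occurs on skin exposed to the sun, but this common cancer can also appear on skin that is not normally exposed to sunlight. Cancer begins when healthy cells change and grow out of control, forming a mass called a tumor. Tumors can be cancerous or benign. A cancerous tumor is malignant, meaning it can grow and spread to other parts of the body. A benign tumor may grow large but does not invade nearby tissue or spread to other parts of the body. Benign tumors have clear, smooth, regular borders. Malignant tumors have irregular borders and grow faster than benign ones.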

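The dataset folder contains two subfolders, train and test. As the names suggest, train is used to train the model and test is used to test it. The training set contains 1,440 benign and 1,197 malignant images. The test set contains 360 benign and 300 malignant images.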

Both train and test contain two subfolders, benign and malignant, so this is a binary classification dataset.
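Before writing any model code, it can help to confirm that the folders on disk match the layout and counts described above. The following is a minimal sketch. It assumes the dataset was extracted to ./skin_classification with train/test splits and benign/malignant class folders. count_images is a hypothetical helper, and the set of accepted file extensions is an assumption.

```python
from pathlib import Path

# Assumed image extensions; the article's own code only globs *.jpg
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split: {class_name: n_images}} for a root/split/class/image layout."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            class_dir.name: sum(1 for f in class_dir.iterdir()
                                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir())
        }
    return counts
```

If the download matches the numbers above, count_images("./skin_classification") should report 1440 benign and 1197 malignant images under train, and 360 and 300 under test.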

Below are some samples from the skin cancer dataset. The first row is benign and the second row is malignant.

3. Model Development

(1) Import the required packages

import random
import os
import glob
import time

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers, Sequential
from tensorflow.keras.utils import plot_model

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, f1_score, matthews_corrcoef,
    confusion_matrix, ConfusionMatrixDisplay,
    classification_report, precision_recall_fscore_support
)
from scikitplot.metrics import plot_roc

# ViT backbone used later (install with: pip install vit-keras)
from vit_keras import vit

(2) Configuration

class CFG:
    EPOCHS = 10
    BATCH_SIZE = 32
    SEED = 42
    TF_SEED = 768
    HEIGHT = 224
    WIDTH = 224
    CHANNELS = 3
    IMAGE_SIZE = (224, 224, 3)

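EPOCHS is the number of times the entire training set is traversed during training. In deep learning, one complete pass over the training data is called an epoch. A value of 10 means the training data is used up to 10 times to update the model's parameters. It is an upper limit because the early-stopping callback used later can end training sooner.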

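BATCH_SIZE is the number of samples used for each parameter update. Training usually does not use the whole dataset at once to update the model. Instead, the data is split into small batches. A value of 32 means each training step computes the loss on a batch of 32 samples and then updates the model parameters.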

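SEED is a random seed. Many operations in machine learning and deep learning involve random numbers, such as splitting the data and initializing the model. Fixing the seed makes the sequence of random numbers identical on every run, so experiments are reproducible. Here the seed is set to 42.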

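TF_SEED is another random seed, intended for TensorFlow. When deep learning is done with TensorFlow, some operations (such as model initialization and data shuffling) may rely on TensorFlow's own random number generators. Setting this seed makes those random operations reproducible. In the code below, TF_SEED is passed to the data augmentation layers, while tf.random.set_seed is called with SEED.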

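HEIGHT, WIDTH and CHANNELS describe the input images. HEIGHT is the image height, set to 224 pixels. WIDTH is the image width, also 224 pixels. CHANNELS is the number of channels, set to 3, which usually corresponds to the red, green and blue channels of a color image. IMAGE_SIZE is a tuple that combines HEIGHT, WIDTH and CHANNELS to describe the input shape: 224×224 pixels with 3 channels.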
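To make BATCH_SIZE concrete: tf.data's batch() keeps the final, smaller batch by default (drop_remainder=False). The number of batches per pass is therefore the sample count divided by the batch size, rounded up. The following worked example uses plain Python and the test-set size from section 2 (360 + 300 = 660 images). The helper name batches_per_epoch is illustrative.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Batches produced by Dataset.batch(batch_size) when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


num_test = 360 + 300          # benign + malignant test images
batch_size = 32               # CFG.BATCH_SIZE
n_batches = batches_per_epoch(num_test, batch_size)
last_batch = num_test - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 21 20
```

So one pass over the test set is 20 full batches of 32 plus a final batch of 20 images.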

(3) Fix the random seeds

def seed_everything(seed=CFG.SEED):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

seed_everything(CFG.SEED)

(4) Inspect the dataset

# Define paths
DATASET_PATH = "./skin_classification"
TRAIN_PATH = './skin_classification/train/'
TEST_PATH = './skin_classification/test/'

# Overview of the dataset
print('Dataset overview')
print('*'*50)
for dirpath, dirnames, filenames in os.walk(DATASET_PATH):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in {dirpath}")
print('*'*50)

The output shows the exact number of directories and images in the dataset:

Use glob to collect the image paths and print the number of images:

train_images = glob.glob(f"{TRAIN_PATH}**/*.jpg")
test_images = glob.glob(f"{TEST_PATH}**/*.jpg")

train_size = len(train_images)
test_size = len(test_images)

total = train_size + test_size

print(f'train samples count:\t\t{train_size}')
print(f'test samples count:\t\t{test_size}')
print('=======================================')
print(f'TOTAL:\t\t\t\t{total}')

Build a DataFrame of image paths and labels for later processing:

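def generate_labels(image_paths):
    # The label is the name of the folder that contains the image (benign or malignant).
    # os.path follows the platform's separator rules, so this works on Windows and on
    # Linux/macOS, whereas split('\\')[1] only works with Windows-style paths.
    return [os.path.basename(os.path.dirname(p)) for p in image_paths]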

def build_df(image_paths):
    # Create the DataFrame
    df = pd.DataFrame({
        'image_path': image_paths,
        'label': generate_labels(image_paths)
    })

    # Encode labels: benign -> 0, malignant -> 1
    df['label_encoded'] = df.apply(lambda row: 1 if row.label == 'malignant' else 0, axis=1)
    # Shuffle the rows and return
    return df.sample(frac=1, random_state=CFG.SEED).reset_index(drop=True)

train_df = build_df(train_images)
test_df = build_df(test_images)

# Show the first five training samples
print(train_df.head(5))

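The resulting DataFrame is shown below. The original labels are the strings benign and malignant, which cannot be fed to the model directly, so they are encoded as 0 and 1. In medical classification tasks, 0 usually denotes benign and 1 denotes malignant.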
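The pipeline built later goes one step further. encode_labels calls tf.one_hot(..., depth=2), so label 0 becomes [1, 0] and label 1 becomes [0, 1]. The plain-Python sketch below mirrors that mapping without TensorFlow; the helper names are hypothetical. It also shows why column 0 of every model's predicted probabilities corresponds to benign and column 1 to malignant. Any list of class names used for reports must follow that order.

```python
# Same convention as build_df: benign -> 0, malignant -> 1
LABEL_TO_INDEX = {"benign": 0, "malignant": 1}


def encode(labels):
    """Map label strings to integer codes."""
    return [LABEL_TO_INDEX[label] for label in labels]


def one_hot(indices, depth=2):
    """Plain-Python counterpart of tf.one_hot(indices, depth) for a 1-D list."""
    return [[1.0 if k == i else 0.0 for k in range(depth)] for i in indices]


codes = encode(["benign", "malignant", "malignant"])
print(codes)           # [0, 1, 1]
print(one_hot(codes))  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```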

View one random image, then several random images:

def _load(image_path):
    # Read the image file
    image = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(image, channels=3)
    # Resize
    image = tf.image.resize(image, [CFG.HEIGHT, CFG.WIDTH],
                            method=tf.image.ResizeMethod.LANCZOS3)
    # Normalize pixel values to [0, 1]
    image = tf.cast(image, tf.float32) / 255.
    return image

def view_sample(image, label, color_map='rgb', fig_size=(8, 10)):
    plt.figure(figsize=fig_size)
    if color_map == 'rgb':
        plt.imshow(image)
    else:
        plt.imshow(tf.image.rgb_to_grayscale(image), cmap=color_map)
    plt.title(f'Label: {label}', fontsize=16)
    plt.show()
    return

# Pick one random training sample
idx = random.sample(train_df.index.to_list(), 1)[0]
# Load the image and its label
sample_image, sample_label = _load(train_df.image_path[idx]), train_df.label[idx]
# Show the image
view_sample(sample_image, sample_label, color_map='inferno')

def view_mulitiple_samples(df, sample_loader, count=10, color_map='rgb', fig_size=(14, 10)):
    rows = count // 5
    if count % 5 > 0:
        rows += 1
    idx = random.sample(df.index.to_list(), count)
    fig = plt.figure(figsize=fig_size)
    for column, _ in enumerate(idx):
        plt.subplot(rows, 5, column + 1)
        plt.title(f'Label: {df.label[_]}')

        if color_map == 'rgb':
            plt.imshow(sample_loader(df.image_path[_]))
        else:
            plt.imshow(tf.image.rgb_to_grayscale(sample_loader(df.image_path[_])), cmap=color_map)
    plt.show()
    return

# Show several images
view_mulitiple_samples(train_df, _load,
                       count=25, color_map='inferno',
                       fig_size=(20, 24))

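Check the label distributions of the training and test sets. Both are fairly balanced, which is good for training. If the classes were severely imbalanced, the model's decisions would be strongly affected, and it would tend to predict the majority class.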

fig, (ax1, ax2) = plt.subplots(2, figsize=(14, 10))
fig.tight_layout(pad=6.0)
# Plot the training-set distribution
ax1.set_title('Train Labels Distribution', fontsize=20)
train_distribution = train_df['label'].value_counts().sort_values()
sns.barplot(x=train_distribution.values,
            y=list(train_distribution.keys()),
            orient="h",
            ax=ax1)

# Plot the test-set distribution
ax2.set_title('Test Labels Distribution', fontsize=20)
test_distribution = test_df['label'].value_counts().sort_values()
sns.barplot(x=test_distribution.values,
            y=list(test_distribution.keys()),
            orient="h",
            ax=ax2);
sns.despine();
plt.show()
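This quantifies "fairly balanced": with 1,440 benign and 1,197 malignant training images, malignant makes up about 45% of the set. If the imbalance were much stronger, one common remedy, which this article does not use, is to weight each class inversely to its frequency. The sketch below computes the same "balanced" weights as scikit-learn's compute_class_weight(class_weight="balanced"), in plain Python. The resulting dict could then be passed to Keras through model.fit(..., class_weight=...).

```python
def balanced_class_weights(counts):
    """n_samples / (n_classes * n_samples_in_class), scikit-learn's 'balanced' formula."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Training-set counts from section 2 (0 = benign, 1 = malignant)
weights = balanced_class_weights({0: 1440, 1: 1197})
print({k: round(v, 4) for k, v in weights.items()})  # {0: 0.9156, 1: 1.1015}
```

Both weights are close to 1, which is another way of saying this dataset does not really need them.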

Although the classes in the training and test sets are fairly balanced, we still need a validation set for hyperparameter tuning.

(5) Build a data pipeline that loads the image data and feeds it to the model.

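Split the original training set into new training and validation sets. 15% is held out as a validation set for tuning, and the remaining 85% is used to train the model.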

train_split_idx, val_split_idx, _, _ = train_test_split(train_df.index,
                                                        train_df.label_encoded,
                                                        test_size=0.15,
                                                        stratify=train_df.label_encoded,
                                                        random_state=CFG.SEED)
train_new_df = train_df.iloc[train_split_idx].reset_index(drop=True)
val_df = train_df.iloc[val_split_idx].reset_index(drop=True)
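For reference, the sizes this split produces can be worked out by hand. When test_size is a float, train_test_split rounds the held-out count up and gives the rest to the training part. The sketch reproduces that rule in plain Python for the 2,637 training images, using split_sizes as a hypothetical helper. It also counts the resulting batches at BATCH_SIZE = 32.

```python
import math


def split_sizes(n_samples, test_size):
    """(n_train, n_holdout) as train_test_split computes them for a float test_size."""
    n_holdout = math.ceil(test_size * n_samples)
    return n_samples - n_holdout, n_holdout


n_train, n_val = split_sizes(1440 + 1197, 0.15)
print(n_train, n_val)                                  # 2241 396
print(math.ceil(n_train / 32), math.ceil(n_val / 32))  # 71 13
```

Because 2,241 = 70 × 32 + 1, the last training batch of each epoch holds a single image. This is simply how batch() handles the remainder.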

Check the distributions of the new training set and the validation set:

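# Create a new figure: the previous one has already been shown
fig, (ax1, ax2) = plt.subplots(2, figsize=(14, 10))
fig.tight_layout(pad=6.0)
# Plot the new training-set distribution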
ax1.set_title('New Train Labels Distribution', fontsize=20)
train_new_distribution = train_new_df['label'].value_counts().sort_values()
sns.barplot(x=train_new_distribution.values,
            y=list(train_new_distribution.keys()),
            orient="h",
            ax=ax1)

# Plot the validation-set distribution
ax2.set_title('Validation Labels Distribution', fontsize=20)
val_distribution = val_df['label'].value_counts().sort_values()
sns.barplot(x=val_distribution.values,
            y=list(val_distribution.keys()),
            orient="h",
            ax=ax2);
plt.show()

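Build the data augmentation layer. Data augmentation increases the diversity of the data, so the model learns a wider range of features. This improves generalization and reduces the risk of overfitting. Augmentation also effectively enlarges the dataset to some extent, which eases the limits imposed by scarce data.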

# Build the data augmentation layer
augmentation_layer = Sequential([
    layers.RandomFlip(mode='horizontal_and_vertical', seed=CFG.TF_SEED),
    layers.RandomZoom(height_factor=(-0.1, 0.1), width_factor=(-0.1, 0.1), seed=CFG.TF_SEED),
], name='augmentation_layer')

image = tf.image.rgb_to_grayscale(sample_image)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 10))
fig.tight_layout(pad=6.0)
# Original image
ax1.set_title('Original Image', fontsize=20)
ax1.imshow(image, cmap='inferno');
# Augmented image
ax2.set_title('Augmented Image', fontsize=20)
ax2.imshow(augmentation_layer(image), cmap='inferno');
plt.show()

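The augmented image differs slightly from the original. This is the intended effect: augmentation should produce images that resemble those in the original dataset while preserving their key features.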
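The effect of the RandomFlip part of the augmentation layer can be pictured on a tiny 2×2 "image". The plain-Python sketch below is a simplified stand-in and not Keras's implementation. It assumes that mode='horizontal_and_vertical' applies a left-right flip and an up-down flip, each independently with probability 0.5. RandomZoom is not modeled. The label never changes: a flipped lesion keeps its class.

```python
import random


def flip_left_right(image):
    """Mirror every row (left <-> right)."""
    return [row[::-1] for row in image]


def flip_up_down(image):
    """Reverse the order of the rows (top <-> bottom)."""
    return image[::-1]


def random_flip(image, rng):
    """Simplified stand-in for RandomFlip('horizontal_and_vertical')."""
    if rng.random() < 0.5:
        image = flip_left_right(image)
    if rng.random() < 0.5:
        image = flip_up_down(image)
    return image


image = [[1, 2],
         [3, 4]]
rng = random.Random(768)  # seeded like CFG.TF_SEED, so the result is repeatable
print(random_flip(image, rng))
```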

Create the training, validation and test data pipelines:

def encode_labels(labels, encode_depth=2):
    return tf.one_hot(labels, depth=encode_depth).numpy()


def create_pipeline(df, load_function, augment=False, batch_size=32, shuffle=False, cache=None, prefetch=False):
    '''
    Generates an input pipeline using the tf.data API given a Pandas DataFrame and image loading function.

    @params
        - df: (pd.DataFrame) -> DataFrame containing paths and labels
        - load_function: (function) -> function used to load images given their paths
        - augment: (bool) -> condition for applying augmentation
        - batch_size: (int) -> size for batched (default=32)
        - shuffle: (bool) -> condition for data shuffling, data is shuffled when True (default=False)
        - cache: (str) -> cache path for caching data, data is not cached when None (default=None)
        - prefetch: (bool) -> condition for prefeching data, data is prefetched when True (default=False)

    @returns
        - dataset: (tf.data.Dataset) -> dataset input pipeline used to train a TensorFlow model
    '''
    # Get the image paths and their (one-hot encoded) labels from the DataFrame
    image_paths = df.image_path
    image_labels = encode_labels(df.label_encoded)
    AUTOTUNE = tf.data.AUTOTUNE

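    # Create the dataset from the DataFrame columns
    ds = tf.data.Dataset.from_tensor_slices((image_paths, image_labels))
    # Optionally shuffle. Shuffling the (path, label) pairs before decoding keeps the
    # buffer cheap, so it can hold the whole DataFrame (a full, uniform shuffle)
    if shuffle:
        ds = ds.shuffle(buffer_size=len(df))
    # Load the images, optionally applying data augmentation
    if augment:
        ds = ds.map(lambda x, y: (augmentation_layer(load_function(x)), y), num_parallel_calls=AUTOTUNE)
    else:
        ds = ds.map(lambda x, y: (load_function(x), y), num_parallel_calls=AUTOTUNE)
    ds = ds.batch(batch_size)
    if cache is not None:
        ds = ds.cache(cache)
    if prefetch:
        ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds

# Training pipeline: augmented and reshuffled every epoch
# (model.fit does not shuffle a tf.data.Dataset by itself)
train_ds = create_pipeline(train_new_df, _load, augment=True,
                           batch_size=CFG.BATCH_SIZE,
                           shuffle=True, prefetch=True)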

# Validation pipeline
val_ds = create_pipeline(val_df, _load,
                         batch_size=CFG.BATCH_SIZE,
                         shuffle=False, prefetch=False)

# Test pipeline (never shuffled: predictions are compared with test_df row by row)
test_ds = create_pipeline(test_df, _load,
                          batch_size=CFG.BATCH_SIZE,
                          shuffle=False, prefetch=False)

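(6) Build a baseline model. A baseline model provides a reference point for more complex models. Comparing a new model or an improvement against the baseline shows directly whether, and by how much, it improves performance. Here the baseline is a CNN. A convolutional neural network (CNN) is a network architecture for deep learning tasks. CNNs are very useful for computer vision because they can recognize patterns in image data that are relevant to classification, object detection, image segmentation and similar tasks. A CNN is built from convolutional layers that extract features from the image data. For classification, the CNN adds a fully connected classification head that uses those features to classify the input image. In this task, a basic CNN serves as the baseline model for skin cancer detection.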

Define the CNN model. It consists mainly of convolutional, pooling and fully connected layers.

# Define the model
def cnn_model():
    initializer = tf.keras.initializers.GlorotNormal()

    cnn_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),

        layers.Conv2D(16, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.Conv2D(16, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.MaxPool2D(pool_size=2, padding='valid'),

        layers.Conv2D(8, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.Conv2D(8, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.MaxPool2D(pool_size=2),

        layers.Flatten(),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, activation='sigmoid', kernel_initializer=initializer)
    ], name='cnn_sequential_model')

    return cnn_sequential

# Instantiate the model
model_cnn = cnn_model()

# Print the model architecture
model_cnn.summary()
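The shapes and parameter count reported by model_cnn.summary() can be checked by hand. With 'valid' padding and stride 1, a 3×3 convolution shrinks each side by 2, and MaxPool2D(pool_size=2) halves it, rounding down. The plain-Python sketch below follows the layers of cnn_model(); its helper names are hypothetical. By this arithmetic the total should be 2,881,314 parameters, and the Dense(128) layer alone accounts for 2,876,544 of them. Almost all of the baseline's capacity therefore sits in the first fully connected layer rather than in the convolutions.

```python
def conv_out(size, kernel=3):
    """Side length after a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Side length after MaxPool2D with stride = pool_size and 'valid' padding."""
    return size // pool


def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out  # weights + one bias per filter


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


side = pool_out(conv_out(conv_out(224)))   # 224 -> 222 -> 220 -> 110
side = pool_out(conv_out(conv_out(side)))  # 110 -> 108 -> 106 -> 53
flat = side * side * 8                     # Flatten() after the 8-filter block

total = (conv_params(3, 3, 16) + conv_params(3, 16, 16)
         + conv_params(3, 16, 8) + conv_params(3, 8, 8)
         + dense_params(flat, 128) + dense_params(128, 2))
print(side, flat, total)  # 53 22472 2881314
```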

4. Training, Validation and Testing

(1) Train the CNN model. With the data, parameters and training strategies in place, we can start training and validating the model.

# Train the model and return its training history
def train_model(model, num_epochs, callbacks_list, tf_train_data,
                tf_valid_data=None, shuffling=False):
    '''
        Trains a TensorFlow model and returns the History object produced by model.fit.

        @params
        - model: (tf.keras.model) -> model to be trained
        - num_epochs: (int) -> number of epochs to train the model
        - callbacks_list: (list) -> list containing callback functions for model
        - tf_train_data: (tf.data.Dataset) -> dataset for model to be trained on
        - tf_valid_data: (tf.data.Dataset) -> dataset for model to be validated on (default=None)
        - shuffling: (bool) -> passed to model.fit; Keras ignores it for tf.data.Dataset inputs,
          so shuffling is controlled in create_pipeline instead (default=False)

        @returns
        - model_history: (tf.keras.callbacks.History) -> its .history dict holds the loss and
          metric values tracked during training
    '''
    if tf_valid_data is not None:
        # Evaluate on the full validation set at the end of every epoch
        model_history = model.fit(tf_train_data,
                                  epochs=num_epochs,
                                  validation_data=tf_valid_data,
                                  callbacks=callbacks_list,
                                  shuffle=shuffling)
    else:
        model_history = model.fit(tf_train_data,
                                  epochs=num_epochs,
                                  callbacks=callbacks_list,
                                  shuffle=shuffling)
    return model_history

# Early stopping
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True)

# Learning-rate decay when the validation loss plateaus
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    patience=2,
    factor=0.1,
    verbose=1)

CALLBACKS = [early_stopping_callback, reduce_lr_callback]
# Evaluation metrics
METRICS = ['accuracy']

tf.random.set_seed(CFG.SEED)

# 编译模型
model_cnn.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

# Start training!
print(f'Training {model_cnn.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

cnn_history = train_model(
    model_cnn, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)

# Evaluate the model on the test set
cnn_evaluation = model_cnn.evaluate(test_ds)
# Predict class probabilities and derive the predicted labels
cnn_test_probabilities = model_cnn.predict(test_ds, verbose=1)
cnn_test_predictions = tf.argmax(cnn_test_probabilities, axis=1)

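The curves below show how the loss and accuracy changed during training. As training proceeds, the training loss decreases and the training accuracy rises steadily, ending above 80%. EPOCHS was set to only 10 here; a larger value may give better results.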
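The two callbacks can change the learning rate and end training before EPOCHS is reached. The plain-Python sketch below is a simplified model of EarlyStopping(patience=3) together with ReduceLROnPlateau(patience=2, factor=0.1). It runs on a hypothetical sequence of validation losses, not on results from this article. It is not Keras's code. In particular, it uses a single improvement threshold for both callbacks, whereas Keras defaults to min_delta=0 for EarlyStopping and 1e-4 for ReduceLROnPlateau.

```python
def simulate_callbacks(val_losses, lr=1e-3, es_patience=3, lr_patience=2,
                       factor=0.1, min_delta=1e-4):
    """Simplified EarlyStopping + ReduceLROnPlateau, both monitoring val_loss."""
    best, best_epoch = float("inf"), None
    es_wait = lr_wait = 0
    lrs = []  # learning rate used in each epoch that actually ran
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best - min_delta:          # improvement: reset both counters
            best, best_epoch = loss, epoch
            es_wait = lr_wait = 0
            continue
        es_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:           # plateau: cut the LR for the next epoch
            lr *= factor
            lr_wait = 0
        if es_wait >= es_patience:           # give up; best weights get restored
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lrs": lrs}
    return {"stopped_epoch": None, "best_epoch": best_epoch, "lrs": lrs}


# Hypothetical validation losses for 10 planned epochs
result = simulate_callbacks([0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.44, 0.43, 0.42, 0.41])
print(result["stopped_epoch"], result["best_epoch"])  # 5 2
```

In this hypothetical run, the learning rate drops to 1e-4 after two epochs without improvement. Training then stops after epoch index 5, so only six of the ten epochs run, even though the later losses would have improved. restore_best_weights=True rolls the weights back to epoch 2.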

(2) Build and test an EfficientNetV2-B0 model

# Define the EfficientNet model

# Load a model from TensorFlow Hub as a Keras layer
def get_tfhub_model(model_link, model_name, model_trainable=False):
    return hub.KerasLayer(model_link,
                          trainable=model_trainable,
                          name=model_name)

# Get the EfficientNetV2-B0 feature extractor.
# I downloaded the model from https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b0/feature_vector/2,
# extracted it into the current directory and renamed the folder to EfficientNetV2_b0
efficientnet_v2_url = './EfficientNetV2_b0'
model_name = 'efficientnet_v2_b0'

# Freeze the backbone: it only extracts features, so only the new classification head is trained
set_trainable=False

efficientnet_v2_b0 = get_tfhub_model(efficientnet_v2_url,
                                     model_name,
                                     model_trainable=set_trainable)

def efficientnet_v2_model():
    initializer = tf.keras.initializers.GlorotNormal()

    efficientnet_v2_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),
        efficientnet_v2_b0,
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, dtype=tf.float32, activation='sigmoid', kernel_initializer=initializer)
    ], name='efficientnet_v2_sequential_model')

    return efficientnet_v2_sequential


# Instantiate the model
model_efficientnet_v2 = efficientnet_v2_model()

# Print the model summary (summary() prints the table itself and returns None)
model_efficientnet_v2.summary()

The EfficientNetV2_b0 folder contains the following files:

Train and validate the model:

# 训练EfficientNet模型
tf.random.set_seed(CFG.SEED)

model_efficientnet_v2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

print(f'Training {model_efficientnet_v2.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

efficientnet_v2_history = train_model(
    model_efficientnet_v2, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)


efficientnet_v2_evaluation = model_efficientnet_v2.evaluate(test_ds)

efficientnet_v2_test_probabilities = model_efficientnet_v2.predict(test_ds, verbose=1)
efficientnet_v2_test_predictions = tf.argmax(efficientnet_v2_test_probabilities, axis=1)

With EfficientNet, accuracy exceeds 92%, which is already better than the baseline model.

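Both models above are based on CNNs. Since the Transformer architecture was introduced, many strong variants have followed. The best known is the Vision Transformer (ViT), which first brought the Transformer architecture to computer vision and achieved very good results. The code below uses ViT for skin cancer classification.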

# Load ViT-B/16 from vit_keras (from vit_keras import vit); downloading the pretrained weights may be slow
vit_model = vit.vit_b16(
    image_size=224,
    activation='sigmoid',
    pretrained=True,
    include_top=False,
    pretrained_top=False,
    # Change to match your dataset
    classes=2
)

# Freeze the ViT layers: the backbone only extracts features and is not updated during training
for layer in vit_model.layers:
    layer.trainable = False

# Define the ViT model
def vit_b16_model():
    initializer = tf.keras.initializers.GlorotNormal()

    vit_b16_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),
        vit_model,
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, dtype=tf.float32, activation='sigmoid', kernel_initializer=initializer)
    ], name='vit_b16_sequential_model')

    return vit_b16_sequential

# Instantiate the model
model_vit_b16 = vit_b16_model()

# Train and validate ViT

tf.random.set_seed(CFG.SEED)

model_vit_b16.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

print(f'Training {model_vit_b16.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

vit_b16_history = train_model(
    model_vit_b16, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)

vit_b16_evaluation = model_vit_b16.evaluate(test_ds)

vit_b16_test_probabilities = model_vit_b16.predict(test_ds, verbose=1)
vit_b16_test_predictions = tf.argmax(vit_b16_test_probabilities, axis=1)

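The ViT model converges to a test loss slightly higher than its validation loss, and there is a clear gap between the test loss and the training loss (possibly some overfitting). It outperforms the CNN model, converging to a lower test loss and reaching higher accuracy. However, it performs slightly worse than the EfficientNet model and may need more training epochs to do better. Prior studies have shown that Transformer-based models need more training data than CNNs to reach the same performance.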

5. Visualizing the Results

(1) Training vs. validation loss for the three models

def plot_training_curves(history):
    
    loss = np.array(history.history['loss'])
    val_loss = np.array(history.history['val_loss'])

    accuracy = np.array(history.history['accuracy'])
    val_accuracy = np.array(history.history['val_accuracy'])

    epochs = range(len(history.history['loss']))

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))

    # Plot loss
    ax1.plot(epochs, loss, label='training_loss', marker='o')
    ax1.plot(epochs, val_loss, label='val_loss', marker='o')
    
    ax1.fill_between(epochs, loss, val_loss, where=(loss > val_loss), color='C0', alpha=0.3, interpolate=True)
    ax1.fill_between(epochs, loss, val_loss, where=(loss < val_loss), color='C1', alpha=0.3, interpolate=True)

    ax1.set_title('Loss (Lower Means Better)', fontsize=16)
    ax1.set_xlabel('Epochs', fontsize=12)
    ax1.legend()

    # Plot accuracy
    ax2.plot(epochs, accuracy, label='training_accuracy', marker='o')
    ax2.plot(epochs, val_accuracy, label='val_accuracy', marker='o')
    
    ax2.fill_between(epochs, accuracy, val_accuracy, where=(accuracy > val_accuracy), color='C0', alpha=0.3, interpolate=True)
    ax2.fill_between(epochs, accuracy, val_accuracy, where=(accuracy < val_accuracy), color='C1', alpha=0.3, interpolate=True)

    ax2.set_title('Accuracy (Higher Means Better)', fontsize=16)
    ax2.set_xlabel('Epochs', fontsize=12)
    ax2.legend()
    plt.show()

# Plot the baseline model
plot_training_curves(cnn_history)
# Plot the EfficientNet model
plot_training_curves(efficientnet_v2_history)
# Plot the ViT model
plot_training_curves(vit_b16_history)

For the baseline CNN, the losses converge steadily to lower values with no sign of overfitting.

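Some overfitting may have occurred while training EfficientNet. The training loss converges steadily, but the validation loss is unstable in the first few epochs and finally settles at a higher value than the training loss.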

ViT may have overfitted slightly in the last few epochs. The training and validation losses converge at a steady rate, but a gap opens between them near the end.

(2) Confusion matrices

# Plot a confusion matrix
def plot_confusion_matrix(y_true, y_pred, classes='auto', figsize=(10, 10), text_size=12):
    # Compute the confusion matrix
    cm = confusion_matrix(y_true, y_pred)


    plt.figure(figsize=figsize)

    # Draw the confusion matrix as a heatmap
    disp = sns.heatmap(
        cm, annot=True, cmap='Greens',
        annot_kws={"size": text_size}, fmt='g',
        linewidths=1, linecolor='black', clip_on=False,
        xticklabels=classes, yticklabels=classes)

    disp.set_title('Confusion Matrix', fontsize=24)
    disp.set_xlabel('Predicted Label', fontsize=20)
    disp.set_ylabel('True Label', fontsize=20)
    plt.yticks(rotation=0)

    plt.show()
    return

# Confusion matrix of the baseline model
# Class names in label-index order: 0 = benign, 1 = malignant (see build_df)
class_names = ['benign', 'malignant']

plot_confusion_matrix(
    test_df.label_encoded,
    cnn_test_predictions,
    figsize=(8, 8),
    classes=class_names)

# Confusion matrix of EfficientNet
plot_confusion_matrix(
    test_df.label_encoded,
    efficientnet_v2_test_predictions,
    figsize=(8, 8),
    classes=class_names)

# Confusion matrix of ViT
plot_confusion_matrix(
    test_df.label_encoded,
    vit_b16_test_predictions,
    figsize=(8, 8),
    classes=class_names)

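The structure of the confusion matrix is shown below. The main diagonal, from top-left to bottom-right, holds the correctly classified samples. The more samples lie on it, the better the model performs.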
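The link between the diagonal and accuracy can be seen on a toy example. The plain-Python sketch below builds the same layout as sklearn.metrics.confusion_matrix for labels 0 (benign) and 1 (malignant): rows are true labels and columns are predicted labels. The labels are made up for illustration, and the helper names are hypothetical.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """cm[t][p] counts samples with true label t predicted as p (0 = benign, 1 = malignant)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy_from_cm(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))  # main diagonal = correct predictions
    total = sum(sum(row) for row in cm)
    return correct / total


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)                    # [[2, 1], [1, 2]]
print(accuracy_from_cm(cm))  # 0.6666666666666666
```

The off-diagonal cells are the errors. cm[1][0] counts malignant lesions predicted as benign (false negatives), which is often the costlier mistake in a screening setting.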

(3) ROC curves

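The receiver operating characteristic (ROC) curve shows the diagnostic ability of a classifier as its decision threshold varies. It plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis at different classification thresholds. The area under the ROC curve (AUC) is then computed and used as a metric of how well a model classifies the data points.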

# ROC curve of the baseline model
plot_roc(test_df.label_encoded,
         cnn_test_probabilities,
         figsize=(10, 10), title_fontsize='large')

# ROC curve of the EfficientNet model
plot_roc(test_df.label_encoded,
         efficientnet_v2_test_probabilities,
         figsize=(10, 10), title_fontsize='large')

# ROC curve of the ViT model
plot_roc(test_df.label_encoded,
         vit_b16_test_probabilities,
         figsize=(10, 10), title_fontsize='large')
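The AUC also has a simple probabilistic reading. It equals the probability that a randomly chosen positive (malignant) sample receives a higher score than a randomly chosen negative (benign) one, with ties counted as one half. The plain-Python sketch below computes it that way from the malignant-class probabilities, which are column 1 of the predicted probabilities. On the toy data it gives the same 0.75 that sklearn.metrics.roc_auc_score returns for these inputs. The double loop costs O(n_pos × n_neg), which is fine for a 660-image test set but not intended for large data.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]           # toy labels
scores = [0.1, 0.4, 0.35, 0.8]  # toy malignant-class probabilities
print(roc_auc(y_true, scores))  # 0.75
```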

(4) Classification reports

# Print the classification reports
# Baseline model
print(classification_report(test_df.label_encoded,
                            cnn_test_predictions,
                            target_names=class_names))

# EfficientNet V2 
print(classification_report(test_df.label_encoded,
                            efficientnet_v2_test_predictions,
                            target_names=class_names))

# ViT-b16
print(classification_report(test_df.label_encoded,
                            vit_b16_test_predictions,
                            target_names=class_names))
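generate_preformance_scores below calls precision_recall_fscore_support with average="weighted". This matches the "weighted avg" row of these reports: each class's precision, recall and F1 are averaged with weights equal to that class's support, meaning its number of true samples. The plain-Python sketch below performs that computation on a made-up confusion matrix (rows = true labels). The helper names are hypothetical.

```python
def per_class_prf(cm, k):
    """Precision, recall and F1 of class k; cm rows are true labels, columns predictions."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)
    actual_k = sum(cm[k])
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_prf(cm):
    """Support-weighted mean of per-class precision, recall and F1."""
    supports = [sum(row) for row in cm]
    scores = [per_class_prf(cm, k) for k in range(len(cm))]
    total = sum(supports)
    return tuple(sum(s[i] * w for s, w in zip(scores, supports)) / total
                 for i in range(3))


cm = [[50, 10],   # true benign: 50 correct, 10 predicted malignant
      [5, 35]]    # true malignant: 5 predicted benign, 35 correct
precision, recall, f1 = weighted_prf(cm)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Weighted recall always equals plain accuracy (here 85 / 100 = 0.85). Each class's recall multiplied by its support is just that class's count of correct predictions. This is why recall_score and accuracy_score come out identical in the metrics below.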

(5) Record the classification metrics

# Compute and record the classification metrics
def generate_preformance_scores(y_true, y_pred, y_probabilities):
    model_accuracy = accuracy_score(y_true, y_pred)
    model_precision, model_recall, model_f1, _ = (
        precision_recall_fscore_support(
            y_true,
            y_pred,
            average="weighted"
        )
    )
    model_matthews_corrcoef = matthews_corrcoef(y_true, y_pred)

    print('=============================================')
    print(f'\nPerformance Metrics:\n')
    print('=============================================')
    print(f'accuracy_score:\t\t{model_accuracy:.4f}\n')
    print('_____________________________________________')
    print(f'precision_score:\t{model_precision:.4f}\n')
    print('_____________________________________________')
    print(f'recall_score:\t\t{model_recall:.4f}\n')
    print('_____________________________________________')
    print(f'f1_score:\t\t{model_f1:.4f}\n')
    print('_____________________________________________')
    print(f'matthews_corrcoef:\t{model_matthews_corrcoef:.4f}\n')
    print('=============================================')

    preformance_scores = {
        'accuracy_score': model_accuracy,
        'precision_score': model_precision,
        'recall_score': model_recall,
        'f1_score': model_f1,
        'matthews_corrcoef': model_matthews_corrcoef
    }
    return preformance_scores

# Performance scores of the baseline model
cnn_performance = generate_preformance_scores(test_df.label_encoded,
                                              cnn_test_predictions,
                                              cnn_test_probabilities)

# Performance scores of the EfficientNet model
efficientnet_v2_performance = generate_preformance_scores(test_df.label_encoded,
                                                          efficientnet_v2_test_predictions,
                                                          efficientnet_v2_test_probabilities)

# Performance scores of the ViT model
vit_b16_performance = generate_preformance_scores(test_df.label_encoded,
                                                  vit_b16_test_predictions,
                                                  vit_b16_test_probabilities)

# Collect the scores of the three models in a DataFrame
performance_df = pd.DataFrame({
    'model_cnn': cnn_performance,
    'model_efficientnet_v2': efficientnet_v2_performance,
    'model_vit_b16': vit_b16_performance
}).T

# Print the table
print(performance_df)

# Show the scores as a bar chart
performance_df.plot(
    kind="bar",
    figsize=(16, 8)
).legend(bbox_to_anchor=(1.0, 1.0))

plt.title('Performance Metrics', fontsize=20);
sns.despine();

Performance metrics of the baseline model, EfficientNet and ViT:

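On all metrics, ViT outperforms both the CNN and EfficientNet models, especially on the Matthews correlation coefficient (MCC). A higher MCC means the model's predictions are of higher statistical quality and that the model generalizes well to unseen samples.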
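For a binary problem, MCC is computed directly from the four cells of the confusion matrix, with malignant as the positive class: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Because it uses all four cells, a skewed class distribution inflates it less than it inflates accuracy. The plain-Python sketch below follows scikit-learn's matthews_corrcoef in returning 0 when the denominator is zero. The counts in the example are made up.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts, malignant = positive: 35 TP, 50 TN, 10 FP, 5 FN
print(round(mcc(tp=35, tn=50, fp=10, fn=5), 4))
```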

(6) Trade-off between inference time and performance

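To decide which model is best suited to real-world use, we need to weigh each model's trade-offs. These are how long it takes to run inference on one sample, and how well it performs, that is, how well it generalizes to unseen cases. We measure the average time needed to predict one sample and plot this inference time against a useful metric.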


# Trade-off between inference time and performance; timings depend on the hardware, so the numbers will differ between machines

# Measure the inference time
def compute_inference_time(model, ds, sample_count, inference_runs=5):
    total_inference_times = []
    inference_rates = []

    for _ in range(inference_runs):
        start = time.perf_counter()
        model.predict(ds, verbose=0)  # no progress bar, so console output does not add to the timing
        end = time.perf_counter()


        total_inference_time = end - start

        inference_rate = total_inference_time / sample_count

        total_inference_times.append(total_inference_time)
        inference_rates.append(inference_rate)


    avg_inference_time = sum(total_inference_times) / len(total_inference_times)
    avg_inference_time_uncertainty = (max(total_inference_times) - min(total_inference_times)) / 2

    avg_inference_rate = sum(inference_rates) / len(inference_rates)
    avg_inference_rate_uncertainty = (max(inference_rates) - min(inference_rates)) / 2

    print('====================================================')
    print(f'Model:\t\t{model.name}\n')
    print(f'Inference Time:\t{round(avg_inference_time, 6)}s \xB1 {round(avg_inference_time_uncertainty, 6)}s')
    print(
        f'Inference Rate:\t{round(avg_inference_rate, 6)}s/sample \xB1 {round(avg_inference_rate_uncertainty, 6)}s/sample')
    print('====================================================')

    return avg_inference_time, avg_inference_rate


cnn_inference = compute_inference_time(model_cnn, test_ds, len(test_df))

efficientnet_v2_inference = compute_inference_time(model_efficientnet_v2, test_ds, len(test_df))

vit_b16_inference = compute_inference_time(model_vit_b16, test_ds, len(test_df))


# Matthews correlation coefficient of each model
cnn_mcc = cnn_performance["matthews_corrcoef"]
efficientnet_mcc = efficientnet_v2_performance["matthews_corrcoef"]
vit_mcc = vit_b16_performance["matthews_corrcoef"]

plt.figure(figsize=(16, 8))

plt.scatter(cnn_inference[1], cnn_mcc, label=model_cnn.name)
plt.scatter(efficientnet_v2_inference[1], efficientnet_mcc, label=model_efficientnet_v2.name)
plt.scatter(vit_b16_inference[1], vit_mcc, label=model_vit_b16.name)

ideal_inference_rate = 0.0001
ideal_mcc = 1

# Plot the ideal point and draw a line from each model to it
plt.scatter(ideal_inference_rate, ideal_mcc, label="Ideal Hypothetical Model", marker='s')
plt.plot([ideal_inference_rate, cnn_inference[1]], [ideal_mcc, cnn_mcc], ':')
plt.plot([ideal_inference_rate, efficientnet_v2_inference[1]], [ideal_mcc, efficientnet_mcc], ':')
plt.plot([ideal_inference_rate, vit_b16_inference[1]], [ideal_mcc, vit_mcc], ':')

plt.legend()
plt.title("Trade-Offs: Inference Rate vs. Matthews Correlation Coefficient", fontsize=20)
plt.xlabel("Inference Rate (s/sample)", fontsize=16)
plt.ylabel("Matthews Correlation Coefficient", fontsize=16);

sns.despine()

def dist(x1, x2, y1, y2):
    return np.sqrt(np.square(x2 - x1) + np.square(y2 - y1))

model_names = [model_cnn.name, model_efficientnet_v2.name, model_vit_b16.name]
model_scores = [cnn_mcc, efficientnet_mcc, vit_mcc]
model_rates = [cnn_inference[1], efficientnet_v2_inference[1], vit_b16_inference[1]]
trade_offs = [dist(ideal_inference_rate, inference_rate, ideal_mcc, score)
              for inference_rate, score in zip(model_rates, model_scores)]

print('Trade-Off Score: Inference Rate vs. MCC')
for name, inference_rate, score, trade in zip(model_names, model_rates, model_scores, trade_offs):
    print('---------------------------------------------------------')
    print(f'Model: {name}\n\nInference Rate: {inference_rate:.5f} | MCC: {score:.4f} | Trade-Off: {trade:.4f}')

# Best trade-off score (smallest distance to the ideal point)
print('=========================================================')
best_model_trade = min(trade_offs)
best_model_name = model_names[np.argmin(trade_offs)]
print(f'\nBest Optimal Model:\t{best_model_name}\nTrade-Off:\t\t{best_model_trade:.4f}\n')
print('=========================================================')

Inference time and inference rate of the baseline model, EfficientNet and ViT:

The Transformer-based ViT model needs a noticeably longer inference time.

EfficientNet lies closest to the ideal point, so taking both performance and inference time into account, EfficientNet is the best choice.
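One caveat about the trade-off score: dist() mixes seconds per sample on the x-axis with MCC on the y-axis without rescaling. When per-sample inference rates are small fractions of a second, the rate term is tiny compared with the MCC gap, so the ranking is driven almost entirely by MCC. The sketch below is my own variant, not the article's method. trade_off with rate_scale=1.0 reproduces dist(). Dividing the rate by a reference value puts both axes on comparable scales; here the reference is the slowest model's rate, which is an assumption. The two models and their numbers are hypothetical.

```python
import math


def trade_off(rate, mcc, ideal_rate=0.0001, ideal_mcc=1.0, rate_scale=1.0):
    """Euclidean distance to the ideal point; rate_scale=1.0 matches dist() above."""
    return math.hypot((rate - ideal_rate) / rate_scale, mcc - ideal_mcc)


# Hypothetical models: (seconds per sample, MCC)
models = {"fast_model": (0.002, 0.70), "slow_model": (0.010, 0.72)}
slowest = max(rate for rate, _ in models.values())

for name, (rate, score) in models.items():
    raw = trade_off(rate, score)
    scaled = trade_off(rate, score, rate_scale=slowest)
    print(f"{name}: raw={raw:.4f} scaled={scaled:.4f}")
```

With raw seconds, the slower model "wins" on its slightly higher MCC. Once the rate is rescaled, its fivefold slowdown dominates. The right scaling depends on how much latency matters in the target deployment.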

View the predictions on the test set:

# Show predictions for random test samples
def view_multiple_predictions(df, model, sample_loader, count=10, color_map='rgb', title=None, fig_size=(14, 10)):
    rows = count // 5
    if count % 5 > 0:
        rows += 1

    idx = random.sample(df.index.to_list(), count)

    fig = plt.figure(figsize=fig_size)
    if title != None:
        fig.suptitle(title, fontsize=30)

    fig.tight_layout()
    fig.subplots_adjust(top=0.95)

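    # Map class indices to names (same encoding as build_df: 0 = benign, 1 = malignant)
    # and prediction correctness to title colors
    label_set = {0: 'benign', 1: 'malignant'}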
    color_set = {False: 'red', True: 'darkgreen'}

    for column, _ in enumerate(idx):
        # Predict the label
        img = sample_loader(df.image_path[_])
        probability = np.squeeze(
            model.predict(np.array([img]), verbose=0)
        )
        prediction = np.argmax(probability)

        correct_prediction = (prediction == df.label_encoded[_])

        ax = plt.subplot(rows, 5, column + 1)
        ax.set_title(
            f'Actual Label: {df.label[_]}',
            pad=20,
            fontsize=14,
            color=color_set[correct_prediction]
        )

        if color_map == 'rgb':
            ax.imshow(img)
        else:
            ax.imshow(tf.image.rgb_to_grayscale(img), cmap=color_map)

        txt = f'Prediction: {label_set[prediction]}\nProbability: {(100 * probability[prediction]):.2f}%'
        plt.xlabel(txt, fontsize=14, color=color_set[correct_prediction])

    return


# Predictions of the baseline model
view_multiple_predictions(
    test_df,
    model_cnn,
    _load,
    count=25,
    color_map='inferno',
    title='CNN Test Predictions',
    fig_size=(20, 28)
)

# Predictions of the EfficientNet model
view_multiple_predictions(
    test_df,
    model_efficientnet_v2,
    _load,
    count=25,
    color_map='inferno',
    title='EfficientNet V2 Test Predictions',
    fig_size=(20, 28)
)

# Predictions of the ViT model
view_multiple_predictions(
    test_df,
    model_vit_b16,
    _load,
    count=25,
    color_map='inferno',
    title='ViT Test Predictions',
    fig_size=(20, 28)
)

plt.show()

In these plots, red marks a wrong prediction and green marks a prediction whose class matches the actual class.
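The red/green coloring is only meaningful if the index-to-name mapping used for display agrees with the encoding in build_df, which sets label_encoded to 1 for malignant and 0 for benign. A minimal sketch, with hypothetical names (LABEL_ENCODING, decode_prediction, prediction_color), is to derive every display name from that single encoding so the two cannot drift apart.

```python
from typing import Dict, List, Sequence, Tuple

# Mirrors build_df: malignant -> 1, benign -> 0
LABEL_ENCODING: Dict[str, int] = {'benign': 0, 'malignant': 1}
INDEX_TO_LABEL: Dict[int, str] = {index: name for name, index in LABEL_ENCODING.items()}
# Class names ordered by encoded index, e.g. for classification_report(target_names=...)
CLASS_NAMES: List[str] = [INDEX_TO_LABEL[i] for i in range(len(INDEX_TO_LABEL))]


def decode_prediction(probabilities: Sequence[float]) -> Tuple[str, float]:
    """Return the predicted class name and its score (first maximum, like np.argmax)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return INDEX_TO_LABEL[best], probabilities[best]


def prediction_color(predicted: int, actual: int) -> str:
    """Color used in the prediction grid: green if correct, red otherwise."""
    return 'darkgreen' if predicted == actual else 'red'


print(CLASS_NAMES)                    # ['benign', 'malignant']
print(decode_prediction([0.2, 0.9]))  # ('malignant', 0.9)
```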

6. Full project code

import random
import os
import glob
import time

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers, Sequential
from tensorflow.keras.utils import plot_model

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, f1_score, matthews_corrcoef,
    confusion_matrix, ConfusionMatrixDisplay,
    classification_report, precision_recall_fscore_support
)
from scikitplot.metrics import plot_roc

from vit_keras import vit

class CFG:
    EPOCHS = 10
    BATCH_SIZE = 32
    SEED = 42
    TF_SEED = 768
    HEIGHT = 224
    WIDTH = 224
    CHANNELS = 3
    IMAGE_SIZE = (224, 224, 3)

def seed_everything(seed=CFG.SEED):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

seed_everything(CFG.SEED)

# Dataset paths
DATASET_PATH = "./skin_classification/"
TRAIN_PATH = './skin_classification/train/'
TEST_PATH = './skin_classification/test/'

# Overview of the dataset
# print('Dataset overview')
# print('*'*50)
# for dirpath, dirnames, filenames in os.walk(DATASET_PATH):
#     print(f"There are {len(dirnames)} directories and {len(filenames)} images in {dirpath}")
# print('*'*50)

train_images = glob.glob(f"{TRAIN_PATH}**/*.jpg")
test_images = glob.glob(f"{TEST_PATH}**/*.jpg")

train_size = len(train_images)
test_size = len(test_images)

total = train_size + test_size

# print(f'train samples count:\t\t{train_size}')
# print(f'test samples count:\t\t{test_size}')
# print('=======================================')
# print(f'TOTAL:\t\t\t\t{total}')

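def generate_labels(image_paths):
    # The label is the name of each image's parent folder (benign or malignant);
    # os.path uses the current OS's separator, so this works on Windows, Linux and macOS
    return [os.path.basename(os.path.dirname(p)) for p in image_paths]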

def build_df(image_paths):
    # Create the DataFrame
    df = pd.DataFrame({
        'image_path': image_paths,
        'label': generate_labels(image_paths)
    })

    # Encode the labels: malignant -> 1, benign -> 0
    df['label_encoded'] = df.apply(lambda row: 1 if row.label == 'malignant' else 0, axis=1)
    # Shuffle the rows and return
    return df.sample(frac=1, random_state=CFG.SEED).reset_index(drop=True)

train_df = build_df(train_images)
test_df = build_df(test_images)

# # Show the first five training samples
# print(train_df.head(5))

def _load(image_path):
    # Read and decode the image file
    image = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(image, channels=3)
    # Resize to the model input size
    image = tf.image.resize(image, [CFG.HEIGHT, CFG.WIDTH],
                            method=tf.image.ResizeMethod.LANCZOS3)
    # Scale pixel values to [0, 1]
    image = tf.cast(image, tf.float32) / 255.
    return image

def view_sample(image, label, color_map='rgb', fig_size=(8, 10)):
    plt.figure(figsize=fig_size)
    if color_map == 'rgb':
        plt.imshow(image)
    else:
        plt.imshow(tf.image.rgb_to_grayscale(image), cmap=color_map)
    plt.title(f'Label: {label}', fontsize=16)
    plt.show()
    return

# Pick one random sample from the training set
idx = random.sample(train_df.index.to_list(), 1)[0]
# Load the image and its label
sample_image, sample_label = _load(train_df.image_path[idx]), train_df.label[idx]
# Visualize a single image
# view_sample(sample_image, sample_label, color_map='inferno')

def view_mulitiple_samples(df, sample_loader, count=10, color_map='rgb', fig_size=(14, 10)):
    rows = count // 5
    if count % 5 > 0:
        rows += 1
    idx = random.sample(df.index.to_list(), count)
    fig = plt.figure(figsize=fig_size)
    for column, _ in enumerate(idx):
        plt.subplot(rows, 5, column + 1)
        plt.title(f'Label: {df.label[_]}')

        if color_map == 'rgb':
            plt.imshow(sample_loader(df.image_path[_]))
        else:
            plt.imshow(tf.image.rgb_to_grayscale(sample_loader(df.image_path[_])), cmap=color_map)
    plt.show()
    return

# View several images
# view_mulitiple_samples(train_df, _load,
#                        count=25, color_map='inferno',
#                        fig_size=(20, 24))


# fig, (ax1, ax2) = plt.subplots(2, figsize=(14, 10))
# fig.tight_layout(pad=6.0)
# # Plot the label distribution of the training set
# ax1.set_title('Train Labels Distribution', fontsize=20)
# train_distribution = train_df['label'].value_counts().sort_values()
# sns.barplot(x=train_distribution.values,
#             y=list(train_distribution.keys()),
#             orient="h",
#             ax=ax1)
#
# # Plot the label distribution of the test set
# ax2.set_title('Test Labels Distribution', fontsize=20)
# test_distribution = test_df['label'].value_counts().sort_values()
# sns.barplot(x=test_distribution.values,
#             y=list(test_distribution.keys()),
#             orient="h",
#             ax=ax2);
# sns.despine();
# plt.show()

train_split_idx, val_split_idx, _, _ = train_test_split(train_df.index,
                                                        train_df.label_encoded,
                                                        test_size=0.15,
                                                        stratify=train_df.label_encoded,
                                                        random_state=CFG.SEED)
train_new_df = train_df.iloc[train_split_idx].reset_index(drop=True)
val_df = train_df.iloc[val_split_idx].reset_index(drop=True)


# fig, (ax1, ax2) = plt.subplots(2, figsize=(14, 10))
# fig.tight_layout(pad=6.0)
#
# # Plot the label distribution of the new training set
# ax1.set_title('New Train Labels Distribution', fontsize=20)
# train_new_distribution = train_new_df['label'].value_counts().sort_values()
# sns.barplot(x=train_new_distribution.values,
#             y=list(train_new_distribution.keys()),
#             orient="h",
#             ax=ax1)
#
# # Plot the label distribution of the validation set
# ax2.set_title('Validation Labels Distribution', fontsize=20)
# val_distribution = val_df['label'].value_counts().sort_values()
# sns.barplot(x=val_distribution.values,
#             y=list(val_distribution.keys()),
#             orient="h",
#             ax=ax2);
# plt.show()
# sns.despine();

# Data augmentation layer: random flips and random zoom
augmentation_layer = Sequential([
    layers.RandomFlip(mode='horizontal_and_vertical', seed=CFG.TF_SEED),
    layers.RandomZoom(height_factor=(-0.1, 0.1), width_factor=(-0.1, 0.1), seed=CFG.TF_SEED),
], name='augmentation_layer')

# image = tf.image.rgb_to_grayscale(sample_image)
#
# fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 10))
# fig.tight_layout(pad=6.0)
# # Original image
# ax1.set_title('Original Image', fontsize=20)
# ax1.imshow(image, cmap='inferno');
# # Image after augmentation
# ax2.set_title('Augmented Image', fontsize=20)
# ax2.imshow(augmentation_layer(image), cmap='inferno');
# plt.show()


def encode_labels(labels, encode_depth=2):
    return tf.one_hot(labels, depth=encode_depth).numpy()


def create_pipeline(df, load_function, augment=False, batch_size=32, shuffle=False, cache=None, prefetch=False):
    '''
    Generates an input pipeline using the tf.data API given a Pandas DataFrame and image loading function.

    @params
        - df: (pd.DataFrame) -> DataFrame containing paths and labels
        - load_function: (function) -> function used to load images given their paths
        - augment: (bool) -> condition for applying augmentation
        - batch_size: (int) -> batch size (default=32)
        - shuffle: (bool) -> condition for data shuffling, data is shuffled when True (default=False)
        - cache: (str) -> cache path for caching data, data is not cached when None (default=None)
        - prefetch: (bool) -> condition for prefetching data, data is prefetched when True (default=False)

    @returns
        - dataset: (tf.data.Dataset) -> dataset input pipeline used to train a TensorFlow model
    '''
    # Get the image paths and their one-hot encoded labels from the DataFrame
    image_paths = df.image_path
    image_labels = encode_labels(df.label_encoded)
    AUTOTUNE = tf.data.AUTOTUNE

    # Build the dataset from the DataFrame columns
    ds = tf.data.Dataset.from_tensor_slices((image_paths, image_labels))
    # Optionally apply data augmentation
    if augment:
        ds = ds.map(lambda x, y: (augmentation_layer(load_function(x)), y), num_parallel_calls=AUTOTUNE)
    else:
        ds = ds.map(lambda x, y: (load_function(x), y), num_parallel_calls=AUTOTUNE)
    # Optionally shuffle the samples
    if shuffle:
        ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    if cache is not None:
        ds = ds.cache(cache)
    if prefetch:
        ds = ds.prefetch(buffer_size=AUTOTUNE)
    return ds

# Training data pipeline (with augmentation)
train_ds = create_pipeline(train_new_df, _load, augment=True,
                           batch_size=CFG.BATCH_SIZE,
                           shuffle=False, prefetch=True)

# Validation data pipeline
val_ds = create_pipeline(val_df, _load,
                         batch_size=CFG.BATCH_SIZE,
                         shuffle=False, prefetch=False)

# Test data pipeline
test_ds = create_pipeline(test_df, _load,
                          batch_size=CFG.BATCH_SIZE,
                          shuffle=False, prefetch=False)


# Define the baseline CNN model
def cnn_model():
    initializer = tf.keras.initializers.GlorotNormal()

    cnn_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),

        layers.Conv2D(16, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.Conv2D(16, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.MaxPool2D(pool_size=2, padding='valid'),

        layers.Conv2D(8, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.Conv2D(8, kernel_size=3, activation='relu', kernel_initializer=initializer),
        layers.MaxPool2D(pool_size=2),

        layers.Flatten(),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, activation='sigmoid', kernel_initializer=initializer)
    ], name='cnn_sequential_model')

    return cnn_sequential

# Instantiate the model
model_cnn = cnn_model()

# # Print the model summary
# model_cnn.summary()

# Train the model and return its training history
def train_model(model, num_epochs, callbacks_list, tf_train_data,
                tf_valid_data=None, shuffling=False):
    '''
        Trains a TensorFlow model and returns the Keras History object with the metrics recorded during training.

        @params
        - model: (tf.keras.Model) -> model to be trained
        - num_epochs: (int) -> number of epochs to train the model
        - callbacks_list: (list) -> list of callback functions for the model
        - tf_train_data: (tf.data.Dataset) -> dataset the model is trained on
        - tf_valid_data: (tf.data.Dataset) -> dataset the model is validated on (default=None)
        - shuffling: (bool) -> forwarded to model.fit; Keras ignores it for tf.data.Dataset inputs (default=False)

        @returns
        - model_history: (tf.keras.callbacks.History) -> its .history dict holds the loss and metric values per epoch
    '''
    if tf_valid_data is not None:
        model_history = model.fit(tf_train_data,
                                  epochs=num_epochs,
                                  validation_data=tf_valid_data,
                                  validation_steps=int(len(tf_valid_data)),
                                  callbacks=callbacks_list,
                                  shuffle=shuffling)
    else:
        model_history = model.fit(tf_train_data,
                                  epochs=num_epochs,
                                  callbacks=callbacks_list,
                                  shuffle=shuffling)
    return model_history

# Early stopping: stop when val_loss has not improved for 3 epochs and restore the best weights
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True)

# Learning-rate decay: multiply the learning rate by 0.1 when val_loss plateaus for 2 epochs
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    patience=2,
    factor=0.1,
    verbose=1)

CALLBACKS = [early_stopping_callback, reduce_lr_callback]
# Evaluation metrics
METRICS = ['accuracy']


tf.random.set_seed(CFG.SEED)

# Compile the model
model_cnn.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

# Start training
print(f'Training {model_cnn.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

cnn_history = train_model(
    model_cnn, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)

# Evaluate on the test set
cnn_evaluation = model_cnn.evaluate(test_ds)

# Predicted probabilities and the corresponding class predictions
cnn_test_probabilities = model_cnn.predict(test_ds, verbose=1)
cnn_test_predictions = tf.argmax(cnn_test_probabilities, axis=1)

# Define the EfficientNet model

# Load a model from TensorFlow Hub as a Keras layer
def get_tfhub_model(model_link, model_name, model_trainable=False):
    return hub.KerasLayer(model_link,
                          trainable=model_trainable,
                          name=model_name)

# Load EfficientNetV2-B0
# The contents of https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b0/feature_vector/2
# were downloaded, extracted into the current directory and renamed to EfficientNetV2_b0
efficientnet_v2_url = './EfficientNetV2_b0'
model_name = 'efficientnet_v2_b0'

# Freeze the pretrained backbone: it is used only as a feature extractor, and only the new head is trained
set_trainable = False

efficientnet_v2_b0 = get_tfhub_model(efficientnet_v2_url,
                                     model_name,
                                     model_trainable=set_trainable)

def efficientnet_v2_model():
    initializer = tf.keras.initializers.GlorotNormal()

    efficientnet_v2_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),
        efficientnet_v2_b0,
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, dtype=tf.float32, activation='sigmoid', kernel_initializer=initializer)
    ], name='efficientnet_v2_sequential_model')

    return efficientnet_v2_sequential


# Instantiate the model
model_efficientnet_v2 = efficientnet_v2_model()

# Print the model summary (summary() prints by itself and returns None)
# model_efficientnet_v2.summary()

# Train the EfficientNet model
tf.random.set_seed(CFG.SEED)

model_efficientnet_v2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

print(f'Training {model_efficientnet_v2.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

efficientnet_v2_history = train_model(
    model_efficientnet_v2, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)


efficientnet_v2_evaluation = model_efficientnet_v2.evaluate(test_ds)

efficientnet_v2_test_probabilities = model_efficientnet_v2.predict(test_ds, verbose=1)
efficientnet_v2_test_predictions = tf.argmax(efficientnet_v2_test_probabilities, axis=1)



# Load a pretrained ViT-B/16 backbone from vit_keras (without its classification top)
vit_model = vit.vit_b16(
    image_size=224,
    activation='sigmoid',
    pretrained=True,
    include_top=False,
    pretrained_top=False,
    # Adjust to the number of classes in your dataset
    classes=2
)

# Freeze the ViT layers so that only the new classification head is trained
for layer in vit_model.layers:
    layer.trainable = False

# Define the ViT-based model
def vit_b16_model():
    initializer = tf.keras.initializers.GlorotNormal()

    vit_b16_sequential = Sequential([
        layers.Input(shape=CFG.IMAGE_SIZE, dtype=tf.float32, name='input_image'),
        vit_model,
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_initializer=initializer),
        layers.Dense(2, dtype=tf.float32, activation='sigmoid', kernel_initializer=initializer)
    ], name='vit_b16_sequential_model')

    return vit_b16_sequential

# Instantiate the model
model_vit_b16 = vit_b16_model()

# Train and validate the ViT model

tf.random.set_seed(CFG.SEED)

model_vit_b16.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=METRICS
)

print(f'Training {model_vit_b16.name}.')
print(f'Train on {len(train_new_df)} samples, validate on {len(val_df)} samples.')
print('----------------------------------')

vit_b16_history = train_model(
    model_vit_b16, CFG.EPOCHS, CALLBACKS,
    train_ds, val_ds,
    shuffling=False
)

vit_b16_evaluation = model_vit_b16.evaluate(test_ds)

vit_b16_test_probabilities = model_vit_b16.predict(test_ds, verbose=1)
vit_b16_test_predictions = tf.argmax(vit_b16_test_probabilities, axis=1)


def plot_training_curves(history):
    loss = np.array(history.history['loss'])
    val_loss = np.array(history.history['val_loss'])

    accuracy = np.array(history.history['accuracy'])
    val_accuracy = np.array(history.history['val_accuracy'])

    epochs = range(len(history.history['loss']))

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))

    # Plot loss
    ax1.plot(epochs, loss, label='training_loss', marker='o')
    ax1.plot(epochs, val_loss, label='val_loss', marker='o')

    ax1.fill_between(epochs, loss, val_loss, where=(loss > val_loss), color='C0', alpha=0.3, interpolate=True)
    ax1.fill_between(epochs, loss, val_loss, where=(loss < val_loss), color='C1', alpha=0.3, interpolate=True)

    ax1.set_title('Loss (Lower Means Better)', fontsize=16)
    ax1.set_xlabel('Epochs', fontsize=12)
    ax1.legend()

    # Plot accuracy
    ax2.plot(epochs, accuracy, label='training_accuracy', marker='o')
    ax2.plot(epochs, val_accuracy, label='val_accuracy', marker='o')

    ax2.fill_between(epochs, accuracy, val_accuracy, where=(accuracy > val_accuracy), color='C0', alpha=0.3,
                     interpolate=True)
    ax2.fill_between(epochs, accuracy, val_accuracy, where=(accuracy < val_accuracy), color='C1', alpha=0.3,
                     interpolate=True)

    ax2.set_title('Accuracy (Higher Means Better)', fontsize=16)
    ax2.set_xlabel('Epochs', fontsize=12)
    ax2.legend()
    plt.show()


# Training curves of the baseline model
plot_training_curves(cnn_history)
# Training curves of the EfficientNet model
plot_training_curves(efficientnet_v2_history)
# Training curves of the ViT model
plot_training_curves(vit_b16_history)

# Plot a confusion matrix
def plot_confusion_matrix(y_true, y_pred, classes='auto', figsize=(10, 10), text_size=12):
    # Compute the confusion matrix
    cm = confusion_matrix(y_true, y_pred)

    plt.figure(figsize=figsize)

    # Draw the confusion matrix as a heatmap
    disp = sns.heatmap(
        cm, annot=True, cmap='Greens',
        annot_kws={"size": text_size}, fmt='g',
        linewidths=1, linecolor='black', clip_on=False,
        xticklabels=classes, yticklabels=classes)

    disp.set_title('Confusion Matrix', fontsize=24)
    disp.set_xlabel('Predicted Label', fontsize=20)
    disp.set_ylabel('True Label', fontsize=20)
    plt.yticks(rotation=0)

    plt.show()
    return

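# Class names ordered by encoded label (benign -> 0, malignant -> 1, as in build_df)
class_names = ['benign', 'malignant']

# Confusion matrix of the baseline model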
plot_confusion_matrix(
    test_df.label_encoded,
    cnn_test_predictions,
    figsize=(8, 8),
    classes=class_names)

# Confusion matrix of the EfficientNet model
plot_confusion_matrix(
    test_df.label_encoded,
    efficientnet_v2_test_predictions,
    figsize=(8, 8),
    classes=class_names)

# Confusion matrix of the ViT model
plot_confusion_matrix(
    test_df.label_encoded,
    vit_b16_test_predictions,
    figsize=(8, 8),
    classes=class_names)


# ROC curve of the baseline model
plot_roc(test_df.label_encoded,
         cnn_test_probabilities,
         figsize=(10, 10), title_fontsize='large')

# ROC curve of the EfficientNet model
plot_roc(test_df.label_encoded,
         efficientnet_v2_test_probabilities,
         figsize=(10, 10), title_fontsize='large')

# ROC curve of the ViT model
plot_roc(test_df.label_encoded,
         vit_b16_test_probabilities,
         figsize=(10, 10), title_fontsize='large')

# Classification report of the baseline model
print(classification_report(test_df.label_encoded,
                            cnn_test_predictions,
                            target_names=class_names))

# Classification report of the EfficientNet V2 model
print(classification_report(test_df.label_encoded,
                            efficientnet_v2_test_predictions,
                            target_names=class_names))

# Classification report of the ViT-B/16 model
print(classification_report(test_df.label_encoded,
                            vit_b16_test_predictions,
                            target_names=class_names))

# Compute and record the classification metrics (accuracy, weighted precision/recall/F1, MCC)
def generate_preformance_scores(y_true, y_pred, y_probabilities):
    model_accuracy = accuracy_score(y_true, y_pred)
    model_precision, model_recall, model_f1, _ = (
        precision_recall_fscore_support(
            y_true,
            y_pred,
            average="weighted"
        )
    )
    model_matthews_corrcoef = matthews_corrcoef(y_true, y_pred)

    print('=============================================')
    print(f'\nPerformance Metrics:\n')
    print('=============================================')
    print(f'accuracy_score:\t\t{model_accuracy:.4f}\n')
    print('_____________________________________________')
    print(f'precision_score:\t{model_precision:.4f}\n')
    print('_____________________________________________')
    print(f'recall_score:\t\t{model_recall:.4f}\n')
    print('_____________________________________________')
    print(f'f1_score:\t\t{model_f1:.4f}\n')
    print('_____________________________________________')
    print(f'matthews_corrcoef:\t{model_matthews_corrcoef:.4f}\n')
    print('=============================================')

    preformance_scores = {
        'accuracy_score': model_accuracy,
        'precision_score': model_precision,
        'recall_score': model_recall,
        'f1_score': model_f1,
        'matthews_corrcoef': model_matthews_corrcoef
    }
    return preformance_scores

# Performance scores of the baseline model
cnn_performance = generate_preformance_scores(test_df.label_encoded,
                                              cnn_test_predictions,
                                              cnn_test_probabilities)

# Performance scores of the EfficientNet model
efficientnet_v2_performance = generate_preformance_scores(test_df.label_encoded,
                                                          efficientnet_v2_test_predictions,
                                                          efficientnet_v2_test_probabilities)

# Performance scores of the ViT model
vit_b16_performance = generate_preformance_scores(test_df.label_encoded,
                                                  vit_b16_test_predictions,
                                                  vit_b16_test_probabilities)


# Collect the scores of the three models in one DataFrame
performance_df = pd.DataFrame({
    'model_cnn': cnn_performance,
    'model_efficientnet_v2': efficientnet_v2_performance,
    'model_vit_b16': vit_b16_performance
}).T

# Print the table
print(performance_df)

# Show the scores as a bar chart
performance_df.plot(
    kind="bar",
    figsize=(16, 8)
).legend(bbox_to_anchor=(1.0, 1.0))

plt.title('Performance Metrics', fontsize=20)
sns.despine()


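# Trade-off between inference time and performance. Timings depend on the hardware, so they will differ from machine to machine

# Measure the inference time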
def compute_inference_time(model, ds, sample_count, inference_runs=5):
    total_inference_times = []
    inference_rates = []

    for _ in range(inference_runs):
        start = time.perf_counter()
        model.predict(ds)
        end = time.perf_counter()


        total_inference_time = end - start

        inference_rate = total_inference_time / sample_count

        total_inference_times.append(total_inference_time)
        inference_rates.append(inference_rate)


    avg_inference_time = sum(total_inference_times) / len(total_inference_times)
    avg_inference_time_uncertainty = (max(total_inference_times) - min(total_inference_times)) / 2

    avg_inference_rate = sum(inference_rates) / len(inference_rates)
    avg_inference_rate_uncertainty = (max(inference_rates) - min(inference_rates)) / 2

    print('====================================================')
    print(f'Model:\t\t{model.name}\n')
    print(f'Inference Time:\t{round(avg_inference_time, 6)}s \xB1 {round(avg_inference_time_uncertainty, 6)}s')
    print(
        f'Inference Rate:\t{round(avg_inference_rate, 6)}s/sample \xB1 {round(avg_inference_rate_uncertainty, 6)}s/sample')
    print('====================================================')

    return avg_inference_time, avg_inference_rate


cnn_inference = compute_inference_time(model_cnn, test_ds, len(test_df))

efficientnet_v2_inference = compute_inference_time(model_efficientnet_v2, test_ds, len(test_df))

vit_b16_inference = compute_inference_time(model_vit_b16, test_ds, len(test_df))


# Matthews correlation coefficient (MCC) of each model
cnn_mcc = cnn_performance["matthews_corrcoef"]
efficientnet_mcc = efficientnet_v2_performance["matthews_corrcoef"]
vit_mcc = vit_b16_performance["matthews_corrcoef"]

plt.figure(figsize=(16, 8))

plt.scatter(cnn_inference[1], cnn_mcc, label=model_cnn.name)
plt.scatter(efficientnet_v2_inference[1], efficientnet_mcc, label=model_efficientnet_v2.name)
plt.scatter(vit_b16_inference[1], vit_mcc, label=model_vit_b16.name)

ideal_inference_rate = 0.0001
ideal_mcc = 1

# Plot the ideal hypothetical model and dotted lines connecting it to each model
plt.scatter(ideal_inference_rate, ideal_mcc, label="Ideal Hypothetical Model", marker='s')
plt.plot([ideal_inference_rate, cnn_inference[1]], [ideal_mcc, cnn_mcc], ':')
plt.plot([ideal_inference_rate, efficientnet_v2_inference[1]], [ideal_mcc, efficientnet_mcc], ':')
plt.plot([ideal_inference_rate, vit_b16_inference[1]], [ideal_mcc, vit_mcc], ':')

plt.legend()
plt.title("Trade-Offs: Inference Rate vs. Matthews Correlation Coefficient", fontsize=20)
plt.xlabel("Inference Rate (s/sample)", fontsize=16)
plt.ylabel("Matthews Correlation Coefficient", fontsize=16)

sns.despine()

def dist(x1, x2, y1, y2):
    return np.sqrt(np.square(x2 - x1) + np.square(y2 - y1))

model_names = [model_cnn.name, model_efficientnet_v2.name, model_vit_b16.name]
model_scores = [cnn_mcc, efficientnet_mcc, vit_mcc]
model_rates = [cnn_inference[1], efficientnet_v2_inference[1], vit_b16_inference[1]]
trade_offs = [dist(ideal_inference_rate, inference_rate, ideal_mcc, score)
              for inference_rate, score in zip(model_rates, model_scores)]

print('Trade-Off Score: Inference Rate vs. MCC')
for name, inference_rate, score, trade in zip(model_names, model_rates, model_scores, trade_offs):
    print('---------------------------------------------------------')
    print(f'Model: {name}\n\nInference Rate: {inference_rate:.5f} | MCC: {score:.4f} | Trade-Off: {trade:.4f}')

# Best trade-off score (smallest distance to the ideal point)
print('=========================================================')
best_model_trade = min(trade_offs)
best_model_name = model_names[np.argmin(trade_offs)]
print(f'\nBest Optimal Model:\t{best_model_name}\nTrade-Off:\t\t{best_model_trade:.4f}\n')
print('=========================================================')

# Plot a grid of test images with their predicted and actual labels
def view_multiple_predictions(df, model, sample_loader, count=10, color_map='rgb', title=None, fig_size=(14, 10)):
    rows = count // 5
    if count % 5 > 0:
        rows += 1

    idx = random.sample(df.index.to_list(), count)

    fig = plt.figure(figsize=fig_size)
    if title is not None:
        fig.suptitle(title, fontsize=30)

    fig.tight_layout()
    fig.subplots_adjust(top=0.95)

    # Index -> class name; must match build_df (malignant -> 1, benign -> 0)
    label_set = {0: 'benign', 1: 'malignant'}
    color_set = {False: 'red', True: 'darkgreen'}

    for column, _ in enumerate(idx):
        # Predict the label of one image
        img = sample_loader(df.image_path[_])
        probability = np.squeeze(
            model.predict(np.array([img]), verbose=0)
        )
        prediction = np.argmax(probability)

        correct_prediction = (prediction == df.label_encoded[_])

        ax = plt.subplot(rows, 5, column + 1)
        ax.set_title(
            f'Actual Label: {df.label[_]}',
            pad=20,
            fontsize=14,
            color=color_set[correct_prediction]
        )

        if color_map == 'rgb':
            ax.imshow(img)
        else:
            ax.imshow(tf.image.rgb_to_grayscale(img), cmap=color_map)

        txt = f'Prediction: {label_set[prediction]}\nProbability: {(100 * probability[prediction]):.2f}%'
        plt.xlabel(txt, fontsize=14, color=color_set[correct_prediction])

    return


# Predictions of the baseline model
view_multiple_predictions(
    test_df,
    model_cnn,
    _load,
    count=25,
    color_map='inferno',
    title='CNN Test Predictions',
    fig_size=(20, 28)
)

# Predictions of the EfficientNet model
view_multiple_predictions(
    test_df,
    model_efficientnet_v2,
    _load,
    count=25,
    color_map='inferno',
    title='EfficientNet V2 Test Predictions',
    fig_size=(20, 28)
)

# Predictions of the ViT model
view_multiple_predictions(
    test_df,
    model_vit_b16,
    _load,
    count=25,
    color_map='inferno',
    title='ViT Test Predictions',
    fig_size=(20, 28)
)

plt.show()