CPG database processing: finding the PDFs that have not been extracted

Latest recommended article published on 2025-03-05 15:52:23

weixin_30409849

License: CC 4.0 BY-SA

Tags: database, python, artificial intelligence

Original link: http://www.cnblogs.com/webRobot/p/5882358.html

sklearn in practice: breast cancer cell data mining (video course recorded by the blogger)

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

CPG database processing: find the PDFs that have not been extracted yet and move them into the folder Chinese_undeal_pdfs.

move_unextracted_pdfs.py

# -*- coding: utf-8 -*-
"""
Created on Sun Sep 18 17:06:15 2016

@author: Administrator
"""

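import shutil
import xlrd  # note: xlrd 2.0+ reads only .xls; opening an .xlsx file needs xlrd < 2.0

excelFilename = "unextracted.xlsx"
data = xlrd.open_workbook(excelFilename)
table = data.sheets()[0]  # first worksheet (Sheet1)
# All PDFs: column 0, header row skipped
totalpdfs_list = table.col_values(0)[1:]
# Column 1 lists the extracted PDFs and may contain empty cells
extractedpdfs_list = table.col_values(1)[1:]
# PDFs that have already been extracted (empty cells dropped)
extractedpdfs_list1 = [i for i in extractedpdfs_list if i != ""]
# PDFs that have not been extracted yet
unextractedPdfs_list = [i for i in totalpdfs_list if i not in extractedpdfs_list1]
# Files that could not be moved
failed_files = []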



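# Move function: moves every unextracted PDF into the Chinese_undeal_pdfs folder
def RemoveFile():
    # The folder must already exist; otherwise shutil.move renames the file to this name
    target_dir = "Chinese_undeal_pdfs"
    for file in unextractedPdfs_list:
        try:
            shutil.move(file, target_dir)
        except Exception:
            # Record the file that could not be moved and carry on with the rest
            failed_files.append(file)

RemoveFile()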
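
The core of the script above is a difference between two spreadsheet columns. A minimal sketch of that step on plain Python lists follows. The helper name `find_unextracted` and the file names are hypothetical and used only for illustration. Unlike the script, it looks names up in a set and also skips empty entries in the total list, which is an assumption about how blank cells should be treated.

```python
def find_unextracted(total_pdfs, extracted_pdfs):
    """Return the entries of total_pdfs that are absent from extracted_pdfs.

    Empty strings (blank spreadsheet cells) are ignored, and the order of
    total_pdfs is preserved.
    """
    # A set makes each membership test cheap even for long lists
    extracted = {name for name in extracted_pdfs if name != ""}
    return [name for name in total_pdfs if name != "" and name not in extracted]


# Made-up file names, for illustration only
total = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
done = ["b.pdf", "", "d.pdf", ""]
print(find_unextracted(total, done))  # prints ['a.pdf', 'c.pdf']
```

For a few hundred file names the list-based lookup in the script is fast enough; the set only starts to matter when the columns grow large.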

Moving the English PDF files

remove_englishFile.py

# -*- coding: utf-8 -*-
"""
Spyder Editor
remove_englishFile.py
This is a temporary script file.
"""
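import shutil
import xlrd  # note: xlrd 2.0+ reads only .xls; opening an .xlsx file needs xlrd < 2.0

excelFilename = "be_cpg_English.xlsx"
data = xlrd.open_workbook(excelFilename)
table = data.sheets()[0]  # first worksheet (Sheet1)
# English PDFs: column 0, header row skipped
EnglishFile_list = table.col_values(0)[1:]

# Move function: moves every listed English PDF into the English folder
def RemoveFile():
    # The folder must already exist; otherwise shutil.move renames the file to this name
    target_dir = "English"
    for file in EnglishFile_list:
        shutil.move(file, target_dir)

RemoveFile()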
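
Both scripts assume that the target folder already exists and that every listed file is present in the working directory. A hedged sketch of a more defensive move step for Python 3 is shown below. The helper name `move_files` is hypothetical and not part of the original scripts. It creates the folder when needed and returns the names that could not be moved instead of stopping at the first error.

```python
import os
import shutil


def move_files(file_names, target_dir):
    """Move each file into target_dir and return the names that failed."""
    # Create the target folder if it is missing, so shutil.move never
    # renames a file to the folder's name by accident
    os.makedirs(target_dir, exist_ok=True)
    failed = []
    for name in file_names:
        try:
            shutil.move(name, target_dir)
        except (OSError, shutil.Error) as exc:
            # Missing source files and name clashes in target_dir end up here
            print("could not move %s: %s" % (name, exc))
            failed.append(name)
    return failed
```

Called as `move_files(EnglishFile_list, "English")`, it would play the role of `RemoveFile()` while also reporting which files were skipped.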