Python implementation: `continue` when response.status_code == 404

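When `response.status_code == 404`, the code skips the rest of the current loop iteration and moves on to the next one. HTTP status code 404 means the requested resource does not exist, so a `continue` statement lets the loop pass over the missing resource and carry on with the remaining items.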
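The skip-on-404 pattern described above can be sketched as follows. This is a minimal illustration rather than code from the original question: `fetch_all`, `FakeResponse`, and the example URLs are hypothetical, and the `get` argument stands in for a real client call such as `requests.get` so the loop can run without network access.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    content: bytes = b""


def fetch_all(urls, get):
    """Fetch each URL with `get`, skipping any that return 404.

    `get` is any callable that takes a URL and returns an object with
    `status_code` and `content` attributes (for example `requests.get`).
    """
    results = {}
    for url in urls:
        response = get(url)
        if response.status_code == 404:
            # Resource does not exist: skip it and move on to the next URL.
            continue
        results[url] = response.content
    return results


# Usage with a fake `get` so the example runs offline (hypothetical URLs).
pages = {
    "https://example.com/a.zip": FakeResponse(200, b"data-a"),
    "https://example.com/missing.zip": FakeResponse(404),
    "https://example.com/b.zip": FakeResponse(200, b"data-b"),
}
downloaded = fetch_all(pages, pages.get)
print(sorted(downloaded))
```

This sketch filters out only 404 responses. A broader check such as `status_code != 200` would also skip server errors and other non-success codes.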

    from requests_html import HTMLSession
    import os


    class Spider:
        def __init__(self):  # was `init`, which Python never calls as a constructor
            self.base_url = 'https://s3-ap-northeast-1.amazonaws.com/data.binance.vision/data/spot/daily/klines'
            self.pair = '1INCHBTC'
            self.interval = '1d'
            self.session = HTMLSession()

        def get_urls(self):
            urls = []
            # First page
            response = self.session.get(f'{self.base_url}/{self.pair}/{self.interval}/')
            if response.status_code != 200:
                return urls
            urls.extend(link for link in response.html.links if link.endswith('.zip'))
            # Pagination
            while True:
                # `links` is a set and cannot be indexed directly; sorting it is an
                # assumption made here so that "the last link" is well defined.
                links = sorted(response.html.links)
                if not links:
                    break
                response = self.session.get(links[-1])
                if response.status_code != 200:  # request failed
                    break
                links = sorted(response.html.links)
                urls.extend(link for link in links if link.endswith('.zip'))
                if not links or 'CHECKSUM' in links[-1]:
                    break
            return urls

        def download_files(self):
            urls = self.get_urls()
            if not urls:
                print('Download failed')
                return
            os.makedirs('download_files', exist_ok=True)
            for url in urls:
                file_name = url.split('/')[-1]
                file_path = f'download_files/{file_name}'
                if os.path.exists(file_path):  # file already exists
                    print(f'{file_name} already exists')
                    continue
                response = self.session.get(url)
                if response.status_code != 200:  # request failed
                    print(f'{file_name} download failed')
                    continue
                with open(file_path, 'wb') as f:
                    f.write(response.content)
                print(f'{file_name} downloaded')

        def run(self):
            self.download_files()

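This is a Python crawler that downloads daily (1d interval) K-line data for the 1INCHBTC pair from Binance's public data bucket. It uses the requests_html library to send HTTP requests and parse HTML. In the `Spider` class, the constructor `__init__` defines the basic parameters the program needs: the Binance data download URL, the trading pair, the time interval, and the HTTP session. The `get_urls` method collects the list of download links for the data files, and the `download_files` method downloads those files, skipping any file that already exists locally. Finally, the `run` method calls `download_files` to run the whole program.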
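The "skip files that already exist" step can be isolated into a small helper. The sketch below is an illustration under assumptions, not part of the original program: `pending_downloads` and the example URLs are hypothetical names, and a temporary directory stands in for the real download folder.

```python
import os
import tempfile


def pending_downloads(urls, target_dir):
    """Return (url, file_path) pairs for files not yet present in target_dir."""
    pending = []
    for url in urls:
        file_name = url.split('/')[-1]  # last path segment, as in download_files
        file_path = os.path.join(target_dir, file_name)
        if os.path.exists(file_path):
            # Already downloaded earlier: nothing to do for this URL.
            continue
        pending.append((url, file_path))
    return pending


with tempfile.TemporaryDirectory() as tmp:
    # Pretend a.zip was downloaded on a previous run.
    open(os.path.join(tmp, 'a.zip'), 'wb').close()
    urls = ['https://example.com/data/a.zip', 'https://example.com/data/b.zip']
    todo = pending_downloads(urls, tmp)
    print([os.path.basename(path) for _, path in todo])
```

Separating the decision from the download makes it possible to check which files would be fetched before any request is sent.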

Given the same `Spider` code as above, with only the save path changed to `file_path = f'/Users/huyang/Desktop/2023/Grach/{file_name}'`, change the site the data is downloaded from to 'http://www.greenfinance.org.cn/'.

You need to change the following attributes and methods of the `Spider` class:

```python
from requests_html import HTMLSession
import os


class Spider:
    def __init__(self):
        self.base_url = 'http://www.greenfinance.org.cn/'
        self.session = HTMLSession()

    def get_urls(self):
        urls = []
        response = self.session.get(self.base_url)
        if response.status_code == 200:
            # absolute_links resolves relative hrefs so they can be requested directly
            for link in response.html.absolute_links:
                if link.endswith('.xls') or link.endswith('.xlsx'):
                    urls.append(link)
        return urls

    def download_files(self):
        urls = self.get_urls()
        if not urls:
            print('Download failed')
            return
        # Create the folder that files are actually written to
        save_dir = '/Users/huyang/Desktop/2023/Grach'
        os.makedirs(save_dir, exist_ok=True)
        for url in urls:
            file_name = url.split('/')[-1]
            file_path = os.path.join(save_dir, file_name)
            if os.path.exists(file_path):  # file already exists
                print(f'{file_name} already exists')
                continue
            response = self.session.get(url)
            if response.status_code != 200:  # request failed
                print(f'{file_name} download failed')
                continue
            with open(file_path, 'wb') as f:
                f.write(response.content)
            print(f'{file_name} downloaded')

    def run(self):
        self.download_files()
```

Here the `get_urls()` method collects every link on the http://www.greenfinance.org.cn/ page that ends in `.xls` or `.xlsx`, and the `download_files()` method downloads the files those links point to. Remember to change `save_dir` so that `file_path` points at the location where you want to save the files. Finally, you can call the `run()` method to run the class and download the files.
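Whether the links on that page are absolute, relative, or carry query strings is not stated in the question, so the suffix filter may need to be a little more tolerant. The sketch below is one possible approach under those assumptions; `select_spreadsheet_links` and the sample links are hypothetical and use only the standard library.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def select_spreadsheet_links(links, base_url, suffixes=('.xls', '.xlsx')):
    """Return (absolute_url, file_name) pairs for links whose path ends in a suffix."""
    selected = []
    for link in sorted(links):
        absolute = urljoin(base_url, link)  # resolve relative links against the page URL
        path = urlparse(absolute).path      # drop any query string or fragment
        if path.lower().endswith(suffixes): # case-insensitive suffix match
            selected.append((absolute, posixpath.basename(path)))
    return selected


# Hypothetical links, shaped like the set a page parser might return.
sample_links = {
    '/files/report.xlsx',
    'news/index.html',
    'http://www.greenfinance.org.cn/data/old.XLS?download=1',
}
for url, name in select_spreadsheet_links(sample_links, 'http://www.greenfinance.org.cn/'):
    print(name, url)
```

Matching on the parsed path rather than the raw link keeps a trailing query string such as `?download=1` from hiding the file extension.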
