The difference between USER_AGENT in settings and the user agent set in a DownloaderMiddleware

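This post explains the role of USER_AGENT in settings as a global value, and how a user agent set in a DownloaderMiddleware is assigned to each request and can override the global one. It stresses that, by default, all requests in a spider share the global USER_AGENT.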


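USER_AGENT in settings is a global value that applies to the whole spider. It is picked up once per crawl run, and every request the spider sends shares it unless something else sets the User-Agent header for that request.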
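For reference, a minimal settings.py fragment that sets the global value might look like the following. The Chrome string is only an example value; any User-Agent string works. Scrapy's built-in UserAgentMiddleware reads this setting when the spider opens (a `user_agent` attribute on the spider takes precedence) and adds it only to requests that do not already carry a User-Agent header.

```python
# settings.py: project-wide default User-Agent (example value only).
# Every request from the spider gets this header unless the request
# already has a User-Agent set by something else, e.g. a downloader middleware.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)
```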

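A user agent set in a DownloaderMiddleware is assigned to each individual request, because the middleware's process_request runs once per request, so it can override the global setting.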
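A minimal sketch of such a per-request override, assuming a hypothetical middleware named `RandomUserAgentMiddleware`, a hypothetical `USER_AGENT_LIST` setting, and a placeholder pool of strings. It is one possible approach, not the only way to do this.

```python
import random

# Placeholder pool of User-Agent strings (hypothetical; replace with your own).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that picks a User-Agent per request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)
        if not self.user_agents:
            raise ValueError("user_agents must not be empty")

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical setting; fall back to the pool above.
        return cls(crawler.settings.getlist("USER_AGENT_LIST") or USER_AGENTS)

    def process_request(self, request, spider):
        # Called once for every outgoing request. Plain assignment (not
        # setdefault) replaces any User-Agent already on the request,
        # including the global USER_AGENT from settings.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # let the request continue through the middleware chain
```

To enable it, add it to DOWNLOADER_MIDDLEWARES in settings.py, for example `{"myproject.middlewares.RandomUserAgentMiddleware": 400}` (`myproject` is a placeholder path). Because it assigns the header directly, it wins over the built-in UserAgentMiddleware (order 500) whichever order it runs in: with an order below 500 it runs first, and the built-in middleware then leaves the existing header alone.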
