How to paginate when scraping e-commerce sites with Python

Time: 2025-03-26 18:14:04 Views: 36

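### Implementing automatic pagination for e-commerce scraping

For paginated crawling of e-commerce sites, Playwright is an efficient choice because it drives a real browser and handles pages whose content is loaded through JavaScript[^3]. The following example shows how to implement automatic pagination with Python and Playwright.

#### Install dependencies

First install the `playwright` library and the browser binaries it drives (Chromium among them):

```bash
pip install playwright
playwright install
```

#### Write the crawler script

Next, write a script that scrapes the product list from JD search result pages and visits several pages in turn:

```python
import asyncio

from playwright.async_api import async_playwright


async def jd_spider():
    # JD search URL; the page parameter is filled in for each request.
    url = 'https://search.jd.com/Search?keyword=Python%E4%B9%A6%E7%B1%8D&page={}'
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        for page_num in range(1, 5):  # only the first four pages, as a demo
            # JD's page parameter advances by 2 per visible results page: 1, 3, 5, 7
            full_url = url.format(page_num * 2 - 1)
            page = await context.new_page()
            await page.goto(full_url)
            items = await page.query_selector_all('.gl-item')
            for item in items[:3]:  # print only the first three items to keep output short
                title = await item.query_selector('.p-name em')
                price = await item.query_selector('.p-price i')
                if title is None or price is None:
                    continue  # skip entries that lack a title or a price element
                print({
                    "title": (await title.text_content() or "").strip(),
                    "price": (await price.text_content() or "").strip(),
                })
            await page.close()
        await browser.close()


if __name__ == '__main__':
    asyncio.run(jd_spider())
```

This script walks through consecutive result pages for the given search keyword and extracts the title and price of a few products on each page. Note that the visible page number is mapped to `page_num * 2 - 1` to match how JD builds its URLs (the `page` parameter advances by 2 for each visible page); other platforms may not need this adjustment[^1].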
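The page-number mapping can be checked without launching a browser. The following is a minimal sketch, assuming the `page` parameter follows the `2n - 1` rule described above and that the search endpoint is `https://search.jd.com/Search`; the helper names `jd_page_param` and `build_search_urls` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumed search endpoint; adjust if the site uses a different path.
BASE_URL = "https://search.jd.com/Search"


def jd_page_param(visible_page: int) -> int:
    """Map a 1-based visible page number to the `page` query value (1, 3, 5, ...)."""
    if visible_page < 1:
        raise ValueError("visible_page must be >= 1")
    return visible_page * 2 - 1


def build_search_urls(keyword: str, pages: int) -> list[str]:
    """Build the search URLs for visible pages 1..pages (hypothetical helper)."""
    return [
        f"{BASE_URL}?{urlencode({'keyword': keyword, 'page': jd_page_param(n)})}"
        for n in range(1, pages + 1)
    ]


if __name__ == "__main__":
    for search_url in build_search_urls("Python书籍", 4):
        print(search_url)
```

Keeping the mapping in its own function means that adapting the crawler to a platform with plain sequential page numbers only requires changing `jd_page_param`, while `urlencode` takes care of percent-encoding the keyword.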
