
Scraping text and links from a web page table with XPath in Python

Date: 2024-12-10 20:54:39

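Scraping the text and links of a web page table with XPath in Python usually means fetching the page with the `requests` library and then parsing the HTML with a library such as `lxml` (or driving a real browser with `selenium` when the table is rendered by JavaScript). Here is a simple example:

```python
import requests
from lxml import etree

# Fetch the page
url = "https://example.com/table_page"  # replace with the URL of the page you want to scrape
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the raw bytes into an element tree; lxml detects the encoding itself
    parser = etree.HTMLParser()
    tree = etree.fromstring(response.content, parser)

    # Locate the table with an XPath expression
    table_xpath = "//table[@id='your_table_id']"  # replace with the real table id
    tables = tree.xpath(table_xpath)
    if not tables:
        raise SystemExit(f"No table matched {table_xpath}")
    table = tables[0]

    # Collect the text and the first link of every data cell
    texts_and_links = []
    for row in table.iter('tr'):
        for cell in row.findall('td'):
            # Plain etree elements have no text_content() (that method belongs
            # to lxml.html elements), so join all descendant text instead
            text = ''.join(cell.itertext()).strip()
            # Look for links anywhere inside the cell, not only direct children
            hrefs = cell.xpath('.//a/@href')
            link = hrefs[0] if hrefs else ''
            texts_and_links.append({'text': text, 'link': link})

    # Print the extracted data
    for item in texts_and_links:
        print(f'Text: {item["text"]}, Link: {item["link"]}')
else:
    print(f"Failed to fetch the page with status code: {response.status_code}")
```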
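Before pointing the scraper at a live site, it can help to check the extraction logic offline. The sketch below is one possible way to do that: it wraps the cell extraction in a small function and runs it on an inline HTML string. The names `SAMPLE_HTML` and `extract_cells`, and the table id `demo`, are hypothetical and chosen only for illustration.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html><body>
<table id="demo">
  <tr><th>Name</th><th>Site</th></tr>
  <tr><td>Alpha</td><td><a href="/alpha">Alpha <b>home</b></a></td></tr>
  <tr><td>Beta</td><td>no link</td></tr>
</table>
</body></html>
"""


def extract_cells(html, table_id):
    """Return one {'text', 'link'} dict per <td> of the table with the given id."""
    tree = etree.fromstring(html, etree.HTMLParser())
    # Pass the id as an XPath variable instead of formatting it into the string
    tables = tree.xpath("//table[@id=$tid]", tid=table_id)
    if not tables:
        return []
    results = []
    for cell in tables[0].xpath(".//tr/td"):
        # Join all text inside the cell, including text of nested tags
        text = "".join(cell.itertext()).strip()
        # First href found anywhere inside the cell, or '' if there is none
        hrefs = cell.xpath(".//a/@href")
        results.append({"text": text, "link": hrefs[0] if hrefs else ""})
    return results


for item in extract_cells(SAMPLE_HTML, "demo"):
    print(item)
```

Header cells (`<th>`) are skipped because only `<td>` elements are selected. Links are returned exactly as written in the page, so relative values such as `/alpha` would still need to be resolved against the page URL, for example with `urllib.parse.urljoin`.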
