1. The start URL is http://j4b.xy4com/il_sii/symptom/1.htm, and every other URL has the form 'http://j4.xy4y.com/il_sii/sy4ptom/' + number + '.htm'. With 10136 pages to cover, the URLs can be generated quickly as follows:
    start_urls = []
    # Pages are numbered 1 through 10136; range() excludes its upper bound.
    for i in range(1, 10137):
        start_urls.append('http://344b.x44y.com/il_sii/s44tom/' + str(i) + '.htm')
2. One section starts at http://j3net/bw/toubu_t1_p1, and its second page is http://j3net/bw/toubu_t1_p2.
Another section starts at http://j3.net/bw/jingbu_t1_p1, and its second page is http://j3.net/bw/jingbu_t1_p2.
How can all of these URLs be traversed?
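    links = ['/bw/toubu', '/bw/jingbu', '/bw/xiongbu', '/bw/fubu', '/bw/yaobu', '/bw/tunbu', '/bw/shangzhi']
    # Assumed to be the number of pages in each section, in the same order as links.
    links_size = [175, 25, 113, 107, 36, 7, 18]
    start_urls = []
    for li in range(len(links)):
        # Page numbers start at 1 (..._t1_p1), so count from 1 through links_size[li].
        for i in range(1, links_size[li] + 1):
            start_urls.append('http://3.net' + links[li] + '_t1_p' + str(i))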
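The two parallel lists can also be kept together in a single mapping, which avoids index bookkeeping. The following is only a sketch: `BASE`, `SECTION_PAGES` and `build_section_urls` are hypothetical names, the host is a placeholder, and the numbers are assumed to be page counts with pages numbered from 1.

```python
# Placeholder host (an assumption); substitute the real site root.
BASE = 'http://example.invalid'

# Section path -> assumed number of pages, taken from the two lists above.
SECTION_PAGES = {
    '/bw/toubu': 175,
    '/bw/jingbu': 25,
    '/bw/xiongbu': 113,
    '/bw/fubu': 107,
    '/bw/yaobu': 36,
    '/bw/tunbu': 7,
    '/bw/shangzhi': 18,
}


def build_section_urls(base, section_pages):
    """Return one URL per page of every section, pages numbered from 1."""
    return [
        f'{base}{path}_t1_p{page}'
        for path, pages in section_pages.items()
        for page in range(1, pages + 1)
    ]


start_urls = build_section_urls(BASE, SECTION_PAGES)
```

Keeping each path next to its page count means a section can be added or removed in one place, and the two values cannot drift out of order.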
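3. How the start URLs are composed:
Page 1: https://www.3next?start=1
Page 2: https://www.3next?start=11
    # Start URLs, worked out by inspecting requests in the Network tab of the browser developer tools. A list comprehension is used below; the pattern is worth learning.
    # Only start=1, 11, 21 and 31 are fetched: values begin at 1 and advance in steps of 10.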
start_urls = ['https://www.3next?start=' + str(x) for x in range(1, 36, 10)]
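When the offset advances by a fixed page size, the bounds of `range()` can be derived from the item count instead of being hard-coded. A minimal sketch, assuming 10 items per page and an offset that starts at 1; `offset_urls` and the example host are hypothetical and not part of the original notes.

```python
def offset_urls(base, total_items, page_size=10, first=1):
    """Build '?start=' URLs covering total_items items, page_size items per page."""
    # range() stops before its upper bound, so first + total_items is exclusive.
    return [
        f'{base}?start={offset}'
        for offset in range(first, first + total_items, page_size)
    ]


# 35 items at 10 per page gives the offsets 1, 11, 21, 31.
urls = offset_urls('https://example.invalid/list', total_items=35)
```

With `total_items=35` the call reduces to `range(1, 36, 10)`, the same bounds used in the list comprehension above.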