5 Writing the spider
1. Now that we know what we want to crawl, we first set start_urls as follows:
start_urls = [
    'https://top.taobao.com/index.php?topId=TR_FS&leafId=50010850',
    'https://top.taobao.com/index.php?topId=TR_SM&leafId=1101',
    'https://top.taobao.com/index.php?topId=TR_HZP&leafId=121454013',
    'https://top.taobao.com/index.php?topId=TR_MY&leafId=50013618',
    'https://top.taobao.com/index.php?topId=TR_SP&leafId=50008055',
    'https://top.taobao.com/index.php?topId=TR_WT&leafId=50014075',
    'https://top.taobao.com/index.php?topId=TR_JJ&leafId=50016434',
    'https://top.taobao.com/index.php?topId=TR_ZH&leafId=50011975',
]
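In other words, the crawl starts from the page of each top-level category, and from each of these pages we extract the URLs of all the subcategories it contains.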
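The extraction step could be sketched roughly as follows, using only the Python standard library so that it runs without Scrapy. This is an illustration under assumptions, not the actual page logic: the markup in SAMPLE_HTML is made up, the leafId values 111 and 222 are hypothetical, and the rule "a subcategory link is any <a> whose href carries a leafId parameter" is a guess about the page structure. The names SubcategoryLinkParser and extract_subcategory_urls are also invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class SubcategoryLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags whose href has a leafId parameter."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the category page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: subcategory links are recognizable by a leafId parameter.
        has_leaf_id = "leafId" in parse_qs(urlparse(absolute).query)
        if has_leaf_id and absolute not in self.urls:
            self.urls.append(absolute)


def extract_subcategory_urls(html, base_url):
    """Return the de-duplicated subcategory URLs found in one category page."""
    parser = SubcategoryLinkParser(base_url)
    parser.feed(html)
    return parser.urls


# Hypothetical fragment of a category page; the real markup may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="index.php?topId=TR_FS&amp;leafId=111">Subcategory A</a></li>
  <li><a href="/index.php?topId=TR_FS&amp;leafId=222">Subcategory B</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

BASE_URL = "https://top.taobao.com/index.php?topId=TR_FS&leafId=50010850"
subcategory_urls = extract_subcategory_urls(SAMPLE_HTML, BASE_URL)
for url in subcategory_urls:
    print(url)
```

In a Scrapy spider, the same idea would normally live in the parse callback, which receives the response for each entry in start_urls and yields a new request for every subcategory URL it finds.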