Installing BeautifulSoup
1. Download beautifulsoup4-4.3.2: http://www.crummy.com/software/BeautifulSoup/bs4/download/
2. Extract it to a local drive, for example C:\Python27\beautifulsoup4-4.3.2.
3. Open cmd, change into that directory, and run python setup.py install. (python setup.py build only builds the package; it does not install it.)
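A quick way to confirm that the installation worked is to try importing the bs4 package and parsing a tiny document. The following is a minimal sketch; the helper name is_installed and the sample HTML are illustrative and not part of the original steps.

```python
def is_installed(module_name):
    """Return True if the named module can be imported."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if is_installed("bs4"):
    from bs4 import BeautifulSoup

    # "html.parser" is the parser bundled with Python, so nothing else is needed
    soup = BeautifulSoup("<html><title>ok</title></html>", "html.parser")
    print(soup.title.string)
else:
    print("bs4 not found; re-run the install step")
```

If the second message appears, the most likely cause is that setup.py was run with a different Python interpreter than the one executing this check.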
The most basic page fetch
# -*- coding: cp936 -*-
# The most basic page fetch (Python 2)
import urllib
# Download the page body as a byte string
content = urllib.urlopen("http://www.baidu.com").read()
# Open in binary mode so the bytes are saved unchanged
ofile = open("web.html", 'wb')
ofile.write(content)
ofile.close()
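The script above can also be wrapped in a reusable function. The sketch below is one possible shape, written to run on both Python 2.7 and Python 3; the function name save_page, the open_url parameter, and the 10-second timeout are assumptions made for illustration, not part of the original script.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def save_page(url, path, open_url=urlopen, timeout=10):
    """Download url, write the raw bytes to path, and return the byte count."""
    response = open_url(url, timeout=timeout)
    try:
        content = response.read()
    finally:
        # Release the connection even if read() fails
        response.close()
    # Binary mode keeps the downloaded bytes exactly as received
    with open(path, "wb") as ofile:
        ofile.write(content)
    return len(content)
```

Calling save_page("http://www.baidu.com", "web.html") would do the same job as the script. The open_url parameter exists only so that the function can be exercised without network access by passing in a stand-in.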
Fetching CSDN while imitating a browser
# -*- coding: cp936 -*-
# Fetch a page with a custom User-Agent header (Python 2)
import urllib2
url = "http://blog.csdn.net/"
res = urllib2.Request(url)
# Replace the default Python-urllib User-Agent to imitate a browser
res.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(res)
htmlcontent = response.read()
# Open in binary mode so the bytes are saved unchanged
ofile = open("loginweb.html", 'wb')
ofile.write(htmlcontent)
ofile.close()
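The same idea, setting a User-Agent header before opening the URL, can be separated into two small functions. This is a hedged sketch intended to run on both Python 2.7 and Python 3; the names build_request and fetch, the open_url parameter, and the timeout value are hypothetical and chosen only for this example.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2


def build_request(url, user_agent="fake-client"):
    """Create a Request that carries a custom User-Agent header."""
    req = Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def fetch(url, user_agent="fake-client", open_url=urlopen, timeout=10):
    """Return the response body for url, sent with the given User-Agent."""
    response = open_url(build_request(url, user_agent), timeout=timeout)
    try:
        return response.read()
    finally:
        # Release the connection even if read() fails
        response.close()
```

For example, fetch("http://blog.csdn.net/") would return the same bytes that the script above writes to loginweb.html. Note that add_header stores the header name in capitalized form, so the value is read back with req.get_header("User-agent").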