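Pholcus is a web crawler framework written in Go. I wrote a crawl rule for it that fetches the latest news from People's Daily Online (人民网).
Pholcus is a pretty solid piece of open-source software, though I don't find Go itself much fun.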
I put the rule on GitHub: https://github.com/itibbers/pholcus-spider-lib
I'll paste it here as well:
package spider_lib
// Core packages
import (
"log"
// "github.com/PuerkitoBio/goquery" // DOM parsing
"github.com/henrylee2cn/pholcus/app/downloader/request" // required
// "github.com/henrylee2cn/pholcus/logs" // log output
. "github.com/henrylee2cn/pholcus/app/spider" // required
// . "github.com/henrylee2cn/pholcus/app/spider/common" // optional
// net packages
// "net/http" // for setting http.Header
// "net/url"
// Encoding packages
// "encoding/xml"
"encoding/json"
// String-handling packages
// "regexp"
// "strconv"
// "strings"