Some websites load their article list through an API request rather than rendering it in the page. These sites can be crawled too. Take this site as an example: http://www.tiyuxiu.com/

![](https://img.kancloud.cn/f0/3e/f03e1b18a6161ff3d0c2afd03da67b5b_716x937.png)

The browser inspector shows the API that serves the list:

![](https://img.kancloud.cn/66/11/661152ed198975aa38917af0faf0c682_554x315.png)

Now write the code:

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/PeterYangs/article-spider/apiSpider"
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
)

func main() {

	f := form.Form{
		Host: "http://www.tiyuxiu.com",
		// The list API; [PAGE] stands in for the page number.
		Channel:   "/data/list_0_[PAGE].json?__t=16192263",
		Limit:     5,
		PageStart: 1,
		// Fields to extract from each article detail page.
		DetailFields: map[string]form.Field{
			"title":   {Types: fileTypes.SingleField, Selector: "body > div.container.main-container.clear.clearfix > div.pleft.mt10 > div.article-header > h1"},
			"content": {Types: fileTypes.HtmlWithImage, Selector: "#main-content"},
			"desc":    {Types: fileTypes.Attr, Selector: "meta[name=\"description\"]", AttrKey: "content"},
			"keyword": {Types: fileTypes.Attr, Selector: "meta[name=\"keywords\"]", AttrKey: "content"},
		},
		DetailMaxCoroutine: 1,
		// ApiConversion parses the raw API response and returns the article detail links.
		ApiConversion: func(result string) []string {

			var jsons []map[string]interface{}

			err := json.Unmarshal([]byte(result), &jsons)
			if err != nil {
				log.Print(err)
				return []string{}
			}

			var linkList []string

			for _, m := range jsons {
				// Skip entries without a string "url" instead of panicking.
				if u, ok := m["url"].(string); ok {
					linkList = append(linkList, u)
				}
			}

			return linkList
		},
		HttpHeader: map[string]string{"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"},
	}

	apiSpider.Start(f)
}
```

Unlike a regular crawl, an API crawl requires you to supply the **ApiConversion** function. It receives the raw response body of the list API, parses it, and returns the article detail links as a **[]string**.
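The parsing step can also be tried on its own, without the crawler. The following is a minimal sketch that uses only the standard library; the function name `extractLinks` and the sample payload are hypothetical, and it assumes the API returns a JSON array of objects that each carry a `url` field, as the example above suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractLinks parses a JSON array of objects and collects every string "url" value.
// Entries that lack a string "url" are skipped; invalid JSON yields an empty slice.
func extractLinks(result string) []string {
	var items []map[string]interface{}
	if err := json.Unmarshal([]byte(result), &items); err != nil {
		return []string{}
	}

	links := make([]string, 0, len(items))
	for _, item := range items {
		// The comma-ok form avoids a panic when "url" is missing or not a string.
		if u, ok := item["url"].(string); ok && u != "" {
			links = append(links, u)
		}
	}
	return links
}

func main() {
	// Hypothetical sample payload; the real API response may contain other fields.
	sample := `[{"url":"/news/1.html","title":"a"},{"title":"missing url"},{"url":"/news/2.html"}]`
	for _, link := range extractLinks(sample) {
		fmt.Println(link)
	}
}
```

A function shaped like this could be assigned to `ApiConversion` directly, since it takes the response body as a string and returns a `[]string`. Testing it against a saved copy of the API response before starting a crawl makes it easier to spot a changed field name.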