**Scraping images**

Continuing from the previous example, suppose I want to grab the first image in the content of each article's detail page:

```
package main

import (
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
	"github.com/PeterYangs/article-spider/spider"
)

func main() {
	f := form.Form{
		Host:             "https://www.weixz.com",
		Channel:          "/zxzx/list_[PAGE].html",
		Limit:            5,
		PageStart:        1,
		ListSelector:     "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.information-main-list > ul > li",
		ListHrefSelector: "div.information-main-list-title > a",
		DetailFields: map[string]form.Field{
			// SingleField: one text value (the article title)
			"title": {Types: fileTypes.SingleField, Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentTitle > h1"},
			// SingleImage: download one image matched by Selector
			"image": {
				Types:    fileTypes.SingleImage,
				Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentText img:nth-child(1)",
			},
		},
	}

	spider.Start(f)
}
```

**fileTypes.SingleImage** is the type for fetching a single image, and **Selector** is the CSS selector that locates it.

The images are downloaded to the **/image** directory. The result is:

![](https://img.kancloud.cn/57/09/5709501735b2d33f8a6c1f2039985090_1175x578.png)

Sometimes we want every image to carry a specific prefix. For example, to prefix every image with /image so that the final value becomes /image/9387005ae5ef45a9b147a57f6da81042.jpg, change the code to:

```
package main

import (
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
	"github.com/PeterYangs/article-spider/spider"
)

func main() {
	f := form.Form{
		Host:             "https://www.weixz.com",
		Channel:          "/zxzx/list_[PAGE].html",
		Limit:            5,
		PageStart:        1,
		ListSelector:     "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.information-main-list > ul > li",
		ListHrefSelector: "div.information-main-list-title > a",
		DetailFields: map[string]form.Field{
			"title": {Types: fileTypes.SingleField, Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentTitle > h1"},
			"image": {
				Types:       fileTypes.SingleImage,
				Selector:    "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentText img:nth-child(1)",
				ImagePrefix: "/image", // prepended to the image name in the result
			},
		},
	}

	spider.Start(f)
}
```

ImagePrefix is the prefix added to the image name.

Another case: when many images are scraped, piling them all into the image folder gets messy. To organize them, you can sort them into subfolders, for example by date or by a random number. Change the code as follows.

By date (`[date:Ymd]`, i.e. year, month, day):

```
package main

import (
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
	"github.com/PeterYangs/article-spider/spider"
)

func main() {
	f := form.Form{
		Host:             "https://www.weixz.com",
		Channel:          "/zxzx/list_[PAGE].html",
		Limit:            5,
		PageStart:        1,
		ListSelector:     "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.information-main-list > ul > li",
		ListHrefSelector: "div.information-main-list-title > a",
		DetailFields: map[string]form.Field{
			"title": {Types: fileTypes.SingleField, Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentTitle > h1"},
			"image": {
				Types:       fileTypes.SingleImage,
				Selector:    "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentText img:nth-child(1)",
				ImagePrefix: "/image",
				ImageDir:    "[date:Ymd]", // subfolder named after the date
			},
		},
	}

	spider.Start(f)
}
```

The result is:

![](https://img.kancloud.cn/b8/82/b882617437fd236901e557c6a1e4c1a5_979x543.png)

By random number (`[random:1-100]`):

```
package main

import (
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
	"github.com/PeterYangs/article-spider/spider"
)

func main() {
	f := form.Form{
		Host:             "https://www.weixz.com",
		Channel:          "/zxzx/list_[PAGE].html",
		Limit:            5,
		PageStart:        1,
		ListSelector:     "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.information-main-list > ul > li",
		ListHrefSelector: "div.information-main-list-title > a",
		DetailFields: map[string]form.Field{
			"title": {Types: fileTypes.SingleField, Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentTitle > h1"},
			"image": {
				Types:       fileTypes.SingleImage,
				Selector:    "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentText img:nth-child(1)",
				ImagePrefix: "/image",
				ImageDir:    "[random:1-100]", // subfolder named with a random number
			},
		},
	}

	spider.Start(f)
}
```

The result is:

![](https://img.kancloud.cn/4b/14/4b14591c5122179ba9afe49ae4095597_940x634.png)

Now suppose I want the folder to be named after the article title instead, e.g. /image/珍珑对弈,零元购福利《天龙3D》新资料片今日上线!/b260e4e5fc324ea1bec80afefd9c05d8.jpg. Change the code to:

```
package main

import (
	"github.com/PeterYangs/article-spider/fileTypes"
	"github.com/PeterYangs/article-spider/form"
	"github.com/PeterYangs/article-spider/spider"
)

func main() {
	f := form.Form{
		Host:             "https://www.weixz.com",
		Channel:          "/zxzx/list_[PAGE].html",
		Limit:            5,
		PageStart:        1,
		ListSelector:     "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.information-main-list > ul > li",
		ListHrefSelector: "div.information-main-list-title > a",
		DetailFields: map[string]form.Field{
			"title": {Types: fileTypes.SingleField, Selector: "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentTitle > h1"},
			"image": {
				Types:       fileTypes.SingleImage,
				Selector:    "body > div > div.information-main.mt-20px.wd1200.displayFlex > div.information-main-left > div.informationContents > div.informationContentText img:nth-child(1)",
				ImagePrefix: "/image",
				ImageDir:    "[singleField:title]", // subfolder named after the "title" field
			},
		},
	}

	spider.Start(f)
}
```

The result is:

![](https://img.kancloud.cn/a2/c9/a2c96485f7fc2e29bcc2db0b2f08a2d0_1525x679.png)

In **[singleField:title]**, `title` is the name of one of the fields declared with the SingleField type (here, the `"title"` entry in DetailFields).