[TOC]

## 1. Installing BeautifulSoup

```
pip install beautifulsoup4
```

## 2. Using BeautifulSoup

### 2.1. Basic usage

~~~
import requests
from bs4 import BeautifulSoup

url = "https://hz.5i5j.com/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 '
                  'Safari/537.36'
}

# Fetch the page and parse it with the parser from the Python standard library
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.text, 'html.parser')

# A string passed as the second argument is matched against the CSS class
div_class_item_class = soup.find_all('div', "item")  # every <div class="item">
div_class_item = soup.find('div', "item")            # first <div class="item">

print(type(div_class_item_class))
print(div_class_item)
~~~

### 2.2. Parsers

| Parser | Usage |
| --- | --- |
| Python standard library | `BeautifulSoup(resp.text, 'html.parser')` |
| lxml HTML parser | `BeautifulSoup(resp.text, 'lxml')` |
| lxml XML parser | `BeautifulSoup(resp.text, ["lxml","xml"])` or `BeautifulSoup(resp.text, 'xml')` |
| html5lib | `BeautifulSoup(resp.text, 'html5lib')` |

### 2.3. BeautifulSoup methods

`find_all()` returns a `<class 'bs4.element.ResultSet'>` containing every match.

`find()` returns a `<class 'bs4.element.Tag'>` for the first match, or `None` if nothing matches.

~~~
div_class_item_class = soup.find_all('div', "item")
div_class_item = soup.find('div', "item")
print(type(div_class_item_class))  # <class 'bs4.element.ResultSet'>
print(type(div_class_item))  # <class 'bs4.element.Tag'>
~~~

`select()`
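
The source stops at `select()` without describing it. As a minimal sketch of how it is commonly used: `select()` takes a CSS selector and returns a list-like collection of matching tags, while `select_one()` returns only the first match. The HTML snippet and the variable names below are made up for illustration and are not taken from the page scraped above.

~~~python
from bs4 import BeautifulSoup

# Hypothetical markup, used so the example runs without a network request
html = """
<div class="item"><a href="/a">First</a></div>
<div class="item"><a href="/b">Second</a></div>
<div class="other"><a href="/c">Third</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# All <div> elements whose class is "item"
items = soup.select("div.item")

# <a> elements that are direct children of those divs
links = soup.select("div.item > a")

# Only the first match (None if nothing matches)
first = soup.select_one("div.item a")

hrefs = [a["href"] for a in links]
titles = [a.get_text() for a in links]

print(len(items))
print(hrefs)
print(titles)
print(first.get_text())
~~~

Compared with `find_all('div', "item")`, a CSS selector can express nesting and attribute conditions in a single string, which tends to be shorter for deeply nested pages.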