How to Extract Information with BeautifulSoup (Scraping the Kugou Chart with BeautifulSoup's select)

The goal is to use Python to scrape the Kugou chart page and extract each entry's rank, song title, singer, and duration.

Key snippets:

1. from bs4 import BeautifulSoup

2. soup = BeautifulSoup(wb_data.text, 'lxml')

3. soup.select()

4. title.get_text()
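The four snippets fit together roughly as follows. This is a minimal sketch on an inline HTML string, so it runs without network access. The markup and sample titles are made up for illustration, and the built-in "html.parser" is used here in place of 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (wb_data.text in the article).
html = """
<div class="pc_temp_songlist">
  <ul>
    <li><a>Song A - Singer A</a></li>
    <li><a>Song B - Singer B</a></li>
  </ul>
</div>
"""

# Snippet 2: parse the markup. "html.parser" ships with Python; the article uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")

# Snippet 3: select() takes a CSS selector and returns a list of matching tags.
titles = soup.select("div.pc_temp_songlist > ul > li > a")

# Snippet 4: get_text() returns each tag's text as a str.
texts = [title.get_text() for title in titles]
print(texts)
```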

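Judging from how the page behaves, it is a static page: a request that carries headers is enough to get the page data back. Fetching is therefore not a problem; the returned data only needs to be filtered and the relevant parts extracted.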
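A sketch of the fetching step, assuming the requests library. The helper name fetch_html and the timeout value are hypothetical choices, not part of the original code; the user-agent string is the one used in the article.

```python
import requests

# Browser-like header sent with the request, as in the article.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page's HTML text, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Fail early on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling fetch_html with the chart URL should return the HTML that is then handed to BeautifulSoup. Whether the site still serves the chart as static HTML is something to check against the live page.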


The copied selector:

#rankWrap > div.pc_temp_songlist.pc_rank_songlist_short > ul > li:nth-child(1) > a

This selector matches only one a tag. To get all of the a tags, change it to the following form:

titles=soup.select("div.pc_temp_songlist > ul > li > a")
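The difference between the two selectors can be checked on a small sample. The HTML below is a hypothetical stand-in that only mimics the structure implied by the copied selector, and "html.parser" is used instead of 'lxml' to keep the sketch dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure implied by the copied selector.
html = """
<div id="rankWrap">
  <div class="pc_temp_songlist pc_rank_songlist_short">
    <ul>
      <li><a>First</a></li>
      <li><a>Second</a></li>
      <li><a>Third</a></li>
    </ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The copied selector pins the match to the first <li> via :nth-child(1).
copied = "#rankWrap > div.pc_temp_songlist.pc_rank_songlist_short > ul > li:nth-child(1) > a"
only_first = soup.select(copied)

# Dropping :nth-child(1) (and the extra qualifiers) matches the link in every <li>.
all_links = soup.select("div.pc_temp_songlist > ul > li > a")

print(len(only_first), len(all_links))
```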

title.get_text() returns the text content of title as a string (str).
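Because get_text() returns a plain str, the usual string methods apply to it. The sketch below splits a title on its first hyphen with str.partition, which is a swapped-in alternative to the split("-")[1] indexing used in the full code: it does not raise IndexError when a title has no hyphen. The helper name split_title and the sample titles are hypothetical.

```python
from bs4 import BeautifulSoup


def split_title(text):
    """Hypothetical helper: split 'left - right' on the first hyphen.

    The right part is an empty string when the text contains no hyphen.
    """
    left, _, right = text.partition("-")
    return left.strip(), right.strip()


# Made-up sample tags: one padded with whitespace, one without a hyphen.
soup = BeautifulSoup("<a>  Song A - Singer A \n</a><a>No hyphen here</a>", "html.parser")

for tag in soup.select("a"):
    text = tag.get_text()  # a str, so partition() and strip() are available
    print(split_title(text))
```

Note that partition keeps everything after the first hyphen together, so a title containing several hyphens is handled differently from split("-")[1], which returns only the second segment.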

from bs4 import BeautifulSoup
import requests

# Browser-like header sent with every request
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
}


def get_info(url):
    # Download the page and parse it with the lxml parser
    wb_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(wb_data.text, 'lxml')

    # Each select() call returns a list of tags, one per chart entry
    ranks = soup.select("span.pc_temp_num")
    titles = soup.select("div.pc_temp_songlist > ul > li > a")
    times = soup.select("div.pc_temp_songlist > ul > li > span.pc_temp_tips_r > span")

    # zip pairs the three lists position by position
    for rank, title, duration in zip(ranks, titles, times):
        # The title text is split on "-"; this raises IndexError if a title has no hyphen
        data = {
            "rank": rank.get_text().strip(),
            "singer": title.get_text().split("-")[1].strip(),
            "song": title.get_text().split("-")[0].strip(),
            "time": duration.get_text().strip(),
        }
        print(data)


if __name__ == '__main__':
    url = "https://www.kugou.com/yy/rank/home/1-6666.html?from=rank"
    get_info(url)

That is the complete code.
