Scraper select problem: not getting the expected result

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Goal: scrape every chapter title and chapter body of the novel at fanqienovel.com/page/6844802947079...
if __name__ == "__main__":
    # Fetch the page data of the index page
    headers={
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55"
    }
    url="https://fanqienovel.com/page/6844802947079998468?enter_from=Rank"
    page_text=requests.get(url=url,headers=headers).text

    # Parse the chapter titles and the detail-page URLs out of the index page
    # 1. Instantiate a BeautifulSoup object and load the page source into it
    soup=BeautifulSoup(page_text,"lxml")
    # Parse the chapter titles and the detail-page URLs
    # The problem is very likely on the next line. I feel I simply don't know how to
    # work out these nesting levels; I hope a teacher can help explain. Thank you.
    li_list=soup.select(".div.volume volume_first > div.chapter > div")
    fp=open("./fanqie.text","w",encoding="utf-8")
    for div in li_list:
        title=div.a.string
        detail_url="https://fanqienovel.com/"+div.a["href"]
        # Request the detail page and parse out the chapter body
        detal_page_text=requests.get(url=detail_url,headers=headers).text
        # Parse the chapter body out of the detail page
        detal_soup=BeautifulSoup(detal_page_text,"lxml")
        div_tag=detal_soup.find("div",class_="muye-reader-content noselect")
        # The chapter body has been parsed
        content=div_tag.text
        fp.write(title+":"+content+"\n")
        print(title,"爬取成功！！！！")  # message means "scraped successfully"

Could everyone take a look at what is wrong with this program? Thanks!

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Goal: scrape every chapter title and chapter body of the novel at fanqienovel.com/page/6844802947079...
# Fetch the page data of the index page
headers={
    "User-Agent":("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36 Edg/99.0.1150.55")
}
url="https://fanqienovel.com/page/6844802947079998468?enter_from=Rank"
page_text=requests.get(url=url,headers=headers).text

# Parse the chapter titles and the detail-page URLs out of the index page
# 1. Instantiate a BeautifulSoup object and load the page source into it
soup=BeautifulSoup(page_text,"lxml")
# Parse the chapter titles and the detail-page URLs
li_list=soup.select(
    "div.volume.volume_first + div.chapter > div.chapter-item")    # changed
fp=open("./fanqie.txt","w",encoding="utf-8")
total = len(li_list)
for i, div in enumerate(li_list):
    title=div.a.string
    detail_url="https://fanqienovel.com"+div.a["href"]             # changed
    # Request the detail page and parse out the chapter body
    response = requests.get(url=detail_url,headers=headers)        # changed
    if response.status_code != 200:                                # changed
        print(title, '爬取失败！！！！')  # message means "scrape failed"
        continue
    detal_page_text=response.text                                  # changed
    # Parse the chapter body out of the detail page
    detal_soup=BeautifulSoup(detal_page_text,"lxml")
    div_tag=detal_soup.find("div",class_="muye-reader-content noselect")
    # The chapter body has been parsed
    content=div_tag.text
    fp.write(title+":"+content+"\n")
    print(title,"爬取成功！！！！")  # message means "scraped successfully"
fp.close()                                                         # changed
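One of the changed lines builds detail_url without the trailing slash on the base, presumably because each href already begins with "/". A minimal sketch of the difference, assuming an href of the form "/reader/123" (a made-up path for illustration, not taken from the site), and using urllib.parse.urljoin from the standard library as one possible alternative to string concatenation:

```python
from urllib.parse import urljoin

BASE_URL = "https://fanqienovel.com"

# Hypothetical href value, for illustration only; the real site may differ.
href = "/reader/123"

# Concatenating with an extra "/" (as in the original question) doubles the slash.
concatenated = BASE_URL + "/" + href

# urljoin resolves the href against the base the way a browser would,
# whether or not the base ends with a slash.
joined = urljoin(BASE_URL + "/", href)

print(concatenated)
print(joined)
```

Whether the doubled slash actually breaks the request depends on the server, so this is a tidiness fix rather than a confirmed cause of the empty result.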

d:\>python test3.py
第一章困龙出狱，神医归来爬取成功！！！！
第二章出手救美爬取成功！！！！
...
第1576章隐岛爬取成功！！！！
第1577章广成传人爬取成功！！！！
第1578章寿与天齐爬取成功！！！！

d:\>pip install lxml
Collecting lxml
  Downloading lxml-4.8.0-cp39-cp39-win_amd64.whl (3.6 MB)
     ---------------------------------------- 3.6/3.6 MB 3.1 MB/s eta 0:00:00
Installing collected packages: lxml
Successfully installed lxml-4.8.0

C:\Users\Jason\PycharmProjects\pythonProject\venv\Scripts\python.exe C:/Users/Jason/PycharmProjects/pythonProject/main.py
Traceback (most recent call last):
  File "C:\Users\Jason\PycharmProjects\pythonProject\main.py", line 4, in <module>
    soup=BeautifulSoup(text,"lxml")
  File "C:\Users\Jason\PycharmProjects\pythonProject\venv\lib\site-packages\bs4\__init__.py", line 245, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Process finished with exit code 1

Jason990420


Best answer

Basically, "Process finished with exit code 0" just means the code ran and ended normally.
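Exit code 0 only says the interpreter finished without an unhandled exception; it does not say the loop did any work. The sketch below uses only the standard library and runs two throwaway child scripts (both invented here for illustration) to show that a loop over an empty list still exits with 0 and prints nothing, while an unhandled exception gives a non-zero code:

```python
import subprocess
import sys

# A script whose loop iterates over an empty list: nothing is printed,
# yet the process still ends "normally".
quiet = subprocess.run(
    [sys.executable, "-c", "for item in []:\n    print(item)"],
    capture_output=True, text=True,
)

# A script that raises an unhandled exception: Python exits with a non-zero code.
failing = subprocess.run(
    [sys.executable, "-c", "raise ValueError('boom')"],
    capture_output=True, text=True,
)

print("empty loop:", quiet.returncode, repr(quiet.stdout))
print("exception: ", failing.returncode)
```

So a silent run that ends with exit code 0 is consistent with soup.select returning an empty list, which is worth checking with print(len(li_list)) before the loop.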

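It doesn't work, it raises an error

That statement doesn't help. At the very least, say what error it reports!

The last part can't be item

It can't? Why not? What error did you get?!

Updated code

Result of running the code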


Commented 3 years ago

RK346 (original poster)

OK, I'll go and try it.

Traceback (most recent call last):
  File "E:\007X\网站爬虫.py", line 19, in <module>
    soup=BeautifulSoup(page_text,"lxml")
  File "E:\007X\venv\lib\site-packages\bs4\__init__.py", line 245, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Jason990420 (author)

pip install lxml

It still doesn't work, sob

Number of replies: 27

How about this?

li_list=soup.select("div.volume.volume_first + div.chapter > div.chapter-item")
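To see why the two selectors behave differently, here is a minimal sketch on a hand-written HTML fragment. The fragment is an assumption about the page layout (a div carrying both the volume and volume_first classes, followed by a sibling div.chapter that holds div.chapter-item entries), inferred only from the selector above; the real page may differ. It uses Python's built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real site.
html = """
<div class="volume volume_first">Volume 1</div>
<div class="chapter">
  <div class="chapter-item"><a href="/reader/1">Chapter 1</a></div>
  <div class="chapter-item"><a href="/reader/2">Chapter 2</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Selector from the question: ".div" asks for class="div", and the space before
# "volume_first" turns it into a descendant *tag name*, so nothing matches.
broken = soup.select(".div.volume volume_first > div.chapter > div")

# Suggested selector: both classes on one div, "+" takes the next sibling
# div.chapter, ">" takes its direct child div.chapter-item elements.
fixed = soup.select("div.volume.volume_first + div.chapter > div.chapter-item")

print(len(broken))
print([(d.a.string, d.a["href"]) for d in fixed])
```

The rule of thumb: a tag name comes first with no dot, each class gets its own dot with no spaces between them, and a space or ">" always moves to a different element.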

It always shows "Process finished with exit code 0". I don't know why.

RK346


RK346 (author) (original poster)

This message appeared

@RK346 pip install lxml

@Jason990420 It says it is invalid syntax

@Jason990420 There is nothing wrong with my installation; I did install requests and bs4


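It is not an argument of BeautifulSoup; you need to install the library

This library is installed, isn't it?

I installed it inside PyCharm

I didn't install it through a mirror site. Do I also need to install it from a mirror site?

How do I download this library? I'm new to this. Sorry to trouble you.

@RK346 bs4.Beautiful cannot be found ... Something is wrong with your working environment. Is it PyCharm or a virtual environment???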
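A common cause of this kind of symptom is that pip installed the package into one Python installation while PyCharm runs the script with another, such as a project venv. The sketch below uses only the standard library; run it from inside the PyCharm project to see which interpreter is active and whether it can locate the packages mentioned in this thread. The helper name is_installed is invented here for illustration.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is actually executing this script.
print("interpreter:", sys.executable)

# Packages mentioned in this thread.
for name in ("requests", "bs4", "lxml"):
    print(name, "found" if is_installed(name) else "MISSING")

# Installing with this exact interpreter avoids a mismatch between pip and PyCharm.
print("suggested command:", f'"{sys.executable}" -m pip install lxml')
```

If lxml shows as MISSING here but pip reports it as installed, the two are most likely pointing at different Python installations.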

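@Jason990420 I'm using PyCharm

@Jason990420 My working environment does have some problems; I had a feeling about it back when I was learning. So how about I delete everything related to Python and then reinstall it? Would that work?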


???

@Jason990420 I don't know how to install the lxml library

A simple piece of code

from bs4 import BeautifulSoup

text = "<head> <div> </div> </head>"
soup=BeautifulSoup(text,"lxml")

soup.select("div")

Result of running with bs4 installed but lxml not installed

Steps to install lxml

Final result of running

C:\Users\Jason\PycharmProjects\pythonProject\venv\Scripts\python.exe C:/Users/Jason/PycharmProjects/pythonProject/main.py

Process finished with exit code 0
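If installing lxml is not possible right away, one hedged workaround is to fall back to Python's built-in html.parser when bs4 raises FeatureNotFound. This is only a sketch and not what the answer above recommends; make_soup is a helper name invented here, and the two parsers can build slightly different trees for malformed HTML.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup: str) -> BeautifulSoup:
    """Parse with lxml when available, otherwise with the built-in html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed for this interpreter; use the standard-library parser.
        return BeautifulSoup(markup, "html.parser")


text = "<head> <div> </div> </head>"
soup = make_soup(text)
print(soup.select("div"))
```

Installing lxml with pip remains the direct fix for the traceback shown in this thread.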
