Scraping a novel site: the code raises no errors, but the downloaded file is empty. Could someone point out what's wrong?
```python
import requests
from lxml import etree
url = "https://www.doupo321.com/yijianduzun/"  # novel index page (doupo321.com)
resp = requests.get(url)  # fetch the index page with a GET request
resp.encoding = "utf-8"
html = etree.HTML(resp.text)
urs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")  # chapter hrefs
shu_name = html.xpath(
    "/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/h1/text()")[0]  # book title
Y = 0
print(f"{shu_name}: download started, {len(urs)} chapters in total")
for i in urs:
    urls1 = url + i
    resp1 = requests.get(urls1)  # chapter page
    resp1.encoding = "utf-8"
    html1 = etree.HTML(resp1.text)
    内容 = html1.xpath(
        "/html/body/div[1]/div[1]/div[4]//text()")  # chapter text nodes
    neir = ''
    for x in 内容:
        neir = neir + str(x) + "\n"  # join the text nodes, one per line
    with open(shu_name + ".txt", "a", encoding="utf-8") as f:  # append to <book title>.txt
        f.write(neir)
    Y = Y + 1
    print(f"Chapter {Y} downloaded")
    if Y == 10:
        exit()  # stop after 10 chapters while testing
After running the program, the downloaded novel (一剑独尊) is empty. `print(urs)` and `print(shu_name)` both show values, but `print(内容)` prints an empty list, so I strongly suspect the absolute XPath in `内容 = html1.xpath("/html/body/div[1]/div[1]/div[4]//text()")` is wrong. Could someone familiar with absolute XPath expressions check it against the novel page's HTML?
The URL built for each chapter is wrong, not the XPath. The hrefs scraped from the index are likely site-root-relative paths (beginning with `/`), so `url + i` produces a malformed address; each chapter request then returns an error page, and the content XPath matches nothing in it. Revised as follows.
Demo Code
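A minimal sketch of the fix: build each chapter URL with `urllib.parse.urljoin` instead of string concatenation. The example hrefs below (`/yijianduzun/123.html`, `123.html`) are assumptions about the site's link format, used only to illustrate the behavior; the helper name `chapter_url` is likewise hypothetical.

```python
from urllib.parse import urljoin

base = "https://www.doupo321.com/yijianduzun/"

def chapter_url(base, href):
    """Resolve a chapter href against the index page URL.

    Naive concatenation (base + href) breaks when href is root-relative:
    "/yijianduzun/123.html" would become
    "https://www.doupo321.com/yijianduzun//yijianduzun/123.html".
    urljoin resolves the href against the scheme and host instead,
    and also handles plain relative hrefs like "123.html".
    """
    return urljoin(base, href)

# Root-relative href (the likely case on this site):
print(chapter_url(base, "/yijianduzun/123.html"))
# -> https://www.doupo321.com/yijianduzun/123.html

# Plain relative href resolves the same way:
print(chapter_url(base, "123.html"))
# -> https://www.doupo321.com/yijianduzun/123.html
```

In the original loop, replacing `urls1 = url + i` with `urls1 = urljoin(url, i)` should yield valid chapter pages, after which the content XPath can be re-checked against real HTML.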