Scraping a novel site: the code raises no errors, but the downloaded file is empty. Could someone point me in the right direction?

```python
import requests
from lxml import etree
url = "https://www.doupo321.com/yijianduzun/"  # 小说网址 斗破小说网
re = requests.get(url)  # 访问小说网站,发送一个get请求
re.encoding = "utf-8"
html = etree.HTML(re.text)
urs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")
shu_name = html.xpath(
    "/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/h1/text()")[0]
Y = 0
print(f"{shu_name}开始下载,共{len(urs)}章")
for i in urs:
    urls1 = url + i
    re1 = requests.get(urls1)  # re1 is the chapter-page response
    re1.encoding = "utf-8"
    html1 = etree.HTML(re1.text)
    内容 = html1.xpath(
          "/html/body/div[1]/div[1]/div[4]//text()")
    neir = ''
    for x in 内容:
        neir = neir + str(x) + "\n"   # append each text node as a string; "\n" breaks the text into lines
    with open(shu_name + ".txt", "a", encoding="utf-8") as f:  # append the chapter text to "<book name>.txt"
        f.write(neir)
    Y = Y + 1
    print(f"第{Y}章下载完成")
    if Y == 10:   # stop after the first 10 chapters while testing
        exit()
```

After running the program, the downloaded novel 一剑独尊 is empty. `print(urs)` and `print(shu_name)` both produce values, but `print(内容)` is empty, so I strongly suspect the absolute path in `内容 = html1.xpath("/html/body/div[1]/div[1]/div[4]//text()")` is wrong. Could someone who knows absolute XPath take a look at the novel page's HTML and point me in the right direction?
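
Before blaming the XPath, it may help to print the first few chapter URLs the loop actually builds, along with their HTTP status codes. A minimal diagnostic sketch, reusing the names from the code above:

```python
import requests
from lxml import etree

url = "https://www.doupo321.com/yijianduzun/"
re = requests.get(url)
re.encoding = "utf-8"
html = etree.HTML(re.text)
urs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")

for i in urs[:3]:
    urls1 = url + i  # the join used in the original loop
    r = requests.get(urls1)
    # If the hrefs are site-absolute (they start with "/"), this join duplicates
    # the path segment and each request lands on a non-existent page, so the
    # chapter XPath has nothing to match.
    print(i, "->", urls1, r.status_code)
```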

Jason990420
Best Answer

Wrong URL for each chapter; revised as follows.

url_base = "https://www.doupo321.com"
urls1 = url_base + i
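
The hrefs in the chapter list are site-absolute paths (they already begin with `/yijianduzun/`, as the follow-up comment below confirms), so concatenating them onto the full index URL duplicates the path segment. Besides hard-coding `url_base`, the standard library's `urllib.parse.urljoin` handles both absolute and relative hrefs. A sketch with a hypothetical href value:

```python
from urllib.parse import urljoin

url = "https://www.doupo321.com/yijianduzun/"
href = "/yijianduzun/1.html"  # hypothetical chapter href from the list page

print(url + href)          # https://www.doupo321.com/yijianduzun//yijianduzun/1.html (broken)
print(urljoin(url, href))  # https://www.doupo321.com/yijianduzun/1.html
```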


Demo Code

```python
import requests
from lxml import etree


url = "https://www.doupo321.com/yijianduzun/"  # 小说网址 斗破小说网
url_base = "https://www.doupo321.com"

re = requests.get(url)  # send a GET request to the site
re.encoding = "utf-8"
html = etree.HTML(re.text)
urs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")
shu_name = html.xpath("/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/h1/text()")[0]
Y = 0
print(f"{shu_name}开始下载,共{len(urs)}章")


for i in urs:
    urls1 = url_base + i
    re1 = requests.get(urls1)  # re1 is the chapter-page response
    re1.encoding = "utf-8"
    html1 = etree.HTML(re1.text)
    内容 = html1.xpath("/html/body/div[1]/div[1]/div[4]//text()")
    neir = ''
    for x in 内容:
        neir = neir + str(x) + "\n"   # append each text node as a string; "\n" breaks the text into lines
    """
    with open(shu_name + ".txt", "a", encoding="utf-8") as f:  # append the chapter text to "<book name>.txt"
        f.write(neir)
    """
    print(neir)
    Y = Y + 1
    print(f"第{Y}章下载完成")
    if Y == 1:    # stop after the first chapter for this demo
        exit()
```
```text
Downloading 一剑独尊, 2842 chapters in total
上一章
返回目录
下一章

zj_wap2();

笨蛋只需一秒记住斗破小说网,
www.doupo321.com
,如果被/浏览器/转码,阅读体验极差请退出/转码/阅读。

 青城,叶家,祖祠。
...
 大长老冷声道:“这是我们众长老一致的决定。”

Chapter 1 downloaded
```
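
Even with the corrected URLs, notice that the printed chapter still begins with navigation links (上一章 / 返回目录 / 下一章), a stray `zj_wap2();` script call, and the site's anti-transcoding notice: `//text()` collects every text node under the content div, ads included. The follow-up below narrows the XPath to paragraph text only.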
Discussions: 7

I'd suggest not just hitting the URLs raw; novel sites usually have anti-scraping measures. It's better to crawl with a framework like Playwright, which happens to have a Python version.
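
A minimal sketch of that idea, assuming `pip install playwright` followed by `playwright install` has been run; it renders the page in a real browser and then reuses the same lxml XPath workflow from the thread:

```python
from lxml import etree
from playwright.sync_api import sync_playwright

url = "https://www.doupo321.com/yijianduzun/"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(url)             # a real browser run, which sidesteps naive header checks
    rendered = page.content()  # the HTML after any JavaScript has executed
    browser.close()

html = etree.HTML(rendered)
urs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")
print(len(urs))
```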

ZHY2023CXZ (OP) followed up with these changes:
  1. url = "https://www.doupo321.com/yijianduzun"
  2. urls1 = url + i.replace("/yijianduzun", "")
  3. 内容 = html1.xpath('/html/body/div[1]/div/div[4]/p/text()')
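
Putting the two fixes together (the corrected URL join and the paragraph-only XPath), here is an end-to-end sketch, untested against the live site; the XPaths are the ones from this thread, and `index_url`, `chapter`, and `paras` are renamed locals:

```python
import requests
from urllib.parse import urljoin
from lxml import etree

index_url = "https://www.doupo321.com/yijianduzun/"

resp = requests.get(index_url)
resp.encoding = "utf-8"
html = etree.HTML(resp.text)
hrefs = html.xpath("/html/body/div[1]/div[2]/div[1]/div[3]/div[2]/ul//@href")
shu_name = html.xpath("/html/body/div[1]/div[2]/div[1]/div[1]/div[2]/h1/text()")[0]

with open(shu_name + ".txt", "w", encoding="utf-8") as f:
    for n, href in enumerate(hrefs[:10], 1):  # first 10 chapters, as in the original test run
        chapter = requests.get(urljoin(index_url, href))
        chapter.encoding = "utf-8"
        page = etree.HTML(chapter.text)
        # paragraph text only, skipping the nav links and ad strings seen above
        paras = page.xpath('/html/body/div[1]/div/div[4]/p/text()')
        f.write("\n".join(paras) + "\n\n")
        print(f"chapter {n} done")
```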
