Scraping a novel website: the code fails with IndexError: list index out of range

import requests
from lxml import etree
# 1. Build the list of URLs to crawl
urls = [
    'https://www.777zw.net/book/5d/37eefc2f6e/{}.html'.format(i) for i in range(1, 148)]
# print(urls)  # output is correct
# 2. Save the novel URLs
#
# 3. Fetch the novel content
def get_text(url):
    r = requests.get(url)
    r.encoding = 'utf-8'
    html = etree.HTML(r.text)
    title = html.xpath(
        "/html/body/div[4]/div/div/div[1]/a[2]/text()")
    text = html.xpath(
        "/html/body/div[4]/div/div/div[2]/h1//text()")  # read the content of chapter 1
    with open(title[0] + ".doc", encoding="utf-8") as f:
        for i in text:
            f.write(i)

if __name__ == '__main__':
    for url in urls:
        get_text(url)

Running it raises the IndexError shown in the title.

Jason990420

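It looks like you have the wrong XPath. When an XPath matches nothing, xpath() returns an empty list, and indexing it with [0] raises the IndexError: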

title = html.xpath("/html/body/div[3]/div[1]/div/div/div[2]/div[1]/h1/text()")
print(repr(title))
[]
>>> title = []
>>> title[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
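
The session above shows why the error appears: the result list is empty, so there is no element 0. A minimal sketch of a guard follows. The helper name `first_or_default` is hypothetical (it is not part of lxml), and the empty list stands in for an `html.xpath(...)` call that matched nothing.

```python
def first_or_default(results, default=None):
    """Return the first item of an XPath result list, or `default` if it is empty."""
    return results[0] if results else default


# An empty list is what xpath() gives back when the path matches nothing.
title = first_or_default([], default="untitled")
print(title)
```

In `get_text`, the same helper could wrap the title lookup, so that a page with a different layout is skipped or given a fallback name instead of stopping the whole loop.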


The XPath for the title is wrong. These two paths do return results on the first page:

import requests
from lxml import etree

url = 'https://www.777zw.net/book/5d/37eefc2f6e/1.html'
r = requests.get(url)
r.encoding = 'utf-8'
html = etree.HTML(r.text)
# title = html.xpath("/html/body/div[3]/div[1]/div/div/div[2]/div[1]/h1/text()")

title1 = html.xpath("/html/body/div[4]/div/div/div[1]/a[2]/text()")
print(title1[0])
title2 = html.xpath("/html/body/div[4]/div/div/div[2]/h1/text()")
print(title2[0])
新覆雨翻云
第1章 楞严:达先天境中上段

The list index is out of range; check your data.
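
One way to "check the data" before using it is sketched below, under stated assumptions: `save_chapter` is a hypothetical function, its two list arguments stand in for the results of the title and text `xpath()` calls, and the `.txt` extension is chosen here for illustration. It also opens the file with mode "w", since `open()` defaults to read mode and the file has to be opened for writing before `write` can succeed.

```python
import os
import tempfile


def save_chapter(title_nodes, text_nodes, out_dir="."):
    """Write text_nodes to '<title>.txt'; return the path, or None if no title was found."""
    if not title_nodes:
        # The title XPath matched nothing: skip instead of raising IndexError.
        return None
    path = os.path.join(out_dir, title_nodes[0].strip() + ".txt")
    # Mode "w" is required here because open() defaults to read mode.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(text_nodes)
    return path


out_dir = tempfile.mkdtemp()
skipped = save_chapter([], ["text"], out_dir)
saved = save_chapter(["Book"], ["Chapter 1\n", "body"], out_dir)
print(skipped, saved)
```

Returning None for a missing title lets the calling loop log the URL and continue with the next page.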
