How do I scrape a novel with Python?

Q&A / 12 / 2 / Created 2 years ago

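Could someone show me how to scrape the full text of this novel? I don't really understand it, and in particular I can't work out what the `for i` loop should do. The novel's URL is www.favzoom.com/wushibuxiu/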
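import time
from urllib.parse import urljoin

import requests
from lxml import etree

# requests needs a full URL including the scheme (https is assumed here)
url = 'https://www.favzoom.com/index/wushibuxiu/'
head = {
    'Referer': 'https://www.favzoom.com/index/wushibuxiu/',
    # The header name is User-Agent, not users-agent
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.39'
}
response = requests.get(url, headers=head, verify=False)

print(response.text)

html = etree.HTML(response.text)

# xpath() returns a list and [0] takes its first item; text() yields the string instead of the element
novel_name = html.xpath('/html/body/div[1]/div/div[2]/div/h1/text()')[0]

print(novel_name)

# Collect the href of every chapter link (assumes the links are <a> tags inside this div)
novel_directory = html.xpath('/html/body/div[2]/div[1]//a/@href')

print(novel_directory)

for i in novel_directory:
    # i is one chapter's href; resolve it against the directory page URL
    com_url = urljoin(url, i)
    # print(com_url)

    response2 = requests.get(com_url, headers=head, verify=False)
    html2 = etree.HTML(response2.text)
    novel_chapter = html2.xpath('//*[@id="ss-reader-main"]/div[2]/h1/text()')[0].strip()
    # print(novel_chapter)

    # //text() gathers every text node under the article element
    novel_content = '\n'.join(html2.xpath('//*[@id="article"]//text()'))
    # print(novel_content)

    # 'w' clears the file's previous contents on every write; 'a' appends without overwriting them
    # The with block closes the file on exit, so no explicit file.close() is needed
    with open(r"D:\浏览器下载\小说" + novel_chapter + ".txt", "w", encoding="utf-8") as file:
        file.write(novel_chapter + '\n' + novel_content + '\n')
    print("Downloaded successfully: " + novel_chapter)

    # Requests sent too quickly tend to fail, so pause between chapters
    time.sleep(5)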
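
For the `for i` part, the loop only has to turn each chapter link found on the directory page into a full URL, one chapter per iteration. Below is a minimal sketch of that step alone, with no network access. The `hrefs` list and the `build_chapter_urls` helper are hypothetical and only for illustration; the real hrefs would come from the directory page, and I'm assuming they are either relative file names or site-absolute paths.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; real hrefs come from the directory page.
base_url = "https://www.favzoom.com/wushibuxiu/"
hrefs = ["143863.html", "/wushibuxiu/143864.html"]


def build_chapter_urls(base_url, hrefs):
    """Resolve each chapter href against the directory page URL."""
    return [urljoin(base_url, href) for href in hrefs]


chapter_urls = build_chapter_urls(base_url, hrefs)
for chapter_url in chapter_urls:
    # In the real script, this is where the chapter page would be requested.
    print(chapter_url)
```

`urljoin` handles both relative and site-absolute hrefs, so the loop body does not need to concatenate strings by hand.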