[Help] Pixiv scraper: when fetching the ajax URL that lists all of an artist's works with requests, the returned data is incomplete
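I'm trying to scrape all the works of one artist on Pixiv. On the profile page I captured an ajax request. When I open its URL in the browser, it returns the complete data. When I request the same URL with requests, I only get part of the data.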
Here is the code:
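import requests

# Local proxy; the scheme is spelled out so requests parses the address unambiguously
http_address = 'http://127.0.0.1:1088'
proxies = {
    "http": http_address,
    "https": http_address
}
# Headers copied from the browser
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "referer": "https://www.pixiv.net/users/432332/illustrations"
}
url = "https://www.pixiv.net/ajax/user/432332/profile/all?lang=zh"
# Send both the headers and the proxy settings with the request
resp = requests.get(url=url, headers=headers, proxies=proxies)
print(resp.text)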
The data I get back with requests: {"error":false,"message":"","body":{"illusts":{"56725501":null,"44981873":null},...
The data the browser gets: {"error":false,"message":"","body":{"illusts":{"71282994":null,"67511354":null,"64667165":null,"62930956":null,"56725501":null,"54229008":null,"51870535":null,"44981873":null},...
Some of the work ids are missing.
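To see exactly which ids are missing, the two responses can be compared as sets of keys. This is a minimal sketch: the JSON strings below are trimmed to the `illusts` part of the two responses above (the rest of each response was elided with `...`), and the helper name `illust_ids` is hypothetical.

```python
import json

# Trimmed copies of the two responses above (only "illusts" is kept)
requests_body = '{"error":false,"message":"","body":{"illusts":{"56725501":null,"44981873":null}}}'
browser_body = ('{"error":false,"message":"","body":{"illusts":{'
                '"71282994":null,"67511354":null,"64667165":null,"62930956":null,'
                '"56725501":null,"54229008":null,"51870535":null,"44981873":null}}}')


def illust_ids(raw):
    """Hypothetical helper: the work ids are the keys of body.illusts."""
    return set(json.loads(raw)["body"]["illusts"])


# Ids the browser sees but requests does not, newest (largest id) first
missing = sorted(illust_ids(browser_body) - illust_ids(requests_body), key=int, reverse=True)
print(missing)
# ['71282994', '67511354', '64667165', '62930956', '54229008', '51870535']
```

Opening a few of the missing ids in the browser may show what they have in common, for example whether they are only visible when logged in.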
Does anyone know a good way to handle this?
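One cause worth checking, which is an assumption and not something the post confirms: the browser is logged in and sends its login cookie, while the script sends none, and Pixiv may omit some works (for example age-restricted ones) for anonymous requests. Below is a minimal sketch that reuses a browser session cookie. The cookie name `PHPSESSID`, the helper names `make_session` and `fetch_illust_ids`, and the placeholder values are assumptions for illustration.

```python
from typing import List

import requests

# Hypothetical placeholders: use your own proxy, and the PHPSESSID value
# copied from a logged-in browser (assumption: the missing works are only
# listed for logged-in users).
PROXY = "http://127.0.0.1:1088"
PHPSESSID = "your-phpsessid-here"


def make_session(phpsessid: str, proxy: str) -> requests.Session:
    """Build a session that sends the browser's headers plus its login cookie."""
    session = requests.Session()
    session.headers.update({
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
        "referer": "https://www.pixiv.net/users/432332/illustrations",
    })
    # Attach the (assumed) login cookie for all pixiv.net hosts
    session.cookies.set("PHPSESSID", phpsessid, domain=".pixiv.net")
    session.proxies = {"http": proxy, "https": proxy}
    return session


def fetch_illust_ids(session: requests.Session, user_id: int) -> List[str]:
    """Return the work ids listed by the profile/all endpoint."""
    url = f"https://www.pixiv.net/ajax/user/{user_id}/profile/all"
    resp = session.get(url, params={"lang": "zh"}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    if data["error"]:
        raise RuntimeError(data["message"])
    # The ids are the keys of body.illusts (the values are null)
    return list(data["body"]["illusts"])


if __name__ == "__main__":
    s = make_session(PHPSESSID, PROXY)
    ids = fetch_illust_ids(s, 432332)
    print(len(ids), ids)
```

Using a `Session` keeps the headers, cookie, and proxy in one place, so every later request for the individual works sends the same identity as the first one.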