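I have a list of links in a variable called urls that I extracted with a regular expression. I now want to run a second regular expression on the pages those links point to, and I am stuck on that second step.
I am trying to scrape wallpapers from wallhaven. The site is laid out as one large listing page of thumbnails, and clicking a thumbnail opens a page with the single full-size image.
I want the sharper full-size images. My regular expression gets the thumbnail links successfully, but when I try to go one step further and fetch the full-size image from those links, I get this error: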
requests.exceptions.MissingSchema: Invalid URL 'urls': No schema supplied. Perhaps you meant http://urls?
I searched for a solution and could not find one. Any pointers would be appreciated.
import requests
import re

# Browser-like User-Agent so the site serves the normal page
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36 Edg/81.0.416.64"
}
# Fetch the listing page and collect the detail-page links behind the thumbnails
response = requests.get('https://wallhaven.cc/search?q=id:5&ref=fp', headers=header)
html = response.text
urls = re.findall('<a class="preview" href="(.*?)" target="_blank" >', html)
print(urls)
# urls is a list, so request each link in turn; pass the variable, not the string 'urls'
for url in urls:
    response1 = requests.get(url, headers=header)
    detail_html = response1.text
    # Match only id and src; the wallpaper id, width and height differ on every page
    urls1 = re.findall('<img id="wallpaper" src="(.*?)"', detail_html)
    print(urls1)
# Error raised by the original call requests.get('urls', headers=header):
# requests.exceptions.MissingSchema: Invalid URL 'urls': No schema supplied. Perhaps you meant http://urls?
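The two extraction steps can also be checked without touching the network. The sketch below is only an illustration: the helper names extract_detail_links and extract_full_image are hypothetical, the sample markup is made up to resemble the tags quoted in the question, and the patterns are loosened versions of the question's patterns, so they may need adjusting against the real pages.

```python
import re

# Loosened patterns: match only the attributes that identify the tag,
# not values such as the wallpaper id or dimensions that change per page.
DETAIL_LINK = re.compile(r'<a\s+class="preview"\s+href="(.*?)"')
FULL_IMAGE = re.compile(r'<img\s+id="wallpaper"\s+src="(.*?)"')


def extract_detail_links(listing_html):
    """Return every detail-page URL found in the listing-page HTML."""
    return DETAIL_LINK.findall(listing_html)


def extract_full_image(detail_html):
    """Return the full-size image URL from one detail page, or None."""
    match = FULL_IMAGE.search(detail_html)
    return match.group(1) if match else None


# Made-up sample markup standing in for the real pages (an assumption).
sample_listing = (
    '<a class="preview" href="https://wallhaven.cc/w/aaaaaa" target="_blank" ></a>'
    '<a class="preview" href="https://wallhaven.cc/w/bbbbbb" target="_blank" ></a>'
)
sample_detail = (
    '<img id="wallpaper" src="https://example.com/full/aaaaaa.jpg" alt="x" '
    'data-wallpaper-id="aaaaaa" data-wallpaper-width="1920" />'
)

links = extract_detail_links(sample_listing)
print(links)
print(extract_full_image(sample_detail))
```

Because re.findall returns a list, each element of links is what would be passed to requests.get, one at a time. Returning None when no image tag is found lets the caller skip a page instead of crashing on it.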
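Complete scraping code, for reference only:
Note: run
pip install lxml
to install lxml, which provides XPath support (XPath is very useful for scraping; scraping with regular expressions is not recommended).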
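As a rough idea of what the XPath approach looks like, here is a minimal sketch using lxml on made-up markup. It is not the full scraper: the function names xpath_detail_links and xpath_full_image are hypothetical, and the sample HTML only imitates the tags quoted in the question, so the expressions are assumptions to verify against the real pages.

```python
from lxml import html as lxml_html


def xpath_detail_links(listing_html):
    """Return the href of every <a class="preview"> in the listing page."""
    tree = lxml_html.fromstring(listing_html)
    return tree.xpath('//a[@class="preview"]/@href')


def xpath_full_image(detail_html):
    """Return the src of <img id="wallpaper">, or None if it is absent."""
    tree = lxml_html.fromstring(detail_html)
    found = tree.xpath('//img[@id="wallpaper"]/@src')
    return found[0] if found else None


# Made-up sample markup standing in for the real pages (an assumption).
sample_listing_page = (
    '<html><body>'
    '<a class="preview" href="https://wallhaven.cc/w/aaaaaa" target="_blank"></a>'
    '<a class="preview" href="https://wallhaven.cc/w/bbbbbb" target="_blank"></a>'
    '</body></html>'
)
sample_detail_page = (
    '<html><body>'
    '<img id="wallpaper" src="https://example.com/full/aaaaaa.jpg" alt="x" '
    'data-wallpaper-id="aaaaaa">'
    '</body></html>'
)

xpath_links = xpath_detail_links(sample_listing_page)
print(xpath_links)
print(xpath_full_image(sample_detail_page))
```

Unlike the regular expressions, these expressions do not depend on attribute order or spacing inside the tag, which is the main reason XPath tends to survive small changes in the page markup.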