Python web scraping: a simple Blessing Skin scraper

Coolest's personal blog / Created 5 years ago / Updated 4 years ago

Using requests to scrape Blessing Skin, a well-known Minecraft skin site

About Blessing Skin: the site lives at skin.prinzeugen.net/ and is a skin library much loved by Minecraft players. It hosts even more skins than little skin, the site we scraped previously.


What you need

A Python interpreter
The requests library

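The approach:
Build the skin's page address as skin.prinzeugen.net/skinlib/show/ + skin number, and read the skin's details from the page source. Then build the download address as skin.prinzeugen.net/raw/ + skin number + .png, request it, and save the result locally.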
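The two addresses described above can be sketched as a pair of small helpers. The names BASE, detail_url, and raw_url are made up for this illustration; only the URL patterns come from the post.

```python
BASE = "https://skin.prinzeugen.net"

def detail_url(skin_id):
    """Address of the page that shows the skin's details."""
    return f"{BASE}/skinlib/show/{skin_id}"

def raw_url(skin_id):
    """Direct address of the skin's PNG file."""
    return f"{BASE}/raw/{skin_id}.png"

print(detail_url(123))  # https://skin.prinzeugen.net/skinlib/show/123
print(raw_url(123))     # https://skin.prinzeugen.net/raw/123.png
```

Keeping the two patterns in separate functions makes it harder to mix them up, for example by appending .png to the details page address.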

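How the program is used:
The user enters a skin number, the program prints the skin's details and asks whether to download it. If no skin exists for that number, the user is asked for a number again.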
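The "ask again until the input is usable" step might look like the sketch below. The function name ask_skin_id and its read parameter are assumptions made here so the loop can be tried without typing anything; in the real program read would simply be input.

```python
def ask_skin_id(read=input):
    """Keep asking until the reply consists only of digits."""
    while True:
        reply = read("Enter the skin number to look up: ").strip()
        if reply.isdigit():
            return reply

# Try it with canned replies instead of the keyboard.
replies = iter(["abc", "", "42"])
print(ask_skin_id(lambda prompt: next(replies)))  # prints 42
```

This only checks that the input looks like a number. Whether a skin actually exists for that number can only be known after requesting its page.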

import requests
import re

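First, import re and requests. The re module helps us pick the skin's details out of the page source, while requests is the HTTP library that does the actual fetching in this project.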
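To see what re does for us here, the sketch below runs a few patterns against a small hand-written HTML fragment. The fragment is hypothetical and only shaped after the patterns used in this post; the real page may look different. The helper first_match is also made up for illustration.

```python
import re

# Hypothetical page fragment, not copied from the real site.
sample_html = """
<title>Steve - Blessing Skin</title>
<span id="likes">12</span>
<span id="model">steve</span>
"""

def first_match(pattern, text, default="unknown"):
    """Return the first captured group, or a default when nothing matches."""
    found = re.findall(pattern, text)
    return found[0] if found else default

name = first_match(r'<title>(.*?) - Blessing Skin</title>', sample_html)
likes = first_match(r'<span id="likes">(.*?)</span>', sample_html)
size = first_match(r'<td>(.*?)</td>', sample_html)  # no <td> in the sample
print(name, likes, size)  # Steve 12 unknown
```

Returning a default instead of indexing re.findall(...)[0] directly avoids an IndexError when the page layout changes or a field is missing.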

To get the image link we need the user to enter a skin number. With that, we can build the link!

str_id = input("Enter the skin number to look up: ")
# The image itself lives under /raw/, not under /skinlib/show/
url = 'https://skin.prinzeugen.net/raw/' + str_id + '.png'

Don't forget the .png at the end!
Next, request the link we just built.

image = requests.get(url).content

We now have the image's bytes from the URL, so we can save them to disk using the with keyword. Before that we still need a filename; let's use 1.png.

save_dir = './'  # directory to save into, ending with a slash; change as needed
with open(save_dir + '1.png','wb') as file:
    file.write(image)

Open the save directory: it worked!
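One caveat with the simple version: it writes whatever bytes come back, even an error page. A minimal sketch of a safer save step is shown below. It relies on the fact that every PNG file starts with the same 8-byte signature; the names save_png and PNG_SIGNATURE are made up for this example.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

def save_png(data, directory, filename):
    """Write data to directory/filename only if it looks like a PNG.

    Returns the path written, or None when the data was rejected.
    """
    if not data.startswith(PNG_SIGNATURE):
        return None
    path = os.path.join(directory, filename)
    with open(path, "wb") as file:
        file.write(data)
    return path

# An HTML error page is rejected instead of being saved as 1.png.
demo_dir = tempfile.mkdtemp()
print(save_png(b"<html>not an image</html>", demo_dir, "1.png"))  # None
```

In the scraper this would be called as save_png(image, save_dir, '1.png') right after the request.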
In my spare time I also extended the code a bit; feel free to use it as a reference.

import requests
import re
import os
import time
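print("Tip: press Ctrl+C to quit the program at any time")
def catch():
    try:
        print("Enter 1 to look up and download a single skin; any other input starts batch download mode.")
        word = input(">>>")
        if word == '1':
            while True:
                str_id = input("Enter the skin number to look up: ")
                while not str_id.isdigit():
                    str_id = input("Enter the skin number to look up: ")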
                url = 'https://skin.prinzeugen.net/skinlib/show/' + str_id
                text = requests.get(url).text
                check = re.findall('<p>(.*?)</p>',text)
                # Guard against pages without any <p> tag before indexing
                if check and check[0] in ('Details: The requested texture was already deleted.', 'Details: The requested texture is private and only visible to the uploader and admins.'):
                    print("This skin cannot be accessed! Please enter another skin number!")
                    continue
                skin_name = re.findall('<title>(.*?) - Blessing Skin</title>',text)[0]
                likes = re.findall('<span id="likes">(.*?)</span>',text)[0]
                model = re.findall('<span id="model">(.*?)</span>',text)[0]
                size = re.findall('<td>(.*?)</td>',text)[3]
                print('''\n--Skin details--
Name: %s
Model: %s
File size: %s
Likes: %s\n'''%(skin_name,model,size,likes))
                choose = input("Download this skin? (y/n): ").lower()
                while choose != 'y' and choose != 'n':
                    choose = input("Download this skin? (y/n): ").lower()
                if choose == 'y':
                    # Keep asking until the path contains "/" and is an existing directory
                    check = False
                    while not check:
                        path = input("Save directory (use forward slashes \"/\"): ")
                        while '/' not in path:
                            print("Please use forward slashes \"/\"!")
                            path = input("Save directory: ")
                        check = os.path.isdir(path)
                        if not check:
                            print("That directory does not exist!")
                    skn_url = 'https://skin.prinzeugen.net/raw/' + str_id + '.png'
                    print("Downloading...")
                    response = requests.get(skn_url)
                    img_name = skn_url.split('/')[4]
                    # Only report success when the server actually returned the file
                    if response.status_code == 200:
                        with open(path + '/' + img_name,'wb') as file:
                            file.write(response.content)
                        print("Download succeeded!")
                    else:
                        print("Download failed!")
                # Back to the mode menu
                catch()
        else:
            print("Note: skins that do not exist are skipped during batch download!")
            id1 = input("Enter the first skin number of the batch: ")
            while not id1.isdigit():
                id1 = input("Enter the first skin number of the batch: ")
            id2 = input("Enter the last skin number of the batch: ")
            while not id2.isdigit():
                id2 = input("Enter the last skin number of the batch: ")
            check = False
            while not check:
                path = input("Save directory (use forward slashes \"/\"): ")
                while '/' not in path:
                    print("Please use forward slashes \"/\"!")
                    path = input("Save directory: ")
                check = os.path.exists(path)
                if not check:
                    print("That directory does not exist!")
            id1 = int(id1)
            id2 = int(id2)
            print("Downloading...")
            for i in range(id1,id2+1):
                url = 'https://skin.prinzeugen.net/skinlib/show/' + str(i)
                text = requests.get(url).text
                check = re.findall('<p>(.*?)</p>',text)
                # Skip deleted or private skins; guard against pages without a <p> tag
                if check and check[0] in ('Details: The requested texture was already deleted.', 'Details: The requested texture is private and only visible to the uploader and admins.'):
                    continue
                img_url = 'https://skin.prinzeugen.net/raw/' + str(i) + '.png'
                image = requests.get(img_url).content
                img_name = img_url.split('/')[4]
                with open(path + '/' + img_name,'wb') as file:
                    file.write(image)

            print("Download finished!")
            # Back to the mode menu
            catch()
    except KeyboardInterrupt:
        print("You have quit the program!")
        time.sleep(3.5)
        exit()
    except requests.exceptions.ConnectionError:
        print("Network error!")
        time.sleep(3.5)
        exit()

catch()
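As a possible refinement of the batch mode, the sketch below separates "how to get the bytes" from "loop over a range and save". The names download_range, fetch, and fetch_with_requests are made up for this example, and fetch_with_requests assumes that a non-200 status means the skin is unavailable, which this post does not confirm for the real site.

```python
import os
import time

def download_range(first, last, directory, fetch, pause=0.0):
    """Save every skin from first to last that fetch() can provide.

    fetch(skin_id) is a hypothetical callable returning the PNG bytes,
    or None when the skin should be skipped.
    Returns the list of skin numbers that were saved.
    """
    saved = []
    for skin_id in range(first, last + 1):
        data = fetch(skin_id)
        if data is None:
            continue  # nothing to save for this number
        path = os.path.join(directory, f"{skin_id}.png")
        with open(path, "wb") as file:
            file.write(data)
        saved.append(skin_id)
        time.sleep(pause)  # short pause between downloads
    return saved

def fetch_with_requests(skin_id):
    """One possible fetch(); assumes non-200 means 'not available'."""
    import requests  # imported here so the rest runs without it
    url = f"https://skin.prinzeugen.net/raw/{skin_id}.png"
    response = requests.get(url, timeout=10)
    return response.content if response.status_code == 200 else None
```

A call such as download_range(1, 20, "D:/skins", fetch_with_requests, pause=1.0) would then replace the batch loop. Passing fetch in as a parameter also lets the loop be tried out with a fake function and no network access.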