A question about batch downloading

I've just started learning Python and ran into a question about batch downloading. A local file a.txt contains the following:

http://www.aaa.com/1.txt
http://www.aaa.com/2.txt
http://www.aaa.com/3.txt
http://www.aaa.com/4.txt
http://www.aaa.com/5.txt
...
http://www.aaa.com/999.txt
http://www.aaa.com/1000.txt

I can already do the batch download with my own code, but with a single thread and a single process it is very slow (the files are small, so most of the time is spent waiting on I/O). If I want to speed up the downloads rather than fetch the files one by one in a queue, what approach would work best?

Jason990420
Best answer

Something like this

import time
import threading
import requests

def download(url, index):
    response = requests.get(url)
    # print(f'{index:0>2d}: {url} downloaded.')

urls = ['https://learnku.com/' for i in range(100)]  # 100 copies of the same test URL

now = time.time()

threads = []
for i, url in enumerate(urls):
    thread = threading.Thread(target=download, args=(url, i), daemon=True)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

seconds = time.time() - now

print(f'All URLs downloaded in {seconds:.2f} seconds')

Note:

  • The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting"). A Retry-After header might be included in this response indicating how long to wait before making a new request.
  • A failed download should be retried; a minimal retry sketch follows below.
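
A minimal sketch of that retry idea, assuming requests is available; the function name download_with_retry, the max_retries count, the timeout, and the backoff values are illustrative choices, not part of the answer above:

import time
import requests

def download_with_retry(url, max_retries=3):
    """Fetch url, retrying on failure and honoring Retry-After on HTTP 429."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            time.sleep(2 ** attempt)  # simple exponential backoff on network errors
            continue
        if response.status_code == 429:
            # Wait as long as the server asks (assumes Retry-After is given in seconds)
            wait = int(response.headers.get('Retry-After', 2 ** attempt))
            time.sleep(wait)
            continue
        if response.ok:
            return response.content
    return None  # give up after max_retries attempts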
1 year ago
Replies: 4
  • Put the URLs into a queue and consume them with multiple processes, each process still downloading single-threaded;
  • Download with multiple threads;
  • If you are using curl, you can issue parallel requests with libcurl multi, which amounts to much the same thing as multithreading;

...the general idea is roughly along those lines (a thread-pool sketch of it follows below)
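
A minimal sketch of the thread-pool variant of those ideas, assuming the URLs sit in a.txt as in the question and that each file is saved under the last part of its URL; the worker count of 20 is an arbitrary choice:

from concurrent.futures import ThreadPoolExecutor
import requests

def download(url):
    # Name the local file after the last path component, e.g. .../1.txt -> 1.txt
    filename = url.rsplit('/', 1)[-1]
    response = requests.get(url, timeout=10)
    with open(filename, 'wb') as f:
        f.write(response.content)

with open('a.txt', encoding='utf-8') as f:
    urls = [line.strip() for line in f if line.strip()]

# A bounded pool of worker threads downloads the URLs concurrently
with ThreadPoolExecutor(max_workers=20) as pool:
    pool.map(download, urls)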

1 year ago

I still don't quite follow. Could you turn that into code for me? Many thanks!

1 year ago

You can use coroutines; they are a bit more lightweight.

Using coroutines means switching the synchronous operations to asynchronous ones, using aiohttp and aiofiles:

import aiohttp
import asyncio
import aiofiles


async def download(url, index):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            content = await resp.text()
            # do something with the response, e.g. save each URL to its own file
            # (writing everything to one shared file would just overwrite it)
            async with aiofiles.open(f'{index}.txt', mode='w', encoding='utf-8') as f:
                await f.write(content)


async def main():
    urls = ['url1', 'url2', 'url3']
    tasks = []
    for index, url in enumerate(urls):
        tasks.append(asyncio.create_task(download(url, index)))
    await asyncio.wait(tasks)

if __name__ == '__main__':
    asyncio.run(main())
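
Given the rate-limiting caveat in the best answer, firing every request at once may trigger HTTP 429; one way to cap the coroutine version is asyncio.Semaphore. A minimal sketch of that idea, with an arbitrary limit of 20 and a placeholder URL list:

import asyncio
import aiohttp


async def bounded_download(session, semaphore, url):
    # The semaphore caps how many requests are in flight at the same time
    async with semaphore:
        async with session.get(url) as resp:
            return await resp.text()


async def main(urls, limit=20):
    semaphore = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        tasks = [bounded_download(session, semaphore, url) for url in urls]
        return await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main(['https://learnku.com/'] * 5))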
1 year ago
