Hands-on: Writing a Multithreaded Crawler with Guzzle in Laravel

Overview

Guzzle is a powerful HTTP client library for PHP.

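This article focuses on using Guzzle to send many requests concurrently. Strictly speaking this is not multithreading: Guzzle runs all of the requests inside a single PHP process using non-blocking I/O (with the default cURL handler, cURL's multi interface) and drives them to completion when you call wait().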

References

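Creating the command

1. Create the command from the command line

php artisan make:console MultithreadingRequest --command=test:multithreading-request

(Laravel 5.3 and later renamed make:console to make:command, so on those versions run php artisan make:command MultithreadingRequest --command=test:multithreading-request instead.)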

2. Register the command

Edit app/Console/Kernel.php and add the following line to the $commands array:

Commands\MultithreadingRequest::class,

3. Test the command

Edit app/Console/Commands/MultithreadingRequest.php and add the following to the handle method:

$this->info('hello');

Output:

$ php artisan test:multithreading-request
hello

4. Install Guzzle

composer require guzzlehttp/guzzle "6.2"

The code

A piece of runnable code is worth a thousand words.

Below is the content of app/Console/Commands/MultithreadingRequest.php:

<?php namespace App\Console\Commands;

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use Illuminate\Console\Command;

class MultithreadingRequest extends Command
{
    private $totalPageCount;
    private $counter        = 1;
    private $concurrency    = 7;  // maximum number of requests in flight at once

    private $users = ['CycloneAxe', 'appleboy', 'Aufree', 'lifesign',
                        'overtrue', 'zhengjinghua', 'NauxLiu'];

    protected $signature = 'test:multithreading-request';
    protected $description = 'Fetch several GitHub user profiles concurrently with a Guzzle Pool';

    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
        $this->totalPageCount = count($this->users);

        $client = new Client();

        // A generator that lazily yields one closure per user. The Pool only calls a
        // closure (and so only sends that request) when a concurrency slot is free.
        $requests = function () use ($client) {
            foreach ($this->users as $user) {
                $uri = 'https://api.github.com/users/' . $user;
                yield function () use ($client, $uri) {
                    return $client->getAsync($uri);
                };
            }
        };

        $pool = new Pool($client, $requests(), [
            'concurrency' => $this->concurrency,
            // Called once per successful response; $index is the generator key (0, 1, 2, ...)
            'fulfilled'   => function ($response, $index) {
                $res = json_decode((string) $response->getBody());

                $this->info("Request #$index: GitHub ID of user " . $this->users[$index] . " is " . $res->id);

                $this->countedAndCheckEnded();
            },
            // Called once per failed request (network error, 4xx/5xx response, ...)
            'rejected'    => function ($reason, $index) {
                $this->error("Request #$index rejected");
                $this->error("rejected reason: " . $reason);
                $this->countedAndCheckEnded();
            },
        ]);

        // Start sending the requests and block until every one of them has settled
        $promise = $pool->promise();
        $promise->wait();
    }

    public function countedAndCheckEnded()
    {
        if ($this->counter < $this->totalPageCount) {
            $this->counter++;
            return;
        }
        $this->info("All requests finished!");
    }
}

Output:

$ php artisan test:multithreading-request
Request #5: GitHub ID of user zhengjinghua is 3413430
Request #6: GitHub ID of user NauxLiu is 9570112
Request #0: GitHub ID of user CycloneAxe is 6268176
Request #1: GitHub ID of user appleboy is 21979
Request #2: GitHub ID of user Aufree is 5310542
Request #3: GitHub ID of user lifesign is 2189610
Request #4: GitHub ID of user overtrue is 1472352
All requests finished!

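Note that the requests are all sent at the same time: concurrency is set to 7, so all 7 requests are in flight at once. They simply get their responses back at different moments, which is why the output is not in index order.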
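To make the concurrency idea concrete, here is a small pure-PHP simulation. It is not Guzzle's actual code: a generator lazily yields one closure per user (like $requests in the command), and a hypothetical runPool() function pulls a new closure only while fewer than $concurrency tasks are in flight. Responses are faked with made-up latencies on a virtual clock, so nothing touches the network; makeTasks() and runPool() are illustrative names, not Guzzle APIs.

```php
<?php
// Pure-PHP simulation (no Guzzle, no network) of how a concurrency-limited pool can
// consume a generator of closures. makeTasks/runPool are hypothetical helpers and
// the latencies are made-up numbers.

/** Lazily yields one closure per user; no request "starts" until a closure is called. */
function makeTasks(array $users): Generator
{
    foreach ($users as $index => $user) {
        yield $index => function () use ($user): string {
            return "profile of $user"; // stand-in for $client->getAsync($uri)
        };
    }
}

/**
 * Keeps at most $concurrency tasks in flight. Each task "responds" at its start
 * time plus its latency; the earliest response is handled first, so callbacks can
 * fire out of index order. Returns the completion order and the peak in-flight count.
 */
function runPool(Generator $tasks, int $concurrency, array $latency, callable $fulfilled): array
{
    $concurrency = max(1, $concurrency);
    $now = 0;
    $inFlight = [];   // index => ['result' => string, 'doneAt' => int]
    $order = [];
    $maxInFlight = 0;

    while ($tasks->valid() || $inFlight) {
        // Fill free slots by pulling the next closures from the generator.
        while ($tasks->valid() && count($inFlight) < $concurrency) {
            $index = $tasks->key();
            $startTask = $tasks->current();
            $inFlight[$index] = ['result' => $startTask(), 'doneAt' => $now + $latency[$index]];
            $tasks->next();
        }
        $maxInFlight = max($maxInFlight, count($inFlight));

        // Pick the in-flight task that finishes earliest and hand it to the callback.
        $next = null;
        foreach ($inFlight as $index => $task) {
            if ($next === null || $task['doneAt'] < $inFlight[$next]['doneAt']) {
                $next = $index;
            }
        }
        $now = $inFlight[$next]['doneAt'];
        $fulfilled($inFlight[$next]['result'], $next);
        $order[] = $next;
        unset($inFlight[$next]);
    }

    return ['order' => $order, 'maxInFlight' => $maxInFlight];
}

$users = ['CycloneAxe', 'appleboy', 'Aufree'];
$stats = runPool(makeTasks($users), 2, [30, 10, 5], function (string $result, int $index) {
    echo "#$index: $result\n";
});
// Prints #1, then #2, then #0: responses arrive out of index order.
```

With concurrency 7 and seven users, as in the command above, every closure is pulled up front, so all requests are in flight at once and only the order of the callbacks varies. Lowering concurrency to 1 makes the same loop fully sequential.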

Done.

:beers: :beers: :beers:


Practice makes perfect.

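Discussion: 24 comments
Summer

@lambq Got it, I've noticed it too. It happens occasionally; I'll find time to track down the cause.

3 years ago
Summer

@Cooper :smile: I picked a few familiar names from my timeline, haha

3 years ago

Wow, great stuff, just what I needed.

2 years ago

I don't understand the usage here, mainly the Pool class and yield. Could someone explain?

1 year ago

On Laravel 5.5 you have to use
php artisan make:command MultithreadingRequest --command=test:multithreading-request
instead of
php artisan make:console MultithreadingRequest --command=test:multithreading-request

1 year ago

It would be great to have a complete crawler tutorial series. PHP feels weaker than Python for crawling.

1 year ago
Littlesqx

This isn't multithreading, is it?

9 months ago

"A piece of runnable code is worth a thousand words"... what a treat, bookmarked.

9 months ago

This is coroutines; real multithreading requires installing the pthreads extension.

9 months ago

Very advanced stuff: yield and anonymous functions.

5 months ago
梦之马

How do I get all of the results after every concurrent request has finished? It seems fulfilled runs after each individual successful request.

5 months ago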
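One common approach is to have the fulfilled and rejected callbacks store each outcome in an array keyed by $index, and only read that array after $promise->wait() has returned, since wait() blocks until every request in the pool has settled. In the command class this would be a property (for example a hypothetical $this->results). The sketch below assumes no Guzzle at all: it calls the two callbacks by hand in a made-up order to stand in for the Pool. Guzzle 6 also provides a static Pool::batch() helper that returns all responses and exceptions in request order; check the documentation for your version.

```php
<?php
// Sketch (no Guzzle, no network): gather per-request callback results into arrays
// keyed by request index, then read them once everything has settled.
// The callbacks are invoked by hand below, in a made-up order, to stand in for a Pool.

$users = ['CycloneAxe', 'appleboy', 'Aufree', 'lifesign', 'overtrue', 'zhengjinghua', 'NauxLiu'];
$results = [];  // index => response data
$errors = [];   // index => rejection reason

$fulfilled = function ($data, $index) use (&$results) {
    $results[$index] = $data;
};
$rejected = function ($reason, $index) use (&$errors) {
    $errors[$index] = $reason;
};

// Simulated completion order: whichever reply arrives first fires first.
$fulfilled(['login' => $users[6]], 6);
$rejected('cURL error 60', 1);
$fulfilled(['login' => $users[0]], 0);

// In the real command, this point corresponds to "after $pool->promise()->wait()".
ksort($results);  // restore request order
foreach ($results as $index => $data) {
    echo "$index => {$data['login']}\n";
}
```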

Why does the last step fail for me?
rejected reason: GuzzleHttp\Exception\RequestException: cURL error 60: SSL certificate problem: unable to get local issuer certificate (see http://curl.haxx.se/libcurl/c/libcurl-erro....html) in D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php:187
Stack trace:
#0 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(150): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array)
#1 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(103): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#2 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(179): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#3 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(108): GuzzleHttp\Handler\CurlMultiHandler->processMessages()
#4 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(123): GuzzleHttp\Handler\CurlMultiHandler->tick()
#5 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Handler\CurlMultiHandler->execute(true)
#6 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#7 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()
#8 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()
#9 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#10 D:\laravel_pachong\vendor\guzzlehttp\promises\src\EachPromise.php(101): GuzzleHttp\Promise\Promise->wait()
#11 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Promise\EachPromise->GuzzleHttp\Promise\{closure}(true)
#12 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#13 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#14 D:\laravel_pachong\app\Console\Commands\MultithreadingRequest.php(62): GuzzleHttp\Promise\Promise->wait()
#15 [internal function]: App\Console\Commands\MultithreadingRequest->handle()
#16 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(29): call_user_func_array(Array, Array)
#17 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#18 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#19 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#20 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(183): Illuminate\Container\Container->call(Array)
#21 D:\laravel_pachong\vendor\symfony\console\Command\Command.php(255): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#22 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(170): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#23 D:\laravel_pachong\vendor\symfony\console\Application.php(953): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 D:\laravel_pachong\vendor\symfony\console\Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(App\Console\Commands\MultithreadingRequest), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 D:\laravel_pachong\vendor\symfony\console\Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#26 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Application.php(88): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#27 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Foundation\Console\Kernel.php(121): Illuminate\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#28 D:\laravel_pachong\artisan(37): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#29 {main}

All requests finished!

4 months ago

@vio_xiaohei Disabling certificate verification fixes it:

public function handle()
    {
        $this->totalPageCount = count($this->users);

        $client = new Client([
            'verify' => false
        ]);

        $requests = function ($total) use ($client) {
            foreach ($this->users as $key => $user) {

                $uri = 'https://api.github.com/users/' . $user;
                yield function() use ($client, $uri) {
                    return $client->getAsync($uri);
                };
            }
        };
        ........
3 months ago
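Turning verification off makes the error go away, but the client then no longer checks which server it is talking to. Guzzle's verify request option also accepts a string: the path to a CA bundle file. So another option is to keep verification on and point Guzzle at a bundle such as a downloaded cacert.pem. The path below is a hypothetical placeholder, and this fragment assumes it replaces the new Client(...) line inside handle(), where Composer's autoloader is already loaded.

```php
<?php
// Fragment: keep TLS certificate verification on by telling Guzzle which CA bundle
// to trust, instead of passing 'verify' => false. The path is a hypothetical placeholder.
use GuzzleHttp\Client;

$client = new Client([
    'verify' => 'D:/certs/cacert.pem', // path to a PEM CA bundle file
]);
```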
