Hands-on: Writing a Multi-threaded Crawler with Guzzle in Laravel

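Overview

Guzzle is a powerful HTTP client library for PHP.

This article focuses on using Guzzle to send "multi-threaded" requests, that is, to keep many requests in flight at the same time from a single PHP process.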

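Creating the command

1. Generate the command with Artisan (on newer Laravel versions, such as 5.5 as noted in the comments below, use make:command instead of make:console)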

php artisan make:console MultithreadingRequest --command=test:multithreading-request

2. Register the command

Edit app/Console/Kernel.php and add the following to the $commands array:

Commands\MultithreadingRequest::class,

3. Test the command

Edit app/Console/Commands/MultithreadingRequest.php and add the following to the handle method:

$this->info('hello');

Output:

$ php artisan test:multithreading-request
hello

4. Install Guzzle

composer require guzzlehttp/guzzle "6.2"
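
Going back to step 2, here is a rough sketch of what app/Console/Kernel.php might look like after the edit. It assumes the stock Laravel 5.x console kernel; the empty schedule method is only a placeholder and your file may contain other commands as well.

```php
<?php namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    // Artisan commands provided by the application.
    protected $commands = [
        Commands\MultithreadingRequest::class,
    ];

    // Placeholder: this walkthrough does not schedule anything.
    protected function schedule(Schedule $schedule)
    {
        //
    }
}
```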

The code

A piece of runnable code is worth a thousand words.

Below is the content of app/Console/Commands/MultithreadingRequest.php:

<?php namespace App\Console\Commands;

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Exception\ClientException;
use Illuminate\Console\Command;

class MultithreadingRequest extends Command
{
    private $totalPageCount;
    private $counter        = 1;
    private $concurrency    = 7;  // number of requests in flight at the same time

    private $users = ['CycloneAxe', 'appleboy', 'Aufree', 'lifesign',
                        'overtrue', 'zhengjinghua', 'NauxLiu'];

    protected $signature = 'test:multithreading-request';
    protected $description = 'Command description';

    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
        $this->totalPageCount = count($this->users);

        $client = new Client();

        // Generator: yields one closure per user. Each closure starts an
        // asynchronous request when the pool calls it.
        // Note: $total is accepted but never used inside the generator.
        $requests = function ($total) use ($client) {
            foreach ($this->users as $key => $user) {

                $uri = 'https://api.github.com/users/' . $user;
                yield function() use ($client, $uri) {
                    return $client->getAsync($uri);
                };
            }
        };

        $pool = new Pool($client, $requests($this->totalPageCount), [
            'concurrency' => $this->concurrency,
            // Called once for every request that succeeds.
            'fulfilled'   => function ($response, $index){

                $res = json_decode($response->getBody()->getContents());

                $this->info("Request #$index: GitHub ID of user " . $this->users[$index] . " is " . $res->id);

                $this->countedAndCheckEnded();
            },
            // Called once for every request that fails; $reason is the exception.
            'rejected' => function ($reason, $index){
                $this->error("rejected" );
                $this->error("rejected reason: " . $reason );
                $this->countedAndCheckEnded();
            },
        ]);

        // Start sending the requests and block until all of them have settled
        $promise = $pool->promise();
        $promise->wait();
    }

    // Counts settled requests and prints a message after the last one.
    public function countedAndCheckEnded()
    {
        if ($this->counter < $this->totalPageCount){
            $this->counter++;
            return;
        }
        $this->info("All requests finished!");
    }
}

Result:

$ php artisan test:multithreading-request
Request #5: GitHub ID of user zhengjinghua is 3413430
Request #6: GitHub ID of user NauxLiu is 9570112
Request #0: GitHub ID of user CycloneAxe is 6268176
Request #1: GitHub ID of user appleboy is 21979
Request #2: GitHub ID of user Aufree is 5310542
Request #3: GitHub ID of user lifesign is 2189610
Request #4: GitHub ID of user overtrue is 1472352
All requests finished!

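Note that the requests are sent concurrently. Because concurrency is set to 7, all 7 requests are in flight at the same time; they simply complete at different moments, which is why the responses arrive out of order.

The end.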

:beers: :beers: :beers:

Summer
Comments: 17

On 5.5 you need to use
php artisan make:command MultithreadingRequest --command=test:multithreading-request
instead of
php artisan make:console MultithreadingRequest --command=test:multithreading-request

6 years ago

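It would be great to have a complete crawler tutorial series. PHP does not feel as strong as Python when it comes to crawling.

6 years ago
KaneYoung 1 year ago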

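This looks like coroutines, doesn't it? Real multi-threading requires installing the pthreads extension.
--------------2019-06-27-------------------
curl really is multi-threaded under the hood. My earlier take was too shallow.

5 years ago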

@vio_xiaohei Disabling certificate verification fixes it:

public function handle()
    {
        $this->totalPageCount = count($this->users);

        $client = new Client([
            'verify' => false
        ]);

        $requests = function ($total) use ($client) {
            foreach ($this->users as $key => $user) {

                $uri = 'https://api.github.com/users/' . $user;
                yield function() use ($client, $uri) {
                    return $client->getAsync($uri);
                };
            }
        };
        ........
5 years ago
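
Disabling verification makes cURL error 60 go away, but it also switches off the TLS certificate check. An alternative sketch, assuming you have downloaded a CA bundle to a location of your choice (the path below is hypothetical), is to point the verify option at that file instead:

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Assumption: /path/to/cacert.pem is a placeholder for a CA bundle
// file that exists on your machine; replace it with the real path.
$client = new Client([
    'verify' => '/path/to/cacert.pem',
]);
```

The same bundle can also be configured once for all cURL calls through the curl.cainfo setting in php.ini.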

This isn't really multi-threading, is it?

5 years ago
Summer

@lambq Got it. I have noticed it too; it shows up occasionally. I will find some time to track down the problem.

7 years ago
Summer

@Cooper :smile: I just picked a few familiar names from my timeline, haha.

7 years ago

Wow, great stuff, exactly what I needed.

7 years ago

I don't understand the usage here, mainly the Pool class and yield. Can anyone help explain?

6 years ago
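
On the Pool and yield question above: a function containing yield returns a generator, so nothing in its body runs until a consumer asks for the next value. Pool is such a consumer. It pulls closures from the generator only as fast as the concurrency limit allows and calls each one to start a request. The sketch below shows the lazy part in plain PHP, with no Guzzle involved. The user names and the drain() helper are hypothetical stand-ins; drain() is a toy consumer, not how Pool is implemented.

```php
<?php
// Hypothetical user names, standing in for the list in the article.
$users = ['alice', 'bob', 'carol'];

// Calling $requests() returns a Generator; the loop body has not run yet.
$requests = function () use ($users) {
    foreach ($users as $user) {
        $uri = 'https://api.github.com/users/' . $user;
        // Yield a closure rather than doing the work immediately.
        // In the article this closure calls $client->getAsync($uri).
        yield function () use ($uri) {
            return "GET $uri";
        };
    }
};

// Toy consumer: pulls one closure at a time and calls it.
function drain(\Generator $generator)
{
    $started = [];
    foreach ($generator as $index => $start) {
        $started[$index] = $start();
    }
    return $started;
}

$log = drain($requests());
print_r($log);
```

Because the generator hands out work on demand, a list of thousands of URLs never has to be turned into thousands of pending requests up front.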

A piece of runnable code is worth a thousand words... Great stuff, bookmarked.

5 years ago

So advanced: yield, anonymous functions.

5 years ago
梦之马

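How do I get the results once all the concurrent requests have completed? I found that fulfilled runs after each individual successful request.

5 years ago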
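
On the question above about getting the results after everything has finished: one option is to let the fulfilled callback store each result in an array captured by reference, then read that array once wait() returns. The sketch below is only an illustration. It assumes canned responses from Guzzle's MockHandler so that it runs without network access, and the user names are hypothetical.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Assumption: fake responses replace the real GitHub API calls.
$mock = new MockHandler([
    new Response(200, [], '{"id": 1}'),
    new Response(200, [], '{"id": 2}'),
    new Response(200, [], '{"id": 3}'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$users = ['alice', 'bob', 'carol']; // hypothetical user names

$requests = function () use ($users) {
    foreach ($users as $user) {
        yield new Request('GET', 'https://api.github.com/users/' . $user);
    }
};

$results = [];

$pool = new Pool($client, $requests(), [
    'concurrency' => 3,
    // Store each successful result under the user's name.
    'fulfilled' => function ($response, $index) use (&$results, $users) {
        $body = json_decode((string) $response->getBody());
        $results[$users[$index]] = $body->id;
    },
    // Record failures as null so every user ends up with an entry.
    'rejected' => function ($reason, $index) use (&$results, $users) {
        $results[$users[$index]] = null;
    },
]);

// wait() returns only after every request has been fulfilled or rejected.
$pool->promise()->wait();

ksort($results);
print_r($results);
```

Guzzle also offers Pool::batch(), which sends the requests and returns an array of responses and exceptions in the same order as the requests, if you would rather not manage the array yourself.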

Why do I get an error at the last step?
rejected reason: GuzzleHttp\Exception\RequestException: cURL error 60: SSL certificate problem: unable to get local issuer certificate (see http://curl.haxx.se/libcurl/c/libcurl-erro....
html) in D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php:187
Stack trace:

#0 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(150): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array)
#1 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(103): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#2 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(179): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#3 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(108): GuzzleHttp\Handler\CurlMultiHandler->processMessages()
#4 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(123): GuzzleHttp\Handler\CurlMultiHandler->tick()
#5 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Handler\CurlMultiHandler->execute(true)
#6 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#7 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()
#8 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()
#9 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#10 D:\laravel_pachong\vendor\guzzlehttp\promises\src\EachPromise.php(101): GuzzleHttp\Promise\Promise->wait()
#11 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Promise\EachPromise->GuzzleHttp\Promise\{closure}(true)
#12 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#13 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#14 D:\laravel_pachong\app\Console\Commands\MultithreadingRequest.php(62): GuzzleHttp\Promise\Promise->wait()
#15 [internal function]: App\Console\Commands\MultithreadingRequest->handle()
#16 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(29): call_user_func_array(Array, Array)
#17 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#18 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#19 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#20 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(183): Illuminate\Container\Container->call(Array)
#21 D:\laravel_pachong\vendor\symfony\console\Command\Command.php(255): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#22 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(170): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#23 D:\laravel_pachong\vendor\symfony\console\Application.php(953): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 D:\laravel_pachong\vendor\symfony\console\Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(App\Console\Commands\MultithreadingRequest), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 D:\laravel_pachong\vendor\symfony\console\Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#26 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Application.php(88): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#27 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Foundation\Console\Kernel.php(121): Illuminate\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#28 D:\laravel_pachong\artisan(37): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#29 {main}

All requests finished!

5 years ago

In $requests = function ($total) use ($client), the $total parameter is never used. Could someone explain?

4 years ago

When one particular request times out, how do I make it go through the rejected callback?

1 year ago
test2018 3 months ago
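
On the timeout question above: Guzzle's timeout defaults to 0, which means a request waits indefinitely and never reaches rejected on its own. Setting the timeout and connect_timeout request options on the client makes a slow request fail with an exception, and the pool passes that exception to the rejected callback. The sketch below is a hedged illustration. It simulates the timeout with MockHandler and a hand-built ConnectException so that it runs offline; the URLs and the numeric limits are assumptions, not values from the article.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Assumption: the second queue entry imitates what a timed-out request produces.
$mock = new MockHandler([
    new Response(200, [], '{"id": 1}'),
    new ConnectException(
        'cURL error 28: Operation timed out',
        new Request('GET', 'https://example.com/slow')
    ),
]);

$client = new Client([
    'handler'         => HandlerStack::create($mock), // drop this line to hit real servers
    'connect_timeout' => 2, // seconds allowed for establishing the connection
    'timeout'         => 5, // seconds allowed for the whole request
]);

$requests = function () {
    yield new Request('GET', 'https://example.com/fast');
    yield new Request('GET', 'https://example.com/slow');
};

$outcomes = [];

$pool = new Pool($client, $requests(), [
    'concurrency' => 2,
    'fulfilled' => function ($response, $index) use (&$outcomes) {
        $outcomes[$index] = 'fulfilled';
    },
    // $reason is the exception describing why the request failed.
    'rejected' => function ($reason, $index) use (&$outcomes) {
        $outcomes[$index] = 'rejected: ' . $reason->getMessage();
    },
]);

$pool->promise()->wait();

ksort($outcomes);
print_r($outcomes);
```

With a real handler, only the two timeout options matter; the mock is there purely to make the failure reproducible.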
