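Hands-On: Writing a Multithreaded Crawler with Guzzle in Laravel
Overview
Guzzle is a powerful HTTP client library for PHP.
This article shows how to use Guzzle to issue "multithreaded" requests, that is, several HTTP requests in flight at the same time.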
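Creating the command
1. Generate the command from the command line (on Laravel 5.5, the generator is make:command rather than make:console)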
php artisan make:console MultithreadingRequest --command=test:multithreading-request
2. Register the command
Edit app/Console/Kernel.php and add the following to the $commands array:
Commands\MultithreadingRequest::class,
3. Test the command
In app/Console/Commands/MultithreadingRequest.php, add the following to the handle method:
$this->info('hello');
Output:
$ php artisan test:multithreading-request
hello
4. Install Guzzle
composer require guzzlehttp/guzzle "6.2"
Straight to the code
A piece of runnable code is worth a thousand words.
Below is the full content of app/Console/Commands/MultithreadingRequest.php:
<?php

namespace App\Console\Commands;

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use Illuminate\Console\Command;

class MultithreadingRequest extends Command
{
    private $totalPageCount;
    private $counter = 1;
    private $concurrency = 7; // number of requests kept in flight at once
    private $users = ['CycloneAxe', 'appleboy', 'Aufree', 'lifesign',
                      'overtrue', 'zhengjinghua', 'NauxLiu'];

    protected $signature = 'test:multithreading-request';
    protected $description = 'Command description';

    public function __construct()
    {
        parent::__construct();
    }

    public function handle()
    {
        $this->totalPageCount = count($this->users);
        $client = new Client();

        // Generator that yields one closure per user; Pool calls each closure to start its request.
        // $total is accepted but never used: the loop is bounded by $this->users.
        $requests = function ($total) use ($client) {
            foreach ($this->users as $key => $user) {
                $uri = 'https://api.github.com/users/' . $user;
                yield function () use ($client, $uri) {
                    return $client->getAsync($uri);
                };
            }
        };

        $pool = new Pool($client, $requests($this->totalPageCount), [
            'concurrency' => $this->concurrency,
            // Runs once per successful response; $index is the request's position in the generator.
            'fulfilled' => function ($response, $index) {
                $res = json_decode($response->getBody()->getContents());
                $this->info("Request #$index: user " . $this->users[$index] . " has GitHub ID " . $res->id);
                $this->countedAndCheckEnded();
            },
            // Runs once per failed request; $reason is the exception that was raised.
            'rejected' => function ($reason, $index) {
                $this->error("rejected");
                $this->error("rejected reason: " . $reason);
                $this->countedAndCheckEnded();
            },
        ]);

        // Start sending the requests and block until all of them have settled.
        $promise = $pool->promise();
        $promise->wait();
    }

    // Counts settled requests and prints a closing message after the last one.
    public function countedAndCheckEnded()
    {
        if ($this->counter < $this->totalPageCount) {
            $this->counter++;
            return;
        }
        $this->info("All requests finished!");
    }
}
Result:
$ php artisan test:multithreading-request
Request #5: user zhengjinghua has GitHub ID 3413430
Request #6: user NauxLiu has GitHub ID 9570112
Request #0: user CycloneAxe has GitHub ID 6268176
Request #1: user appleboy has GitHub ID 21979
Request #2: user Aufree has GitHub ID 5310542
Request #3: user lifesign has GitHub ID 2189610
Request #4: user overtrue has GitHub ID 1472352
All requests finished!
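Note that the requests are sent at the same time: because concurrency is set to 7, all 7 requests are dispatched together. They simply complete at different moments, which is why the output above is not in index order.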
The end. :beers: :beers: :beers:
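@lambq Got it. I have noticed it too; it shows up occasionally. I will find time to track the problem down.
@Cooper :smile: Just a few familiar names I picked from the timeline, haha.
Wow, great stuff, exactly what I needed.
I don't understand the usage here, mainly the Pool class and yield. Can anyone explain?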
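A rough sketch of what yield contributes, in plain PHP with no Guzzle involved. A generator hands out one closure at a time, and nothing in its body runs until the consumer asks for the next item. In the article's code, Pool plays the consumer: it pulls closures from the generator and calls them, keeping at most concurrency requests in flight. The names makeJobs, $log and the user names below are made up for illustration.

```php
<?php
// Hypothetical helper: lazily yields one closure per user.
function makeJobs(array $users, array &$log): Generator
{
    foreach ($users as $user) {
        $log[] = "prepared $user";
        // Stands in for: function () use ($client, $uri) { return $client->getAsync($uri); }
        yield function () use ($user) {
            return "fetched $user";
        };
    }
}

$log  = [];
$jobs = makeJobs(['alice', 'bob'], $log);

// Creating the generator runs none of its body, so nothing is logged yet.
$before = count($log);

// The consumer (Pool, in the article) pulls each closure and invokes it.
$results = [];
foreach ($jobs as $index => $job) {
    $results[$index] = $job();
}
```

Because the closures are produced on demand, a long list of URLs never has to be turned into request objects all at once.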
On version 5.5 you have to use
php artisan make:command MultithreadingRequest --command=test:multithreading-request
instead of
php artisan make:console MultithreadingRequest --command=test:multithreading-request
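A complete crawler tutorial series would be great. PHP feels weaker than Python when it comes to crawling.
This isn't really multithreading, is it?
"A piece of runnable code is worth a thousand words"... a real treat, bookmarked.
This is coroutines, isn't it? Real multithreading would need the pthreads extension installed.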
--------------2019-06-27-------------------
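curl is indeed multithreaded under the hood; my understanding was too shallow.
So advanced: yield, anonymous functions.
How do I get the results once all the concurrent requests have finished? I found that fulfilled runs after each individual successful request.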
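One possible approach, sketched under a few assumptions: let the fulfilled and rejected callbacks store their outcome in arrays keyed by $index, then read those arrays after wait() returns, since by then every request has settled. To run offline, the sketch swaps the network for Guzzle's MockHandler with two canned responses; the user names, the response bodies and the vendor/autoload.php path are placeholders, not part of the original article.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Canned responses replace the network so the sketch can run offline.
$mock = new MockHandler([
    new Response(200, [], '{"id": 1}'),
    new Response(200, [], '{"id": 2}'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

$requests = function () {
    foreach (['alice', 'bob'] as $user) { // placeholder user names
        yield new Request('GET', 'https://api.github.com/users/' . $user);
    }
};

$results = []; // decoded bodies, keyed by request index
$errors  = []; // exceptions, keyed by request index

$pool = new Pool($client, $requests(), [
    'concurrency' => 2,
    'fulfilled' => function ($response, $index) use (&$results) {
        $results[$index] = json_decode((string) $response->getBody(), true);
    },
    'rejected' => function ($reason, $index) use (&$errors) {
        $errors[$index] = $reason;
    },
]);

// wait() returns only after every request has been fulfilled or rejected.
$pool->promise()->wait();

// Completion order is arbitrary, so sort by index before using the results.
ksort($results);
```

Pool::batch($client, $requests(), ['concurrency' => 2]) is a shorter alternative when callbacks are not needed: it waits for everything and returns an array, ordered by index, holding either a response or an exception per request.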
Why does the last step fail for me?
rejected reason: GuzzleHttp\Exception\RequestException: cURL error 60: SSL certificate problem: unable to get local issuer certificate (see http://curl.haxx.se/libcurl/c/libcurl-erro....
html) in D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php:187
Stack trace:
#0 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(150): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array)
#1 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(103): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#2 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(179): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#3 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(108): GuzzleHttp\Handler\CurlMultiHandler->processMessages()
#4 D:\laravel_pachong\vendor\guzzlehttp\guzzle\src\Handler\CurlMultiHandler.php(123): GuzzleHttp\Handler\CurlMultiHandler->tick()
#5 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Handler\CurlMultiHandler->execute(true)
#6 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#7 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()
#8 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()
#9 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#10 D:\laravel_pachong\vendor\guzzlehttp\promises\src\EachPromise.php(101): GuzzleHttp\Promise\Promise->wait()
#11 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(246): GuzzleHttp\Promise\EachPromise->GuzzleHttp\Promise\{closure}(true)
#12 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
#13 D:\laravel_pachong\vendor\guzzlehttp\promises\src\Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
#14 D:\laravel_pachong\app\Console\Commands\MultithreadingRequest.php(62): GuzzleHttp\Promise\Promise->wait()
#15 [internal function]: App\Console\Commands\MultithreadingRequest->handle()
#16 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(29): call_user_func_array(Array, Array)
#17 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#18 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#19 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Container\Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#20 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(183): Illuminate\Container\Container->call(Array)
#21 D:\laravel_pachong\vendor\symfony\console\Command\Command.php(255): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#22 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Command.php(170): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#23 D:\laravel_pachong\vendor\symfony\console\Application.php(953): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#24 D:\laravel_pachong\vendor\symfony\console\Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(App\Console\Commands\MultithreadingRequest), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#25 D:\laravel_pachong\vendor\symfony\console\Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#26 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Console\Application.php(88): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#27 D:\laravel_pachong\vendor\laravel\framework\src\Illuminate\Foundation\Console\Kernel.php(121): Illuminate\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#28 D:\laravel_pachong\artisan(37): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#29 {main}
All requests finished!
@vio_xiaohei Just disable certificate verification and it will work.
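For reference, a minimal sketch of the two ways Guzzle's verify request option can be set. Turning verification off makes the error disappear but also removes protection against man-in-the-middle attacks, so it is best kept to local experiments; pointing verify at a CA bundle file keeps the check in place. The bundle path below is a placeholder and must be replaced with a real file on your machine, and the vendor/autoload.php location is likewise an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;

// Workaround from the reply above: skip TLS certificate verification entirely.
// Suitable for local experiments only.
$insecureClient = new Client(['verify' => false]);

// Safer alternative: keep verification and tell cURL where the CA bundle lives.
// 'C:/path/to/cacert.pem' is a placeholder, not a real path.
$client = new Client(['verify' => 'C:/path/to/cacert.pem']);
```

Either client can then be passed to Pool exactly as in the article's handle method.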
In $requests = function ($total) use ($client), the $total parameter is never used. Could you explain?
If one of the requests times out, how do I make it go through the rejected function?
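A sketch of one way to handle this, assuming the timeouts are set as client-wide defaults: with the timeout and connect_timeout request options in place, a request that exceeds them fails with an exception, and Pool hands that exception to the rejected callback. To stay offline, the sketch uses Guzzle's MockHandler and queues a ConnectException carrying a timeout-style message in place of a real slow server; the message text, the 5 and 2 second limits and the vendor/autoload.php path are illustrative assumptions.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes Guzzle was installed with Composer

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

$uri = 'https://api.github.com/users/overtrue';

// Simulated outcomes: one success, then one failure that mimics a timeout.
$mock = new MockHandler([
    new Response(200, [], '{"id": 1472352}'),
    new ConnectException('cURL error 28: Operation timed out', new Request('GET', $uri)),
]);

// In real code, drop the 'handler' line and keep the two timeout options.
$client = new Client([
    'handler'         => HandlerStack::create($mock),
    'timeout'         => 5.0, // seconds allowed for the whole request
    'connect_timeout' => 2.0, // seconds allowed for establishing the connection
]);

$requests = function () use ($client, $uri) {
    for ($i = 0; $i < 2; $i++) {
        yield function () use ($client, $uri) {
            return $client->getAsync($uri);
        };
    }
};

$ok     = []; // indexes of requests that succeeded
$failed = []; // error messages, keyed by request index

$pool = new Pool($client, $requests(), [
    'concurrency' => 2,
    'fulfilled' => function ($response, $index) use (&$ok) {
        $ok[] = $index;
    },
    'rejected' => function ($reason, $index) use (&$failed) {
        $failed[$index] = $reason->getMessage();
    },
]);
$pool->promise()->wait();
```

Without any timeout option Guzzle waits indefinitely by default, so a stalled request never reaches rejected; setting the limits is what turns a hang into a failure the callback can see.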