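Scraping Douyin Live-Stream Danmaku with PHP
I recently needed to capture danmaku (bullet-comment) messages from Douyin live streams. Nearly everything I found online was written in Python. That would have worked fine, but in the spirit of "PHP is the best language in the world" I wrote a simple script of my own. The main code follows:
First, fetch the ttwid cookie from the live-stream URL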
// Client is presumably GuzzleHttp\Client
$client = new Client();
$response = $client->get($liveUrl, [
    'headers' => [
        'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
        'cookie' => '__ac_nonce=0638733a400869171be51',
    ]
]);
// ttwid is taken to be the first name=value pair of the first Set-Cookie header
$cookieString = $response->getHeader('Set-Cookie');
$cookieArray = explode(';', $cookieString[0]);
$ttwidStr = $cookieArray[0];
return substr($ttwidStr, strpos($ttwidStr, '=') + 1);
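The snippet above relies on ttwid being the first cookie in the first Set-Cookie header. If the response ever returns cookies in a different order, looking the cookie up by name is more robust. The following is a minimal sketch of that idea, not part of the original script; the function name `extractCookieValue` and the sample header values are hypothetical.

```php
<?php
/**
 * Find the value of a named cookie in a list of Set-Cookie header lines.
 * Returns null when the cookie is absent.
 */
function extractCookieValue(array $setCookieHeaders, string $name): ?string
{
    foreach ($setCookieHeaders as $line) {
        // The leading "name=value" pair comes before attributes such as Path or Expires.
        $pair = trim(explode(';', $line, 2)[0]);
        $eq = strpos($pair, '=');
        if ($eq === false) {
            continue;
        }
        if (substr($pair, 0, $eq) === $name) {
            return substr($pair, $eq + 1);
        }
    }
    return null;
}

// Hypothetical header values, for illustration only.
$headers = [
    'other=1; Path=/',
    'ttwid=abc%7C123; Path=/; Domain=douyin.com; HttpOnly',
];
$ttwid = extractCookieValue($headers, 'ttwid');
```

With a Guzzle response, the array returned by `$response->getHeader('Set-Cookie')` could be passed in directly.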
Then parse the roomId out of the same page
$html = $response->getBody()->getContents();
// The page embeds the room ID as escaped JSON: roomId\":\"<digits>\"
$pattern = '/roomId\\\\":\\\\"(\d+)\\\\"/';
preg_match($pattern, $html, $matches);
return $matches[1];
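When the page layout changes or the stream is offline, the pattern may not match and `$matches[1]` will be undefined. A small wrapper that returns null in that case makes the failure explicit. This is a sketch under that assumption; `extractRoomId` and the sample page fragment are hypothetical, while the regular expression is the one used above.

```php
<?php
/**
 * Extract the room ID from the live page HTML.
 * Returns null when the escaped roomId fragment is not present.
 */
function extractRoomId(string $html): ?string
{
    // Matches the escaped JSON fragment roomId\":\"<digits>\" embedded in the page.
    $pattern = '/roomId\\\\":\\\\"(\d+)\\\\"/';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Hypothetical page fragment, for illustration only.
$sample = 'window.__data = "{\\"roomId\\":\\"7235000000000000000\\"}"';
$roomId = extractRoomId($sample);
```

The ID is kept as a string because it is only ever concatenated into the WebSocket URL.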
Assemble the WebSocket URL and request headers
$header = [
    'cookie' => 'ttwid=' . $ttwid,
    'user-agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
];
$webSocketUrl = 'ws://webcast3-ws-web-lq.douyin.com/webcast/im/push/v2/?app_name=douyin_web&version_code=180800&webcast_sdk_version=1.3.0&update_version_code=1.3.0&compress=gzip&internal_ext=internal_src:dim|wss_push_room_id:' . $liveRoomId . '|wss_push_did:7188358506633528844|dim_log_id:20230521093022204E5B327EF20D5CDFC6|fetch_time:1684632622323|seq:1|wss_info:0-1684632622323-0-0|wrds_kvs:WebcastRoomRankMessage-1684632106402346965_WebcastRoomStatsMessage-1684632616357153318&cursor=t-1684632622323_r-1_d-1_u-1_h-1&host=https://live.douyin.com&aid=6383&live_id=1&did_rule=3&debug=false&maxCacheMessageNumber=20&endpoint=live_pc&support_wrds=1&im_path=/webcast/im/fetch/&user_unique_id=7188358506633528844&device_platform=web&cookie_enabled=true&screen_width=1440&screen_height=900&browser_language=zh&browser_platform=MacIntel&browser_name=Mozilla&browser_version=5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010_15_7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/113.0.0.0%20Safari/537.36&browser_online=true&tz_name=Asia/Shanghai&identity=audience&room_id=' . $liveRoomId . '&heartbeatDuration=0&signature=00000000';
Finally, connect with Workerman's AsyncTcpConnection to receive the data
$wsClient = new AsyncTcpConnection($webSocketUrl);
// Use SSL transport so the connection becomes wss
$wsClient->transport = 'ssl';
$wsClient->headers = $header;
$parseMsg = new ParseMsg($conn);
$wsClient->onMessage = [$parseMsg, 'on_message'];
$wsClient->connect();
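I've put the full parsing code and the protobuf definitions on GitHub; take a look there if you need them.
One more important point: danmaku messages are encoded with Google's protobuf (Protocol Buffers) format, so you'll need some familiarity with it.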
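For a feel of what protobuf looks like on the wire: every field starts with a varint key equal to `(field_number << 3) | wire_type`, and a varint stores 7 bits per byte, least-significant group first, with the high bit marking "more bytes follow". The sketch below decodes the standard three-byte example from the protobuf documentation by hand. It is only an illustration of the encoding, not the parser used in this project (real code should use generated classes), and the function names `readVarint` and `splitFieldKey` are hypothetical.

```php
<?php
/**
 * Decode one base-128 varint starting at $offset.
 * Returns [value, offset of the next unread byte].
 * Sketch only: values beyond PHP_INT_MAX are not handled.
 */
function readVarint(string $bytes, int $offset = 0): array
{
    $value = 0;
    $shift = 0;
    $length = strlen($bytes);
    while ($offset < $length) {
        $byte = ord($bytes[$offset++]);
        // The low 7 bits carry data; groups arrive least-significant first.
        $value |= ($byte & 0x7F) << $shift;
        if (($byte & 0x80) === 0) {
            return [$value, $offset];
        }
        $shift += 7;
    }
    throw new RuntimeException('Truncated varint');
}

/** Split a field key into [field number, wire type]. */
function splitFieldKey(int $key): array
{
    return [$key >> 3, $key & 0x07];
}

// Bytes 0x08 0x96 0x01 encode field 1 (wire type 0, varint) with value 150.
$message = "\x08\x96\x01";
[$key, $pos] = readVarint($message);
[$field, $wireType] = splitFieldKey($key);
[$value, $pos] = readVarint($message, $pos);
```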
Here is a test address: ws://47.93.122.172:4200
The message format is as follows:
{
"url":"https://live.douyin.com/619592756125"
}
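Presumably a client sends a message of this shape to the test address once connected. The sketch below only builds that JSON string; `buildRequestMessage` is a hypothetical helper name, and how the message is sent depends on the WebSocket client you use.

```php
<?php
/**
 * Build the JSON message for a given live-stream URL.
 * Requires PHP 7.3+ for JSON_THROW_ON_ERROR.
 */
function buildRequestMessage(string $liveUrl): string
{
    // JSON_UNESCAPED_SLASHES keeps "https://..." instead of "https:\/\/...".
    return json_encode(
        ['url' => $liveUrl],
        JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

$payload = buildRequestMessage('https://live.douyin.com/619592756125');
```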
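Docker one-click deployment has been added for easier use.
This work is licensed under a CC license; reposts must credit the author and link to this article.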