Using eBPF to troubleshoot a "connection reset by peer" network problem
Background
On the host 192.168.2.120 (kernel 5.19.0), the rinetd port-forwarding service is installed as a proxy in front of the local php service, listening externally on port 92. Load-testing it with:
go-wrk -c 1000 'http://192.168.2.120:92'
produced a large number of timeout errors as well as "connection reset by peer" errors, so I decided to use eBPF tooling to find the exact cause.
Finding the call chains
From the error we know the server is replying to client requests with reset packets, so we can use eBPF to trace the call chain of the packet-transmit function dev_hard_start_xmit. Run bpftrace dev_hard_start_xmit.bt, where dev_hard_start_xmit.bt is the following script:
kprobe:dev_hard_start_xmit
{
$skb = (struct sk_buff*) arg0;
$ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
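// 0x7802a8c0 is 192.168.2.120: the address bytes in network order read as a little-endian u32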
if ($ip_header->saddr == 0x7802a8c0) {
$tcp_header= ((struct tcphdr *) ($skb->head + $skb->network_header + ($ip_header->ihl << 2)));
$sport = $tcp_header->source;
$dport = $tcp_header->dest;
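// ports are in network byte order; swap the bytes to get host order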
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
$sport = ($sport >> 8) | (($sport << 8) & 0xff00);
// for a reset sent by our service, the source port is 92 and the RST flag in the TCP header is 1
if ($sport == 92 && $tcp_header->rst == 1) {
@[kstack, comm] = count();
}
}
}
The two most frequent call stacks were:
@[
dev_hard_start_xmit+1
__dev_xmit_skb+752
__dev_queue_xmit+864
neigh_hh_output+167
ip_finish_output2+485
__ip_finish_output+195
ip_finish_output+46
ip_output+120
ip_push_pending_frames+187
ip_send_unicast_reply+782
tcp_v4_send_reset+834
tcp_v4_do_rcv+556
tcp_v4_rcv+3866
ip_protocol_deliver_rcu+60
ip_local_deliver_finish+134
ip_local_deliver+270
ip_sublist_rcv_finish+111
ip_sublist_rcv+381
ip_list_rcv+249
__netif_receive_skb_list_core+553
netif_receive_skb_list_internal+403
napi_complete_done+122
e1000e_poll+218
__napi_poll+48
net_rx_action+389
__softirqentry_text_start+213
__irq_exit_rcu+141
irq_exit_rcu+14
common_interrupt+74
asm_common_interrupt+39
, php-fpm]: 500
@[
dev_hard_start_xmit+1
__dev_xmit_skb+752
__dev_queue_xmit+864
neigh_hh_output+167
ip_finish_output2+485
__ip_finish_output+195
ip_finish_output+46
ip_output+120
ip_local_out+100
__ip_queue_xmit+394
ip_queue_xmit+21
__tcp_transmit_skb+2324
tcp_send_active_reset+247
__tcp_close+373
tcp_close+37
inet_release+69
__sock_release+63
sock_close+21
__fput+156
____fput+14
task_work_run+97
exit_to_user_mode_loop+273
exit_to_user_mode_prepare+185
syscall_exit_to_user_mode+42
do_syscall_64+105
entry_SYSCALL_64_after_hwframe+99
, rinetd]: 26706
Our reading: the first call chain is the server receiving a SYN while the accept queue (full-connection queue) is already full (rinetd's accept backlog is set to 5, which is quite small) and replying with a reset directly. The second call chain looks like a user-space close() that caused the kernel to send a reset. Let's verify both conclusions.
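As a quick side check on the accept-queue hypothesis, one could watch incoming connection requests and compare the queue length against its limit. A minimal sketch, assuming tcp_v4_conn_request is attachable as a kprobe on this kernel and that bpftrace can resolve the sk_ack_backlog/sk_max_ack_backlog fields of struct sock (the map name @acceptq_full is made up here):
kprobe:tcp_v4_conn_request
{
    $sk = (struct sock *) arg0;
    // the accept queue is full once the current backlog reaches the listen() limit
    if ($sk->sk_ack_backlog >= $sk->sk_max_ack_backlog) {
        @acceptq_full[comm] = count();
    }
}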
The first call chain
First we can pin down that tcp_v4_send_reset is invoked from tcp_v4_do_rcv to send the reset. Reading the source of both functions, there are too many possible paths between them to tell exactly why the reset is sent. So, again borrowing the approach from the earlier article, I used two tools together to narrow it down:
// first run: bpftrace tcp_v4_send_reset.bt, with the following script
kprobe:tcp_v4_send_reset
{
$skb = (struct sk_buff*) arg1;
$ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
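// arg1 is the incoming skb that triggered the reset, so our address is the destination (daddr)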
if ($ip_header->daddr == 0x7802a8c0)
{
$tcp_header= ((struct tcphdr *) ($skb->head + $skb->network_header + ($ip_header->ihl << 2)));
$sport = $tcp_header->source;
$dport = $tcp_header->dest;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
$sport = ($sport >> 8) | (($sport << 8) & 0xff00);
if ($dport == 92) {
debugf("%d > %d", $sport, $dport);
}
}
}
// then run: ./funcgraph -a -p 3632351 -m 5 tcp_v4_rcv
// 3632351 is the php-fpm master process id; funcgraph is the ftrace-based tool from Brendan Gregg's perf-tools, and -m 5 limits the traced call depth to 5
This produced:
2865008.230196 | 5) php-fpm-1788381 | | tcp_v4_do_rcv() {
2865008.230196 | 5) php-fpm-1788381 | | cookie_v4_check() {
2865008.230197 | 5) php-fpm-1788381 | | __cookie_v4_check() {
2865008.230197 | 5) php-fpm-1788381 | 0.586 us | cookie_hash();
2865008.230198 | 5) php-fpm-1788381 | 0.302 us | cookie_hash();
2865008.230198 | 5) php-fpm-1788381 | 1.583 us | }
2865008.230198 | 5) php-fpm-1788381 | 2.304 us | }
2865008.230199 | 5) php-fpm-1788381 | | tcp_rcv_state_process();
2865008.230206 | 5) php-fpm-1788381 | | tcp_v4_send_reset() {
2865008.230210 | 5) php-fpm-1788381 | | /* bpf_trace_printk: 48803 > 92 */
Comparing with the source code, tcp_rcv_state_process returned a non-zero value, which is why tcp_v4_send_reset was called. Next we need to look at what tcp_rcv_state_process did, so let's extend tcp_v4_send_reset.bt a little to also report the socket's state at that moment:
kprobe:tcp_v4_send_reset
{
$skb = (struct sk_buff*) arg1;
$ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
if ($ip_header->daddr == 0x7802a8c0)
{
$tcp_header= ((struct tcphdr *) ($skb->head + $skb->network_header + ($ip_header->ihl << 2)));
$sport = $tcp_header->source;
$dport = $tcp_header->dest;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
$sport = ($sport >> 8) | (($sport << 8) & 0xff00);
if ($dport == 92) {
debugf("%d > %d", $sport, $dport);
}
}
}
kprobe:tcp_rcv_state_process
{
$sk = (struct sock*) arg0;
$skb = (struct sk_buff*) arg1;
$ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
if ($ip_header->daddr == 0x7802a8c0)
{
$tcp_header= ((struct tcphdr *) ($skb->head + $skb->network_header + ($ip_header->ihl << 2)));
$sport = $tcp_header->source;
$dport = $tcp_header->dest;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
$sport = ($sport >> 8) | (($sport << 8) & 0xff00);
if ($dport == 92) {
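// print the SYN flag, ACK flag, and socket state (e.g. 10 == TCP_LISTEN, see include/net/tcp_states.h)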
debugf("%d %d %d", $tcp_header->syn, $tcp_header->ack, $sk->__sk_common.skc_state);
}
}
}
Running it again produced:
2865008.230196 | 5) php-fpm-1788381 | | tcp_v4_do_rcv() {
2865008.230196 | 5) php-fpm-1788381 | | cookie_v4_check() {
2865008.230197 | 5) php-fpm-1788381 | | __cookie_v4_check() {
2865008.230197 | 5) php-fpm-1788381 | 0.586 us | cookie_hash();
2865008.230198 | 5) php-fpm-1788381 | 0.302 us | cookie_hash();
2865008.230198 | 5) php-fpm-1788381 | 1.583 us | }
2865008.230198 | 5) php-fpm-1788381 | 2.304 us | }
2865008.230199 | 5) php-fpm-1788381 | | tcp_rcv_state_process() {
2865008.230204 | 5) php-fpm-1788381 | | /* bpf_trace_printk: 0 1 10 */
2865008.230206 | 5) php-fpm-1788381 | 7.341 us | }
2865008.230206 | 5) php-fpm-1788381 | | tcp_v4_send_reset() {
2865008.230210 | 5) php-fpm-1788381 | | /* bpf_trace_printk: 48803 > 92 */
bpf_trace_printk: 0 1 10
means the packet received was a pure ACK (syn=0, ack=1), and the socket it landed on was in state 10, i.e. TCP_LISTEN: the listening socket.
So the client sent an ACK to the listening socket, and the server responded with a reset. Why would this happen? My guess is that the connection had already been closed on the server side, and the client, unaware of that, sent an ACK on it. (The cookie_v4_check frames in the trace also hint that syncookies were in play, which is what happens once the SYN queue overflows under this kind of load.)
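As an extra confirmation, the non-zero return value itself can be captured by pairing a kretprobe with the kprobe above; a minimal sketch reusing the same daddr filter (@watch is just a per-thread flag introduced for illustration):
kprobe:tcp_rcv_state_process
{
    $skb = (struct sk_buff*) arg1;
    $ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
    if ($ip_header->daddr == 0x7802a8c0) {
        @watch[tid] = 1;
    }
}
kretprobe:tcp_rcv_state_process
/@watch[tid]/
{
    // a non-zero return value here is what sends tcp_v4_do_rcv down the reset path
    debugf("tcp_rcv_state_process returned %d", retval);
    delete(@watch[tid]);
}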
The second call chain
How do we confirm that this call chain comes from a close() syscall? Run bpftrace dev_hard_start_xmit_v2.bt to check, with the following script:
kprobe:dev_hard_start_xmit
{
$skb = (struct sk_buff*) arg0;
$ip_header = ((struct iphdr *) ($skb->head + $skb->network_header));
if ($ip_header->saddr == 0x7802a8c0) {
$tcp_header= ((struct tcphdr *) ($skb->head + $skb->network_header + ($ip_header->ihl << 2)));
$sport = $tcp_header->source;
$dport = $tcp_header->dest;
$dport = ($dport >> 8) | (($dport << 8) & 0xff00);
$sport = ($sport >> 8) | (($sport << 8) & 0xff00);
if ($sport == 92 && $tcp_header->rst == 1) {
// printing here shows the captured syscall number is 3, which is close (on x86-64)
@[kstack, @callid[tid]] = count();
delete(@callid[tid]);
}
}
}
// trace all syscall entries and remember each thread's current syscall number
tracepoint:raw_syscalls:sys_enter
{
@callid[tid] = args->id;
}
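This pairing works because, as the stack above shows (task_work_run, exit_to_user_mode_loop), the reset from __tcp_close is transmitted synchronously on the very thread that is leaving the close() syscall, so when dev_hard_start_xmit fires, @callid[tid] still holds that syscall's number.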
So why does rinetd call close and end up sending a reset? Its log contains a large number of local-socket-failed errors. Looking at the source of this rinetd version, the error is raised when rinetd has accepted a connection from an external client but then fails to establish the forwarding connection to the local service; in that case it actively closes the external client's connection. (A close() produces a RST rather than a FIN when, for example, unread data is still sitting in the socket's receive queue, which matches the tcp_send_active_reset frame in the stack above.)
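To corroborate this from the tracing side as well, one could count rinetd's failed connect() calls; a minimal sketch (map keys are negative errno values, e.g. -111 == ECONNREFUSED; note that a non-blocking connect typically returns -EINPROGRESS here, with the real error only surfacing later on the socket):
tracepoint:syscalls:sys_exit_connect
/comm == "rinetd" && args->ret < 0/
{
    // aggregate connect() failures by (negative) errno
    @connect_errors[args->ret] = count();
}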
This work is licensed under a CC license; reproduction must credit the author and link back to this article.