
An Investigation of How Linux Implements the Retransmission Timeout (RTO)



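I recently had to track down a network timeout problem. The investigation went roughly as follows:

1. Rule out problems in the code logic, possible TCP-related bugs, kernel parameters and so on;

2. While checking for KVM problems, the timeout was reproduced between different KVM guests on the same host.

Most of the abnormal connections lasted about 1 second. Packet captures showed that the packets on these connections had been retransmitted, and that the retransmission delay was always 1 second.

Why is the retransmission delay 1 second here? What do the relevant standards say, and what does the actual implementation do?

That is the subject of this article (based on the CentOS kernel 2.6.32-358).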

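The RFC standard


The retransmission timeout (RTO) is derived by an algorithm from the current network conditions, as measured by the round-trip time (RTT). "TCP/IP Illustrated, Volume 1" covers this topic, but its description is out of date.

A search of the RFCs shows that the latest document on the retransmission timeout is RFC 6298, which updates RFC 1122 and obsoletes RFC 2988.

A brief summary of its contents follows; see the RFC itself for the details.

RFC 6298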

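1. It restates the basic RTO calculation:

First there is a time parameter RTO_MIN obtained from the clock (RFC 6298 itself calls this the clock granularity G).

Initialization, before any RTT sample has been taken:

$$RTO = 1\ \text{s}$$

First calculation, with the first RTT sample $R$:

$$SRTT = R,\qquad RTTVAR = \frac{R}{2},\qquad RTO = SRTT + \max(RTO\_MIN,\ 4 \cdot RTTVAR)$$

Subsequent calculations, with each new RTT sample $R'$ and $\alpha = 1/8$, $\beta = 1/4$:

$$RTTVAR = (1-\beta)\cdot RTTVAR + \beta\cdot|SRTT - R'|$$

$$SRTT = (1-\alpha)\cdot SRTT + \alpha\cdot R'$$

$$RTO = SRTT + \max(RTO\_MIN,\ 4 \cdot RTTVAR)$$

The recommended minimum RTO is 1 second; a maximum, if one is imposed, must be at least 60 seconds.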

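2. For repeated retransmissions of the same packet, Karn's algorithm must be used; this includes the doubling of the timeout seen in the captures.

In addition, RTT samples must not be taken from retransmitted packets unless the timestamps option is enabled (with timestamps the RTT can be measured accurately).

3. When 4*RTTVAR tends to 0, the resulting term must be rounded towards RTO_MIN.

Experience shows that a finer clock granularity works better, ideally 100 ms or less.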

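4. Managing the RTO timer

(1) Whenever data is sent (including retransmissions), check whether the timer is running and start it if it is not. When the ACK for that data arrives, stop the timer.

(2) Back off with RTO = RTO * 2.

(3) The new FALLBACK feature: if the timer expires while waiting for the ACK of a SYN segment, and the TCP implementation is using an RTO of less than 3 seconds, the RTO of that connection must be reset to 3 seconds. The reset RTO applies to the transfer of actual data (that is, after the three-way handshake has completed).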
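The calculation and timer rules summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any TCP stack: the class name `Rfc6298Estimator` and its parameter names are made up here, `rto_min` stands for the clock-derived term inside max() (G in RFC 6298), and `floor` and `ceiling` model the 1 second lower bound and a 60 second upper bound.

```python
class Rfc6298Estimator:
    """Illustrative sketch of the RFC 6298 RTO rules (all times in seconds)."""

    ALPHA = 1 / 8  # gain for SRTT
    BETA = 1 / 4   # gain for RTTVAR

    def __init__(self, rto_min=0.001, floor=1.0, ceiling=60.0):
        # rto_min: clock-derived term inside max() (assumed name, G in the RFC)
        # floor / ceiling: bounds applied to the final RTO (assumed names)
        self.rto_min = rto_min
        self.floor = floor
        self.ceiling = ceiling
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        """Update SRTT, RTTVAR and RTO from one RTT sample r."""
        if self.srtt is None:
            # First measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Subsequent measurements: RTTVAR is updated before SRTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.rto_min, 4 * self.rttvar)
        self.rto = min(max(rto, self.floor), self.ceiling)
        return self.rto

    def on_timeout(self):
        """Back off: double the RTO, capped at the ceiling."""
        self.rto = min(self.rto * 2, self.ceiling)
        return self.rto


est = Rfc6298Estimator()
print(est.on_rtt_sample(0.5))  # 1.5
print(est.on_timeout())        # 3.0
```

The sketch leaves out Karn's rule (skipping samples from retransmitted packets) and the FALLBACK reset; a caller would have to decide which samples to feed in.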


Packet-capture analysis of the actual Linux implementation

Sending the SYN packet of the three-way handshake

01:00:00.129688 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:00:01.129065 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:00:03.129063 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:00:07.129074 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:00:15.129072 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:00:31.129128 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0

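The interval doubles each time, starting from 1 second.

Note that the upper layer is only told that the connection timed out once the sixth timer expires, after the fifth retransmission. In total that takes 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds.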

Sending the SYN-ACK packet of the three-way handshake

01:17:20.084839 IP 172.16.3.15.2535 > 172.16.3.14.80: Flags [S], seq 1297135388, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:20.084908 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:21.284093 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:23.284088 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:27.284095 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:35.284097 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
01:17:51.284093 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0

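Again the interval doubles, starting from about 1 second (the first gap in this capture is roughly 1.2 seconds).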
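One way to check the doubling is to compute the gaps between consecutive capture timestamps. The following is a minimal Python sketch, assuming all timestamps fall on the same day; the helper name `gaps_seconds` is hypothetical, and the timestamps are copied from the SYN-ACK capture above.

```python
from datetime import datetime


def gaps_seconds(timestamps):
    """Return the seconds between consecutive HH:MM:SS.ffffff timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Timestamps of the SYN-ACK and its five retransmissions
synack_times = [
    "01:17:20.084908",
    "01:17:21.284093",
    "01:17:23.284088",
    "01:17:27.284095",
    "01:17:35.284097",
    "01:17:51.284093",
]

print([round(gap, 3) for gap in gaps_seconds(synack_times)])
# [1.199, 2.0, 4.0, 8.0, 16.0]
```

The same helper can be applied to the other captures in this article; it does not handle a capture that crosses midnight.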

Sending an ordinary data packet

01:32:20.443757 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:20.644600 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:21.046579 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:21.850632 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:23.458555 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:26.674594 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:33.106601 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:32:45.970567 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:33:11.698415 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:34:03.154300 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:35:46.065892 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:37:46.065382 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:39:46.064917 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:41:46.064466 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:43:46.064060 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
01:45:46.063675 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11

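The interval doubles starting from 0.2 seconds, up to a maximum of 120 seconds, for 15 retransmissions in total.

Note that the sequence starts at minute 32 and only ends at minute 47, which is about 15 minutes 25 seconds.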
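Both totals (63 seconds for the SYN, roughly 15 minutes 25 seconds for data) follow from adding up the doubled intervals. The sketch below is a hypothetical helper, `backoff_intervals`, that assumes nominal starting values of 1 s and 0.2 s and the 120 s cap observed above; the counts 6 and 16 are the number of retransmissions plus the final wait before giving up.

```python
def backoff_intervals(initial_rto, max_rto, timeouts):
    """Return the wait before each of `timeouts` consecutive timer expiries."""
    intervals = []
    rto = initial_rto
    for _ in range(timeouts):
        intervals.append(rto)
        rto = min(rto * 2, max_rto)  # double, but never beyond the cap
    return intervals


syn_waits = backoff_intervals(1.0, 120.0, 6)    # 5 retransmissions + final wait
data_waits = backoff_intervals(0.2, 120.0, 16)  # 15 retransmissions + final wait

print(sum(syn_waits))             # 63.0
print(round(sum(data_waits), 1))  # 924.6
```

924.6 seconds is 15 minutes 24.6 seconds; the capture runs slightly longer because the real first interval is a little over 200 ms.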

Does Linux support the FALLBACK feature? A simple test:

After iptables is enabled on the server, the client connects to the server; iptables is disabled again before the 5 timeouts are used up:

23:35:01.036565 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
23:35:02.036152 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
23:35:04.036126 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
23:35:08.036127 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
23:35:16.036131 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
23:35:16.036842 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [S.], seq 3634006739, ack 2364912155, win 14600, options [mss 1460], length 0
23:35:16.036896 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [.], ack 3634006740, win 14600, length 0

Next, iptables is enabled on the server again and the client sends a data packet; iptables is disabled before the 15 timeouts are used up:

23:35:48.129273 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
23:35:51.129120 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
23:35:57.129070 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
23:36:09.129068 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
23:36:09.129802 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [.], ack 2364912156, win 14600, length 0

Next, with iptables off on the server, the client sends a data packet:

23:36:15.217231 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912156:2364912157, ack 3634006740, win 14600, length 1
23:36:15.217766 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [.], ack 2364912157, win 14600, length 0

Next, iptables is enabled on the server and the client sends a data packet:

23:36:26.658172 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:26.859055 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:27.261065 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:28.065106 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:29.673132 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:32.889068 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:39.321091 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:36:52.185135 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
23:37:17.913091 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1

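This test shows that when the handshake itself times out (the RTT of the three-way handshake exceeds 1 second), the RTO in the data phase is 3 seconds: the first data packet is retransmitted after 3, 6 and 12 seconds. The same happens when the server's SYN-ACK times out.

After one normal RTT sample, the RTO converges back to about 200 ms.

Now let's see how timestamps are handled.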


After iptables is enabled on the server, the client connects to the server; iptables is disabled again before the 5 timeouts are used up:

23:47:47.754316 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336007392 ecr 0,nop,wscale 7], length 0
23:47:48.754079 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336008392 ecr 0,nop,wscale 7], length 0
23:47:50.754088 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336010392 ecr 0,nop,wscale 7], length 0
23:47:54.754083 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336014392 ecr 0,nop,wscale 7], length 0
23:48:02.754094 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336022392 ecr 0,nop,wscale 7], length 0
23:48:02.754683 IP 172.16.10.40.12345 > 172.16.3.14.8603: Flags [S.], seq 697602971, ack 479022249, win 14480, options [mss 1460,nop,nop,TS val 4044659641 ecr 2336022392], length 0
23:48:02.754742 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [.], ack 697602972, win 14600, options [nop,nop,TS val 2336022392 ecr 4044659641], length 0

Next, iptables is enabled on the server again and the client sends a data packet; iptables is disabled before the 15 timeouts are used up:

23:48:11.944170 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336031582 ecr 4044659641], length 1
23:48:12.145036 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336031783 ecr 4044659641], length 1
23:48:12.547084 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336032185 ecr 4044659641], length 1
23:48:13.351106 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336032989 ecr 4044659641], length 1
23:48:14.959080 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336034597 ecr 4044659641], length 1
23:48:18.175092 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336037813 ecr 4044659641], length 1
23:48:24.607088 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336044245 ecr 4044659641], length 1

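With timestamps enabled, the FALLBACK mechanism that resets the RTO to 3 seconds does not take effect: the first data retransmission arrives after about 200 ms.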


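Linux's tweaks to the RTO calculation

The actual Linux implementation of the RTO calculation differs from the RFC in places. If you follow the RFC to the letter, your estimate of the real RTO will go astray.

1. As the previous section shows, Linux sets the minimum RTO to 200 ms (on Ubuntu it is even 50 ms, whereas the RFC recommends 1 second) and the maximum to 120 seconds (the RFC requires at least 60 seconds).

2. From my reading of the Linux code, when the RTT fluctuates sharply the Linux implementation dampens the effect of abrupt RTT changes, so that the RTO curve is smoother.

This shows up in two tweaks: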

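Tweak 1

When the following condition holds

$$SRTT - R' > RTTVAR$$

the new sample R' has swung too far: it differs from the smoothed RTT by more than RTTVAR.

Linux then uses

$$RTTVAR = \left(1-\frac{\beta}{8}\right)\cdot RTTVAR + \frac{\beta}{8}\cdot|SRTT - R'|$$

whereas the RFC specifies

$$RTTVAR = (1-\beta)\cdot RTTVAR + \beta\cdot|SRTT - R'|$$

Compared with the RFC, the smoothing coefficient is multiplied by 1/8, so R' has less influence on RTTVAR. RTTVAR becomes smoother, and so does the RTO.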

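Tweak 2

When RTTVAR decreases, it is smoothed once more so that the RTO does not drop too sharply and produce a steep cliff in the curve:

$$\text{if}\ \max(RTTVAR',\ RTO\_MIN) < RTTVAR:\qquad RTTVAR = \frac{3}{4}\cdot RTTVAR + \frac{1}{4}\cdot\max(RTTVAR',\ RTO\_MIN)$$

Here RTTVAR' is the value computed from the current RTT. After the lower bound (RTO_MIN) is applied, it is compared with the RTTVAR from the previous RTT; if it has decreased, a coefficient of 1/4 is used to smooth the change.

Why is the increasing case not treated the same way? I believe it is because a larger RTO does no harm, whereas a large drop may cause spurious retransmissions (for this term, see the RFC mentioned above).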
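To make the two tweaks concrete, here is a simplified floating-point model in Python. It is a sketch of the behaviour described above, not the kernel's fixed-point code. The class name `LinuxLikeEstimator` and its field names are hypothetical, `mdev` plays the role of RTTVAR', `var_term` is the variance term that is added to SRTT, and two simplifications are assumed: tweak 2 is applied on every sample, and the comparison is made on the 4*RTTVAR scale used in the final RTO formula.

```python
class LinuxLikeEstimator:
    """Simplified float model of the two tweaks (not the kernel's actual code)."""

    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self, rto_min):
        self.rto_min = rto_min
        self.srtt = None
        self.mdev = None      # smoothed deviation (RTTVAR' in the text)
        self.var_term = None  # variance term added to SRTT to form the RTO

    def update(self, r):
        """Feed one RTT sample r (seconds) and return the resulting RTO."""
        if self.srtt is None:
            self.srtt = r
            self.mdev = r / 2
            self.var_term = max(4 * self.mdev, self.rto_min)
            return self.srtt + self.var_term

        err = self.srtt - r  # positive when the sample is below the smoothed RTT
        # Tweak 1: a large downward jump only moves the deviation with gain beta/8
        gain = self.BETA / 8 if err > self.mdev else self.BETA
        self.mdev = (1 - gain) * self.mdev + gain * abs(err)
        self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r

        # Tweak 2: the variance term rises at once but falls only by a quarter
        candidate = max(4 * self.mdev, self.rto_min)
        if candidate < self.var_term:
            self.var_term -= (self.var_term - candidate) / 4
        else:
            self.var_term = candidate
        return self.srtt + self.var_term


est = LinuxLikeEstimator(rto_min=0.2)
est.update(0.1)   # first sample: SRTT = 0.1, deviation = 0.05
est.update(0.01)  # sharp drop: tweak 1 applies
print(round(est.mdev, 5))  # 0.05125 (the RFC gain would give 0.06)
```

Feeding the model a sequence of jittery samples and plotting the returned values is a quick way to compare its curve against a plain RFC 6298 estimator.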


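Adjusting the RTO by hand

Back to the original question: can the RTO be shortened, and how can its value be predicted from what Linux actually does?

The initial RTO (including FALLBACK) clearly cannot be changed; it is hard-coded.

The RTO outside the three-way handshake, however, can be predicted.

For the prediction, assume the network is stable and the RTT is a constant R (otherwise tweaks 1 and 2 make the calculation extremely complicated).

Then SRTT is always R and RTTVAR is always 0.5R.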

$$\text{if}\ RTO\_MIN < 4 \cdot RTTVAR:\qquad RTO = SRTT + 4 \cdot RTTVAR = R + 4 \cdot 0.5R = 3R$$

Otherwise

$$RTO = SRTT + RTO\_MIN = R + RTO\_MIN$$

So changing RTO_MIN alone is enough to affect the RTO significantly.

Setting RTO_MIN

RTO_MIN is set through ip route.

# ping www.baidu.com
PING www.a.shifen.com (180.97.33.107) 56(84) bytes of data.
64 bytes from 180.97.33.107: icmp_seq=1 ttl=51 time=30.8 ms
64 bytes from 180.97.33.107: icmp_seq=2 ttl=51 time=29.9 ms

After obtaining Baidu's IP:

# ip route add 180.97.33.108/32 via 172.16.3.1 rto_min 20
# nc www.baidu.com 80
# ss -eipn '( dport = :www )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 172.16.3.14:14149 180.97.33.108:80 users:(("nc",7162,3)) ino:48057454 sk:ffff88023905adc0
sack cubic wscale:7,7 rto:81 rtt:27/13.5 cwnd:10 send 4.3Mbps rcv_space:14600

Because RTO_MIN < 2R (20 < 54), RTO = 3R = 27 * 3 = 81 (ms).

On an internal network the RTT is very small:

# ip route add 172.16.3.16/32 via 172.16.3.1 rto_min 20
# nc 172.16.3.16 22
SSH-2.0-OpenSSH_5.3
# ss -eipn '( dport = :22 )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 172.16.3.14:57578 172.16.3.16:22 users:(("nc",7272,3)) ino:48059707 sk:ffff88023b7c7000
sack cubic wscale:7,7 rto:21 rtt:1/0.5 ato:40 cwnd:10 send 116.8Mbps rcv_space:14600

Because RTO_MIN > 2R (20 > 2), RTO = R + RTO_MIN = 1 + 20 = 21 (ms).
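The two results above can be reproduced with a small Python helper. This is a sketch under the article's assumption of a constant RTT (SRTT = R, RTTVAR = R/2); the function name `predict_rto_ms` is made up for illustration.

```python
def predict_rto_ms(rtt_ms, rto_min_ms):
    """Estimate the RTO in ms for a constant RTT, with SRTT = R and RTTVAR = R/2."""
    srtt = float(rtt_ms)
    rttvar = srtt / 2
    return srtt + max(rto_min_ms, 4 * rttvar)


print(predict_rto_ms(27, 20))  # 81.0, matches rto:81 for the Baidu connection
print(predict_rto_ms(1, 20))   # 21.0, matches rto:21 on the internal network
```

With the default 200 ms minimum, the same helper gives 227 ms for the first case, which shows how much of the RTO on a low-latency path comes from RTO_MIN.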

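If you are confident in the whole internal network, you can also leave out the destination IP and apply the setting to all connections, as follows: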

ip route change dev eth0 rto_min 20ms

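Summary

1. Linux's retransmission timeout implementation broadly follows the RFC, with some tweaks:

The RFC has a single initial RTO of 1 second. Linux uses an RTO of 1 second for packets in the three-way handshake and starts from 0.2 seconds for all other packets.

Because the algorithm in the RFC is not perfect, the actual Linux implementation dampens abrupt RTT changes when the RTT fluctuates sharply, which makes the RTO curve smoother.

2. The SYN retransmission time cannot be adjusted without recompiling the kernel, but the retransmission time of push (data) packets can.

3. In a fairly stable network with RTT R, if the configured minimum RTO is RTO_MIN, then RTO = 3R when RTO_MIN < 2R, and RTO = R + RTO_MIN otherwise.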
