Just finished TCP tuning for SL6, kernel 2.6.32-358.14.1.el6.x86_64, so I am sharing the experience here. In the past I played with several types of 10G NIC, all on SL5, and only some of them survived my tests; the rest failed with either poor performance or data corruption during multiple-stream transfers. To be noted, my test is a multiple-stream test for storage nodes that receive and deliver data over a large range of RTT (0.1 to 300 ms), with clients mixing 1G and 10G NICs.

In my recent test I used a node with 32 GB of memory, a Mellanox 10G NIC and 12 CPUs; it was an SL5 node just upgraded to SL6. I mounted two LUNs so it had enough I/O bandwidth for the test.

The first driver I tested was version 2.0, which came with SL6.4. It was not successful: it crashed the kernel within 3 minutes with the following error in the kernel log.

kernel: swapper: page allocation failure. order:2, mode:0x4020
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.14.1.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8112c197>] ? __alloc_pages_nodemask+0x757/0x8d0
kernel: [<ffffffff8147fa38>] ? ip_local_deliver+0x98/0xa0
kernel: [<ffffffff811609ea>] ? alloc_pages_current+0xaa/0x110
kernel: [<ffffffffa01efaa7>] ? mlx4_en_alloc_frags+0x57/0x330 [mlx4_en]
kernel: [<ffffffff8144aada>] ? napi_frags_finish+0x9a/0xb0
kernel: [<ffffffffa01f02df>] ? mlx4_en_process_rx_cq+0x55f/0x990 [mlx4_en]
kernel: [<ffffffffa01f074f>] ? mlx4_en_poll_rx_cq+0x3f/0x80
...

Then I tried driver version 1.5.10, which also generated some memory allocation errors, but with some further tuning it passed my stress tests. Performance is also very good. Here is my sysctl.conf:

# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
vm.min_free_kbytes = 131072
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_max_syn_backlog = 8192
net.core.optmem_max = 33554432
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.core.rmem_default = 33554432
net.core.wmem_default = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_mem = 6672016 6682016 7185248

To be noted: I increased tcp_mem to give TCP more memory, because my data server is mainly used for data transfer. If your server is also doing something else, you should probably lower these numbers to leave memory for other applications. I did notice that, under heavy stress testing, the default configuration could cause memory allocation errors.

As for SACK and timestamps, I disabled them both to save CPU resources. After kernel 2.6.25 there are lots of patches for SACK that avoid excessive CPU usage; I switched them off mainly because I did not see a significant difference under my stress tests.

I set tcp_max_syn_backlog and netdev_max_backlog to higher values mainly because there can be short periods of high-rate data taking. I did not raise txqueuelen (the default is 1000), and I did not turn the Mellanox adaptive-rx off. You could try those if your server's traffic pattern changes all the time, for example quiet at some times and very busy at others; a sketch of the corresponding commands follows below.
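For reference, here is a minimal sketch of the runtime commands that go with the tuning above: checking which mlx4_en driver version is actually loaded, reloading sysctl.conf, and the two optional knobs (txqueuelen and adaptive-rx) that I left at their defaults. The interface name eth2 and the txqueuelen value 10000 are only placeholders, not settings from my test; substitute your own port and whatever queue length fits your traffic.

# check which mlx4_en driver version is actually loaded (eth2 is a placeholder)
modinfo mlx4_en | grep '^version'
ethtool -i eth2

# apply the sysctl.conf settings listed above
sysctl -p

# optional: raise the transmit queue length from the default 1000 (10000 is an example value)
ip link set dev eth2 txqueuelen 10000

# optional: turn off adaptive RX coalescing on the Mellanox port, then verify
ethtool -C eth2 adaptive-rx off
ethtool -c eth2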
Here are some references I think are all useful, with very good explanations:

http://www.acc.umu.se/~maswan/linux-netperf.txt
http://fasterdata.es.net/host-tuning/linux/
http://en.wikipedia.org/wiki/TCP_window_scale_option
http://www.psc.edu/index.php/networking/641-tcp-tune
http://man7.org/linux/man-pages/man7/tcp.7.html
https://www.frozentux.net/ipsysctl-tutorial/ipsysctl-tutorial.html#TCPVARIABLES
http://en.wikipedia.org/wiki/Transmission_Control_Protocol
http://www.linuxvox.com/2009/11/what-is-the-linux-kernel-parameter-tcp_low_latency
http://www.ibm.com/developerworks/library/l-tcp-sack/

More references for other platforms