Posts Tagged ‘ISPs’

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

On busy servers you may find kernel messages like the following in /var/log/syslog or in the dmesg output:

nf_conntrack: table full, dropping packet

They usually repeat many times in a row:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two types of servers on which I've encountered this message:

1. Xen / OpenVZ VPS (Virtual Private Servers)
2. ISPs – Internet providers with heavy-traffic NAT network routers
 

I. What is the meaning of the nf_conntrack: table full, dropping packet error message

In short, the message appears when the number of tracked connections reaches the kernel's configured maximum (nf_conntrack_max).
The usual cause is heavy traffic passing through the server or, very often, a DoS or DDoS (Distributed Denial of Service) attack. Sometimes the error results from bad capacity planning (wrong estimates of the expected traffic load) or simply from a sysadmin error.

– Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

– Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

– Check the current number of active nf_conntrack connections

To see how many connections are currently being tracked on the system:

linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742
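
To see at a glance how close the table is to its limit, the count and the maximum can be combined into a percentage. This is only a minimal sketch; the function name conntrack_usage_pct is my own invention and not part of any tool:

```shell
# Hypothetical helper: print how full the conntrack table is, in percent.
# Arguments: current entry count, configured maximum.
conntrack_usage_pct() {
    count=$1
    max=$2
    echo $(( count * 100 / max ))
}

# With the numbers shown above:
conntrack_usage_pct 12742 65536    # prints 19

# On a live system (assumes the nf_conntrack module is loaded):
# conntrack_usage_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                     "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

A value creeping towards 100 is a hint that the "table full" messages are about to start.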

The entries counted here are created dynamically for each new successful TCP / IP NAT-ted connection. By the way, on systems that work normally, without the dmesg log being flooded by the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers that hit the nf_conntrack: table full, dropping packet error, lsmod shows extra modules related to nf_conntrack loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Completely remove nf_conntrack support if it is not really necessary

It is good practice to limit, or avoid entirely, the use of iptables NAT rules; this keeps the kernel log from being flooded with these messages and keeps the system from dropping connections.

Another option is to completely remove all modules related to nf_conntrack, iptable_nat and nf_nat.
To remove nf_conntrack support from the running kernel, for instance when the system is not used for Network Address Translation, use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod nf_nat
/sbin/rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure not to use any iptables -t nat .. rules. Even an attempt to list the NAT rules with iptables -t nat -L -n will make the kernel load the nf_conntrack modules again.

By the way, the nf_conntrack: table full, dropping packet. message can be observed on all GNU / Linux distributions, so it is not a bug or kernel customization of one particular distribution.
 

III. Fixing the nf_conntrack … dropping packets error

– One temporary fix, if you need to keep your iptables NAT rules, is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising nf_conntrack_max doesn't guarantee that things will run smoothly from now on.
However, on many servers that are not so heavily loaded with traffic, just raising net.netfilter.nf_conntrack_max to a high enough value (such as 131072) is enough to resolve the hassle.

– Increasing the size of nf_conntrack hash-table

The hash table size (hashsize), which sets the number of buckets holding the lists of conntrack entries, should be increased proportionally whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4
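
The divide-by-four rule is easy to script so that both values always change together. A small sketch, with a made-up function name (hashsize_for):

```shell
# Hypothetical helper: derive hashsize from nf_conntrack_max using the
# "divide by 4" rule quoted above.
hashsize_for() {
    echo $(( $1 / 4 ))
}

hashsize_for 131072    # prints 32768

# Possible use as root (assumes the nf_conntrack module is loaded):
# max=131072
# sysctl -w net.netfilter.nf_conntrack_max=$max
# hashsize_for "$max" > /sys/module/nf_conntrack/parameters/hashsize
```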

– To make the changes permanent:

a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_max = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysctl -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable; in my experience, raising it to too high a value (especially on Xen-patched kernels) could freeze the system.
Raising it too high can also freeze a regular Linux server running on old hardware.

– For diagnosing nf_conntrack there is the /proc/sys/net/netfilter directory, which the kernel keeps in memory. There you can find dynamically updated values that give information about nf_conntrack operation in "real time":

linux:~# cd /proc/sys/net/netfilter
linux:/proc/sys/net/netfilter# ls -al nf_log/

total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to protect the server against DoS attacks

Generally, the default values of the nf_conntrack_* time-outs are unnecessarily large.
Therefore, under large flows of traffic, even if you increase nf_conntrack_max you can still shortly end up with a full nf_conntrack table and dropped server connections. To prevent this, check and decrease the other nf_conntrack connection tracking timeout values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. As you can see, net.netfilter.nf_conntrack_generic_timeout is quite high: 600 secs (10 minutes).
A value like this means that any NAT-ted connection that is not responding can stay hanging for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If these values are not lowered, the server will be an easy target for anyone who would like to flood it with excessive connections. Once this happens, the server will quickly reach even the raised value of net.nf_conntrack_max, and the initial connection dropping will occur again …
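
Large timeouts are hard to judge when written in seconds. As a convenience, here is a tiny conversion sketch; secs_to_human is a name I made up for illustration:

```shell
# Hypothetical helper: convert a timeout in seconds to days/hours/minutes.
secs_to_human() {
    s=$1
    echo "$(( s / 86400 ))d $(( s % 86400 / 3600 ))h $(( s % 3600 / 60 ))m"
}

secs_to_human 432000   # prints 5d 0h 0m
secs_to_human 600      # prints 0d 0h 10m
secs_to_human 120      # prints 0d 0h 2m
```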

With all that said, to protect the server from malicious users situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to something like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout=120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=54000

These timeouts should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least a few days, make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

Linux Bond network interfaces to merge multiple interfaces ISPs traffic – Combine many interfaces NIC into one on Debian / Ubuntu / CentOS / Fedora / RHEL Linux

Tuesday, December 16th, 2014

Bonding network traffic (link aggregation), or NIC teaming, is used to increase connection throughput and to provide redundancy for services / applications in case some of the network connections (eth interfaces) fail. Network bonding is mostly used by large network providers (ISPs), university labs and other big network infrastructures, or even by enthusiasts running a home server who want ~99% or better connectivity to the internet by bonding a few Internet providers' links into a single bonded network interface. One of the most common uses of link aggregation nowadays is, of course, in Cloud environments.

Bonding network traffic is a must-know (and daily-use) skill for sysadmins of everything from small company office networks up to large professional distributed computing networks. Novice GNU / Linux sysadmins have probably never heard of it, and sooner or later they will have to, so I've created this article as a quick and dirty guide to configuring Linux bonding on the most commonly used Linux distributions.

It is assumed that the server where you need network bonding configured has at least 2 PCI Gigabit NICs with a Linux hardware driver supporting Jumbo Frames, and a relatively fresh, up-to-date distro: Debian Linux >=6.0.*, Ubuntu 10+, CentOS 6.4, RHEL 5.1, SuSE etc.
 

1. Bond Network ethernet interfaces on Debian / Ubuntu and Deb based distributions

To make network bonding possible on Debian and derivatives you need to install support for it through ifenslave package (command).

apt-cache show ifenslave-2.6|grep -i descript -A 8
Description: Attach and detach slave interfaces to a bonding device
 This is a tool to attach and detach slave network interfaces to a bonding
 device. A bonding device will act like a normal Ethernet network device to
 the kernel, but will send out the packets via the slave devices using a simple
 round-robin scheduler. This allows for simple load-balancing, identical to
 "channel bonding" or "trunking" techniques used in switches.
 .
 The kernel must have support for bonding devices for ifenslave to be useful.
 This package supports 2.6.x kernels and the most recent 2.4.x kernels.

 

apt-get --yes install ifenslave-2.6

 

A bonding interface works by creating a "virtual" network interface at the Linux kernel level; it sends and receives packets via its slave devices, by default using a simple round-robin scheduler. This makes possible a very simple form of network load balancing, also known as "channel bonding" or "trunking", which is supported by all intelligent network switches.

Below is a text diagram showing a tiny Linux office network router configured to bond ISP interfaces for increased throughput:

 

Internet
 |                  204.58.3.10 (eth0)
ISP Router/Firewall 10.10.10.254 (eth1)
   
                              | -----+------ Server 1 (Debian FTP file server w/ eth0 & eth1) 10.10.10.1
      +------------------+ --- |
      | Gigabit Ethernet       |------+------ Server 2 (MySQL) 10.10.10.2
      | with Jumbo Frame       |
      +------------------+     |------+------ Server 3 (Apache Webserver) 10.10.10.3
                               |
                               |------+-----  Server 4 (Squid Proxy / Qmail SMTP / DHCP) 10.10.10.4
                               |
                               |------+-----  Server 5 (Nginx CDN static content Webserver) 10.10.10.5
                               |
                               |------+-----  WINDOWS Desktop PCs / Printers & Scanners, Other network devices 

 

Next, configure the just-installed ifenslave bonding:
 

vim /etc/modprobe.d/bonding.conf

alias bond0 bonding
  options bonding mode=0 arp_interval=100 arp_ip_target=10.10.10.254,10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5


Where:

  1. mode=0 : Set the bonding policy to balance-rr (round robin). This is the default mode; it provides load balancing and fault tolerance.
  2. arp_interval=100 : Set the ARP link monitoring frequency to 100 milliseconds. Without this option you will get various warnings when starting bond0 via /etc/network/interfaces
  3. arp_ip_target=10.10.10.254,10.10.10.2, … : Use 10.10.10.254 (the router IP) and the 10.10.10.2-5 addresses as ARP monitoring peers when arp_interval is > 0. They are used to determine the health of the link to the targets. Multiple IP addresses must be separated by commas, without spaces. At least one IP address must be given (usually I set it to the router IP) for ARP monitoring to function. The maximum number of targets that can be specified is 16.
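
Since the target list must be comma separated and may hold at most 16 addresses, it can be handy to build the value with a small helper instead of typing it by hand. This is just a sketch; arp_ip_target_value is a hypothetical name of mine:

```shell
# Hypothetical helper: build an arp_ip_target value from a list of IPs.
# Joins the arguments with commas (no spaces) and refuses more than 16.
arp_ip_target_value() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 16 ]; then
        echo "need between 1 and 16 targets, got $#" >&2
        return 1
    fi
    ( IFS=,; echo "$*" )
}

arp_ip_target_value 10.10.10.254 10.10.10.2 10.10.10.3
# prints 10.10.10.254,10.10.10.2,10.10.10.3
```

The subshell around IFS=, keeps the changed field separator from leaking into the rest of the script.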

Next, to make bonding work it's necessary to load the bonding kernel module:

modprobe -v bonding mode=0 arp_interval=100 arp_ip_target=10.10.10.254,10.10.10.2,10.10.10.3,10.10.10.4,10.10.10.5

 

Loading the bonding module should spit some good output in /var/log/messages (check it out with tail -f /var/log/messages)

Now, to make bonding active it is necessary to reload networking. This is extremely risky if you don't have some kind of out-of-band console access (Web / Java console or VPN access such as IPKVM / ILO / IDRAC). So before reloading the network, either schedule a cron job that restarts the network with the new settings and reverts to the old configuration if the network becomes inaccessible, or make sure you have physical access to the server console.

Whatever the case, make sure you have a backup:

 cp /etc/network/interfaces /etc/network/interfaces.bak
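
The automatic revert mentioned above could look roughly like the function below, meant to be called from a cron or at job a few minutes after the restart. It is only a sketch under my own assumptions: the function name, the idea of passing the reachability check as a command string, and the ping options are not from any standard tool.

```shell
# Hypothetical helper: if CHECK_CMD fails, copy BACKUP over CONFIG.
# Prints "kept" or "reverted"; returns 0 only when a revert happened.
revert_if_unreachable() {
    check_cmd=$1
    backup=$2
    config=$3
    if $check_cmd >/dev/null 2>&1; then
        echo kept
        return 1
    fi
    cp "$backup" "$config" && echo reverted
}

# Possible use (gateway IP taken from the diagram above):
# revert_if_unreachable "ping -c 3 -W 2 10.10.10.254" \
#     /etc/network/interfaces.bak /etc/network/interfaces \
#     && /etc/init.d/networking restart
```

Returning success only on a revert lets the caller chain the network restart with &&, so networking is restarted a second time only when the old config was actually put back.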

vim /etc/network/interfaces

############ WARNING ####################
# You do not need an "iface eth0" nor an "iface eth1" stanza.
# Setup IP address / netmask / gateway as per your requirements.
#######################################
auto lo
iface lo inet loopback
 
# The primary network interface
auto bond0
iface bond0 inet static
    address 10.10.10.1
    netmask 255.255.255.0
    network 10.10.10.0
    gateway 10.10.10.254
    slaves eth0 eth1
    # jumbo frame support
    mtu 9000
    # Load balancing and fault tolerance
    bond-mode balance-rr
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
    dns-nameservers 10.10.10.254
    dns-search nixcraft.net.in

 


As you can see from the config, there are some bond-specific configuration variables that can be tuned; in some cases they can have a positive or negative impact on network throughput. The bonding interface has slaves (these are all the other ethXX interfaces). Bonded traffic is available via one single interface. Such a configuration is great for webhosting providers with multiple hosted sites, since hosting a thousand websites on the same server, or one single big news site, requires a lot of bandwidth and of course requires redundancy (a guarantee that it is up 24/7 if possible).

Here is what the config options stand for:

 
  • mtu 9000 : Set MTU size to 9000. This is related to Jumbo Frames.
  • bond-mode balance-rr : Set the bonding mode to balance-rr ("load balancing and fault tolerance"). See below for more information.
  • bond-miimon 100 : Set the MII link monitoring frequency to 100 milliseconds. This determines how often the link state of each slave is inspected for link failures.
  • bond-downdelay 200 : Set the time, to 200 milliseconds, to wait before disabling a slave after a link failure has been detected. This option is only valid for the bond-miimon.
  • bond-updelay 200 : Set the time, to 200 milliseconds, to wait before enabling a slave after a link recovery has been detected. This option is only valid for the bond-miimon.
  • dns-nameservers 10.10.10.254 : Use 10.10.10.254 as DNS server.
  • dns-search nixcraft.net.in : Use nixcraft.net.in as default host-name lookup (optional).

To get the best network throughput you might want to play with different bonding policies. To learn more and get the list of all bonding policies, check out the Linux Ethernet Bonding Driver HOWTO
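
For quick reference, the bonding driver accepts each policy either by number or by name. The lookup below is a small sketch of that mapping; the function name bond_mode_name is mine:

```shell
# Hypothetical helper: translate a numeric bonding mode to its name.
bond_mode_name() {
    case $1 in
        0) echo balance-rr ;;
        1) echo active-backup ;;
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;
        5) echo balance-tlb ;;
        6) echo balance-alb ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

bond_mode_name 0   # prints balance-rr
bond_mode_name 1   # prints active-backup
```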

To make the new bonding active restart the network:
 

/etc/init.d/networking stop
sleep 5;
/etc/init.d/networking start


2. Fedora / CentOS / RHEL Linux network Bond

Configuring eth0, eth1, eth2 into a single bond0 virtual network device takes a few easy steps:

a) Create following bond0 configuration file:
 

vim /etc/sysconfig/network-scripts/ifcfg-bond0

 

DEVICE=bond0
IPADDR=10.10.10.20
NETWORK=10.10.10.0
NETMASK=255.255.255.0
GATEWAY=10.10.10.1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes


b) Modify the ifcfg-eth0 and ifcfg-eth1 files in /etc/sysconfig/network-scripts/

– Edit ifcfg-eth0

vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

– Edit ifcfg-eth1

vim /etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none


c) Load bond driver through modprobe.conf

vim /etc/modprobe.conf

alias bond0 bonding
options bond0 mode=balance-alb miimon=100


Manually load the bonding kernel driver to make it effective without a server reboot:
 

modprobe bonding

d) Restart networking to load the just-configured bonding
 

service network restart


3. Testing Bond Success / Fail status

If you administer a Linux server with a bonded interface, it is useful to check the bond's link status periodically:

cat /proc/net/bonding/bond0
 

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1e:0b:d6:6c:8f

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1e:0b:d6:6c:8c

To check which interfaces are bonded you can use (on older Linux systems):
 

/sbin/ifconfig -a


If ifconfig does not return the IP addresses / interfaces of the teamed-up eths, check the NICs / IPs with:

/bin/ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 127.0.0.2/8 brd 127.255.255.255 scope host secondary lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:1e:0b:d6:6c:8c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:1e:0b:d6:6c:8c brd ff:ff:ff:ff:ff:ff
7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:1e:0b:d6:6c:8c brd ff:ff:ff:ff:ff:ff
    inet 10.239.15.173/27 brd 10.239.15.191 scope global bond0
    inet 10.239.15.181/27 brd 10.239.15.191 scope global secondary bond0:7156web
    inet6 fe80::21e:bff:fed6:6c8c/64 scope link
       valid_lft forever preferred_lft forever


In case of a bonding interface failure you will get output like:

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:xx:yy:zz:tt:31
Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:xx:yy:zz:tt:30
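
Output in this format is easy to check from a monitoring script. The awk sketch below prints every slave whose MII Status is not "up"; the function name down_slaves is hypothetical, and it assumes the field layout shown in the listings above:

```shell
# Hypothetical helper: read /proc/net/bonding/bondX text on stdin and print
# every slave interface whose MII Status is not "up".
down_slaves() {
    awk '
        /^Slave Interface:/ { iface = $3 }
        /^MII Status:/ && iface != "" && $3 != "up" { print iface }
    '
}

# Typical use:
# down_slaves < /proc/net/bonding/bond0
```

The first MII Status line belongs to the bond itself, which is why lines seen before any Slave Interface line are skipped.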

Failure to start / stop bonding is also logged in /var/log/messages, so it's a good idea to check there too once the bond is launched:
 

tail -f /var/log/messages
Dec  15 07:18:15 nas01 kernel: [ 6271.468218] e1000e: eth1 NIC Link is Down
Dec 15 07:18:15 nas01 kernel: [ 6271.548027] bonding: bond0: link status down for interface eth1, disabling it in 200 ms.
Dec  15 07:18:15 nas01 kernel: [ 6271.748018] bonding: bond0: link status definitely down for interface eth1, disabling it

On bond failure you will get something like:

Dec  15 04:19:15 micah01 kernel: [ 6271.468218] e1000e: eth1 NIC Link is Down
Dec  15 04:19:15 micah01 kernel: [ 6271.548027] bonding: bond0: link status down for interface eth1, disabling it in 200 ms.
Dec  15 04:19:15 micah01 kernel: [ 6271.748018] bonding: bond0: link status definitely down for interface eth1, disabling it


4. Adding and removing interfaces to the bond interactively
 

You can set the mode through sysfs virtual filesystem with:

echo active-backup > /sys/class/net/bond0/bonding/mode

If you want to try adding an ethernet interface to the bond, type:

echo +ethN > /sys/class/net/bond0/bonding/slaves

To remove an interface type:

echo -ethN > /sys/class/net/bond0/bonding/slaves


In case you're wondering how many bonding devices you can have: the "sky is the limit". The number is limited only by the number of NIC cards the Linux kernel / distro supports and, of course, by how many physical NIC slots your server has.

To monitor (in real time) the adding / removal of interfaces to the bond use:
 

watch -n 1 'cat /proc/net/bonding/bond0'

 

Boost local network performance (Increase network throughput) by enabling Jumbo Frames on GNU / Linux

Saturday, March 10th, 2012

Jumbo Frames boost local network performance in GNU / Linux

So what are Jumbo Frames, and why, when and how can they increase network throughput on Linux?

Jumbo Frames are Ethernet frames with more than 1500 bytes of payload. They can carry up to 9000 bytes of payload. Many Gigabit switches and network cards support them.
Jumbo frames are standard on many educational networks like AARNET. Unfortunately, most commercial ISPs don't support them, so enabling Jumbo frames will rarely increase throughput for transfers over the internet.
Hopefully, in the years to come, with the constant increase of bandwidth and improvement of connectivity, jumbo frames will be supported by most ISPs as well.
Where Jumbo frames support is just great is on small local networks: home networks and company / corporate office intranets.

Thus enabling Jumbo Frames is absolutely essential for "local" ethernet networks where large file transfers occur frequently. Such networks often carry high-quality (e.g. HD) video or audio streaming from servers running file sharing services like Samba, local FTP sites, webservers etc.

Another advantage of enabling jumbo frames is reduced general server overhead and lower CPU load when transferring large or enormous files. Therefore having jumbo frames enabled on office network routers running GNU / Linux or any other *nix OS is vital.

Jumbo Frames are supported in the GNU / Linux kernel since version 2.6.17; in the earlier 2.4.x series they were possible through external third-party kernel patches.

1. Manually increase MTU to 9000 with ifconfig to enable Jumbo frames

debian:~# /sbin/ifconfig eth0 mtu 9000

The default MTU on most (if not all) GNU / Linux systems is 1500. To check the currently set MTU with ifconfig:

linux:~# /sbin/ifconfig eth0|grep -i mtu
UP BROADCAST MULTICAST MTU:1500 Metric:1

To take advantage of Jumbo Frames, all that has to be done is to increase the default Maximum Transmission Unit from 1500 to 9000.

For those who don't know, MTU is the largest packet size that can be transferred over the network in one piece. MTU is measured in bytes. If data to be transferred over the network exceeds the MTU of, let's say, 1500 bytes, it is chopped up and transferred in several packets of at most 1500 bytes each.
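
To get a feeling for the difference, here is some rough arithmetic on how many packets a transfer needs at each MTU. It is a simplified sketch: the function name packets_needed is mine, and it assumes 40 bytes of IPv4 + TCP headers per packet with no options, so real numbers will differ slightly.

```shell
# Hypothetical helper: estimate how many TCP/IPv4 packets are needed to move
# BYTES of data with a given MTU, assuming 40 header bytes per packet.
packets_needed() {
    bytes=$1
    payload=$(( $2 - 40 ))
    echo $(( (bytes + payload - 1) / payload ))   # integer ceiling division
}

packets_needed 1048576 1500   # 1 MiB at MTU 1500: prints 719
packets_needed 1048576 9000   # 1 MiB at MTU 9000: prints 118
```

Roughly six times fewer packets means six times fewer headers to build and interrupts to handle, which is where the CPU savings come from.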

MTUs differ between network technologies. Just for info, here are the default MTUs of the main network types existing today:
 

  • 16 MBit/Sec Token Ring – default MTU (17914)
  • 4 Mbits/Sec Token Ring – default MTU (4464)
  • FDDI – default MTU (4352)
  • Ethernet – def MTU (1500)
  • IEEE 802.3/802.2 standard – def MTU (1492)
  • X.25 (dial up etc.) – def MTU (576)
  • Jumbo Frames – def max MTU (9000)

Setting the MTU to 9000 to enable Jumbo Frames is done with:

linux:~# /sbin/ifconfig eth0 mtu 9000

If the command returns nothing, this most likely means the server can now communicate on eth0 with an MTU of 9000, and therefore network throughput will be better. Otherwise, if the network card or its driver does not support it (for example, it is not a gigabit one), the command will return the error:

SIOCSIFMTU: Invalid argument

2. Enabling Jumbo Frames on Debian / Ubuntu etc. "the Debian way"

For a one-time change, the MTU can again be raised manually with ifconfig:

debian:~# /sbin/ifconfig eth0 mtu 9000

a.) Jumbo Frames on ethernet interfaces with a static IP address assigned

Edit /etc/network/interfaces so that each interface on which you want Jumbo Frames has a record similar to:

iface eth0 inet static
address 192.168.0.5
network 192.168.0.0
gateway 192.168.0.254
netmask 255.255.255.0
mtu 9000

For each of the other interfaces (eth1, eth2 etc.), add a chunk similar to the one above, changing the IPs, gateway and netmask.

If the server has two gigabit cards (eth0, eth1) supporting Jumbo frames, add to /etc/network/interfaces :

iface eth0 inet static
address 192.168.0.5
network 192.168.0.0
gateway 192.168.0.254
netmask 255.255.255.0
mtu 9000

iface eth1 inet static
address 192.168.0.6
network 192.168.0.0
gateway 192.168.0.254
netmask 255.255.255.0
mtu 9000

b.) Jumbo Frames on ethernet interfaces with dynamic IP obtained via DHCP

Again in /etc/network/interfaces put:

auto eth0
iface eth0 inet dhcp
post-up /sbin/ifconfig eth0 mtu 9000

3. Setting Jumbo Frames on Fedora / CentOS / RHEL "the Redhat way"

Enabling jumbo frames on all Gigabit lan interfaces (eth0, eth1, eth2 …) in Fedora / CentOS / RHEL is done through files:
 

  • /etc/sysconfig/network-scripts/ifcfg-eth0
  • /etc/sysconfig/network-scripts/ifcfg-eth1

etc. …
Append at the end of each respective config:

MTU=9000

[root@fedora ~]# echo 'MTU=9000' >> /etc/sysconfig/network-scripts/ifcfg-eth


A quick way to set the Maximum Transmission Unit to 9000 for all network interfaces on Redhat based distros is by executing the following loop:

[root@centos ~]# for i in /etc/sysconfig/network-scripts/ifcfg-eth*; do echo 'MTU=9000' >> "$i"
done
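
Note that the loop appends a new MTU line every time it is run. If you expect to run it more than once, a variant that replaces an existing MTU= line instead of adding another might be safer. A sketch, with a function name (set_mtu_in_file) that I made up:

```shell
# Hypothetical helper: set MTU in one ifcfg-style file without adding
# duplicate MTU= lines. Assumes GNU sed (for -i), as shipped on Linux.
set_mtu_in_file() {
    file=$1
    mtu=$2
    if grep -q '^MTU=' "$file"; then
        sed -i "s/^MTU=.*/MTU=$mtu/" "$file"
    else
        echo "MTU=$mtu" >> "$file"
    fi
}

# Possible use on a Redhat based system:
# for f in /etc/sysconfig/network-scripts/ifcfg-eth*; do
#     set_mtu_in_file "$f" 9000
# done
```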

P.S.: Be sure that all your interfaces support MTU=9000; otherwise setting the MTU will fail with the SIOCSIFMTU: Invalid argument error.
The above loop is to be used only if you have a group of identical machines with LAN cards supporting Gigabit networks and loaded kernel drivers supporting MTU up to 9000.

Some Intel and Realtek Gigabit cards support only a maximum MTU of 7000, 7500 etc., so if you own a card like this, check the maximum MTU the card supports and set it in the LAN device configuration.
If you increase the MTU on a remote server over an SSH connection, be extremely cautious, as restarting the network might leave your server inaccessible.

To check if each of the server interfaces is "Gigabit ready":

[root@centos ~]# /sbin/ethtool eth0|grep -i 1000BaseT
1000baseT/Half 1000baseT/Full
1000baseT/Half 1000baseT/Full

If you're 100% sure there will be no trouble with enabling MTU > 1500, initiate a network reload:

[root@centos ~]# /etc/init.d/network restart
...

4. Enable Jumbo Frames on Slackware Linux

To list the ethernet devices and check that they are Gigabit ones, issue:

bash-4.1# lspci | grep [Ee]ther
0c:00.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter (rev 11)
0c:01.0 Ethernet controller: D-Link System Inc Gigabit Ethernet Adapter (rev 11)

There are two ways to set up jumbo frames on Slackware Linux: the Slackware way and the "universal" Linux way:

a.) the Slackware way

On Slackware Linux, all kinds of network configuration are done in /etc/rc.d/rc.inet1.conf

The usual config for the eth0 and eth1 interfaces looks like so:

# Config information for eth0:
IPADDR[0]="10.10.0.1"
NETMASK[0]="255.255.255.0"
USE_DHCP[0]=""
DHCP_HOSTNAME[0]=""
# Config information for eth1:
IPADDR[1]="10.1.1.1"
NETMASK[1]="255.255.255.0"
USE_DHCP[1]=""
DHCP_HOSTNAME[1]=""

To raise the MTU to 9000, the variables MTU[0]="9000" and MTU[1]="9000" have to be added at the end of the respective interface config blocks, e.g.:

# Config information for eth0:
IPADDR[0]="172.16.1.1"
NETMASK[0]="255.255.255.0"
USE_DHCP[0]=""
DHCP_HOSTNAME[0]=""
MTU[0]="9000"
# Config information for eth1:
IPADDR[1]="10.1.1.1"
NETMASK[1]="255.255.255.0"
USE_DHCP[1]=""
DHCP_HOSTNAME[1]=""
MTU[1]="9000"

bash-4.1# /etc/rc.d/rc.inet1 restart
...

b.) The "Universal" Linux way

This way works on most if not all Linux distributions.
Insert in /etc/rc.local:

/sbin/ifconfig eth0 mtu 9000 up
/sbin/ifconfig eth1 mtu 9000 up

5. Check if Jumbo Frames are properly enabled

There are at least two ways to display the MTU settings of the eth interfaces.

a.) Grepping the MTU from ifconfig

linux:~# /sbin/ifconfig eth0|grep -i mtu
UP BROADCAST MULTICAST MTU:9000 Metric:1
linux:~# /sbin/ifconfig eth1|grep -i mtu
UP BROADCAST MULTICAST MTU:9000 Metric:1

b.) Using ip command from iproute2 package to get MTU

linux:~# ip route get 192.168.2.134
local 192.168.2.134 dev lo src 192.168.2.134
cache mtu 9000 advmss 1460 hoplimit 64

linux:~# ip route show dev wlan0
192.168.2.0/24 proto kernel scope link src 192.168.2.134
default via 192.168.2.1

You see the MTU is now set to 9000, so the two servers' LAN interfaces are now able to communicate with increased network throughput.
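
The interface setting alone does not prove that switches and the peer accept large frames. One common end-to-end check is to ping with fragmentation prohibited and a payload that exactly fills one 9000 byte packet. A sketch; the function name is hypothetical and the peer IP is only a placeholder:

```shell
# Hypothetical helper: ICMP payload size that exactly fills one packet of
# the given MTU (20 bytes IPv4 header + 8 bytes ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

ping_payload_for_mtu 9000   # prints 8972

# Probe a peer without allowing fragmentation (peer IP is a placeholder):
# ping -M do -c 3 -s "$(ping_payload_for_mtu 9000)" 192.168.0.6
```

If some device on the path does not pass jumbo frames, the ping fails or gets no replies instead of silently fragmenting.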
Enjoy the accelerated network transfers 😉