Archive for the ‘Networking’ Category

How to filter dhcp traffic between two networks running separate DHCP servers to prevent IP assignment issues and MAC duplicate addresses

Tuesday, February 8th, 2022

Tracking the Problem of MAC duplicates on Linux routers

If you have two networks that see each other and they're not separated in VLANs but see each other sharing a common netmask lets say or, it might happend that there are 2 dhcp servers for example (isc-dhcp-server running on and dhcpd running on can broadcast their services to both LANs (netmask and Local Net LAN The result out of this is that some devices might pick up their IP address via DHCP from the wrong dhcp server.

Normally if you have a fully controlled little or middle class home or office network (10 – 15 electronic devices nodes) connecting to the LAN in a mixed moth some are connected via one of the Networks via connected Wifi to others are LANned and using static IP adddresses and traffic is routed among two ISPs and each network can see the other network, there is always a possibility of things to go wrong. This is what happened to me so this is how this post was born.

The best practice from my experience so far is to define each and every computer / phone / laptop host joining the network and hence later easily monitor what is going on the network with something like iptraf-ng / nethogs  / iperf – described in prior  how to check internet spepeed from console and in check server internet connectivity speed with speedtest-cliiftop / nload or for more complex stuff wireshark or even a simple tcpdump. No matter the tools network monitoring is only part on solving network issues. A very must have thing in a controlled network infrastructure is defining every machine part of it to easily monitor later with the monitoring tools. Defining each and every host on the Hybrid computer networks makes administering the network much easier task and  tracking irregularities on time is much more likely. 

Since I have such a hybrid network here hosting a couple of XEN virtual machines with Linux, Windows 7 and Windows 10, together with Mac OS X laptops as well as MacBook Air notebooks, I have followed this route and tried to define each and every host based on its MAC address to pick it up from the correct DHCP1 server (that is distributing IPs for Internet Provider 1 (ISP 1), that is mostly few computers attached UTP LAN cables via LiteWave LS105G Gigabit Switch as well from DHCP2 – used only to assigns IPs to servers and a a single Wi-Fi Access point configured to route incoming clients via Linux NAT gateway server.

To filter out the unwanted IPs from the DHCPD not to propagate I've so far used a little trick to  Deny DHCP MAC Address for unwanted clients and not send IP offer for them.

To give you more understanding,  I have to clear it up I don't want to have automatic IP assignments from DHCP2 / LAN2 to DHCP1 / LAN1 because (i don't want machines on DHCP1 to end up with IP like or DHCP2 (to have, as such a wrong IP delegation could potentially lead to MAC duplicates IP conflicts. MAC Duplicate IP wrong assignments for those older or who have been part of administrating large ISP network infrastructures  makes the network communication unstable for no apparent reason and nodes partially unreachable at times or full time …

However it seems in the 21-st century which is the century of strangeness / computer madness in the 2022, technology advanced so much that it has massively started to break up some good old well known sysadmin standards well documented in the RFCs I know of my youth, such as that every electronic equipment manufactured Vendor should have a Vendor Assigned Hardware MAC Address binded to it that will never change (after all that was the idea of MAC addresses wasn't it !). 
Many mobile devices nowadays however, in the developers attempts to make more sophisticated software and Increase Anonimity on the Net and Security, use a technique called  MAC Address randomization (mostly used by hackers / script kiddies of the early days of computers) for their Wi-Fi Net Adapter OS / driver controlled interfaces for the sake of increased security (the so called Private WiFi Addresses). If a sysadmin 10-15 years ago has seen that he might probably resign his profession and turn to farming or agriculture plant growing, but in the age of digitalization and "cloud computing", this break up of common developed network standards starts to become the 'new normal' standard.

I did not suspected there might be a MAC address oddities, since I spare very little time on administering the the network. This was so till recently when I accidently checked the arp table with:

Hypervisor:~# arp -an     5c:89:b5:f2:e8:d8      (Unknown)    00:15:3e:d3:8f:76       (Unknown)


and consequently did a network MAC Address ARP Scan with arp-scan (if you never used this little nifty hacker tool I warmly recommend it !!!)
If you don't have it installed it is available in debian based linuces from default repos to install

Hypervisor:~# apt-get install –yes arp-scan

It is also available on CentOS / Fedora / Redhat and other RPM distros via:

Hypervisor:~# yum install -y arp-scan



Hypervisor:~# arp-scan –interface=eth1    00:16:3e:0f:48:05       Xensource, Inc.    00:16:3e:04:11:1c       Xensource, Inc.    00:15:3e:bb:45:45       Xensource, Inc.    00:15:3e:59:96:8e       Xensource, Inc.    00:15:3e:d3:8f:77       Xensource, Inc.    8c:89:b5:f2:e8:d8       Micro-Star INT'L CO., LTD     5c:89:b5:f2:e8:d8      (Unknown)    00:15:3e:d3:8f:76       (Unknown)

192.168.x.91     02:a0:xx:xx:d6:64        (Unknown)
192.168.x.91     02:a0:xx:xx:d6:64        (Unknown)  (DUP: 2)

N.B. !. I found it helpful to check all available interfaces on my Linux NAT router host.

As you see the scan revealed, a whole bunch of MAC address mess duplicated MAC hanging around, destroying my network topology every now and then 
So far so good, the MAC duplicates and strangely hanging around MAC addresses issue, was solved relatively easily with enabling below set of systctl kernel variables.

1. Fixing Linux ARP common well known Problems through disabling arp_announce / arp_ignore / send_redirects kernel variables disablement


Linux answers ARP requests on wrong and unassociated interfaces per default. This leads to the following two problems:

ARP requests for the loopback alias address are answered on the HW interfaces (even if NOARP on lo0:1 is set). Since loopback aliases are required for DSR (Direct Server Return) setups this problem is very common (but easy to fix fortunately).

If the machine is connected twice to the same switch (e.g. with eth0 and eth1) eth2 may answer ARP requests for the address on eth1 and vice versa in a race condition manner (confusing almost everything).

This can be prevented by specific arp kernel settings. Take a look here for additional information about the nature of the problem (and other solutions): ARP flux.

To fix that generally (and reboot safe) we  include the following lines into


Hypervisor:~# cp -rpf /etc/sysctl.conf /etc/sysctl.conf_bak_07-feb-2022
Hypervisor:~# cat >> /etc/sysctl.conf

# LVS tuning


Press CTRL + D simultaneusly to Write out up-pasted vars.

To read more on Load Balancer using direct routing and on LVS and the arp problem here

2. Digging further the IP conflict / dulicate MAC Problems

Even after this arp tunings (because I do have my Hypervisor 2 LAN interfaces connected to 1 switch) did not resolved the issues and still my Wireless Connected devices via network (ISP2) were randomly assigned the wrong range IPs 192.168.0.XXX/24 as well as the wrong gateway (ISP1).
After thinking thoroughfully for hours and checking the network status with various tools and thanks to the fact that my wife has a MacBook Air that was always complaining that the IP it tried to assign from the DHCP was already taken, i"ve realized, something is wrong with DHCP assignment.
Since she owns a IPhone 10 with iOS and this two devices are from the same vendor e.g. Apple Inc. And Apple's products have been having strange DHCP assignment issues from my experience for quite some time, I've thought initially problems are caused by software on Apple's devices.
I turned to be partially right after expecting the logs of DHCP server on the Linux host (ISP1) finding that the phone of my wife takes IP in 192.168.0.XXX, insetad of IP from (which has is a combined Nokia Router with 2.4Ghz and 5Ghz Wi-Fi and LAN router provided by ISP2 in that case Vivacom). That was really puzzling since for me it was completely logical thta the iDevices must check for DHCP address directly on the Network of the router to whom, they're connecting. Guess my suprise when I realized that instead of that the iDevices does listen to the network on a wide network range scan for any DHCPs reachable baesd on the advertised (i assume via broadcast) address traffic and try to connect and take the IP to the IP of the DHCP which responds faster !!!! Of course the Vivacom Chineese produced Nokia router responded DHCP requests and advertised much slower, than my Linux NAT gateway on ISP1 and because of that the Iphone and iOS and even freshest versions of Android devices do take the IP from the DHCP that responds faster, even if that router is not on a C class network (that's invasive isn't it??). What was even more puzzling was the automatic MAC Randomization of Wifi devices trying to connect to my ISP1 configured DHCPD and this of course trespassed any static MAC addresses filtering, I already had established there.

Anyways there was also a good think out of tthat intermixed exercise 🙂 While playing around with the Gigabit network router of vivacom I found a cozy feature SCHEDULEDING TURNING OFF and ON the WIFI ACCESS POINT  – a very useful feature to adopt, to stop wasting extra energy and lower a bit of radiation is to set a swtich off WIFI AP from 12:30 – 06:30 which are the common sleeping hours or something like that.

3. What is MAC Randomization and where and how it is configured across different main operating systems as of year 2022?

Depending on the operating system of your device, MAC randomization will be available either by default on most modern mobile OSes or with possibility to have it switched on:

  • Android Q: Enabled by default 
  • Android P: Available as a developer option, disabled by default
  • iOS 14: Available as a user option, disabled by default
  • Windows 10: Available as an option in two ways – random for all networks or random for a specific network

Lately I don't have much time to play around with mobile devices, and I do not my own a luxury mobile phone so, the fact this ne Androids have this MAC randomization was unknown to me just until I ended a small mess, based on my poor configured networks due to my tight time constrains nowadays.

Finding out about the new security feature of MAC Randomization, on all Android based phones (my mother's Nokia smartphone and my dad's phone, disabled the feature ASAP:

4. Disable MAC Wi-Fi Ethernet device Randomization on Android

MAC Randomization creates a random MAC address when joining a Wi-Fi network for the first time or after “forgetting” and rejoining a Wi-Fi network. It Generates a new random MAC address after 24 hours of last connection.

Disabling MAC Randomization on your devices. It is done on a per SSID basis so you can turn off the randomization, but allow it to function for hotspots outside of your home.

  1. Open the Settings app
  2. Select Network and Internet
  3. Select WiFi
  4. Connect to your home wireless network
  5. Tap the gear icon next to the current WiFi connection
  6. Select Advanced
  7. Select Privacy
  8. Select "Use device MAC"

5. Disabling MAC Randomization on MAC iOS, iPhone, iPad, iPod

To Disable MAC Randomization on iOS Devices:

Open the Settings on your iPhone, iPad, or iPod, then tap Wi-Fi or WLAN


  1. Tap the information button next to your network
  2. Turn off Private Address
  3. Re-join the network

Of course next I've collected their phone Wi-Fi adapters and made sure the included dhcp MAC deny rules in /etc/dhcp/dhcpd.conf are at place.

The effect of the MAC Randomization for my Network was terrible constant and strange issues with my routings and networks, which I always thought are caused by the openxen hypervisor Virtualization VM bugs etc.

That continued for some months now, and the weird thing was the issues always started when I tried to update my Operating system to the latest packetset, do a reboot to load up the new piece of software / libraries etc. and plus it happened very occasionally and their was no obvious reason for it.


6. How to completely filter dhcp traffic between two network router hosts
IP / to stop 2 or more configured DHCP servers
on separate networks see each other

To prevent IP mess at DHCP2 server side (which btw is ISC DHCP server, taking care for IP assignment only for the Servers on the network running on Debian 11 Linux), further on I had to filter out any DHCP UDP traffic with iptables completely.
To prevent incorrect route assignments assuming that you have 2 networks and 2 routers that are configurred to do Network Address Translation (NAT)-ing Router 1:, Router 2:

You have to filter out UDP Protocol data on Port 67 and 68 from the respective source and destination addresses.

In firewall rules configuration files on your Linux you need to have some rules as:

# filter outgoing dhcp traffic from to
-A INPUT -p udp -m udp –dport 67:68 -s -d -j DROP
-A OUTPUT -p udp -m udp –dport 67:68 -s -d -j DROP
-A FORWARD -p udp -m udp –dport 67:68 -s -d -j DROP

-A INPUT -p udp -m udp –dport 67:68 -s -d -j DROP
-A OUTPUT -p udp -m udp –dport 67:68 -s -d -j DROP
-A FORWARD -p udp -m udp –dport 67:68 -s -d -j DROP

-A INPUT -p udp -m udp –sport 67:68 -s -d -j DROP
-A OUTPUT -p udp -m udp –sport 67:68 -s -d -j DROP
-A FORWARD -p udp -m udp –sport 67:68 -s -d -j DROP

You can download also with above rules from here

Applying this rules, any traffic of DHCP between 2 routers is prohibited and devices from Net: will no longer wrongly get assinged IP addresses from Network range: as it happened to me.

7. Filter out DHCP traffic based on MAC completely on Linux with arptables

If even after disabling MAC randomization on all devices on the network, and you know physically all the connecting devices on the Network, if you still see some weird MAC addresses, originating from a wrongly configured ISP traffic router host or whatever, then it is time to just filter them out with arptables.

## drop traffic prevent mac duplicates due to vivacom and bergon placed in same network –
dchp1-server:~# arptables -A INPUT –source-mac 70:e2:83:12:44:11 -j DROP

To list arptables configured on Linux host

dchp1-server:~# arptables –list -n

If you want to be paranoid sysadmin you can implement a MAC address protection with arptables by only allowing a single set of MAC Addr / IPs and dropping the rest.

dchp1-server:~# arptables -A INPUT –source-mac 70:e2:84:13:45:11 -j ACCEPT
dchp1-server:~# arptables -A INPUT  –source-mac 70:e2:84:13:45:12 -j ACCEPT

dchp1-server:~# arptables -L –line-numbers
Chain INPUT (policy ACCEPT)
1 -j DROP –src-mac 70:e2:84:13:45:11
2 -j DROP –src-mac 70:e2:84:13:45:12

Once MACs you like are accepted you can set the INPUT chain policy to DROP as so:

dchp1-server:~# arptables -P INPUT DROP

If you later need to temporary, clean up the rules inside arptables on any filtered hosts flush all rules inside INPUT chain, like that

dchp1-server:~#  arptables -t INPUT -F

How to configure bond0 bonding and network bridging for KVM Virtual machines on Redhat / CentOS / Fedora Linux

Tuesday, February 16th, 2021

 1. Intro to Redhat RPM based distro /etc/sysconfig/network-scripts/* config vars shortly explained

On RPM based Linux distributions configuring network has a very specific structure. As a sysadmin just recently I had a task to configure Networking on 2 Machines to be used as Hypervisors so the servers could communicate normally to other Networks via some different intelligent switches that are connected to each of the interfaces of the server. The idea is the 2 redhat 8.3 machines to be used as  Hypervisor (HV) and each of the 2 HVs to each be hosting 2 Virtual guest Machines with preinstalled another set of Redhat 8.3 Ootpa. I've recently blogged on how to automate a bit installing the KVM Virtual machines with using predefined kickstart.cfg file.

The next step after install was setting up the network. Redhat has a very specific network configuration well known under /etc/sysconfig/network-scripts/ifcfg-eno*# or if you have configured the Redhats to fix the changing LAN card naming ens, eno, em1 to legacy eth0, eth1, eth2 on CentOS Linux – e.g. to be named as /etc/sysconfig/network-scripts/{ifcfg-eth0,1,2,3}.

The first step to configure the network from that point is to come up with some network infrastrcture that will be ready on the HV nodes server-node1 server-node2 for the Virtual Machines to be used by server-vm1, server-vm2.

Thus for the sake of myself and some others I decide to give here the most important recognized variables that can be placed inside each of the ifcfg-eth0,ifcfg-eth1,ifcfg-eth2 …

A standard ifcfg-eth0 confing would look something this:

[root@redhat1 :~ ]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

Lets say few words to each of the variables to make it more clear to people who never configured Newtork on redhat without the help of some of the console ncurses graphical like tools such as nmtui or want to completely stop the Network-Manager to manage the network and thus cannot take the advantage of using nmcli (a command-line tool for controlling NetworkManager).

Here is a short description of each of above configuration parameters:

TYPE=device_type: The type of network interface device
BOOTPROTO=protocol: Where protocol is one of the following:

  • none: No boot-time protocol is used.
  • bootp: Use BOOTP (bootstrap protocol).
  • dhcp: Use DHCP (Dynamic Host Configuration Protocol).
  • static: if configuring static IP


  • yes: This interface is set as the default route for IPv4|IPv6 traffic.
  • no: This interface is not set as the default route.

Usually most people still don't use IPV6 so better to disable that

IPV6INIT=answer: Where answer is one of the following:

  • yes: Enable IPv6 on this interface. If IPV6INIT=yes, the following parameters could also be set in this file:

IPV6ADDR=IPv6 address

IPV6_DEFAULTGW=The default route through the specified gateway

  • no: Disable IPv6 on this interface.

IPV4_FAILURE_FATAL|IPV6_FAILURE_FATAL=answer: Where answer is one of the following:

  • yes: This interface is disabled if IPv4 or IPv6 configuration fails.
  • no: This interface is not disabled if configuration fails.

ONBOOT=answer: Where answer is one of the following:

  • yes: This interface is activated at boot time.
  • no: This interface is not activated at boot time.

HWADDR=MAC-address: The hardware address of the Ethernet device
IPADDRN=address: The IPv4 address assigned to the interface
PREFIXN=N: Length of the IPv4 netmask value
GATEWAYN=address: The IPv4 gateway address assigned to the interface. Because an interface can be associated with several combinations of IP address, network mask prefix length, and gateway address, these are numbered starting from 0.
DNSN=address: The address of the Domain Name Servers (DNS)
DOMAIN=DNS_search_domain: The DNS search domain (this is the search you usually find in /etc/resolv.conf)

Other interesting file that affects how routing is handled on a Redhat Linux is


[root@redhat1 :~ ]# cat /etc/sysconfig/network
# Created by anaconda

Having this gateway defined does add a default gateway

This file specifies global network settings. For example, you can specify the default gateway, if you want to apply some network settings such as routings, Alias IPs etc, that will be valid for all configured and active configuration red by systemctl start network scripts or the (the network-manager if such is used), just place it in that file.

Other files of intesresting to control how resolving is being handled on the server worthy to check are 




If you want to set a preference of /etc/hosts being red before /etc/resolv.conf and DNS resolving for example you need to have inside it, below is default behavior of it.

root@redhat1 :~ ]#   grep -i hosts /etc/nsswitch.conf
#     hosts: files dns
#     hosts: files dns  # from user file
# Valid databases are: aliases, ethers, group, gshadow, hosts,
hosts:      files dns myhostname

As you can see the default order is to read first files (meaning /etc/hosts) and then the dns (/etc/resolv.conf)
hosts: files dns

Now with this short intro description on basic values accepted by Redhat's /etc/sysconfig/network-scripts/ifcfg* prepared configurations.

I will give a practical example of configuring a bond0 interface with 2 members which were prepared based on Redhat's Official documentation found in above URLs:

# Bonding on RHEL 7 documentation

# Network Bridge with Bond documentation

2. Configuring a single bond connection on eth0 / eth2 and setting 3 bridge interfaces bond -> br0, br1 -> eth1, br2 -> eth2

The task on my machines was to set up from 4 lan cards one bonded interface as active-backup type of bond with bonded lines on eth0, eth2 and 3 other 2 eth1, eth2 which will be used for private communication network that is connected via a special dedicated Switches and Separate VLAN 50, 51 over a tagged dedicated gigabit ports.

As said the 2 Servers had each 4 Broadcom Network CARD interfaces each 2 of which are paired (into a single card) and 2 of which are a solid Broadcom NetXtreme Dual Port 10GbE SFP+ and Dell Broadcom 5720 Dual Port 1Gigabit Network​.


On each of server-node1 and server-node2 we had 4 Ethernet Adapters properly detected on the Redhat

root@redhat1 :~ ]# lspci |grep -i net
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)

I've already configured as prerogative net.ifnames=0 to /etc/grub2/boot.cfg and Network-Manager service disabled on the host (hence to not use Network Manager you'll see in below configuration NM_CONTROLLED="no" is telling the Redhat servers is not to be trying NetworkManager for more on that check my previous article Disable NetworkManager automatic Ethernet Interface Management on Redhat Linux , CentOS 6 / 7 / 8.

3. Types of Network Bonding

mode=0 (balance-rr)

This mode is based on Round-robin policy and it is the default mode. This mode offers fault tolerance and load balancing features. It transmits the packets in Round robin fashion that is from the first available slave through the last.

mode-1 (active-backup)

This mode is based on Active-backup policy. Only one slave is active in this band, and another one will act only when the other fails. The MAC address of this bond is available only on the network adapter part to avoid confusing the switch. This mode also provides fault tolerance.

mode=2 (balance-xor)

This mode sets an XOR (exclusive or) mode that is the source MAC address is XOR’d with destination MAC address for providing load balancing and fault tolerance. Each destination MAC address the same slave is selected.

mode=3 (broadcast)

This method is based on broadcast policy that is it transmitted everything on all slave interfaces. It provides fault tolerance. This can be used only for specific purposes.

mode=4 (802.3ad)

This mode is known as a Dynamic Link Aggregation mode that has it created aggregation groups having same speed. It requires a switch that supports IEEE 802.3ad dynamic link. The slave selection for outgoing traffic is done based on a transmit hashing method. This may be changed from the XOR method via the xmit_hash_policy option.

mode=5 (balance-tlb)

This mode is called Adaptive transmit load balancing. The outgoing traffic is distributed based on the current load on each slave and the incoming traffic is received by the current slave. If the incoming traffic fails, the failed receiving slave is replaced by the MAC address of another slave. This mode does not require any special switch support.

mode=6 (balance-alb)

This mode is called adaptive load balancing. This mode does not require any special switch support.

Lets create the necessery configuration for the bond and bridges

[root@redhat1 :~ ]# cat ifcfg-bond0
BONDING_OPTS="mode=1 miimon=100 primary=eth0"

[root@redhat1 :~ ]# cat ifcfg-bond0.10

[root@redhat1 :~ ]# cat ifcfg-br0

[root@redhat1 :~ ]# cat ifcfg-br1

[root@redhat1 :~ ]# cat ifcfg-br2

[root@redhat1 :~ ]# cat ifcfg-eth0

[root@redhat1 :~ ]# cat ifcfg-eth1

[root@redhat1 :~ ]#  cat ifcfg-eth2

[root@redhat1 :~ ]#  cat ifcfg-eth3

[root@redhat2 :~ ]# cat ifcfg-bond0
BONDING_OPTS="mode=1 miimon=100 primary=eth0"

# cat ifcfg-bond0.10

[root@redhat2 :~ ]# cat ifcfg-br0

[root@redhat2 :~ ]#  cat ifcfg-br1

[root@redhat2 :~ ]# cat ifcfg-br2

[root@redhat2 :~ ]# cat ifcfg-eth0

[root@redhat2 :~ ]# cat ifcfg-eth1

[root@redhat2 :~ ]# cat ifcfg-eth2

Notice that the bond0 configuration does not have an IP assigned this is done on purpose as we're using the interface channel bonding together with attached bridge for the VM. Usual bonding on a normal physical hardware hosts where no virtualization use is planned is perhaps a better choice. If you however try to set up an IP address in that specific configuration shown here and you try to reboot the machine, you will end up with inacessible machine over the network like I did and you will need to resolve configuration via some kind of ILO / IDRAC interface.

4. Generating UUID for ethernet devices bridges and bonds

One thing to note is the command uuidgen you might need that to generate UID identificators to fit in the new network config files.


[root@redhat2 :~ ]#uuidgen br2
[root@redhat2 :~ ]# uuidgen br1
[root@redhat2 :~ ]# uuidgen br0

5. How to make KVM Virtual Machines see configured Network bridges (modify VM XML)

To make the Virtual machines installed see the bridges I had to

[root@redhat1 :~ ]#virsh edit VM_name1
[root@redhat1 :~ ]#virsh edit VM_name2

[root@redhat2 :~ ]#virsh edit VM_name1
[root@redhat2 :~ ]#virsh edit VM_name2

Find the interface network configuration and change it to something like:

    <interface type='bridge'>
      <mac address='22:53:00:56:5d:ac'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    <interface type='bridge'>
      <mac address='22:53:00:2a:5f:01'/>
      <source bridge='br1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    <interface type='bridge'>
      <mac address='22:34:00:4a:1b:6c'/>
      <source bridge='br2'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>

6. Testing the bond  is up and works fine

# ip addr show bond0
The result is the following:


4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:cb:25:82 brd ff:ff:ff:ff:ff:ff

The bond should be visible in the normal network interfaces with ip address show or /sbin/ifconfig


# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:29:ab:2a:fa
Slave queue ID: 0


According to the output eth0 is the active slave.

The active slaves device files (eth0 in this case) is found in virtual file system /sys/

# find /sys -name *eth0

You can remove a bond member say eth0 by 


 cd to the pci* directory
Example: /sys/devices/pci000:00/000:00:15.0


# echo 1 > remove

At this point the eth0 device directory structure that was previously located under /sys/devices/pci000:00/000:00:15.0 is no longer there.  It was removed and the device no longer exists as seen by the OS.

You can verify this is the case with a simple ifconfig which will no longer list the eth0 device.
You can also repeat the cat /proc/net/bonding/bond0 command from Step 1 to see that eth0 is no longer listed as active or available.
You can also see the change in the messages file.  It might look something like this:

2021-02-12T14:13:23.363414-06:00 redhat1  device eth0: device has been deleted
2021-02-12T14:13:23.368745-06:00 redhat1 kernel: [81594.846099] bonding: bond0: releasing active interface eth0
2021-02-12T14:13:23.368763-06:00 redhat1 kernel: [81594.846105] bonding: bond0: Warning: the permanent HWaddr of eth0 – 00:0c:29:ab:2a:f0 – is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
2021-02-12T14:13:23.368765-06:00 redhat1 kernel: [81594.846132] bonding: bond0: making interface eth1 the new active one.


Another way to test the bonding is correctly switching between LAN cards on case of ethernet hardware failure is to bring down one of the 2 or more bonded interfaces, lets say you want to switch from active-backup from eth1 to eth2, do:

# ip link set dev eth0 down

That concludes the test for fail over on active slave failure.

7. Bringing bond updown (rescan) bond with no need for server reboot

You know bonding is a tedious stuff that sometimes breaks up badly so only way to fix the broken bond seems to be a init 6 (reboot) cmd but no actually that is not so.

You can also get the deleted device back with a simple pci rescan command:

# echo 1 > /sys/bus/pci/rescan

The eth0 interface should now be back
You can see that it is back with an ifconfig command, and you can verify that the bond sees it with this command:

# cat /proc/net/bonding/bond0

That concludes the test of the bond code seeing the device when it comes back again.

The same steps can be repeated only this time using the eth1 device and file structure to fail the active slave in the bond back over to eth0.

8. Testing the bond with ifenslave command (ifenslave command examples)

Below is a set of useful information to test the bonding works as expected with ifenslave command  comes from "iputils-20071127" package

– To show information of all the inerfaces

                  # ifenslave -a
                  # ifenslave –all-interfaces 


– To change the active slave

                  # ifenslave -c bond0 eth1
                  # ifenslave –change-active bond0 eth1 


– To remove the slave interface from the bonding device

                  # ifenslave -d eth1
                  # ifenslave –detach bond0 eth1 


– To show master interface info

                  # ifenslave bond0 


– To set the bond device down and automatically release all the slaves

                  # ifenslave bond1 down 

– To get the usage info

                  # ifenslave -u
                  # ifenslave –usage 

– To set to verbose mode

                  # ifenslave -v
                  # ifenslave –verbose 

9. Testing the bridge works fine

Historically over the years all kind of bridges are being handled with the brctl part of bridge-utils .deb / .rpm installable package.

The classical way to check a bridge is working is to do

# brctl show
# brctl show br0; brctl show br1; brctl show br2

# brctl showmacs br0


Unfortunately with redhat 8 this command is no longer available so to get information about configured bridges you need to use instead:


# bridge link show
3:eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 32 cost 100
4:eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state listening priority 32 cost 100

10. Troubleshooting network connectivity issues on bond bridges and LAN cards

Testing the bond connection and bridges can route proper traffic sometimes is a real hassle so here comes at help the good old tcpdump

If you end up with issues with some of the ethernet interfaces between HV1 and HV2 to be unable to talk to each other and you have some suspiciousness that some colleague from the network team has messed up a copper (UTP) cable or there is a connectivity fiber optics issues. To check the VLAN tagged traffic headers on the switch you can listen to each and every bond0 and br0, br1, br2 eth0, eth1, eth2, eth3 configured on the server like so:

# tcpdump -i bond0 -nn -e vlan

Some further investigation on where does a normal ICMP traffic flows once everything is setup is a normal thing to do, hence just try to route a normal ping via the different server interfaces:

# ping -I bond0 DSTADDR

# ping -i eth0 DSTADDR

# ping -i eth1 DSTADDR

# ping -i eth2 DSTADDR

After conducting the ping do the normal for network testing big ICMP packages (64k) ping to make sure there are no packet losses etc., e.g:

# ping -I eth3 -s 64536  DSTADDR

If for 10 – 20 seconds the ping does not return package losses then you should be good.

Update reverse sshd config with cronjob to revert if sshd reload issues

Friday, February 12th, 2021


Say you're doing ssh hardening modifying /etc/ssh/sshd_config for better system security or just changing options in sshd due to some requirements. But you follow the wrong guide and you placed some ssh variable which is working normally on newer SSH versions ssh OpenSSH_8.0p1 / or 7 but the options are applied on older SSH server and due to that restarting sshd via /etc/init.d/… or systemctl restart sshd cuts your access to remote server located in a DC and not attached to Admin LAN port, and does not have a working ILO or IDRAC configured and you have to wait for a couple of hours for some Support to go to the server Room / Rack / line location to have access to a Linux physical tty console and fix it by reverting the last changes you made to sshd and restarting.

Thus logical question comes what can you do to assure yourself you would not cut your network access to remote machine after modifying OpenSSHD and normal SSHD restart?

There is an old trick, I'm using for years now but perhaps if you're just starting with Linux as a novice system administrator or a server support guy you would not know it, it is as simple as setting a cron job for some minutes to periodically overwrite the sshd configuration with a copy of the old working version of sshd before modification.

Here is this nice nify trick which saved me headache of call on technical support line to ValueWeb when I was administering some old Linux servers back in the 2000s

root@server:~# crontab -u root -e

# create /etc/ssh/sshd_config backup file
cp -rpf /etc/ssh/sshd_config /etc/ssh/sshd_config_$(date +%d-%m-%y)
# add to cronjob to execute every 15 minutes and ovewrite sshd with the working version just in case
*/15 * * * * /bin/cp -rpf /etc/ssh/sshd_config_$(date +%d-%m-%y) /etc/ssh/sshd_config && /bin/systemctl restart sshd
# restart sshd 
cp -rpf /etc/ssh/sshd_config_$(date +%d-%m-%y) /etc/ssh/sshd_config && /bin/systemctl restart sshd

Copy paste above cron definitions and leave them on for some time. Do the /etc/ssh/sshd_config modifications and once you're done restart sshd by lets say

root@server:~#  killall -HUP sshd 

If the ssh connectivity continues to work edit the cron job again and delete all lines and save again.
If you're not feeling confortable with vim as a text editor (in case you're a complete newbie and you don't know) how to get out of vim. Before doing all little steps you can do on the shell with  export EDITOR=nano or export EDITOR=mcedit cmds,this will change the default text editor on the shell. 

Hope this helps someone… Enjoy 🙂

Check server Internet connectivity Speedtest from Linux terminal CLI

Friday, August 7th, 2020


If you are a system administrator of a dedicated server and you have no access to Xserver Graphical GNOME / KDE etc. environment and you wonder how you can track the bandwidth connectivity speed of remote system to the internet and you happen to have a modern Linux distribution, here is few ways to do a speedtest.

1. Use speedtest-cli command line tool to test connectivity


speedtest-cli is a tiny tool written in python, to use it hence you need to have python installed on the server.
It is available both for Redhat Linux distros and Debians / Ubuntus etc. in the list of standard installable packages.

a) Install speedtest-cli on Fedora / CentOS / RHEL

On CentOS / RHEL / Scientific Linux lower than ver 8:



$ sudo yum install python

On CentOS 8 / RHEL 8 user type the following command to install Python 3 or 2:



$sudo yum install python3
$ sudo yum install python2




On Fedora Linux version 22+



$ sudo dnf install python
$ sudo dnf install pytho3


Once python is at place download or in case if link is not reachable download mirrored version of on here



$ wget -O speedtest-cli
$ chmod +x speedtest-cli


Then it is time to run script speedtest-screenshot-linux-terminal-console-cli-cmd
To test enabled Bandwidth on the server



$ python speedtest-cli

b) Install speedtest-cli on Debian

On Latest Debian 10 Buster speedtest is available out of the box in regular .deb repositories, so fetch it with apt


# apt install –yes speedtest-cli


You can give now speedtest-cli a try with –bytes arguments to get speed values in bytes instead of bits or if you want to generate an image with test results in picture just like it will appear if you use inside a gui browser, use the –share option





2. Getting connectivity results of all defined speedtest test City Locations

Speedtest has a list of servers through which a Upload and Download speed is tested, to run speedtest-cli to test with each and every server and get a better picture on what kind of connectivity to expect from your server towards the closest region capital cities, fetch speedtest-servers.php list and use a small shell loop below is how:






root@pcfreak:~#  wget
–2020-08-07 16:31:34–
Преобразувам (…,,, …
Connecting to (||:80… успешно свързване.
HTTP изпратено искане, чакам отговор… 301 Moved Permanently
Адрес: [следва]
–2020-08-07 16:31:34–
Connecting to (||:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 307 Temporary Redirect
Адрес: [следва]
–2020-08-07 16:31:35–
Преобразувам (…
Connecting to (||:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 200 OK
Дължина: 211695 (207K) [text/xml]
Saving to: ‘speedtest-servers.php’
speedtest-servers.php                  100%[==========================================================================>] 206,73K  –.-KB/s    in 0,1s
2020-08-07 16:31:35 (1,75 MB/s) – ‘speedtest-servers.php’ saved [211695/211695]

Once file is there with below loop we extract all file defined servers id="" 's 

root@pcfreak:~# for i in $(cat speedtest-servers.php | egrep -Eo 'id="[0-9]{4}"' |sed -e 's#id="##' -e 's#"##g'); do speedtest-cli  –server $i; done
Retrieving configuration…
Testing from Vivacom (…
Retrieving server list…
Retrieving information for the selected server…
Hosted by Telecoms Ltd. (Varna) [38.88 km]: 25.947 ms
Testing download speed……………………………………………………………………..
Download: 57.71 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 93.85 Mbit/s
Retrieving configuration…
Testing from Vivacom (…
Retrieving server list…
Retrieving information for the selected server…
Hosted by GMB Computers (Constanta) [94.03 km]: 80.247 ms
Testing download speed……………………………………………………………………..
Download: 35.86 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 80.15 Mbit/s
Retrieving configuration…
Testing from Vivacom (…




For better readability you might want to add the ouput to a file or even put it to run periodically on a cron if you have some suspcion that your server Internet dedicated lines dies out to some general locations sometimes.

3. Testing UPlink speed with Download some big file from source location

In the past a classical way to test the bandwidth connectivity of your Internet Service Provider was to fetch some big file, Linux guys should remember it was almost a standard to roll a download of Linux kernel source .tar file with some test browser as elinks / lynx / w3c.
speedtest-screenshot-kernel-org-shot1 speedtest-screenshot-kernel-org-shot2
or if those are not at hand test connectivity on remote free shell servers whatever file downloader as wget or curl was used.
Analogical method is still possible, for example to use wget to get an idea about bandwidtch connectivity, let it roll below 500 mb from to /dev/null few times:


$ wget –output-document=/dev/null

$ wget –output-document=/dev/null

$ wget –output-document=/dev/null


# wget -O /dev/null –progress=dot:mega ; date
–2020-08-07 13:56:49–
Resolving (…
Connecting to (||:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: ‘/dev/null’

     0K …….. …….. …….. …….. …….. …….. 30%  142M 0s
  3072K …….. …….. …….. …….. …….. …….. 60%  179M 0s
  6144K …….. …….. …….. …….. …….. …….. 90%  204M 0s
  9216K …….. ……..                                    100%  197M=0.06s

2020-08-07 13:56:50 (173 MB/s) – ‘/dev/null’ saved [10485760/10485760]

Fri 07 Aug 2020 01:56:50 PM UTC

To be sure you have a real picture on remote machine Internet speed it is always a good idea to run download of random big files on a certain locations that are well known to have a very stable Internet bandwidth to the Internet backbone routers.

4. Using Simple shell script to test Internet speed

Fetch and use


wget && chmod u+x && bash



5. Using iperf to test connectivity between two servers 


iperf is another good tool worthy to mention that can be used to test the speed between client and server.

To use iperf install it with apt and do on the server machine to which bandwidth will be tested:


# iperf -s 


On the client machine do:


# iperf -c 


where is the IP of the server where iperf was spawned to listen.

6. Using Netflix fast to determine Internet connection speed on host


fast is a service provided by Netflix. Its web interface is located at and it has a command-line interface available through npm (npm is a package manager for nodejs) so if you don't have it you will have to install it first with:

# apt install –yes npm


Note that if you run on Debian this will install you some 249 new nodejs packages which you might not want to have on the system, so this is useful only for machines that has already use of nodejs.


$ fast


     82 Mbps ↓

The command returns your Internet download speed. To get your upload speed, use the -u flag:


$ fast -u


   ⠧ 80 Mbps ↓ / 8.2 Mbps ↑


7. Use speedometer / iftop to measure incoming and outgoing traffic on interface

If you're measuring connectivity on a live production server system, then you might consider that the measurement output might not be exactly correct especially if you're measuring the Uplink / Downlink on a Heavy loaded webserver / Mail Server / Samba or DNS server.
If this is the case a very useful tools to consider to extract the already taken traffic used on your Incoming and Outgoing ( TX / RX ) Network interfaces
are speedometer and iftop, they're present and installable depending on the OS via yum / apt or the respective package manager.


To install on Debian server:




# apt install –yes iftop speedometer


The most basic use to check the live received traffic in a nice Ncurses like text graphic is with: 





# speedometer -r 


To generate real time ASCII art graph on RX / TX traffic do:



# speedometer -r eth0 -t eth0






# iftop -P -i eth0









Send TCP / UDP strings and commands to Local and Remote Applications without netcat with Bash

Friday, July 24th, 2020


Did you ever needed to send TCP / UDP packets manually to send commands to local or remote applications, having a fully functional BASH Shell but not having the luxury to have NC (Netcat Swiss Army Knife of Networking) tool?
This happens if you have some Linux based embeded device as Arduino or a Linux server with a high security PCI requirement which can't affort to have Netcat in place or another portable hardware with a Linux kernel, that needs to communicate in UDP for any reason but you don't want to waste additional 28KB or physically you have access to a Linux device that doesn't have netcat but you want to be able to send UDP externally …

SInce some time in newer GNU Bash's releases support for TCP / UDP data sending is described in Bash's Manual and should be working it is not as good as you might expect but for a small things it could save you the day.

The syntax to use it is:


To open new socket connection to a UDP / TCP protocol with bash you have to simply open a new Shell handler (lets say 3) to:






1. Get GOOGLE HTML Source with simple BASH / Getting URL Index with bash sockets

If you happen to have access to a machine where no network downloader tool or a text browser such as curl, wget, lynx, links is available but you want to dump the content of a index.html or any other URL with simply bash you can do it like so:

exec 3<>/dev/tcp/
echo -e "GET / HTTP/1.1\r\n\r\n" >&3 

cat <&3

If you need to open a connection to a Internet Domain with bash and store the output into a separate .html file:

exec 3<>/dev/tcp/
echo -e "GET / HTTP/1.1\r\n\r\n" >&3 

cat <&3 | tee -a output.html

Note that this will work only if you're logged into into an interactive shell.
If you want instead do it from a shell script (and omit usage) of wget etc. use something like basic script :

exec 3<>/dev/tcp/
printf "GET /\r\n\r\n" >&3
while IFS= read -r -u3 -t2 line || [[ -n “$line” ]]; do echo "$line"; done
exec 3>&-
exec 3<&-


2. Sending UDP protocol data via bash socket

To send test  variables or commands data to localhost ( UDP listening service:

echo 'TEST COMMAND' > /dev/udp/


echo "Any UDP data" > /dev/udp/

If you happent to have netcat or running on a bash shell that doesn't properly support TCP / UDP sending you can always do it netcat way:

echo "Command" | nc -u -w0 3000

Of course this little hack is useful just for simple things and eventually for more comlex stuff and scripting you would like to use a fully functional HTML reader ( W3C compliant Web Browser )
still  for a quick dirty stuff Bash socketing from the console rocks pretty much ! 🙂


How to debug failing service in systemctl and add a new IP network alias in CentOS Linux

Wednesday, January 15th, 2020


If you get some error with some service that is start / stopped via systemctl you might be pondering how to debug further why the service is not up then then you'll be in the situation I was today.
While on one configured server with 8 eth0 configured ethernet network interfaces the network service was reporting errors, when atempted to restart the RedHat way via:

service network restart

to further debug what the issue was as it was necessery I had to find a way how to debug systemctl so here is how:


How to do a verbose messages status for sysctlct?


linux:~# systemctl status network

linux:~# systemctl status network


Another useful hint is to print out only log messages for the current boot, you can that with:

# journalctl -u service-name.service -b


if you don't want to have the less command like page separation ( paging ) use the –no-pager argument.


# journalctl -u network –no-pager

Jan 08 17:09:14 lppsq002a network[8515]: Bringing up interface eth5:  [  OK  ]

    Jan 08 17:09:15 lppsq002a network[8515]: Bringing up interface eth6:  [  OK  ]
    Jan 08 17:09:15 lppsq002a network[8515]: Bringing up interface eth7:  [  OK  ]
    Jan 08 17:09:15 lppsq002a systemd[1]: network.service: control process exited, code=exited status=1
    Jan 08 17:09:15 lppsq002a systemd[1]: Failed to start LSB: Bring up/down networking.
    Jan 08 17:09:15 lppsq002a systemd[1]: Unit network.service entered failed state.
    Jan 08 17:09:15 lppsq002a systemd[1]: network.service failed.
    Jan 15 11:04:45 lppsq002a systemd[1]: Starting LSB: Bring up/down networking…
    Jan 15 11:04:45 lppsq002a network[55905]: Bringing up loopback interface:  [  OK  ]
    Jan 15 11:04:45 lppsq002a network[55905]: Bringing up interface eth0:  RTNETLINK answers: File exists
    Jan 15 11:04:45 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:45 lppsq002a network[55905]: Bringing up interface eth1:  RTNETLINK answers: File exists
    Jan 15 11:04:45 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:46 lppsq002a network[55905]: Bringing up interface eth2:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device eth2 has different MAC address than expected, ignoring.
    Jan 15 11:04:46 lppsq002a network[55905]: [FAILED]
    Jan 15 11:04:46 lppsq002a network[55905]: Bringing up interface eth3:  RTNETLINK answers: File exists
    Jan 15 11:04:46 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:46 lppsq002a network[55905]: Bringing up interface eth4:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device eth4 does not seem to be present, delaying initialization.
    Jan 15 11:04:46 lppsq002a network[55905]: [FAILED]
    Jan 15 11:04:46 lppsq002a network[55905]: Bringing up interface eth5:  RTNETLINK answers: File exists
    Jan 15 11:04:46 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:46 lppsq002a network[55905]: Bringing up interface eth6:  RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:47 lppsq002a network[55905]: Bringing up interface eth7:  RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: [  OK  ]
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a network[55905]: RTNETLINK answers: File exists
    Jan 15 11:04:47 lppsq002a systemd[1]: network.service: control process exited, code=exited status=1
    Jan 15 11:04:47 lppsq002a systemd[1]: Failed to start LSB: Bring up/down networking.
    Jan 15 11:04:47 lppsq002a systemd[1]: Unit network.service entered failed state.
    Jan 15 11:04:47 lppsq002a systemd[1]: network.service failed.
    Jan 15 11:08:22 lppsq002a systemd[1]: Starting LSB: Bring up/down networking…
    Jan 15 11:08:22 lppsq002a network[56841]: Bringing up loopback interface:  [  OK  ]
    Jan 15 11:08:22 lppsq002a network[56841]: Bringing up interface eth0:  RTNETLINK answers: File exists
    Jan 15 11:08:22 lppsq002a network[56841]: [  OK  ]
    Jan 15 11:08:26 lppsq002a network[56841]: Bringing up interface eth1:  RTNETLINK answers: File exists
    Jan 15 11:08:26 lppsq002a network[56841]: [  OK  ]
    Jan 15 11:08:26 lppsq002a network[56841]: Bringing up interface eth2:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device eth2 has different MAC address than expected, ignoring.
    Jan 15 11:08:26 lppsq002a network[56841]: [FAILED]
    Jan 15 11:08:26 lppsq002a network[56841]: Bringing up interface eth3:  RTNETLINK answers: File exists
    Jan 15 11:08:27 lppsq002a network[56841]: [  OK  ]



Another useful thing debug arguments is the -xe to do:

# journalctl -xe –no-pager


  • -x (– catalog)
    Augment log lines with explanation texts from the message catalog.
    This will add explanatory help texts to log messages in the output
    where this is available.
  •  -e ( –pager-end )  Immediately jump to the end of the journal inside the implied pager


Finally after fixing the /etc/sysconfig/networking-scripts/* IP configuration issues I had all the 8 Ethernet interfaces to work as expected

# systemctl status network




2. Adding a new IP alias to eth0 interface

Further on I had  to add an IP Alias on the CenOS via its networking configuration, this is done by editing /etc/sysconfig/network-scripts/ifcfg* files.
To create an IP alias for first lan interface eth0, I've had to created a new file named ifcfg-eth0:0

linux:~# cd /etc/sysconfig/network-scripts/
linux:~# vim ifcfg-eth0:0

with below content


Adding this IP address network alias works across all RPM based distributions and should work also on Fedora and Open SuSE as well as Suse Enterprise Linux.
If you however prefer to use a text GUI and do it the CentOS server administration way you can use nmtui (Text User Interface for controlling NetworkManager). tool.

linux:~# nmtui




How to clear ARP cache on Linux / Windows for a single IP address / Flush All IPs ARP cache

Wednesday, December 11th, 2019


On times of Public Internet IP migration or Local IPs between Linux servers or especially in clustered Linux Application Services running on environments like Pacemaker / Corosync / Heartbeat with services such as Haproxy.
Once an IP gets migrated due to complex network and firewall settings often the Migrated IP from Linux Server 1 (A) to Linux Server 2 (B) keeps time until a request to reload the Internet server IP ARP cache with to point to the new IP location, causing a disruption of accessibility to the Newly configured IP address on the new locations. I will not get much into details here what are the ARP (Address Resolution protocol) and Network ARP records on a Network attached Computer and how they correspond uniquely to each IP address assigned on Ethernet or Aliased network Interfaces (eth0 eth0:1 eth0:2) . But in this article, I'll briefly explain once IP Version 4 address is migrated from one server Data Center location to another DC, how the unique corresponding ARP record kept in OS system memory should be flushed in the ARP corresponding Operating System so called ARP table (of which you should think as a logical block in memory keeping a Map of where IP addresses are located physically on a Network recognized by the corresponding Unique MAC Address.

1. List the current ARP cache entries do

Arp is part of net-tools on Debian GNU / Linux and is also available and installed by default on virtually any Linux distribution Fedora / CentOS / RHEL / Ubuntu / Arch Linux and even m$ Windows NT / XP / 2000 / 10 / whatever, the only difference is Linux tool has a bit of more functionality and has a bit more complex use.
Easiest use of arp on GNU / Linux OS-es is.

# arp -an 

The -a lists all records and -n flag is here to omit IP resolving as some IPs are really slow to resolve and output of command could get lagged.

2. Delete one IP entry from the cache

Assuming only one IP address was migrated, if you want to delete the IP entry from local ARP table on any interface:

# arp -d

It is useful to delete an ARP cached entry for IP address only on a certain interface, to do so:

# /usr/sbin/arp -i eth1 -d


3. Create ARP entry MAC address with a static one for tightened security

A useful Hack is to (assign) / bind specific Static MAC addresses to be static in the ARP cache, this is very useful to improve security and fight an ARP poisoning attacks.
Doing so is pretty easy, to do so:

Above will staticly make IP to always appear in the ARP cache table to the MAC 00:50:ba:85:85:ca. So even if we have another system with the same MAC
trying to spoof our location and thus break our real record location for the Hostname in the network holding in reality the MAC 00:50:ba:85:85:ca, poisoning us
trying to make our host to recognize to a different address this will not happen as the static ARP will be kept unchanged in ARP caching table.


 # arp -s 00:50:ba:85:85:ca


4. Flush all ARP records only for specific Ethernet Interface

After the IP on interface was migrated run:


# ip link set arp off dev eth0 ; ip link set arp on dev eth0


5. Remove a set of few IPs only migrated ARP cache entries


# for i in; do sudo arp -d $i; done

Once old ARP entries are removed the arp command would return as:


linux:~$ arp
? ( at <incomplete>  on eth1
? ( at <incomplete>  on eth2

The / entry now shows as incomplete, which means the ARP entry will be refreshed when it is needed again, this would also depend
on the used network switches / firewalls in the network settings so often could take up to 1 minute or so..


6. Flush all ARP table records on Linux



# ip -s -s neigh flush all


7. Delete ARP Cache on FreeBSD and other BSDs

# arp -d -a 


8.  Flush arp cache on Windows

Run command prompt as Administrator -> (cmd.exe)  and do:

C:\> ipconfig /all
netsh interface ip delete arpcache


9. Monitoring the arp table

On servers with multiple IP addresses, where you expect a number of IP addresses migrated to change it is useful to use watch + arp like so:

# watch -n 0.1 'arp -an'

The -n 0.1 will make the arp -an be rerun every 10 miliseconds and by the way is a useful trick to monitor stuff returned by commands that needs a higher refresh frequency.


In short in this article, was explained how to list your arp cache table.The arp command is also available both on Linux and Windows) and as integral part of OS networking it is useful to check thoroghfully to its man page (man arp).
Explained was how to create Static ARP table records to prevent ARP poisoning attacks on a server.
I went through how to delete only a single ARP records (in case if) only certain IPs on a host are changed and an ARP cache entry reload is needed, as well as how to flush the complete set of ARP records need to get refreshed, sometimes useful on networks with Buggy Network Switches or when completely changing the set of IP-addresses assigned on a server host.

Rsync copy files with root privileges between servers with root superuser account disabled

Tuesday, December 3rd, 2019



Sometimes on servers that follow high security standards in companies following PCI Security (Payment Card Data Security) standards it is necessery to have a very weird configurations on servers,to be able to do trivial things such as syncing files between servers with root privileges in a weird manners.This is the case for example if due to security policies you have disabled root user logins via ssh server and you still need to synchronize files in directories such as lets say /etc , /usr/local/etc/ /var/ with root:root user and group belongings.

Disabling root user logins in sshd is controlled by a variable in /etc/ssh/sshd_config that on most default Linux OS
installations is switched on, e.g. 

grep -i permitrootlogin /etc/ssh/sshd_config
PermitRootLogin yes

Many corporations use Vulnerability Scanners such as Qualys are always having in their list of remote server scan for SSH Port 22 to turn have the PermitRootLogin stopped with:


PermitRootLogin no

In this article, I'll explain a scenario where we have synchronization between 2 or more servers Server A / Server B, whatever number of servers that have already turned off this value, but still need to
synchronize traditionally owned and allowed to write directories only by root superuser, here is 4 easy steps to acheive it.


1. Add rsyncuser to Source Server (Server A) and Destination (Server B)

a. Execute on Src Host:


groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files as root src_host' -d /home/rsyncuser -m rsyncuser


b. Execute on Dst Host:


groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files dst_host' -d /home/rsyncuser -m rsyncuser


2. Generate RSA SSH Key pair to be used for passwordless authentication

a. On Src Host

su – rsyncuser

ssh-keygen -t rsa -b 4096


b. Check .ssh/ generated key pairs and make sure the directory content look like.


[rsyncuser@src-host .ssh]$ cd ~/.ssh/;  ls -1



3. Copy to Destination host server under authorized_keys


scp ~/.ssh/  rsyncuser@dst-host:~/.ssh/authorized_keys


Next fix permissions of authorized_keys file for rsyncuser as anyone who have access to that file (that exists as a user account) on the system
could steal the key and use it to run rsync commands and overwrite remotely files, like overwrite /etc/passwd /etc/shadow files with his custom crafted credentials
and hence hack you 🙂

Hence, On Destionation Host Server B fix permissions with:

su – rsyncuser; chmod 0600 ~/.ssh/authorized_keys
[rsyncuser@dst-host ~]$

An alternative way for the lazy sysadmins is to use the ssh-copy-id command


$ ssh-copy-id rsyncuser@
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed — if you are prompted now it is to install the new keys
root@'s password: 


For improved security here to restrict rsyncuser to be able to run only specific command such as very specific script instead of being able to run any command it is good to use little known command= option
once creating the authorized_keys


4. Test ssh passwordless authentication works correctly

For that Run as a normal ssh from rsyncuser

On Src Host


[rsyncuser@src-host ~]$ ssh rsyncuser@dst-host

Perhaps here is time that for those who, think enabling a passwordless authentication is not enough secure and prefer to authorize rsyncuser via a password red from a secured file take a look in my prior article how to login to remote server with password provided from command line as a script argument / Running same commands on many servers 

5. Enable rsync in sudoers to be able to execute as root superuser (copy files as root)


For this step you will need to have sudo package installed on the Linux server.

Then, Execute once logged in as root on Destionation Server (Server B)


[root@dst-host ~]# grep 'rsyncuser ALL' /etc/sudoers|wc -l || echo ‘rsyncuser ALL=NOPASSWD:/usr/bin/rsync’ >> /etc/sudoers


Note that using rsync with a ALL=NOPASSWD in /etc/sudoers could pose a high security risk for the system as anyone authorized to run as rsyncuser is able to overwrite and
respectivle nullify important files on Destionation Host Server B and hence easily mess the system, even shell script bugs could produce a mess, thus perhaps a better solution to the problem
to copy files with root privileges with the root account disabled is to rsync as normal user somewhere on Dst_host and use some kind of additional script running on Dst_host via lets say cron job and
will copy gently files on selective basis.

Perhaps, even a better solution would be if instead of granting ALL=NOPASSWD:/usr/bin/rsync in /etc/sudoers is to do ALL=NOPASSWD:/usr/local/bin/
that will get triggered, once the files are copied with a regular rsyncuser acct.


6. Test rsync passwordless authentication copy with superuser works

Do some simple copy, lets say copy files on Encrypted tunnel configurations located under some directory in /etc/stunnel on Server A to /etc/stunnel on Server B

The general command to test is like so:

rsync -aPz -e 'ssh' '–rsync-path=sudo rsync' /var/log rsyncuser@$dst_host:/root/tmp/

This will copy /var/log files to /root/tmp, you will get a success messages for the copy and the files will be at destination folder if succesful.


On Src_Host run:


[rsyncuser@src-host ~]$ dst=FQDN-DST-HOST; user=rsyncuser; src_dir=/etc/stunnel; dst_dir=/root/tmp;  rsync -aP -e 'ssh' '–rsync-path=sudo rsync' $src_dir  $rsyncuser@$dst:$dst_dir;


7. Copying files with root credentials via script

The simlest file to use to copy a bunch of predefined files  is best to be handled by some shell script, the most simple version of it, could look something like this.

# On server1 use something like this
# On server2 dst server
# add in /etc/sudoers
# rsyncuser ALL=NOPASSWD:/usr/bin/rsync




for i in $(echo ${src[@]}); do
rsync -aPvz –delete –dry-run -e 'ssh' '–rsync-path=sudo rsync' "$i" $rsyncuser@$dst_host:$dst_dir"$i";

In above script as you can see, we define a bunch of files that will be copied in bash array and then run a loop to take each of them and copy to testination dir.
A very sample version of the script 


Lets do short overview on what we have done here. First Created rsyncuser on SRC Server A and DST Server B, set up the key pair on both copied the keys to make passwordless login possible,
set-up rsync to be able to write as root on Dst_Host / testing all the setup and pinpointing a small script that can be used as a backbone to develop something more complex
to sync backups or keep system configurations identicatial – for example if you have doubts that some user might by mistake change a config etc.
In short it was pointed the security downsides of using rsync NOPASSWD via /etc/sudoers and few ideas given that could be used to work on if you target even higher
PCI standards.


Scanning ports with netcat “nc” command on Linux and UNIX / Checking for firewall filtering between source and destination with nc

Friday, September 6th, 2019


Netcat ( nc ) is one of that tools, that is well known in the hacker (script kiddie) communities, but little underestimated in the sysadmin world, due to the fact nmap (network mapper) – the network exploratoin and security auditing tool has become like the standard penetration testing TCP / UDP port tool

nc is feature-rich network debugging and investigation tool with tons of built-in capabilities for reading from and writing to network connections using TCP or UDP.

Its Plethora of features includes port listening, port scanning & Transferring files due to which it is often used by Hackers and PenTesters as Backdoor. Netcat was written by a guy we know as the Hobbit <>.

For a start-up and middle sized companies if nmap is missing on server usually it is okay to install it without risking to open a huge security hole, however in Corporate world, due to security policies often nmap is not found on the servers but netcat (nc) is present on the servers so you have to learn, if you haven't so to use netcat for the usual IP range port scans, if you're so used to nmap.

There are different implementations of Netcat, whether historically netcat was UNIX (BSD) program with a latest release of March 1996. The Linux version of NC is GNU Netcat (official source here) and is POSIX compatible. The other netcat in Free Software OS-es is OpenBSD's netcat whose ported version is also used in FreeBSD. Mac OS X also comes with default prebundled netcat on its Mac OS X from OS X version (10.13) onwards, on older OS X-es it is installable via MacPorts package repo, even FreeDOS has a port of it called NTOOL.

The (Swiss Army Knife of Embedded Linux) busybox includes a default leightweight version of netcat and Solaris has the OpenBSD netcat version bundled.

A cryptography enabled version fork exists that supports that supports integrated transport encryption capabilities called Cryptcat.

The Nmap suite also has included rewritten version of GNU Netcat named Ncat, featuring new possibilities such as "Connection Brokering", TCP/UDP Redirection, SOCKS4 client and server support, ability to "Chain" Ncat processes, HTTP CONNECT proxying (and proxy chaining), SSL connect/listen support and IP address/connection filtering. Just like Nmap, Ncat is cross-platform.

In this small article I'll very briefly explain on basic netcat – known as the TCP Army knife tool port scanning for an IP range of UDP / TCP ports.


1. Scanning for TCP opened / filtered ports remote Linux / Windows server


Everyone knows scanning of a port is possible with a simple telnet request towards the host, e.g.:



The most basic netcat use that does the same is achiavable with:


220 jeremiah ESMTP Exim 4.92 Thu, 05 Sep 2019 20:39:41 +0300

Beside scanning the remote port, using netcat interactively as pointing in above example, if connecting to HTTP Web services, you can request remote side to return a webpage by sending a false referer, source host and headers, this is also easy doable with curl / wget and lynx but doing it with netcat just like with telnet could be fun, here is for example how to request an INDEX page with spoofed HTTP headers.

nc Web-Host.COM 25
GET / HTTP/1.1
User-Agent: my-spoofed-browser


2. Performing a standard HTTP request with netcat


To do so just pype the content with a standard bash integrated printf function with the included end of line (the unix one is \n but to be OS independent it is better to use r\n  – the end of line complition character for Windows.


printf "GET /index.html HTTP/1.0\r\nHost:\r\n\r\n" | nc 80


3. Scanning a range of opened / filtered UDP ports


To scan for lets say opened remote system services on the very common important ports opened from UDP port 25 till, 1195 – more specifically for:

  • UDP Bind Port 53
  • Time protocol Port (37)
  • TFTP (69)
  • Kerberos (88)
  • NTP 123
  • Netbios (137,138,139)
  • SNMP (161)
  • LDAP 389
  • Microsoft-DS (Samba 445)
  • Route BGP (52)
  • LDAPS (639)
  • openvpn (1194)


nc -vzu 25 1195


UDP tests will show opened, if no some kind of firewall blocking, the -z flag is given to scan only for remote listening daemons without sending any data to them.


4. Port Scanning TCP listening ports with Netcat


As prior said using netcat to scan for remote opened HTTP Web Server on port 80 an FTP on Port 23 or a Socks Proxy or MySQL Database on 3306 / PostgreSQL DB on TCP 5432 is very rare case scenario.

Below is example to scan a Local network situated IP for TCP open ports from port 1 till 7000.


# nc -v -n -z -w 5 1-7000

           nc: connect to 80 (tcp) failed: Connection refused
           nc: connect to 20 (tcp) failed: Connection refused
           Connection to port [tcp/ssh] succeeded!
           nc: connect to 23 (tcp) failed: Connection refused


Be informed that scanning with netcat is much more slower, than nmap, so specifying smaller range of ports is always a good idea to reduce annoying waiting …

The -w flag is used to set a timeout to remote connection, usually on a local network situated machines the timeout could be low -w 1 but for machines across different Data Centers (let say one in Berlin and one in Seattle), use as a minimum -w 5.

If you expect remote service to be responsive (as it should always be), it is a nice idea to use netcat with a low timeout (-w) value of 1 below is example:

netcat -v -z -n -w 1 scanned-hosts 1-1023


5. Port scanning range of IP addresses with netcat

If you have used Nmap you know scanning for a network range is as simple as running something like nmap -sP -P0 192.168.0.* (to scan from IP range 1-255 map -sP -P0 (to scan from local IPs ending in 1-150) or giving the network mask of the scanned network, e.g. nmap -sF – for more examples please check my previous article Checking port security on Linux with nmap (examples).

But what if nmap is not there and want to check a bunch 10 Splunk servers (software for searching, monitoring, and analyzing machine-generated big data, via a Web-style interface.), with netcat to find, whether the default Splunk connection port 9997 is opened or not:


for i in `seq 1 10`; do nc -z -w 5 -vv splunk0$ 9997; done


6. Checking whether UDP port traffic is allowed to destination server


Assuring you have access on Source traffic (service) Host A  and Host B (remote destination server where a daemon will be set-upped to listen on UDP port and no firewall in the middle Network router or no traffic control and filtering software HUB is preventing the sent UDP proto traffic, lets say an ntpd will be running on its standard 123 port there is done so:

– On host B (the remote machine which will be running ntpd and should be listening on port 123), run netcat to listen for connections


# nc -l -u -p 123
Listening on [] (family 2, port 123)

Make sure there is no ntpd service actively running on the server, if so stop it with /etc/init.d/ntpd stop
and run above command. The command should run as superuser as UDP port 123 is from the so called low ports from 1-1024 and binding services on such requires root privileges.

– On Host A (UDP traffic send host


nc -uv remote-server-host 123



If the remote port is not reachable due to some kind of network filtering, you will get "connection refused".
An important note to make is on some newer Linux distributions netcat might be silently trying to connect by default using IPV6, bringing false positives of filtered ports due to that. Thus it is generally a good idea, to make sure you're connecting to IPV6


$ nc -uv -4 remote-server-host 123


Another note to make here is netcat's UDP connection takes 2-3 seconds, so make sure you wait at least 4-8 seconds for a very distant located hosts that are accessed over a multitude of routers.

7. Checking whether TCP port traffic allowed to DST remote server

To listen for TCP connections on a specified location (external Internet IP or hostname), it is analogous to listening for UDP connections.

Here is for example how to bind and listen for TCP connections on all available Interface IPs (localhost, eth0, eth1, eth2 etc.)

nc -lv 12345


Then on client host test the connection with


nc -vv 12345
Connection to 12345 port [tcp/*] succeeded!


8. Proxying traffic with netcat

Another famous hackers use of Netcat is its proxying possibility, to proxy anything towards a third party application with UNIX so any content returned be printed out on the listening nc spawned daemon like process.
For example one application is traffic SMTP (Mail traffic) with netcat, below is example of how to proxy traffic from Host B -> Host C (in that case the yandex current mail server

linux-srv:~# nc -l 12543 | nc 25

Now go to Host A or any host that has TCP/IP protocol access to port 12543 on proxy-host Host B (linux-srv) and connect to it on 12543 with another netcat or telnet.

to make netcat keep connecting to MX (Mail Exchange) server you can run it in a small never ending bash shell while loop, like so:


linux-srv:~# while :; do nc -l 12543 | nc 25; done

 Below are screenshots of a connection handshake between Host B (linux-srv) proxy host and Host A (the end client connecting) and Host C (



Host B netcat as a (Proxy)

that is possible in combination of UNIX and named pipes (for more on Named pipes check my previous article simple linux logging with named pipes), here is how to run a single netcat version to proxy any traffic in a similar way as the good old tinyproxy.

On Proxy host create the pipe and pass the incoming traffic towards and write back any output received back in the named pipe.

# mkfifo backpipe
# nc -l 8080 0<backpipe | nc 80 1>backpipe

Other useful netcat proxy set-up is to simulate a network connectivity failures.

For instance, if server:port on TCP 1080 is the normal host application would connect to, you can to set up a forward proxy from port 2080 with

    nc -L server:1080 2080

then set-up and run the application to connect to localhost:2080 (nc proxy port)

    /path/to/application_bin –server=localhost –port=2080

Now application is connected to localhost:2080, which is forwarded to server:1080 through netcat. To simulate a network connectivity failure, just kill the netcat proxy and check the logs of application_bin.

Using netcat as a bind shell (make any local program / process listen and deliver via nc)


netcat can be used to make any local program that can receive input and send output to a server, this use is perhaps little known by the junior sysadmin, but a favourite use of l337 h4x0rs who use it to spawn shells on remote servers or to make connect back shell. The option to do so is -e

-e – option spawns the executable with its input and output redirected via network socket.

One of the most famous use of binding a local OS program to listen and receive / send content is by
making netcat as a bind server for local /bin/bash shell.

Here is how

nc -l -p 4321 -e /bin/sh

If necessery specify the bind hostname after -l. Then from any client connect to 4321 (and if it is opened) you will gain a shell with the user with which above netcat command was run. Note that many modern distribution versions such as Debian / Fedora / SuSE Linux's netcat binary is compiled without the -e option (this works only when compiled with -DGAPING_SECURITY_HOLE), removal in this distros is because option is potentially opening a security hole on the system.

If you're interested further on few of the methods how modern hackers bind new backdoor shell or connect back shell, check out Spawning real tty shells article.


For more complex things you might want to check also socat (SOcket CAT) – multipurpose relay for bidirectional data transfer under Linux.
socat is a great Linux Linux / UNIX TCP port forwarder tool similar holding the same spirit and functionality of netcat plus many, many more.

On some of the many other UNIX operating systems that are lacking netcat or nc / netcat commands can't be invoked a similar utilitiesthat should be checked for and used instead are:

ncat, pnetcat, socat, sock, socket, sbd

To use nmap's ncat to spawn a shell for example that allows up to 3 connections and listens for connects only from network on port 8081:

ncat –exec "/bin/bash" –max-conns 3 –allow -l 8081 –keep-open


9. Copying files over network with netcat

Another good hack often used by hackers to copy files between 2 servers Server1 and Server2 who doesn't have any kind of FTP / SCP / SFTP / SSH / SVN / GIT or any kind of Web copy support service – i.e. servers only used as a Database systems that are behind a paranoid sysadmin firewall is copying files between two servers with netcat.

On Server2 (the Machine on which you want to store the file)

nc -lp 2323 > files-archive-to-copy.tar.gz

On server1 (the Machine from where file is copied) run:

nc -w 5 2323 < files-archive-to-copy.tar.gz


Note that the downside of such transfers with netcat is data transferred is unencrypted so any one with even a simple network sniffer or packet analyzier such as iptraf or tcpdump could capture the file, so make sure the file doesn't contain sensitive data such as passwords.

Copying partition images like that is perhaps best way to get disk images from a big server onto a NAS (when you can't plug the NAS into the server).

10. Copying piped archived directory files with netcat


On computer A:

export ARIBTRARY_PORT=3232
nc -l $ARBITRARY_PORT | tar vzxf –

On Computer B:

tar vzcf – files_or_directories | nc computer_a $ARBITRARY_PORT


11. Creating a one page webserver with netcat and ncat

As netcat could listen to port and print content of a file, it can be set-up with a bit of bash shell scripting to serve
as a one page webserver, or even combined with some perl scripting and bash to create a multi-serve page webserver if needed.

To make netact serve a page to any connected client run in a screen / tmux session following code:


while true; do nc -l -p 80 -q 1 < somepage.html; done


Another interesting fun example if you have installed ncat (is a small web server that connects current time on server on connect).

ncat -lkp 8080 –sh-exec 'echo -ne "HTTP/1.0 200 OK\r\n\r\nThe date is "; date;'


12. Cloning Hard disk partitions with netcat

rsync is a common tool used to clone hard disk partitions over network. However if rsync is not installed on a server and netcat is there you can use it instead, lets say we want to clone /dev/sdb
from Server1 to Server2 assuming (Server1 has a configured working Local or Internet connection).


On Server2 run:

nc -l -p 4321 | dd of=/dev/sdb


Following on Server2 to start the Partition / HDD cloning process run


dd if=/dev/sdb | nc 4321


Where is the IP address listen configured on Server2 (in case you don't know it, check the listening IP to access with /sbin/ifconfig).

Next you have to wait for some short or long time depending on the partiiton or Hard drive, number of files / directories and allocated disk / partition size.

To clone /dev/sda (a main partiiton) from Server1 to Server2 first requirement is that it is not mounted, thus to have it unmounted on a system assuming you have physical access to the host, you can boot some LiveCD Linux distribution such as Knoppix Live CD on Server1, manually set-up networking with ifconfig or grab an IP via DHCP from the central DHCP server and repeat above example.

Happy netcating 🙂

Fix staled NFS on server with dmesg error log nfs: server nfs-server not responding, still trying

Saturday, March 16th, 2019


On a server today I've found to have found a number of NFS mounts mounted through /etc/fstab file definitions that were hanging;

nfs-server:~# df -hT

 command kept hanging as well as any attempt to access the mounted NFS directory was not possible.
The server with the hanged Network File System is running SLES (SuSE Enterprise Linux 12 SP3) a short investigation in the kernel logs (dmesg) as well as /var/log/messages reveales following errors:


nfs-server:~# dmesg
[3117414.856995] nfs: server nfs-server OK
[3117595.104058] nfs: server nfs-server not responding, still trying
[3117625.032864] nfs: server nfs-server OK
[3117805.280036] nfs: server nfs-server not responding, still trying
[3117835.209110] nfs: server nfs-server OK
[3118015.456045] nfs: server nfs-server not responding, still trying
[3118045.384930] nfs: server nfs-server OK
[3118225.568029] nfs: server nfs-server not responding, still trying
[3118255.560536] nfs: server nfs-server OK
[3118435.808035] nfs: server nfs-server not responding, still trying
[3118465.736463] nfs: server nfs-server OK
[3118645.984057] nfs: server nfs-server not responding, still trying
[3118675.912595] nfs: server nfs-server OK
[3118886.098614] nfs: server nfs-server OK
[3119066.336035] nfs: server nfs-server not responding, still trying
[3119096.274493] nfs: server nfs-server OK
[3119276.512033] nfs: server nfs-server not responding, still trying
[3119306.440455] nfs: server nfs-server OK
[3119486.688029] nfs: server nfs-server not responding, still trying
[3119516.616622] nfs: server nfs-server OK
[3119696.864032] nfs: server nfs-server not responding, still trying
[3119726.792650] nfs: server nfs-server OK
[3119907.040037] nfs: server nfs-server not responding, still trying
[3119936.968691] nfs: server nfs-server OK
[3120117.216053] nfs: server nfs-server not responding, still trying
[3120147.144476] nfs: server nfs-server OK
[3120328.352037] nfs: server nfs-server not responding, still trying
[3120567.496808] nfs: server nfs-server OK
[3121370.592040] nfs: server nfs-server not responding, still trying
[3121400.520779] nfs: server nfs-server OK
[3121400.520866] nfs: server nfs-server OK

It took me a short while to investigate and check the NetApp remote NFS storage filesystem and investigate the Virtual Machine that is running on top of OpenXen Hypervisor system.
The NFS storage permissions of the exported file permissions were checked and they were in a good shape, also a reexport of the NFS mount share was re-exported and on the Linux
mount host the following commands ran to remount the hanged Filesystems:


nfs-server:~# umount -f /mnt/nfs_share
nfs-server:~# umount -l /mnt/nfs_share
nfs-server:~# umount -lf /mnt/nfs_share1
nfs-server:~# umount -lf /mnt/nfs_share2
nfs-server:~# mount -t nfs -o remount /mnt/nfs_share

that fixed one of the hanged mount, but as I didn't wanted to manually remount each of the NFS FS-es, I've remounted them all with:

nfs-server:~# mount -a -t nfs

This solved it but, the fix seemed unpermanent as in a time while the issue started reoccuring and I've spend some time
in further investigation on the weird NFS hanging problem has led me to the following blog post where the same problem was described and it was pointed the root cause of it lays
in parameter for MTU which seems to be quite high MTU 9000 and this over the years has prooven to cause problems with NFS especially due to network router (switches) configurations
which seem to have a filters for MTU and are passing only packets with low MTU levels and using rsize / wzise custom mount NFS values in /etc/fstab could lead to this strange NFS hangs.

Below is a list of Maximum Transmission  Unit (MTU) for Media Transport excerpt taken from wikipedia as of time of writting this article.

In my further research on the issue I've come across this very interesting article which explains a lot on "Large Internet" and Internet Performance

I've used tracepath command which is doing basicly the same as traceroute but could be run without root user and discovers hops (network routers) and shows MTU between path -> destionation.

Below is a sample example

nfs-server:~# tracepath
 1?: [LOCALHOST]                      pmtu 1500
 1:                                           0.909ms
 1:                                           0.966ms
 2:                                         0.859ms
 3:                              1.138ms reached
     Resume: pmtu 1500 hops 3 back 3


Optiomal pmtu for this connection is to be 1500 .traceroute in some cases might return hops with 'no reply' if there is a router UDP  packet filtering implemented on it.

The high MTU value for the Storage network connection interface on eth1 was evident with a simple:


 nfs-server:~# /sbin/ifconfig |grep -i eth -A 2
eth0      Link encap:Ethernet  HWaddr 00:16:3E:5C:65:74
          inet addr:  Bcast:  Mask:

eth1      Link encap:Ethernet  HWaddr 00:16:3E:5C:65:76
          inet addr:  Bcast:  Mask:

The fix was as simple to lower MTU value for eth1 Ethernet interface to 1500 which is the value which most network routers are configured too.

To apply the new MTU to the eth1 interface without restarting the SuSE SLES networking , I first used ifconfig one time with:


 nfs-server:~# /sbin/ifconfig eth1 mtu 1500
 nfs-server:~# ip addr show

To make the setting permanent on next  SuSE boot:

I had to set the MTU=1500 value in


nfs-server:~#  ip address show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 8c:89:a5:f2:e8:d8 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth1
       valid_lft forever preferred_lft forever


Then to remount the NFS mounted hanged filesystems once again ran:

nfs-server:~# mount -a -t nfs

Many network routers keeps the MTU to low as 1500 also because a higher values causes IP packet fragmentation when using NFS over UDP where IP packet fragmentation and packet
reassembly requires significant amount of CPU at both ends of the network connection.
Packet fragmentation also exposes network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason.
Any increase of RPC retransmissions, along with the possibility of increased timeouts, are the single worst impediment to performance for NFS over UDP.
This and many more is very well explained in Optimizing NFS Performance page (which is a must reading) for any sys admin that plans to use NFS frequently.

Even though lowering MTU (Maximum Transmission Union) value does solved my problem at some cases especially in a modern local LANs with Jumbo Frames, allowing and increasing the MTU to 9000 bytes
might be a good idea as this will increase the amount of packet size.and will raise network performance, however as always on distant networks with many router hops keeping MTU value as low as 1492 / 5000 is always a good idea.