Posts Tagged ‘hosts’

How to configure bond0 bonding and network bridging for KVM Virtual machines on Redhat / CentOS / Fedora Linux

Tuesday, February 16th, 2021

configure-bond0-bonding-channel-with-bridges-on-hypervisor-host-for-guest-KVM-virtual-machines-howto-sample-Hypervisor-Virtual-machines-pic
 1. Intro to Redhat RPM based distro /etc/sysconfig/network-scripts/* config vars shortly explained

On RPM based Linux distributions configuring network has a very specific structure. As a sysadmin just recently I had a task to configure Networking on 2 Machines to be used as Hypervisors so the servers could communicate normally to other Networks via some different intelligent switches that are connected to each of the interfaces of the server. The idea is the 2 redhat 8.3 machines to be used as  Hypervisor (HV) and each of the 2 HVs to each be hosting 2 Virtual guest Machines with preinstalled another set of Redhat 8.3 Ootpa. I've recently blogged on how to automate a bit installing the KVM Virtual machines with using predefined kickstart.cfg file.

The next step after install was setting up the network. Redhat has a very specific network configuration well known under /etc/sysconfig/network-scripts/ifcfg-eno*# or if you have configured the Redhats to fix the changing LAN card naming ens, eno, em1 to legacy eth0, eth1, eth2 on CentOS Linux – e.g. to be named as /etc/sysconfig/network-scripts/{ifcfg-eth0,1,2,3}.

The first step to configure the network from that point is to come up with some network infrastrcture that will be ready on the HV nodes server-node1 server-node2 for the Virtual Machines to be used by server-vm1, server-vm2.

Thus for the sake of myself and some others I decide to give here the most important recognized variables that can be placed inside each of the ifcfg-eth0,ifcfg-eth1,ifcfg-eth2 …

A standard ifcfg-eth0 confing would look something this:
 

[root@redhat1 :~ ]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME=eth0
UUID=…
ONBOOT=yes
HWADDR=0e:a4:1a:b6:fc:86
IPADDR0=10.31.24.10
PREFIX0=23
GATEWAY0=10.31.24.1
DNS1=192.168.50.3
DNS2=10.215.105.3
DOMAIN=example.com
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes


Lets say few words to each of the variables to make it more clear to people who never configured Newtork on redhat without the help of some of the console ncurses graphical like tools such as nmtui or want to completely stop the Network-Manager to manage the network and thus cannot take the advantage of using nmcli (a command-line tool for controlling NetworkManager).

Here is a short description of each of above configuration parameters:

TYPE=device_type: The type of network interface device
BOOTPROTO=protocol: Where protocol is one of the following:

  • none: No boot-time protocol is used.
  • bootp: Use BOOTP (bootstrap protocol).
  • dhcp: Use DHCP (Dynamic Host Configuration Protocol).
  • static: if configuring static IP

EFROUTE|IPV6_DEFROUTE=answer

  • yes: This interface is set as the default route for IPv4|IPv6 traffic.
  • no: This interface is not set as the default route.

Usually most people still don't use IPV6 so better to disable that

IPV6INIT=answer: Where answer is one of the following:

  • yes: Enable IPv6 on this interface. If IPV6INIT=yes, the following parameters could also be set in this file:

IPV6ADDR=IPv6 address

IPV6_DEFAULTGW=The default route through the specified gateway

  • no: Disable IPv6 on this interface.

IPV4_FAILURE_FATAL|IPV6_FAILURE_FATAL=answer: Where answer is one of the following:

  • yes: This interface is disabled if IPv4 or IPv6 configuration fails.
  • no: This interface is not disabled if configuration fails.

ONBOOT=answer: Where answer is one of the following:

  • yes: This interface is activated at boot time.
  • no: This interface is not activated at boot time.

HWADDR=MAC-address: The hardware address of the Ethernet device
IPADDRN=address: The IPv4 address assigned to the interface
PREFIXN=N: Length of the IPv4 netmask value
GATEWAYN=address: The IPv4 gateway address assigned to the interface. Because an interface can be associated with several combinations of IP address, network mask prefix length, and gateway address, these are numbered starting from 0.
DNSN=address: The address of the Domain Name Servers (DNS)
DOMAIN=DNS_search_domain: The DNS search domain (this is the search Domain-name.com you usually find in /etc/resolv.conf)

Other interesting file that affects how routing is handled on a Redhat Linux is

/etc/sysconfig/network

[root@redhat1 :~ ]# cat /etc/sysconfig/network
# Created by anaconda
GATEWAY=10.215.105.

Having this gateway defined does add a default gateway

This file specifies global network settings. For example, you can specify the default gateway, if you want to apply some network settings such as routings, Alias IPs etc, that will be valid for all configured and active configuration red by systemctl start network scripts or the (the network-manager if such is used), just place it in that file.

Other files of intesresting to control how resolving is being handled on the server worthy to check are 

/etc/nsswitch.conf

and

/etc/hosts

If you want to set a preference of /etc/hosts being red before /etc/resolv.conf and DNS resolving for example you need to have inside it, below is default behavior of it.
 

root@redhat1 :~ ]#   grep -i hosts /etc/nsswitch.conf
#     hosts: files dns
#     hosts: files dns  # from user file
# Valid databases are: aliases, ethers, group, gshadow, hosts,
hosts:      files dns myhostname

As you can see the default order is to read first files (meaning /etc/hosts) and then the dns (/etc/resolv.conf)
hosts: files dns

Now with this short intro description on basic values accepted by Redhat's /etc/sysconfig/network-scripts/ifcfg* prepared configurations.


I will give a practical example of configuring a bond0 interface with 2 members which were prepared based on Redhat's Official documentation found in above URLs:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/configuring-network-bonding_configuring-and-managing-networking
 

# Bonding on RHEL 7 documentation
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-network_bonding_using_the_command_line_interface

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-verifying_network_configuration_bonding_for_redundancy

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-networkscripts-interfaces_network-bridge

# Network Bridge with Bond documentation
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sec-Configuring_a_VLAN_over_a_Bond

https://docs.fedoraproject.org/en-US/Fedora/24/html/Networking_Guide/sec-Network_Bridge_with_Bond.html


2. Configuring a single bond connection on eth0 / eth2 and setting 3 bridge interfaces bond -> br0, br1 -> eth1, br2 -> eth2

The task on my machines was to set up from 4 lan cards one bonded interface as active-backup type of bond with bonded lines on eth0, eth2 and 3 other 2 eth1, eth2 which will be used for private communication network that is connected via a special dedicated Switches and Separate VLAN 50, 51 over a tagged dedicated gigabit ports.

As said the 2 Servers had each 4 Broadcom Network CARD interfaces each 2 of which are paired (into a single card) and 2 of which are a solid Broadcom NetXtreme Dual Port 10GbE SFP+ and Dell Broadcom 5720 Dual Port 1Gigabit Network​.

2-ports-broadcom-netxtreme-dual-port-10GBe-spf-plus

On each of server-node1 and server-node2 we had 4 Ethernet Adapters properly detected on the Redhat

root@redhat1 :~ ]# lspci |grep -i net
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)


I've already configured as prerogative net.ifnames=0 to /etc/grub2/boot.cfg and Network-Manager service disabled on the host (hence to not use Network Manager you'll see in below configuration NM_CONTROLLED="no" is telling the Redhat servers is not to be trying NetworkManager for more on that check my previous article Disable NetworkManager automatic Ethernet Interface Management on Redhat Linux , CentOS 6 / 7 / 8.

3. Types of Network Bonding

mode=0 (balance-rr)

This mode is based on Round-robin policy and it is the default mode. This mode offers fault tolerance and load balancing features. It transmits the packets in Round robin fashion that is from the first available slave through the last.

mode-1 (active-backup)

This mode is based on Active-backup policy. Only one slave is active in this band, and another one will act only when the other fails. The MAC address of this bond is available only on the network adapter part to avoid confusing the switch. This mode also provides fault tolerance.

mode=2 (balance-xor)

This mode sets an XOR (exclusive or) mode that is the source MAC address is XOR’d with destination MAC address for providing load balancing and fault tolerance. Each destination MAC address the same slave is selected.

mode=3 (broadcast)

This method is based on broadcast policy that is it transmitted everything on all slave interfaces. It provides fault tolerance. This can be used only for specific purposes.

mode=4 (802.3ad)

This mode is known as a Dynamic Link Aggregation mode that has it created aggregation groups having same speed. It requires a switch that supports IEEE 802.3ad dynamic link. The slave selection for outgoing traffic is done based on a transmit hashing method. This may be changed from the XOR method via the xmit_hash_policy option.

mode=5 (balance-tlb)

This mode is called Adaptive transmit load balancing. The outgoing traffic is distributed based on the current load on each slave and the incoming traffic is received by the current slave. If the incoming traffic fails, the failed receiving slave is replaced by the MAC address of another slave. This mode does not require any special switch support.

mode=6 (balance-alb)

This mode is called adaptive load balancing. This mode does not require any special switch support.

Lets create the necessery configuration for the bond and bridges

[root@redhat1 :~ ]# cat ifcfg-bond0
DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
#IPADDR=10.50.21.16
#PREFIX=26
#GATEWAY=10.50.0.1
#DNS1=172.20.88.2
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
NM_CONTROLLED="no"
BRIDGE=br0


[root@redhat1 :~ ]# cat ifcfg-bond0.10
DEVICE=bond0.10
BOOTPROTO=none
ONPARENT=yes
#IPADDR=10.50.21.17
#NETMASK=255.255.255.0
VLAN=yes

[root@redhat1 :~ ]# cat ifcfg-br0
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br0
UUID=4451286d-e40c-4d8c-915f-7fc12a16d595
DEVICE=br0
ONBOOT=yes
IPADDR=10.50.50.16
PREFIX=26
GATEWAY=10.50.0.1
DNS1=172.20.0.2
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-br1
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br1
UUID=40360c3c-47f5-44ac-bbeb-77f203390d29
DEVICE=br1
ONBOOT=yes
##IPADDR=10.50.51.241
PREFIX=28
##GATEWAY=10.50.0.1
##DNS1=172.20.0.2
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-br2
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br2
UUID=fbd5c257-2f66-4f2b-9372-881b783276e0
DEVICE=br2
ONBOOT=yes
##IPADDR=10.50.51.243
PREFIX=28
##GATEWAY=10.50.0.1
##DNS1=172.20.10.1
NM_CONTROLLED=no
NM_CONTROLLED=no
BRIDGE=br0

[root@redhat1 :~ ]# cat ifcfg-eth0
TYPE=Ethernet
NAME=bond0-slaveeth0
BOOTPROTO=none
#UUID=61065574-2a9d-4f16-b16e-00f495e2ee2b
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-eth1
TYPE=Ethernet
NAME=eth1
UUID=b4c359ae-7a13-436b-a904-beafb4edee94
DEVICE=eth1
ONBOOT=yes
BRIDGE=br1
NM_CONTROLLED=no

[root@redhat1 :~ ]#  cat ifcfg-eth2
TYPE=Ethernet
NAME=bond0-slaveeth2
BOOTPROTO=none
#UUID=821d711d-47b9-490a-afe7-190811578ef7
DEVICE=eth2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat1 :~ ]#  cat ifcfg-eth3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
#BOOTPROTO=dhcp
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
BRIDGE=br2
NAME=eth3
UUID=61065574-2a9d-4f16-b16e-00f495e2ee2b
DEVICE=eth3
ONBOOT=yes
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-bond0
DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
#IPADDR=10.50.21.16
#PREFIX=26
#GATEWAY=10.50.21.1
#DNS1=172.20.88.2
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
NM_CONTROLLED="no"
BRIDGE=br0

# cat ifcfg-bond0.10
DEVICE=bond0.10
BOOTPROTO=none
ONPARENT=yes
#IPADDR=10.50.21.17
#NETMASK=255.255.255.0
VLAN=yes
NM_CONTROLLED=no
BRIDGE=br0

[root@redhat2 :~ ]# cat ifcfg-br0
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br0
#UUID=f87e55a8-0fb4-4197-8ccc-0d8a671f30d0
UUID=4451286d-e40c-4d8c-915f-7fc12a16d595
DEVICE=br0
ONBOOT=yes
IPADDR=10.50.21.17
PREFIX=26
GATEWAY=10.50.21.1
DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]#  cat ifcfg-br1
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=no
#IPV6_AUTOCONF=no
#IPV6_DEFROUTE=no
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br1
UUID=40360c3c-47f5-44ac-bbeb-77f203390d29
DEVICE=br1
ONBOOT=yes
##IPADDR=10.50.21.242
PREFIX=28
##GATEWAY=10.50.21.1
##DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-br2
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=no
#IPV6_AUTOCONF=no
#IPV6_DEFROUTE=no
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br2
UUID=fbd5c257-2f66-4f2b-9372-881b783276e0
DEVICE=br2
ONBOOT=yes
##IPADDR=10.50.21.244
PREFIX=28
##GATEWAY=10.50.21.1
##DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth0
TYPE=Ethernet
NAME=bond0-slaveeth0
BOOTPROTO=none
#UUID=ee950c07-7eb2-463b-be6e-f97e7ad9d476
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth1
TYPE=Ethernet
NAME=eth1
UUID=ffec8039-58f0-494a-b335-7a423207c7e6
DEVICE=eth1
ONBOOT=yes
BRIDGE=br1
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth2
TYPE=Ethernet
NAME=bond0-slaveeth2
BOOTPROTO=none
#UUID=2c097475-4bef-47c3-b241-f5e7f02b3395
DEVICE=eth2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no


Notice that the bond0 configuration does not have an IP assigned this is done on purpose as we're using the interface channel bonding together with attached bridge for the VM. Usual bonding on a normal physical hardware hosts where no virtualization use is planned is perhaps a better choice. If you however try to set up an IP address in that specific configuration shown here and you try to reboot the machine, you will end up with inacessible machine over the network like I did and you will need to resolve configuration via some kind of ILO / IDRAC interface.

4. Generating UUID for ethernet devices bridges and bonds

One thing to note is the command uuidgen you might need that to generate UID identificators to fit in the new network config files.

Example:
 

[root@redhat2 :~ ]#uuidgen br2
e7995e15-7f23-4ea2-80d6-411add78d703
[root@redhat2 :~ ]# uuidgen br1
05e0c339-5998-414b-b720-7adf91a90103
[root@redhat2 :~ ]# uuidgen br0
e6d7ff74-4c15-4d93-a150-ff01b7ced5fb


5. How to make KVM Virtual Machines see configured Network bridges (modify VM XML)

To make the Virtual machines installed see the bridges I had to

[root@redhat1 :~ ]#virsh edit VM_name1
[root@redhat1 :~ ]#virsh edit VM_name2

[root@redhat2 :~ ]#virsh edit VM_name1
[root@redhat2 :~ ]#virsh edit VM_name2

Find the interface network configuration and change it to something like:

    <interface type='bridge'>
      <mac address='22:53:00:56:5d:ac'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='22:53:00:2a:5f:01'/>
      <source bridge='br1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='22:34:00:4a:1b:6c'/>
      <source bridge='br2'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>


6. Testing the bond  is up and works fine

# ip addr show bond0
The result is the following:

 

4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:cb:25:82 brd ff:ff:ff:ff:ff:ff


The bond should be visible in the normal network interfaces with ip address show or /sbin/ifconfig

 

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:29:ab:2a:fa
Slave queue ID: 0

 

According to the output eth0 is the active slave.

The active slaves device files (eth0 in this case) is found in virtual file system /sys/

# find /sys -name *eth0
/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/net/eth0
/sys/devices/virtual/net/bond0/lower_eth0
/sys/class/net/eth0


You can remove a bond member say eth0 by 

 

 cd to the pci* directory
Example: /sys/devices/pci000:00/000:00:15.0

 

# echo 1 > remove


At this point the eth0 device directory structure that was previously located under /sys/devices/pci000:00/000:00:15.0 is no longer there.  It was removed and the device no longer exists as seen by the OS.

You can verify this is the case with a simple ifconfig which will no longer list the eth0 device.
You can also repeat the cat /proc/net/bonding/bond0 command from Step 1 to see that eth0 is no longer listed as active or available.
You can also see the change in the messages file.  It might look something like this:

2021-02-12T14:13:23.363414-06:00 redhat1  device eth0: device has been deleted
2021-02-12T14:13:23.368745-06:00 redhat1 kernel: [81594.846099] bonding: bond0: releasing active interface eth0
2021-02-12T14:13:23.368763-06:00 redhat1 kernel: [81594.846105] bonding: bond0: Warning: the permanent HWaddr of eth0 – 00:0c:29:ab:2a:f0 – is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
2021-02-12T14:13:23.368765-06:00 redhat1 kernel: [81594.846132] bonding: bond0: making interface eth1 the new active one.

 

Another way to test the bonding is correctly switching between LAN cards on case of ethernet hardware failure is to bring down one of the 2 or more bonded interfaces, lets say you want to switch from active-backup from eth1 to eth2, do:
 

# ip link set dev eth0 down


That concludes the test for fail over on active slave failure.

7. Bringing bond updown (rescan) bond with no need for server reboot

You know bonding is a tedious stuff that sometimes breaks up badly so only way to fix the broken bond seems to be a init 6 (reboot) cmd but no actually that is not so.

You can also get the deleted device back with a simple pci rescan command:

# echo 1 > /sys/bus/pci/rescan


The eth0 interface should now be back
You can see that it is back with an ifconfig command, and you can verify that the bond sees it with this command:

# cat /proc/net/bonding/bond0


That concludes the test of the bond code seeing the device when it comes back again.

The same steps can be repeated only this time using the eth1 device and file structure to fail the active slave in the bond back over to eth0.

8. Testing the bond with ifenslave command (ifenslave command examples)

Below is a set of useful information to test the bonding works as expected with ifenslave command  comes from "iputils-20071127" package

– To show information of all the inerfaces

                  # ifenslave -a
                  # ifenslave –all-interfaces 

 

– To change the active slave

                  # ifenslave -c bond0 eth1
                  # ifenslave –change-active bond0 eth1 

 

– To remove the slave interface from the bonding device

                  # ifenslave -d eth1
                  # ifenslave –detach bond0 eth1 

 

– To show master interface info

                  # ifenslave bond0 

 

– To set the bond device down and automatically release all the slaves

                  # ifenslave bond1 down 

– To get the usage info

                  # ifenslave -u
                  # ifenslave –usage 

– To set to verbose mode

                  # ifenslave -v
                  # ifenslave –verbose 

9. Testing the bridge works fine

Historically over the years all kind of bridges are being handled with the brctl part of bridge-utils .deb / .rpm installable package.

The classical way to check a bridge is working is to do

# brctl show
# brctl show br0; brctl show br1; brctl show br2

# brctl showmacs br0
 

etc.

Unfortunately with redhat 8 this command is no longer available so to get information about configured bridges you need to use instead:

 

# bridge link show
3:eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 32 cost 100
4:eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state listening priority 32 cost 100


10. Troubleshooting network connectivity issues on bond bridges and LAN cards

Testing the bond connection and bridges can route proper traffic sometimes is a real hassle so here comes at help the good old tcpdump

If you end up with issues with some of the ethernet interfaces between HV1 and HV2 to be unable to talk to each other and you have some suspiciousness that some colleague from the network team has messed up a copper (UTP) cable or there is a connectivity fiber optics issues. To check the VLAN tagged traffic headers on the switch you can listen to each and every bond0 and br0, br1, br2 eth0, eth1, eth2, eth3 configured on the server like so:

# tcpdump -i bond0 -nn -e vlan


Some further investigation on where does a normal ICMP traffic flows once everything is setup is a normal thing to do, hence just try to route a normal ping via the different server interfaces:

# ping -I bond0 DSTADDR

# ping -i eth0 DSTADDR

# ping -i eth1 DSTADDR

# ping -i eth2 DSTADDR


After conducting the ping do the normal for network testing big ICMP packages (64k) ping to make sure there are no packet losses etc., e.g:

# ping -I eth3 -s 64536  DSTADDR


If for 10 – 20 seconds the ping does not return package losses then you should be good.

Howto Upgrade IBM Spectrum Protect Backup Client TSM 7.X to 8.1.8, Update Tivoli 8.1.8 to 8.1.11 on CentOS and Redhat Linux

Thursday, December 3rd, 2020

 

IBM-spectrum-protect-backup-logo-tivoli-tsm-logo

Having another day of a system administrator boredom, we had a task to upgrade some Tivoli TSM Backup clients running on a 20+ machines powered by CentOS and RHEL Linux to prepare the systems to be on the latest patched IBM Spectrum Backup client version available from IBM. For the task of patching I've used a central server where, I've initially downloaded the provided TSM client binaries archives. From this machine, we have copied TivSM*.tar to each and every system that needs to be patched and then patched. The task is not too complex as the running TSM in the machines are all at the same version and all running a recent patched version of Linux. Hence to make sure all works as expected we have tested TSM is upgraded from 7.X.X to 8.X.X on one machine and then test 8.1.8 to 8.1.11 upgrade on another one. Once having confirmed that Backups works as expected after upgrade. We have proceeded to do it massively on each of the rest 20+ hosts.
Below article's goal is to help some lazy sysadmin with the task to prepare an TSM Backup upgrade procedure to standartize TSM Upgrade, which as many of the IBM's softwares is very specific and its upgrade requires, a bit of manual work and extra cautious as there seems to be no easy way (or at least I don't know it), to do the upgrade by simply adding an RPM repository and doing, something like yum install tivsm*.


0. Check if there is at least 2G free of space

According to documentation the minimum space you need to a functional install without having it half installed or filling up your filesystem is 2 Gigabytes of Free Memory on a filesystem where the .tar and rpms will be living.

Thus check what is the situation with your filesystem where you wills store the .tar archice and extract .RPM files / install the RPM files.

# df -h

1. Download the correct tarball with 8.1 Client

On one central machine you would need to download the Tivoli you can do that via wget / curl / lynx whatever is at hand on the Linux server.

As of time of writting this article TSM's 8.1.11 location is at
URL:

http://public.dhe.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/v8111/

I've made a local download mirror of Tivoli TSM 8.1.11 here.
In case you need to install IBM Spectrum Backup Client to a PCI secured environment to a DMZ-ed LAN network from a work PC you can Download it first from your local PC and via Citrix client upload program or WinSCP upload it to a central replication host from where you will later copy to each of the other server nodes that needs to be upgraded.

Lets Copy archive to all Server hosts where you want it later installed, using a small hack

Assuming you already have an Excel document or a Plain text document with all the IPs of the affected hosts where you will need to get TSM upgraded. Extract this data and from it create a plain text file /home/user/hosts.txt containing all the machine IPs lined up separated with carriage return separations (\n), so you can loop over each one and use scp to send the files.

– Replicate Tivoli tar to all machine hosts where you want to get IBM Spectrum installed or upgraded.
Do it with a loop like this:

# for i in $(cat hosts.txt); do scp 8.1.11.0-TIV-TSMBAC-LinuxX86.tar user@$i:/home/user/; done

 Copy to a Copy buffer temporary your server password assuming all your passwords to each machine are identical and paste your login user pass for each host to initiate transfer
 

2. SSH to each of the Machine hosts IPs

Once you login to the host you want to upgrade
Go to your user $HOME /home/user and create files where we'll temporary store Tivoli archive files and extract RPMs

[root@linux-server user]# mkdir -p ~/tsm/TSM_BCK/
[root@linux-server user]# mv 8.1.11.0-TIV-TSMBAC-LinuxX86.tar ~/tsm
[root@linux-server user]# cd tsm
[root@linux-server user]# tar -xvvf 8.1.11.0-TIV-TSMBAC-LinuxX86.tar
gskcrypt64-8.0.55.17.linux.x86_64.rpm
GSKit.pub.pgp
gskssl64-8.0.55.17.linux.x86_64.rpm
README_api.htm
README.htm
RPM-GPG-KEY-ibmpkg
TIVsm-API64.x86_64.rpm
TIVsm-APIcit.x86_64.rpm
TIVsm-BAcit.x86_64.rpm
TIVsm-BAhdw.x86_64.rpm
TIVsm-BA.x86_64.rpm
TIVsm-filepath-source.tar.gz
TIVsm-JBB.x86_64.rpm
TIVsm-WEBGUI.x86_64.rpm
update.txt

3. Create backup of old backup files

It is always a good idea to keep old backup files

[root@linux-server tsm]# cp -av /opt/tivoli/tsm/client/ba/bin/dsm.opt ~/tsm/TSM_BCK/dsm.opt_bak_$(date +'%Y_%M_%H')
[root@linux-server tsm]# cp -av /opt/tivoli/tsm/client/ba/bin/dsm.sys ~/tsm/TSM_BCK/dsm.sys_bak_$(date +'%Y_%M_%H')

[root@linux-server tsm]# [[ -f /etc/adsm/TSM.PWD ]] && cp -av /etc/adsm/TSM.PWD ~/TSM_BCK/ || echo 'file doesnt exist'

/etc/adsm/TSM.PWD this file is only there as legacy for TSM it contained encrypted passwords inver 7 for updates. In TSM v.8 encryption file is not there as new mechanism for sensitive data was introduced.
Be aware that from Tivoli 8.X it will return error
exist'

!! Note – if dsm.opt , dsm.sys files are on different locations – please use correct full path locations !!

4. Stop  dsmcad – TSM Service daemon

[root@linux-server tsm]# systemctl stop dsmcad

5. Locate and deinstall all old Clients

Depending on the version to upgrade if you're upgrading from TSM version 7 to 8, you will get output like.

[root@linux-server tsm]# rpm -qa | grep 'TIVsm-'
TIVsm-BA-7.1.6-2.x86_64
TIVsm-API64-7.1.6-2.x86_64

If you're one of this paranoid admins you can remove TIVsm packs  one by one.

[root@linux-server tsm]# rpm -e TIVsm-BA-7.1.6-2.x86_64
[root@linux-server tsm]# rpm -e TIVsm-API64-7.1.6-2.x86_64

Instead if upgrading from version 8.1.8 to 8.1.11 due to the Security CVE advisory recently published by IBM e.g. (IBM Runtime Vulnerability affects IBM Spectrum Backup archive Client) and  vulnerability in Apache Commons Log4J affecting IBM Spectrum Protect Backup Archive Client.

[root@linux-server tsm]# rpm -qa | grep 'TIVsm-'
TIVsm-API64-8.1.8-0.x86_64
TIVsm-BA-8.1.8-0.x86_64

Assuming you're not scared of a bit automation you can straight do it with below one liner too 🙂

# rpm -e $(rpm -qa | grep TIVsm)

[root@linux-server tsm]# rpm -qa | grep gsk
[root@linux-server tsm]# rpm -e gskcrypt64 gskssl64

6. Check uninstallation success:

[root@linux-server tsm]# rpm -qa | grep TIVsm
[root@linux-server tsm]# rpm -qa | grep gsk

Here you should an Empty output, if packages are not on the system, e.g. Empty output is good output ! 🙂

7. Install new client IBM Spectrum Client (Tivoli Storage Manager) and lib dependencies

[root@linux-server tsm]# rpm -ivh gskcrypt64-8.0.55.4.linux.x86_64.rpm
[root@linux-server tsm]# rpm -ivh gskssl64-8.0.55.4.linux.x86_64.rpm

 If you're lazy to type you can do as well

[root@linux-server tsm]# rpm -Uvh gsk*

Next step is to install main Tivoli SM components the the API files and BA (The Backup Archive Client)

[root@linux-server tsm]# rpm -ivh TIVsm-API64.x86_64.rpm
[root@linux-server tsm]# rpm -ivh TIVsm-BA.x86_64.rpm

If you have to do it on multiple servers and you do it manually following a guide like this, you might instead want to install them with one liner.

[root@linux-server tsm]# rpm -ivh TIVsm-API64.x86_64.rpm TIVsm-BA.x86_64.rpm

There are some Not mandatory "Common Inventory Technology" components (at some cases if you're using the API install it we did not need that), just for the sake if you need them on your servers due to backup architecture, install also below commented rpm files.

## rpm -ivh TIVsm-APIcit.x86_64.rpm

## rpm -ivh TIVsm-BAcit.x86_64.rpm

These packages not needed only for operation WebGUI TSM GUI management, (JBB) Journal Based Backup, BAhdw (the ONTAP library)


— TIVsm-WEBGUI.x86_64.rpm
— TIVsm-JBB.x86_64.rpm
— TIVsm-BAhdw.x86_64.rpm

8. Start and enable dsmcad service

[root@linux-server tsm]# systemctl stop dsmcad

You will get

##Warning: dsmcad.service changed on disk. Run 'systemctl daemon-reload' to reload units.

[root@linux-server tsm]# systemctl daemon-reload

[root@linux-server tsm]# systemctl start dsmcad


## enable dsmcad – it is disabled by default after install

[root@linux-server ~]# systemctl enable dsmcad

[root@linux-server tsm]# systemctl status dsmcad

9. Check dmscad service is really running

Once enabled IBM TSM will spawn a process in the bacground dmscad if it started properly you should have the process backgrounded.

[root@linux-server tsm]# ps -ef|grep -i dsm|grep -v grep
root      2881     1  0 18:05 ?        00:00:01 /usr/bin/dsmcad

If process is not there there might be some library or something not at place preventing the process to start …

10. Check DSMCAD /var/tsm logs for errors

After having dsmcad process enabled and running in background

[root@linux-server tsm]# grep -i Version /var/tsm/sched.log|tail -1
12/03/2020 18:06:29   Server Version 8, Release 1, Level 10.000

 

[root@linux-server tsm]# cat /var/tsm/dsmerror.log

To see the current TSM configuration files we can  grep out comments *

[root@linux-server tsm]# grep -v '*' /opt/tivoli/tsm/client/ba/bin/dsm.sys

Example Configuration of the agent:
—————————————————-
   *TSM SERVER NODE Location
   Servername           tsm_server
   COMMmethod           TCPip
   TCPPort              1400
   TCPServeraddress     tsmserver2.backuphost.com
   NodeName             NODE.SERVER-TO-BACKUP-HOSTNAME.COM
   Passwordaccess       generate
   SCHEDLOGNAME         /var/tsm/sched.log
   SCHEDLOGRETENTION    21 D
   SCHEDMODE            POLLING
   MANAGEDServices      schedule
   ERRORLOGNAME         /var/tsm/dsmerror.log
   ERRORLOGRETENTION    30 D
   INCLEXCL             /opt/tivoli/tsm/client/ba/bin/inclexcl.tsm

11. Remove tsm install directory tar ball and rpms to save space on system

The current version of Tivoli service manager is 586 Megabytes.

[root@linux-server tsm]# du -hsc 8.1.11.0-TIV-TSMBAC-LinuxX86.tar
586M    8.1.11.0-TIV-TSMBAC-LinuxX86.tar

Some systems are on purpose configured to have less space under their /home directory,
hence it is a good idea to clear up unnecessery files after completion.

Lets get rid of all the IBM Spectrum archive source files and the rest of RPMs used for installation.

[root@linux-server tsm]# rm -rf ~/tsm/{*.tar,*.rpm,*.gpg,*.htm,*.txt}

12. Check backups are really created on the configured remote Central backup server

To make sure after the upgrade the backups are continuously created and properly stored on the IBM Tivoly remote central backup server, either manually initiate a backup or wait for lets say a day and run dsmc client to show all created backups from previous day. To make sure you'll not get empty output you can on purpose modify some file by simply opening it and writting over without chaning anything e.g. modify your ~/.bashrc or ~/.bash_profile

## List all backups for '/' root directory from -fromdate='DD/MM/YY'

[root@linux-server tsm]# dsmc
Protect>
IBM Spectrum Protect
Command Line Backup-Archive Client Interface
  Client Version 8, Release 1, Level 11.0
  Client date/time: 12/03/2020 18:14:03
(c) Copyright by IBM Corporation and other(s) 1990, 2020. All Rights Reserved.

Node Name: NODE.SERVER-TO-BACKUP-HOSTNAME.COM
Session established with server TSM2_SERVER: AIX
  Server Version 8, Release 1, Level 10.000
  Server date/time: 12/03/2020 18:14:04  Last access: 12/03/2020 18:06:29
 
Protect> query backup -subdir=yes "/" -fromdate=12/3/2020
           Size        Backup Date                Mgmt Class           A/I File
           —-        ———–                ———-           — —-
         6,776  B  12/03/2020 01:26:53             DEFAULT              A  /etc/freshclam.conf
         6,685  B  12/03/2020 01:26:53             DEFAULT              A  /etc/freshclam.conf-2020-12-02
         5,602  B  12/03/2020 01:26:53             DEFAULT              A  /etc/hosts
         5,506  B  12/03/2020 01:26:53             DEFAULT              A  /etc/hosts-2020-12-02
           398  B  12/03/2020 01:26:53             DEFAULT              A  /opt/tivoli/tsm/client/ba/bin/tsmstats.ini
       114,328  B  12/03/2020 01:26:53             DEFAULT              A  /root/.bash_history
           403  B  12/03/2020 01:26:53             DEFAULT              A  /root/.lesshst

Sysadmin tip: How to force a new Linux user account password change after logging to improve security

Thursday, June 18th, 2020

chage-linux-force-password-expiry-check-user-password-expiry-setting

Have you logged in through SSH to remote servers with the brand new given UNIX account in your company just to be prompted for your current Password immediately after logging and forced to change your password?
The smart sysadmins or security officers use this trick for many years to make sure the default set password for new user is set to a smarter user to prevent default password leaks which might later impose a severe security risk for a company Demiliterized networks confidential data etc.

If you haven't seen it yet and you're in the beautiful world of UNIX / Linux as a developer qa tester or sysadmin sooner or later you will face it.
Here of course I'm talking about plain password local account authentication using user / pass credentials stored in /etc/passwd or /etc/shadow.

Lets Say hello to the main command chage that is used to do this sysadmin trick.
chage command is used to change user password expiry information and  set and alter password aging parameters on user accounts.

 

1. Force chage to make password expire on next user login for a new created user
 

# chage -d 0 {user-name} 


Below is a real life example
 

chage-force-user-account-password-expiry-linux

 

2. Get information on when account expires

 

[hipo@linux ~]$ chage -l hipo
Last password change                                    : Apr 03, 2020
Password expires                                        : Jul 08, 2020
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 90
Number of days of warning before password expires       : 14

 

3. Use chage to set user account password expiration

The most straight forward way to set an expiration date for an active user acct is with:

 

# chage -E 2020-08-16 username


To make the account get locked automatically if the password has expired and the user did not logged in to it for 2 days after its expiration.

# chage -I 2 username


– Set Password expire with Minimum days 7 (-n mindays 7), (-x maxdays 28) and (-w warndays 5)

# passwd -n 7 -x 28 -w 5 username

To check the passwod expiration settings use list command:

# chage -l username
Last password change                                    : юни 18, 2020
Password expires                                        : юли 16, 2020
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 7
Maximum number of days between password change          : 28
Number of days of warning before password expires       : 5

 

chage is a command is essential sysadmin command that is mentioned in every Learn Linux book out there, however due to its often rare used many people and sysadmins either, don't know it or learn of it only once it is needed. 
A note to make here is some sysadmins prefer to use usermod to set a password expire instead of chage.

usermod -e 2020-10-14 username

 

For those who wonder how to set password expiry on FreeBSD and other BSD-es is done, there it is done via the pw system user management tool as chage is not present there.

 

A note to make here is chage usually does not provide information for Linux user accounts that are stored in LDAP. To get information of such you can use ldapsearch with a query to the LDAP domain store with something like.
 

ldapsearch -x -ZZ -LLL -b dc=domain.com,dc=com objectClass=*


It is worthy to mention also another useful command when managing users this is getent used to get entries from Name Service Switch libraries. 
getent is useful to get various information from basic /etc/ stored db files such as /etc/services /etc/shadow, /etc/group, /etc/aliases, /etc/hosts and even do some simple rpc queries.

Saving multiple passwords in Linux with Revelation and Keepass2 – Keeping track of multiple passwords

Thursday, October 17th, 2013

System Administrators who use MS Windows to access multiple hosts in big companies like HP or IBM certainly use some kind of multiple password manager like PasswordSafe.

Keep multiple passwords safe in Microsoft Windows 7 passwordsafe with masterpassword

When number of passwords you have to keep in mind grows significantly using something like PasswordSafe becomes mandatory. Same is valid also for valid for system administrators who use GNU / Linux as a Desktop environment to administer thousands or hundreds of servers. I'm one of those admins who for years use Linux and until recently I kept logging all my passwords in separate directory full with text files created with vim (text editor). As the number of passwords and accesses to servers and web interfaces grow up dramatically as well as my requirement for security raised up I wanted to have my passwords secured being kept encrypted on my hard drive. For those who never use PasswordSafe the idea of program is to store all passwords you have in encrypted database which can be only opened through PasswordSafe by providing a master password.

passwordsafe on microsoft windows keep in order multiple passwords manager

Of course having one master password imposes other security risks as someone who knows the MasterPass can easily access all your passwords anyways for now such level of security perfectly fits my needs.

PasswordSafe is since recently Open Source so there is a Linux port, but the port is still in beta and though I tried hard to install it using provided .deb binaries as well as compile from source, I finally give it up. And decided to review what kind of password managers are available in Debian Wheezy's ports.

Here are those I found with;

apt-cache search password|grep -i manager

cpm – Curses based password manager using PGP-encryption
fpm2 – password manager with GTK+ 2.x GUI
gringotts – secure password and data storage manager
kedpm – KED Password Manager
kedpm-gtk – KED Password Manager
keepass2 – Password manager
keepassx – Cross Platform Password Manager
kwalletmanager – secure password wallet manager
password-gorilla – cross-platform password manager
revelation – GNOME2 Password manager

I didn't have the time to test each one of them, so I installed and checked only those which seemed more reliable, i.e.:
keepass2 and revelation

# apt-get install –yes fpm2 keepass2 revelation

Below is screenshot of each one of managers:

Revelation Linux Gnome graphic password manager program

Revelation – GNOME Password Manager

keepass2 Linux gui password manager screenshot Debian - graphic manager for storing passwords

kde password safe gui program Linux Debian screenshot

KDE QT Interface Linux GUI Password Manager (KeePass2)

With one of this tools admin's life is much easier as you don't have to get crazy and remember thousands of passwords.
Hope this helps some admin out there! Enjoy ! 🙂
 

Why du and df reporting different on a filesystem / How to fix inconsistency between used space on FS and disk showing full strangeness

Wednesday, July 24th, 2019

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin on a large server environment such as a couple of hundred of Virtual Machines running Linux OS on either physical host or OpenXen / VmWare hosted guest Virtual Machine, you might end up sometimes at an odd case where some mounted partition mount point reports its file use different when checked with
df
cmd than when checked with du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to show us the filesystem.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debug on what might be the root cause for df / du inconsistency reporting

 

Of course the basic thing to do when in that weird situation is to be totally shocked how this is possible and to investigate a bit what is the biggest first level sub-directories that eat up the space on the mounted location, with du:

 

# du -hkx –max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in Kilobytes so it is a little bit hard to read, if you're used to Mbytes instead, do

 

 # du -hmx –max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated on the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and trying to clear up a bit, continued facing that inconsistently shown in two commands and if you're likely to be stunned like me and try … to move some files to a different filesystem to free up space or assigned inodes with a hope that shown inconsitency output will be fixed as it might be caused  due to some kernel / FS caching ?? and this will eventually make the mounted FS to refresh …

But unfortunately, if you try it you'll figure out clearing up a couple of Megas or Gigas will make no difference in cmd output.

In my exact case /var/lib/mysql is a separate mounted ext4 filesystem, however same issue was present also on a Network Filesystem (NFS) and thus, my first thought that this is caused by a network failure problem or NFS bug turned to be wrong.

After further short investigation on the inodes on the Filesystem, it was clear enough inodes are available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the filled inodes count assumed issue also has been rejected.
P.S. (if you're not well familiar with them read manual, i.e. – man 7 inode).
 

– Remounting the mounted filesystem

To make sure the filesystem shown inconsistency between du and df is not due to some hanging network mount or bug, first logical thing I did is to remount the filesytem showing different in size, in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, used:

# mount -o remount,rw -t nfs /var/www


FS remount did not solved it so I continued to ponder what oddity and of course I thought of a workaround (in case if this issues are caused by kernel bug or OS lib issue) reboot might be the solution, however unfortunately restarting the VMs was not a wanted easy to do solution, thus I continued investigating what is wrong …

Next check of course was to check, what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


Did not found anything that might point me to the reported different Megabytes issue, so next step was to check what is the situation with currently opened files by running processes on the weird df / du reported systems with lsof, and boom there I observed oddity such as multiple files

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that There were plenty of '(deleted)' STATE files shown in memory an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I've learned a bit online about the problem, I found it is also possible to find deleted unlinked files only without any greps (to list all deleted files in memory files with lsof args only):

 

# lsof +L1|less


The SIZE field (fourth column)  shows a number of files that are really hard in size and that are kept in open on filesystem and in memory, totally messing up with the filesystem. In my case this is temp files created by MYSQLD daemon but depending on the server provided service this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
 

2. Check all the list open files with the mysql / root user as part of the the server filesystem inconsistency debugging with:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Other command that helped to track the discrepancy between df and du different file usage on FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing large files kept in memory filesystem problem


What is the real reason for ending up with this file handlers opened by running backgrounded programs on the Linux OS?
It could be multiple  but most likely it is due to exceeded server / client interactions or breaking up RAM or HDD drive with writing plenty of logs on the FS without ending keeping space occupied or Programming library bugs used by hanged service leaving the FH opened on storage.

What is the solution to file system files left in memory problem?

The best solution is to first fix custom script or hanged service and then if possible to simply restart the server to make the kernel / services reload or if this is not possible just restart the problem creation processes.

Once the process is identified like in my case this was MySQL on systemd enabled newer OS distros, just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hanged scripts being listed in ps axuwef you can grep the pid and do a kill -HUP (if the script is written in a good way to recognize -HUP and restart the sub-running process properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS as this might cause your running service disruptions …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID

 

Now finally this should either mitigate or at best case completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and show up approximately the same (note that size changes a bit as mysql service is writting data) constantly extending the size between the two checks.

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5        19097172 3472744 14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What we learned?

What I've explained in this article is why and how it comes that 'zoombie' files reside on a filesystem
appearing to be eating disk space on a mounted local or network partition, giving strange inconsistent
reports, leading to system service disruptions and impossibility to have correctly shown information on used
disk space on mounted drive.

I went through with some standard logic on debugging service / filesystem / inode issues up explainat, that led me to the finding about deleted files being kept in filesystem and producing the filesystem strange sized / showing not correct / filled even after it was extended with tune2fs and was supposed to have extra 50GBs.

Finally it was explained shortly how to HUP / restart hanging script / service to fix it.

Some few good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?
 

Nginx increase security by putting websites into Linux jails howto

Monday, August 27th, 2018

linux-jail-nginx-webserver-increase-security-by-putting-it-and-its-data-into-jail-environment

If you're sysadmining a large numbers of shared hosted websites which use Nginx Webserver to interpret PHP scripts and serve HTML, Javascript, CSS … whatever data.

You realize the high amount of risk that comes with a possible successful security breach / hack into a server by a malicious cracker. Compromising Nginx Webserver by an intruder automatically would mean that not only all users web data will get compromised, but the attacker would get an immediate access to other data such as Email or SQL (if the server is running multiple services).

Nowadays it is not so common thing to have a multiple shared websites on the same server together with other services, but historically there are many legacy servers / webservers left which host some 50 or 100+ websites.

Of course the best thing to do is to isolate each and every website into a separate Virtual Container however as this is a lot of work and small and mid-sized companies refuse to spend money on mostly anything this might be not an option for you.

Considering that this might be your case and you're running Nginx either as a Load Balancing, Reverse Proxy server etc. , even though Nginx is considered to be among the most secure webservers out there, there is absolutely no gurantee it would not get hacked and the server wouldn't get rooted by a script kiddie freak that just got in darknet some 0day exploit.

To minimize the impact of a possible Webserver hack it is a good idea to place all websites into Linux Jails.

linux-jail-simple-explained-diagram-chroot-jail

For those who hear about Linux Jail for a first time,
chroot() jail is a way to isolate a process / processes and its forked children from the rest of the *nix system. It should / could be used only for UNIX processes that aren't running as root (administrator user), because of the fact the superuser could break out (escape) the jail pretty easily.

Jailing processes is a concept that is pretty old that was first time introduced in UNIX version 7 back in the distant year 1979, and it was first implemented into BSD Operating System ver. 4.2 by Bill Joy (a notorious computer scientist and co-founder of Sun Microsystems). Its original use for the creation of so called HoneyPot – a computer security mechanism set to detect, deflect, or, in some manner, counteract attempts at unauthorized use of information systems that appears completely legimit service or part of website whose only goal is to track, isolate, and monitor intruders, a very similar to police string operations (baiting) of the suspect. It is pretty much like а bait set to collect the fish (which in this  case is the possible cracker).

linux-chroot-jail-environment-explained-jailing-hackers-and-intruders-unix

BSD Jails nowadays became very popular as iPhones environment where applications are deployed are inside a customly created chroot jail, the principle is exactly the same as in Linux.

But anyways enough talk, let's create a new jail and deploy set of system binaries for our Nginx installation, here is the things you will need:

1. You need to have set a directory where a copy of /bin/ls /bin/bash /bin/,  /bin/cat … /usr/bin binaries /lib and other base system Linux system binaries copy will reside.

 

server:~# mkdir -p /usr/local/chroot/nginx

 


2. You need to create the isolated environment backbone structure /etc/ , /dev, /var/, /usr/, /lib64/ (in case if deploying on 64 bit architecture Operating System).

 

server:~# export DIR_N=/usr/local/chroot/nginx;
server:~# mkdir -p $DIR_N/etc
server:~# mkdir -p $DIR_N/dev
server:~# mkdir -p $DIR_N/var
server:~# mkdir -p $DIR_N/usr
server:~# mkdir -p $DIR_N/usr/local/nginx
server:~# mkdir -p $DIR_N/tmp
server:~# chmod 1777 $DIR_N/tmp
server:~# mkdir -p $DIR_N/var/tmp
server:~# chmod 1777 $DIR_N/var/tmp
server:~# mkdir -p $DIR_N/lib64
server:~# mkdir -p $DIR_N/usr/local/

 

3. Create required device files for the new chroot environment

 

server:~# /bin/mknod -m 0666 $D/dev/null c 1 3
server:~# /bin/mknod -m 0666 $D/dev/random c 1 8
server:~# /bin/mknod -m 0444 $D/dev/urandom c 1 9

 

mknod COMMAND is used instead of the usual /bin/touch command to create block or character special files.

Once create the permissions of /usr/local/chroot/nginx/{dev/null, dev/random, dev/urandom} have to be look like so:

 

server:~# ls -l /usr/local/chroot/nginx/dev/{null,random,urandom}
crw-rw-rw- 1 root root 1, 3 Aug 17 09:13 /dev/null
crw-rw-rw- 1 root root 1, 8 Aug 17 09:13 /dev/random
crw-rw-rw- 1 root root 1, 9 Aug 17 09:13 /dev/urandom

 

4. Install nginx files into the chroot directory (copy all files of current nginx installation into the jail)
 

If your NGINX webserver installation was installed from source to keep it latest
and is installed in lets say, directory location /usr/local/nginx you have to copy /usr/local/nginx to /usr/local/chroot/nginx/usr/local/nginx, i.e:

 

server:~# /bin/cp -varf /usr/local/nginx/* /usr/local/chroot/nginx/usr/local/nginx

 


5. Copy necessery Linux system libraries to newly created jail
 

NGINX webserver is compiled to depend on various libraries from Linux system root e.g. /lib/* and /lib64/* therefore in order to the server work inside the chroot-ed environment you need to transfer this libraries to the jail folder /usr/local/chroot/nginx

If you are curious to find out which libraries exactly is nginx binary dependent on run:

server:~# ldd /usr/local/nginx/usr/local/nginx/sbin/nginx

        linux-vdso.so.1 (0x00007ffe3e952000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2b4762c000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f2b473f4000)
        libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f2b47181000)
        libcrypto.so.0.9.8 => /usr/local/lib/libcrypto.so.0.9.8 (0x00007f2b46ddf000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2b46bc5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2b46826000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2b47849000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2b46622000)


The best way is to copy only the libraries in the list from ldd command for best security, like so:

 

server: ~# cp -rpf /lib/x86_64-linux-gnu/libthread.so.0 /usr/local/chroot/nginx/lib/*
server: ~# cp -rpf library chroot_location

etc.

 

However if you're in a hurry (not a recommended practice) and you don't care for maximum security anyways (you don't worry the jail could be exploited from some of the many lib files not used by nginx and you don't  about HDD space), you can also copy whole /lib into the jail, like so:

 

server: ~# cp -rpf /lib/ /usr/local/chroot/nginx/usr/local/nginx/lib

 

NOTE! Once again copy whole /lib directory is a very bad practice but for a time pushing activities sometimes you can do it …


6. Copy /etc/ some base files and ld.so.conf.d , prelink.conf.d directories to jail environment
 

 

server:~# cp -rfv /etc/{group,prelink.cache,services,adjtime,shells,gshadow,shadow,hosts.deny,localtime,nsswitch.conf,nscd.conf,prelink.conf,protocols,hosts,passwd,ld.so.cache,ld.so.conf,resolv.conf,host.conf}  \
/usr/local/chroot/nginx/usr/local/nginx/etc

 

server:~# cp -avr /etc/{ld.so.conf.d,prelink.conf.d} /usr/local/chroot/nginx/nginx/etc


7. Copy HTML, CSS, Javascript websites data from the root directory to the chrooted nginx environment

 

server:~# nice -n 10 cp -rpf /usr/local/websites/ /usr/local/chroot/nginx/usr/local/


This could be really long if the websites are multiple gigabytes and million of files, but anyways the nice command should reduce a little bit the load on the server it is best practice to set some kind of temporary server maintenance page to show on the websites index in order to prevent the accessing server clients to not have interrupts (that's especially the case on older 7200 / 7400 RPM non-SSD HDDs.)
 

 

8. Stop old Nginx server outside of Chroot environment and start the new one inside the jail


a) Stop old nginx server

Either stop the old nginx using it start / stop / restart script inside /etc/init.d/nginx (if you have such installed) or directly kill the running webserver with:

 

server:~# killall -9 nginx

 

b) Test the chrooted nginx installation is correct and ready to run inside the chroot environment

 

server:~# /usr/sbin/chroot /usr/local/chroot/nginx /usr/local/nginx/nginx/sbin/nginx -t
server:~# /usr/sbin/chroot /usr/local/chroot/nginx /usr/local/nginx/nginx/sbin/nginx

 

c) Restart the chrooted nginx webserver – when necessery later

 

server:~# /usr/sbin/chroot /nginx /usr/local/chroot/nginx/sbin/nginx -s reload

 

d) Edit the chrooted nginx conf

If you need to edit nginx configuration, be aware that the chrooted NGINX will read its configuration from /usr/local/chroot/nginx/nginx/etc/conf/nginx.conf (i'm saying that if you by mistake forget and try to edit the old config that is usually under /usr/local/nginx/conf/nginx.conf

 

 

MySQL: How to check user privileges and allowed hosts to connect with mysql cli

Wednesday, April 2nd, 2014

how-to-check-user-privileges-and-allowed-hosts-to-connect-with-mysql-cli

On a project there are some issues with root admin user unable to access the server from remote host and the most probable reason was there is no access to the server from that host thus it was necessary check mysql root user privilegse and allowed hosts to connect, here SQL query to do it:
 

mysql> select * from `user` where  user like 'root%';
+——————————–+——+——————————————-+————-+————-+————-+————-+————-+———–+————-+—————+————–+———–+————+—————–+————+————+————–+————+———————–+——————+————–+—————–+——————+——————+—————-+———————+——————–+——————+————+————–+———-+————+————-+————–+—————+————-+—————–+———————-+
| Host                           | User | Password                                  | Select_priv | Insert_priv | Update_priv | Delete_priv | Create_priv | Drop_priv | Reload_priv | Shutdown_priv | Process_priv | File_priv | Grant_priv | References_priv | Index_priv | Alter_priv | Show_db_priv | Super_priv | Create_tmp_table_priv | Lock_tables_priv | Execute_priv | Repl_slave_priv | Repl_client_priv | Create_view_priv | Show_view_priv | Create_routine_priv | Alter_routine_priv | Create_user_priv | Event_priv | Trigger_priv | ssl_type | ssl_cipher | x509_issuer | x509_subject | max_questions | max_updates | max_connections | max_user_connections |
+——————————–+——+——————————————-+————-+————-+————-+————-+————-+———–+————-+—————+————–+———–+————+—————–+————+————+————–+————+———————–+——————+————–+—————–+——————+——————+—————-+———————+——————–+——————+————+————–+———-+————+————-+————–+—————+————-+—————–+———————-+
| localhost                      | root | *5A07790DCF43AC89820F93CAF7B03DE3F43A10D9 | Y           | Y           | Y           | Y           | Y           | Y         | Y           | Y             | Y            | Y         | Y          | Y               | Y          | Y          | Y            | Y          | Y                     | Y                | Y            | Y               | Y                | Y                | Y              | Y                   | Y                  | Y                | Y          | Y            |          |            |             |              |             0 |           0 |               0 |                    0 |
| server737                        | root | *5A07790DCF43AC89820F93CAF7B03DE3F43A10D9 | Y           | Y           | Y           | Y           | Y           | Y         | Y           | Y             | Y            | Y         | Y          | Y               | Y          | Y          | Y            | Y          | Y                     | Y                | Y            | Y               | Y                | Y                | Y              | Y                   | Y                  | Y                | Y          | Y            |          |            |             |              |             0 |           0 |               0 |                    0 |
| 127.0.0.1                      | root | *5A07790DCF43AC89820F93CAF7B03DE3F43A10D9 | Y           | Y           | Y           | Y           | Y           | Y         | Y           | Y             | Y            | Y         | Y          | Y               | Y          | Y          | Y            | Y          | Y                     | Y                | Y            | Y               | Y                | Y                | Y              | Y                   | Y                  | Y                | Y          | Y            |          |            |             |              |             0 |           0 |               0 |                    0 |
| server737.server.myhost.net | root | *5A07790DCF43FC89820A93CAF7B03DE3F43A10D9 | Y           | Y           | Y           | Y           | Y           | Y         | Y           | Y             | Y            | Y         | Y          | Y               | Y          | Y          | Y            | Y          | Y                     | Y                | Y            | Y               | Y                | Y                | Y              | Y                   | Y                  | Y                | Y          | Y            |          |            |             |              |             0 |           0 |               0 |                    0 |
| server4586                        | root | *5A07790DCF43AC89820F93CAF7B03DE3F43A10D9 | N           | N           | N           | N           | N           | N         | N           | N             | N            | N         | N          | N               | N          | N          | N            | N          | N                     | N                | N            | N               | N                | N                | N              | N                   | N                  | N                | N          | N            |          |            |             |              |             0 |           0 |               0 |                    0 |
| server4586.myhost.net              | root | *5A07790DCF43AC89820F93CAF7B03DE3F43A10D9 | N           | N           | N           | N           | N           | N         | N           | N             | N            | N         | N          | N               | N          | N          | N            | N          | N                     | N                | N            | N               | N                | N                | N              | N                   | N                  | N                | N          | N            |          |            |             |              |             0 |           0 |               0 |                    0 |
+——————————–+——+——————————————-+————-+————-+————-+————-+————-+———–+————-+—————+————–+———–+————+—————–+————+————+————–+————+———————–+——————+————–+—————–+——————+——————+—————-+———————+——————–+——————+————+————–+———-+————+————-+————–+—————+————-+—————–+———————-+
6 rows in set (0.00 sec)

mysql> exit


Here is query explained:

select * from `user` where  user like 'root%'; query means:

select * – show all
from `user` – from user database
where user like 'root%' – where there is match in user column to any string starting with 'root*',
 

Maximal protection against SSH attacks. If your server has to stay with open SSH (Secure Shell) port open to the world

Thursday, April 7th, 2011

Brute Force Attack SSH screen, Script kiddie attacking
If you’re a a remote Linux many other Unix based OSes, you have defitenily faced the security threat of many failed ssh logins or as it’s better known a brute force attack

During such attacks your /var/log/messages or /var/log/auth gets filled in with various failed password logs like for example:

Feb 3 20:25:50 linux sshd[32098]: Failed password for invalid user oracle from 95.154.249.193 port 51490 ssh2
Feb 3 20:28:30 linux sshd[32135]: Failed password for invalid user oracle1 from 95.154.249.193 port 42778 ssh2
Feb 3 20:28:55 linux sshd[32141]: Failed password for invalid user test1 from 95.154.249.193 port 51072 ssh2
Feb 3 20:30:15 linux sshd[32163]: Failed password for invalid user test from 95.154.249.193 port 47481 ssh2
Feb 3 20:33:20 linux sshd[32211]: Failed password for invalid user testuser from 95.154.249.193 port 51731 ssh2
Feb 3 20:35:32 linux sshd[32249]: Failed password for invalid user user from 95.154.249.193 port 38966 ssh2
Feb 3 20:35:59 linux sshd[32256]: Failed password for invalid user user1 from 95.154.249.193 port 55850 ssh2
Feb 3 20:36:25 linux sshd[32268]: Failed password for invalid user user3 from 95.154.249.193 port 36610 ssh2
Feb 3 20:36:52 linux sshd[32274]: Failed password for invalid user user4 from 95.154.249.193 port 45514 ssh2
Feb 3 20:37:19 linux sshd[32279]: Failed password for invalid user user5 from 95.154.249.193 port 54262 ssh2
Feb 3 20:37:45 linux sshd[32285]: Failed password for invalid user user2 from 95.154.249.193 port 34755 ssh2
Feb 3 20:38:11 linux sshd[32292]: Failed password for invalid user info from 95.154.249.193 port 43146 ssh2
Feb 3 20:40:50 linux sshd[32340]: Failed password for invalid user peter from 95.154.249.193 port 46411 ssh2
Feb 3 20:43:02 linux sshd[32372]: Failed password for invalid user amanda from 95.154.249.193 port 59414 ssh2
Feb 3 20:43:28 linux sshd[32378]: Failed password for invalid user postgres from 95.154.249.193 port 39228 ssh2
Feb 3 20:43:55 linux sshd[32384]: Failed password for invalid user ftpuser from 95.154.249.193 port 47118 ssh2
Feb 3 20:44:22 linux sshd[32391]: Failed password for invalid user fax from 95.154.249.193 port 54939 ssh2
Feb 3 20:44:48 linux sshd[32397]: Failed password for invalid user cyrus from 95.154.249.193 port 34567 ssh2
Feb 3 20:45:14 linux sshd[32405]: Failed password for invalid user toto from 95.154.249.193 port 42350 ssh2
Feb 3 20:45:42 linux sshd[32410]: Failed password for invalid user sophie from 95.154.249.193 port 50063 ssh2
Feb 3 20:46:08 linux sshd[32415]: Failed password for invalid user yves from 95.154.249.193 port 59818 ssh2
Feb 3 20:46:34 linux sshd[32424]: Failed password for invalid user trac from 95.154.249.193 port 39509 ssh2
Feb 3 20:47:00 linux sshd[32432]: Failed password for invalid user webmaster from 95.154.249.193 port 47424 ssh2
Feb 3 20:47:27 linux sshd[32437]: Failed password for invalid user postfix from 95.154.249.193 port 55615 ssh2
Feb 3 20:47:54 linux sshd[32442]: Failed password for www-data from 95.154.249.193 port 35554 ssh2
Feb 3 20:48:19 linux sshd[32448]: Failed password for invalid user temp from 95.154.249.193 port 43896 ssh2
Feb 3 20:48:46 linux sshd[32453]: Failed password for invalid user service from 95.154.249.193 port 52092 ssh2
Feb 3 20:49:13 linux sshd[32458]: Failed password for invalid user tomcat from 95.154.249.193 port 60261 ssh2
Feb 3 20:49:40 linux sshd[32464]: Failed password for invalid user upload from 95.154.249.193 port 40236 ssh2
Feb 3 20:50:06 linux sshd[32469]: Failed password for invalid user debian from 95.154.249.193 port 48295 ssh2
Feb 3 20:50:32 linux sshd[32479]: Failed password for invalid user apache from 95.154.249.193 port 56437 ssh2
Feb 3 20:51:00 linux sshd[32492]: Failed password for invalid user rds from 95.154.249.193 port 45540 ssh2
Feb 3 20:51:26 linux sshd[32501]: Failed password for invalid user exploit from 95.154.249.193 port 53751 ssh2
Feb 3 20:51:51 linux sshd[32506]: Failed password for invalid user exploit from 95.154.249.193 port 33543 ssh2
Feb 3 20:52:18 linux sshd[32512]: Failed password for invalid user postgres from 95.154.249.193 port 41350 ssh2
Feb 3 21:02:04 linux sshd[32652]: Failed password for invalid user shell from 95.154.249.193 port 54454 ssh2
Feb 3 21:02:30 linux sshd[32657]: Failed password for invalid user radio from 95.154.249.193 port 35462 ssh2
Feb 3 21:02:57 linux sshd[32663]: Failed password for invalid user anonymous from 95.154.249.193 port 44290 ssh2
Feb 3 21:03:23 linux sshd[32668]: Failed password for invalid user mark from 95.154.249.193 port 53285 ssh2
Feb 3 21:03:50 linux sshd[32673]: Failed password for invalid user majordomo from 95.154.249.193 port 34082 ssh2
Feb 3 21:04:43 linux sshd[32684]: Failed password for irc from 95.154.249.193 port 50918 ssh2
Feb 3 21:05:36 linux sshd[32695]: Failed password for root from 95.154.249.193 port 38577 ssh2
Feb 3 21:06:30 linux sshd[32705]: Failed password for bin from 95.154.249.193 port 53564 ssh2
Feb 3 21:06:56 linux sshd[32714]: Failed password for invalid user dev from 95.154.249.193 port 34568 ssh2
Feb 3 21:07:23 linux sshd[32720]: Failed password for root from 95.154.249.193 port 43799 ssh2
Feb 3 21:09:10 linux sshd[32755]: Failed password for invalid user bob from 95.154.249.193 port 50026 ssh2
Feb 3 21:09:36 linux sshd[32761]: Failed password for invalid user r00t from 95.154.249.193 port 58129 ssh2
Feb 3 21:11:50 linux sshd[537]: Failed password for root from 95.154.249.193 port 58358 ssh2

This brute force dictionary attacks often succeed where there is a user with a weak a password, or some old forgotten test user account.
Just recently on one of the servers I administrate I have catched a malicious attacker originating from Romania, who was able to break with my system test account with the weak password tset .

Thanksfully the script kiddie was unable to get root access to my system, so what he did is he just started another ssh brute force scanner to crawl the net and look for some other vulnerable hosts.

As you read in my recent example being immune against SSH brute force attacks is a very essential security step, the administrator needs to take on a newly installed server.

The easiest way to get read of the brute force attacks without using some external brute force filtering software like fail2ban can be done by:

1. By using an iptables filtering rule to filter every IP which has failed in logging in more than 5 times

To use this brute force prevention method you need to use the following iptables rules:
linux-host:~# /sbin/iptables -I INPUT -p tcp --dport 22 -i eth0 -m state -state NEW -m recent -set
linux-host:~# /sbin/iptables -I INPUT -p tcp --dport 22 -i eth0 -m state -state NEW
-m recent -update -seconds 60 -hitcount 5 -j DROP

This iptables rules will filter out the SSH port to an every IP address with more than 5 invalid attempts to login to port 22

2. Getting rid of brute force attacks through use of hosts.deny blacklists

sshbl – The SSH blacklist, updated every few minutes, contains IP addresses of hosts which tried to bruteforce into any of currently 19 hosts (all running OpenBSD, FreeBSD or some Linux) using the SSH protocol. The hosts are located in Germany, the United States, United Kingdom, France, England, Ukraine, China, Australia, Czech Republic and setup to report and log those attempts to a central database. Very similar to all the spam blacklists out there.

To use sshbl you will have to set up in your root crontab the following line:

*/60 * * * * /usr/bin/wget -qO /etc/hosts.deny http://www.sshbl.org/lists/hosts.deny

To set it up from console issue:

linux-host:~# echo '*/60 * * * * /usr/bin/wget -qO /etc/hosts.deny http://www.sshbl.org/lists/hosts.deny' | crontab -u root -

These crontab will download and substitute your system default hosts with the one regularly updated on sshbl.org , thus next time a brute force attacker which has been a reported attacker will be filtered out as your Linux or Unix system finds out the IP matches an ip in /etc/hosts.deny

The /etc/hosts.deny filtering rules are written in a way that only publicly known brute forcer IPs will only be filtered for the SSH service, therefore other system services like Apache or a radio, tv streaming server will be still accessible for the brute forcer IP.

It’s a good practice actually to use both of the methods 😉
Thanks to Static (Multics) a close friend of mine for inspiring this article.

How to make GRE tunnel iptables port redirect on Linux

Saturday, August 20th, 2011

I’ve recently had to build a Linux server with some other servers behind the router with NAT.
One of the hosts behind the Linux router was running a Window GRE encrypted tunnel service. Which had to be accessed with the Internet ip address of the server.
In order < б>to make the GRE tunnel accessible, a bit more than just adding a normal POSTROUTING DNAT rule and iptables FORWARD is necessery.

As far as I’ve read online, there is quite of a confusion on the topic of how to properly configure the GRE tunnel accessibility on Linux , thus in this very quick tiny tutorial I’ll explain how I did it.

1. Load the ip_nat_pptp and ip_conntrack_pptp kernel module

linux-router:~# modprobe ip_nat_pptp
linux-router:~# modprobe ip_conntrack_pptp

These two modules are an absolutely necessery to be loaded before the remote GRE tunnel is able to be properly accessed, I’ve seen many people complaining online that they can’t make the GRE tunnel to work and I suppose in many of the cases the reason not to be succeed is omitting to load this two kernel modules.

2. Make the ip_nat_pptp and ip_nat_pptp modules to load on system boot time

linux-router:~# echo 'ip_nat_pptp' >> /etc/modules
linux-router:~# echo 'ip_conntrack_pptp' >> /etc/modules

3. Insert necessery iptables PREROUTING rules to make the GRE tunnel traffic flow

linux-router:~# /sbin/iptables -A PREROUTING -d 111.222.223.224/32 -p tcp -m tcp --dport 1723 -j DNAT --to-destination 192.168.1.3:1723
linux-router:~# /sbin/iptables -A PREROUTING -p gre -j DNAT --to-destination 192.168.1.3

In the above example rules its necessery to substitute the 111.222.223.224 ip address withe the external internet (real IP) address of the router.

Also the IP address of 192.168.1.3 is the internal IP address of the host where the GRE host tunnel is located.

Next it’s necessery to;

4. Add iptables rule to forward tcp/ip traffic to the GRE tunnel

linux-router:~# /sbin/iptables -A FORWARD -p gre -j ACCEPT

Finally it’s necessery to make the above iptable rules to be permanent by saving the current firewall with iptables-save or add them inside the script which loads the iptables firewall host rules.
Another possible way is to add them from /etc/rc.local , though this kind of way is not recommended as rules would add only after succesful bootup after all the rest of init scripts and stuff in /etc/rc.local is loaded without errors.

Afterwards access to the GRE tunnel to the local IP 192.168.1.3 using the port 1723 and host IP 111.222.223.224 is possible.
Hope this is helpful. Cheers 😉