Posts Tagged ‘backup’

Create Haproxy Loadbalancer Access Control Lists and forward incoming frontend traffics based on simple logic

Friday, February 16th, 2024

Create-haproxy-loadbalancer-access-control-list-and-forward-frontend-traffic-based-on-simple-logic-acls-logo

Haproxy Load Balancers could do pretty much to load balance traffic between application servers. The most straight forward way to use is to balance traffic for incoming Frontends towards a Backend configuration with predefined Application machines and ports to send the traffic, where one can be the leading one and others be set as backup or we can alternatively send the traffic towards a number of machines incoming to a Frontend port bind IP listener and number of backend machine.

Besides this the more interesting capabilities of Haproxy comes with using Access Control Lists (ACLs) to forward Incoming Frontend (FT) traffic towards specific backends and ports based on logic, power ACLs gives to Haproxy to do a sophisticated load balancing are enormous. 
In this post I'll give you a very simple example on how you can save some time, if you have already a present Frontend listening to a Range of TCP Ports and it happens you want to redirect some of the traffic towards a spefic predefined Backend.

This is not the best way to it as Access Control Lists will put some extra efforts on the server CPU, but as today machines are quite powerful, it doesn't really matter. By using a simple ACLs as given in below example, one can save much of a time of writting multiple frontends for a complete sequential port range, if lets say only two of the ports in the port range and distinguish and redirect traffic incoming to Haproxy frontend listener in the port range of 61000-61230 towards a certain Ports that are supposed to go to a Common Backends to a separate ones, lets say ports 61115 and 61215.

Here is a short description on the overall screnarios. We have an haproxy with 3 VIP (Virtual Private IPs) with a Single Frontend with 3 binded IPs and 3 Backends, there is a configured ACL rule to redirect traffic for certain ports, the overall Load Balancing config is like so:

Frontend (ft):

ft_PROD:
listen IPs:

192.168.0.77
192.168.0.83
192.168.0.78

On TCP port range: 61000-61299

Backends (bk): 

bk_PROD_ROUNDROBIN
bk_APP1
bk_APP2


Config Access Control Liststo seperate incoming haproxy traffic for CUSTOM_APP1 and CUSTOM_APP2


By default send all incoming FT traffic to: bk_PROD_ROUNDROBIN

With exception for frontend configured ports on:
APP1 port 61115 
APP2 port 61215

If custom APP1 send to bk:
RULE1
If custom APP2 send to bk:
RULE2

Config on frontends traffic send operation: 

bk_PROD_ROUNDROBIN (roundrobin) traffic send to App machines all in parallel
traffic routing mode (roundrobin)
Appl1
Appl2
Appl3
Appl4

bk_APP1 and bk_APP2

traffic routing mode: (balance source)
Appl1 default serving host

If configured check port 61888, 61887 is down, traffic will be resend to configured pre-configured backup hosts: 

Appl2
Appl3
Appl4


/etc/haproxy/haproxy.cfg that does what is described with ACL LB capabilities looks like so:

#———————————————————————
# Global settings
#———————————————————————
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#———————————————————————
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#———————————————————————
defaults
    mode                    tcp
    log                     global
    option                  tcplog
    #option                  dontlognull
    #option http-server-close
    #option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 7
    #timeout http-request    10s
    timeout queue           10m
    timeout connect         30s
    timeout client          20m
    timeout server          10m
    #timeout http-keep-alive 10s
    timeout check           30s
    maxconn                 3000


#———————————————————————
# Synchronize server entries in sticky tables
#———————————————————————

peers hapeers
    peer haproxy1-fqdn.com 192.168.0.58:8388
    peer haproxy2-fqdn.com 192.168.0.79:8388


#———————————————————————
# HAProxy Monitoring Config
#———————————————————————
listen stats 192.168.0.77:8080                #Haproxy Monitoring run on port 8080
    mode http
    option httplog
    option http-server-close
    stats enable
    stats show-legends
    stats refresh 5s
    stats uri /stats                            #URL for HAProxy monitoring
    stats realm Haproxy\ Statistics
    stats auth hauser:secretpass4321         #User and Password for login to the monitoring dashboard
    stats admin if TRUE
    #default_backend bk_Prod1         #This is optionally for monitoring backend
#———————————————————————
# HAProxy Monitoring Config
#———————————————————————
#listen stats 192.168.0.83:8080                #Haproxy Monitoring run on port 8080
#    mode http
#    option httplog
#    option http-server-close
#    stats enable
#    stats show-legends
#    stats refresh 5s
#    stats uri /stats                            #URL for HAProxy monitoring
#    stats realm Haproxy\ Statistics
#    stats auth hauser:secretpass321          #User and Password for login to the monitoring dashboard
#    stats admin if TRUE
#    #default_backend bk_Prod1           #This is optionally for monitoring backend

#———————————————————————
# HAProxy Monitoring Config
#———————————————————————
# listen stats 192.168.0.78:8080                #Haproxy Monitoring run on port 8080
#    mode http
#    option httplog
#    option http-server-close
#    stats enable
#    stats show-legends
#    stats refresh 5s
#    stats uri /stats                            #URL for HAProxy monitoring
#    stats realm Haproxy\ Statistics
#    stats auth hauser:secretpass123          #User and Password for login to the monitoring dashboard
#    stats admin if TRUE
#    #default_backend bk_DKV_PROD_WLPFO          #This is optionally for monitoring backend


#———————————————————————
# frontend which proxys to the backends
#———————————————————————
frontend ft_PROD
    mode tcp
    bind 192.168.0.77:61000-61299
        bind 192.168.0.83:51000-51300
        bind 192.168.0.78:51000-62300
    option tcplog
        # (4) Peer Sync: a sticky session is a session maintained by persistence
        stick-table type ip size 1m peers hapeers expire 60m
# Commented for change CHG0292890
#   stick on src
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
        acl RULE1 dst_port 61115
        acl RULE2 dst_port 61215
        use_backend APP1 if app1
        use_backend APP2 if app2
    default_backend bk_PROD_ROUNDROBIN


#———————————————————————
# round robin balancing between the various backends
#———————————————————————
backend bk_PROD_ROUNDROBIN
    mode tcp
    # (0) Load Balancing Method.
    balance roundrobin
    # (4) Peer Sync: a sticky session is a session maintained by persistence
    stick-table type ip size 1m peers hapeers expire 60m
    # (5) Server List
    # (5.1) Backend
    server appl1 10.33.0.50 check port 31232
    server appl2 10.33.0.51 check port 31232 
    server appl2 10.45.0.78 check port 31232 
    server appl3 10.45.0.79 check port 31232 

#———————————————————————
# source balancing for the GUI
#———————————————————————
backend bk_APP2
    mode tcp
    # (0) Load Balancing Method.
    balance source
    # (4) Peer Sync: a sticky session is a session maintained by persistence
    stick-table type ip size 1m peers hapeers expire 60m
        stick on src
    # (5) Server List
    # (5.1) Backend
    server appl1 10.33.0.50 check port 55232
    server appl2 10.32.0.51 check port 55232 backup
    server appl3 10.45.0.78 check port 55232 backup
    server appl4 10.45.0.79 check port 55232 backup

#———————————————————————
# source balancing for the OLW
#———————————————————————
backend bk_APP1
    mode tcp
    # (0) Load Balancing Method.
    balance source
    # (4) Peer Sync: a sticky session is a session maintained by persistence
    stick-table type ip size 1m peers hapeers expire 60m
        stick on src
    # (5) Server List
    # (5.1) Backend
    server appl1 10.33.0.50 check port 53119
    server appl2 10.32.0.51 check port 53119 backup
    server appl3 10.45.0.78 check port 53119 backup
    server appl4 10.45.0.79 check port 53119 backup

 

You can also check and download the haproxy.cfg here.
Enjjoy !

Fix “There Has Been a Critical Error on Your Website” wordpress error

Friday, December 2nd, 2022

there-has-been-a-critical-error-on-your-website-wordpress-critical-error-fix

Say you have a shiny working WordPress based website withtout any monitoring set for years but suddenly, you open the site and you get the terrifying error:
 

There Has Been a Critical Error on Your Website

That is quite of a stress for sure. As in the first few minutes you don't understand how this has happened since, you did not touched the perfeclty working site for a very, very long time.
Then you start to debug into the apache / nginx access.log, error.log and mysql mysql.err etc. franticly trying to figure it out the normal ideas pop-up immediately into mind, whether you have a recent backup for the website's database. If you have pair of high availability webservers service or backup databases that serve the traffic via a separate standby instance of the service, you might try to switch off the official service and see whether the standby Webserver / SQL server instance would serve the website fine.
However, if this is not an option and you have no standby backup service as a recovery Plan B option already set. Your only option is to continue to debug what is wrong.
Then the next thing to do is to check whether you don't have a Web Caching or Proxy in front of your webservers that are preventing you to see a recent version of the website and give you some old cache or you don't have an ISP proxy that is giving you some unreal results. That is easily seenable from the Webserver logs. If this is neither the case the next thing is to:
 

Enable WordPress (wp-config.php) Debug mode

By default for Security reasons the WordPress PHP execution debug mode is switched off inside wp-config.php.
When there are odd pages with the WordPress based blog or site however this can easily be changed by modifying the WP_DEBUG true|false value.

To do so edit with a text editor such as vim / nano / mcedit  wp-config.php or if no SSH access to the remote machine, use SFTP / FTP transfer protocol copy the file to your desktop and inspect it and make sure the WP_DEBUG / WP_DEBUG_DISPLAY / WP_DEBUG_LOG has following values:

define( 'WP_DEBUG', true );

define( 'WP_DEBUG_DISPLAY', false );

define( 'WP_DEBUG_LOG', true );

Reloading the Browser window tab with There is a critical error on Your website, you should get some Errors or Warnings like:
 

Warning: Illegal string offset 'parent_slug' in /var/www/websitecom/wp-content/plugins/photo-gallery/booster/main.php on line 180

Warning: Illegal string offset 'slug' in /var/www/websitecom/wp-content/plugins/photo-gallery/booster/main.php on line 180

 

Then you can temporary disable the problematic problem in that case for example the photo-gallery and recheck the website, and then restore from backup snapshot the respective plugin files version from a moment, when the website was working.

If this doesn't solve it and more plugins are crashing and you can't find an easy way to work-around it you miss a backup, you might try to

 

Disable all WordPress active plugins

Disable your plugins from the dashboard, visit Plugins > Installed Plugins and tick the checkbox at the top of the list to select them all.
Then click Bulk Actions -> Deactivate, which should be enough to disable any conflicts and restore your site.

You can do essentially the same thing through SSH / FTP session.

Step 1: Log in to your site with SSH / FTP.
Step 2: Open the wp-content folder to find your plugins.
Step 3: Rename the plugins folder to plugins_old and verify that your site is working again via SSH run commands:

# cd  path_to/plugins; mv plugins plugins_old

or rename via FTP client
Step 4: Rename the folder back to “plugins”. The plugins should be disabled still, so you should be able to log in to your dashboard and activate them one by one. If
the plugins reactivate automatically, rename individual plugin folders with _old until your site is restored.

Raise the PHP Memory Limit

Sometimes, a low PHP limitation causes critical errors on WP based blogs and sites, if necessery raise up the memory limitation via:

define( 'WP_MEMORY_LIMIT', '128M' );

Change Max Upload File Size and Text Processing function limits

To increase the max upload file size, add this code to wp-config.php:

ini_set('upload_max_size' , '256M' );

ini_set('post_max_size','256M');

And to fix the breaking of large pages on your site, add this code:

ini_set('pcre.recursion_limit',20000000);
ini_set('pcre.backtrack_limit',10000000);

Clear up any caches

If you use some session caching of the website on the machine such as memcached / ncache / redis / varnish or an haproxy or any proxy in front of the webserver to do some kind of High availability could produce strange  unexpected Critical errors on Your Website, thus restarting such services or cleaning up any cache would be advisable if you have such.
 

What Causes "There Has Been a Critical Error on Your Website" error?


The reason could be practically anything as WP is a kind of multi-comonent free and a bit of bloatware. The general ones could be  from a missing database table / table fields to a messed up plugin after update a disappeared critical plugin or essential wordpress PHP file, but in my specific case the reason was simple the Plugins Auto-update, which I have had the stupidity to enable.

The WordPress Automatic Updates, though saving you effort and Protecting your website in most cases against recent bugs and Exploits and increasing the WP security level, often causes issues and from my personal experience it is not recommended so better avoid it. Again next time you implement any automation to your server make sure you put some kind of monitoring.

Even if you decide to enable it make sure you do it the right way and not like me, by enabling some Monitoring to the WordPress site via Zabbix / Nagios / Cacti / monit  etc to be sure you get notified immediately if the WordPress based site is down.

Create Linux High Availability Load Balancer Cluster with Keepalived and Haproxy on Linux

Tuesday, March 15th, 2022

keepalived-logo-linux

Configuring a Linux HA (High Availibiltiy) for an Application with Haproxy is already used across many Websites on the Internet and serious corporations that has a crucial infrastructure has long time
adopted and used keepalived to provide High Availability Application level Clustering.
Usually companies choose to use HA Clusters with Haproxy with Pacemaker and Corosync cluster tools.
However one common used alternative solution if you don't have the oportunity to bring up a High availability cluster with Pacemaker / Corosync / pcs (Pacemaker Configuration System) due to fact machines you need to configure the cluster on are not Physical but VMWare Virtual Machines which couldn't not have configured a separate Admin Lans and Heartbeat Lan as we usually do on a Pacemaker Cluster due to the fact the 5 Ethernet LAN Card Interfaces of the VMWare Hypervisor hosts are configured as a BOND (e.g. all the incoming traffic to the VMWare vSphere  HV is received on one Virtual Bond interface).

I assume you have 2 separate vSphere Hypervisor Physical Machines in separate Racks and separate switches hosting the two VMs.
For the article, I'll call the two brand new brought Virtual Machines with some installation automation software such as Terraform or Ansible – vm-server1 and vm-server2 which would have configured some recent version of Linux.

In that scenario to have a High Avaiability for the VMs on Application level and assure at least one of the two is available at a time if one gets broken due toe malfunction of the HV, a Network connectivity issue, or because the VM OS has crashed.
Then one relatively easily solution is to use keepalived and configurea single High Availability Virtual IP (VIP) Address, i.e. 10.10.10.1, which would float among two VMs using keepalived so at a time at least one of the two VMs would be reachable on the Network.

haproxy_keepalived-vip-ip-diagram-linux

Having a VIP IP is quite a common solution in corporate world, as it makes it pretty easy to add F5 Load Balancer in front of the keepalived cluster setup to have a 3 Level of security isolation, which usually consists of:

1. Physical (access to the hardware or Virtualization hosts)
2. System Access (The mechanism to access the system login credetials users / passes, proxies, entry servers leading to DMZ-ed network)
3. Application Level (access to different programs behind L2 and data based on the specific identity of the individual user,
special Secondary UserID,  Factor authentication, biometrics etc.)

 

1. Install keepalived and haproxy on machines

Depending on the type of Linux OS:

On both machines
 

[root@server1:~]# yum install -y keepalived haproxy

If you have to install keepalived / haproxy on Debian / Ubuntu and other Deb based Linux distros

[root@server1:~]# apt install keepalived haproxy –yes

2. Configure haproxy (haproxy.cfg) on both server1 and server2

 

Create some /etc/haproxy/haproxy.cfg configuration

 

[root@server1:~]vim /etc/haproxy/haproxy.cfg

#———————————————————————
# Global settings
#———————————————————————
global
    log          127.0.0.1 local6 debug
    chroot       /var/lib/haproxy
    pidfile      /run/haproxy.pid
    stats socket /var/lib/haproxy/haproxy.sock mode 0600 level admin 
    maxconn      4000
    user         haproxy
    group        haproxy
    daemon
    #debug
    #quiet

#———————————————————————
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#———————————————————————
defaults
    mode        tcp
    log         global
#    option      dontlognull
#    option      httpclose
#    option      httplog
#    option      forwardfor
    option      redispatch
    option      log-health-checks
    timeout connect 10000 # default 10 second time out if a backend is not found
    timeout client 300000
    timeout server 300000
    maxconn     60000
    retries     3

#———————————————————————
# round robin balancing between the various backends
#———————————————————————

listen FRONTEND_APPNAME1
        bind 10.10.10.1:15000
        mode tcp
        option tcplog
#        #log global
        log-format [%t]\ %ci:%cp\ %bi:%bp\ %b/%s:%sp\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
        balance roundrobin
        timeout client 350000
        timeout server 350000
        timeout connect 35000
        server app-server1 10.10.10.55:30000 weight 1 check port 68888
        server app-server2 10.10.10.55:30000 weight 2 check port 68888

listen FRONTEND_APPNAME2
        bind 10.10.10.1:15000
        mode tcp
        option tcplog
        #log global
        log-format [%t]\ %ci:%cp\ %bi:%bp\ %b/%s:%sp\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
        balance roundrobin
        timeout client 350000
        timeout server 350000
        timeout connect 35000
        server app-server1 10.10.10.55:30000 weight 5
        server app-server2 10.10.10.55:30000 weight 5 

 

You can get a copy of above haproxy.cfg configuration here.
Once configured roll it on.

[root@server1:~]#  systemctl start haproxy
 
[root@server1:~]# ps -ef|grep -i hapro
root      285047       1  0 Mar07 ?        00:00:00 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy   285050  285047  0 Mar07 ?        00:00:26 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Bring up the haproxy also on server2 machine, by placing same configuration and starting up the proxy.
 

[root@server1:~]vim /etc/haproxy/haproxy.cfg


 

3. Configure keepalived on both servers

We'll be configuring 2 nodes with keepalived even though if necessery this can be easily extended and you can add more nodes.
First we make a copy of the original or existing server configuration keepalived.conf (just in case we need it later on or if you already had something other configured manually by someone – that could be so on inherited servers by other sysadmin)
 

[root@server1:~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig
[root@server2:~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig

a. Configure keepalived to serve as a MASTER Node

 

[root@server1:~]# vim /etc/keepalived/keepalived.conf

Master Node
global_defs {
  router_id server1-fqdn # The hostname of this host.
  
  enable_script_security
  # Synchro of the state of the connections between the LBs on the eth0 interface
   lvs_sync_daemon eth0
 
notification_email {
        linuxadmin@notify-domain.com     # Email address for notifications 
    }
 notification_email_from keepalived@server1-fqdn        # The from address for the notifications
    smtp_server 127.0.0.1                       # SMTP server address
    smtp_connect_timeout 15
}

vrrp_script haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
  user root
}

vrrp_instance LB_VIP_QA {
  virtual_router_id 50
  advert_int 1
  priority 51

  state MASTER
  interface eth0
  smtp_alert          # Enable Notifications Via Email
  
  authentication {
              auth_type PASS
              auth_pass testp141

    }
### Commented because running on VM on VMWare
##    unicast_src_ip 10.44.192.134 # Private IP address of master
##    unicast_peer {
##        10.44.192.135           # Private IP address of the backup haproxy
##   }

#        }
# master node with higher priority preferred node for Virtual IP if both keepalived up
###  priority 51
###  state MASTER
###  interface eth0
  virtual_ipaddress {
     10.10.10.1 dev eth0 # The virtual IP address that will be shared between MASTER and BACKUP
  }
  track_script {
      haproxy
  }
}

 

 To dowload a copy of the Master keepalived.conf configuration click here

Below are few interesting configuration variables, worthy to mention few words on, most of them are obvious by their names but for more clarity I'll also give a list here with short description of each:

 

  • vrrp_instance – defines an individual instance of the VRRP protocol running on an interface.
  • state – defines the initial state that the instance should start in (i.e. MASTER / SLAVE )state –
  • interface – defines the interface that VRRP runs on.
  • virtual_router_id – should be unique value per Keepalived Node (otherwise slave master won't function properly)
  • priority – the advertised priority, the higher the priority the more important the respective configured keepalived node is.
  • advert_int – specifies the frequency that advertisements are sent at (1 second, in this case).
  • authentication – specifies the information necessary for servers participating in VRRP to authenticate with each other. In this case, a simple password is defined.
    only the first eight (8) characters will be used as described in  to note is Important thing
    man keepalived.conf – keepalived.conf variables documentation !!! Nota Bene !!! – Password set on each node should match for nodes to be able to authenticate !
  • virtual_ipaddress – defines the IP addresses (there can be multiple) that VRRP is responsible for.
  • notification_email – the notification email to which Alerts will be send in case if keepalived on 1 node is stopped (e.g. the MASTER node switches from host 1 to 2)
  • notification_email_from – email address sender from where email will originte
    ! NB ! In order for notification_email to be working you need to have configured MTA or Mail Relay (set to local MTA) to another SMTP – e.g. have configured something like Postfix, Qmail or Postfix

b. Configure keepalived to serve as a SLAVE Node

[root@server1:~]vim /etc/keepalived/keepalived.conf
 

#Slave keepalived
global_defs {
  router_id server2-fqdn # The hostname of this host!

  enable_script_security
  # Synchro of the state of the connections between the LBs on the eth0 interface
  lvs_sync_daemon eth0
 
notification_email {
        linuxadmin@notify-host.com     # Email address for notifications
    }
 notification_email_from keepalived@server2-fqdn        # The from address for the notifications
    smtp_server 127.0.0.1                       # SMTP server address
    smtp_connect_timeout 15
}

vrrp_script haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
  user root
}

vrrp_instance LB_VIP_QA {
  virtual_router_id 50
  advert_int 1
  priority 50

  state BACKUP
  interface eth0
  smtp_alert          # Enable Notifications Via Email

authentication {
              auth_type PASS
              auth_pass testp141
}
### Commented because running on VM on VMWare    
##    unicast_src_ip 10.10.192.135 # Private IP address of master
##    unicast_peer {
##        10.10.192.134         # Private IP address of the backup haproxy
##   }

###  priority 50
###  state BACKUP
###  interface eth0
  virtual_ipaddress {
     10.10.10.1 dev eth0 # The virtual IP address that will be shared betwee MASTER and BACKUP.
  }
  track_script {
    haproxy
  }
}

 

Download the keepalived.conf slave config here

 

c. Set required sysctl parameters for haproxy to work as expected
 

[root@server1:~]vim /etc/sysctl.conf
#Haproxy config
# haproxy
net.core.somaxconn=65535
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3

4. Test Keepalived keepalived.conf configuration syntax is OK

 

[root@server1:~]keepalived –config-test
(/etc/keepalived/keepalived.conf: Line 7) Unknown keyword 'lvs_sync_daemon_interface'
(/etc/keepalived/keepalived.conf: Line 21) Unable to set default user for vrrp script haproxy – removing
(/etc/keepalived/keepalived.conf: Line 31) (LB_VIP_QA) Specifying lvs_sync_daemon_interface against a vrrp is deprecated.
(/etc/keepalived/keepalived.conf: Line 31)              Please use global lvs_sync_daemon
(/etc/keepalived/keepalived.conf: Line 35) Truncating auth_pass to 8 characters
(/etc/keepalived/keepalived.conf: Line 50) (LB_VIP_QA) track script haproxy not found, ignoring…

I've experienced this error because first time I've configured keepalived, I did not mention the user with which the vrrp script haproxy should run,
in prior versions of keepalived, leaving the field empty did automatically assumed you have the user with which the vrrp script runs to be set to root
as of RHELs keepalived-2.1.5-6.el8.x86_64, i've been using however this is no longer so and thus in prior configuration as you can see I've
set the user in respective section to root.
The error Unknown keyword 'lvs_sync_daemon_interface'
is also easily fixable by just substituting the lvs_sync_daemon_interface and lvs_sync_daemon and reloading
keepalived etc.

Once keepalived is started and you can see the process on both machines running in process list.

[root@server1:~]ps -ef |grep -i keepalived
root     1190884       1  0 18:50 ?        00:00:00 /usr/sbin/keepalived -D
root     1190885 1190884  0 18:50 ?        00:00:00 /usr/sbin/keepalived -D

Next step is to check the keepalived statuses as well as /var/log/keepalived.log

If everything is configured as expected on both keepalived on first node you should see one is master and one is slave either in the status or the log

[root@server1:~]#systemctl restart keepalived

 

[root@server1:~]systemctl status keepalived|grep -i state
Mar 14 18:59:02 server1-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) Entering MASTER STATE

[root@server1:~]systemctl status keepalived

● keepalived.service – LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2022-03-14 18:15:51 CET; 32min ago
  Process: 1187587 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1187589 (code=exited, status=0/SUCCESS)

Mar 14 18:15:04 server1lb-fqdn Keepalived_vrrp[1187590]: Sending gratuitous ARP on eth0 for 10.44.192.142
Mar 14 18:15:50 server1lb-fqdn systemd[1]: Stopping LVS and VRRP High Availability Monitor…
Mar 14 18:15:50 server1lb-fqdn Keepalived[1187589]: Stopping
Mar 14 18:15:50 server1lb-fqdn Keepalived_vrrp[1187590]: (LB_VIP_QA) sent 0 priority
Mar 14 18:15:50 server1lb-fqdn Keepalived_vrrp[1187590]: (LB_VIP_QA) removing VIPs.
Mar 14 18:15:51 server1lb-fqdn Keepalived_vrrp[1187590]: Stopped – used 0.002007 user time, 0.016303 system time
Mar 14 18:15:51 server1lb-fqdn Keepalived[1187589]: CPU usage (self/children) user: 0.000000/0.038715 system: 0.001061/0.166434
Mar 14 18:15:51 server1lb-fqdn Keepalived[1187589]: Stopped Keepalived v2.1.5 (07/13,2020)
Mar 14 18:15:51 server1lb-fqdn systemd[1]: keepalived.service: Succeeded.
Mar 14 18:15:51 server1lb-fqdn systemd[1]: Stopped LVS and VRRP High Availability Monitor

[root@server2:~]systemctl status keepalived|grep -i state
Mar 14 18:59:02 server2-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Entering BACKUP STATE

[root@server1:~]# grep -i state /var/log/keepalived.log
Mar 14 18:59:02 server1lb-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Entering MASTER STATE
 

a. Fix Keepalived SECURITY VIOLATION – scripts are being executed but script_security not enabled.
 

When configurating keepalived for a first time we have faced the following strange error inside keepalived status inside keepalived.log 
 

Feb 23 14:28:41 server1 Keepalived_vrrp[945478]: SECURITY VIOLATION – scripts are being executed but script_security not enabled.

 

To fix keepalived SECURITY VIOLATION error:

Add to /etc/keepalived/keepalived.conf on the keepalived node hosts
inside 

global_defs {}

After chunk
 

enable_script_security

include

# Synchro of the state of the connections between the LBs on the eth0 interface
  lvs_sync_daemon_interface eth0

 

5. Prepare rsyslog configuration and Inlcude additional keepalived options
to force keepalived log into /var/log/keepalived.log

To force keepalived log into /var/log/keepalived.log on RHEL 8 / CentOS and other Redhat Package Manager (RPM) Linux distributions

[root@server1:~]# vim /etc/rsyslog.d/48_keepalived.conf

#2022/02/02: HAProxy logs to local6, save the messages
local7.*                                                /var/log/keepalived.log
if ($programname == 'Keepalived') then -/var/log/keepalived.log
if ($programname == 'Keepalived_vrrp') then -/var/log/keepalived.log
& stop

[root@server:~]# touch /var/log/keepalived.log

Reload rsyslog to load new config
 

[root@server:~]# systemctl restart rsyslog
[root@server:~]# systemctl status rsyslog

 

rsyslog.service – System Logging Service
   Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/rsyslog.service.d
           └─rsyslog-service.conf
   Active: active (running) since Mon 2022-03-07 13:34:38 CET; 1 weeks 0 days ago
     Docs: man:rsyslogd(8)

           https://www.rsyslog.com/doc/
 Main PID: 269574 (rsyslogd)
    Tasks: 6 (limit: 100914)
   Memory: 5.1M
   CGroup: /system.slice/rsyslog.service
           └─269574 /usr/sbin/rsyslogd -n

Mar 15 08:15:16 server1lb-fqdn rsyslogd[269574]: — MARK —
Mar 15 08:35:16 server1lb-fqdn rsyslogd[269574]: — MARK —
Mar 15 08:55:16 server1lb-fqdn rsyslogd[269574]: — MARK —

 

If once keepalived is loaded but you still have no log written inside /var/log/keepalived.log

[root@server1:~]# vim /etc/sysconfig/keepalived
 KEEPALIVED_OPTIONS="-D -S 7"

[root@server2:~]# vim /etc/sysconfig/keepalived
 KEEPALIVED_OPTIONS="-D -S 7"

[root@server1:~]# systemctl restart keepalived.service
[root@server1:~]#  systemctl status keepalived

● keepalived.service – LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-02-24 12:12:20 CET; 2 weeks 4 days ago
 Main PID: 1030501 (keepalived)
    Tasks: 2 (limit: 100914)
   Memory: 1.8M
   CGroup: /system.slice/keepalived.service
           ├─1030501 /usr/sbin/keepalived -D
           └─1030502 /usr/sbin/keepalived -D

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

[root@server2:~]# systemctl restart keepalived.service
[root@server2:~]# systemctl status keepalived

6. Monitoring VRRP traffic of the two keepaliveds with tcpdump
 

Once both keepalived are up and running a good thing is to check the VRRP protocol traffic keeps fluently on both machines.
Keepalived VRRP keeps communicating over the TCP / IP Port 112 thus you can simply snoop TCP tracffic on its protocol.
 

[root@server1:~]# tcpdump proto 112

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:08:07.356187 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:08.356297 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:09.356408 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:10.356511 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:11.356655 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20

[root@server2:~]# tcpdump proto 112

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
​listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:08:07.356187 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:08.356297 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:09.356408 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:10.356511 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:11.356655 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20

As you can see the VRRP traffic on the network is originating only from server1lb-fqdn, this is so because host server1lb-fqdn is the keepalived configured master node.

It is possible to spoof the password configured to authenticate between two nodes, thus if you're bringing up keepalived service cluster make sure your security is tight at best the machines should be in a special local LAN DMZ, do not configure DMZ on the internet !!! 🙂 Or if you eventually decide to configure keepalived in between remote hosts, make sure you somehow use encrypted VPN or SSH tunnels to tunnel the VRRP traffic.

[root@server1:~]tcpdump proto 112 -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:36:25.530772 IP (tos 0xc0, ttl 255, id 59838, offset 0, flags [none], proto VRRP (112), length 40)
    server1lb-fqdn > vrrp.mcast.net: vrrp server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20, addrs: VIPIP_QA auth "testp431"
11:36:26.530874 IP (tos 0xc0, ttl 255, id 59839, offset 0, flags [none], proto VRRP (112), length 40)
    server1lb-fqdn > vrrp.mcast.net: vrrp server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20, addrs: VIPIP_QA auth "testp431"

Lets also check what floating IP is configured on the machines:

[root@server1:~]# ip -brief address show
lo               UNKNOWN        127.0.0.1/8 
eth0             UP             10.10.10.5/26 10.10.10.1/32 

The 10.10.10.5 IP is the main IP set on LAN interface eth0, 10.10.10.1 is the floating IP which as you can see is currently set by keepalived to listen on first node.

[root@server2:~]# ip -brief address show |grep -i 10.10.10.1

An empty output is returned as floating IP is currently configured on server1

To double assure ourselves the IP is assigned on correct machine, lets ping it and check the IP assigned MAC  currently belongs to which machine.
 

[root@server2:~]# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.526 ms
^C
— 10.10.10.1 ping statistics —
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.526/0.526/0.526/0.000 ms

[root@server2:~]# arp -an |grep -i 10.44.192.142
? (10.10.10.1) at 00:48:54:91:83:7d [ether] on eth0
[root@server2:~]# ip a s|grep -i 00:48:54:91:83:7d
[root@server2:~]# 

As you can see from below output MAC is not found in configured IPs on server2.
 

[root@server1-fqdn:~]# /sbin/ip a s|grep -i 00:48:54:91:83:7d -B1 -A1
 eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:48:54:91:83:7d brd ff:ff:ff:ff:ff:ff
inet 10.10.10.1/26 brd 10.10.1.191 scope global noprefixroute eth0

Pretty much expected MAC is on keepalived node server1.

 

7. Testing keepalived on server1 and server2 maachines VIP floating IP really works
 

To test the overall configuration just created, you should stop keeaplived on the Master node and in meantime keep an eye on Slave node (server2), whether it can figure out the Master node is gone and switch its
state BACKUP to save MASTER. By changing the secondary (Slave) keepalived to master the floating IP: 10.10.10.1 will be brought up by the scripts on server2.

Lets assume that something went wrong with server1 VM host, for example the machine crashed due to service overload, DDoS or simply a kernel bug or whatever reason.
To simulate that we simply have to stop keepalived, then the broadcasted information on VRRP TCP/IP proto port 112 will be no longer available and keepalived on node server2, once
unable to communicate to server1 should chnage itself to state MASTER.

[root@server1:~]# systemctl stop keepalived
[root@server1:~]# systemctl status keepalived

● keepalived.service – LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2022-03-15 12:11:33 CET; 3s ago
  Process: 1192001 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1192002 (code=exited, status=0/SUCCESS)

Mar 14 18:59:07 server1lb-fqdn Keepalived_vrrp[1192003]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:32 server1lb-fqdn systemd[1]: Stopping LVS and VRRP High Availability Monitor…
Mar 15 12:11:32 server1lb-fqdn Keepalived[1192002]: Stopping
Mar 15 12:11:32 server1lb-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) sent 0 priority
Mar 15 12:11:32 server1lb-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) removing VIPs.
Mar 15 12:11:33 server1lb-fqdn Keepalived_vrrp[1192003]: Stopped – used 2.145252 user time, 15.513454 system time
Mar 15 12:11:33 server1lb-fqdn Keepalived[1192002]: CPU usage (self/children) user: 0.000000/44.555362 system: 0.001151/170.118126
Mar 15 12:11:33 server1lb-fqdn Keepalived[1192002]: Stopped Keepalived v2.1.5 (07/13,2020)
Mar 15 12:11:33 server1lb-fqdn systemd[1]: keepalived.service: Succeeded.
Mar 15 12:11:33 server1lb-fqdn systemd[1]: Stopped LVS and VRRP High Availability Monitor.

 

On keepalived off, you will get also a notification Email on the Receipt Email configured from keepalived.conf from the working keepalived node with a simple message like:

=> VRRP Instance is no longer owning VRRP VIPs <=

Once keepalived is back up you will get another notification like:

=> VRRP Instance is now owning VRRP VIPs <=

[root@server2:~]# systemctl status keepalived
● keepalived.service – LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-03-14 18:13:52 CET; 17h ago
  Process: 297366 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 297367 (keepalived)
    Tasks: 2 (limit: 100914)
   Memory: 2.1M
   CGroup: /system.slice/keepalived.service
           ├─297367 /usr/sbin/keepalived -D -S 7
           └─297368 /usr/sbin/keepalived -D -S 7

Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Remote SMTP server [127.0.0.1]:25 connected.
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: SMTP alert successfully sent.
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Sending/queueing gratuitous ARPs on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1

[root@server2:~]#  ip addr show|grep -i 10.10.10.1
    inet 10.10.10.1/32 scope global eth0
    

As you see the VIP is now set on server2, just like expected – that's OK, everything works as expected. If the IP did not move double check the keepalived.conf on both nodes for errors or misconfigurations.

To recover the initial order of things so server1 is MASTER and server2 SLAVE host, we just have to switch on the keepalived on server1 machine.

[root@server1:~]# systemctl start keepalived

The automatic change of server1 to MASTER node and respective move of the VIP IP is done because of the higher priority (of importance we previously configured on server1 in keepalived.conf).
 

What we learned?
 

So what we learned in  this article?
We have seen how to easily install and configure a High Availability Load balancer with Keepalived with single floating VIP IP address with 1 MASTER and 1 SLAVE host and a Haproxy example config with few frontends / App backends. We have seen how the config can be tested for potential errors and how we can monitor whether the VRRP2 network traffic flows between nodes and how to potentially debug it further if necessery.
Further on rawly explained some of the keepalived configurations but as keepalived can do pretty much more,for anyone seriously willing to deal with keepalived on a daily basis or just fine tune some already existing ones, you better read closely its manual page "man keepalived.conf" as well as the official Redhat Linux documentation page on setting up a Linux cluster with Keepalived (Be prepare for a small nightmare as the documentation of it seems to be a bit chaotic, and even I would say partly missing or opening questions on what does the developers did meant – not strange considering the havoc that is pretty much as everywhere these days.)

Finally once keepalived hosts are prepared, it was shown how to test the keepalived application cluster and Floating IP does move between nodes in case if one of the 2 keepalived nodes is inaccessible.

The same logic can be repeated multiple times and if necessery you can set multiple VIPs to expand the HA reachable IPs solution.

high-availability-with-two-vips-example-diagram

The presented idea is with haproxy forward Proxy server to proxy requests towards Application backend (servince machines), however if you need to set another set of server on the flow to  process HTML / XHTML / PHP / Perl / Python  programming code, with some common Webserver setup ( Nginx / Apache / Tomcat / JBOSS) and enable SSL Secure certificate with lets say Letsencrypt, this can be relatively easily done. If you want to implement letsencrypt and a webserver check this redundant SSL Load Balancing with haproxy & keepalived article.

That's all folks, hope you enjoyed.
If you need to configure keepalived Cluster or a consultancy write your query here 🙂

Backup entire Live Linux Operating System bit by bit with dd, partimage, partclone clonezilla

Thursday, January 7th, 2021


dd-create-server-hard-drive-identical-mirror-data-copy-backups

This is an old stuff that we UNIX / Linux sysadmins use frequently when we need to migrate operating system from a certain older machine server to another newer one.
However I decided to blog it as it an interesting to know to a new grown junior sysadmins.

To Create a bit to bit data backup with dd command,
the following command is used to create a backup with dd, which takes the entire data content (including partition table etc.) with it:

dd if = / dev / [hard disk 1] of = / dev / [hard disk 2] bs = 512 conv = noerror, sync


For explanation:

 

"if" stands for the hard drive to be read from.
"of" stands for the hard drive to be written to.
Important! if and of must not be interchanged under any circumstances! In the worst case, the data on the disk to be read will otherwise be irrevocably overwritten!

"bs = 512" defines the block size. The value can be increased (which in turn increases the speed of the backup), but you should be sure that the file system to be backed up does not contain any errors. If you were to use block size 64k, for example, the speed of the backup is increased considerably – but if read errors occur within this block, the entire data block that dd has written contains unusable data. Therefore, when choosing the block size, you should always weigh data integrity and time against each other.
"noerror" tells dd to continue the backup in case of errors. Without this option, dd would stop the backup by default.
"sync" commands dd to replace the unreadable blocks with zeros in the event of errors in order to keep the data offset synchronous.
When performing a backup (as with other things that a longer period can take advantage of, it is always recommended (if you SSH is logged in and no direct access to a real Shell), the process either for CTRL + followed from bg to the background (can later be brought back to the foreground with fg ) or to use virtual session managers such as screen or byobu before executing the command.This prevents the process from dying if the SSH session is unintentionally terminated and you have to start over.

Of course there are plenty of other ways to make a mirror backup  cloneof a hard disk to lets say migrate to a new data center  using easier to use tools with (ncurses) Text menu interfaces to avoid bothering a complex typing on the console.
One such tool is Partclon:

Partclone-screenshot,_partclone-linux-create-mirror-disk-backups

PartClone cloning in action

Another text menu interface data cloning Linux tool commonly used by sysadms is partimage

Partimage-linux-screenshot

Most sysadmins however prefer to use Clonezilla when something more cozy is required to do a bit to bit data copy.
Tthere is even a Live Linux CD distribution for that.

Clonezilla can mirror most types of filesystems and partiontions and could be used not only for UNIX / Linux / BSD filesystems Live OS data (backups) (EXT3 / EXT4 / XFS / ZFS etc)  migrations, but also for old NT4 Windows server partitions. One useful application of Clonezilla i can think of is if you want to configure or restore a whole office of Windows computers running on the same clean version of Windows and same hardware configurations PCs, after a Virus or trojan has striked it. By using it you can clone from a central well configured Windows release with the surrounding applications to all machines for up to an hour with Clonezilla and you can even do it over a network.