Posts Tagged ‘root’

Monitoring chronyd time service is synchronized, get additional time server values with Zabbix userparameter script

Monday, March 21st, 2022

monitoring-chronyc-time-server-synchronization-zabbix-logo

If you''re running a server infrastructure and your main monitoring system is Zabbix. Then a vital check you might want to setup is to monitor the server time synchronization to a central server. In newer Linux OS-es ntpd time server is started to be used lesser and many modern Linux distributions used in the corporate realm are starting to recommend using chrony as a time synchronization client / server.

In this article, I'll show you how you can quickly setup monitoring of chronyd process and monitoring whether the time is successfully synchronizing with remote Chronyd time server. This will be done with a tiny one liner shell script setup as userparameter It is relatively easy then to setup an Action Alert


1. Create userparameter script to send parsed chronyd time synchronization to Zabbix Server

chronyc tracking provides plenty of useful data which can give many details about info such as offset, skew, root delay, stratum, update interval.

[root@server: ~]# chronyc tracking
Reference ID    : 0A32EF0B (fkf-intp01.intcs.meshcore.net)
Stratum         : 3
Ref time (UTC)  : Fri Mar 18 12:42:31 2022
System time     : 0.000032544 seconds fast of NTP time
Last offset     : +0.000031102 seconds
RMS offset      : 0.000039914 seconds
Frequency       : 3.037 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.023 ppm
Root delay      : 0.017352410 seconds
Root dispersion : 0.004285847 seconds
Update interval : 1041.6 seconds
Leap status     : Normal

[root@server zabbix_agentd.d]# cat userparameter_chrony.conf 
UserParameter=chrony.json,chronyc -c tracking | sed -e s/'^'/'{"chrony":[“‘/g -e s/’$’/'”]}'/g -e s/','/'","'/g
[root@server zabbix_agentd.d]#

The -c option passed to chronyc is printing the chronyc tracking command ouput data in comma-separated values ( CSV ) format.

2. Create Necessery Item key to get chronyd processes and catch the userparameter data

 

  • First lets create a an Item key to calculate the chronyd daemon proc.num
    proc.num – simply returns the number of processes in the process list just like a simple
    pgrep servicename command does.


monitroing-chronyc_zabbix_item_report_to-zabbix

Second lets create the Item for the userparameter script, the chrony.json key should be the same as the key given in the userparameter script.

obtain-chronyc-statistic-variables-from-remote-chronyd-to-zabbix-windows

Create Chrony Zabbix Triggers 
 

Expression 

{server-host:proc.num[chronyd].last()}<1


will be triggered if the process of chronyd on the server is less than 1

 

chronyc_monitoring_process-is-not-running-screenshot

Next configure

{server-host:chrony[Leap status].iregexp[Not synchronised) ]=1


to trigger Alert Chronyd is Not synchronized if the Expression check occurs.

chronyd-is-not-synchronized-trigger-iregexp

Reload the zabbix-agent on the server
 

To make zabbix-agent locally installed on the machine read the userparameter into memory  (in my case this is zabbix-agent-4.0.28-1.el8.x86_64) installed on Redhat 8.3 (Ootpa), you have to restart it.

[root@server: ~]# systemctl restart zabbix-agent
[root@server: ~]# systemctl status zabbix-agent

● zabbix-agent.service – Zabbix Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-12-16 16:41:02 CET; 3 months 0 days ago
 Main PID: 862165 (zabbix_agentd)
    Tasks: 6 (limit: 23662)
   Memory: 20.6M
   CGroup: /system.slice/zabbix-agent.service
           ├─862165 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
           ├─862166 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─862167 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─862168 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
           ├─862169 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
           └─862170 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.


In a short while you should be seeing in the chrony.json key History data fed by the userparameter Script.

In Zabbix Latest data, you will see plenty of interesting time synchronization data get reported such as Skew, Stratum, Root Delay, Update Interval, Frequency etc.

chronyd-zabbix-reported-time-synchronization-offset-leap-residual-freq-root-delay-screenshot

To have an email Alerting further, go and setup a new Zabbix Action based on the Trigger with your likings and you're done. 
The tracked machine will be in zabbix to make sure your OS clock is not afar from the time server. Repeat the same steps if you need to track chronyd is up running and synchronized on few machines, or if you have to make it for dozens setup a Zabbix template.

Disable VNC on KVM Virtual Machine without VM restart / How to Change VNC listen address

Monday, February 28th, 2022

disable-vnc-port-listener-on-a-KVM-ran-virtual-machine-virsh-libvirt-libvirt-architecture-design

Say you have recently run a new KVM Virtual machine, have connected via VNC on lets say the default tcp port 5900 
installed a brand new Linux OS using a VNC client to connect, such as:
TightVNC / RealVNC if connecting from Windows Client machine or Vncviewer / Remmina if connecting from Linux / BSD  and now 
you want to turn off the VNC VM listener server either for security reasons to make sure some script kiddie random scanner did not manage to connect and take control over your VM or just because, you will be only further using the new configured VM only via SSH console sessions as they call it in modern times to make a buziness buzz out of it a headless UNIX server (server machines connected a network without a Physical monitor attached to it).


The question comes then how can be the KVM VNC listener on TCP port 5900 be completely disabled?

One way of course is to filter out with a firewall 5900 completely either on a Switch Level (lets say on a Cisco equipment catalist in front of the machine) or the worst solution to  locally filter directly on the server with firewalld or iptables chain rules.
 

1. Disable KVM VNC Port listener via VIRSH VM XML edit

The better way of course  is to completely disable the VNC using KVM, that is possible through the virsh command interface.
By editing the XML Virtual Machine configuration and finding the line about vnc confiuguration with:

root@server:/kvm/disk# virsh edit pcfreakweb
Domain pcfreakweb XML configuration not changed.

like:

<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>


and set value to undefined:

port='-1'


virsh-KVM-disable-VNC-port-listener-virsh-xml-edit-screenshot

Modifying the XML however will require you to reboot the Virtual Machine for which XML was editted. This might be not possible
if you have a running production server already configured with Apache / Proxy / PostgreSQL / Mail or any other Internet public service.

2. Disable VNC KVM TCP port 5900 to a dynamic running VM without a machine reboot


Thus if you want to remove the KVM VNC Port Listener on 5900 without a VM shutdown / reboot you can do it via KVM's virsh client interface.

root@server:/kvm/disk# virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # qemu-monitor-command pcfreakweb –hmp change  vnc none

 

The virsh management user interface client, can do pretty much more of real time VM changes, it is really useful to use it if you have KVM Hypervisor hosts with 10+ Virtual machines and it if you have to deal with KVM machines on daily, do specific changes to the VMs on how VM networks are configured, information on HV hardware, configure / reconfigure storage volumes to VMs etc, take some time to play with it 🙂

List and fix failed systemd failed services after Linux OS upgrade and how to get full info about systemd service from jorunal log

Friday, February 25th, 2022

systemd-logo-unix-linux-list-failed-systemd-services

I have recently upgraded a number of machines from Debian 10 Buster to Debian 11 Bullseye. The update as always has some issues on some machines, such as problem with package dependencies, changing a number of external package repositories etc. to match che Bullseye deb packages. On some machines the update was less painful on others but the overall line was that most of the machines after the update ended up with one or more failed systemd services. It could be that some of the machines has already had this failed services present and I never checked them from the previous time update from Debian 9 -> Debian 10 or just some mess I've left behind in the hurry when doing software installation in the past. This doesn't matter anyways the fact was that I had to deal to a number of systemctl services which I managed to track by the Failed service mesage on system boot on one of the physical machines and on the OpenXen VTY Console the rest of Virtual Machines after update had some Failed messages. Thus I've spend some good amount of time like an overall of a day or two fixing strange failed services. This is how this small article was born in attempt to help sysadmins or any home Linux desktop users, who has updated his Debian Linux / Ubuntu or any other deb based distribution but due to the chaotic nature of Linux has ended with same strange Failed services and look for a way to find the source of the failures and get rid of the problems. 
Systemd is a very complicated system and in my many sysadmin opinion it makes more problems than it solves, but okay for today's people's megalomania mindset it matches well.

Systemd_components-systemd-journalctl-cgroups-loginctl-nspawn-analyze.svg

 

1. Check the journal for errors, running service irregularities and so on
 

First thing to do to track for errors, right after the update is to take some minutes and closely check,, the journalctl for any strange errors, even on well maintained Unix machines, this journal log would bring you to a problem that is not fatal but still some process or stuff is malfunctioning in the background that you would like to solve:
 

root@pcfreak:~# journalctl -x
Jan 10 10:10:01 pcfreak CRON[17887]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17887]: USER_END pid=17887 uid=0 auid=0 ses=340858 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17888]: CRED_DISP pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17888]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17888]: USER_END pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17884]: CRED_DISP pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17884]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17884]: USER_END pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17886]: CRED_DISP pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/c>
Jan 10 10:10:01 pcfreak CRON[17886]: pam_unix(cron:session): session closed for user www-data
Jan 10 10:10:01 pcfreak audit[17886]: USER_END pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permi>
Jan 10 10:10:08 pcfreak NetworkManager[696]:  [1641802208.0899] device (eth1): carrier: link connected
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:19 pcfreak NetworkManager[696]:
 [1641802219.7920] device (eth1): carrier: link connected
Jan 10 10:10:19 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:20 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:22 pcfreak NetworkManager[696]:
 [1641802222.2772] device (eth1): carrier: link connected
Jan 10 10:10:22 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:23 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:33 pcfreak sshd[18142]: Unable to negotiate with 66.212.17.162 port 19255: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diff>
Jan 10 10:10:41 pcfreak NetworkManager[696]:
 [1641802241.0186] device (eth1): carrier: link connected
Jan 10 10:10:41 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx

If you want to only check latest journal log messages use the -x -e (pager catalog) opts

root@pcfreak;~# journalctl -xe

Feb 25 13:08:29 pcfreak audit[2284920]: USER_LOGIN pid=2284920 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=login acct=28696E76616C>
Feb 25 13:08:29 pcfreak sshd[2284920]: Received disconnect from 177.87.57.145 port 40927:11: Bye Bye [preauth]
Feb 25 13:08:29 pcfreak sshd[2284920]: Disconnected from invalid user ubuntuuser 177.87.57.145 port 40927 [preauth]

Next thing to after the update was to get a list of failed service only.


2. List all systemd failed check services which was supposed to be running

root@pcfreak:/root # systemctl list-units | grep -i failed
● certbot.service                                                                                                       loaded failed failed    Certbot
● logrotate.service                                                                                                     loaded failed failed    Rotate log files
● maldet.service                                                                                                        loaded failed failed    LSB: Start/stop maldet in monitor mode
● named.service                                                                                                         loaded failed failed    BIND Domain Name Server


Alternative way is with the –failed option

hipo@jeremiah:~$ systemctl list-units –failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
7 loaded units listed.

 

root@jeremiah:/etc/apt/sources.list.d#  systemctl list-units –failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon


To get a full list of objects of systemctl you can pass as state:
 

# systemctl –state=help
Full list of possible load states to pass is here
Show service properties


Check whether a service is failed or has other status and check default set systemd variables for it.

root@jeremiah~:# systemctl is-failed vboxweb.service
inactive

# systemctl show haproxy
Type=notify
Restart=always
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
SuccessExitStatus=143
MainPID=304858
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success

Full output of the above command is dumped in show_systemctl_properties.txt


3. List all running systemd services for a better overview on what's going on on machine
 

To get a list of all properly systemd loaded services you can use –state running.

hipo@jeremiah:~$ systemctl list-units –state running|head -n 10
  UNIT                              LOAD   ACTIVE SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active running Arbitrary Executable File Formats File System Automount Point
  cups.path                         loaded active running CUPS Scheduler
  init.scope                        loaded active running System and Service Manager
  session-2.scope                   loaded active running Session 2 of user hipo
  accounts-daemon.service           loaded active running Accounts Service
  anydesk.service                   loaded active running AnyDesk
  apache-htcacheclean.service       loaded active running Disk Cache Cleaning Daemon for Apache HTTP Server
  apache2.service                   loaded active running The Apache HTTP Server
  avahi-daemon.service              loaded active running Avahi mDNS/DNS-SD Stack

 

It is useful thing is to list all unit-files configured in systemd and their state, you can do it with:

 


root@pcfreak:~# systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          –            
-.mount                                                                   generated       –            
backups.mount                                                             generated       –            
dev-hugepages.mount                                                       static          –            
dev-mqueue.mount                                                          static          –            
media-cdrom0.mount                                                        generated       –            
mnt-sda1.mount                                                            generated       –            
proc-fs-nfsd.mount                                                        static          –            
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled     
run-rpc_pipefs.mount                                                      static          –            
sys-fs-fuse-connections.mount                                             static          –            
sys-kernel-config.mount                                                   static          –            
sys-kernel-debug.mount                                                    static          –            
sys-kernel-tracing.mount                                                  static          –            
var-www.mount                                                             generated       –            
acpid.path                                                                masked          enabled      
cups.path                                                                 enabled         enabled      

 

 


root@pcfreak:~# systemctl list-units –type service –all
  UNIT                                   LOAD      ACTIVE   SUB     DESCRIPTION
  accounts-daemon.service                loaded    inactive dead    Accounts Service
  acct.service                           loaded    active   exited  Kernel process accounting
● alsa-restore.service                   not-found inactive dead    alsa-restore.service
● alsa-state.service                     not-found inactive dead    alsa-state.service
  apache2.service                        loaded    active   running The Apache HTTP Server
● apparmor.service                       not-found inactive dead    apparmor.service
  apt-daily-upgrade.service              loaded    inactive dead    Daily apt upgrade and clean activities
 apt-daily.service                      loaded    inactive dead    Daily apt download activities
  atd.service                            loaded    active   running Deferred execution scheduler
  auditd.service                         loaded    active   running Security Auditing Service
  auth-rpcgss-module.service             loaded    inactive dead    Kernel Module supporting RPCSEC_GSS
  avahi-daemon.service                   loaded    active   running Avahi mDNS/DNS-SD Stack
  certbot.service                        loaded    inactive dead    Certbot
  clamav-daemon.service                  loaded    active   running Clam AntiVirus userspace daemon
  clamav-freshclam.service               loaded    active   running ClamAV virus database updater
..

 


linux-systemd-components-diagram-linux-kernel-system-targets-systemd-libraries-daemons

 

4. Finding out more on why a systemd configured service has failed


Usually getting info about failed systemd service is done with systemctl status servicename.service
However, in case of troubles with service unable to start to get more info about why a service has failed with (-l) or (–full) options


root@pcfreak:~# systemctl -l status logrotate.service
● logrotate.service – Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 00:00:06 EET; 13h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2045320 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
   Main PID: 2045320 (code=exited, status=1/FAILURE)
        CPU: 2.479s

Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: For now we will assume you meant to write /32
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| ERROR: '0.0.0.0/0.0.0.0' needs to be replaced by the term 'all'.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| SECURITY NOTICE: Overriding config setting. Using 'all' instead.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: (B) '::/0' is a subnetwork of (A) '::/0'
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: because of this '::/0' is ignored to keep splay tree searching predictable
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: You should probably remove '::/0' from the ACL named 'all'
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Failed with result 'exit-code'.
Feb 25 00:00:06 pcfreak systemd[1]: Failed to start Rotate log files.
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Consumed 2.479s CPU time.


systemctl -l however is providing only the last log from message a started / stopped or whatever status service has generated. Sometimes systemctl -l servicename.service is showing incomplete the splitted error message as there is a limitation of line numbers on the console, see below

 

root@pcfreak:~# systemctl status -l certbot.service
● certbot.service – Certbot
     Loaded: loaded (/lib/systemd/system/certbot.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 09:28:33 EET; 4h 0min ago
TriggeredBy: ● certbot.timer
       Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
             https://certbot.eff.org/docs
    Process: 290017 ExecStart=/usr/bin/certbot -q renew (code=exited, status=1/FAILURE)
   Main PID: 290017 (code=exited, status=1/FAILURE)
        CPU: 9.771s

Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with –manual-auth-hook when using th>
Feb 25 09:28:33 pcfrxen certbot[290017]: All renewals failed. The following certificates could not be renewed:
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/mail.pcfreak.org-0003/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/www.eforia.bg-0005/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/zabbix.pc-freak.net/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]: 3 renew failure(s), 5 parse failure(s)
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Failed with result 'exit-code'.
Feb 25 09:28:33 pcfrxen systemd[1]: Failed to start Certbot.
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Consumed 9.771s CPU time.

 

5. Get a complete log of journal to make sure everything configured on server host runs as it should

Thus to get more complete list of the message and be able to later google and look if has come with a solution on the internet  use:

root@pcfrxen:~#  journalctl –catalog –unit=certbot

— Journal begins at Sat 2022-01-22 21:14:05 EET, ends at Fri 2022-02-25 13:32:01 EET. —
Jan 23 09:58:18 pcfrxen systemd[1]: Starting Certbot…
░░ Subject: A start job for unit certbot.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit certbot.service has begun execution.
░░ 
░░ The job identifier is 5754.
Jan 23 09:58:20 pcfrxen certbot[124996]: Traceback (most recent call last):
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/renewal.py", line 71, in _reconstitute
Jan 23 09:58:20 pcfrxen certbot[124996]:     renewal_candidate = storage.RenewableCert(full_path, config)
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 471, in __init__
Jan 23 09:58:20 pcfrxen certbot[124996]:     self._check_symlinks()
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 537, in _check_symlinks

root@server:~# journalctl –catalog –unit=certbot|grep -i pluginerror|tail -1
Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with –manual-auth-hook when using the manual plugin non-interactively.')


Or if you want to list and read only the last messages in the journal log regarding a service

root@server:~# journalctl –catalog –pager-end –unit=certbot


If you have disabled a failed service because you don't need it to run at all on the machine with:

root@rhel:~# systemctl stop rngd.service
root@rhel:~# systemctl disable rngd.service

And you want to clear up any failed service information that is kept in the systemctl service log you can do it with:
 

root@rhel:~# systemctl reset-failed

Another useful systemctl option is cat, you can use it to easily list a service it is useful to quickly check what is a service, an actual shortcut to save you from giving a full path to the service e.g. cat /lib/systemd/system/certbot.service

root@server:~# systemctl cat certbot
# /lib/systemd/system/certbot.service
[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://certbot.eff.org/docs
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true


After failed SystemD services are fixed, it is best to reboot the machine and check put some more time to inspect rawly the complete journal log to make sure, no error  was left behind.


Closure
 

As you can see updating a machine from a major to a major version even if you follow the official documentation and you have plenty of experience is always more or a less a pain in the ass, which can eat up much of your time banging your head solving problems with failed daemons issues with /etc/rc.local (which I have faced becase of #/bin/sh -e (which would make /etc/rc.local) to immediately quit if any error from command $? returns different from 0 etc.. The  logical questions comes then;
1. Is it really worthy to update at all regularly, especially if you don't know of a famous major Vulnerability 🙂 ?
2. Or is it worthy to update from OS major release to OS major release at all?  
3. Or should you only try to patch the service that is exposed to an external reachable computer network or the internet only and still the the same OS release until End of Life (LTS = Long Term Support) as called in Debian or  End Of Life  (EOL) Cycle as called in RPM based distros the period until the OS major release your software distro has official security patches is reached.

Anyone could take any approach but for my own managed systems small network at home my practice was always to try to keep up2date everything every 3 or 6 months maximum. This has caused me multiple days of irritation and stress and perhaps many white hairs and spend nerves on shit.


4. Based on the company where I'm employed the better strategy is to patch to the EOL is still offered and keep the rule First Things First (FTF), once the EOL is reached, just make a copy of all servers data and configuration to external Data storage, bring up a new Physical or VM and migrate the services.
Test after the migration all works as expected if all is as it should be change the DNS records or Leading Infrastructure Proxies whatever to point to the new service and that's it! Yes it is true that migration based on a full OS reinstall is more time consuming and requires much more planning, but usually the result is much more expected, plus it is much less stressful for the guy doing the job.

How to disable appArmor automatically installed and loaded after Linux Debian 10 to 11 Upgrade. Disable Apparmour on Deb based Linux

Friday, January 28th, 2022

check-apparmor-status-linux-howto-disable-Apparmor_on-debian-ubuntu-mint-and-other-deb-based-linux-distributions

I've upgraded recently all my machines from Debian Buster Linux 10 to Debian 11 Bullseye (if you wonder what Bullseye is) this is one of the heroes of Disneys Toy Stories which are used for a naming of General Debian Distributions.
After the upgrade most of the things worked expected, expect from some stuff like MariaDB (MySQL) and other weirdly behaving services. After some time of investigation being unable to find out what was causing the random issues observed on the machines. I finally got the strange daemon improper functioning and crashing was caused by AppArmor.

AppArmor ("Application Armor") is a Linux kernel security module that allows the system administrator to restrict programs' capabilities with per-program profiles. Profiles can allow capabilities like network access, raw socket access, and the permission to read, write, or execute files on matching paths. AppArmor supplements the traditional Unix discretionary access control (DAC) model by providing mandatory access control (MAC). It has been partially included in the mainline Linux kernel since version 2.6.36 and its development has been supported by Canonical since 2009.

The general idea of apparmor is wonderful as it could really strengthen system security, however it should be setup on install time and not setup on update time. For one more time I got convinced myself that upgrading from version to version to keep up to date with security is a hard task and often the results are too much unexpected and a better way to upgrade from General version to version any modern Linux / Unix distribution (and their forked mobile equivalents Android etc.) is to just make a copy of the most important configuration, setup the services on a freshly new installed machine be it virtual or a physical Server and rebuild the whole system from scratch, test and then run the system in production, substituting the old server general version with the new machine. 

The rest is leading to so much odd issues like this time with AppArmors causing distractions on the servers hosted applications.

But enough rent if you're unlucky and unwise enough to try to Upgrade Debian / Ubuntu 20, 21 / Mint 18, 19 etc. or whatever Deb distro from older general release to a newer One. Perhaps the best first thing to do onwards is stop and remove AppArmor (those who are hardcore enthusiasts could try to enable the failing services due to apparmor), by disabling the respective apparmor hardening profile but i did not have time to waste on stupid stuff and experiment so I preferred to completely stop it. 

To identify the upgrade oddities has to deal with apparmors service enabled security protections you should be able to find respective records inside /var/log/messages as well as in /var/log/audit/audit.log

 

# dmesg

[   64.210463] audit: type=1400 audit(1548120161.662:21): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.364055] audit: type=1400 audit(1548120241.595:22): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.465883] audit: type=1400 audit(1548120241.699:23): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.566363] audit: type=1400 audit(1548120241.799:24): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.666722] audit: type=1400 audit(1548120241.899:25): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.767069] audit: type=1400 audit(1548120241.999:26): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0
[  144.867432] audit: type=1400 audit(1548120242.099:27): apparmor="DENIED" operation="sendmsg" info="Failed name lookup – disconnected path" error=-13 profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=2527 comm="mysqld" requested_mask="w" denied_mask="w" fsuid=113 ouid=0


1. How to check if AppArmor is running on the system

If you have a system with enabled apparmor you should get some output like:

root@haproxy2:~# apparmor_status 
apparmor module is loaded.
5 profiles are loaded.
5 profiles are in enforce mode.
   /usr/sbin/ntpd
   lsb_release
   nvidia_modprobe
   nvidia_modprobe//kmod
   tcpdump
0 profiles are in complain mode.
1 processes have profiles defined.
1 processes are in enforce mode.
   /usr/sbin/ntpd (387) 
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.


Also if you check the service you will find out that Debian's Major Release upgrade from 10 Buster to 11 BullsEye with.

apt update -y && apt upgrade -y && apt dist-update -y

automatically installed apparmor and started the service, e.g.:

# systemctl status apparmor
● apparmor.service – Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor pres>
     Active: active (exited) since Sat 2022-01-22 23:04:58 EET; 5 days ago
       Docs: man:apparmor(7)
             https://gitlab.com/apparmor/apparmor/wikis/home/
    Process: 205 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, >
   Main PID: 205 (code=exited, status=0/SUCCESS)
        CPU: 43ms

яну 22 23:04:58 haproxy2 apparmor.systemd[205]: Restarting AppArmor
яну 22 23:04:58 haproxy2 apparmor.systemd[205]: Reloading AppArmor profiles
яну 22 23:04:58 haproxy2 systemd[1]: Starting Load AppArmor profiles…
яну 22 23:04:58 haproxy2 systemd[1]: Finished Load AppArmor profiles.

 

# dpkg -l |grep -i apparmor
ii  apparmor                          2.13.6-10                      amd64        user-space parser utility for AppArmor
ii  libapparmor1:amd64                2.13.6-10                      amd64        changehat AppArmor library
ii  libapparmor-perl:amd64               2.13.6-10


In case AppArmor is disabled, you will get something like:

root@pcfrxenweb:~# aa-status 
apparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.


2. How to disable AppArmor for particular running services processes

In my case after the upgrade of a system running a MySQL Server suddenly out of nothing after reboot the Database couldn't load up properly and if I try to restart it with the usual

root@pcfrxen: /# systemctl restart mariadb

I started getting errors like:

DBI connect failed : Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

To get an idea of what kind of profile definitions, could be enabled disabled on apparmor enabled system do:
 

root@pcfrxen:/var/log# ls -1 /etc/apparmor.d/
abstractions/
force-complain/
local/
lxc/
lxc-containers
samba/
system_tor
tunables/
usr.bin.freshclam
usr.bin.lxc-start
usr.bin.man
usr.bin.tcpdump
usr.lib.telepathy
usr.sbin.clamd
usr.sbin.cups-browsed
usr.sbin.cupsd
usr.sbin.ejabberdctl
usr.sbin.mariadbd
usr.sbin.mysqld
usr.sbin.named
usr.sbin.ntpd
usr.sbin.privoxy
usr.sbin.squid

Lets say you want to disable any protection AppArmor profile for MySQL you can do it with:

root@pcfrxen:/ #  ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/
root@pcfrxen:/ # apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld 


To make the system know you have disabled a profile you should restart apparmor service:
 

root@pcfrxen:/ # systemctl restart apparmor.service


3. Disable completely AppArmor to save your time weird system behavior and hang bangs

In my opinion the best thing to do anyways, especially if you don't run Containerized applications, that runs only one single application / service at at time is to completely disable apparmor, otherwise you would have to manually check each of the running applications before the upgrade and make sure that apparmor did not bring havoc to some of it.
Hence my way was to simple get rid of apparmor by disable and remove the related package completely out of the system to do so:

root@pcfrxen:/ # systemctl stop apparmor
root@pcfrxen:/ # systemctl disable apparmor
root@pcfrxen:/ # apt-get remove -y apparmor

Once  disabled to make the system completely load out anything loaded related to apparmor loaded into system memory, you should do machine reboot.

root@pcfrxen:/ # shutdown -r now

Hopefully if you run into same issue after removal of apparmor most of the things should be working fine after the upgrade. Anyways I had to go through each and every app everywhere and make sure it is working as expected. The major release upgrade has also automatically enabled me some of the already disable services, thus if you have upgraded like me I would advice you do a close check on every enabled / running service everywhere:

root@pcfrxen:/# systemctl list-unit-files|grep -i enabled

Beware of AppArmor  !!! 🙂

Install and enable Sysstats IO / DIsk / CPU / Network monitoring console suite on Redhat 8.3, Few sar useful command examples

Tuesday, September 28th, 2021

linux-sysstat-monitoring-logo

 

Why to monitoring CPU, Memory, Hard Disk, Network usage etc. with sysstats tools?
 

Using system monitoring tools such as Zabbix, Nagios Monit is a good approach, however sometimes due to zabbix server interruptions you might not be able to track certain aspects of system performance on time. Thus it is always a good idea to 
Gain more insights on system peroformance from command line. Of course there is cmd tools such as iostat and top, free, vnstat that provides plenty of useful info on system performance issues or bottlenecks. However from my experience to have a better historical data that is systimized and all the time accessible from console it is a great thing to have sysstat package at place. Since many years mostly on every server I administer, I've been using sysstats to monitor what is going on servers over a short time frames and I'm quite happy with it. In current company we're using Redhats and CentOS-es and I had to install sysstats on Redhat 8.3. I've earlier done it multiple times on Debian / Ubuntu Linux and while I've faced on some .deb distributions complications of making sysstat collect statistics I've come with an article on Howto fix sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux
 

Sysstat contains the following tools related to collecting I/O and CPU statistics:
iostat
Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
mpstat
Displays more in-depth CPU statistics.
Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:
sadc
Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
sar
Producing reports from the files created by sadc, sar reports can be generated interactively or written to a file for more intensive analysis.

My experience with CentOS 7 and Fedora to install sysstat it was pretty straight forward, I just had to install it via yum install sysstat wait for some time and use sar (System Activity Reporter) tool to report collected system activity info stats over time.
Unfortunately it seems on RedHat 8.3 as well as on CentOS 8.XX instaling sysstats does not work out of the box.

To complete a successful installation of it on RHEL 8.3, I had to:

[root@server ~]# yum install -y sysstat


To make sysstat enabled on the system and make it run, I've enabled it in sysstat

[root@server ~]# systemctl enable sysstat


Running immediately sar command, I've faced the shitty error:


Cannot open /var/log/sysstat/sa18:
No such file or directory. Please check if data collecting is enabled”

 

Once installed I've waited for about 5 minutes hoping, that somehow automatically sysstat would manage it but it didn't.

To solve it, I've had to create additionally file /etc/cron.d/sysstat (weirdly RPM's post install instructions does not tell it to automatically create it)

[root@server ~]# vim /etc/cron.d/sysstat

# run system activity accounting tool every 10 minutes
0 * * * * root /usr/lib64/sa/sa1 60 59 &
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A &

 

  • /usr/local/lib/sa1 is a shell script that we can use for scheduling cron which will create daily binary log file.
  • /usr/local/lib/sa2 is a shell script will change binary log file to human-readable form.

 

[root@server ~]# chmod 600 /etc/cron.d/sysstat

[root@server ~]# systemctl restart sysstat


In a while if sysstat is working correctly you should get produced its data history logs inside /var/log/sa

[root@server ~]# ls -al /var/log/sa 


Note that the standard sysstat history files on Debian and other modern .deb based distros such as Debian 10 (in  y.2021) is stored under /var/log/sysstat

Here is few useful uses of sysstat cmds


1. Check with sysstat machine history SWAP and RAM Memory use


To lets say check last 10 minutes SWAP memory use:

[hipo@server yum.repos.d] $ sar -W  |last -n 10
 

Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM  pswpin/s pswpout/s
12:00:01 AM      0.00      0.00
12:01:01 AM      0.00      0.00
12:02:01 AM      0.00      0.00
12:03:01 AM      0.00      0.00
12:04:01 AM      0.00      0.00
12:05:01 AM      0.00      0.00
12:06:01 AM      0.00      0.00

[root@ccnrlb01 ~]# sar -r | tail -n 10
14:00:01        93008   1788832     95.06         0   1357700    725740      9.02    795168    683484        32
14:10:01        78756   1803084     95.81         0   1358780    725740      9.02    827660    652248        16
14:20:01        92844   1788996     95.07         0   1344332    725740      9.02    813912    651620        28
14:30:01        92408   1789432     95.09         0   1344612    725740      9.02    816392    649544        24
14:40:01        91740   1790100     95.12         0   1344876    725740      9.02    816948    649436        36
14:50:01        91688   1790152     95.13         0   1345144    725740      9.02    817136    649448        36
15:00:02        91544   1790296     95.14         0   1345448    725740      9.02    817472    649448        36
15:10:01        91108   1790732     95.16         0   1345724    725740      9.02    817732    649340        36
15:20:01        90844   1790996     95.17         0   1346000    725740      9.02    818016    649332        28
Average:        93473   1788367     95.03         0   1369583    725074      9.02    800965    671266        29

 

2. Check system load? Are my processes waiting too long to run on the CPU?

[root@server ~ ]# sar -q |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:00:01 AM         0       272      0.00      0.02      0.00         0
12:01:01 AM         1       271      0.00      0.02      0.00         0
12:02:01 AM         0       268      0.00      0.01      0.00         0
12:03:01 AM         0       268      0.00      0.00      0.00         0
12:04:01 AM         1       271      0.00      0.00      0.00         0
12:05:01 AM         1       271      0.00      0.00      0.00         0
12:06:01 AM         1       265      0.00      0.00      0.00         0


3. Show various CPU statistics per CPU use
 

On a multiprocessor, multi core server sometimes for scripting it is useful to fetch processor per use historic data, 
this can be attained with:

 

[hipo@server ~ ] $ mpstat -P ALL
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

06:08:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:08:38 PM  all    0.17    0.02    0.25    0.00    0.05    0.02    0.00    0.00    0.00   99.49
06:08:38 PM    0    0.22    0.02    0.28    0.00    0.06    0.03    0.00    0.00    0.00   99.39
06:08:38 PM    1    0.28    0.02    0.36    0.00    0.08    0.02    0.00    0.00    0.00   99.23
06:08:38 PM    2    0.27    0.02    0.31    0.00    0.06    0.01    0.00    0.00    0.00   99.33
06:08:38 PM    3    0.15    0.02    0.22    0.00    0.03    0.01    0.00    0.00    0.00   99.57
06:08:38 PM    4    0.13    0.02    0.20    0.01    0.03    0.01    0.00    0.00    0.00   99.60
06:08:38 PM    5    0.14    0.02    0.27    0.00    0.04    0.06    0.01    0.00    0.00   99.47
06:08:38 PM    6    0.10    0.02    0.17    0.00    0.04    0.02    0.00    0.00    0.00   99.65
06:08:38 PM    7    0.09    0.02    0.15    0.00    0.02    0.01    0.00    0.00    0.00   99.70


 

sar-sysstat-cpu-statistics-screenshot

Monitor processes and threads currently being managed by the Linux kernel.

[hipo@server ~ ] $ pidstat

pidstat-various-random-process-statistics

[hipo@server ~ ] $ pidstat -d 2


pidstat-show-processes-with-most-io-activities-linux-screenshot

This report tells us that there is few processes with heave I/O use Filesystem system journalling daemon jbd2, apache, mysqld and supervise, in 3rd column you see their respective PID IDs.

To show threads used inside a process (like if you press SHIFT + H) inside Linux top command:

[hipo@server ~ ] $ pidstat -t -p 10765 1 3

Linux 4.19.0-14-amd64 (server)     28.09.2021     _x86_64_    (10 CPU)

21:41:22      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
21:41:23      108     10765         –    1,98    0,99    0,00    0,00    2,97     1  mysqld
21:41:23      108         –     10765    0,00    0,00    0,00    0,00    0,00     1  |__mysqld
21:41:23      108         –     10768    0,00    0,00    0,00    0,00    0,00     0  |__mysqld
21:41:23      108         –     10771    0,00    0,00    0,00    0,00    0,00     5  |__mysqld
21:41:23      108         –     10784    0,00    0,00    0,00    0,00    0,00     7  |__mysqld
21:41:23      108         –     10785    0,00    0,00    0,00    0,00    0,00     6  |__mysqld
21:41:23      108         –     10786    0,00    0,00    0,00    0,00    0,00     2  |__mysqld

10765 – is the Process ID whose threads you would like to list

With pidstat, you can further monitor processes for memory leaks with:

[hipo@server ~ ] $ pidstat -r 2

 

4. Report paging statistics for some old period

 

[root@server ~ ]# sar -B -f /var/log/sa/sa27 |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/27/2021      _x86_64_        (8 CPU)

15:42:26     LINUX RESTART      (8 CPU)

15:55:30     LINUX RESTART      (8 CPU)

04:00:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
04:01:01 PM      0.00     14.47    629.17      0.00    502.53      0.00      0.00      0.00      0.00
04:02:01 PM      0.00     13.07    553.75      0.00    419.98      0.00      0.00      0.00      0.00
04:03:01 PM      0.00     11.67    548.13      0.00    411.80      0.00      0.00      0.00      0.00

 

5.  Monitor Received RX and Transmitted TX network traffic perl Network interface real time
 

To print out Received and Send traffic per network interface 4 times in a raw

sar-sysstats-network-traffic-statistics-screenshot
 

[hipo@server ~ ] $ sar -n DEV 1 4


To continusly monitor all network interfaces I/O traffic

[hipo@server ~ ] $ sar -n DEV 1


To only monitor a certain network interface lets say loopback interface (127.0.0.1) received / transmitted bytes

[hipo@server yum.repos.d] $  sar -n DEV 1 2|grep -i lo
06:29:53 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:29:54 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


6. Monitor block devices use
 

To check block devices use 3 times in a raw
 

[hipo@server yum.repos.d] $ sar -d 1 3


sar-sysstats-blockdevice-statistics-screenshot
 

7. Output server monitoring data in CSV database structured format


For preparing a nice graphs with Excel from CSV strucuted file format, you can dump the collected data as so:

 [root@server yum.repos.d]# sadf -d /var/log/sa/sa27 — -n DEV | grep -v lo|head -n 10
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;-1;2021-09-27 13:55:30 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth2;5.65;5.13;0.42;0.39;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth1;18.90;15.55;1.89;1.60;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth0;7.15;9.63;0.55;0.74;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth2;5.67;5.15;0.42;0.39;0.00;0.00;0.00;0.00

To graph the output data you can use Excel / LibreOffice's Excel equivalent Calc or if you need to dump a CSV sar output and generate it on the fly from a script  use gnuplot 


What we've learned?


How to install and enable on cron sysstats on Redhat and CentOS 8 Linux ? 
How to continuously monitor CPU / Disk and Network, block devices, paging use and processes and threads used by the kernel per process ?  
As well as how to export previously collected data to CSV to import to database or for later use inrder to generate graphic presentation of data.
Cheers ! 🙂

 

How to add local user to admin access via /etc/sudoers with sudo su – root / Create a sudo admin group to enable users belonging to group become superuser

Friday, January 15th, 2021

sudo_logo-how-to-add-user-to-sysadmin-group

Did you had to have a local users on a server and you needed to be able to add Admins group for all system administrators, so any local user on the system that belongs to the group to be able to become root with command lets say sudo su – root / su -l root / su – root?
If so below is an example /etc/sudoers file that will allow your users belonging to a group local group sysadmins with some assigned group number

Here is how to create the sysadmins group as a starter

linux:~# groupadd -g 800 sysadmins

Lets create a new local user georgi and append the user to be a member of sysadmins group which will be our local system Administrator (superuser) access user group.

To create a user with a specific desired userid lets check in /etc/passwd and create it:

linux:~# grep :811: /etc/passwd || useradd -u 811 -g 800 -c 'Georgi hip0' -d /home/georgi -m georgi

Next lets create /etc/sudoers (if you need to copy paste content of file check here)and paste below configuration:

linux:~# mcedit /etc/sudoers

## Updating the locate database
# Cmnd_Alias LOCATE = /usr/bin/updatedb

 

## Storage
# Cmnd_Alias STORAGE = /sbin/fdisk, /sbin/sfdisk, /sbin/parted, /sbin/partprobe, /bin/mount, /bin/umount

## Delegating permissions
# Cmnd_Alias DELEGATING = /usr/sbin/visudo, /bin/chown, /bin/chmod, /bin/chgrp

## Processes
# Cmnd_Alias PROCESSES = /bin/nice, /bin/kill, /usr/bin/kill, /usr/bin/killall

## Drivers
# Cmnd_Alias DRIVERS = /sbin/modprobe

Cmnd_Alias PASSWD = /usr/bin/passwd [a-zA-Z][a-zA-Z0-9_-]*, \\
!/usr/bin/passwd root

Cmnd_Alias SU_ROOT = /bin/su root, \\
                     /bin/su – root, \\
                     /bin/su -l root, \\
                     /bin/su -p root


# Defaults specification

#
# Refuse to run if unable to disable echo on the tty.
#
Defaults   !visiblepw

#
# Preserving HOME has security implications since many programs
# use it when searching for configuration files. Note that HOME
# is already set when the the env_reset option is enabled, so
# this option is only effective for configurations where either
# env_reset is disabled or HOME is present in the env_keep list.
#
Defaults    always_set_home
Defaults    match_group_by_gid

Defaults    env_reset
Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS"
Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"

#
# Adding HOME to env_keep may enable a user to run unrestricted
# commands via sudo.
#
# Defaults   env_keep += "HOME"
Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin

## Next comes the main part: which users can run what software on
## which machines (the sudoers file can be shared between multiple
## systems).
## Syntax:
##
##      user    MACHINE=COMMANDS
##
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL

## Allows members of the 'sys' group to run networking, software,
## service management apps and more.
# %sys ALL = NETWORKING, SOFTWARE, SERVICES, STORAGE, DELEGATING, PROCESSES, LOCATE, DRIVERS

## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)       ALL

## Same thing without a password
# %wheel        ALL=(ALL)       NOPASSWD: ALL

## Allows members of the users group to mount and unmount the
## cdrom as root
# %users  ALL=/sbin/mount /mnt/cdrom, /sbin/umount /mnt/cdrom
## Allows members of the users group to shutdown this system
# %users  localhost=/sbin/shutdown -h now

%sysadmins            ALL            = SU_ROOT, \\
                                   NOPASSWD: PASSWD

## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
#includedir /etc/sudoers.d

zabbix  ALL=(ALL) NOPASSWD:/usr/bin/grep


Save the config and give it a try now to become root with sudo su – root command

linux:~$ id
uid=811(georgi) gid=800(sysadmins) groups=800(sysadmins)
linux:~$ sudo su – root
linux~#

w00t Voila your user is with super rights ! Enjoy 🙂

 

Delete empty files and directories under directory tree in Linux / UNIX / BSD

Wednesday, October 21st, 2020

delete-empty-directories-and-files-freeup-inodes-by-empty-deleting-directoriers-or-files

Sometimes it happens that you end up on your server with a multiple of empty files. The reason for that could be different for example it could be /tmp is overflown with some session store files on a busy website, or due to some programmers Web executed badly written PHP / Python / Perl / Ruby code bug or lets say Content Management System ( CMS ) based website based on WordPress / Joomla / Drupal / Magento / Shopify etc. due to a broken plugin some specific directory could get filled up with plenty of meaningless empty files, that never gets wiped out if you don't care. This could happen if you offer your users to share files online to a public sharing service as WebFTP and some of the local hacked UNIX user accounts decides to make you look like a fool and run an endless loop to create files in your Hard Drive until your small server HDD filesystem of few terabytes gets filled up with useless empty files and due to full inode count on the filesystem your machine running running services gets disfunctional …

Hence on servers with shared users or simply webservers it is always a good idea to keep an eye on filesystem used nodes count by system are and in case if notices a sudden increase of used FS inodes as part of the investigation process on what caused it to check the amount of empty files on the system attached SCSI / SSD / SAS whatever drive.
 

1. Show a list of free inodes on server


Getting inodes count after logged is done with df command

root@linux-server:~# df -i
Filesystem        Inodes   IUsed     IFree IUse% Mounted on
udev             2041464     516   2040948    1% /dev
tmpfs            2046343    1000   2045343    1% /run
/dev/sdb2       14655488 1794109  12861379   13% /
tmpfs            2046343       4   2046339    1% /dev/shm
tmpfs            2046343       8   2046335    1% /run/lock
tmpfs            2046343      17   2046326    1% /sys/fs/cgroup
/dev/sdc6        6111232  6111232   0   100% /var/www
/dev/sda1       30162944 3734710  26428234   13% /mnt/sda1
/dev/sdd1      122093568 8011342 114082226    7% /backups
tmpfs            2046343      13   2046330    1% /run/user/1000

 

2. Show all empty files and directories count

 

### count empty directories ### root@linux-server:~# find /path/ -empty -type d | wc -l

### count empty files only ### root@linux-server:~# find /path/ -empty -type f | wc -l

 

3. List all empty files in directory or root dir

As you can see on the server in above example the amount of inodes of empty inodes is depleted.
The next step is to anylize what is happening in that web directory and if there is a multitude of empty files taking up all our disk space.
 

root@linux-server:~# find /var/www -type f -empty > /root/empty_files_list.txt


As you can see I'm redirecting output to a file as with the case of many empty files, I'll have to wait for ages and console will get filled up with a data I'll be unable to easily analyze

If the problem is another directory in your case, lets say the root dir.

root@linux-server:~#  DIR='/';
root@linux-server:~# find $DIR -type f -empty > /root/empty_files_list.txt

4. Getting empty directories list


Under some case it might be that the server is overflowed with empty directories. This is also a thing some malicious cracker guy could do to your server if he can't root the server with some exploit but wants to bug you and 'show off his script kiddie 3l337 magic tricks' :). This is easily done with a perl / python or bash shell endless loop inside which a random file named millions of empty directories instead of files is created.

To look up for empty directories hence use:

root@linux-server:~# DIR='/home';
root@linux-server:~# find  $DIR . -type d -empty > /root/empty_directories_list.txt

 

5. Delete all empty files only to clean up inodes

Deletion of empty files will automatically free up the inodes occupied, to delete them.

root@linux-server:~# cd /path/containing/multiple/empty-dirs/
root@linux-server:~# find . -type f -empty -exec rm -fr {} \;

 

6. Delete all empty directories only to clean up inocommanddes

root@linux-server:~# find . -type d -empty -exec rm -fr {} \;

 

7. Delete all empty files and directories to clean up inodes

root@linux-server:~# cd /path/containing/multiple/empty-dirs/
root@linux-server:~# find . -empty -delete

 

8. Use find + xargs to delete if files count to delete is too high

root@linux-server:~# find . -empty | xargs rm -r


That's all folks ! Enjoy now your Filesystem to have retrieved back the lost inodes from the jump empty files or directories.

Happy cleaning  🙂

Procedure Instructions to safe upgrade CentOS / RHEL Linux 7 Core to latest release

Thursday, February 13th, 2020

safe-upgrade-CentOS-and_Redhat_Enterprise_Linux_RHEL-7-to-latest-stable-release

Generally upgrading both RHEL and CentOS can be done straight with yum tool just we're pretty aware and mostly anyone could do the update, but it is good idea to do some
steps in advance to make backup of any old basic files that might help us to debug what is wrong in case if the Operating System fails to boot after the routine Machine OS restart
after the upgrade that is usually a good idea to make sure that machine is still bootable after the upgrade.

This procedure can be shortened or maybe extended depending on the needs of the custom case but the general framework should be useful anyways to someone that's why
I decided to post this.

Before you go lets prepare a small status script which we'll use to report status of  sysctl installed and enabled services as well as the netstat connections state and
configured IP addresses and routing on the system.

The script show_running_services_netstat_ips_route.sh to be used during our different upgrade stages:
 

# script status ###
echo "STARTED: $(date '+%Y-%m-%d_%H-%M-%S'):" | tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
systemctl list-unit-files –type=service | grep enabled
systemctl | grep ".service" | grep "running"
netstat -tulpn
netstat -r
ip a s
/sbin/route -n
echo "ENDED $(date '+%Y-%m-%d_%H-%M-%S'):" | tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
####

 

– Save the script in any file like /root/status.sh

– Make the /root/logs directoriy.
 

[root@redhat: ~ ]# mkdir /root/logs
[root@redhat: ~ ]# vim /root/status.sh
[root@redhat: ~ ]# chmod +x /root/status.sh

 

1. Get a dump of CentOS installed version release and grub-mkconfig generated os_probe

 

[root@redhat: ~ ]# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
[root@redhat: ~ ]# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

2. Clear old versionlock marked RPM packages (if there are such)

 

On servers maintained by multitude of system administrators just like the case is inside a Global Corporations and generally in the corporate world , where people do access the systems via LDAP and more than a single person
has superuser privileges. It is a good prevention measure to use yum package management  functionality to RPM based Linux distributions called  versionlock.
versionlock for those who hear it for a first time is locking the versions of the installed RPM packages so if someone by mistake or on purpose decides to do something like :

[root@redhat: ~ ]# yum install packageversion

Having the versionlock set will prevent the updated package to be installed with a different branch package version.

Also it will prevent a playful unknowing person who just wants to upgrade the system without any deep knowledge to be able to
run

[root@redhat: ~ ]# yum upgrade

update and leave the system in unbootable state, that will be only revealed during the next system reboot.

If you haven't used versionlock before and you want to use it you can do it with:

[root@redhat: ~ ]# yum install yum-plugin-versionlock

To add all the packages for compiling C code and all the interdependend packages, you can do something like:

 

[root@redhat: ~ ]# yum versionlock gcc-*

If you want to clear up the versionlock, once it is in use run:

[root@redhat: ~ ]#  yum versionlock clear
[root@redhat: ~ ]#  yum versionlock list

 

3.  Check RPC enabled / disabled

 

This step is not necessery but it is a good idea to check whether it running on the system, because sometimes after upgrade rpcbind gets automatically started after package upgrade and reboot. 
If we find it running we'll need to stop and mask the service.

 

# check if rpc enabled
[root@redhat: ~ ]# systemctl list-unit-files|grep -i rpc
var-lib-nfs-rpc_pipefs.mount                                      static
auth-rpcgss-module.service                                        static
rpc-gssd.service                                                  static
rpc-rquotad.service                                               disabled
rpc-statd-notify.service                                          static
rpc-statd.service                                                 static
rpcbind.service                                                   disabled
rpcgssd.service                                                   static
rpcidmapd.service                                                 static
rpcbind.socket                                                    disabled
rpc_pipefs.target                                                 static
rpcbind.target                                                    static

[root@redhat: ~ ]# systemctl status rpcbind.service
● rpcbind.service – RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

 

[root@redhat: ~ ]# systemctl status rpcbind.socket
● rpcbind.socket – RPCbind Server Activation Socket
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.socket; disabled; vendor preset: enabled)
   Active: inactive (dead)
   Listen: /var/run/rpcbind.sock (Stream)
           0.0.0.0:111 (Stream)
           0.0.0.0:111 (Datagram)
           [::]:111 (Stream)
           [::]:111 (Datagram)

 

4. Check any previously existing downloaded / installed RPMs (check yum cache)

 

yum install package-name / yum upgrade keeps downloaded packages via its operations inside its cache directory structures in /var/cache/yum/*.
Hence it is good idea to check what were the previously installed packages and their count.

 

[root@redhat: ~ ]# cd /var/cache/yum/x86_64/;
[root@redhat: ~ ]# find . -iname '*.rpm'|wc -l

 

5. List RPM repositories set on the server

 

 [root@redhat: ~ ]# yum repolist
Loaded plugins: fastestmirror, versionlock
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Determining fastest mirrors
repo id                                                                                 repo name                                                                                                            status
!atos-ac/7/x86_64                                                                       Atos Repository                                                                                                       3,128
!base/7/x86_64                                                                          CentOS-7 – Base                                                                                                      10,019
!cr/7/x86_64                                                                            CentOS-7 – CR                                                                                                         2,686
!epel/x86_64                                                                            Extra Packages for Enterprise Linux 7 – x86_64                                                                          165
!extras/7/x86_64                                                                        CentOS-7 – Extras                                                                                                       435
!updates/7/x86_64                                                                       CentOS-7 – Updates                                                                                                    2,500

 

This step is mandatory to make sure you're upgrading to latest packages from the right repositories for more concretics check what is inside in confs /etc/yum.repos.d/ ,  /etc/yum.conf 
 

6. Clean up any old rpm yum cache packages

 

This step is again mandatory but a good to follow just to have some more clearness on what packages is our upgrade downloading (not to mix up the old upgrades / installs with our newest one).
For documentation purposes all deleted packages list if such is to be kept under /root/logs/yumclean-install*.out file

[root@redhat: ~ ]# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

7. List the upgradeable packages's latest repository provided versions

 

[root@redhat: ~ ]# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

Then to be aware how many packages we'll be updating:

 

[root@redhat: ~ ]#  yum check-update | wc -l

 

8. Apply the actual uplisted RPM packages to be upgraded

 

[root@redhat: ~ ]# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

Again output is logged to /root/logs/yumcheckupate-*.out 

 

9. Monitor downloaded packages count real time

 

To make sure yum upgrade is not in some hanging state and just get some general idea in which state of the upgrade is it e.g. Download / Pre-Update / Install  / Upgrade/ Post-Update etc.
in mean time when yum upgrade is running to monitor,  how many packages has the yum upgrade downloaded from remote RPM set repositories:

 

[root@redhat: ~ ]#  watch "ls -al /var/cache/yum/x86_64/7Server/…OS-repository…/packages/|wc -l"

 

10. Run status script to get the status again

 

[root@redhat: ~ ]# sh /root/status.sh |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

11. Add back versionlock for all RPM packs

 

Set all RPM packages installed on the RHEL / CentOS versionlock for all packages.

 

#==if needed
# yum versionlock \*

 

 

12. Get whether old software configuration is not messed up during the Package upgrade (Lookup the logs for .rpmsave and .rpmnew)

 

During the upgrade old RPM configuration is probably changed and yum did automatically save .rpmsave / .rpmnew saves of it thus it is a good idea to grep the prepared logs for any matches of this 2 strings :
 

[root@redhat: ~ ]#   grep -i ".rpm" /root/logs/yumupdate-server-host-2020-01-20_14-30-41.out
[root@redhat: ~ ]#  grep -i ".rpmsave" /root/logs/yumupdate-server-host-2020-01-20_14-30-41.out
[root@redhat: ~ ]#  grep -i ".rpmnew" /root/logs/yumupdate-server-host-2020-01-20_14-30-41.out


If above commands returns output usually it is fine if there is is .rpmnew output but, if you get grep output of .rpmsave it is a good idea to review the files compare with the original files that were .rpmsaved with the 
substituted config file and atune the differences with the changes manually made for some program functionality.

What are the .rpmsave / .rpmnew files ?
This files are coded files that got triggered by the RPM install / upgrade due to prewritten procedures on time of RPM build.

 

If a file was installed as part of a rpm, it is a config file (i.e. marked with the %config tag), you've edited the file afterwards and you now update the rpm then the new config file (from the newer rpm) will replace your old config file (i.e. become the active file).
The latter will be renamed with the .rpmsave suffix.

If a file was installed as part of a rpm, it is a noreplace-config file (i.e. marked with the %config(noreplace) tag), you've edited the file afterwards and you now update the rpm then your old config file will stay in place (i.e. stay active) and the new config file (from the newer rpm) will be copied to disk with the .rpmnew suffix.
See e.g. this table for all the details. 

In both cases you or some program has edited the config file(s) and that's why you see the .rpmsave / .rpmnew files after the upgrade because rpm will upgrade config files silently and without backup files if the local file is untouched.

After a system upgrade it is a good idea to scan your filesystem for these files and make sure that correct config files are active and maybe merge the new contents from the .rpmnew files into the production files. You can remove the .rpmsave and .rpmnew files when you're done.


If you need to get a list of all .rpmnew .rpmsave files on the server do:

[root@redhat: ~ ]#  find / -print | egrep "rpmnew$|rpmsave$

 

13. Reboot the system 

To check whether on next hang up or power outage the system will boot normally after the upgrade, reboot to test it.

 

you can :

 

[root@redhat: ~ ]#  reboot

 

either

[root@redhat: ~ ]#  shutdown -r now


or if on newer Linux with systemd in ues below systemctl reboot.target.

[root@redhat: ~ ]#  systemctl start reboot.target

 

14. Get again the system status with our status script after reboot

[root@redhat: ~ ]#  sh /root/status.sh |tee /root/logs/status-after-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

15. Clean up any versionlocks if earlier set

 

[root@redhat: ~ ]# yum versionlock clear
[root@redhat: ~ ]# yum versionlock list

 

16. Check services and logs for problems

 

After the reboot Check closely all running services on system make sure every process / listening ports and services on the system are running fine, just like before the upgrade.
If the sytem had firewall,  check whether firewall rules are not broken, e.g. some NAT is not missing or anything earlier configured to automatically start via /etc/rc.local or some other
custom scripts were run and have done what was expected. 
Go through all the logs in /var/log that are most essential /var/log/boot.log , /var/log/messages … yum.log etc. that could reveal any issues after the boot. In case if running some application server or mail server check /var/log/mail.log or whenever it is configured to log.
If the system runs apache closely check the logs /var/log/httpd/error.log or php_errors.log for any strange errors that occured due to some issues caused by the newer installed packages.
Usually most of the cases all this should be flawless but a multiple check over your work is a stake for good results.
 

Rsync copy files with root privileges between servers with root superuser account disabled

Tuesday, December 3rd, 2019

 

rsync-copy-files-between-two-servers-with-root-privileges-with-root-superuser-account-disabled

Sometimes on servers that follow high security standards in companies following PCI Security (Payment Card Data Security) standards it is necessery to have a very weird configurations on servers,to be able to do trivial things such as syncing files between servers with root privileges in a weird manners.This is the case for example if due to security policies you have disabled root user logins via ssh server and you still need to synchronize files in directories such as lets say /etc , /usr/local/etc/ /var/ with root:root user and group belongings.

Disabling root user logins in sshd is controlled by a variable in /etc/ssh/sshd_config that on most default Linux OS
installations is switched on, e.g. 

grep -i permitrootlogin /etc/ssh/sshd_config
PermitRootLogin yes


Many corporations use Vulnerability Scanners such as Qualys are always having in their list of remote server scan for SSH Port 22 to turn have the PermitRootLogin stopped with:

 

PermitRootLogin no


In this article, I'll explain a scenario where we have synchronization between 2 or more servers Server A / Server B, whatever number of servers that have already turned off this value, but still need to
synchronize traditionally owned and allowed to write directories only by root superuser, here is 4 easy steps to acheive it.

 

1. Add rsyncuser to Source Server (Server A) and Destination (Server B)


a. Execute on Src Host:

 

groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files as root src_host' -d /home/rsyncuser -m rsyncuser

 

b. Execute on Dst Host:

 

groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files dst_host' -d /home/rsyncuser -m rsyncuser

 

2. Generate RSA SSH Key pair to be used for passwordless authentication


a. On Src Host
 

su – rsyncuser

ssh-keygen -t rsa -b 4096

 

b. Check .ssh/ generated key pairs and make sure the directory content look like.

 

[rsyncuser@src-host .ssh]$ cd ~/.ssh/;  ls -1

id_rsa
id_rsa.pub
known_hosts


 

3. Copy id_rsa.pub to Destination host server under authorized_keys

 

scp ~/.ssh/id_rsa.pub  rsyncuser@dst-host:~/.ssh/authorized_keys

 

Next fix permissions of authorized_keys file for rsyncuser as anyone who have access to that file (that exists as a user account) on the system
could steal the key and use it to run rsync commands and overwrite remotely files, like overwrite /etc/passwd /etc/shadow files with his custom crafted credentials
and hence hack you 🙂
 

Hence, On Destionation Host Server B fix permissions with:
 

su – rsyncuser; chmod 0600 ~/.ssh/authorized_keys
[rsyncuser@dst-host ~]$


An alternative way for the lazy sysadmins is to use the ssh-copy-id command

 

$ ssh-copy-id rsyncuser@192.168.0.180
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed — if you are prompted now it is to install the new keys
root@192.168.0.180's password: 
 

 

For improved security here to restrict rsyncuser to be able to run only specific command such as very specific script instead of being able to run any command it is good to use little known command= option
once creating the authorized_keys

 

4. Test ssh passwordless authentication works correctly


For that Run as a normal ssh from rsyncuser

On Src Host

 

[rsyncuser@src-host ~]$ ssh rsyncuser@dst-host


Perhaps here is time that for those who, think enabling a passwordless authentication is not enough secure and prefer to authorize rsyncuser via a password red from a secured file take a look in my prior article how to login to remote server with password provided from command line as a script argument / Running same commands on many servers 

5. Enable rsync in sudoers to be able to execute as root superuser (copy files as root)

 


For this step you will need to have sudo package installed on the Linux server.

Then, Execute once logged in as root on Destionation Server (Server B)

 

[root@dst-host ~]# grep 'rsyncuser ALL' /etc/sudoers|wc -l || echo ‘rsyncuser ALL=NOPASSWD:/usr/bin/rsync’ >> /etc/sudoers
 

 

Note that using rsync with a ALL=NOPASSWD in /etc/sudoers could pose a high security risk for the system as anyone authorized to run as rsyncuser is able to overwrite and
respectivle nullify important files on Destionation Host Server B and hence easily mess the system, even shell script bugs could produce a mess, thus perhaps a better solution to the problem
to copy files with root privileges with the root account disabled is to rsync as normal user somewhere on Dst_host and use some kind of additional script running on Dst_host via lets say cron job and
will copy gently files on selective basis.

Perhaps, even a better solution would be if instead of granting ALL=NOPASSWD:/usr/bin/rsync in /etc/sudoers is to do ALL=NOPASSWD:/usr/local/bin/some_copy_script.sh
that will get triggered, once the files are copied with a regular rsyncuser acct.

 

6. Test rsync passwordless authentication copy with superuser works


Do some simple copy, lets say copy files on Encrypted tunnel configurations located under some directory in /etc/stunnel on Server A to /etc/stunnel on Server B

The general command to test is like so:
 

rsync -aPz -e 'ssh' '–rsync-path=sudo rsync' /var/log rsyncuser@$dst_host:/root/tmp/


This will copy /var/log files to /root/tmp, you will get a success messages for the copy and the files will be at destination folder if succesful.

 

On Src_Host run:

 

[rsyncuser@src-host ~]$ dst=FQDN-DST-HOST; user=rsyncuser; src_dir=/etc/stunnel; dst_dir=/root/tmp;  rsync -aP -e 'ssh' '–rsync-path=sudo rsync' $src_dir  $rsyncuser@$dst:$dst_dir;

 

7. Copying files with root credentials via script


The simlest file to use to copy a bunch of predefined files  is best to be handled by some shell script, the most simple version of it, could look something like this.
 

#!/bin/bash
# On server1 use something like this
# On server2 dst server
# add in /etc/sudoers
# rsyncuser ALL=NOPASSWD:/usr/bin/rsync

user='rsyncuser';

dst_dir="/root/tmp";
dst_host='$dst_host';
src[1]="/etc/hosts.deny";
src[2]="/etc/sysctl.conf";
src[3]="/etc/samhainrc";
src[4]="/etc/pki/tls/";
src[5]="/usr/local/bin/";

 

for i in $(echo ${src[@]}); do
rsync -aPvz –delete –dry-run -e 'ssh' '–rsync-path=sudo rsync' "$i" $rsyncuser@$dst_host:$dst_dir"$i";
done


In above script as you can see, we define a bunch of files that will be copied in bash array and then run a loop to take each of them and copy to testination dir.
A very sample version of the script rsync_with_superuser-while-root_account_prohibited.sh 
 

Conclusion


Lets do short overview on what we have done here. First Created rsyncuser on SRC Server A and DST Server B, set up the key pair on both copied the keys to make passwordless login possible,
set-up rsync to be able to write as root on Dst_Host / testing all the setup and pinpointing a small script that can be used as a backbone to develop something more complex
to sync backups or keep system configurations identicatial – for example if you have doubts that some user might by mistake change a config etc.
In short it was pointed the security downsides of using rsync NOPASSWD via /etc/sudoers and few ideas given that could be used to work on if you target even higher
PCI standards.

 

Ansible Quick Start Cheatsheet for Linux admins and DevOps engineers

Wednesday, October 24th, 2018

ansible-quick-start-cheetsheet-ansible-logo

Ansible is widely used (Configuration management, deployment, and task execution system) nowadays for mass service depoyments on multiple servers and Clustered environments like, Kubernetes clusters (with multiple pods replicas) virtual swarms running XEN / IPKVM virtualization hosting multiple nodes etc. .

Ansible can be used to configure or deploy GNU / Linux tools and services such as Apache / Squid / Nginx / MySQL / PostgreSQL. etc. It is pretty much like Puppet (server / services lifecycle management) tool , except its less-complecated to start with makes it often a choose as a tool for mass deployment (devops) automation.

Ansible is used for multi-node deployments and remote-task execution on group of servers, the big pro of it it does all its stuff over simple SSH on the remote nodes (servers) and does not require extra services or listening daemons like with Puppet. It combined with Docker containerization is used very much for later deploying later on inside Cloud environments such as Amazon AWS / Google Cloud Platform / SAP HANA / OpenStack etc.

Ansible-Architechture-What-Is-Ansible-Edureka

0. Instaling ansible on Debian / Ubuntu Linux


Ansible is a python script and because of that depends heavily on python so to make it running, you will need to have a working python installed on local and remote servers.

Ansible is as easy to install as running the apt cmd:

 

# apt-get install –yes ansible
 

The following additional packages will be installed:
  ieee-data python-jinja2 python-kerberos python-markupsafe python-netaddr python-paramiko python-selinux python-xmltodict python-yaml
Suggested packages:
  sshpass python-jinja2-doc ipython python-netaddr-docs python-gssapi
Recommended packages:
  python-winrm
The following NEW packages will be installed:
  ansible ieee-data python-jinja2 python-kerberos python-markupsafe python-netaddr python-paramiko python-selinux python-xmltodict python-yaml
0 upgraded, 10 newly installed, 0 to remove and 1 not upgraded.
Need to get 3,413 kB of archives.
After this operation, 22.8 MB of additional disk space will be used.

apt-get install –yes sshpass

 

Installing Ansible on Fedora Linux is done with:

 

# dnf install ansible –yes sshpass

 

On CentOS to install:
 

# yum install ansible –yes sshpass

sshpass needs to be installed only if you plan to use ssh password prompt authentication with ansible.

Ansible is also installable via python-pip tool, if you need to install a specific version of ansible you have to use it instead, the package is available as an installable package on most linux distros.

Ansible has a lot of pros and cons and there are multiple articles already written on people for and against it in favour of Chef or Puppet As I recently started learning Ansible. The most important thing to know about Ansible is though many of the things can be done directly using a simple command line, the tool is planned for remote installing of server services using a specially prepared .yaml format configuration files. The power of Ansible comes of the use of Ansible Playbooks which are yaml scripts that tells ansible how to do its activities step by step on remote server. In this article, I'm giving a quick cheat sheet to start quickly with it.
 

1. Remote commands execution with Ansible
 

First thing to do to start with it is to add the desired hostnames ansible will operate with it can be done either globally (if you have a number of remote nodes) to deploy stuff periodically by using /etc/ansible/hosts or use a custom host script for each and every ansible custom scripts developed.

a. Ansible main config files

A common ansible /etc/ansible/hosts definition looks something like that:

 

# cat /etc/ansible/hosts
[mysqldb]
10.69.2.185
10.69.2.186
[master]
10.69.2.181
[slave]
10.69.2.187
[db-servers]
10.69.2.181
10.69.2.187
[squid]
10.69.2.184

Host to execute on can be also provided via a shell variable $ANSIBLE_HOSTS
b) is remote hosts reachable / execute commands on all remote host

To test whether hour hosts are properly configure from /etc/ansible/hosts you can ping all defined hosts with:

 

ansible all -m ping


ansible-check-hosts-ping-command-screenshot

This makes ansible try to remote to remote hosts (if you have properly configured SSH public key authorization) the command should return success statuses on every host.

 

ansible all -a "ifconfig -a"


If you don't have SSH keys configured you can also authenticate with an argument (assuming) all hosts are configured with same password with:

 

ansible all –ask-pass -a "ip all show" -u hipo –ask-pass


ansible-show-ips-ip-a-command-screenshot-linux

If you have configured group of hosts via hosts file you can also run certain commands on just a certain host group, like so:

 

ansible <host-group> -a <command>

It is a good idea to always check /etc/ansible/ansible.cfg which is the system global (main red ansible config file).

c) List defined host groups
 

ansible localhost -m debug -a 'var=groups.keys()'
ansible localhost -m debug -a 'var=groups'

d) Searching remote server variables

 

# Search remote server variables
ansible localhost -m setup -a 'filter=*ipv4*'

 

 

ansible localhost -m setup -a 'filter=ansible_domain'

 

 

ansible all -m setup -a 'filter=ansible_domain'

 

 

# uninstall package on RPM based distros
ansible centos -s -m yum -a "name=telnet state=absent"
# uninstall package on APT distro
ansible localhost -s -m apt -a "name=telnet state=absent"

 

 

2. Debugging – Listing information about remote hosts (facts) and state of a host

 

# All facts for one host
ansible -m setup
  # Only ansible fact for one host
ansible
-m setup -a 'filter=ansible_eth*'
# Only facter facts but for all hosts
ansible all -m setup -a 'filter=facter_*'


To Save outputted information per-host in separate files in lets say ~/ansible/host_facts

 

ansible all -m setup –tree ~/ansible/host_facts

 

3. Playing with Playbooks deployment scripts

 

a) Syntax Check of a playbook yaml

 

ansible-playbook –syntax-check


b) Run General Infos about a playbook such as get what a playbook would do on remote hosts (tasks to run) and list-hosts defined for a playbook (like above pinging).

 

ansible-playbook –list-hosts
ansible-playbook
–list-tasks


To get the idea about what an yaml playbook looks like, here is example from official ansible docs, that deploys on remote defined hosts a simple Apache webserver.
 


– hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  – name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest
  – name: write the apache config file
    template:
      src: /srv/httpd.j2
      dest: /etc/httpd.conf
    notify:
    – restart apache
  – name: ensure apache is running
    service:
      name: httpd
      state: started
  handlers:
    – name: restart apache
      service:
        name: httpd
        state: restarted

To give it a quick try save the file as webserver.yml and give it a run via ansible-playbook command
 

ansible-playbook -s playbooks/webserver.yml

 

The -s option instructs ansible to run play on remote server with super user (root) privileges.

The power of ansible is its modules, which are constantly growing over time a complete set of Ansible supported modules is in its official documenation.

Ansible-running-playbook-Commands-Task-script-Successful-output-1024x536

There is a lot of things to say about playbooks, just to give the brief they have there own language like a  templates, tasks, handlers, a playbook could have one or multiple plays inside (for instance instructions for deployment of one or more services).

The downsides of playbooks are they're so hard to write from scratch and edit, because yaml syntaxing is much more stricter than a normal oldschool sysadmin configuration file.
I've stucked with problems with modifying and writting .yaml files and I should say the community in #ansible in irc.freenode.net was very helpful to help me debug the obscure errors.

yamllint (The YAML Linter tool) comes handy at times, when facing yaml syntax errors, to use it install via apt:
 

# apt-get install –yes yamllint


a) Running ansible in "dry mode" just show what ansible might do but not change anything
 

ansible-playbook playbooks/PLAYBOOK_NAME.yml –check


b) Running playbook with different users and separate SSH keys

 

ansible-playbook playbooks/your_playbook.yml –user ansible-user
 
ansible -m ping hosts –private-key=~/.ssh/keys/custom_id_rsa -u centos

 

c) Running ansible playbook only for certain hostnames part of a bigger host group

 

ansible-playbook playbooks/PLAYBOOK_NAME.yml –limit "host1,host2,host3"


d) Run Ansible on remote hosts in parallel

To run in raw of 10 hosts in parallel
 

# Run 10 hosts parallel
ansible-playbook <File.yaml> -f 10            


e) Passing variables to .yaml scripts using commandline

Ansible has ability to pre-define variables from .yml playbooks. This variables later can be passed from shell cli, here is an example:

# Example of variable substitution pass from command line the var in varsubsts.yaml if present is defined / replaced ansible-playbook playbooks/varsubst.yaml –extra-vars "myhosts=localhost gather=yes pkg=telnet"

 

4. Ansible Galaxy (A Docker Hub) like large repository with playbook (script) files

 

Ansible Galaxy has about 10000 active users which are contributing ansible automation playbooks in fields such as Development / Networking / Cloud / Monitoring / Database / Web / Security etc.

To install from ansible galaxy use ansible-galaxy

# install from galaxy the geerlingguy mysql playbook
ansible-galaxy install geerlingguy.mysql


The available packages you can use as a template for your purpose are not so much as with Puppet as Ansible is younger and not corporate supported like Puppet, anyhow they are a lot and does cover most basic sysadmin needs for mass deployments, besides there are plenty of other unofficial yaml ansible scripts in various github repos.