Posts Tagged ‘system’

Linux extending life time for a damaged hard drive server tricks on a live server. Force fcsk on next reboot.Read-only file system error solutions

Friday, February 17th, 2023

linux-extending-life-time-for-a-damaged-hard-drive-server-tricks-can-not-read-superblock-linux-force-fsck-on-next-reboot

In our daily work as system administrators we have some very old Legacy systems running Clustered High Availability proxies using CRM (Cluster Resource Manager) and some legacy systems still using Heartbeat to manage the cluster instead of the newer and modern Corosync variant.

The HA cluster is only 2 nodes Linux machine and running the obscure already long time unsupported version of Redhat 5.11 (Ootpa) who was officially became stable distant year 1998 (yeath the years were good) and whose EOL (End of Life) has been reached long time ago and the OS is no longer supported, however for about 14 years the machines has been running perfectly fine until one of the Cluster nodes managed by ocf::heartbeat:IPAddr2 , that is  /etc/ha.d/resource.d/IPAddr2 shell script. Yeah for the newbies Heartbeat Application Cluster in Linux does work like that it uses a number of extendable pair of shell scripts written for different kind of Network / Web / Mail / SQL or whatever services HA management.

The first node configured however, started failing due to some errors like:
 

EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
sd 0:2:0:0: rejecting I/O to offline device
Aborting journal on device sda1.
sd 0:2:0:0: rejecting I/O to offline device
printk: 159 messages suppressed.
Buffer I/O error on device sda1, logical block 526
lost page write due to I/O error on sda1
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
megaraid_sas: FW was restarted successfully, initiating next stage…
megaraid_sas: HBA recovery state machine, state 2 starting…
megasas: Waiting for FW to come to ready state
megasas: FW in FAULT state!!
FW state [-268435456] hasn't changed in 180 secs
megaraid_sas: out: controller is not in ready state
megasas: waiting_for_outstanding: after issue OCR. 
megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
megaraid_sas: pending commands remain even after reset handling. megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
megasas[0]: Total OS Pending cmds : 0 megasas[0]: 64 bit SGLs were sent to FW
megasas[0]: Pending OS cmds in FW :

The result out of that was a frequently the filesystem of the machine got re-mounted as Read Only and of course that is
quite bad if you have a running processess of haproxy that should be able to be living their and take up some Web traffic
for high availability and you run all the traffic only on the 2nd pair of machine.

This of course was a clear sign for a failing disks or some hit bad blocks regions or as the messages indicates, some
problem with system hardware or Raid SAS Array.

The physical raid on the system, just like rest of the hardware is very old stuff as well.

[root@haproxy_lb_node1 ~]# lspci |grep -i RAI
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

The produced errors not only made the machine to auto-mount its root / filesystem in Read-Only mode but besides has most
likely made the machine to automatically reboot every few days or few times every day in a raw.

The second Load Balancer node2 did operated perfectly, and we thought that we might just keep the broken machine in that half running
and inconsistent state for few weeks until we have built the new machines with Pre-Installed new haproxy cluster with modern
RedHat Linux 8.6 distribution, but since we have to follow SLAs (Service Line Agreements) with Customers and the end services behind the
High Availability (HA) Haproxy cluster were at danger … 

We as sysadmins had the task to make our best to try to stabilize the unstable node with disk errors for the system to servive
and be able to normally serve traffic (if node2 that is in a separate Data center fails due to a hardware or electricity issues etc.)
.

Here is few steps we took, that has hopefully improved the situation.

1. Make backups of most important files of high importance

Always before doing anything with a broken system, prepare backup of the most important files, if that is a cluster that should be a backup of the cluster configurations (if you don't have already ones) backup of /etc/hosts / backup of any important services configs /etc/haproxy/haproxy.cfg /etc/postfix/postfix.cfg (like it was my case), preferrably backup of whole /etc/  any important files from /root/ or /home/users* directories backup of at leasts latest logs from /var/log etc.
 

2. Clear up all unnecessery services scripts from the server

Any additional Softwares / Services and integrity checking tools (daemons) / scripts and cron jobs, were immediately stopped and wheter unused removed.

E.g. we had moved through /etc/cron* to check what's there,

# ls -ld /etc/cron.*
drwx—— 2 root root 4096 Feb  7 18:13 /etc/cron.d
drwxr-xr-x 2 root root 4096 Feb  7 17:59 /etc/cron.daily
-rw-r–r– 1 root root    0 Jul 20  2010 /etc/cron.deny
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.hourly
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.monthly
drwxr-xr-x 2 root root 4096 Aug 26  2015 /etc/cron.weekly

 

And like well professional butchers removed everything unnecessery that could trigger any extra unnecessery disk read / writes to HDD.

E.g. just create

# mkdir -p /root/etc_old/{/etc/cron.d,\
/etc/cron.daily,/etc/cron.hourly,/etc/cron.monthly\
,/etc/cron.weekly}

 

And moved all unnecessery cron job scripts like:

1. nmon (old school network / memory / hard disk console tool for monitoring and tuning server parameters)
2. clamscan / freshclam crons
3. mlocate (the script that is taking care for periodic run of updatedb command to keep the locate command to easily search
for files inside the DB to put less read operations on disk in case if you need to find file (e.g. prevent yourself to everytime
run cmd like: find / . -iname '*whatever_you_look_for*'
4. cups cron jobs
5. logwatch cron
6. rkhunter stuff
7. logrotate (yes we stopped even logrotation trigger job as we found the server was crashing sometimes at the same time when
the lograte job to rotate logs inside /var/log/* was running perhaps leading to a hit of the I/O read error (bad blocks).


Also inspected the Administrator user root cron job for any unwated scripts and stopped two report bash scripts that were part of the PCI tightened Security procedures.
Therein found script responsible to periodically report the list of installed packages and if they have not changed, as well a script to periodically report via email the list of
/etc/{passwd,/etc/shadow} created users, used to historically keep an eye on the list of users and easily see if someone
has created new users on the machine. Those were enabled via /var/spool/cron/root cron jobs, in other cases, on other machines if it happens for you
it is a good idea to check out all the existing user cron jobs and stop anything that might be putting Read / Write extra heat pressure on machine attached the Hard drives.

# ls -al /var/spool/cron/
total 20
drwx——  2 root root 4096 Nov 13  2015 .
drwxr-xr-x 12 root root 4096 May 11  2011 ..
-rw——-  1 root root  133 Nov 13  2015 root


3. Clear up old log files and any files unnecessery

Under /var/log and /home /var/tmp /var/spool/tmp immediately try to clear up the old log files.
From my past experience this has many times made the FS file inodes that are storing on a unbroken part (good blocks) of the hard drive and
ready to be reused by newly written rsyslog / syslogd services spitted files.

!!! Note that during the removal of some files you might hit a files stored on a bad blocks that might lead to a unexpected system reboot.

But that's okay, don't worry most likely after a hard reset by a technician in the Datacenter the machine will boot again and you can enjoy
removing remaining still files to send them to the heaven for old files.

 

4. Trigger an automatic system file system check with fsck on next boot

The standard way to force a Linux to aumatically recheck its Root filesystem is to simply create the /forcefsck to root partition or any other secondary disk partition you would like to check.

# touch /forcefsck

# reboot


However at some occasions you might be unable to do it because, the / (root fs) has been remounted in ReadOnly mode, yackes …

Luckily old Linux distibutions like this RHEL 5.1, has a way to force a filesystem check after reboot fsck and identify any
unknown bad-blocks and hopefully succceed in isolating them, so you don't hit into the same auto-reboots if the hard drive or Software / Hardware RAID
is not in terrible state
, you can use an option built in in /sbin/shutdown command the '-F'

   -F     Force fsck on reboot.


Hence to make the machine reboot and trigger immediately fsck:

# shutdown -rF now


Just In case you wonder why to reboot before check the Filesystem. Well simply because you need to have them unmounted before you check.

In that specific case this produced so far a good result and the machine booted just fine and we crossed the fingers and prayed that the machine would work flawlessly in the coming few weeks, before we finalize the configuration of the substitute machines, where this old infrastructure will be migrated to a new built cluster with new Haproxy and Corosync / Pacemaker Cluster on a brand new RHEL.

NB! On newer machines this won't work however as shutdown command has been stripped off this option because no SystemV (SystemInit) or Upstart and not on SystemD newer services architecture.
 

5. Hints on checking the hard drives with fsck

If you happen to be able to have physical access to the remote Hardare machine via a TTY[1-9] Console, that's even better and is the standard way to do it but with this specific case we had no easy way to get access to the Physical server console.

It is even better to go there and via either via connected Monitor (Display) or KVM Switch (Those who hear KVM switch first time this is a great device in server rooms to connect multiple monitors to same Monitor Display), it is better to use a some of the multitude of options to choose from for USB Distro Linux recovery OS versions or a CDROM / DVD on older machines like this with the Redhat's recovery mode rolled on.
After mounting the partition simply check each of the disks
e.g. :

# fsck -y /dev/sdb
# fsck -y /dev/sdc

Or if you want to not waste time and look for each hard drive but directly check all the ones that are attached and known by Linux distro via /etc/fstab definition run:

# fsck -AR

If necessery and you have a mixture of filesystems for example EXT3 , EXT4 , REISERFS you can tell it to omit some filesystem, for example ext3, like that:

# fsck -AR -t noext3 -y


To skip fsck on mounted partitions with fsck:

# fsck -M /dev/sdb


One remark to make here on fsck is usually fsck to complete its job on various filesystem it uses other external component binaries usually stored in /sbin/fsck*

ls -al /sbin/fsck*
-rwxr-xr-x 1 root root  55576 20 яну 2022 /sbin/fsck*
-rwxr-xr-x 1 root root  43272 20 яну 2022 /sbin/fsck.cramfs*
lrwxrwxrwx 1 root root      9  4 юли 2020 /sbin/fsck.exfat -> exfatfsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext2 -> e2fsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext3 -> e2fsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext4 -> e2fsck*
-rwxr-xr-x 1 root root  84208  8 фев 2021 /sbin/fsck.fat*
-rwxr-xr-x 2 root root 393040 30 ное 2009 /sbin/fsck.jfs*
-rwxr-xr-x 1 root root 125184 20 яну 2022 /sbin/fsck.minix*
lrwxrwxrwx 1 root root      8  8 фев 2021 /sbin/fsck.msdos -> fsck.fat*
-rwxr-xr-x 1 root root    333 16 дек 2021 /sbin/fsck.nfs*
lrwxrwxrwx 1 root root      8  8 фев 2021 /sbin/fsck.vfat -> fsck.fat*


6. Using tune2fs to  adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems (few examples)

a) To check whether really the filesystem was checked on boot time or check a random filesystem on the server for its last check up date with fsck:

#  tune2fs -l /dev/sda1 | grep checked
Last checked:             Wed Apr 17 11:04:44 2019

On some distributions like old Debian and Ubuntu, it is even possible to enable fsck to log its operations during check on reboot via changing the verbosity from NO to YES:

# sed -i "s/#VERBOSE=no/VERBOSE=yes/" /etc/default/rcS


If you're having the issues on old Debian Linuxes  and not on RHEL  it is possible to;

b) Enable all fsck repairs automatic on boot

by running via:
 

# sed -i "s/FSCKFIX=no/FSCKFIX=yes/" /etc/default/rcS


c) Forcing fcsk check on for server attached Hard Drive Partitions with tune2fs

# tune2fs -c 1 /dev/sdXY

Note that:
tune2fs can force a fsck on each reboot for EXT4, EXT3 and EXT2 filesystems only.

tune2fs can trigger a forced fsck on every reboot using the -c (max-mount-counts) option.
This option sets the number of mounts after which the filesystem will be checked, so setting it to 1 will run fsck each time the computer boots.
Setting it to -1 or 0 resets this (the number of times the filesystem is mounted will be disregarded by e2fsck and the kernel).


 For example you could:

d) Set fsck to run a filesystem check every 30 boots, by using -c 30 
 

# tune2fs -c 30 /dev/sdXY


e) Checking whether a Hard Drive has been really checked on the boot

 

#  tune2fs -l /dev/sda1 | grep checked
Last checked:             Wed Apr 17 11:04:44 2019


e) Check when was the last time the file system /dev/sdX was checked:
 

# tune2fs -l /dev/sdX | grep Last\ c
Last checked:             Thu Jan 12 20:28:34 2017


f) Check how many times our /dev/sdX filesystem was mounted

# tune2fs -l /dev/sdX | grep Mount
Mount count:              157

g) Check how many mounts are allowed to pass before filesystem check is forced
 

# tune2fs -l /dev/sdX | grep Max
Maximum mount count:      -1


7. Repairing disk / partitions via GRUB fsck.mode and fsck.repair kernel module options

It is also possible to force a fsck.repair on boot via GRUB, but that usually is not an option someone would like as the machine might fail too boot if it hards to repair hardly, however in difficult situations with failing disks temporary enabling it is good idea.

This can be done by including for grub initial config

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash fsck.mode=force fsck.repair=yes"

fsck.mode=force – will force a fsck each time a system boot and keeping that value enabled for a long time inside GRUB is stupid for servers as

sometimes booting could be severely prolonged because of the checks especially with servers with many or slow old hard drives.

fsck.repair=yes – will make the fsck try to repair if it finds bad blocks when checking (be absolutely sure you know, what you're doing if passing this options)

The options can be also set via editing the GRUB boot screen, if you have physical access to the server and don't want to reload the grub loader and possibly make the machine unbootable on next boot.
 

8. Few more details on how /etc/fstab disk fsck check parameters values for Systemd Linux machines works

The "proper" way on systemd (if we can talk about proper way on Linux) to runs fsck for each filesystem that has a fsck is to pass number greater than 0 set in
/etc/fstab (last column in /etc/fstab), so make sure you edit your /etc/fstab if that's not the case.

The root partition should be set to 1 (first to be checked), while other partitions you want to be checked should be set to 2.

Example /etc/fstab:
 

# /etc/fstab: static file system information.

/dev/sda1  /      ext4  errors=remount-ro  0  1
/dev/sda5  /home  ext4  defaults           0  2

The values you can put here as a second number meaning is as follows:
0 – disabled, that is do not check filesystem
1 – partition with this PASS value has a higher priority and is checked first. This value is usually set to the root / partition
2 – partitions with this PASS value will be checked last

a) Check the produced log out of fsck

Unfortunately on the older versions of Linux distros with SystemV fsck log output might be not generated except on the physical console so if you have a kind of duplicator device physical tty on the display port of the server, you might capture some bad block reports or fixed errors messages, but if you don't you might just cross the fingers and hope that anything found FS irregularities was recovered.

On systemd Linux machines the fsck log should be produced either in /run/initramfs/fsck.log or some other location depending on the Linux distro and you should be able to see something from fsck inside /var/log/* logs:

# grep -rli fsck /var/log/*


Close it up

Having a system with failing disk is a really one of the worst sysadmin nightmares to get. The good news is that most of the cases we're prepared with some working backup or some work around stuff like the few steps explained to mitigate the amount of Read / Writes to hard disks on the failing machine HDDs. If the failing disk is a primary Linux filesystem all becomes even worse as every next reboot, you have no guarantee, whether the kernel / initrd or some of the other system components required to run the Core Linux system won't break up the normal boot. Thus one side changes on the hard drives is a risky business on ther other side, if you're in a situation where you have a mirror system or the failing system is just a Linux server installed without a Cluster pair, then this is not a big deal as you can guarantee at least one of the nodes still up, unning and serving. Still doing too much of operations with HDD is always a danger so the steps described, though in most cases leading to improvement on how the system behaves, the system should be considered totally unreliable and closely monitored not only by some monitoring stuff like Zabbix / Prometheus whatever but regularly check the systems state via normal SSH logins. It is important if you have some important datas or logs on the system that are not synchronized to a system node to copy them before doing any of the described operations. After all minimal is backuped, proceed to clear up everything that might be cleared up and still the machine to continue providing most of its functionalities, trigger fsck automatic HDD check on next reboot, reboot, check what is going on and monitor the machine from there on.

Hopefully the few described steps, has helped some sysadmin. There is plenty of things which I've described that might go wrong, even following the described steps, might not help if the machines Storage Drives / SAS / SSD has too much of a damage. But as said in most cases following this few steps would improve the machine state.

Wish you the best of luck!

 

CentOS disable SELinux permanently or one time on grub Linux kernel boot time

Saturday, July 24th, 2021

selinux-artistic-penguin-logo-protect-data

 

1. Office 365 cloud connected computer and a VirtualBox hosted machine with SELINUX preventing it to boot

At my job we're in process of migrating my old Lenovo Laptop Thinkpad model L560 Laptop to Dell Latitude 5510 wiith Intel Core i5 vPro CPU and 256 Gb SSD Hard Drive.  The new laptops are generally fiine though they're not even a middle class computers and generally I prefer thinkpads. The sad thing out of this is our employee decided to migrate to Office 365 (again perhaps another stupid managerial decision out of an excel sheet wtih a balance to save some money … 

As you can imagine Office 365 is not really PCI Standards compliant and not secure since our data is stored in Microsoft cloud and theoretically Microsoft has and owns our data or could wipe loose the data if they want to. The other obvious security downside I've noticed with the new "Secure PCI complaint laptop" is the initial PC login screen which by default offers fingerprint authentication or the even worse  and even less secure face recognition, but obviosly everyhing becomes more and more crazy and people become less and less cautious for security if that would save money or centralize the data … In the name of security we completely waste security that is very dubious paradox I don't really understand but anyways, enough rant back to the main topic of this article is how to and I had to disable selinux?

As part of Migration I've used Microsoft OneDrive to copy old files from the Thinkpad to the Latitude (as on the old machine USB's are forbidden and I cannot copy over wiith a siimple USB driive, as well as II have no right to open the laptop and copy data from the Hard driive, and even if we had this right without breaking up some crazy company policy that will not be possible as the hard drive data on old laptop is encrypted, the funny thing is that the new laptop data comes encrypted and there is no something out of the box as BitDefender or McAffee incryption (once again, obviously our data security is a victim of some managarial decisions) …
 

2. OneDrive copy problems unable to sync some of the copied files to Onedrive


Anyways as the Old Laptop's security is quite paranoid and we're like Fort Nox, only port 80 and port 443 connections to the internet can be initiated to get around this harsh restrictions it was as simple to use a Virtualbox Virtual Machine. So on old laptop I've installed a CentOS 7 image which I used so far and I used one drive to copy my vbox .vdi image on the new laptop work machine.

The first head buml was the .vdi which seems to be prohibited to be copied to OneDrive, so to work around this I had to rename the origianl CentOS7.vdi to CentOS7.vdi-renamed on old laptop and once the data is in one drive copy my Vitualbox VM/ directory from one drive to the Dell Latitude machine and rename the .vdi-named towards .vdi as well as import it from the latest installed VirtualBox on the new machine.
 

3. Disable SELINUX from initial grub boot


So far so good but as usual happens with miigrations I've struck towards another blocker, the VM image once initiated to boot from Virtualbox badly crashed with some complains that selinux cannot be loaded.
Realizing CentOS 7 has the more or less meaningless Selinux, I've took the opportunity to disable SeLinux.

To do so I've booted the Kernel with Selinux disabled from GRUB2 loader prompt before Kernel and OS Userland boots.

 

 

I thought I need to type the information on the source in grub. What I did is very simple, on the Linux GRUB boot screen I've pressed

'e' keyboard letter

that brought the grub boot loader into edit mode.

Then I had to add selinux=0 on the edited selected kernel version, as shown in below screenshot:

selinux-disable-from-grub.png

Next to boot the Linux VM without Selinux enabled one time,  just had to press together

Ctrl+X then add selinux=0 on the edited selected kernel version, that should be added as shown in the screenshot somewhere after the line of
root=/dev/mapper/….

4. Permanently Disable Selinux on CentOS 7


Once I managed to boot Virtual Machine properly with Oracle Virtualbox, to permanently disabled selinux I had to:

 

Once booted into CentOS, to check the status of selinux run:

 

# sestatus
Copy
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

 

5. Disable SELinux one time with setenforce command


You can temporarily change the SELinux mode from targeted to permissive with the following command:

 

# setenforce 0


Next o permanently disable SELinux on your CentOS 7 next time the system boots, Open the /etc/selinux/config file and set the SELINUX mod parameter to disabled.

On CentOS 7 you can  edit the kernel parameters in /etc/default/grub (in the GRUB_CMDLINE_LINUX= key) and set selinux=0 so on next VM / PC boot we boot with a SELINUX disabled for example add   RUB_CMDLINE_LINUX=selinux=0 to the file then you have to regenerate your Grub config like this:
 

# grub2-mkconfig -o /etc/grub2.cfg
# grub2-mkconfig -o /etc/grub2-efi.cfg


Further on to disable SeLinux on OS level edit /etc/selinux
 

Default /etc/selinux/config with selinux enabled should look like so:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing – SELinux security policy is enforced.
#       permissive – SELinux prints warnings instead of enforcing.
#       disabled – No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
#       targeted – Targeted processes are protected,
#       mls – Multi Level Security protection.
SELINUXTYPE=targeted


To disable SeLinux modify the file to be something like:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing – SELinux security policy is enforced.
#       permissive – SELinux prints warnings instead of enforcing.
#       disabled – No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#       targeted – Targeted processes are protected,
#       mls – Multi Level Security protection.
SELINUXTYPE=targeted

6. Check SELINUX status is disabled

# sestatus

SELinux status:                 disabled

So in this article shottly was explained shortly the fake security adopted by using Microsoft Cloud environment Offiice 365, my faced OneDrive copy issues (which prevented even my old laptop Virtual Machine to boot properly and the handy trick to rename the file that is unwilling to get copied from old PC towards m$ OneDrive as well as the grub trick to disable Selinux permanently from grub2.

Configure rsyslog buffering on Linux to avoid message lost to Central Logging server

Wednesday, January 13th, 2021

rsyslog-Centralized-Logging-System-using-Rsyslog_logo

1. Rsyslog Buffering

One of the best practice about logs management is to send syslog to a central server. However, a logging system should be capable of avoiding message loss in situations where the server is not reachable. To do so, unsent data needs to be buffered at the client when central server is not available. You might have recently noticed that many servers forwarding logs messages to a central server do not have buffering functionalities activated. Thus I strongly advise you to have look to this documentation to know how to check your configuration: http://www.rsyslog.com/doc/rsyslog_reliable_forwarding.html

Rsyslog buffering with TCP/UDP configured

In rsyslog, every action runs on its own queue and each queue can be set to buffer data if the action is not ready. Of course, you must be able to detect that "the action is not ready", which means the remote server is offline. This can be detected with plain TCP syslog and RELP, but not with UDP. So you need to use either of the two. In this howto, we use plain TCP syslog.

– Version requirement

Please note that we are using rsyslog-specific features. The are required on the client, but not on the server. So the client system must run rsyslog (at least version 3.12.0), while on the server another syslogd may be running, as long as it supports plain tcp syslog.

How To Setup rsyslog buffering on Linux

First, you need to create a working directory for rsyslog. This is where it stores its queue files (should need arise). You may use any location on your local system. Next, you need to do is instruct rsyslog to use a disk queue and then configure your action. There is nothing else to do. With the following simple config file, you forward anything you receive to a remote server and have buffering applied automatically when it goes down. This must be done on the client machine.

# Example:
# $ModLoad imuxsock             # local message reception
# $WorkDirectory /rsyslog/work  # default location for work (spool) files
# $ActionQueueType LinkedList   # use asynchronous processing
# $ActionQueueFileName srvrfwd  # set file name, also enables disk mode
# $ActionResumeRetryCount -1    # infinite retries on insert failure
# $ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down
# *.*       @@server:port

Backup entire Live Linux Operating System bit by bit with dd, partimage, partclone clonezilla

Thursday, January 7th, 2021


dd-create-server-hard-drive-identical-mirror-data-copy-backups

This is an old stuff that we UNIX / Linux sysadmins use frequently when we need to migrate operating system from a certain older machine server to another newer one.
However I decided to blog it as it an interesting to know to a new grown junior sysadmins.

To Create a bit to bit data backup with dd command,
the following command is used to create a backup with dd, which takes the entire data content (including partition table etc.) with it:

dd if = / dev / [hard disk 1] of = / dev / [hard disk 2] bs = 512 conv = noerror, sync


For explanation:

 

"if" stands for the hard drive to be read from.
"of" stands for the hard drive to be written to.
Important! if and of must not be interchanged under any circumstances! In the worst case, the data on the disk to be read will otherwise be irrevocably overwritten!

"bs = 512" defines the block size. The value can be increased (which in turn increases the speed of the backup), but you should be sure that the file system to be backed up does not contain any errors. If you were to use block size 64k, for example, the speed of the backup is increased considerably – but if read errors occur within this block, the entire data block that dd has written contains unusable data. Therefore, when choosing the block size, you should always weigh data integrity and time against each other.
"noerror" tells dd to continue the backup in case of errors. Without this option, dd would stop the backup by default.
"sync" commands dd to replace the unreadable blocks with zeros in the event of errors in order to keep the data offset synchronous.
When performing a backup (as with other things that a longer period can take advantage of, it is always recommended (if you SSH is logged in and no direct access to a real Shell), the process either for CTRL + followed from bg to the background (can later be brought back to the foreground with fg ) or to use virtual session managers such as screen or byobu before executing the command.This prevents the process from dying if the SSH session is unintentionally terminated and you have to start over.

Of course there are plenty of other ways to make a mirror backup  cloneof a hard disk to lets say migrate to a new data center  using easier to use tools with (ncurses) Text menu interfaces to avoid bothering a complex typing on the console.
One such tool is Partclon:

Partclone-screenshot,_partclone-linux-create-mirror-disk-backups

PartClone cloning in action

Another text menu interface data cloning Linux tool commonly used by sysadms is partimage

Partimage-linux-screenshot

Most sysadmins however prefer to use Clonezilla when something more cozy is required to do a bit to bit data copy.
Tthere is even a Live Linux CD distribution for that.

Clonezilla can mirror most types of filesystems and partiontions and could be used not only for UNIX / Linux / BSD filesystems Live OS data (backups) (EXT3 / EXT4 / XFS / ZFS etc)  migrations, but also for old NT4 Windows server partitions. One useful application of Clonezilla i can think of is if you want to configure or restore a whole office of Windows computers running on the same clean version of Windows and same hardware configurations PCs, after a Virus or trojan has striked it. By using it you can clone from a central well configured Windows release with the surrounding applications to all machines for up to an hour with Clonezilla and you can even do it over a network.

Remove old unused kernels and cleanup orphaned packages on CentOS / RHEL/ Fedora and Debian Linux

Friday, October 23rd, 2020

remove-old-unused-kernel-on-centos-redhat-rhel-fedora-linux-howto-delete-orphaned-packages

If you administer CentOS 7 / CentOS  8 bunch of servers it is very likely after one of the scheduled Patch days every 6 months or so, you end up with a multiple Linux OS kernels installed on the system.
In normal situation on a freshly installed CentOS machine only one rpm package is installed on the system with the kernel release shipped with CentOS / RHEL / Fedora distro:
The reason to remove the old unused kernels is very simple, you don't want to have a messy installation and after some of the updates to boot up in a revert back old kernel or if you're pedantic to simply save few megas of space.
Some people choose to have more than one kernel just to make sure, if the new installed one doesn't boot, after a restart from ILO / IDRAC remote console interface you can select to boot the proper kernel. I agree having the old kernel before the system *kernel* upgrade as backup recovery is a good thing but this is a good thing to the point the system gets booted after reboot (you know we sysadmins usually after each major system package upgrade), we like to reboot the system warmly praying and hoping it will boot up next time 🙂
 

1. Remove CentOS last XX kernels from the OS

Of course removal of old kernels could be managed by a simple

yum remove kernel


yum-kernel-remove-centos-linux

One more than one kernel is present you can hence leave only lets say the last 2 installed kernel on the CentOS host (some people prefer to have only one) but just for the sake of having a backup kernel I like more to have last two kernels installed present, to do so run package-cleanup which is contained in yum-utils rpm package CentOS – this is CentOS / Redhat ( RHEL) specific command.
 

[root@centos ~ ]:# package-cleanup –oldkernels –count=2

package-cleanup-centos-linux-screenshot-1

–count=number argument – tells how many from the  latest version kernels to get removed.

Note if you don't have the package-cleanup command install yum-utils package:

[root@centos ~ :]#  yum install -y yum-utils

cleanup-old-kernels-linux-leave-only-set-of-2-kernels-active-on-centos-rhel-fedora


2. RemoveOld kernels from Fedora Linux – leave only the latest 3 installed

This is done with dnf by setting the –-latest-limit arg to negative value to how many last kernels want to keep

[root@fedora ~ ]:# dnf remove $(dnf repoquery –installonly –latest-limit=-3 -q)

 

3. Set how many kernels you want to be present on system all the time after package upgrades

It is possible to tell CentOS / RHEL / Fedora's on how many kernels show be kept installed on the system, the default configured on Operating system install time is to keep the last 5 installed kernel on the OS. This is controlled from installonly_limit=5 value that is usually as of year 2020 RPM based distributions found under /etc/yum.conf (on CentOS / RHEL) and in /etc/dnf/dnf.conf (in Fedora) configuration file and sets the desired number of kernels present on system after issuing commands yum upgrade / dnf upgrade –refresh etc.
The minimum number to give to  installonly_limit is 2.
 

4. Remove orphan rpm packages from server

The next thing to do is to check the installed orphan packages to see if we can safely remove them; by orphaned packages we mean all packages which no longer serve a purpose of package dependencies.
Orphan packages are packages who left over from some old dependencies that are no longer needed on the system but just take up space and impose a possible security risk as some of them might end up with time with a public well known and hacked CVE vulnearbility.

Let me try to explain this concept with a quick example: package A is depended on package B, thus, in order to install package A the package B must also be installed. Once the package A is removed the package B might still be installed, hence the package B is now orphaned package.
Here’s how we can safely see the orphan packages we do have on our system:

[root@centos ~ :]#  package-cleanup –quiet –leaves –exclude-bin

And here’s how we can delete them:

[root@centos ~ :]# package-cleanup –quiet –leaves –exclude-bin | xargs yum remove -y


The above commands should be launched multiple times, because the packages deleted with the first batch could create additional orphan packages, and so on: be sure to perform these tasks until no orphan packages appear anymore after the first package-cleanup command.

 

5. Delete Old Kernels and keep only last three ones on Debian / Ubuntu Linux

To do the same on a debian based distribution there is a command is provided by a deb package byobu, if you want to clean up old kernels on Debians :

$ sudo purge-old-kernels –keep 3


That's all folks enjoy ! 🙂

 

How to check if shared library is loaded in AIX OS – Fix missing libreadline.so.7

Thursday, February 20th, 2020

ibm-aix-logo1

I've had to find out whether an externally Linux library is installed  on AIX system and whether something is not using it.
The returned errors was like so:

 

# gpg –export -a

Could not load program gpg:
Dependent module /opt/custom/lib/libreadline.a(libreadline.so.7) could not be loaded.
Member libreadline.so.7 is not found in archive


After a bit of investigation, I found that gpg was failing cause it linked to older version of libreadline.so.6, the workaround was to just substitute the newer version of libreadline.so.7 over the original installed one.

Thus I had a plan to first find out whether this libreadline.a is loaded and recognized by AIX UNIX first and second find out whether some of the running processes is not using that library.
I've come across this interesting IBM official documenation that describes pretty good insights on how to determine whether a shared library  is currently loaded on the system. which mentions the genkld command that is doing
exactly what I needed.

In short:
genkld – creates a list that is printed to the console that shows all loaded shared libraries

genkld-screenshot-aix-unix

Next I used lsof (list open files) command to check whether there is in real time opened libraries by any of the running programs on the system.

After not finding anything and was sure the library is neither loaded as a system library in AIX nor it is used by any of the currently running AIX processes, I was sure I could proceed to safely overwrite libreadline.a (libreadline.so.6) with libreadline.a with (libreadline.so.7).

The result of that is again a normally running gpg as ldd command shows the binary is again normally linked to its dependend system libraries.
 

aix# ldd /usr/bin/gpg
/usr/bin/gpg needs:
         /usr/lib/threads/libc.a(shr.o)
         /usr/lib/libpthreads.a(shr_comm.o)
         /usr/lib/libpthreads.a(shr_xpg5.o)
         /opt/freeware/lib/libintl.a(libintl.so.1)
         /opt/freeware/lib/libreadline.a(libreadline.so.7)
         /opt/freeware/lib/libiconv.a(libiconv.so.2)
         /opt/freeware/lib/libz.a(libz.so.1)
         /opt/freeware/lib/libbz2.a(libbz2.so.1)
         /unix
         /usr/lib/libcrypt.a(shr.o)
         /opt/freeware/lib/libiconv.a(shr4.o)
         /usr/lib/libcurses.a(shr42.o)

 

 

# gpg –version
gpg (GnuPG) 1.4.22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

 

Home: ~/.gnupg
Supported algorithms:
Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

 

 

Rsync copy files with root privileges between servers with root superuser account disabled

Tuesday, December 3rd, 2019

 

rsync-copy-files-between-two-servers-with-root-privileges-with-root-superuser-account-disabled

Sometimes on servers that follow high security standards in companies following PCI Security (Payment Card Data Security) standards it is necessery to have a very weird configurations on servers,to be able to do trivial things such as syncing files between servers with root privileges in a weird manners.This is the case for example if due to security policies you have disabled root user logins via ssh server and you still need to synchronize files in directories such as lets say /etc , /usr/local/etc/ /var/ with root:root user and group belongings.

Disabling root user logins in sshd is controlled by a variable in /etc/ssh/sshd_config that on most default Linux OS
installations is switched on, e.g. 

grep -i permitrootlogin /etc/ssh/sshd_config
PermitRootLogin yes


Many corporations use Vulnerability Scanners such as Qualys are always having in their list of remote server scan for SSH Port 22 to turn have the PermitRootLogin stopped with:

 

PermitRootLogin no


In this article, I'll explain a scenario where we have synchronization between 2 or more servers Server A / Server B, whatever number of servers that have already turned off this value, but still need to
synchronize traditionally owned and allowed to write directories only by root superuser, here is 4 easy steps to acheive it.

 

1. Add rsyncuser to Source Server (Server A) and Destination (Server B)


a. Execute on Src Host:

 

groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files as root src_host' -d /home/rsyncuser -m rsyncuser

 

b. Execute on Dst Host:

 

groupadd rsyncuser
useradd -g 1000 -c 'Rsync user to sync files dst_host' -d /home/rsyncuser -m rsyncuser

 

2. Generate RSA SSH Key pair to be used for passwordless authentication


a. On Src Host
 

su – rsyncuser

ssh-keygen -t rsa -b 4096

 

b. Check .ssh/ generated key pairs and make sure the directory content look like.

 

[rsyncuser@src-host .ssh]$ cd ~/.ssh/;  ls -1

id_rsa
id_rsa.pub
known_hosts


 

3. Copy id_rsa.pub to Destination host server under authorized_keys

 

scp ~/.ssh/id_rsa.pub  rsyncuser@dst-host:~/.ssh/authorized_keys

 

Next fix permissions of authorized_keys file for rsyncuser as anyone who have access to that file (that exists as a user account) on the system
could steal the key and use it to run rsync commands and overwrite remotely files, like overwrite /etc/passwd /etc/shadow files with his custom crafted credentials
and hence hack you 🙂
 

Hence, On Destionation Host Server B fix permissions with:
 

su – rsyncuser; chmod 0600 ~/.ssh/authorized_keys
[rsyncuser@dst-host ~]$


An alternative way for the lazy sysadmins is to use the ssh-copy-id command

 

$ ssh-copy-id rsyncuser@192.168.0.180
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed — if you are prompted now it is to install the new keys
root@192.168.0.180's password: 
 

 

For improved security here to restrict rsyncuser to be able to run only specific command such as very specific script instead of being able to run any command it is good to use little known command= option
once creating the authorized_keys

 

4. Test ssh passwordless authentication works correctly


For that Run as a normal ssh from rsyncuser

On Src Host

 

[rsyncuser@src-host ~]$ ssh rsyncuser@dst-host


Perhaps here is time that for those who, think enabling a passwordless authentication is not enough secure and prefer to authorize rsyncuser via a password red from a secured file take a look in my prior article how to login to remote server with password provided from command line as a script argument / Running same commands on many servers 

5. Enable rsync in sudoers to be able to execute as root superuser (copy files as root)

 


For this step you will need to have sudo package installed on the Linux server.

Then, Execute once logged in as root on Destionation Server (Server B)

 

[root@dst-host ~]# grep 'rsyncuser ALL' /etc/sudoers|wc -l || echo ‘rsyncuser ALL=NOPASSWD:/usr/bin/rsync’ >> /etc/sudoers
 

 

Note that using rsync with a ALL=NOPASSWD in /etc/sudoers could pose a high security risk for the system as anyone authorized to run as rsyncuser is able to overwrite and
respectivle nullify important files on Destionation Host Server B and hence easily mess the system, even shell script bugs could produce a mess, thus perhaps a better solution to the problem
to copy files with root privileges with the root account disabled is to rsync as normal user somewhere on Dst_host and use some kind of additional script running on Dst_host via lets say cron job and
will copy gently files on selective basis.

Perhaps, even a better solution would be if instead of granting ALL=NOPASSWD:/usr/bin/rsync in /etc/sudoers is to do ALL=NOPASSWD:/usr/local/bin/some_copy_script.sh
that will get triggered, once the files are copied with a regular rsyncuser acct.

 

6. Test rsync passwordless authentication copy with superuser works


Do some simple copy, lets say copy files on Encrypted tunnel configurations located under some directory in /etc/stunnel on Server A to /etc/stunnel on Server B

The general command to test is like so:
 

rsync -aPz -e 'ssh' '–rsync-path=sudo rsync' /var/log rsyncuser@$dst_host:/root/tmp/


This will copy /var/log files to /root/tmp, you will get a success messages for the copy and the files will be at destination folder if succesful.

 

On Src_Host run:

 

[rsyncuser@src-host ~]$ dst=FQDN-DST-HOST; user=rsyncuser; src_dir=/etc/stunnel; dst_dir=/root/tmp;  rsync -aP -e 'ssh' '–rsync-path=sudo rsync' $src_dir  $rsyncuser@$dst:$dst_dir;

 

7. Copying files with root credentials via script


The simlest file to use to copy a bunch of predefined files  is best to be handled by some shell script, the most simple version of it, could look something like this.
 

#!/bin/bash
# On server1 use something like this
# On server2 dst server
# add in /etc/sudoers
# rsyncuser ALL=NOPASSWD:/usr/bin/rsync

user='rsyncuser';

dst_dir="/root/tmp";
dst_host='$dst_host';
src[1]="/etc/hosts.deny";
src[2]="/etc/sysctl.conf";
src[3]="/etc/samhainrc";
src[4]="/etc/pki/tls/";
src[5]="/usr/local/bin/";

 

for i in $(echo ${src[@]}); do
rsync -aPvz –delete –dry-run -e 'ssh' '–rsync-path=sudo rsync' "$i" $rsyncuser@$dst_host:$dst_dir"$i";
done


In above script as you can see, we define a bunch of files that will be copied in bash array and then run a loop to take each of them and copy to testination dir.
A very sample version of the script rsync_with_superuser-while-root_account_prohibited.sh 
 

Conclusion


Lets do short overview on what we have done here. First Created rsyncuser on SRC Server A and DST Server B, set up the key pair on both copied the keys to make passwordless login possible,
set-up rsync to be able to write as root on Dst_Host / testing all the setup and pinpointing a small script that can be used as a backbone to develop something more complex
to sync backups or keep system configurations identicatial – for example if you have doubts that some user might by mistake change a config etc.
In short it was pointed the security downsides of using rsync NOPASSWD via /etc/sudoers and few ideas given that could be used to work on if you target even higher
PCI standards.

 

How to start / Stop and Analyze system services and improve Linux system boot time performance

Friday, July 5th, 2019

systemd-components-systemd-utilities-targets-cores-libraries
This post is going to be a very short one and to walk through shortly to System V basic start / stop remove service old way and the new ways introduced over the last 10 years or so with the introduction of systemd on mass base across Linux distributions.
Finally I'll give you few hints on how to check (analyze) the boot time performance on a modern GNU / Linux system that is using systemd enabled services.
 

1. System V and the old days few classic used ways to stop / start / restart services (runlevels and common wrapper scripts)

 

The old fashioned days when Linux was using SystemV / e.g. no SystemD used way was to just go through all the running services with following the run script logic inside the runlevel the system was booting, e.g. to check runlevel and then potimize each and every run script via the respective location of the bash service init scripts:

 

root@noah:/home/hipo# /sbin/runlevel 
N 5

 

Or on some RPM based distros like Fedora / RHEL / SUSE Enterprise Linux to use chkconfig command, e.g. list services:

~]# chkconfig –list

etworkManager  0:off   1:off   2:on    3:on    4:on    5:on    6:off
abrtd           0:off   1:off   2:off   3:on    4:off   5:on    6:off
acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
anamon          0:off   1:off   2:off   3:off   4:off   5:off   6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
auditd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off

And to start stop the service into (default runlevel) or respective runlevel:

 

~]#  chkconfig httpd on

~]# chkconfig –list httpd
httpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off

 

 

~]# chkconfig service_name on –level runlevels

 


Debian / Ubuntu and other .deb based distributions with System V (which executes scripts without single order but one by one) are not having natively chkconfig but instead are famous for update-rc.d init script wrapper, here is few basic use  of it:

update-rc.d <service> defaults
update-rc.d <service> start 20 3 4 5
update-rc.d -f <service>  remove

Here defaults means default set boot runtime for system and numbers are just whether service is started or stopped for respective runlevels. To check what is your default one simply run /sbin/runlevel

Other useful tool to stop / start services and analyze what service is running and which not in real time (but without modifying boot time set for a service) – more universal nowadays is to use the service command.

root@noah:/home/hipo# service –status-all
 [ + ]  acpid
 [ – ]  alsa-utils
 [ – ]  anacron
 [ + ]  apache-htcacheclean
 [ – ]  apache2
 [ + ]  atd
 [ + ]  aumix

root@noah:/home/hipo# service cron restart/usr/sbin/service command is just a simple wrapper bash shell script that takes care about start / stop etc. operations of scripts found under /etc/init.d

For those who don't want to tamper with too much typing and manual configuration there is an all distribution system V compatible ncurses interface text itnerface sysv-rc-conf which could make your life easier on configuring services on non-systemd (old) Linux-es.

To install on Debian distros:

debian:~# apt-get install sysv-rc-conf

debian:~# sysv-rc-conf


SysV RC Conf desktop on GNU Linux using sysv-rc-conf systemV and systemd
 

2. SystemD basic use Start / stop check service and a little bit of information
for the novice

As most Linux kernel based distributions except some like Slackware and few others see the full list of Linux distributions without systemd (and aha yes slackw. users loves rc.local so much – we all do 🙂  migrated and are nowadays using actively SystemD, to start / stop analyze running system runnig services / processes

systemctl – Control the systemd system and service manager

To check whether a service is enabled

systemctl is-active application.service

To check whether a unit is in a failed state

systemctl is-failed application.service

To get a status of running application via systemctl messaging

# systemctl status sshd
● ssh.service – OpenBSD Secure Shell server Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled) Active: active (running) since Sat 2019-07-06 20:01:02 EEST; 2h 3min ago Main PID: 1335 (sshd) Tasks: 1 (limit: 4915) CGroup: /system.slice/ssh.service └─1335 /usr/sbin/sshd -D юли 06 20:01:00 noah systemd[1]: Starting OpenBSD Secure Shell server… юли 06 20:01:02 noah sshd[1335]: Server listening on 0.0.0.0 port 22. юли 06 20:01:02 noah sshd[1335]: Server listening on :: port 22. юли 06 20:01:02 noah systemd[1]: Started OpenBSD Secure Shell server.

To enable / disable application with systemctl systemctl enable application.service

systemctl disable application.service

To stop / start given application systemcl stop sshd

systemctl stop tor

To reload running application

systemctl reload sshd

Some applications does not have the right functionality in systemd script to reload configuration without fully restarting the app if this is the case use systemctl reload-or-restart application.service

systemctl list-unit-files

Then to view the content of a single service unit file:

:~# systemctl cat apache2.service
# /lib/systemd/system/apache2.service
[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
Environment=APACHE_STARTED_BY_SYSTEMD=true
ExecStart=/usr/sbin/apachectl start
ExecStop=/usr/sbin/apachectl stop
ExecReload=/usr/sbin/apachectl graceful
PrivateTmp=true
Restart=on-abort

[Install]
WantedBy=multi-user.target


converting-traditional-init-scripts-to-systemd-graphical-diagram

systemd's advancement over normal SystemV services it is able to track and show dependencies
of a single run service for proper operation on other services

:~# systemctl list-dependencies sshd.service

 


● ├─system.slice
● └─sysinit.target
●   ├─dev-hugepages.mount
●   ├─dev-mqueue.mount
●   ├─keyboard-setup.service
●   ├─kmod-static-nodes.service
●   ├─proc-sys-fs-binfmt_misc.automount
●   ├─sys-fs-fuse-connections.mount
●   ├─sys-kernel-config.mount
●   ├─sys-kernel-debug.mount
●   ├─systemd-ask-password-console.path
●   ├─systemd-binfmt.service
….

.

 

You can also mask / unmask service e.g. make it temporary unavailable via systemd with

sudo systemctl mask nginx.service

it will then appear as masked if you do list-unit-files

If you want to change something on a systemd unit file this is done with

systemctl edit –full nginx.service

In case if some modificatgion was done to systemd service files e.g. lets say to
/etc/systemd/system/apache2.service or even you've made a Linux system Upgrade recently
that added extra systemd service config files it will be necessery to reload all files
present in /etc/systemd/system/* with:

systemctl daemon-reload


Systemd has a target states which are pretty similar to the runlevel concept (e.g. runlevel 5 means graphical etc.), for example to check the default target for a system:

One very helpful feature is to restart systemd but it seems this is not well documented as of now and though this might work after some system package upgrade roll-outs it is always better to reboot the system, but you can give it a try if restart can't be done due to application criticallity.

To restart systemd and its spawned subprocesses do:
 

systemctl daemon-reexec

 

root@noah:/home/hipo# systemctl get-default
graphical.target


 to check all targets possible targets

root@noah:/home/hipo# systemctl list-unit-files –type=target
UNIT FILE                 STATE   
basic.target              static  
bluetooth.target          static  
busnames.target           static  
cryptsetup-pre.target     static  
cryptsetup.target         static  
ctrl-alt-del.target       disabled
default.target            static  
emergency.target          static  
exit.target               disabled
final.target              static  
getty.target              static  
graphical.target          static  

you can put the system in Single user mode if you like without running the good old well known command:

/sbin/init 1 

command with

systemctl rescue

You can even shutdown / poweroff / reboot system via systemctl (though I never did that and I don't recommend) 🙂
To do so use:

systemctl halt
systemctl poweroff
systemctl reboot


For the lazy ones that don't want to type all the time like crazy to configure and manage simple systemctl set services take a look at chkservice – an ncurses text based menu systemctl management interface

As chkservice is relatively new it is still not present in stable Stretch Debian repositories but it is in current testing Debian unstable Buster / Sid – Testing / Unstable distribution and has installable package for Ubuntu / Arch Linux and Fedora

chkservice-Linux-systemctl-ncurses-text-menu-service-management-interface-start-chkservice
Picture Source Tecmint.com

chkservice linux help screen


3. Analyzing and fix performance boot slowness issues due to a service taking long to boot


The first very useful thing is to know how long exactly all daemons / services got booted
on your GNU / Linux OS.

linux-server:~# systemd-analyze 
Startup finished in 4.135s (kernel) + 3min 47.863s (userspace) = 3min 51.998s

As you can see it reports both the kernel boot time and userspace (surrounding services
that had to boot for the system to be considered fully booted).


Once you have the system properly booted you have a console or / ssh access

root@pcfreak:/home/hipo# systemd-analyze blame
    2min 14.172s tor@default.service
    1min 40.455s docker.service
     1min 3.649s fail2ban.service
         58.806s nmbd.service
         53.992s rc-local.service
         51.458s systemd-tmpfiles-setup.service
         50.495s mariadb.service
         46.348s snort.service
         34.910s ModemManager.service
         33.748s squid.service
         32.226s ejabberd.service
         28.207s certbot.service
         28.104s networking.service
         23.639s munin-node.service
         20.917s smbd.service
         20.261s tinyproxy.service
         19.981s accounts-daemon.service
         18.501s loadcpufreq.service
         16.756s stunnel4.service
         15.575s oidentd.service
         15.376s dev-sda1.device
         15.368s courier-authdaemon.service
         15.301s sysstat.service
         15.154s gpm.service
         13.276s systemd-logind.service
         13.251s rsyslog.service
         13.240s lpd.service
         13.237s pppd-dns.service
         12.904s NetworkManager-wait-online.service
         12.540s lm-sensors.service
         12.525s watchdog.service
         12.515s inetd.service


As you can see you get a list of services time took to boot in secs and you can
further debug each of it to find out why it boots so slow (netwok / DNS / configuration isssue whatever).

On a servers it is useful to look up for some processes slowing it down like gdm.service etc.

 

Close up words rant on SystemD vs SysemV

init-and-systemd-comparison-commands-linux-booting-1

A lot could be ranted on what is better systemd or systemV. I personally hated systemd since day since I saw it being introduced first in Fedora / CentOS linuxes and a bit later in my beloved desktop used Debian Linux.
I still remember the bugs and headaches with systemd's intruduction as it is with all new the early adoption of technology makes a lot of pain in the ass.
Eventually systemd has become a standard and with my employment as a contractor through Itelligence GmBH for SAP AG I now am forced to work with systemd daily on SLES 12 based Linuces and I was forced to get used to it. 
But still there is my personal preference to SystemV even though the critics of slow boot etc.but for managing a multitude of Linux preinstalled servers like Virtual Machines and trying to standardize a Data Center with Tens of Thousands of Linuxes running on different Hypervisors VMWare / OpenXen + physical hosts etc. systemd brings a bit of more standardization that makes it a winner.

Classical System Administration is dying – you either say hello to DevOps and SRE or move to programming or other business if you can

Wednesday, August 29th, 2018

sysadmin-hell-being-a-sysadmin-is-easy-its-like-riding-a-bike

1. Back in the normal computer old Sys Admin days before the new Age of Computing (the Cloud HELL)

I've been in the system (server) administration business for more than 15 years. We started as kids dreaming about managing big Data Centers having ultimate control over servers data and services and in a sense the beginning of the 2000s looked like the system adminsitration will be among the most promising and profitable professions for the coming 30 years or so.

The amount of servers installed were booming, the Domain Registrantrant Ballon (Dot-Com Bubble) and the appearing need for everyone to have and run a website with the connected hardware and software (OS) needs made the sysadmin of the time like a precious asset for a company and business …

Many companies (small and mid-sized) still did not have a separate role for sys admin, but hired some crazy IT enthusiast that was doing a lot of the sysadm job for them.

It was wild years of freedom for the common IT specialist with a server software install / update / maintenance background.

The complexity level to install configure or tune for performance a (UNIX) like server be it GNU / Linux or FreeBSD or farm of servers was also high and there was little documentation than today and a lot of custom tweaks (scripts) to develop to make things working and system administration job was way more custom than today.
In other words the sys admin was a digital artist just like the UI / Web designer or the common programmer (who was way more advanced and hack, thought oriented) than todays "coders" most of which knows no damn thing but are a great Human Robots serving the functionos of ("Google Search for some ind of Programming language code" then "Copy" and "Paste" into a buggy module / script / application function) and then of course as a result you have a large clumsy (softwares) programs which eat a lot of Server resources (often crash – that's especially true for Java based applications) in the background and get respawned (which does severely load the servers CPUs / Memory) but as the end user is not aware of that it is considered a job finely done.

computers-kills-people-silence-means-security


2. The IT Computing and SysAdmin / Programmer Jobs offered today

In other words nowadays computing is becoming a mess, just like a system is complicating it becomes more prone to failures, the same happens with modern informatics. The chaos of programming languages code and concepts (especially), the abstracts makes a programming code harder to debug than in the past (of course that depends on the programmer too), but as most programmers are totally lame and doesn't understand even basic Hardware / Electronics concepts but are more of a Code Monkeys (yes I can say today's programmers are not really a programmers but a CODE MONKEYS !).

The result for the avarage sys admin is that the developed software are less and less custom but written in a way (to just run it on a server) and usually the sys admin ends up with less and less options for modification or debug problems of the software. As the tendency of installable services / programs (I am talking about the proprietary ones) are becoming more and more monolithic of nature.
As a consequence that starts making the classical system administration as most of todays softwares can be installed even by a highly trained monkeys (no real sysadm needed) and even if you work as a sysadmin it is very likely you are not involved in interesting job but doing more and more routine and burecracy work (which is hell at least for me – as one of my primary motivators to start a career in the IT field and specifically in the field of System Administration that back in the day the system administrator used to be a more important person for a company as a whole company infrastructure depended on the work of that single Super Man that made possible the Internet Accessibility for office users, made possible Linux / Windows servers to operate fine with a bunch of websites and some crazy softwares and platforms, and even took a periodic maintenance of an Office Workers PCs, not to mention the responsibilities to do the frequent data backups, do a support functions (talk heavily on the phone with customers with issues etc.) and help programmers set-up their crazy testing environments (developed project code) on a testing servers etc.

It was the golden age of system administration … and perhaps a golden age for the ones involved in the field of Computing .. really …

3. What if you end up to be a Jobless System Administrator today? What does current sys admin Job Market Place look like?


Have you listened to Venom (black metal band) song – Welcome to Hell?  … its like that ..

Yes, that's the worst nightmare for most of us sys admins , becoming jobless due to company bankruptcy, dismissal or just a desire for a rest for some time from the over active job to talk over the phone with uneasy and angry customers.
Al this put you you in a very harsh situation, because the Classical System Administartor jobs from the past such as building a Strong Company Firewall with IPTABLES or BSD PF is nowdays done by some pre-purchased router such as:

McAffee, Palo Alto, JuniperSRX 2020, Next Generation (firewall as a service such as Cato Networks), Kaspersky, Fortinet, (if you're lucky pfSense), Comodo Internet Security, Zone Alarm (the possible list of sh*t goes on and on …)

In other words businesses nowadays, prefer to buy a ready solution and most of this solutions even though being configurable, often have a weird interfaces and force the user to use a ready set of firewall rules (policies) rather than building ones from scratch … and most of the softwares can be configured by a normal non sysadmin anyways so mostly or soon the sysadm is not needed.

devops-diagram-explained-512px-Devops-toolchain.svg

If in the past you have build things from source or deployed / configured things server by server and each of your servers as a consequence had its kind of own spirit, because of the many custom things placed on it, the current situation with sysadmin job are mass deployments of pre-bundled packages (DevOpsDevelopment Operations – another crazy business non-sense buzzword that describes server scripting automation development) as a DevOps (SysAdmin) which is some kind of Hybdir between a programmer / scripter / db developer / and scripter you have to be eloquent or at least have some basic knowledge in mass deploy tools such as Docker, Ansible, Chef, Puppet, TeamCity, Bamboo, Fabric, Etc.
and to add even more hell to the hell, in most System Administration jobs you perhaps won't manage your own company data even but you will have to deal with third party vendors such as AWS Amazon or store the company important data in some external Cloud Storage service (except if you don't have the option to choose for a custom Own Cloud solution)

But often this is not enough you have to be more or less aware or have some experience with some SRE (Site Reliability Engineering)

But wait, that's not enough you need to be also a good Team Player communicate to a good number of often lame burecrats / lame progammers / a manager over your head that usually does not know shit about technology / a project manager / some Database guys that oten have a very questionable knowledge in Database programming maintenance .. etc. …  and the worst (in my humbe opinion) is that you have to spend 2, 3 as a mimumum daily in a non-sense meetings over proprietary non-free software program such as Skype For Business or Web Room meeting online such as WebEx with people that have little to know idea about technology or are presenting professionals but have a very questionable amount of knowledge in their field …

To summarize modern SysAdmin jobs, just like all other jobs are slavery but with the difference that in most common daily jobs most people have more freedom and are less dependent for their daily work, than you end up as a New Age of Computing Sys Admin.

system-administrator-stress-October-Poll-Sysadmin-Results-stress

Oh yeah and lets not forget the high amounts of STRESS you get daily as a sysadmin that for some is almost 24/7 especially for people who manage a large networks or server infrastructures. Suppose you migrate a Web services, database service, mail server, DNS record etc. and you make a minor mistake so the users can't access the service, guess who will be fired first ?! YOU !!! Even if you don't get hired, you'll be murmored and send for some kind of meaningless training just because you did a mistake (which is very normal, as every human daily days tons of mistakes) …

Another thing is if you're truely dedicated to system administration profession and you spend hours reading and learning new technologies (which in the field of system administration is inevitable) or just doing work from home as a freelancer to get some extra bucks and you don't have to actively sport (Running, Biking, Fitness, Mountaun Riding, whatver …), your Spinal problems and Herniated Discs (Neck or Waist) is to soon knock your door
and stay with you until your death bed.

 

But that's not all of the hurdles, many of the System Administrator like jobs of today require you to have an overview knowledge on Virtualization technologies such as VMWare ESX, VServer … and have a good idea about VPS management and even some employeers require a knowledge in Astrerisk IP PBX (Open Source Communiation Software) or other IP Telelphony software strangie …

Dear sysadmin collegues, my opinion is this kind of requirements are a little bit higher and almost impossible to match (or there are none to any living flesh) that attains all this knowledge or they will ever be.

… But even if you get employeed (and you tricked the HR interviee that you own the SuperMan + Batman + Robocop + You name your favourite movie superhero superpowers and went through the other interview (hell) circles) … finally you get hired and you end up often part of projects that are already seriously messed up from the start or developed in a way that even if succeed in a short term, guarantees a long term failure.

Oh the hirement process is also a lot of enjoyment for the burecracy freak, you have to fill in a number of documents, describing tons of information, provide tons of documents, certifications, talk a number of times on the phones with inadequate Human Resource representative (usually highly brainwashed ppl) "specialist" that knows shit about technology … Then you have to go to a few more selections, interviews further with a technical guy, fill in tests online (maybe not always) and finally talk to a company manager.

All above screening and selection I'm desribing of-course is featuring large corporations (which are among the little) that offers some decent sallaries like 1500 – 1800 EUR (for Eastern Europe) or 3000 – 3500 for rest of Western Europe (if you're a lucky American citizen you might earn up to 10 000 – 11 000 $).
The advantages of the large corporation besides the so-so sallary is the sense of security (that you want be jobless just next year or two from your day 1 in the company).

You can always become a sysadmin in a start-up company but finding such is also nowadays a real pain in the ass and even if you have a 12000+ unique a day visits site such as mine and you offer your sys admin skills for really cheap , you still will have troubles in finding clients / employeer for whom you can practice your skills and make a living as a SysAdmin.

That's pretty weird for me especially with the fact that everyone is tubing that more and more IT specialists are required ..

Anyways assuming you have the "luck" to get hired in a large corporation such as IBM you will have to do a very tedious job, such as either Backup with (IBM Data Protect), Veritas Backup, Barracuda Backup, HP Data Protector or similar software, only do build or deploy new servers, web services, databases or whatever else. E.g. your type of work is likely to be monotonоus and boring and will offer you not much than learning a little bit more about the technology you're already acquainted to ..

Moreover, because in modern IT, human freedom is not really respected … you either comply to the company brainwashing strategies a bulk shit procedures or you get fired, you either become a small wheel in the failing machine (here i mean most large companies you might end up hired nowdays reached its peak state are into a decline) and a logical result is living in constant fear that they might fire you end you might end up jobless or you stand up for what you're in the company and be careless about political correctnes and you quickly get inconvenient, politically incorrect (oh yes I forgot to mention this other craziness if you happen to be employeed you have to be politically correct) and do periodically a stupid exhausting Trainings (I prefer to call them a brainwashing session as most of the trainings are not teaching you anything but just wash your head to comply to shit). But if that Hell is not enough in the large corporation in order to look "normal" you have to partcipate in the Non-Sense Teambuildings, with team mates you have little to know affection (with the very same people you spend 5 days a week, now you have spend 1 /2 more day. every month or so …

long-term-ago-people-who-sacrifice-their-time-sleep-family-food-laughter-were-called-saints-now-they-are-called-it-professionals

So welcome to modern HELL OF system administration, or better to say welcome to the Cult of the large corporation businesses.

4. What are your options if you end up as a poor old school sys admin on the job market?

If you have a long history as a sys admin and computers become too boring for you like my case, you can always think about migrating to a Management position in the field of IT (this in most cases means doing nothing all day long pretending that you understand management and talking shit (laughing in a group), being present in a crazy management meeting whose essense is a shit talk all day long … with a bunch of people who facebook / youtube all day long talk about Latest Cars models and how they wish to have a half million car, watch and talk about fuzzy hand clocks, cheeks, plan their next vacation or where to have the lunch and housing (apartments) all day long (in some more extravagant cases you have some guys being wacky talking about drugs, sex and  rock-and-roll.)
but the unpleasent surprise here is even as a Manager you will probably have to start working for a corporation and have the same depressing atmosphere of people standing in front of their computers (tailor like) all their long with the only difference you will have to speak more with a number of computer addict zoombies (left without much options) that are doing some monkey programming / coding or Services job day after …

Other option you have is to move out of the virtual business at all and get into a real works industry such as getting a Construction job (but believe me such transitions, though I heard of are too painful) and sooner or later you will get back to computing virtual business ..

I have a friend Jose Mathew, whose exit poll from the IT business was to graduate a 2 years post-university course to become a professional Chef (cook) in restaurant but after already few years employeed as a Cook, he is again considering getting back into the IT and paradoxically he wants to enter the niche of Network Administrator (which I forget to mention earlier in that article).

The Network Administrators are among lucky System Administartors job profiles because there job is depending nowadays mostly on their CCNA / CCNP certificate, there experience with network routers such as Juniper, LinkSys, Cisco, Avaya etc.  But the big problem with being one of the guys is that the employment jobs offered are much less than the general Senior or Junior System Administrator (that is more free software Linux based).

The most luckly ones are the Windows System Administrators as the amount of such that are looked up on the market at the moment of writting this article is relatively high. The type of job for Win Sys Admin offered on the market as long as I researched is for Windows Sys Admins that have a good amount of experience / knowledge (with Active Directory) domain controller.

There might be some enjoyment for the Win SysAdmin if you have to develop your own PowerShell scripts or do some kind of automations on a domain controller level and from that perspective this job positions are attractive, but unfortunately that comes at the price for being a totally Microsoft software dependent (junkie).
But in overall it is much easier for the ordinary Win Sys Admin than the Unix one because of the reason Windows Servers and related scripting automation solutions is generally much easier to learn and many of the things you have to make up yourself on a common *NIX OS are already available in Windows in the form of some proprietary extra software you have to buy …
However for people as me who are involved in the UNIX world for the last 15 years, it want be easy to migrate to Windows System Administartor.

In my previous employment Job in Hewlett Packard (and later DXC) I have to do a lot of Windows System Administration jobs and I have to says, that was too easy in general but the downside of deploying some third party software on Windows in case of failure is the debugging on Windows is generally harder task than on Linux / BSD..

Another option if you want to move from the field of System Administarton is to start your own company in either Sys Admin or Programming field or Website building, Website hosting.
That's easy especially if you have a good amount of experience but the problem with this is you need a partner and often finding a partner is a tedious job …
Plus most of the clients you can get for your business are already clients of the Large Sharks corporations and at best you or your company might have to work as a contractor for the uncle SAM corporations ..

Of course as a sysadmin you can always repair computers and could try to start a business of computer (OS) repair niche, but as the competition in the field is enormous and you will have to work like crazy to be able to make a decent living, plus it is very likely that you bankrupt, because of lack of enough clients in need to fix their OS (as most people nowadays have learned on how to install Windows and basic surrounding softwares) …

 



system-administration-is-dying-grave-RIP-sysadmins

 


If you have land like my parents you can try to make a living by growing vegetables like Bio potatoes, cucumbers, tomatoes, cabbage, onions, garlic and other fruits such as Apples, Pears, Walnuts, Peaches etc.
The bio-fruits growing business though profitable in western societies is way from profitable in Eastern world so if you happen to be in some eastern country and you want to make good moving to the fruit growing / selling business might not make you rich but at least you will have benefits for your health because of the village / land work + you will have a little bit more independence and your mind will be much clearer. If you decide to try a physical work like this, your concentration level will improve as most IT industry people because of the long hours of computer madness jobs slowly start totally loose focus and often the stress of the Computer works impare memory ..

 

 

Another option for exit from System Administration industry if you have some little marketing experience or background is to move to become a Marketing or E-Marketing SEO specialist, that's not a bad option but the problem is still you will bundled in a permament marriage with the computer and the sallary you will get would most likely no different from the one you will get as a system administrator.
So just like any other Computer related job in order to keep in shape you either have to go Fitness 2 / 3 times a week or actively sport something, otherwise you might experience a growing decline in health over time (just like you already might have in sys admin field).

To sum up being a sysadmin is very enjoyable fun and bright profession, the only small problem is most true dedicated system administrators are know tend to suffer constant anxiety, hyper activity, have physical health issues, suffer forms of depressions or have mental issues (perhaps because of the inhuman amount of information they have to process daily and the large amounts of hard alcohol vodka, beer etc. 🙂 consumed as a mean of anti-depressant) …
But it seems other IT specialists I know such as programmers tend to often suffer similar problems. Besides that many of the people involved in sysadmin business or IT have troubles finding decent woman to marry, as they tend to become more or less anti-social (or gradually loose their ability for proper interactivion with human) because of the fact most of their life is being led in the virtual reality online.

But lets be optimistic, perhaps there are many sysadmins who have the luck to have started a normal life in a normal company and managed their life well with family and kids it is just I haven't met them yet 🙂

I know this post was quite a lot of rant and I would like to excuse anyone who was bored to read all this mess, but I felt obliged to share about this problem as the things are rushing through my mind for over a two years now and we had quite a discussions with friends / collegues on the realization that the system administration job is loosing its attractivity and that the new age of (cloud) computing is pushing computer science to move towards a bad and dark path which makes the individual both employee and user more dependant and less free  …