Posts Tagged ‘Journal’

List and fix failed systemd services after a Linux OS upgrade and how to get full info about a systemd service from the journal log

Friday, February 25th, 2022

systemd-logo-unix-linux-list-failed-systemd-services

I have recently upgraded a number of machines from Debian 10 Buster to Debian 11 Bullseye. As always, the update had some issues on some machines, such as problems with package dependencies, having to change a number of external package repositories to match the Bullseye deb packages etc. On some machines the update was less painful than on others, but the common outcome was that most of the machines ended up after the update with one or more failed systemd services. It could be that some of the machines already had these failed services present and I never checked them since the previous update from Debian 9 -> Debian 10, or it was just some mess I'd left behind in the hurry when doing software installations in the past. That doesn't matter anyway; the fact was that I had to deal with a number of failed systemctl services, which I managed to track via the Failed service message on system boot on one of the physical machines, while on the OpenXen VTY console the rest of the Virtual Machines showed some Failed messages after the update. Thus I've spent a good amount of time, an overall of a day or two, fixing strange failed services. This is how this small article was born, in an attempt to help sysadmins or any home Linux desktop users who have updated their Debian Linux / Ubuntu or any other deb-based distribution but, due to the chaotic nature of Linux, have ended up with the same strange Failed services and look for a way to find the source of the failures and get rid of the problems.
Systemd is a very complicated system and, in my opinion as a long-time sysadmin, it creates more problems than it solves, but okay, for today's megalomaniac mindset it matches well.

Systemd_components-systemd-journalctl-cgroups-loginctl-nspawn-analyze.svg

 

1. Check the journal for errors, running service irregularities and so on
 

The first thing to do to track for errors right after the update is to take some minutes and closely check the journalctl log for any strange errors. Even on well maintained Unix machines, this journal log can point you to a problem that is not fatal, but where some process or component is still malfunctioning in the background, which you would like to solve:
 

root@pcfreak:~# journalctl -x
Jan 10 10:10:01 pcfreak CRON[17887]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17887]: USER_END pid=17887 uid=0 auid=0 ses=340858 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17888]: CRED_DISP pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17888]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17888]: USER_END pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17884]: CRED_DISP pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17884]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17884]: USER_END pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17886]: CRED_DISP pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/c>
Jan 10 10:10:01 pcfreak CRON[17886]: pam_unix(cron:session): session closed for user www-data
Jan 10 10:10:01 pcfreak audit[17886]: USER_END pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permi>
Jan 10 10:10:08 pcfreak NetworkManager[696]:  [1641802208.0899] device (eth1): carrier: link connected
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:19 pcfreak NetworkManager[696]:  [1641802219.7920] device (eth1): carrier: link connected
Jan 10 10:10:19 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:20 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:22 pcfreak NetworkManager[696]:  [1641802222.2772] device (eth1): carrier: link connected
Jan 10 10:10:22 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:23 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:33 pcfreak sshd[18142]: Unable to negotiate with 66.212.17.162 port 19255: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diff>
Jan 10 10:10:41 pcfreak NetworkManager[696]:  [1641802241.0186] device (eth1): carrier: link connected
Jan 10 10:10:41 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx

If you want to check only the latest journal log messages, use the -x (add explanatory message catalog texts) and -e (jump to the end of the journal pager) options:

root@pcfreak:~# journalctl -xe

Feb 25 13:08:29 pcfreak audit[2284920]: USER_LOGIN pid=2284920 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=login acct=28696E76616C>
Feb 25 13:08:29 pcfreak sshd[2284920]: Received disconnect from 177.87.57.145 port 40927:11: Bye Bye [preauth]
Feb 25 13:08:29 pcfreak sshd[2284920]: Disconnected from invalid user ubuntuuser 177.87.57.145 port 40927 [preauth]
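
To cut through the noise, it also helps to filter the journal by message priority. Below is a small sketch with standard journalctl options, handy to see only real errors since the current boot:


root@pcfreak:~# journalctl -b -p err


The -b option limits output to the current boot and -p err shows only messages with priority error or worse; use -p warning if you want warnings included as well.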

The next thing to do after the update was to get a list of the failed services only.


2. List all failed systemd services which were supposed to be running

root@pcfreak:/root # systemctl list-units | grep -i failed
● certbot.service                                                                                                       loaded failed failed    Certbot
● logrotate.service                                                                                                     loaded failed failed    Rotate log files
● maldet.service                                                                                                        loaded failed failed    LSB: Start/stop maldet in monitor mode
● named.service                                                                                                         loaded failed failed    BIND Domain Name Server


An alternative way is with the --failed option:

hipo@jeremiah:~$ systemctl list-units --failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
7 loaded units listed.

 



To get the full list of unit states that you can filter on, pass help as the state:
 

# systemctl --state=help
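
Any of the listed states can then be combined (comma-separated) with list-units; for instance, a quick sketch to show units that are masked or whose unit file has vanished:


# systemctl list-units --all --state=masked,not-found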
Show service properties


Check whether a service has failed or is in another state, and check the default systemd variables set for it:

root@jeremiah:~# systemctl is-failed vboxweb.service
inactive
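
As is-failed also reports its result via the exit code (0 when the unit is in failed state), it is handy in small shell automations; a minimal sketch, assuming named.service is the failed unit you want to retry:


root@jeremiah:~# systemctl is-failed --quiet named.service && systemctl restart named.service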

# systemctl show haproxy
Type=notify
Restart=always
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
SuccessExitStatus=143
MainPID=304858
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success

Full output of the above command is dumped in show_systemctl_properties.txt
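
Instead of dumping everything, you can also ask show for only the properties you care about with the -p option (the property names below are taken from the full dump above):


# systemctl show -p MainPID -p Result -p Restart haproxy
MainPID=304858
Result=success
Restart=always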


3. List all running systemd services for a better overview on what's going on on the machine
 

To get a list of all properly loaded systemd services you can use --state running.

hipo@jeremiah:~$ systemctl list-units --state running | head -n 10
  UNIT                              LOAD   ACTIVE SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active running Arbitrary Executable File Formats File System Automount Point
  cups.path                         loaded active running CUPS Scheduler
  init.scope                        loaded active running System and Service Manager
  session-2.scope                   loaded active running Session 2 of user hipo
  accounts-daemon.service           loaded active running Accounts Service
  anydesk.service                   loaded active running AnyDesk
  apache-htcacheclean.service       loaded active running Disk Cache Cleaning Daemon for Apache HTTP Server
  apache2.service                   loaded active running The Apache HTTP Server
  avahi-daemon.service              loaded active running Avahi mDNS/DNS-SD Stack

 

It is also useful to list all unit files configured in systemd and their state; you can do it with:

 


root@pcfreak:~# systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          -            
-.mount                                                                   generated       -            
backups.mount                                                             generated       -            
dev-hugepages.mount                                                       static          -            
dev-mqueue.mount                                                          static          -            
media-cdrom0.mount                                                        generated       -            
mnt-sda1.mount                                                            generated       -            
proc-fs-nfsd.mount                                                        static          -            
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled     
run-rpc_pipefs.mount                                                      static          -            
sys-fs-fuse-connections.mount                                             static          -            
sys-kernel-config.mount                                                   static          -            
sys-kernel-debug.mount                                                    static          -            
sys-kernel-tracing.mount                                                  static          -            
var-www.mount                                                             generated       -            
acpid.path                                                                masked          enabled      
cups.path                                                                 enabled         enabled      
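
From that same output you can derive a quick statistic of how many unit files sit in each state (enabled, disabled, masked, static, generated …); a one-liner sketch using only systemctl and standard coreutils:


root@pcfreak:~# systemctl list-unit-files --no-legend | awk '{print $2}' | sort | uniq -c | sort -rn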

 

 


root@pcfreak:~# systemctl list-units --type service --all
  UNIT                                   LOAD      ACTIVE   SUB     DESCRIPTION
  accounts-daemon.service                loaded    inactive dead    Accounts Service
  acct.service                           loaded    active   exited  Kernel process accounting
● alsa-restore.service                   not-found inactive dead    alsa-restore.service
● alsa-state.service                     not-found inactive dead    alsa-state.service
  apache2.service                        loaded    active   running The Apache HTTP Server
● apparmor.service                       not-found inactive dead    apparmor.service
  apt-daily-upgrade.service              loaded    inactive dead    Daily apt upgrade and clean activities
 apt-daily.service                      loaded    inactive dead    Daily apt download activities
  atd.service                            loaded    active   running Deferred execution scheduler
  auditd.service                         loaded    active   running Security Auditing Service
  auth-rpcgss-module.service             loaded    inactive dead    Kernel Module supporting RPCSEC_GSS
  avahi-daemon.service                   loaded    active   running Avahi mDNS/DNS-SD Stack
  certbot.service                        loaded    inactive dead    Certbot
  clamav-daemon.service                  loaded    active   running Clam AntiVirus userspace daemon
  clamav-freshclam.service               loaded    active   running ClamAV virus database updater
..

 


linux-systemd-components-diagram-linux-kernel-system-targets-systemd-libraries-daemons

 

4. Finding out more on why a systemd configured service has failed


Usually getting info about a failed systemd service is done with systemctl status servicename.service.
However, in case of troubles with a service unable to start, you can get more info about why it has failed with the (-l) / (--full) option:


root@pcfreak:~# systemctl -l status logrotate.service
● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 00:00:06 EET; 13h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2045320 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
   Main PID: 2045320 (code=exited, status=1/FAILURE)
        CPU: 2.479s

Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: For now we will assume you meant to write /32
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| ERROR: '0.0.0.0/0.0.0.0' needs to be replaced by the term 'all'.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| SECURITY NOTICE: Overriding config setting. Using 'all' instead.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: (B) '::/0' is a subnetwork of (A) '::/0'
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: because of this '::/0' is ignored to keep splay tree searching predictable
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: You should probably remove '::/0' from the ACL named 'all'
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Failed with result 'exit-code'.
Feb 25 00:00:06 pcfreak systemd[1]: Failed to start Rotate log files.
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Consumed 2.479s CPU time.
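
As the status output conveniently shows the exact ExecStart command line, a quick next step is to re-run it by hand in logrotate's debug (dry-run) mode, which prints what would be done without actually rotating anything:


root@pcfreak:~# /usr/sbin/logrotate -d /etc/logrotate.conf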


systemctl -l, however, provides only the last log messages that a started / stopped or otherwise changed service has generated. Sometimes systemctl status -l servicename.service shows the split error message incomplete, as there is a limitation on the number of lines that fit on the console, see below:

 

root@pcfreak:~# systemctl status -l certbot.service
● certbot.service - Certbot
     Loaded: loaded (/lib/systemd/system/certbot.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 09:28:33 EET; 4h 0min ago
TriggeredBy: ● certbot.timer
       Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
             https://certbot.eff.org/docs
    Process: 290017 ExecStart=/usr/bin/certbot -q renew (code=exited, status=1/FAILURE)
   Main PID: 290017 (code=exited, status=1/FAILURE)
        CPU: 9.771s

Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using th>
Feb 25 09:28:33 pcfrxen certbot[290017]: All renewals failed. The following certificates could not be renewed:
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/mail.pcfreak.org-0003/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/www.eforia.bg-0005/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/zabbix.pc-freak.net/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]: 3 renew failure(s), 5 parse failure(s)
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Failed with result 'exit-code'.
Feb 25 09:28:33 pcfrxen systemd[1]: Failed to start Certbot.
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Consumed 9.771s CPU time.

 

5. Get a complete journal log to make sure everything configured on the server host runs as it should

Thus, to get a more complete listing of the messages and be able to later google the error and check if someone has come up with a solution on the internet, use:

root@pcfrxen:~# journalctl --catalog --unit=certbot

-- Journal begins at Sat 2022-01-22 21:14:05 EET, ends at Fri 2022-02-25 13:32:01 EET. --
Jan 23 09:58:18 pcfrxen systemd[1]: Starting Certbot…
░░ Subject: A start job for unit certbot.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit certbot.service has begun execution.
░░ 
░░ The job identifier is 5754.
Jan 23 09:58:20 pcfrxen certbot[124996]: Traceback (most recent call last):
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/renewal.py", line 71, in _reconstitute
Jan 23 09:58:20 pcfrxen certbot[124996]:     renewal_candidate = storage.RenewableCert(full_path, config)
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 471, in __init__
Jan 23 09:58:20 pcfrxen certbot[124996]:     self._check_symlinks()
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 537, in _check_symlinks

root@server:~# journalctl --catalog --unit=certbot | grep -i pluginerror | tail -1
Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.')


Or, if you want to list and read only the last messages in the journal log regarding a service:

root@server:~# journalctl --catalog --pager-end --unit=certbot


If you have disabled a failed service because you don't need it to run at all on the machine with:

root@rhel:~# systemctl stop rngd.service
root@rhel:~# systemctl disable rngd.service

and you want to clear up any failed service state that systemd keeps for it, you can do it with:
 

root@rhel:~# systemctl reset-failed
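
reset-failed with no arguments clears the failed state of all units at once; to be more surgical you can also reset just the one unit:


root@rhel:~# systemctl reset-failed rngd.service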

Another useful systemctl option is cat; you can use it to quickly list a service's unit file, an actual shortcut that saves you from typing the full path to the service, e.g. cat /lib/systemd/system/certbot.service:

root@server:~# systemctl cat certbot
# /lib/systemd/system/certbot.service
[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://certbot.eff.org/docs
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true
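
If the unit file itself needs a tweak to get a service back in shape, rather than editing the vendor file under /lib/systemd/system directly, the cleaner systemd way is a drop-in override; systemctl edit opens an editor and places the override in /etc/systemd/system/certbot.service.d/override.conf:


root@server:~# systemctl edit certbot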


After the failed systemd services are fixed, it is best to reboot the machine and put some more time into inspecting the complete journal log once more, to make sure no error was left behind.


Closure
 

As you can see, updating a machine from one major version to another, even if you follow the official documentation and have plenty of experience, is always more or less a pain in the ass, which can eat up much of your time banging your head solving problems with failed daemons, like the issues I faced with /etc/rc.local (because its #!/bin/sh -e shebang makes /etc/rc.local quit immediately if any command returns an exit code different from 0) etc. The logical questions then come:
1. Is it really worth it to update regularly at all, especially if you don't know of a famous major vulnerability 🙂 ?
2. Or is it worth it to update from one OS major release to another at all?
3. Or should you only patch the services that are exposed to an externally reachable computer network or the internet, and stay on the same OS release until its End of Life, called LTS (Long Term Support) in Debian or EOL (End Of Life) cycle in RPM-based distros, i.e. the period until which your OS major release receives official security patches, is reached?

Anyone could take any approach, but for my own small network of managed systems at home, my practice was always to try to keep everything up to date every 3 or 6 months maximum. This has caused me multiple days of irritation and stress, and perhaps many white hairs and nerves spent on shit.


4. Based on experience from the company where I'm employed, the better strategy is to patch while EOL support is still offered and keep the rule First Things First (FTF); once the EOL is reached, just make a copy of all the servers' data and configuration to external data storage, bring up a new physical or virtual machine and migrate the services.
After the migration, test that all works as expected; if all is as it should be, change the DNS records or the leading infrastructure proxies or whatever to point to the new service, and that's it! Yes, it is true that a migration based on a full OS reinstall is more time consuming and requires much more planning, but usually the result is much more predictable, plus it is much less stressful for the guy doing the job.

Set all logs to log to physical console /dev/tty12 (tty12) on Linux

Wednesday, August 12th, 2020

tty linux-logo how to log everything to last console terminal tty12

Those who have administered servers since the days of birth of Linux, and who have actively used GNU / Linux over the years or any other UNIX, know how practical it can be to configure logging of all running services / kernel messages / errors and warnings to a physical console.

Traditionally, from the days I was learning the Linux basics, I was shown how to do this on an old Debian Sarge 3.0 Linux without systemd, and on all the Linux distributions, Redhat 9.0 / Calderas and Mandrakes, I've used either as home systems or for servers. I've always configured the output of all messages to go to the last, easy-to-access console /dev/tty12 (for those who have never used it, console switching under Linux in plain text console mode is done with the key combination CTRL + ALT + F1 .. F12).

In recent times, however, with the introduction of systemd, things changed quite a bit, as messages to the console are no longer handled by /etc/inittab, which was earlier used to add and refresh physical consoles tty1, tty2 … tty7 (the default number of consoles added on Linux was usually 7), and where one had to manually include additional respawn lines for each extra console.
Nowadays, as of year 2020, in Linux distros /etc/inittab is no longer there, being obsoleted, and the console print-out of INPUT / OUTPUT messages is handled by systemd.
 

1. Enable Physical TTYs from TTY8 till TTY12 etc.


The number of default consoles existing in most Linux distributions I've seen is still from tty1 to tty7. Hence, to add more tty consoles and be able to switch not only up to tty7 but up to tty12, once you're connected to the server via a remote ILO (Integrated Lights Out) / iDRAC (Dell Remote Access Controller) / IPMI / IMM (Integrated Management Module), you have to tell systemd by issuing the below systemctl commands:
 

 

# systemctl enable getty@tty8.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty8.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty9.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty9.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty10.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty10.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty11.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty11.service -> /lib/systemd/system/getty@.service.

# systemctl enable getty@tty12.service
Created symlink /etc/systemd/system/getty.target.wants/getty@tty12.service -> /lib/systemd/system/getty@.service.
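
Of course, instead of typing the command five times, the same can be done in one go with a small shell loop (plain bash, nothing distro-specific assumed):


# for n in $(seq 8 12); do systemctl enable getty@tty$n.service; done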


Once the TTYs tty8 to tty12 are enabled, you will be able to switch to these consoles, either if you have a physical LCD / CRT monitor or KVM switch connected to the machine mounted on the rack shelf once you're in the Data Center, or you will be able to see them once connected remotely via the Management IP Interface (ILO) remote console.
 

2. Taking screenshot of the physical console TTY with fbcat


For example below is a screenshot of the 10th enabled tty10:

tty10-linux-screenshot-fbcat-how-to-screenshot-console

As you can see in the screenshot, I've used the nice tool fbcat, which can be used to make a screenshot of the console. This is very useful especially if remote access via an SSH client such as PuTTY / MobaXterm is not available, but you only have physical attached monitor access in a DC that is under a heavy firewall preventing anyone from reaching the system remotely. It is also handy, for example, for screenshotting the physical console in case a major hardware failure occurs and you need to dump a hardware error message to a flash drive that will later be handed to technicians to analyze it and exchange the broken server hardware part.

Taking screenshots of the CLI with fbcat is possible across most Linux distributions.

In Debian you first have to install the tool via:
 

# apt install --yes fbcat


and on RedHats / CentOS / Fedoras

# yum install -y fbcat


Once the tool is on the server, taking a screenshot of whatever you have printed on the console is as easy as:

# fbcat > tty_name.ppm


Note that you might want to convert the .ppm created picture to png with any converter such as imagemagick's convert command or if you have a GUI perhaps with GNU Image Manipulation Tool (GIMP).
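
For example, assuming ImageMagick is installed and tty12.ppm is just an example name for a dump taken while on tty12, the conversion is a one-liner:


# fbcat > tty12.ppm
# convert tty12.ppm tty12.png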

3. Enabling every rsyslog handled message to log to Physical TTY12


To make everything, such as errors, notices, debug and warning messages, instantly log to the above newly added /dev/tty12:

Open /etc/rsyslog.conf and append the below lines to the end of the file:
 

daemon,mail.*;\
   news.=crit;news.=err;news.=notice;\
   *.=debug;*.=info;\
   *.=notice;*.=warn   /dev/tty12
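
Before restarting, it is a good idea to let rsyslog validate the modified config; rsyslogd ships a built-in config check mode for that, with output along these lines:


# rsyslogd -N1
rsyslogd: version 8.1901.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.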


To make rsyslog load its new config you have to restart it; before doing so you can check its current status:

 

# systemctl status rsyslog

 

 

 

rsyslog.service - System Logging Service
   Loaded: loaded (/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-08-10 04:09:36 EEST; 2 days ago
     Docs: man:rsyslogd(8)
           https://www.rsyslog.com/doc/
 Main PID: 671 (rsyslogd)
    Tasks: 4 (limit: 4915)
   Memory: 12.5M
   CGroup: /system.slice/rsyslog.service
           └─671 /usr/sbin/rsyslogd -n -iNONE

 

авг 12 00:00:05 pcfreak rsyslogd[671]:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="671" x-info="https://www.rsyslo
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

 

# systemctl restart rsyslog


That's all folks! Navigate by pressing simultaneously CTRL + ALT + F12 to get to tty12, or use ALT + LEFT / ALT + RIGHT ARROW (console switch keys) until you get to the console where everything should now be logged.

Enjoy, and if you like this article, share it to tell your sysadmin friends about this nice hack! 🙂

 

 

 

What is an inode and how to find out which directory is eating up all your filesystem inodes on Linux; increase the inode count on ext3, ext4 and ufs filesystems

Tuesday, August 20th, 2019

what-is-inode-find-out-which-filesystem-or-directory-eating-up-all-your-system-inodes-linux_inode_diagram

If you're a system administrator of multiple Linux servers used for web serving, a mail server sysadmin, a database admin, administer any high-drive-count data storage used for backup server infrastructure or a data repository such as Linux-hosted Samba / CIFS shares etc., use some Linux hosting provider to host your website, or run any other UNIX-like infrastructure servers that demand storing a high number of files under a directory, you might end up with the common filesystem inode depletion issue (the maximum inode number for a filesystem is predefined and limited, depending on the configured filesystem size).

In case the files stored in a directory end up exceeding the amount of addressable inodes, this will prevent any further data from being assigned and stored on the filesystem.

When a device runs out of inodes, new files cannot be created on the device, even though there may be plenty of free space available. The first time it happened to me, a very long time ago, I was completely puzzled how this was possible, as I was not aware of the existence of inodes…

Reaching the maximum inode number (e.g. inode depletion) often happens on busy mail servers (receiving tons of SPAM email messages) or Content Delivery Network servers (CDN – website image caching servers), which contain many small files on EXT3 or EXT4 journalled filesystems. File systems such as Btrfs, JFS or XFS escape this limitation with extents or dynamic inode allocation, which can 'grow' the file system or increase the number of inodes.

 

Hence, ending up out of inodes can cause various oddities in how stored data behaves or is communicated to other connected microservices, and can lead to random application disruptions and odd results, costing you many hours of debugging to find that the root cause is inodes (index nodes) being depleted.

In the below article, I will try to give an overall explanation of what an i-node is on a filesystem, how the inodes of an FS unit can be seen, how to diagnose a possible inode problem, e.g. see the maximum amount of inodes available per filesystem, and how to prepare (format) a new filesystem with an increased maximum number of inodes.

 

What are filesystem i-nodes?

 

An inode is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory.
The data structure kept in the inodes might vary slightly depending on the filesystem, but usually on EXT3 / EXT4 Linux filesystems each inode stores the index to the block that contains the attributes and disk block location(s) of the object's data.
Yes, for those who are not aware of how a filesystem is structured on *nix: it allocates all stored data in logically separated structures called data blocks. Each file stored on a local filesystem has a file descriptor, there are virtual unit structures called file tables, and each inode, which is a reference number, has its own data structure (the inode table).

Inodes / "indexes" are a slightly unusual part of the file system structure: they store the access information of files as a flat array on the disk, with all the hierarchical directory information living aside from it, as explained by the Unix creator and pioneer Dennis Ritchie (who passed away a few years ago).

what-is-inode-very-simplified-explanation-diagram-data

Simplified explanation of file descriptors, the file table and the inode table on a common Linux filesystem

Here is another description of what an i-node is, given by Ken Thompson (another Unix pioneer and father of Unix) and Dennis Ritchie, described in their paper published in 1978:

"    As mentioned in Section 3.2 above, a directory entry contains only a name for the associated file and a pointer to the file itself. This pointer is an integer called the i-number (for index number) of the file. When the file is accessed, its i-number is used as an index into a system table (the i-list) stored in a known part of the device on which the directory resides. The entry found thereby (the file's i-node) contains the description of the file:…
    — The UNIX Time-Sharing System, The Bell System Technical Journal, 1978  "


 

What is the typical content of an inode and how do i-nodes play with the rest of the filesystem units?


The inode is just a reference index to a data block (unit) that contains the file-system object's attributes. It may include metadata information such as the times of last change, access and modification, as well as owner and permission data.

 

On a Linux / Unix filesystem, directories are lists of names assigned to inodes. A directory contains an entry for itself, its parent, and each of its children.

Structure-of-inode-table-on-Linux-Filesystem-diagram

 

Structure of inode table-on Linux Filesystem diagram (picture source GeeksForGeeks.org)

  • Information about files (data) is sometimes called metadata. So you can even put it another way: "An inode is the metadata of the data."
  •  Inode: a complex data structure that contains all the necessary information to specify a file. It includes the memory layout of the file on disk, file permissions, access time, number of different links to the file, etc.
  •  Global file table: it contains information that is global to the kernel, e.g. the byte offset in the file where the user's next read/write will start, and the access rights allowed to the opening process.
  • Process file descriptor table: maintained by the kernel; it in turn indexes into a system-wide table of files opened by all processes, called the file table.

The inode number indexes a table of inodes in a known location on the device. From the inode number, the kernel's file system driver can access the inode contents, including the location of the file – thus allowing access to the file.

  •     Inodes do not contain their hardlink names, only other file metadata.
  •     Unix directories are lists of association structures, each of which contains one filename and one inode number.
  •     The file system driver must search a directory looking for a particular filename and then convert the filename to the correct corresponding inode number.

The operating system kernel's in-memory representation of this data is called struct inode in Linux. Systems derived from BSD use the term vnode, with the v of vnode referring to the kernel's virtual file system layer.


But enough technical specifics, lets get into some practical experience on managing Filesystem inodes.
 

Listing inodes on a Filesystem


Let's say we want to list the inode number reference IDs for the Linux kernel files:

 

root@linux: # ls -i /boot/vmlinuz-*
 3055760 /boot/vmlinuz-3.2.0-4-amd64   26091901 /boot/vmlinuz-4.9.0-7-amd64
 3055719 /boot/vmlinuz-4.19.0-5-amd64  26095807 /boot/vmlinuz-4.9.0-8-amd64


To list the inode of the kernel-specific boot directory /boot itself (-d):

 

root@linux: # ls -id /boot/
26091521 /boot/


Listing the inodes for all files stored in a directory is also done by adding the -i ls command flag.

Note that the '-1' flag was added to show the files in one column, without the ownership and permission info:

 

root@linux:/# ls -1i /boot/
26091782 config-3.2.0-4-amd64
 3055716 config-4.19.0-5-amd64
26091900 config-4.9.0-7-amd64
26095806 config-4.9.0-8-amd64
26091525 grub/
 3055848 initrd.img-3.2.0-4-amd64
 3055644 initrd.img-4.19.0-5-amd64
26091902 initrd.img-4.9.0-7-amd64
 3055657 initrd.img-4.9.0-8-amd64
26091756 System.map-3.2.0-4-amd64
 3055703 System.map-4.19.0-5-amd64
26091899 System.map-4.9.0-7-amd64
26095805 System.map-4.9.0-8-amd64
 3055760 vmlinuz-3.2.0-4-amd64
 3055719 vmlinuz-4.19.0-5-amd64
26091901 vmlinuz-4.9.0-7-amd64
26095807 vmlinuz-4.9.0-8-amd64

 

To get more information about a Linux directory or file, such as the blocks used by the file unit, the last Access, Modify and Change times, and the inode and link count of the filesystem object, use stat:
 

root@linux:/ # stat /etc/
  File: /etc/
  Size: 16384         Blocks: 32         IO Block: 4096   directory
Device: 801h/2049d    Inode: 6365185     Links: 231
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-08-20 06:29:39.946498435 +0300
Modify: 2019-08-14 13:53:51.382564330 +0300
Change: 2019-08-14 13:53:51.382564330 +0300
 Birth: –

 

Within a POSIX system (Linux-es), and the *BSDs are more or less the same, a file has the following attributes, which may be retrieved by the stat system call:

   – Device ID (this identifies the device containing the file; that is, the scope of uniqueness of the serial number).
    – File serial numbers.
    – The file mode which determines the file type and how the file's owner, its group, and others can access the file.
    – A link count telling how many hard links point to the inode.
    – The User ID of the file's owner.
    – The Group ID of the file.
    – The device ID of the file if it is a device file.
    – The size of the file in bytes.
    – Timestamps telling when the inode itself was last modified (ctime, inode change time), the file content last modified (mtime, modification time), and last accessed (atime, access time).
    – The preferred I/O block size.
    – The number of blocks allocated to this file.

 

Getting more extensive information on a mounted filesystem


Most Linux distributions have tune2fs installed by default (in Debian Linux it comes through the e2fsprogs package); with it one can get very good in-depth information on a mounted filesystem, let's say about the ( / ) root FS.
 

root@linux:~# tune2fs -l /dev/sda1
tune2fs 1.44.5 (15-Dec-2018)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          abe6f5b9-42cb-48b6-ae0a-5dda350bc322
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              30162944
Block count:              120648960
Reserved block count:     6032448
Free blocks:              13830683
Free inodes:              26575654
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      995
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Sep  6 21:44:22 2012
Last mount time:          Sat Jul 20 11:33:38 2019
Last write time:          Sat Jul 20 11:33:28 2019
Mount count:              6
Maximum mount count:      22
Last checked:             Fri May 10 18:32:27 2019
Check interval:           15552000 (6 months)
Next check after:         Wed Nov  6 17:32:27 2019
Lifetime writes:          338 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:              256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       21554129
Default directory hash:   half_md4
Directory Hash Seed:      d54c5a90-bc2d-4e22-8889-568d3fd8d54f
Journal backup:           inode blocks


An important note to make here is that a file's inode number stays the same when it is moved to another directory on the same device, or when the disk is defragmented, which may change its physical location. This also implies that completely conforming inode behavior is impossible to implement with many non-Unix file systems, such as FAT and its descendants, which don't have a way of keeping this invariance when both a file's directory entry and its data are moved around. Also, one inode can be referenced by several names: a file and a hardlinked copy of it point to the very same inode; below is an example:

$ ls -l -i /usr/bin/perl*
266327 -rwxr-xr-x 2 root root 10376 Mar 18  2013 /usr/bin/perl
266327 -rwxr-xr-x 2 root root 10376 Mar 18  2013 /usr/bin/perl5.14.2

Good to know is that inodes are always unique values, so you can't have the same inode number duplicated within one filesystem. If a directory is damaged, only the names of the things are lost and the inodes become the so called "orphans", e.g. inodes without names, but luckily this is recoverable. As the theory behind inodes is quite complicated and hard to explain here, I warmly recommend you read Ian D. Allen's Unix / Linux / Filesystems – directories inodes hardlinks tutorial, which is among the best academic tutorials explaining the various specifics about inodes online.
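
You can observe this hardlink / shared-inode behavior yourself in a few seconds; the file names below are just throwaway examples, and the inode number will of course differ on your system:


$ touch file_a
$ ln file_a file_b
$ ls -i file_a file_b
266327 file_a  266327 file_b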

 

How to Get inodes per mounted filesystem

 

root@linux:/home/hipo# df -i
Filesystem       Inodes  IUsed   IFree IUse% Mounted on

 

dev             2041439     481   2040958   1% /dev
tmpfs            2046359     976   2045383   1% /run
tmpfs            2046359       4   2046355   1% /dev/shm
tmpfs            2046359       6   2046353   1% /run/lock
tmpfs            2046359      17   2046342   1% /sys/fs/cgroup
/dev/sdb5        1221600    2562   1219038   1% /usr/var/lib/mysql
/dev/sdb6        6111232  747460   5363772  13% /var/www/htdocs
/dev/sdc1      122093568 3083005 119010563   3% /mnt/backups
tmpfs            2046359      13   2046346   1% /run/user/1000


As you see in the above output, the inodes reported for each of the mounted filesystems have a specific number. In the above output the IFree count on every locally mounted FS on this physically installed Linux OS is good.
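
If you want to watch for this proactively, GNU df's --output option can print just the mount point and the inode usage percentage, which makes a tiny warning one-liner easy (the 90% threshold below is an arbitrary example value):


root@linux:~# df --output=target,ipcent | awk 'NR > 1 && $2+0 >= 90 {print "WARNING: inode usage on", $1, "is", $2}'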


Here is an example of how to recognize depleted inodes on an OpenXen Virtual Machine with attached virtual hard disks.

linux:~# df -i
Filesystem         Inodes     IUsed      IFree     IUse%   Mounted on
/dev/xvda         2080768    2080768     0      100%    /
tmpfs             92187      3          92184   1%     /lib/init/rw
varrun            92187      38          92149   1%    /var/run
varlock            92187      4          92183   1%    /var/lock
udev              92187     4404        87783   5%    /dev
tmpfs             92187       1         92186   1%    /dev/shm

 

Finding files with a certain inode


In some cases, if you want to check all the copies (hardlinks) of a certain file that share the same i-node pointer, it is useful to find them all by their shared inode. This is possible with a simple find (the below example is for the /usr/bin/perl binary, sharing the same inode as perl5.28.1):

 

# ls -i /usr/bin/perl
23798851 /usr/bin/perl*

 

# find /usr/bin -inum 23798851 -print
/usr/bin/perl5.28.1
/usr/bin/perl

 

Find a directory that has a large number of files in it

To get the overall number of inodes allocated under certain directories, let's say /usr and /var:

 

root@linux:/var# du -s --inodes /usr /var
566931    /usr
56020    /var/

To get a list of the sub-directories of a directory sorted by their inode usage, from 1 till the highest number, use:
 

# du -s --inodes * 2>/dev/null | sort -g

 

Usually running out of inodes means there is a directory / FS mount that holds too many (small) files which are depleting the maximum count of possible inodes.

The simplest way to list the directories and the number of files in them under the server root directory is with a small bash shell loop like so:
 

for i in /*; do echo $i; find $i |wc -l; done


Another way to identify the exact directory that is most likely the bottleneck for the inode depletion, in a sorted-by-file-count, human readable form:
 

find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n


This will dump a list of every directory on the root (/) filesystem prefixed with the number of files (and subdirectories) in that directory. Thus the directory with the largest number of files will be at the bottom.

 

The -xdev switch is used to instruct find to narrow its search to only the device where you're initiating the search (any other sub-mounted NAS / NFS filesystems from a different device will be omitted).

 

Print top 10 subdirectories with Highest Inode Usage

 

Once you have identified the directory with the largest number of files that is perhaps the issue, to further get a list of the top sub-directories in it with the highest amount of inodes used, use the below command:

 

for i in `ls -1A`; do echo "`find $i | sort -u | wc -l` $i"; done | sort -rn | head -10

 

To list more than 10 of the top inode-consuming dirs, change the head -10 to whatever number is needed.

N.B.! Be very cautious when running the above 2 find commands on very large filesystems, as they will be I/O excessive, and on filesystems that have some failing blocks this could create further problems.

To avoid putting a high I/O load on a production filesystem, it is also possible to use du plus a rather complex sed regular expression:
 

# cd /backup
# du --inodes -S | sort -rh | sed -n '1,50{/^.\{71\}/s/^\(.\{30\}\).*\(.\{37\}\)$/\1…\2/;p}'


Results returned are from top to bottom.

 

How to increase the inode count on a newly created EXT4 filesystem volume

Some FS-es, such as XFS and JFS, have an auto-increase inode feature as long as there is physical space, whereas others such as reiserfs do not have a fixed inode limit at all, yet still report a field for it when queried. But the classical Linux ext3 / ext4 does not have a way to increase the inode number on a live filesystem. Instead, the way to do it is to prepare a brand new filesystem on a Disk / NAS / attached storage.

The number of inodes set at format time of the block storage can be as high as 4 billion. Before you create the new FS, you have to partition the new block storage as ext4 with, let's say, the parted command (or nullify the content of an existing volume with dd to clean up any previously existing data, if there was any):
 

parted /dev/sda


dd if=/dev/zero of=/dev/path/to/volume


  then format it with this additional parameter:

 

mkfs.ext4 -N 3000000000 /dev/path/to/volume

 

Here in the above example the newly created EXT4 filesystem will be created with 3 billion inodes! For setting a higher than default max inode count on an older ext3 filesystem, mkfs.ext3 can be used instead.

Bear in mind that 3 billion is a very high number; if you plan to have some large number of files / directories / link structures, just raise it up to your pre-planned requirements for the FS. In most cases it will rarely be anyone that wants this number higher than 1 or 2 billion inodes.
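
As an alternative to fixing an absolute count with -N, mke2fs can also take the inode density as a bytes-per-inode ratio via the -i option (one inode per N bytes of space; the 16384 below is just an illustrative ratio for an FS expected to hold many small files):


# mkfs.ext4 -i 16384 /dev/path/to/volume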

On FreeBSD / NetBSD / OpenBSD, setting the inode density for a UFS / UFS2 filesystem (the latter being the current default FreeBSD FS) is done via the newfs filesystem creation command's -i option (bytes per inode), after the disk has been labeled with disklabel:

 

freebsd# newfs -i 1024 /dev/ada0s1d

 

Increase the Max Count of Inodes for a /tmp filesystem

 

Sometimes on some machines it is necessary to be able to store a very high number of small files (e.g. have a very large number of inodes) on a temporary filesystem kept in memory. For example, some web applications served by Apache + PHP or Nginx + Perl-FastCGI are written in such a bad manner that they keep tons of temporary files in /tmp, leading to issues with an exceeded amount of inodes.
If that's the case, to temporarily work around it you can increase the count of inodes for /tmp to a very high number like 2 billion using:

 

mount -o remount,nr_inodes=<bignum> /tmp

To make the change permanent on next boot, don't forget to put the nr_inodes=whatever_bignum as a mount option for the temporary fs in /etc/fstab.
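
A minimal sketch of what such an /etc/fstab line could look like (2000000000 being the example figure from above):


tmpfs   /tmp    tmpfs   defaults,nr_inodes=2000000000   0 0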

Eventually, if you face this issue, it is best to immediately track down which application produced the mess and ask the developer to fix his messed-up program architecture.

 

Conclusion

 

This article explained the very common issue of having the maximum amount of inodes on a filesystem depleted and the unpleasant consequence of being unable to create new files on a live FS.
Then a general overview was given of what an inode is on a Linux / Unix filesystem, what the typical content of an inode is, and how inode addressing is handled on an FS. Further, it was explained how to get basic information about the available inodes on a filesystem, how to get a filename(s) based on an inode number (with find), the well known way to determine the inode number of a directory or file (with ls), and how to get more extensive information on an FS's inodes with tune2fs.
Also explained was how to identify directories containing multitudes of files in order to determine the sub-directories that consume most of the inodes on a filesystem. Finally, it was roughly explained how to prepare an ext4 filesystem from scratch with a predefined number of inodes much higher than the usual defaults of mkfs.ext3 / mkfs.ext4 and the *BSDs' newfs, as well as how to raise the number of inodes for the /tmp tmpfs temporary RAM filesystem.
Also was explained how to identify directories containing multitudes of files in order to determine a sub-directories that is consuming most of the inodes on a filesystem. Finally it was explained very raughly how to prepare an ext4 filesystem from scratch with predefined number to inodes to much higher than the usual defaults by mkfs.ext3 / mkfs.ext4 and *bsds newfs as well as how to raise the number of inodes of /tmp tmpfs temporary RAM filesystem.