Posts Tagged ‘dmesg’

Get dmesg command kernel log report with human date / time timestamp on older Linux distributions

Friday, June 18th, 2021

how-to-get-dmesg-human-readable-timestamp-kernel-log-command-linux-logo

If you're a sysadmin you surely love to take a look at dmesg kernel log output. Usually on many Linux distributions there is a setup that dmesg keeps logging to log files /var/log/dmesg or /var/log/kern.log. But if you get some inherited old Linux servers it is quite possible that the previous machine maintainer did not enable the output of syslog to get logged in /var/log/{dmesg,kern.log,kernel.log}  or even have disabled the kernel log for some reason. Even though that in dmesg output you might find some interesting events reporting issues with Hard drives on its way to get broken / a bad / reads system processes crashing or whatever of other interesting information that could help you prevent severe servers downtimes or problems earlier but due to an old version of Linux distribution lets say Redhat 5 / Debian 6 or old CentOS / Fedora, the version of dmesg command shipped does not support the '-T' option that is present in util-linux package shipped with newer versions of  Redhat 7.X  / 8.X / SuSEs etc.  

 -T, –ctime
              Print human readable timestamps.  The timestamp could be inaccurate!

To illustrate better what I mean, here is an example from the non-human readable timestamp provided by older dmesg command

root@web-server~:# dmesg |tail -n 5
[4505913.361095] hid-generic 0003:1C4F:0002.000E: input,hidraw1: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:00:1d.0-1.3/input1
[4558251.034024] Process accounting resumed
[4615396.191090] r8169 0000:03:00.0 eth1: Link is Down
[4615397.856950] r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
[4644650.095723] Process accounting resumed

Thanksfully using below few lines of shell or perl scripts the dmesg -T  functionality could be added to the system , so you can easily get the proper timestamp out of the obscure default generated timestamp in the same manner as on newer distros.

Here is how to do with it with bash script:

#!/bin/sh paste in .bashrc and use dmesgt to get human readable timestamp
dmesg_with_human_timestamps () {
    FORMAT="%a %b %d %H:%M:%S %Y"

 

    now=$(date +%s)
    cputime_line=$(grep -m1 "\.clock" /proc/sched_debug)

    if [[ $cputime_line =~ [^0-9]*([0-9]*).* ]]; then
        cputime=$((BASH_REMATCH[1] / 1000))
    fi

    dmesg | while IFS= read -r line; do
        if [[ $line =~ ^\[\ *([0-9]+)\.[0-9]+\]\ (.*) ]]; then
            stamp=$((now-cputime+BASH_REMATCH[1]))
            echo "[$(date +”${FORMAT}” –date=@${stamp})] ${BASH_REMATCH[2]}"
        else
            echo "$line"
        fi
    done
}

Copy the script somewhere under lets say /usr/local/bin or wherever you like on the server and add into your HOME ~/.bashrc some alias like:
 

alias dmesgt=dmesg_with_timestamp.sh


You can get a copy dmesg_with_timestamp.sh of the script from here

Or you can use below few lines perl script to get the proper dmeg kernel date / time

 

#!/bin/perl
# on old Linux distros CentOS 6.0 etc. with dmesg (part of util-linux-ng-2.17.2-12.28.el6_9.2.x86_64) etc. dmesg -T not available
# workaround is little pl script below
dmesg_with_human_timestamps () {
    $(type -P dmesg) "$@" | perl -w -e 'use strict;
        my ($uptime) = do { local @ARGV="/proc/uptime";<>}; ($uptime) = ($uptime =~ /^(\d+)\./);
        foreach my $line (<>) {
            printf( ($line=~/^\[\s*(\d+)\.\d+\](.+)/) ? ( “[%s]%s\n", scalar localtime(time – $uptime + $1), $2 ) : $line )
        }'
}


Again to make use of the script put it under /usr/local/bin/check_dmesg_timestamp.pl

alias dmesgt=dmesg_with_human_timestamps

root@web-server:~# dmesgt | tail -n 20

[Sun Jun 13 15:51:49 2021] usb 2-1.3: USB disconnect, device number 9
[Sun Jun 13 15:51:50 2021] usb 2-1.3: new low-speed USB device number 10 using ehci-pci
[Sun Jun 13 15:51:50 2021] usb 2-1.3: New USB device found, idVendor=1c4f, idProduct=0002, bcdDevice= 1.10
[Sun Jun 13 15:51:50 2021] usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[Sun Jun 13 15:51:50 2021] usb 2-1.3: Product: USB Keyboard
[Sun Jun 13 15:51:50 2021] usb 2-1.3: Manufacturer: SIGMACHIP
[Sun Jun 13 15:51:50 2021] input: SIGMACHIP USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.0/0003:1C4F:0002.000D/input/input25
[Sun Jun 13 15:51:50 2021] hid-generic 0003:1C4F:0002.000D: input,hidraw0: USB HID v1.10 Keyboard [SIGMACHIP USB Keyboard] on usb-0000:00:1d.0-1.3/input0
[Sun Jun 13 15:51:50 2021] input: SIGMACHIP USB Keyboard Consumer Control as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.1/0003:1C4F:0002.000E/input/input26
[Sun Jun 13 15:51:50 2021] input: SIGMACHIP USB Keyboard System Control as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.3/2-1.3:1.1/0003:1C4F:0002.000E/input/input27
[Sun Jun 13 15:51:50 2021] hid-generic 0003:1C4F:0002.000E: input,hidraw1: USB HID v1.10 Device [SIGMACHIP USB Keyboard] on usb-0000:00:1d.0-1.3/input1
[Mon Jun 14 06:24:08 2021] Process accounting resumed
[Mon Jun 14 22:16:33 2021] r8169 0000:03:00.0 eth1: Link is Down
[Mon Jun 14 22:16:34 2021] r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx

Fix “Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.” in FreeBSD

Monday, May 21st, 2012

bsdinstall-newboot-loader-menu-pv_entries_consider_increasing_vm_pmap_shpgrepproc

I'm running FreeBSD with Apache and PHP on it and I got in dmesg (kernel log), following error:

freebsd# dmesg|grep -i vm.pmap.shpgperproc
Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.

The exact FreeBSD, Apache and php versions I have installed are:
 

freebsd# uname -a ; httpd -V ; php –version
FreeBSD pcfreak 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0: Fri Oct 2 12:21:39 UTC 2009 root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
Server version: Apache/2.0.64
Server built: Mar 13 2011 23:36:25Server's Module Magic Number: 20050127:14
Server loaded: APR 0.9.19, APR-UTIL 0.9.19
Compiled using: APR 0.9.19, APR-UTIL 0.9.19
Architecture: 32-bit
Server compiled with….
-D APACHE_MPM_DIR="server/mpm/prefork"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_FLOCK_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D HTTPD_ROOT="/usr/local"
-D SUEXEC_BIN="/usr/local/bin/suexec"
-D DEFAULT_PIDLOG="/var/run/httpd.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_LOCKFILE="/var/run/accept.lock"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="etc/apache2/mime.types"
-D SERVER_CONFIG_FILE="etc/apache2/httpd.conf"
PHP 5.3.5 with Suhosin-Patch (cli) (built: Mar 14 2011 00:29:17)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
with eAccelerator v0.9.6.1, Copyright (c) 2004-2010 eAccelerator, by eAccelerator

After a bunch of research a FreeBSD forums thread , I've found the fix suggested by a guy.

The solution suggested in the forum is to raise up vm.pmap.pv_entry_ma to vm.pmap.pv_entry_max=1743504, however I've noticed this value is read only and cannot be changed on the BSD running kernel;

freebsd# sysctl vm.pmap.pv_entry_max=1743504
sysctl: oid 'vm.pmap.pv_entry_max' is read only

Instead to solve the;

Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
, I had to add in /boot/loader.conf

vm.pmap.pde.mappings=68
vm.pmap.shpgperproc=500
vm.pmap.pv_entry_max=1743504

Adding this values through /boot/loader.conf set them on kernel boot time. I've seen also in the threads the consider increasing either the vm.pmap.shpgperproc is also encountered on FreeBSD hosts running Squid, Dansguardion and other web proxy softwares on busy hosts.

This problems are not likely to happen for people who are running latest FreeBSD releases (>8.3, 9.x), I've read in same above post in newer BSD kernels the vm.pmap is no longer existing in newer kernels.

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

nf_conntrack_table_full_dropping_packet
On many busy servers, you might encounter in /var/log/syslog or dmesg kernel log messages like

nf_conntrack: table full, dropping packet

to appear repeatingly:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two type of servers, I've encountered this message on:

1. Xen OpenVZ / VPS (Virtual Private Servers)
2. ISPs – Internet Providers with heavy traffic NAT network routers
 

I. What is the meaning of nf_conntrack: table full dropping packet error message

In short, this message is received because the nf_conntrack kernel maximum number assigned value gets reached.
The common reason for that is a heavy traffic passing by the server or very often a DoS or DDoS (Distributed Denial of Service) attack. Sometimes encountering the err is a result of a bad server planning (incorrect data about expected traffic load by a company/companeis) or simply a sys admin error…

– Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

– Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

– Check the current sysctl nf_conntrack active connections

To check present connection tracking opened on a system:

:

linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742

The shown connections are assigned dynamicly on each new succesful TCP / IP NAT-ted connection. Btw, on a systems that work normally without the dmesg log being flooded with the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers which are encountering nf_conntrack: table full, dropping packet error, you can see, when issuing lsmod, extra modules related to nf_conntrack are shown as loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Remove completely nf_conntrack support if it is not really necessery

It is a good practice to limit or try to omit completely use of any iptables NAT rules to prevent yourself from ending with flooding your kernel log with the messages and respectively stop your system from dropping connections.

Another option is to completely remove any modules related to nf_conntrack, iptables_nat and nf_nat.
To remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod rmmod nf_nat
/sbin/rmmod rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure to not use iptables -t nat .. rules. Even attempt to list, if there are any NAT related rules with iptables -t nat -L -n will force the kernel to load the nf_conntrack modules again.

Btw nf_conntrack: table full, dropping packet. message is observable across all GNU / Linux distributions, so this is not some kind of local distribution bug or Linux kernel (distro) customization.
 

III. Fixing the nf_conntrack … dropping packets error

– One temporary, fix if you need to keep your iptables NAT rules is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising the nf_conntrack_max doesn't guarantee, things will get smoothly from now on.
However on many not so heavily traffic loaded servers just raising the net.netfilter.nf_conntrack_max=131072 to a high enough value will be enough to resolve the hassle.

– Increasing the size of nf_conntrack hash-table

The Hash table hashsize value, which stores lists of conntrack-entries should be increased propertionally, whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4

– To permanently store the made changes ;a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_count = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysct -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable, according to my experience raising it to too high value (especially on XEN patched kernels) could freeze the system.
Also raising the value to a too high number can freeze a regular Linux server running on old hardware.

– For the diagnosis of nf_conntrack stuff there is ;

/proc/sys/net/netfilter kernel memory stored directory. There you can find some values dynamically stored which gives info concerning nf_conntrack operations in "real time":

linux:~# cd /proc/sys/net/netfilter
linux:/proc/sys/net/netfilter# ls -al nf_log/

total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to prevent server against DoS attacks

Generally, the default value for nf_conntrack_* time-outs are (unnecessery) large.
Therefore, for large flows of traffic even if you increase nf_conntrack_max, still shorty you can get a nf_conntrack overflow table resulting in dropping server connections. To make this not happen, check and decrease the other nf_conntrack timeout connection tracking values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. net.netfilter.nf_conntrack_generic_timeout as you see is quite high – 600 secs = (10 minutes).
This kind of value means any NAT-ted connection not responding can stay hanging for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If this values, are not lowered the server will be an easy target for anyone who would like to flood it with excessive connections, once this happens the server will quick reach even the raised up value for net.nf_conntrack_max and the initial connection dropping will re-occur again …

With all said, to prevent the server from malicious users, situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to stmh. like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout = 120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000

This timeout should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least few days make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

How to detect failing storage LUN on Linux – multipath

Wednesday, April 16th, 2014

detect-failing-lun-on-linux-failing-scsi-detection

If you login to server and after running dmesg – show kernel log command you get tons of messages like:

# dmesg

 

end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0

 

In /var/log/messages there are also log messages present like:

# cat /var/log/messages
...

 

Apr 13 09:45:49 sl02785 kernel: end_request: I/O error, dev sda, sector 0
Apr 13 09:45:49 sl02785 kernel: klogd 1.4.1, ———- state change ———-
Apr 13 09:45:50 sl02785 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Apr 13 09:45:50 sl02785 kernel: end_request: I/O error, dev sdb, sector 0
Apr 13 09:45:55 sl02785 kernel: sd 0:0:0:2: SCSI error: return code = 0x00010000
Apr 13 09:45:55 sl02785 kernel: end_request: I/O error, dev sdc, sector 0

 

 

This is a sure sign something is wrong with SCSI hard drives or SCSI controller, to further debug the situation you can use the multipath command, i.e.:


multipath -ll | grep -i failed
_ 0:0:0:2 sdc 8:32 [failed][faulty]
_ 0:0:0:1 sdb 8:16 [failed][faulty]
_ 0:0:0:0 sda 8:0 [failed][faulty]

 

As you can see all 3 drives merged (sdc, sdb and sda) to show up on 1 physical drive via the remote network connectedLUN to the server is showing as faulty. This is a clear sign something is either wrong with 3 hard drive members of LUN – (Logical Unit Number) (which is less probable)  or most likely there is problem with the LUN network  SCSI controller. It is common error that LUN SCSI controller optics cable gets dirty and needs a physical clean up to solve it.

In case you don't know what is storage LUN? – In computer storage, a logical unit number, or LUN, is a number used to identify a logical unit, which is a device addressed by the SCSI protocol or protocols which encapsulate SCSI, such as Fibre Channel or iSCSI. A LUN may be used with any device which supports read/write operations, such as a tape drive, but is most often used to refer to a logical disk as created on a SAN. Though not technically correct, the term "LUN" is often also used to refer to the logical disk itself.

What LUN's does is very similar to Linux's Software LVM (Logical Volume Manager).

swap_pager_getswapspace: failed, MySQL troubles on FreeBSD 7.2 cause and solution

Tuesday, May 3rd, 2011

Every now and then my FreeBSD router dmesg ( /var/log/dmesg.today ) logs, gets filled with error messages like:

pid 86369 (httpd), uid 80, was killed: out of swap space
swap_pager_getswapspace(14): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(11): failed
swap_pager_getswapspace(12): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(14): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(8): failed

Using swapinfo during the swap_pager_getswapspace(16): failed messages were logged in, I figured out that definitely the swap memory over-use is the bottleneck for the troubles, to find this I used the command:

freebsd# swapinfo
Device 1K-blocks Used Avail Capacity Type
/dev/ad0s1b 49712 45920 3792 92% Interleaved

After some investigation, I’ve figured out that the MySQL server is causing the kernel exceeded swap troubles.

My current MySQL server version is installed from the ports tree, whether I’m using the bsd port /usr/ports/databases/mysql51-server/ and it appears to work just fine.

However I have noticed that the mysql-server is missing a my.cnf file!, which means the mysql server is running under a mode with some kind of default configurations.

Strangely in the system process list it appeared it is using a default my.cnf file located in /var/db/mysql/my.cnf

Below you see the paste from the ps command:

ps axuww freebsd# ps axuww | grep -i my.cnf | grep -v grep
mysql 7557 0.0 0.1 3464 1268 p1 I 12:03PM 0:00.01 /bin/sh /usr/local/bin/mysqld_safe --defaults-extra-file=/var/db/mysql/my.cnf --user=mysql --datadir=/var/db/mysql --pid-file=/var/db/mysql/pcfreak.pidmysql 7589 0.0 5.1 93284 52852 p1 I 12:03PM 0:59.01 /usr/local/libexec/mysqld --defaults-extra-file=/var/db/mysql/my.cnf --basedir=/usr/local --datadir=/var/db/mysql --user=mysql --pid-file=/var/db/mysql/pcfreak.pid --port=3306 --socket=/tmp/mysql.sock

Nevertheless it appeared the sql server is running the file /var/db/mysql/my.cnf conf was not existing! This was really weird for me as I’m used to have the default my.cnf from my previous experience with Linux servers!

Thus the next logical thing I did was to create my.cnf conf file in order to be able to have a proper limiting configuration for the sql server.

The FreeBSD my.cnf skele files are found in /usr/local/share/mysql/, here are the 4 files one can use as a starting basis for further configuration of the mysql-server.

freebsd# ls -al /usr/local/share/mysql/my-*.cnf
-r--r--r-- 1 root wheel 4948 Aug 12 2009 /usr/local/share/mysql/my-huge.cnf
-r--r--r-- 1 root wheel 20949 Aug 12 2009 /usr/local/share/mysql/my-innodb-heavy-4G.cnf
-r--r--r-- 1 root wheel 4924 Aug 12 2009 /usr/local/share/mysql/my-large.cnf
-r--r--r-- 1 root wheel 4931 Aug 12 2009 /usr/local/share/mysql/my-medium.cnf
-r--r--r-- 1 root wheel 2502 Aug 12 2009 /usr/local/share/mysql/my-small.cnf

I have chosen to use the my-medium.cnf as a skele to tune up, as my server is not high iron one e.g. the host I run the mysql is a (simple dual core 1.2Ghz system).

Further on I copied the /usr/local/share/mysql/my-medium.cnf to /var/db/mysql/my.cnf e.g.:

freebsd# cp -rpf /usr/local/share/mysql/my-medium.cnf /var/db/mysql/my.cnf

As a next step to properly tune up the default values of the newly copied my.cnf to my specific server I used the Tuning-Primer MySQL tuning script

Using tuning-primer.sh is really easy as all I did is download, launch it and follow the script suggestions to correct some of the values already in my.cnf

I have finally ended up with the following my.cnf after using tuning-primer.sh to optimize mysql server to work with my bsd host

Now I really hope the shitty swap_pager_getswapspace: failed errors would not haunt me once again by crashing my server and causing mem overheads.

Still I wonder why the port developer Alex Dupre – ale@FreeBSD.org choose not to provide the default mysql51-server conf with some kind of my.cnf file? I hope he had a good reason.

HasciiCAM supposed to stream ASCII video over the network on GNU / Linux

Tuesday, May 22nd, 2012

Richard M. Stallman (RMS) Face portrait rendered in ASCII art from a video with hasciicam
To continue with my lately ASCII centered articles I found hasciicam
hasciicam is a program to stream ASCII video over the network on Linux and probably can be easily made working on FreeBSDtoo.

The project concept is interesting in a matter of fun (play) point of view, however not too usable as we all know ASCII character looking faces doesn't look too pretty.

Below is the Debian (Squeeze) package description:

noah:~# apt-cache show hasciicam|grep -i description -A 7
Description: (h)ascii for the masses: live video as text
Hasciicam makes it possible to have live ASCII video on the web. It
captures video from a tv card and renders it into ascii, formatting the
output into an html page with a refresh tag or in a live ASCII window or
in a simple text file as well, giving the possibility to anybody that has a
bttv card, a Linux box and a cheap modem line to show a live ASCII video
feed that can be browsable without any need for plugin, java etc.
Homepage: http://ascii.dyne.org/

On hasciicam Project webpage is it is stated as a hardware you need to have:
 

"As hardware you need to have a webcam or a videocard supported by "video 4 linux", most of the gear you can buy around should work well."

To install and test it I run:

noah:~# apt-get --yes install hasciicam

Though it is stated on the project website supposed to work display video fine with most 'linux ready' webcams, it didn't with this very standard one.

Here is the exact WebCamera model as identified to the kernel:

noah:~# dmesg|grep -i camera
[ 1.433661] usb 2-2: Product: USB2.0 Camera
[ 10.107840] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[ 10.110660] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.0/input/input11

By the way, I use the very same CAM daily on for Skype video calls as well as the Camera is working with no problems to save video or pictures inside Cheese

Here is the exact WebCamera model as identified to the kernel:

noah:~# dmesg|grep -i camera
[ 1.433661] usb 2-2: Product: USB2.0 Camera
[ 10.107840] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[ 10.110660] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-2/2-2:1.0/input/input11

The just installed deb has one binary file only /usr/bin/hasciicam. To test it with the camera I issued:

noah:~# hasciicam -d /dev/video0
HasciiCam 1.0 - (h)ascii 4 the masses! - http://ascii.dyne.org
(c)2000-2006 Denis Roio < jaromil @ dyne.org >
watch out for the (h)ASCII ROOTS

Device detected is /dev/video0
USB2.0 Camera
1 channels detected
max size w[640] h[480] - min size w[48] h[32]
Video capabilities:
VID_TYPE_CAPTURE can capture to memory
!! error in ioctl VIDIOCGMBUF: : Invalid argument

Unfortunately as you see from the output, it failed to detect the web camera model.
The exact camera besides its kernel detection naminf is a cheap external USB 2.0 (fake brand / nonanem) "universal" Web PC Camera (SUPER .3mega pixel)

For those who have a further interest in building and installing hasciicam on other Linux platforms than Debian and Ubuntu or whoever wants to look in the code check check Project webpage is. For those who are less of programmers (like me) the project is written in C programming language and uses aa-lib in order to render the video to ASCII.

On the site you will notice two totally schizophrenic looking pictures of presumably the project head developer …

hasciiart video streamed ASCII screenshot of some crazy looking guy smoking marijuanna or smth

As I read in man hasciicam manual page it's said to be able to generate ascii plain text and html files as well as directly to write the output to console, which later probably can be streamed via the network.
Pitily as it didn't detect my camera I couldn't make some testing of its network capabilities.

A Streaming of ASCII couuld be done through pushing the .html output to a webserver and setting a php or javascript to loop through and refresh the browser over the uploaded files every sec or so.

Also I assume the ASCII video output saved in plain console could be streamed via netcat or some tiny scripted perl or bash script and directly observed via a telnet or ssh connection.
One playful way I can think of checking a stored video without the use of FTP is to login via ssh and do:

$ ssh someuser@somehost
$ watch -n 1 "cat video-ascii.html"

🙂

Well something disturbing about hasciicam from a (purely Christian point of view) is it was developed by some kind of non profit organization called RastaSoft on the project website, some of its authors has written JAH BLESS.

As I didn't succeeded seeing it working, I'll be interested to hear if someone who red this article and give it a try can report the web camera model used.

How to test if USB Camera is working with Cheese on GNU / Linux

Friday, December 23rd, 2011

I just bought an USB Camera (my notebook doesn't include an embedded camera). The camera is some infamous brand chineese name Eilondo
and on the camera all that is written is SUPER USB2.0 1.3 mega pixel

I bought exactly this camera because I was said by the shop reseller that the camera works without any driver installations on Windows XP and Windows Vista

On my Debian Squeeze GNU / Linux it was detected in dmesg without any troubles, here is how the camera got detected in my kernel log :

debian:~# dmesg |tail -n 10
[25385.734932] usb 2-1: USB disconnect, address 4
[25388.905049] usb 2-1: new high speed USB device using ehci_hcd and address 5
[25389.050753] usb 2-1: New USB device found, idVendor=1e4e, idProduct=0102
[25389.050757] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[25389.050760] usb 2-1: Product: USB2.0 Camera
[25389.050762] usb 2-1: Manufacturer: Etron Technology, Inc.
[25389.050936] usb 2-1: configuration #1 chosen from 1 choice
[25389.056056] uvcvideo: Found UVC 1.00 device USB2.0 Camera (1e4e:0102)
[25389.058242] uvcvideo: UVC non compliance - GET_DEF(PROBE) not supported. Enabling workaround.
[25389.059113] input: USB2.0 Camera as /devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/input/input26

I was troubled by the message uvcvideo: UVC non compliance – GET_DEF(PROBE) not supported. Enabling workaround. , and hence looked for an application to test if the camera can recored properly.

While checking in packages available in Software Center , I found a plenty of programs under the search keyword Camera
I however decided to test it using just one application Cheese A tool to take pictures and videos from your webcam which I've seen to be quite popular among Liunx users.
Cheese is part of GNOME Desktop, so that was another reason I decided to give it a try. I was pleasently surprised about how good these tiny but functional proggie is.

To run Cheese in GNOME I nagivated to the menus:

Applications -> Sound & Video -> Cheese Webcam Booth

Just in case if Cheese is not installed, installing it with apt:

debian:~# apt-get install cheese

Cheese has capabilities to take pictures, a consequential photos take up, as well as create Video movies.

Cheese take camera testing tool Debian GNU Linux

The program has support to apply 12 Effects / (Masks) to add some fun to the pictures or video snapshots.

Test Video Camera on Debian Linux Cheese Effects

Probably the best thing about Cheese is its simplistic interface, which for me personally is a main criterion to evaluate a program quality ;).

How to solve “IPv6 addrconf: prefix with wrong length 48”

Friday, December 9th, 2011

While reading some log files on one of the co-located servers at UK2.net , I’ve noticed dmesg log was filling in with tons of junk messages like:

[4288245.609762] IPv6 addrconf: prefix with wrong length 48
[4288445.984153] IPv6 addrconf: prefix with wrong length 48
[4288646.296110] IPv6 addrconf: prefix with wrong length 48
[4288846.609119] IPv6 addrconf: prefix with wrong length 48
[4289046.922604] IPv6 addrconf: prefix with wrong length 48
[4289247.267273] IPv6 addrconf: prefix with wrong length 48
[4289447.545800] IPv6 addrconf: prefix with wrong length 48
[4289647.857789] IPv6 addrconf: prefix with wrong length 48
[4289848.169308] IPv6 addrconf: prefix with wrong length 48
[4290048.595104] IPv6 addrconf: prefix with wrong length 48
[4290248.808497] IPv6 addrconf: prefix with wrong length 48
[4290449.103503] IPv6 addrconf: prefix with wrong length 48
[4290649.418747] IPv6 addrconf: prefix with wrong length 48
[4290849.742731] IPv6 addrconf: prefix with wrong length 48

After checking the message to make sure it would not suddeny lead to server hang ups I figured out the message is not dangerous but just an annoying warning that some other (routing) host on the same network as mine is advertising something using IPv6, that doesn’t fit with my IPv6 server config.
Actually the server doesn’t use the IPv6 configuration at all, and the assigned configuration is just some kind of auto set IPv6 IP address.
The server, where this message appeared is powered by 64 bit Debian GNU / Linux Squeeze

To resolve the annoying message, 5 of the kernel sysctl settings needs to be modified with cmds:

debian:~# sysctl net.ipv6.conf.all.accept_ra=0
debian:~# sysctl net.ipv6.conf.all.autoconf=0
debian:~# sysctl net.ipv6.conf.lo.autoconf=0
debian:~# sysctl net.ipv6.conf.eth0.autoconf=0
debian:~# sysctl net.ipv6.conf.eth1.autoconf=0

Furthermore to prevent the IPv6 addrconf: prefix with wrong length 48 to re-appear after future server reboots / boots the two sysctl values of course needs to be included in /etc/sysctl.conf e.g.:

debian:~# echo 'net.ipv6.conf.all.accept_ra = 0' >> /etc/sysctl.conf
debian:~# echo 'net.ipv6.conf.all.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.lo.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.eth0.autoconf = 0' >> /etc/sysctl.conf
echo 'net.ipv6.conf.eth1.autoconf = 0' >> /etc/sysctl.conf

My server has 2 etherhet interfaces, eth0 and eth1 that’s the reason I had to set up autoconf kernel the two vars net.ipv6.conf.eth0.autoconf and net.ipv6.conf.eth1.autoconf , for more interfaces more kernel vars (eth2, eth3) etc. needs to be set to “0”

I’ve seen posts online of people complaining about a similar errors to IPv6 addrconf: prefix with wrong length 48, like:

IPv6 addrconf: prefix with wrong length 96
IPv6 addrconf: prefix with wrong length 128

The solution to this messages is also done by setting the above described sysctl kernel vars. Setting the vars will suppress the messages which by the way with time could take up A LOT of disk space and fills /var/log/dmesg with this useless message, hence applying the “fix” is a must 😉

Another thing, I’ve noticed while I was researching about the error and the respective fix is that people on other deb based distributions like Ubuntu as well as on Fedora GNU / Linux had also experienced the issue.

How to auto restart CentOS Linux server with software watchdog (softdog) to reduce server downtime

Wednesday, August 10th, 2011

How to auto restart centos with software watchdog daemon to mitigate server downtimes, watchdog linux artistic logo

I’m in charge of dozen of Linux servers these days and therefore am required to restart many of the servers with a support ticket (because many of the Data Centers where the servers are co-located does not have a web interface or IPKVM connected to the server for that purpose). Therefore the server restart requests in case of crash sometimes gets processed in few hours or in best case in at least half an hour.

I’m aware of the existence of Hardware Watchdog devices, which are capable to detect if a server is hanged and auto-restart it, however the servers I administrate does not have Hardware support for Watchdog timer.

Thanksfully there is a free software project called Watchdog which is easily configured and mitigates the terrible downtimes caused every now and then by a server crash and respective delays by tech support in Data Centers.

I’ve recently blogged on the topic of Debian Linux auto-restart in case of kernel panic , however now i had to conifgure watchdog on some dozen of CentOS Linux servers.

It appeared installation & configuration of Watchdog on CentOS is a piece of cake and comes to simply following few easy steps, which I’ll explain quickly in this post:

1. Install with yum watchdog to CentOS

[root@centos:/etc/init.d ]# yum install watchdog
...

2. Add to configuration a log file to log watchdog activities and location of the watchdog device

The quickest way to add this two is to use echo to append it in /etc/watchdog.conf:

[root@centos:/etc/init.d ]# echo 'file = /var/log/messages' >> /etc/watchdog.conf
echo 'watchdog-device = /dev/watchdog' >> /etc/watchdog.conf

3. Load the softdog kernel module to initialize the software watchdog via /dev/watchdog

[root@centos:/etc/init.d ]# /sbin/modprobe softdog

Initialization of softdog should be indicated by a line in dmesg kernel log like the one above:

[root@centos:/etc/init.d ]# dmesg |grep -i watchdog
Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)

4. Include the softdog kernel module to load on CentOS boot up

This is necessery, because otherwise after reboot the softdog would not be auto initialized and without it being initialized, the watchdog daemon service could not function as it does automatically auto reboots the server if the /dev/watchdog disappears.

It’s better that the softdog module is not loaded via /etc/rc.local but the default CentOS methodology to load module from /etc/rc.module is used:

[root@centos:/etc/init.d ]# echo modprobe softdog >> /etc/rc.modules
[root@centos:/etc/init.d ]# chmod +x /etc/rc.modules

5. Start the watchdog daemon service

The succesful intialization of softdog in step 4, should have provided the system with /dev/watchdog, before proceeding with starting up the watchdog daemon it’s wise to first check if /dev/watchdog is existent on the system. Here is how:

[root@centos:/etc/init.d ]# ls -al /dev/watchdogcrw------- 1 root root 10, 130 Aug 10 14:03 /dev/watchdog

Being sure, that /dev/watchdog is there, I’ll start the watchdog service.

[root@centos:/etc/init.d ]# service watchdog restart
...

Very important note to make here is that you should never ever configure watchdog service to run on boot time with chkconfig. In other words the status from chkconfig for watchdog boot on all levels should be off like so:

[root@centos:/etc/init.d ]# chkconfig --list |grep -i watchdog
watchdog 0:off 1:off 2:off 3:off 4:off 5:off 6:off

Enabling the watchdog from the chkconfig will cause watchdog to automatically restart the system as it will probably start the watchdog daemon before the softdog module is initialized. As watchdog will be unable to read the /dev/watchdog it will though the system has hanged even though the system might be in a boot process. Therefore it will end up in an endless loops of reboots which can only be fixed in a linux single user mode!!! Once again BEWARE, never ever activate watchdog via chkconfig!

Next step to be absolutely sure that watchdog device is running it can be checked with normal ps command:

[root@centos:/etc/init.d ]# ps aux|grep -i watchdog
root@hosting1-fr [~]# ps axu|grep -i watch|grep -v greproot 18692 0.0 0.0 1816 1812 ? SNLs 14:03 0:00 /usr/sbin/watchdog
root 25225 0.0 0.0 0 0 ? ZN 17:25 0:00 [watchdog] <defunct>

You have probably noticed the defunct state of watchdog, consider that as absolutely normal, above output indicates that now watchdog is properly running on the host and waiting to auto reboot in case of sudden /dev/watchdog disappearance.

As a last step before, after being sure its initialized properly, it’s necessery to add watchdog to run on boot time via /etc/rc.local post init script, like so:

[root@centos:/etc/init.d ]# echo 'echo /sbin/service watchdog start' >> /etc/rc.local

Now enjoy, watchdog is up and running and will automatically restart the CentOS host 😉

How to reboot remotely Linux server if reboot, shutdown and init commands are not working (/sbin/reboot: Input/output error) – Reboot Linux in emergency using MagicSysRQ kernel sysctl variable

Saturday, July 23rd, 2011

SysRQ an alternative way to restart unrestartable Linux server

I’ve been in a situation today, where one Linux server’s hard drive SCSI driver or the physical drive is starting to break off where in dmesg kernel log, I can see a lot of errors like:

[178071.998440] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[178071.998440] end_request: I/O error, dev sda, sector 89615868

I tried a number of things to remount the hdd which was throwing out errors in read only mode, but almost all commands I typed on the server were either shown as missng or returning an error:
Input/output error

Just ot give you an idea what I mean, here is a paste from the shell:

linux-server:/# vim /etc/fstab
-bash: vim: command not found
linux-server:/# vi /etc/fstab
-bash: vi: command not found
linux-server:/# mcedit /etc/fstab
-bash: /usr/bin/mcedit: Input/output error
linux-server:/# fdisk -l
-bash: /sbin/fdisk: Input/output error

After I’ve tried all kind of things to try to diagnose the server and all seemed failing, I thought next a reboot might help as on server boot the filesystems will get checked with fsck and fsck might be able to fix (at least temporary) the mess.

I went on and tried to restart the system, and guess what? I got:

/sbin/reboot init Input/output error

I hoped that at least /sbin/shutdown or /sbin/init commands might work out and since I couldn’t use the reboot command I tried this two as well just to get once again:

linux-server:/# shutdown -r now
bash: /sbin/shutdown: Input/output error
linux-server:/# init 6
bash: /sbin/init: Input/output error

You see now the situation was not pinky, it seemed there was no way to reboot the system …
Moreover the server is located in remote Data Center and I the tech support there is conducting assigned task with the speed of a turtle.
The server had no remote reboot, web front end or anything and thefore I needed desperately a way to be able to restart the machine.

A bit of research on the issue has led me to other people who experienced the /sbin/reboot init Input/output error error mostly caused by servers with failing hard drives as well as due to HDD control driver bugs in the Linux kernel.

As I was looking for another alternative way to reboot my Linux machine in hope this would help. I came across a blog post Rebooting the Magic Wayhttp://www.linuxjournal.com/content/rebooting-magic-way

As it was suggested in Cory’s blog a nice alternative way to restart a Linux machine without using reboot, shutdown or init cmds is through a reboot with the Magic SysRQ key combination

The only condition for the Magic SysRQ key to work is to have enabled the SysRQ – CONFIG_MAGIC_SYSRQ in Kernel compile time.
As of today luckily SysRQ Magic key is compiled and enabled by default in almost all modern day Linux distributions in this numbers Debian, Fedora and their derivative distributions.

To use the sysrq kernel capabilities as a mean to restart the server, it’s necessery first to activate the sysrq through sysctl, like so:

linux-server:~# sysctl -w kernel.sysrq=1
kernel.sysrq = 1

I found enabling the kernel.sysrq = 1 permanently in the kernel is also quite a good idea, to achieve that I used:

echo 'kernel.sysrq = 1' >> /etc/sysctl.conf

Next it’s wise to use the sync command to sync any opened files on the server as well stopping as much of the server active running services (MySQL, Apache etc.).

linux-server:~# sync

Now to reboot the Linux server, I used the /proc Linux virtual filesystem by issuing:

linux-server:~# echo b > /proc/sysrq-trigger

Using the echo b > /proc/sysrq-trigger simulates a keyboard key press which does invoke the Magic SysRQ kernel capabilities and hence instructs the kernel to immediately reboot the system.
However one should be careful with using the sysrq-trigger because it’s not a complete substitute for /sbin/reboot or /sbin/shutdown -r commands.
One major difference between the standard way to reboot via /sbin/reboot is that reboot kills all the running processes on the Linux machine and attempts to unmount all filesystems, before it proceeds to sending the kernel reboot instruction.

Using echo b > /proc/sysrq-trigger, however neither tries to umount mounted filesystems nor tries to kill all processes and sync the filesystem, so on a heavy loaded (SQL data critical) server, its use might create enormous problems and lead to severe data loss!

SO BEWARE be sure you know what you’re doing before you proceed using /proc/sysrq-trigger as a way to reboot ;).