Posts Tagged ‘grub’

Linux: extending the lifetime of a damaged hard drive with a few tricks on a live server. Force fsck on next reboot. Read-only file system error solutions

Friday, February 17th, 2023


In our daily work as system administrators we have some very old legacy systems running clustered High Availability proxies using CRM (Cluster Resource Manager), and some of those legacy systems still use Heartbeat to manage the cluster instead of the newer Corosync variant.

The HA cluster consists of only 2 Linux nodes running the obscure, long unsupported Red Hat Enterprise Linux 5.11 (Tikanga), a release line that first became stable back in 2007 (yeah, the years were good) and whose EOL (End of Life) was reached a long time ago. For about 14 years the machines had nevertheless been running perfectly fine, with the cluster IP address managed by ocf::heartbeat:IPAddr2, that is the /etc/ha.d/resource.d/IPAddr2 shell script. Yeah, for the newbies, a Heartbeat application cluster in Linux does work like that: it uses a number of extendable shell scripts written for HA management of different kinds of Network / Web / Mail / SQL or whatever services.

Then the first configured node started failing with errors like:

EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
sd 0:2:0:0: rejecting I/O to offline device
Aborting journal on device sda1.
sd 0:2:0:0: rejecting I/O to offline device
printk: 159 messages suppressed.
Buffer I/O error on device sda1, logical block 526
lost page write due to I/O error on sda1
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
megaraid_sas: FW was restarted successfully, initiating next stage…
megaraid_sas: HBA recovery state machine, state 2 starting…
megasas: Waiting for FW to come to ready state
megasas: FW in FAULT state!!
FW state [-268435456] hasn't changed in 180 secs
megaraid_sas: out: controller is not in ready state
megasas: waiting_for_outstanding: after issue OCR. 
megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
megaraid_sas: pending commands remain even after reset handling.
megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
megasas[0]: Total OS Pending cmds : 0
megasas[0]: 64 bit SGLs were sent to FW
megasas[0]: Pending OS cmds in FW :

As a result, the filesystem of the machine was frequently re-mounted read-only, and of course that is
quite bad if you have running haproxy processes that should be living there and taking up some Web traffic
for high availability, while all the traffic runs only on the 2nd machine of the pair.

This of course was a clear sign of failing disks, of some bad block regions being hit or, as the messages indicate, of some
problem with the system hardware or the RAID SAS array.
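
The read-only remount is easy to miss until a service fails to write, so it can be worth checking for it explicitly. Below is a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, options); the function name fs_mount_mode is made up for this example.

```shell
# Report whether a mount point is currently mounted read-only.
# $1 = mount point (e.g. /), $2 = mounts table (defaults to /proc/mounts).
# Prints "ro", "rw" or "unknown". fs_mount_mode is a hypothetical helper name.
fs_mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated option list; the last
            # matching line wins, since later mounts shadow earlier ones.
            n = split($4, opts, ",")
            mode = "unknown"
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
            found = 1
        }
        END { print (found ? mode : "unknown") }
    ' "${2:-/proc/mounts}"
}
```

Running `fs_mount_mode /` from a monitoring check or by hand after each of the crashes shows quickly whether the root filesystem has flipped to ro again.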

The physical RAID on the system, just like the rest of the hardware, is very old stuff as well.

[root@haproxy_lb_node1 ~]# lspci |grep -i RAI
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

The produced errors not only made the machine auto-mount its root / filesystem in read-only mode but most
likely also made the machine reboot automatically every few days, or a few times a day in a row.

The second load balancer, node2, operated perfectly, and we thought that we might just keep the broken machine in that half-running
and inconsistent state for a few weeks until we had built the new machines with a pre-installed new haproxy cluster on a modern
Red Hat Enterprise Linux 8.6 distribution. But since we have to follow SLAs (Service Level Agreements) with customers, and the end services behind the
High Availability (HA) haproxy cluster were in danger …

We as sysadmins had the task to do our best to stabilize the unstable node with disk errors, so that the system survives
and is able to serve traffic normally (if node2, which is in a separate data center, fails due to hardware or electricity issues etc.).

Here are the few steps we took, which have hopefully improved the situation.

1. Make backups of the most important files

Always, before doing anything with a broken system, prepare a backup of the most important files. If that is a cluster, that should be a backup of the cluster configurations (if you don't already have one), a backup of /etc/hosts, and a backup of any important service configs such as /etc/haproxy/haproxy.cfg and /etc/postfix/main.cf (as it was in my case). Preferably back up the whole of /etc/, any important files from /root/ or /home/users* directories, and at least the latest logs from /var/log.
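
As a rough illustration of that backup step, here is a small sketch that packs whichever of the listed paths exist into a single tarball. The helper name backup_paths and the destination path in the usage line are assumptions for the example, and the archive should of course land on another disk or host, not on the failing one.

```shell
# Archive whichever of the listed paths exist into one gzip-compressed tarball.
# $1 = destination archive, remaining arguments = files or directories.
# backup_paths is a hypothetical helper name, not a standard tool.
backup_paths() {
    archive=$1
    shift
    existing=""
    for p in "$@"; do
        # Skip missing paths so tar does not fail half-way through.
        if [ -e "$p" ]; then
            existing="$existing $p"
        fi
    done
    if [ -z "$existing" ]; then
        echo "nothing to back up" >&2
        return 1
    fi
    # Word splitting of $existing is intentional here, so paths
    # containing spaces are not supported by this sketch.
    tar -czf "$archive" $existing
}
```

A call could look like `backup_paths /mnt/backup/node1-configs.tar.gz /etc /root /var/log/messages`, where /mnt/backup stands for any healthy storage that is mounted on the machine.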

2. Clear up all unnecessary service scripts from the server

Any additional software, services, integrity checking tools (daemons), scripts and cron jobs were immediately stopped and, where unused, removed.

E.g. we had moved through /etc/cron* to check what's there,

# ls -ld /etc/cron.*
drwx------ 2 root root 4096 Feb  7 18:13 /etc/cron.d
drwxr-xr-x 2 root root 4096 Feb  7 17:59 /etc/cron.daily
-rw-r--r-- 1 root root    0 Jul 20  2010 /etc/cron.deny
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.hourly
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.monthly
drwxr-xr-x 2 root root 4096 Aug 26  2015 /etc/cron.weekly


And like truly professional butchers we removed everything unnecessary that could trigger any extra disk reads / writes to the HDD.

E.g. just create

# mkdir -p /root/etc_old/{/etc/cron.d,\


And moved all unnecessary cron job scripts like:

1. nmon (old school network / memory / hard disk console tool for monitoring and tuning server parameters)
2. clamscan / freshclam crons
3. mlocate (the script that takes care of the periodic run of the updatedb command, which keeps the DB of the locate command current so
files can be found with fewer read operations on disk, e.g. so you don't have to run every time
a cmd like: find / -iname '*whatever_you_look_for*')
4. cups cron jobs
5. logwatch cron
6. rkhunter stuff
7. logrotate (yes, we stopped even the log rotation trigger job, as we found the server was sometimes crashing at the same time
the logrotate job that rotates logs inside /var/log/* was running, perhaps hitting an I/O read error on bad blocks).
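
Moving the jobs aside instead of deleting them keeps the way back open. A minimal sketch of that idea follows; park_cron_jobs is a made-up helper name and the job file names in the usage line are only examples of what such a directory may contain.

```shell
# Move the named cron scripts out of a cron directory into a holding area,
# so they can be restored later.
# $1 = cron directory, $2 = holding directory, remaining arguments = job names.
park_cron_jobs() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for job in "$@"; do
        if [ -f "$src/$job" ]; then
            mv "$src/$job" "$dest/" && echo "parked $src/$job"
        else
            echo "skipped $job (not present in $src)"
        fi
    done
}
```

For example `park_cron_jobs /etc/cron.daily /root/etc_old/cron.daily mlocate.cron logrotate`. Note that this only works while the root filesystem is still mounted read-write.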

We also inspected the cron jobs of the administrator user root for any unwanted scripts and stopped two reporting bash scripts that were part of the PCI tightened security procedures.
There we found a script responsible for periodically reporting the list of installed packages and whether they have changed, as well as a script that periodically reports via email the list of
users created in /etc/passwd and /etc/shadow, historically used to keep an eye on the list of users and easily see if someone
has created new users on the machine. Those were enabled via /var/spool/cron/root cron jobs. On other machines, if this happens to you,
it is a good idea to check all the existing user cron jobs and stop anything that might be putting extra read / write pressure on the hard drives attached to the machine.

# ls -al /var/spool/cron/
total 20
drwx------  2 root root 4096 Nov 13  2015 .
drwxr-xr-x 12 root root 4096 May 11  2011 ..
-rw-------  1 root root  133 Nov 13  2015 root

3. Clear up old log files and any unnecessary files

Under /var/log, /home, /var/tmp and /var/spool/tmp immediately try to clear up the old log files.
From my past experience this has many times freed filesystem inodes and blocks that sit on an unbroken part (good blocks) of the hard drive,
ready to be reused by the files newly written by the rsyslog / syslogd services.

!!! Note that while removing some files you might hit files stored on bad blocks, which might lead to an unexpected system reboot.

But that's okay, don't worry: most likely after a hard reset by a technician in the datacenter the machine will boot again, and you can enjoy
removing the remaining files and sending them to the heaven for old files.
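
Before deleting anything it helps to preview what would go. The sketch below only lists candidates by age and removes nothing; list_old_files is a hypothetical name and the 30 day threshold in the usage line is an arbitrary choice.

```shell
# List regular files under a directory that were last modified more than
# N days ago, staying on the same filesystem. Nothing is deleted.
# $1 = directory, $2 = age in days.
list_old_files() {
    find "$1" -xdev -type f -mtime +"$2" -print
}
```

`list_old_files /var/log 30` prints the files to review; once the list looks right, the same find expression with `-exec rm -f {} +` in place of `-print` removes them.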


4. Trigger an automatic file system check with fsck on next boot

The standard way to force a Linux system to automatically recheck its root filesystem is to simply create the /forcefsck file on the root partition, or on any other secondary disk partition you would like to check.

# touch /forcefsck

# reboot

However, on some occasions you might be unable to do it because the / (root fs) has been remounted read-only, yikes …

Luckily old Linux distributions like this RHEL 5.11 have another way to force a filesystem check with fsck after reboot, to identify any
unknown bad blocks and hopefully succeed in isolating them, so you don't hit the same auto-reboots, provided the hard drive or software / hardware RAID
is not in a terrible state: you can use an option built into the /sbin/shutdown command, '-F'.

   -F     Force fsck on reboot.

Hence to make the machine reboot and trigger immediately fsck:

# shutdown -rF now

Just in case you wonder why you need to reboot before checking the filesystems: simply because they need to be unmounted before you check them.

In that specific case this has produced a good result so far: the machine booted just fine, and we crossed our fingers and prayed that it would work flawlessly in the coming few weeks, before we finalize the configuration of the substitute machines, where this old infrastructure will be migrated to a newly built cluster with a new Haproxy and Corosync / Pacemaker cluster on a brand new RHEL.

NB! On newer machines this won't work, however, as the shutdown command has been stripped of this option, since those machines no longer run SystemV (SysVinit) or Upstart but the newer systemd services architecture.
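
The choice between the variants in this section can be summed up in a few lines. This is only a sketch that prints a suggestion and runs nothing; suggest_fsck_trigger is a made-up name, and the fsck.mode=force hint for systemd machines refers to the kernel command line option handled by systemd-fsck.

```shell
# Suggest how to request a boot-time fsck.
# $1 = init system ("sysv" or "systemd"), $2 = root mount mode ("rw" or "ro").
# Only prints the suggested command or setting; it executes nothing itself.
suggest_fsck_trigger() {
    case "$1:$2" in
        sysv:rw)   echo "touch /forcefsck && reboot" ;;
        sysv:ro)   echo "shutdown -rF now" ;;
        systemd:*) echo "boot once with fsck.mode=force on the kernel command line" ;;
        *)         echo "unknown combination: $1 $2" >&2; return 1 ;;
    esac
}
```

On a live system the first argument could be derived from whether the directory /run/systemd/system exists, which is the usual test for a machine booted with systemd.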

5. Hints on checking the hard drives with fsck

If you happen to have physical access to the remote hardware machine via a TTY[1-9] console, that's even better and is the standard way to do it, but in this specific case we had no easy way to get access to the physical server console.

It is even better to go there and, via either a connected monitor (display) or a KVM switch (for those who hear of a KVM switch for the first time, this is a great device in server rooms for connecting multiple servers to the same monitor display), boot one of the multitude of USB Linux recovery distributions to choose from, or on older machines like this a CDROM / DVD with Red Hat's rescue mode.
Once booted into the recovery environment, with the partitions to be checked left unmounted, simply check each of the disks
e.g. :

# fsck -y /dev/sdb
# fsck -y /dev/sdc

Or, if you don't want to waste time looking for each hard drive and prefer to directly check all the filesystems defined in /etc/fstab (-A), skipping the root filesystem (-R), run:

# fsck -AR

If necessary, and you have a mixture of filesystems, for example EXT3, EXT4 and REISERFS, you can tell it to omit some filesystem type, for example ext3, like that:

# fsck -AR -t noext3 -y

To make fsck skip partitions that are currently mounted:

# fsck -M /dev/sdb
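
Whichever of the invocations above is used, fsck reports its result through its exit status, which is a bit mask and so may combine several conditions. The sketch below decodes it using the values listed in the fsck(8) manual page; describe_fsck_status is a hypothetical helper name.

```shell
# Decode the exit status of fsck, a bit mask of the conditions below
# (values as listed in the fsck(8) manual page). $1 = exit status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for pair in "1:errors corrected" "2:system should be rebooted" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:checking canceled by user request" \
                "128:shared-library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        # Test each bit of the mask with shell arithmetic.
        if [ $(( status & bit )) -ne 0 ]; then
            echo "$text"
        fi
    done
}
```

In a rescue shell this could be used right after a check, e.g. `fsck -y /dev/sdb; describe_fsck_status $?`.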

One remark to make here: to complete its job on the various filesystems, fsck usually relies on external helper binaries, usually stored in /sbin/fsck*

# ls -al /sbin/fsck*
-rwxr-xr-x 1 root root  55576 Jan 20  2022 /sbin/fsck*
-rwxr-xr-x 1 root root  43272 Jan 20  2022 /sbin/fsck.cramfs*
lrwxrwxrwx 1 root root      9 Jul  4  2020 /sbin/fsck.exfat -> exfatfsck*
lrwxrwxrwx 1 root root      6 Jun  7  2021 /sbin/fsck.ext2 -> e2fsck*
lrwxrwxrwx 1 root root      6 Jun  7  2021 /sbin/fsck.ext3 -> e2fsck*
lrwxrwxrwx 1 root root      6 Jun  7  2021 /sbin/fsck.ext4 -> e2fsck*
-rwxr-xr-x 1 root root  84208 Feb  8  2021 /sbin/fsck.fat*
-rwxr-xr-x 2 root root 393040 Nov 30  2009 /sbin/fsck.jfs*
-rwxr-xr-x 1 root root 125184 Jan 20  2022 /sbin/fsck.minix*
lrwxrwxrwx 1 root root      8 Feb  8  2021 /sbin/fsck.msdos -> fsck.fat*
-rwxr-xr-x 1 root root    333 Dec 16  2021 /sbin/fsck.nfs*
lrwxrwxrwx 1 root root      8 Feb  8  2021 /sbin/fsck.vfat -> fsck.fat*

6. Using tune2fs to adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems (a few examples)

a) To check whether the filesystem was really checked at boot time, or to look up the date of the last fsck check of any filesystem on the server:

#  tune2fs -l /dev/sda1 | grep checked
Last checked:             Wed Apr 17 11:04:44 2019

On some distributions like old Debian and Ubuntu, it is even possible to make fsck log its operations during the check on reboot by changing the verbosity from no to yes:

# sed -i "s/#VERBOSE=no/VERBOSE=yes/" /etc/default/rcS

If you're having the issues on old Debian Linuxes and not on RHEL, it is also possible to:

b) Enable all fsck repairs automatically on boot

by running:

# sed -i "s/FSCKFIX=no/FSCKFIX=yes/" /etc/default/rcS

c) Forcing an fsck check of the server's attached hard drive partitions with tune2fs

# tune2fs -c 1 /dev/sdXY

Note that:
tune2fs can force a fsck on each reboot for EXT4, EXT3 and EXT2 filesystems only.

tune2fs can trigger a forced fsck on every reboot using the -c (max-mount-counts) option.
This option sets the number of mounts after which the filesystem will be checked, so setting it to 1 will run fsck each time the computer boots.
Setting it to -1 or 0 resets this (the number of times the filesystem is mounted will be disregarded by e2fsck and the kernel).

 For example you could:

d) Set fsck to run a filesystem check every 30 boots, by using -c 30 

# tune2fs -c 30 /dev/sdXY

e) Check when was the last time the file system /dev/sdX was checked:

# tune2fs -l /dev/sdX | grep Last\ c
Last checked:             Thu Jan 12 20:28:34 2017

f) Check how many times our /dev/sdX filesystem was mounted

# tune2fs -l /dev/sdX | grep Mount
Mount count:              157

g) Check how many mounts are allowed to pass before filesystem check is forced

# tune2fs -l /dev/sdX | grep Max
Maximum mount count:      -1
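
The last two values can be combined to see how far a filesystem is from its next mount-count based check. A minimal sketch, assuming the `tune2fs -l` field names shown above; mounts_until_fsck is a made-up helper that reads that output on standard input.

```shell
# Read `tune2fs -l` output on stdin and report how many mounts remain
# before the mount-count based check is due.
mounts_until_fsck() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        END {
            if (max <= 0)
                print "mount-count check disabled"
            else if (count >= max)
                print "check due on next mount"
            else
                printf "%d mounts until next check\n", max - count
        }
    '
}
```

Usage would be `tune2fs -l /dev/sdX | mounts_until_fsck`. With the values shown above (maximum mount count -1) it reports that the mount-count check is disabled.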

7. Repairing disk / partitions via GRUB fsck.mode and kernel module options

It is also possible to force an fsck on boot via GRUB, but that is usually not an option one would like, as the machine might fail to boot if the filesystem is hard to repair. However, in difficult situations with failing disks, temporarily enabling it is a good idea.

This can be done by including the option in the GRUB defaults config (usually /etc/default/grub):

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash fsck.mode=force"

fsck.mode=force – will force an fsck each time the system boots. Keeping that value enabled inside GRUB for a long time is stupid for servers, as booting could sometimes be severely prolonged because of the checks, especially on servers with many or slow old hard drives.

fsck.repair=yes – the companion systemd-fsck option (not part of the line above), will make fsck try to repair if it finds bad blocks when checking (be absolutely sure you know what you're doing if passing this option).

The options can also be set by editing the entry at the GRUB boot screen, if you have physical access to the server and don't want to regenerate the GRUB config and possibly make the machine unbootable on next boot.
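
For the persistent variant, the edit of the GRUB defaults file can be scripted so that it is repeatable and keeps a backup. This is a sketch only: add_grub_param is a hypothetical helper, it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is double quoted on one line, and it assumes the parameter contains no characters special to sed such as / or &.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults
# file, unless it is already present.
# $1 = file (e.g. /etc/default/grub), $2 = parameter (e.g. fsck.mode=force).
add_grub_param() {
    file=$1
    param=$2
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        echo "already set"
        return 0
    fi
    # Keep a copy, then insert the parameter just before the closing quote.
    cp "$file" "$file.bak" &&
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"/\1 $param\"/" "$file.bak" > "$file"
}
```

After `add_grub_param /etc/default/grub fsck.mode=force` the GRUB config still has to be regenerated, with update-grub on Debian / Ubuntu or grub2-mkconfig -o /boot/grub2/grub.cfg on the RHEL family, and the parameter should be removed again once the check has run.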

8. A few more details on how the /etc/fstab fsck check values work on systemd Linux machines

The "proper" way on systemd (if we can talk about a proper way on Linux) to run fsck for each filesystem that has an fsck helper is to set a number greater than 0 in
the last column of /etc/fstab, so make sure you edit your /etc/fstab if that's not the case.

The root partition should be set to 1 (first to be checked), while other partitions you want to be checked should be set to 2.

Example /etc/fstab:

# /etc/fstab: static file system information.

/dev/sda1  /      ext4  errors=remount-ro  0  1
/dev/sda5  /home  ext4  defaults           0  2

The meaning of the values you can put in the last (sixth) column is as follows:
0 – disabled, that is do not check filesystem
1 – partition with this PASS value has a higher priority and is checked first. This value is usually set to the root / partition
2 – partitions with this PASS value will be checked last
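
To spot filesystems that the boot-time check will silently skip, the last column can be scanned automatically. A minimal sketch, assuming the standard six-field fstab layout; fstab_unchecked is a made-up name, and pseudo filesystems such as swap, proc, sysfs and tmpfs are left out because a 0 there is expected.

```shell
# List mount points whose sixth fstab field is 0 or missing, i.e. the
# filesystems that the boot-time fsck will skip.
# $1 = fstab file (defaults to /etc/fstab).
fstab_unchecked() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }   # skip comments and blank or short lines
        $3 == "swap" || $3 == "proc" || $3 == "sysfs" || $3 == "tmpfs" { next }
        NF < 6 || $6 == 0 { print $2 }
    ' "${1:-/etc/fstab}"
}
```

Run without arguments it inspects /etc/fstab; every mount point it prints is a candidate for getting a 1 (root) or 2 (everything else) in the last column.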

a) Check the log produced by fsck

Unfortunately, on the older versions of Linux distros with SystemV the fsck log output might not be generated anywhere except on the physical console, so if you have some kind of device duplicating the physical tty on the display port of the server, you might capture some bad block reports or fixed error messages; if you don't, you can just cross your fingers and hope that any FS irregularities found were recovered.

On systemd Linux machines the fsck log should be produced either in /run/initramfs/fsck.log or in some other location depending on the Linux distro, and you should be able to see something from fsck inside the /var/log/* logs:

# grep -rli fsck /var/log/*

Close it up

Having a system with a failing disk is really one of the worst sysadmin nightmares to get. The good news is that in most cases we're prepared with some working backup or some workaround, like the few steps explained here to reduce the amount of reads / writes on the failing machine's HDDs.

If the failing disk holds a primary Linux filesystem, all becomes even worse, as on every next reboot you have no guarantee that the kernel / initrd or some of the other system components required to run the core Linux system won't break the normal boot. Thus, on one side, making changes on the hard drives is a risky business. On the other side, if you have a mirror system and the failing system is not just a Linux server installed without a cluster pair, then this is not a big deal, as you can guarantee that at least one of the nodes is still up, running and serving.

Still, doing too many operations on the HDD is always a danger, so even though the steps described lead in most cases to an improvement in how the system behaves, the system should be considered totally unreliable and closely monitored, not only by some monitoring stuff like Zabbix / Prometheus or whatever but also by regularly checking the system's state via normal SSH logins.

If you have some important data or logs on the system that are not synchronized to another node, it is important to copy them before doing any of the described operations. Once the minimum is backed up, proceed to clear up everything that can be cleared up while the machine still continues to provide most of its functionality, trigger an automatic fsck HDD check on next reboot, reboot, check what is going on and monitor the machine from there on.

Hopefully the few described steps have helped some sysadmin. There are plenty of things I've described that might go wrong, and even following the described steps might not help if the machine's storage drives / SAS / SSD are too badly damaged. But as said, in most cases following these few steps would improve the machine's state.

Wish you the best of luck!


How to test RAM Memory for errors in Linux / UNIX OS servers. Find broken memory RAM banks

Friday, December 3rd, 2021



1. Testing the memory with motherboard integrated tools

Memory testing has been an integral part of computers for the last 50 years. As those who are older perhaps remember, in the dawn of computers memory testing was part of the computer's boot initialization. This memory test delayed the boot by some seconds, and the user could see the memory numbers being counted up to the amount of installed memory. With the increased amount of memory in modern computers and the annoyance of waiting for the memory check, on modern computers this check has been reduced or completely removed on some hardware.
Thus under some circumstances sysadmins or advanced computer users might need to check the memory themselves, especially if there is some suspicion of memory damage, for example if a home PC starts crashing with Blue Screens of Death on Windows without reason, or if the PC or some old arcane Linux / UNIX server gets restarted every now and then for no apparent reason. When such circumstances occur, it is a good idea to start debugging the hardware issue with a simple memory check.

There are multiple ways to test the installed memory banks on a server, laptop or local home PC, both integrated and using external programs.
On servers that is usually easily done from the ILO, IPMI or IDRAC access (usually web) interface of the vendor; on laptops and home machines it is possible as well from the BIOS or UEFI (Unified Extensible Firmware Interface) interface on system boot.


An old but gold TIP that younger people might not know is the prolonged SHIFT key press, which, while held by the user, instructs the machine to initiate a memory test before the computer starts reading what is written in the boot loader.

So before anything else from the article below, it might be a good idea to just try to HOLD SHIFT for 15-20 seconds after a complete shut down and power ON from the POWER button.

If this test is not triggered, or it is triggered and you end up with some corrupted memory but you're not sure which exact memory bank is really failing and want to know more about which memory bank and segments are breaking, you might want to do more thorough testing. In the article below I'll try to explain shortly how this can be done.

2. Test the memory using a boot USB Flash Drive / DVD / CD 

Say hello to memtest86+. It is a utility bootable from the Linux GRUB boot loader that tests physical memory by writing various patterns to it and reading them back. Since memtest86+ runs directly off the hardware, it does not require any operating system support for execution. Perhaps it is important to mention that memtest86 (PassMark MemTest86) and memtest86+ (an advanced memory diagnostic tool) are different tools: the first is freeware and the second one is FOSS software.

To use it all you'll need is some version of Linux on bootable media. If you don't already have one burned and lying somewhere in your closet, you might want to burn one.
For Linux / Mac users this is as easy as downloading a Linux distribution ISO file and writing it to the whole USB device (not to a partition on it) with

# dd if=/path/to/iso of=/dev/sdX bs=80M status=progress

Windows users can burn a Live USB with whatever Linux distro, or download the latest version of memtest86+ and burn it from the Windows desktop with some proggie like, let's say, UNetbootin.

2.1. Run memtest86+ on Ubuntu

Many Linux distributions such as Ubuntu 20.04 come together with memtest86+, which can be easily invoked from the GRUB / GRUB2 kernel boot loader.
Ubuntu has a separate menu entry for Memtest.


On other, RPM based distributions such as CentOS, Fedora Linux and Red Hat, things differ.

2.2. memtest86+ on Fedora

Fedora used to have the memtest86+ entry in the GRUB boot selection menu, but for some reason removed it, and in the newest Fedora releases as of the time of writing, such as Fedora 35, memtest86+ is preinstalled and available but not visible. To start the memtest memory test tool:

  •   Boot a Fedora installation or Rescue CD / USB. At the prompt, type "memtest86".

boot: memtest86

2.3 memtest86+ on RHEL Linux

The memtest86+ tool is available as an RPM package from Red Hat Network (RHN) as well as a boot option from the Red Hat Enterprise Linux rescue disk.
And nowadays Red Hat Enterprise Linux ships with the tool by default.

On prior (now legacy) Red Hat releases it has to be installed and configured with the 3 commands below. The last one applies to GRUB2 based releases; on GRUB legacy releases such as RHEL 5.0, memtest-setup should add the boot entry to the GRUB config by itself.

[root@rhel ~]# yum install memtest86+
[root@rhel ~]# memtest-setup
[root@rhel ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

Again, as on CentOS, to boot memtest86+ from the rescue disk you will need to boot your system from CD 1 of the Red Hat Enterprise Linux installation media and type the following at the boot prompt (before the Linux kernel is started):

boot: memtest86

memtest86+ testing 5 memory slots

As you can see on the above screenshot, the memory banks are listed as Slots. There are a number of tests to be completed until
it can be said for sure that the memory does not have any faulty cells.

Pass: 0
Errors: 0 

This indicates no errors, so if memtest86+ does not find anything, the Errors value should stay at zero until the end.
memtest86+ is also usable for detecting issues with the CPU temperature. Just recently I tested a PC thinking that some memory had defects, but it turned out the issue on the computer was the CPU's temperature, which was topping out at 80 – 82 Celsius.

If you're unfortunate and happen to have some corrupted memory segments, you will get some red fields with the memory addresses found to be corrupted during the read / write test operations:


2.4. Install and use memtest and memtest86+ on Debian / Mint Linux

You can install either memtest86+ or, just for the fun, put both of them and play around with both, as each has a .deb package provided from the Debian repositories configured in /etc/apt/sources.list.

root@jeremiah:/home/hipo# apt-cache show memtest86 memtest86+
Package: memtest86
Version: 4.3.7-3
Installed-Size: 302
Maintainer: Yann Dirson <>
Architecture: amd64
Depends: debconf (>= 0.5) | debconf-2.0
Recommends: memtest86+
Suggests: hwtools, memtester, kernel-patch-badram, grub2 (>= 1.96+20090523-1) | grub (>= 0.95+cvs20040624), mtools
Description-en: thorough real-mode memory tester
 Memtest86 scans your RAM for errors.
 This tester runs independently of any OS – it is run at computer
 boot-up, so that it can test *all* of your memory.  You may want to
 look at `memtester', which allows testing your memory within Linux,
 but this one won't be able to test your whole RAM.
 It can output a list of bad RAM regions usable by the BadRAM kernel
 patch, so that you can still use you old RAM with one or two bad bits.
 This is the last DFSG-compliant version of this software, upstream
 has opted for a proprietary development model starting with 5.0.  You
 may want to consider using memtest86+, which has been forked from an
 earlier version of memtest86, and provides a different set of
 features.  It is available in the memtest86+ package.
 A convenience script is also provided to make a grub-legacy-based
 floppy or image.

Description-md5: 0ad381a54d59a7d7f012972f613d7759
Section: misc
Priority: optional
Filename: pool/main/m/memtest86/memtest86_4.3.7-3_amd64.deb
Size: 45470
MD5sum: 8dd2a4c52910498d711fbf6b5753bca9
SHA256: 09178eca21f8fd562806ccaa759d0261a2d3bb23190aaebc8cd99071d431aeb6

Package: memtest86+
Version: 5.01-3
Installed-Size: 2391
Maintainer: Yann Dirson <>
Architecture: amd64
Depends: debconf (>= 0.5) | debconf-2.0
Suggests: hwtools, memtester, kernel-patch-badram, memtest86, grub-pc | grub-legacy, mtools
Description-en: thorough real-mode memory tester
 Memtest86+ scans your RAM for errors.
 This tester runs independently of any OS – it is run at computer
 boot-up, so that it can test *all* of your memory.  You may want to
 look at `memtester', which allows to test your memory within Linux,
 but this one won't be able to test your whole RAM.
 It can output a list of bad RAM regions usable by the BadRAM kernel
 patch, so that you can still use your old RAM with one or two bad bits.
 Memtest86+ is based on memtest86 3.0, and adds support for recent
 hardware, as well as a number of general-purpose improvements,
 including many patches to memtest86 available from various sources.
 Both memtest86 and memtest86+ are being worked on in parallel.
Description-md5: aa685f84801773ef97fdaba8eb26436a

Tag: admin::benchmarking, admin::boot, hardware::storage:floppy,
 interface::text-mode, role::program, scope::utility, use::checking
Section: misc
Priority: optional
Filename: pool/main/m/memtest86+/memtest86+_5.01-3_amd64.deb
Size: 75142
MD5sum: 4f06523532ddfca0222ba6c55a80c433
SHA256: ad42816e0b17e882713cc6f699b988e73e580e38876cebe975891f5904828005


root@jeremiah:/home/hipo# apt-get install --yes memtest86+

root@jeremiah:/home/hipo# apt-get install --yes memtest86

Reading package lists… Done
Building dependency tree       
Reading state information… Done
Suggested packages:
  hwtools kernel-patch-badram grub2 | grub
The following NEW packages will be installed:
0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.
Need to get 45.5 kB of archives.
After this operation, 309 kB of additional disk space will be used.
Get:1 buster/main amd64 memtest86 amd64 4.3.7-3 [45.5 kB]
Fetched 45.5 kB in 0s (181 kB/s)     
Preconfiguring packages …
Selecting previously unselected package memtest86.
(Reading database … 519985 files and directories currently installed.)
Preparing to unpack …/memtest86_4.3.7-3_amd64.deb …
Unpacking memtest86 (4.3.7-3) …
Setting up memtest86 (4.3.7-3) …
Generating grub configuration file …
Found background image: saint-John-of-Rila-grub.jpg
Found linux image: /boot/vmlinuz-4.19.0-18-amd64
Found initrd image: /boot/initrd.img-4.19.0-18-amd64
Found linux image: /boot/vmlinuz-4.19.0-17-amd64
Found initrd image: /boot/initrd.img-4.19.0-17-amd64
Found linux image: /boot/vmlinuz-4.19.0-8-amd64
Found initrd image: /boot/initrd.img-4.19.0-8-amd64
Found linux image: /boot/vmlinuz-4.19.0-6-amd64
Found initrd image: /boot/initrd.img-4.19.0-6-amd64
Found linux image: /boot/vmlinuz-4.19.0-5-amd64
Found initrd image: /boot/initrd.img-4.19.0-5-amd64
Found linux image: /boot/vmlinuz-4.9.0-8-amd64
Found initrd image: /boot/initrd.img-4.9.0-8-amd64
Found memtest86 image: /boot/memtest86.bin
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
File descriptor 3 (pipe:[66049]) leaked on lvs invocation. Parent PID 22581: /bin/sh
Processing triggers for man-db (2.8.5-2) …


After this both memory testers, memtest86+ and memtest86, will appear in the GRUB boot menu next to the entries for booting the different kernel versions and the advanced recovery kernels that you usually get there.

2.5. Use the kernel's embedded memtest on any Linux by adding a kernel parameter

2.5.1. Reboot your computer

# reboot

2.5.2. At the GRUB boot screen (with UEFI, press Esc to get it), press e to edit the boot entry.

2.5.3. For 4 passes, temporarily add the memtest=4 kernel parameter to the kernel (linux) line.

memtest=        [KNL,X86,ARM,PPC,RISCV] Enable memtest
                Format: <integer>
                default : 0 <disable>
                Specifies the number of memtest passes to be
                performed. Each pass selects another test
                pattern from a given set of patterns. Memtest
                fills the memory with this pattern, validates
                memory contents and reserves bad memory
                regions that are detected.
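
After such a boot it is easy to confirm that the parameter really reached the kernel by looking at the command line it was started with. A small sketch follows; memtest_passes is a hypothetical helper that only parses text and reports 0 when no memtest=N value is present.

```shell
# Print the number of memtest passes requested on a kernel command line,
# or 0 when no memtest=N parameter is present. $1 = command line text.
memtest_passes() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F= '
        $1 == "memtest" && $2 != "" { n = $2 }
        END { print n + 0 }
    '
}
```

On the running system this would be called as `memtest_passes "$(cat /proc/cmdline)"`.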

3. Install and use memtester Linux tool

Under some conditions memory is one of the suspicious parts, or you just want to have a quick test. memtester is an effective userspace tester for stress-testing the memory subsystem. It is very effective at finding intermittent and non-deterministic faults.

The advantage of memtester as a "live system check tool" is that you can check your system for errors while it's still running. No need for a restart, just run the application. The downside is that some segments of memory cannot be thoroughly tested, as much data is already loaded in them to keep the operating system running. Thus, whenever possible, try to stick to the rule of testing the memory with memtest86+ from the OS boot loader after a clean machine restart, so that the whole memory can be covered.

Anyhow, for a general memory test on a critical legacy server (if you, let's say, don't have access to a remote console board, or don't trust the ILO / IPMI hardware reported integrity statistics), running memtester from the already booted system is still a good idea.

3.1. Install memtester on any Linux distribution from source

# tar zxvf memtester-4.2.2.tar.gz
# cd memtester-4.2.2
# make && make install

3.2 Install on RPM based distros


On Fedora memtester is available from the repositories; however, on many other RPM based distros it is not, so you have to install it from source.

[root@fedora ]# yum install -y memtester


3.3. Install memtester on Deb based Linux distributions from source

To install it on Debian / Ubuntu / Mint etc. , open a terminal and type:

root@linux:/ #  apt install --yes memtester

The general run syntax is:

memtester [-p PHYSADDR] <MEMORY>[B|K|M|G] [ITERATIONS]

You can hence use it like so:

hipo@linux:/ $ sudo memtester 1024 5

This should allocate 1024MB of memory and repeat the test 5 times. The more repeats you run the better, but as a memtester run places a great overall load on the system, you should either not increase the number of runs too much or at least run it with a lowered process priority, e.g. by using nice:

hipo@linux:/ $ nice -n 15 sudo memtester 1024 5


  • If you have more RAM, like 4GB or 8GB, it is up to you how much memory you want to allocate for testing.
  • As your operating system and the currently running processes take some amount of RAM, please check the available free RAM and assign no more than that to memtester.
  • If you are using a 32-bit system, you can't test more than 4 GB in one run even though you have more RAM (without PAE, 32-bit systems can't use more than about 3.5 GB of RAM anyway).
  • If your system is very busy and you still assign more than the available amount of RAM, the test might drive the system into a deadlock and bring it to a halt, so be aware of this.
  • Run memtester as the root user, so that the memtester process can malloc the memory and, once it gets hold of that memory, lock it with mlock. If the specified amount of memory is not available, it will try to reduce the requested amount automatically and lock that instead.
  • If you run it as a regular user, it can't lock the memory and can't auto-reduce the requested amount, so it tries to get hold of the full specified amount and may start exhausting system resources.
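The points above about leaving room for the running system can be turned into a small calculation. This is only a sketch under assumptions: the helper name suggest_memtester_mb and the default 20% reserve are invented for illustration, and the fallback branch is there because old kernels (such as the one on RHEL 5) do not report MemAvailable.

```shell
# suggest_memtester_mb: print a memtester size in MB that leaves headroom.
# $1 = reserve in percent (default 20, an arbitrary illustrative value)
# $2 = meminfo file to read (default /proc/meminfo)
suggest_memtester_mb() {
    reserve_pct=${1:-20}
    meminfo=${2:-/proc/meminfo}
    # Prefer MemAvailable; when the kernel does not provide it,
    # fall back to MemFree + Buffers + Cached.
    avail_kb=$(awk '
        $1 == "MemAvailable:" { avail = $2; have = 1 }
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
        END { print (have ? avail : sum + 0) }' "$meminfo")
    echo $(( avail_kb * (100 - reserve_pct) / 100 / 1024 ))
}

# Usage:
#   sudo memtester "$(suggest_memtester_mb)" 5
```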

If you have 8 GB of RAM plugged into the PC motherboard, you have to multiply 1024*8. This is easily done with the bc (an arbitrary precision calculator language) tool:

root@linux:/ # bc -l
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
1024*8
8192

So, for example, you should run:

root@linux:/ # memtester 8192 5

memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 8192MB (8589934592 bytes)
got  8192MB (8589934592 bytes), trying mlock …
Loop 1/5:
  Stuck Address       : ok        
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok        
  Block Sequential    : ok        
  Checkerboard        : ok        
  Bit Spread          : ok        
  Bit Flip            : ok        
  Walking Ones        : ok        
  Walking Zeroes      : ok        
  8-bit Writes        : ok
  16-bit Writes       : ok
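When memtester is run from a script or cron job, nobody reads the "ok" column, so the exit status matters more. According to the memtester man page the status is 0 on success and otherwise a bitwise OR of 1, 2 and 4. The decoder below is a small sketch; the function name is hypothetical.

```shell
# explain_memtester_status: print a human readable meaning for each bit
# set in memtester's exit status (1, 2 and 4 as listed in its man page).
explain_memtester_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "all tests passed"
    fi
    if [ $(( status & 1 )) -ne 0 ]; then
        echo "error allocating or locking memory, or invocation error"
    fi
    if [ $(( status & 2 )) -ne 0 ]; then
        echo "error during the stuck address test"
    fi
    if [ $(( status & 4 )) -ne 0 ]; then
        echo "error during one of the other tests"
    fi
}

# Usage:
#   sudo memtester 1024 5; explain_memtester_status $?
```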



4. Shell Script to test server memory for corruptions

If for some reason the machine on which you want to run a memory test doesn't have a connection to an external network such as the Internet, and therefore you cannot configure a package repository and install memtester, the other approach is to use a simple memory test script such as:

#!/bin/bash
# Downloaded from
echo "ByteOnSite Memory Test"
cpus=`cat /proc/cpuinfo | grep processor | wc -l`
if [ $cpus -lt 6 ]; then
    # assumed value: the body of this branch is missing from the listing
    threads=2
else
    threads=$(($cpus / 2))
fi
echo "Detected $cpus CPUs, using $threads threads.."
memory=`free | grep 'Mem:' | awk {'print $2'}`
memoryper=$(($memory / $threads))
echo "Detected ${memory}K of RAM ($memoryper per thread).."
freespace=`df -B1024 . | tail -n1 | awk {'print $4'}`
if [ $freespace -le $memory ]; then
    echo "You do not have enough free space on the current partition. Minimum: $memory KB"
    exit 1
fi
echo "Clearing RAM Cache.."
sync; echo 3 > /proc/sys/vm/drop_caches
echo > dump.memtest.img
echo "Writing to dump file (dump.memtest.img).."
for i in `seq 1 $threads`;
do
    # 1044 is used in place of 1024 to ensure full RAM usage (2% over allocation)
    dd if=/dev/urandom bs=$memoryper count=1044 >> dump.memtest.img 2>/dev/null &
    # remember the PID of each background dd so we can wait for it below
    pids[$i]=$!
    echo $i
done
for pid in "${pids[@]}"
do
    wait $pid
done

echo "Reading and analyzing dump file…"
echo "Pass 1.."
md51=`md5sum dump.memtest.img | awk {'print $1'}`
echo "Pass 2.."
md52=`md5sum dump.memtest.img | awk {'print $1'}`
echo "Pass 3.."
md53=`md5sum dump.memtest.img | awk {'print $1'}`
# any difference between the three checksums counts as a failure
fail=0
if [ "$md51" != "$md52" ]; then
    fail=1
elif [ "$md51" != "$md53" ]; then
    fail=1
elif [ "$md52" != "$md53" ]; then
    fail=1
fi
if [ $fail -eq 0 ]; then
    echo "Memory test PASSED."
else
    echo "Memory test FAILED. Bad memory detected."
fi
rm -f dump.memtest.img
exit $fail

Nota Bene!: Again, consider that the results might not always be 100% trustworthy. If possible, restart the server and test with memtest86+.

Also consider that it's important to make sure, prior to running the script, that you have enough disk space to produce the dump.memtest.img file. The file is created as a test bed for the memory tests, and if it is not sized properly you might end up with a full root ( / ) filesystem!


4.1 Other memory test script with dd and md5sum checksum

I found this solution on the well known sysadmin site nixCraft. I think it makes sense and is quicker.

First find out the memory size using the free command.

# free
             total       used       free     shared    buffers     cached
Mem:      32867436   32574160     293276          0      16652   31194340
-/+ buffers/cache:    1363168   31504268
Swap:            0          0          0

It shows that this server has 32GB of memory:

# dd if=/dev/urandom bs=32867436 count=1050 of=/home/memtest

free reports in KB, and 1050 is used to make sure the file memtest is bigger than the physical memory. To get better performance, use a proper bs size, for example 2048 or 4096, depending on your local disk I/O. The rule is to keep bs * count > 32 GB.
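If you pick a smaller bs such as 4096, the matching count has to be worked out so that bs * count still exceeds the RAM size. A rough sketch of that arithmetic follows; the helper name dd_count_for_ram is invented here, and the 2% margin is borrowed from the 1044-instead-of-1024 trick in the script from section 4.

```shell
# dd_count_for_ram: print a dd count such that bs * count is about 2%
# larger than physical RAM.
# $1 = total RAM in KB (the "total" column printed by free)
# $2 = dd block size in bytes
dd_count_for_ram() {
    total_kb=$1
    bs=$2
    # bytes to write = total_kb * 1024 * 1.02, rounded up to whole blocks
    echo $(( (total_kb * 1024 * 102 / 100 + bs - 1) / bs ))
}

# Usage for the 32GB server above:
#   count=$(dd_count_for_ram 32867436 4096)
#   dd if=/dev/urandom bs=4096 count="$count" of=/home/memtest
```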

# md5sum /home/memtest; md5sum /home/memtest; md5sum /home/memtest

If you see an md5sum mismatch between the runs, faulty memory is by far the most likely cause.
The theory is simple: reading the file /home/memtest fills up all available memory with cached file data, so every md5sum run reads the same data back through memory.
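Comparing three long hashes by eye is error prone, so the comparison can be scripted. The function below is a sketch only: checksums_stable is a made-up name, and it simply re-reads the file a given number of times and stops at the first differing checksum.

```shell
# checksums_stable: hash FILE several times and report whether every read
# produced the same md5sum.
# $1 = file to read, $2 = number of reads (default 3)
checksums_stable() {
    file=$1
    runs=${2:-3}
    first=""
    i=1
    while [ "$i" -le "$runs" ]; do
        sum=$(md5sum "$file" | awk '{ print $1 }')
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH on read $i: $sum != $first"
            return 1
        fi
        i=$(( i + 1 ))
    done
    echo "OK: $runs identical checksums ($first)"
}

# Usage:
#   checksums_stable /home/memtest 3 && rm -f /home/memtest
```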

5. Other ways to test memory / do a machine stress test

Another good tool you might want to check for memory testing is mprime (mprime can also be used to stress test your CPU).

Alternatively, use the stress-ng package to run all kinds of stress tests (including memory tests) on your machine.
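As an illustration, a memory focused stress-ng run could look like the sketch below. The wrapper name vm_stress, its defaults and the DRY_RUN switch are assumptions made for this example. The percentage form of --vm-bytes needs a reasonably recent stress-ng, so check stress-ng --help on old systems first.

```shell
# vm_stress: run stress-ng virtual memory workers with verification on.
# $1 = number of workers (default 2)
# $2 = percentage passed to --vm-bytes (default 75)
# $3 = run time in minutes (default 10)
# Set DRY_RUN=1 to only print the command instead of running it.
vm_stress() {
    set -- stress-ng --vm "${1:-2}" --vm-bytes "${2:-75}%" \
        --vm-method all --verify --timeout "${3:-10}m"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1 vm_stress 2 75 10    # show what would be executed
#   vm_stress 2 75 10              # actually run the test
```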
Perhaps there are other interesting tools for memory diagnosis. If you know others that I missed, let me know in the comment section.

How to install GNOME server on Ubuntu 10.04.2 LTS Lucid

Wednesday, April 20th, 2011

After upgrading Ubuntu from 9.10 to Ubuntu 10.04.2, I faced problems during apt-get upgrade && apt-get dist-upgrade

I had to fix it up with apt-get upgrade -f, however the “fix”, which was meant to repair the apt-get dist-upgrade, removed about 260 packages, among which were the grub boot loader, xorg-server and even gnome.

As apt-get upgrade -f was my only possible way to fix the package mishap, I went ahead and confirmed that I would like to wipe out all these packages.

Logically, afterwards I had to install the missing xorg-server and gnome in order to make the Ubuntu desktop work again.

Here is how:

ubuntu:~# apt-get update && apt-get install xorg-server ubuntu-desktop

The ubuntu-desktop is a meta package which installs the GNOME environment.

Looser Again

Wednesday, January 31st, 2007

Got a 2 mark on the Marketing exam. Again I’m a loser. I have neither the time nor the desire to study for this exam again. I think I’m not cut out to be a student. Today we were out for a coffee with Mitko, Toto and Dido. Nothing special, an ordinary day. Yesterday we stayed at Mitko’s and were installing Gentoo Linux on his laptop. Gentoo’s grub was buggy or something, we didn’t succeed in booting the kernel with GRUB, so we decided to switch to LILO. We were able to make the machine bootable using LILO. Then there was an annoying error with the REAL_ROOT option. After a lot of wandering and editing of /linuxrc we found the mistake: it was a typo in the lilo configuration, where we had written real_boot instead of real_root. In the end everything worked okay. And I went home to sleep. I’m not sure where my life is going again … I’m completely Lost in the Dark.