Posts Tagged ‘backups’

KVM Creating LIVE and offline VM snapshot backup of Virtual Machines. Restore KVM VM from backup. Delete old KVM backups

Tuesday, January 16th, 2024

kvm-backup-restore-vm-logo

For those who have to manage Kernel-Based Virtual Machines it is a must to create periodic backups of VMs. The backup is usually created as a procedure part of the Update plan (schedule) of the server either after shut down the machine completely or live.

Since KVM is open source the very logical question for starters, whether KVM supports Live backups. The simple answer is Yes it does.

virsh command as most people know is the default command to manage VMs on KVM running Hypervisor servers to manage the guest domains.

KVM is flexible and could restore a VM based on its XML configuration and the VM data (either a static VM single file) or a filesystem laying on LVM filesystem etc.

To create a snapshot out of the KVM HV, list all VMs and create the backup:

# export VM-NAME=fedora;
# export SNAPSHOT-NAME=fedora-backup;
# virsh list –all


It is useful to check out the snapshot-create-as sub arguments

 

 

# virsh help snapshot-create-as

 OPTIONS
    [–domain] <string>  domain name, id or uuid
    –name <string>  name of snapshot
    –description <string>  description of snapshot
    –print-xml      print XML document rather than create
    –no-metadata    take snapshot but create no metadata
    –halt           halt domain after snapshot is created
    –disk-only      capture disk state but not vm state
    –reuse-external  reuse any existing external files
    –quiesce        quiesce guest's file systems
    –atomic         require atomic operation
    –live           take a live snapshot
    –memspec <string>  memory attributes: [file=]name[,snapshot=type]
    [–diskspec]  disk attributes: disk[,snapshot=type][,driver=type][,file=name]

 

# virsh shutdown $VM_NAME
# virsh snapshot-create-as –domain $VM-NAME –name "$SNAPSHOT-NAME"


1. Creating a KVM VM LIVE (running machine) backup
 

# virsh snapshot-create-as –domain debian \
–name "debian-snapshot-2024" \
–description "VM Snapshot before upgrading to latest Debian" \
–live

On successful execution of KVM Virtual Machine live backup, should get something like:

Domain snapshot debian-snapshot-2024 created

 

2. Listing backed-up snapshot content of KVM machine
 

# virsh snapshot-list –domain debian


a. To get more extended info about a previous snapshot backup

# virsh snapshot-info –domain debian –snapshotname debian-snapshot-2024


b. Listing info for multiple attached storage qcow partition to a VM
 

# virsh domblklist linux-guest-vm1 –details

Sample Output would be like:

 Type   Device   Target   Source
——————————————————————-
 file   disk     vda      /kvm/linux-host/linux-guest-vm1_root.qcow2
 file   disk     vdb      /kvm/linux-host/linux-guest-vm1_attached_storage.qcow2
 file   disk     vdc      /kvm/linux-host/guest01_logging_partition.qcow2
 file   cdrom    sda      –
 file   cdrom    sdb      

 

3. Backup KVM only Virtual Machine data files (but not VM state) Live

 

# virsh snapshot-create-as –name "mint-snapshot-2024" \
–description "Mint Linux snapshot" \
–disk-only \
–live
–domain mint-home-desktop


4. KVM restore snapshot (backup)
 

To revert backup VM state to older backup snapshot:
 

# virsh shutdown –domain manjaro
# virsh snapshot-revert –domain manjaro –snapshotname manjaro-linux-back-2024 –running


5. Delete old unnecessery KVM VM backup
 

# virsh snapshot-delete –domain dragonflybsd –snapshotname dragonfly-freebsd

 

Linux extending life time for a damaged hard drive server tricks on a live server. Force fcsk on next reboot.Read-only file system error solutions

Friday, February 17th, 2023

linux-extending-life-time-for-a-damaged-hard-drive-server-tricks-can-not-read-superblock-linux-force-fsck-on-next-reboot

In our daily work as system administrators we have some very old Legacy systems running Clustered High Availability proxies using CRM (Cluster Resource Manager) and some legacy systems still using Heartbeat to manage the cluster instead of the newer and modern Corosync variant.

The HA cluster is only 2 nodes Linux machine and running the obscure already long time unsupported version of Redhat 5.11 (Ootpa) who was officially became stable distant year 1998 (yeath the years were good) and whose EOL (End of Life) has been reached long time ago and the OS is no longer supported, however for about 14 years the machines has been running perfectly fine until one of the Cluster nodes managed by ocf::heartbeat:IPAddr2 , that is  /etc/ha.d/resource.d/IPAddr2 shell script. Yeah for the newbies Heartbeat Application Cluster in Linux does work like that it uses a number of extendable pair of shell scripts written for different kind of Network / Web / Mail / SQL or whatever services HA management.

The first node configured however, started failing due to some errors like:
 

EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
sd 0:2:0:0: rejecting I/O to offline device
Aborting journal on device sda1.
sd 0:2:0:0: rejecting I/O to offline device
printk: 159 messages suppressed.
Buffer I/O error on device sda1, logical block 526
lost page write due to I/O error on sda1
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
megaraid_sas: FW was restarted successfully, initiating next stage…
megaraid_sas: HBA recovery state machine, state 2 starting…
megasas: Waiting for FW to come to ready state
megasas: FW in FAULT state!!
FW state [-268435456] hasn't changed in 180 secs
megaraid_sas: out: controller is not in ready state
megasas: waiting_for_outstanding: after issue OCR. 
megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
megaraid_sas: pending commands remain even after reset handling. megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
megasas[0]: Total OS Pending cmds : 0 megasas[0]: 64 bit SGLs were sent to FW
megasas[0]: Pending OS cmds in FW :

The result out of that was a frequently the filesystem of the machine got re-mounted as Read Only and of course that is
quite bad if you have a running processess of haproxy that should be able to be living their and take up some Web traffic
for high availability and you run all the traffic only on the 2nd pair of machine.

This of course was a clear sign for a failing disks or some hit bad blocks regions or as the messages indicates, some
problem with system hardware or Raid SAS Array.

The physical raid on the system, just like rest of the hardware is very old stuff as well.

[root@haproxy_lb_node1 ~]# lspci |grep -i RAI
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

The produced errors not only made the machine to auto-mount its root / filesystem in Read-Only mode but besides has most
likely made the machine to automatically reboot every few days or few times every day in a raw.

The second Load Balancer node2 did operated perfectly, and we thought that we might just keep the broken machine in that half running
and inconsistent state for few weeks until we have built the new machines with Pre-Installed new haproxy cluster with modern
RedHat Linux 8.6 distribution, but since we have to follow SLAs (Service Line Agreements) with Customers and the end services behind the
High Availability (HA) Haproxy cluster were at danger … 

We as sysadmins had the task to make our best to try to stabilize the unstable node with disk errors for the system to servive
and be able to normally serve traffic (if node2 that is in a separate Data center fails due to a hardware or electricity issues etc.)
.

Here is few steps we took, that has hopefully improved the situation.

1. Make backups of most important files of high importance

Always before doing anything with a broken system, prepare backup of the most important files, if that is a cluster that should be a backup of the cluster configurations (if you don't have already ones) backup of /etc/hosts / backup of any important services configs /etc/haproxy/haproxy.cfg /etc/postfix/postfix.cfg (like it was my case), preferrably backup of whole /etc/  any important files from /root/ or /home/users* directories backup of at leasts latest logs from /var/log etc.
 

2. Clear up all unnecessery services scripts from the server

Any additional Softwares / Services and integrity checking tools (daemons) / scripts and cron jobs, were immediately stopped and wheter unused removed.

E.g. we had moved through /etc/cron* to check what's there,

# ls -ld /etc/cron.*
drwx—— 2 root root 4096 Feb  7 18:13 /etc/cron.d
drwxr-xr-x 2 root root 4096 Feb  7 17:59 /etc/cron.daily
-rw-r–r– 1 root root    0 Jul 20  2010 /etc/cron.deny
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.hourly
drwxr-xr-x 2 root root 4096 Jan  9  2013 /etc/cron.monthly
drwxr-xr-x 2 root root 4096 Aug 26  2015 /etc/cron.weekly

 

And like well professional butchers removed everything unnecessery that could trigger any extra unnecessery disk read / writes to HDD.

E.g. just create

# mkdir -p /root/etc_old/{/etc/cron.d,\
/etc/cron.daily,/etc/cron.hourly,/etc/cron.monthly\
,/etc/cron.weekly}

 

And moved all unnecessery cron job scripts like:

1. nmon (old school network / memory / hard disk console tool for monitoring and tuning server parameters)
2. clamscan / freshclam crons
3. mlocate (the script that is taking care for periodic run of updatedb command to keep the locate command to easily search
for files inside the DB to put less read operations on disk in case if you need to find file (e.g. prevent yourself to everytime
run cmd like: find / . -iname '*whatever_you_look_for*'
4. cups cron jobs
5. logwatch cron
6. rkhunter stuff
7. logrotate (yes we stopped even logrotation trigger job as we found the server was crashing sometimes at the same time when
the lograte job to rotate logs inside /var/log/* was running perhaps leading to a hit of the I/O read error (bad blocks).


Also inspected the Administrator user root cron job for any unwated scripts and stopped two report bash scripts that were part of the PCI tightened Security procedures.
Therein found script responsible to periodically report the list of installed packages and if they have not changed, as well a script to periodically report via email the list of
/etc/{passwd,/etc/shadow} created users, used to historically keep an eye on the list of users and easily see if someone
has created new users on the machine. Those were enabled via /var/spool/cron/root cron jobs, in other cases, on other machines if it happens for you
it is a good idea to check out all the existing user cron jobs and stop anything that might be putting Read / Write extra heat pressure on machine attached the Hard drives.

# ls -al /var/spool/cron/
total 20
drwx——  2 root root 4096 Nov 13  2015 .
drwxr-xr-x 12 root root 4096 May 11  2011 ..
-rw——-  1 root root  133 Nov 13  2015 root


3. Clear up old log files and any files unnecessery

Under /var/log and /home /var/tmp /var/spool/tmp immediately try to clear up the old log files.
From my past experience this has many times made the FS file inodes that are storing on a unbroken part (good blocks) of the hard drive and
ready to be reused by newly written rsyslog / syslogd services spitted files.

!!! Note that during the removal of some files you might hit a files stored on a bad blocks that might lead to a unexpected system reboot.

But that's okay, don't worry most likely after a hard reset by a technician in the Datacenter the machine will boot again and you can enjoy
removing remaining still files to send them to the heaven for old files.

 

4. Trigger an automatic system file system check with fsck on next boot

The standard way to force a Linux to aumatically recheck its Root filesystem is to simply create the /forcefsck to root partition or any other secondary disk partition you would like to check.

# touch /forcefsck

# reboot


However at some occasions you might be unable to do it because, the / (root fs) has been remounted in ReadOnly mode, yackes …

Luckily old Linux distibutions like this RHEL 5.1, has a way to force a filesystem check after reboot fsck and identify any
unknown bad-blocks and hopefully succceed in isolating them, so you don't hit into the same auto-reboots if the hard drive or Software / Hardware RAID
is not in terrible state
, you can use an option built in in /sbin/shutdown command the '-F'

   -F     Force fsck on reboot.


Hence to make the machine reboot and trigger immediately fsck:

# shutdown -rF now


Just In case you wonder why to reboot before check the Filesystem. Well simply because you need to have them unmounted before you check.

In that specific case this produced so far a good result and the machine booted just fine and we crossed the fingers and prayed that the machine would work flawlessly in the coming few weeks, before we finalize the configuration of the substitute machines, where this old infrastructure will be migrated to a new built cluster with new Haproxy and Corosync / Pacemaker Cluster on a brand new RHEL.

NB! On newer machines this won't work however as shutdown command has been stripped off this option because no SystemV (SystemInit) or Upstart and not on SystemD newer services architecture.
 

5. Hints on checking the hard drives with fsck

If you happen to be able to have physical access to the remote Hardare machine via a TTY[1-9] Console, that's even better and is the standard way to do it but with this specific case we had no easy way to get access to the Physical server console.

It is even better to go there and via either via connected Monitor (Display) or KVM Switch (Those who hear KVM switch first time this is a great device in server rooms to connect multiple monitors to same Monitor Display), it is better to use a some of the multitude of options to choose from for USB Distro Linux recovery OS versions or a CDROM / DVD on older machines like this with the Redhat's recovery mode rolled on.
After mounting the partition simply check each of the disks
e.g. :

# fsck -y /dev/sdb
# fsck -y /dev/sdc

Or if you want to not waste time and look for each hard drive but directly check all the ones that are attached and known by Linux distro via /etc/fstab definition run:

# fsck -AR

If necessery and you have a mixture of filesystems for example EXT3 , EXT4 , REISERFS you can tell it to omit some filesystem, for example ext3, like that:

# fsck -AR -t noext3 -y


To skip fsck on mounted partitions with fsck:

# fsck -M /dev/sdb


One remark to make here on fsck is usually fsck to complete its job on various filesystem it uses other external component binaries usually stored in /sbin/fsck*

ls -al /sbin/fsck*
-rwxr-xr-x 1 root root  55576 20 яну 2022 /sbin/fsck*
-rwxr-xr-x 1 root root  43272 20 яну 2022 /sbin/fsck.cramfs*
lrwxrwxrwx 1 root root      9  4 юли 2020 /sbin/fsck.exfat -> exfatfsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext2 -> e2fsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext3 -> e2fsck*
lrwxrwxrwx 1 root root      6  7 юни 2021 /sbin/fsck.ext4 -> e2fsck*
-rwxr-xr-x 1 root root  84208  8 фев 2021 /sbin/fsck.fat*
-rwxr-xr-x 2 root root 393040 30 ное 2009 /sbin/fsck.jfs*
-rwxr-xr-x 1 root root 125184 20 яну 2022 /sbin/fsck.minix*
lrwxrwxrwx 1 root root      8  8 фев 2021 /sbin/fsck.msdos -> fsck.fat*
-rwxr-xr-x 1 root root    333 16 дек 2021 /sbin/fsck.nfs*
lrwxrwxrwx 1 root root      8  8 фев 2021 /sbin/fsck.vfat -> fsck.fat*


6. Using tune2fs to  adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems (few examples)

a) To check whether really the filesystem was checked on boot time or check a random filesystem on the server for its last check up date with fsck:

#  tune2fs -l /dev/sda1 | grep checked
Last checked:             Wed Apr 17 11:04:44 2019

On some distributions like old Debian and Ubuntu, it is even possible to enable fsck to log its operations during check on reboot via changing the verbosity from NO to YES:

# sed -i "s/#VERBOSE=no/VERBOSE=yes/" /etc/default/rcS


If you're having the issues on old Debian Linuxes  and not on RHEL  it is possible to;

b) Enable all fsck repairs automatic on boot

by running via:
 

# sed -i "s/FSCKFIX=no/FSCKFIX=yes/" /etc/default/rcS


c) Forcing fcsk check on for server attached Hard Drive Partitions with tune2fs

# tune2fs -c 1 /dev/sdXY

Note that:
tune2fs can force a fsck on each reboot for EXT4, EXT3 and EXT2 filesystems only.

tune2fs can trigger a forced fsck on every reboot using the -c (max-mount-counts) option.
This option sets the number of mounts after which the filesystem will be checked, so setting it to 1 will run fsck each time the computer boots.
Setting it to -1 or 0 resets this (the number of times the filesystem is mounted will be disregarded by e2fsck and the kernel).


 For example you could:

d) Set fsck to run a filesystem check every 30 boots, by using -c 30 
 

# tune2fs -c 30 /dev/sdXY


e) Checking whether a Hard Drive has been really checked on the boot

 

#  tune2fs -l /dev/sda1 | grep checked
Last checked:             Wed Apr 17 11:04:44 2019


e) Check when was the last time the file system /dev/sdX was checked:
 

# tune2fs -l /dev/sdX | grep Last\ c
Last checked:             Thu Jan 12 20:28:34 2017


f) Check how many times our /dev/sdX filesystem was mounted

# tune2fs -l /dev/sdX | grep Mount
Mount count:              157

g) Check how many mounts are allowed to pass before filesystem check is forced
 

# tune2fs -l /dev/sdX | grep Max
Maximum mount count:      -1


7. Repairing disk / partitions via GRUB fsck.mode and fsck.repair kernel module options

It is also possible to force a fsck.repair on boot via GRUB, but that usually is not an option someone would like as the machine might fail too boot if it hards to repair hardly, however in difficult situations with failing disks temporary enabling it is good idea.

This can be done by including for grub initial config

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash fsck.mode=force fsck.repair=yes"

fsck.mode=force – will force a fsck each time a system boot and keeping that value enabled for a long time inside GRUB is stupid for servers as

sometimes booting could be severely prolonged because of the checks especially with servers with many or slow old hard drives.

fsck.repair=yes – will make the fsck try to repair if it finds bad blocks when checking (be absolutely sure you know, what you're doing if passing this options)

The options can be also set via editing the GRUB boot screen, if you have physical access to the server and don't want to reload the grub loader and possibly make the machine unbootable on next boot.
 

8. Few more details on how /etc/fstab disk fsck check parameters values for Systemd Linux machines works

The "proper" way on systemd (if we can talk about proper way on Linux) to runs fsck for each filesystem that has a fsck is to pass number greater than 0 set in
/etc/fstab (last column in /etc/fstab), so make sure you edit your /etc/fstab if that's not the case.

The root partition should be set to 1 (first to be checked), while other partitions you want to be checked should be set to 2.

Example /etc/fstab:
 

# /etc/fstab: static file system information.

/dev/sda1  /      ext4  errors=remount-ro  0  1
/dev/sda5  /home  ext4  defaults           0  2

The values you can put here as a second number meaning is as follows:
0 – disabled, that is do not check filesystem
1 – partition with this PASS value has a higher priority and is checked first. This value is usually set to the root / partition
2 – partitions with this PASS value will be checked last

a) Check the produced log out of fsck

Unfortunately on the older versions of Linux distros with SystemV fsck log output might be not generated except on the physical console so if you have a kind of duplicator device physical tty on the display port of the server, you might capture some bad block reports or fixed errors messages, but if you don't you might just cross the fingers and hope that anything found FS irregularities was recovered.

On systemd Linux machines the fsck log should be produced either in /run/initramfs/fsck.log or some other location depending on the Linux distro and you should be able to see something from fsck inside /var/log/* logs:

# grep -rli fsck /var/log/*


Close it up

Having a system with failing disk is a really one of the worst sysadmin nightmares to get. The good news is that most of the cases we're prepared with some working backup or some work around stuff like the few steps explained to mitigate the amount of Read / Writes to hard disks on the failing machine HDDs. If the failing disk is a primary Linux filesystem all becomes even worse as every next reboot, you have no guarantee, whether the kernel / initrd or some of the other system components required to run the Core Linux system won't break up the normal boot. Thus one side changes on the hard drives is a risky business on ther other side, if you're in a situation where you have a mirror system or the failing system is just a Linux server installed without a Cluster pair, then this is not a big deal as you can guarantee at least one of the nodes still up, unning and serving. Still doing too much of operations with HDD is always a danger so the steps described, though in most cases leading to improvement on how the system behaves, the system should be considered totally unreliable and closely monitored not only by some monitoring stuff like Zabbix / Prometheus whatever but regularly check the systems state via normal SSH logins. It is important if you have some important datas or logs on the system that are not synchronized to a system node to copy them before doing any of the described operations. After all minimal is backuped, proceed to clear up everything that might be cleared up and still the machine to continue providing most of its functionalities, trigger fsck automatic HDD check on next reboot, reboot, check what is going on and monitor the machine from there on.

Hopefully the few described steps, has helped some sysadmin. There is plenty of things which I've described that might go wrong, even following the described steps, might not help if the machines Storage Drives / SAS / SSD has too much of a damage. But as said in most cases following this few steps would improve the machine state.

Wish you the best of luck!

 

How to RPM update Hypervisors and Virtual Machines running Haproxy High Availability cluster on KVM, Virtuozzo without a downtime on RHEL / CentOS Linux

Friday, May 20th, 2022

virtuozzo-kvm-virtual-machines-and-hypervisor-update-manual-haproxy-logo


Here is the scenario, lets say you have on your daily task list two Hypervisor (HV) hosts running CentOS or RHEL Linux with KVM or Virutozzo technology and inside the HV hosts you have configured at least 2 pairs of virtual machines one residing on HV Host 1 and one residing on HV Host 2 and you need to constantly keep the hosts to the latest distribution major release security patchset.

The Virtual Machines has been running another set of Redhat Linux or CentOS configured to work in a High Availability Cluster running Haproxy / Apache / Postfix or any other kind of HA solution on top of corosync / keepalived or whatever application cluster scripts Free or Open Source technology that supports a switch between clustered Application nodes.

The logical question comes how to keep up the CentOS / RHEL Machines uptodate without interfering with the operations of the Applications running on the cluster?

Assuming that the 2 or more machines are configured to run in Active / Passive App member mode, e.g. one machine is Active at any time and the other is always Passive, a switch is possible between the Active and Passive node.

HAProxy--Load-Balancer-cluster-2-nodes-your-Servers

In this article I'll give a simple step by step tested example on how you I succeeded to update (for security reasons) up to the latest available Distribution major release patchset on one by one first the Clustered App on Virtual Machines 1 and VM2 on Linux Hypervisor Host 1. Then the App cluster VM 1 / VM 2 on Hypervisor Host 2.
And finally update the Hypervisor1 (after moving the Active resources from it to Hypervisor2) and updating the Hypervisor2 after moving the App running resources back on HV1.
I know the procedure is a bit monotonic but it tries to go through everything step by step to try to mitigate any possible problems. In case of failure of some rpm dependencies during yum / dnf tool updates you can always revert to backups so in anyways don't forget to have a fully functional backup of each of the HV hosts and the VMs somewhere on a separate machine before proceeding further, any possible failures due to following my aritcle literally is your responsibility 🙂

 

0. Check situation before the update on HVs / get VM IDs etc.

Check the virsion of each of the machines to be updated both Hypervisor and Hosted VMs, on each machine run:
 

# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)


The machine setup I'll be dealing with is as follows:
 

hypervisor-host1 -> hypervisor-host1.fqdn.com 
•    virt-mach-centos1
•    virt-machine-zabbix-proxy-centos (zabbix proxy)

hypervisor-host2 -> hypervisor-host2.fqdn.com
•    virt-mach-centos2
•    virt-machine-zabbix2-proxy-centos (zabbix proxy)

To check what is yours check out with virsh cmd –if on KVM or with prlctl if using Virutozzo, you should get something like:

[root@hypervisor-host2 ~]# virsh list
 Id Name State
—————————————————-
 1 vm-host1 running
 2 virt-mach-centos2 running

 # virsh list –all

[root@hypervisor-host1 ~]# virsh list
 Id Name State
—————————————————-
 1 vm-host2 running
 3 virt-mach-centos1 running

[root@hypervisor-host1 ~]# prlctl list
UUID                                    STATUS       IP_ADDR         T  NAME
{dc37c201-08c9-589d-aa20-9386d63ce3f3}  running      –               VM virt-mach-centos1
{76e8a5f8-caa8-5442-830e-aa4bfe8d42d9}  running      –               VM vm-host2
[root@hypervisor-host1 ~]#

If you have stopped VMs with Virtuozzo to list the stopped ones as well.
 

# prlctl list -a

[root@hypervisor-host2 74a7bbe8-9245-5385-ac0d-d10299100789]# vzlist -a
                                CTID      NPROC STATUS    IP_ADDR         HOSTNAME
[root@hypervisor-host2 74a7bbe8-9245-5385-ac0d-d10299100789]# prlctl list
UUID                                    STATUS       IP_ADDR         T  NAME
{92075803-a4ce-5ec0-a3d8-9ee83d85fc76}  running      –               VM virt-mach-centos2
{74a7bbe8-9245-5385-ac0d-d10299100789}  running      –               VM vm-host1

# prlctl list -a


If due to Virtuozzo version above command does not return you can manually check in the VM located folder, VM ID etc.
 

[root@hypervisor-host2 vmprivate]# ls
74a7bbe8-9245-4385-ac0d-d10299100789  92075803-a4ce-4ec0-a3d8-9ee83d85fc76
[root@hypervisor-host2 vmprivate]# pwd
/vz/vmprivate
[root@hypervisor-host2 vmprivate]#


[root@hypervisor-host1 ~]# ls -al /vz/vmprivate/
total 20
drwxr-x—. 5 root root 4096 Feb 14  2019 .
drwxr-xr-x. 7 root root 4096 Feb 13  2019 ..
drwxr-x–x. 4 root root 4096 Feb 18  2019 1c863dfc-1deb-493c-820f-3005a0457627
drwxr-x–x. 4 root root 4096 Feb 14  2019 76e8a5f8-caa8-4442-830e-aa4bfe8d42d9
drwxr-x–x. 4 root root 4096 Feb 14  2019 dc37c201-08c9-489d-aa20-9386d63ce3f3
[root@hypervisor-host1 ~]#


Before doing anything with the VMs, also don't forget to check the Hypervisor hosts has enough space, otherwise you'll get in big troubles !
 

[root@hypervisor-host2 vmprivate]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/centos_hypervisor-host2-root   20G  1.8G   17G  10% /
devtmpfs                          20G     0   20G   0% /dev
tmpfs                             20G     0   20G   0% /dev/shm
tmpfs                             20G  2.0G   18G  11% /run
tmpfs                             20G     0   20G   0% /sys/fs/cgroup
/dev/sda1                        992M  159M  766M  18% /boot
/dev/mapper/centos_hypervisor-host2-home  9.8G   37M  9.2G   1% /home
/dev/mapper/centos_hypervisor-host2-var   9.8G  355M  8.9G   4% /var
/dev/mapper/centos_hypervisor-host2-vz    755G   25G  692G   4% /vz

 

[root@hypervisor-host1 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G  1.8G   45G   4% /
devtmpfs                  20G     0   20G   0% /dev
tmpfs                     20G     0   20G   0% /dev/shm
tmpfs                     20G  2.1G   18G  11% /run
tmpfs                     20G     0   20G   0% /sys/fs/cgroup
/dev/sda2                992M  153M  772M  17% /boot
/dev/mapper/centos-home  9.8G   37M  9.2G   1% /home
/dev/mapper/centos-var   9.8G  406M  8.9G   5% /var
/dev/mapper/centos-vz    689G   12G  643G   2% /vz

Another thing to do before proceeding with update is to check and tune if needed the amount of CentOS repositories used, before doing anything with yum.
 

[root@hypervisor-host2 yum.repos.d]# ls -al
total 68
drwxr-xr-x.   2 root root  4096 Oct  6 13:13 .
drwxr-xr-x. 110 root root 12288 Oct  7 11:13 ..
-rw-r–r–.   1 root root  4382 Mar 14  2019 CentOS7.repo
-rw-r–r–.   1 root root  1664 Sep  5  2019 CentOS-Base.repo
-rw-r–r–.   1 root root  1309 Sep  5  2019 CentOS-CR.repo
-rw-r–r–.   1 root root   649 Sep  5  2019 CentOS-Debuginfo.repo
-rw-r–r–.   1 root root   314 Sep  5  2019 CentOS-fasttrack.repo
-rw-r–r–.   1 root root   630 Sep  5  2019 CentOS-Media.repo
-rw-r–r–.   1 root root  1331 Sep  5  2019 CentOS-Sources.repo
-rw-r–r–.   1 root root  6639 Sep  5  2019 CentOS-Vault.repo
-rw-r–r–.   1 root root  1303 Mar 14  2019 factory.repo
-rw-r–r–.   1 root root   666 Sep  8 10:13 openvz.repo
[root@hypervisor-host2 yum.repos.d]#

 

[root@hypervisor-host1 yum.repos.d]# ls -al
total 68
drwxr-xr-x.   2 root root  4096 Oct  6 13:13 .
drwxr-xr-x. 112 root root 12288 Oct  7 11:09 ..
-rw-r–r–.   1 root root  1664 Sep  5  2019 CentOS-Base.repo
-rw-r–r–.   1 root root  1309 Sep  5  2019 CentOS-CR.repo
-rw-r–r–.   1 root root   649 Sep  5  2019 CentOS-Debuginfo.repo
-rw-r–r–.   1 root root   314 Sep  5  2019 CentOS-fasttrack.repo
-rw-r–r–.   1 root root   630 Sep  5  2019 CentOS-Media.repo
-rw-r–r–.   1 root root  1331 Sep  5  2019 CentOS-Sources.repo
-rw-r–r–.   1 root root  6639 Sep  5  2019 CentOS-Vault.repo
-rw-r–r–.   1 root root  1303 Mar 14  2019 factory.repo
-rw-r–r–.   1 root root   300 Mar 14  2019 obsoleted_tmpls.repo
-rw-r–r–.   1 root root   666 Sep  8 10:13 openvz.repo


1. Dump VM definition XMs (to have it in case if it gets wiped during update)

There is always a possibility that something will fail during the update and you might be unable to restore back to the old version of the Virtual Machine due to some config misconfigurations or whatever thus a very good idea, before proceeding to modify the working VMs is to use KVM's virsh and dump the exact set of XML configuration that makes the VM roll properly.

To do so:
Check a little bit up in the article how we have listed the IDs that are part of the directory containing the VM.
 

[root@hypervisor-host1 ]# virsh dumpxml (Id of VM virt-mach-centos1 ) > /root/virt-mach-centos1_config_bak.xml
[root@hypervisor-host2 ]# virsh dumpxml (Id of VM virt-mach-centos2) > /root/virt-mach-centos2_config_bak.xml

 


2. Set on standby virt-mach-centos1 (virt-mach-centos1)

As I'm upgrading two machines that are configured to run an haproxy corosync cluster, before proceeding to update the active host, we have to switch off
the proxied traffic from node1 to node2, – e.g. standby the active node, so the cluster can move up the traffic to other available node.
 

[root@virt-mach-centos1 ~]# pcs cluster standby virt-mach-centos1


3. Stop VM virt-mach-centos1 & backup on Hypervisor host (hypervisor-host1) for VM1

Another prevention step to make sure you don't get into damaged VM or broken haproxy cluster after the upgrade is to of course backup 

 

[root@hypervisor-host1 ]# prlctl backup virt-mach-centos1

or
 

[root@hypervisor-host1 ]# prlctl stop virt-mach-centos1
[root@hypervisor-host1 ]# cp -rpf /vz/vmprivate/dc37c201-08c9-489d-aa20-9386d63ce3f3 /vz/vmprivate/dc37c201-08c9-489d-aa20-9386d63ce3f3-bak
[root@hypervisor-host1 ]# tar -czvf virt-mach-centos1_vm_virt-mach-centos1.tar.gz /vz/vmprivate/dc37c201-08c9-489d-aa20-9386d63ce3f3

[root@hypervisor-host1 ]# prlctl start virt-mach-centos1


4. Remove package version locks on all hosts

If you're using package locking to prevent some other colleague to not accidently upgrade the machine (if multiple sysadmins are managing the host), you might use the RPM package locking meachanism, if that is used check RPM packs that are locked and release the locking.

+ List actual list of locked packages

[root@hypervisor-host1 ]# yum versionlock list  

…..
0:libtalloc-2.1.16-1.el7.*
0:libedit-3.0-12.20121213cvs.el7.*
0:p11-kit-trust-0.23.5-3.el7.*
1:quota-nls-4.01-19.el7.*
0:perl-Exporter-5.68-3.el7.*
0:sudo-1.8.23-9.el7.*
0:libxslt-1.1.28-5.el7.*
versionlock list done
                          

+ Clear the locking            

# yum versionlock clear                               


+ List actual list / == clear all entries
 

[root@virt-mach-centos2 ]# yum versionlock list; yum versionlock clear
[root@virt-mach-centos1 ]# yum versionlock list; yum versionlock clear
[root@hypervisor-host1 ~]# yum versionlock list; yum versionlock clear
[root@hypervisor-host2 ~]# yum versionlock list; yum versionlock clear


5. Do yum update virt-mach-centos1


For some clarity if something goes wrong, it is really a good idea to make a dump of the basic packages installed before the RPM package update is initiated,
The exact versoin of RHEL or CentOS as well as the list of locked packages, if locking is used.

Enter virt-mach-centos1 (ssh virt-mach-centos1) and run following cmds:
 

# cat /etc/redhat-release  > /root/logs/redhat-release-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


+ Only if needed!!
 

# yum versionlock clear
# yum versionlock list


Clear any previous RPM packages – careful with that as you might want to keep the old RPMs, if unsure comment out below line
 

# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

Proceed with the update and monitor closely the output of commands and log out everything inside files using a small script that you should place under /root/status the script is given at the end of the aritcle.:
 

yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
yum check-update | wc -l
yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

 

6. Check if everything is running fine after upgrade

Reboot VM
 

# shutdown -r now


7. Stop VM virt-mach-centos2 & backup  on Hypervisor host (hypervisor-host2)

Same backup step as prior 

# prlctl backup virt-mach-centos2


or
 

# prlctl stop virt-mach-centos2
# cp -rpf /vz/vmprivate/92075803-a4ce-4ec0-a3d8-9ee83d85fc76 /vz/vmprivate/92075803-a4ce-4ec0-a3d8-9ee83d85fc76-bak
## tar -czvf virt-mach-centos2_vm_virt-mach-centos2.tar.gz /vz/vmprivate/92075803-a4ce-4ec0-a3d8-9ee83d85fc76

# prctl start virt-mach-centos2


8. Do yum update on virt-mach-centos2

Log system state, before the update
 

# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum versionlock clear == if needed!!
# yum versionlock list

 

Clean old install update / packages if required
 

# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


Initiate the update

# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out 2>&1
# yum check-update | wc -l 
# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out 2>&1
# sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


9. Check if everything is running fine after upgrade
 

Reboot VM
 

# shutdown -r now

 

10. Stop VM vm-host2 & backup
 

# prlctl backup vm-host2


or

# prlctl stop vm-host2

Or copy the actual directory containig the Virtozzo VM (use the correct ID)
 

# cp -rpf /vz/vmprivate/76e8a5f8-caa8-5442-830e-aa4bfe8d42d9 /vz/vmprivate/76e8a5f8-caa8-5442-830e-aa4bfe8d42d9-bak
## tar -czvf vm-host2.tar.gz /vz/vmprivate/76e8a5f8-caa8-4442-830e-aa5bfe8d42d9

# prctl start vm-host2


11. Do yum update vm-host2
 

# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


Clear only if needed

# yum versionlock clear
# yum versionlock list
# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


Do the rpm upgrade

# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum check-update | wc -l
# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


12. Check if everything is running fine after upgrade
 

Reboot VM
 

# shutdown -r now


13. Do yum update hypervisor-host2

 

 

# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

Clear lock   if needed

# yum versionlock clear
# yum versionlock list
# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


Update rpms
 

# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out 2>&1
# yum check-update | wc -l
# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out 2>&1
# sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


14. Stop VM vm-host1 & backup


Some as ealier
 

# prlctl backup vm-host1

or
 

# prlctl stop vm-host1

# cp -rpf /vz/vmprivate/74a7bbe8-9245-4385-ac0d-d10299100789 /vz/vmprivate/74a7bbe8-9245-4385-ac0d-d10299100789-bak
# tar -czvf vm-host1.tar.gz /vz/vmprivate/74a7bbe8-9245-4385-ac0d-d10299100789

# prctl start vm-host1


15. Do yum update vm-host2
 

# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum versionlock clear == if needed!!
# yum versionlock list
# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum check-update | wc -l
# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


16. Check if everything is running fine after upgrade

+ Reboot VM

# shutdown -r now


17. Do yum update hypervisor-host1

Same procedure for HV host 1 

# cat /etc/redhat-release  > /root/logs/redhat-release-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# cat /etc/grub.d/30_os-prober > /root/logs/grub2-efi-vorher-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

Clear lock
 

# yum versionlock clear
# yum versionlock list
# yum clean all |tee /root/logs/yumcleanall-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out

# yum check-update |tee /root/logs/yumcheckupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# yum check-update | wc -l
# yum update |tee /root/logs/yumupdate-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out
# sh /root/status |tee /root/logs/status-before-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').out


18. Check if everything is running fine after upgrade

Reboot VM
 

# shutdown -r now


Check hypervisor-host1 all VMs run as expected 


19. Check if everything is running fine after upgrade

Reboot VM
 

# shutdown -r now


Check hypervisor-host2 all VMs run as expected afterwards


20. Check once more VMs and haproxy or any other contained services in VMs run as expected

Login to hosts and check processes and logs for errors etc.
 

21. Haproxy Unstandby virt-mach-centos1

Assuming that the virt-mach-centos1 and virt-mach-centos2 are running a Haproxy / corosync cluster you can try to standby node1 and check the result
hopefully all should be fine and traffic should come to host node2.

[root@virt-mach-centos1 ~]# pcs cluster unstandby virt-mach-centos1


Monitor logs and make sure HAproxy works fine on virt-mach-centos1


22. If necessery to redefine VMs (in case they disappear from virsh) or virtuosso is not working

[root@virt-mach-centos1 ]# virsh define /root/virt-mach-centos1_config_bak.xml
[root@virt-mach-centos1 ]# virsh define /root/virt-mach-centos2_config_bak.xml


23. Set versionlock to RPMs to prevent accident updates and check OS version release

[root@virt-mach-centos2 ]# yum versionlock \*
[root@virt-mach-centos1 ]# yum versionlock \*
[root@hypervisor-host1 ~]# yum versionlock \*
[root@hypervisor-host2 ~]# yum versionlock \*

[root@hypervisor-host2 ~]# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)

Other useful hints

[root@hypervisor-host1 ~]# virsh console dc37c201-08c9-489d-aa20-9386d63ce3f3
Connected to domain virt-mach-centos1
..

! Compare packages count before the upgrade on each of the supposable identical VMs and HVs – if there is difference in package count review what kind of packages are different and try to make the machines to look as identical as possible  !

Packages to update on hypervisor-host1 Count: XXX
Packages to update on hypervisor-host2 Count: XXX
Packages to update virt-mach-centos1 Count: – 254
Packages to update virt-mach-centos2 Count: – 249

The /root/status script

+++

#!/bin/sh
echo  '=======================================================   '
echo  '= Systemctl list-unit-files –type=service | grep enabled '
echo  '=======================================================   '
systemctl list-unit-files –type=service | grep enabled

echo  '=======================================================   '
echo  '= systemctl | grep ".service" | grep "running"            '
echo  '=======================================================   '
systemctl | grep ".service" | grep "running"

echo  '=======================================================   '
echo  '= chkconfig –list                                        '
echo  '=======================================================   '
chkconfig –list

echo  '=======================================================   '
echo  '= netstat -tulpn                                          '
echo  '=======================================================   '
netstat -tulpn

echo  '=======================================================   '
echo  '= netstat -r                                              '
echo  '=======================================================   '
netstat -r


+++

That's all folks, once going through the article, after some 2 hours of efforts or so you should have an up2date machines.
Any problems faced or feedback is mostly welcome as this might help others who have the same setup.

Thanks for reading me 🙂

How to automate open xen Hypervisor Virtual Machines backups shell script

Tuesday, June 22nd, 2021

openxen-backup-logo As a sysadmin that have my own Open Xen Debian Hypervisor running on a Lenovo ThinkServer few months ago due to a human error I managed to mess up one of my virtual machines and rebuild the Operating System from scratch and restore files service and MySQl data from backup that really pissed me of and this brought the need for having a decent Virtual Machine OpenXen backup solution I can implement on the Debian ( Buster) 10.10 running the free community Open Xen version 4.11.4+107-gef32c7afa2-1. The Hypervisor is a relative small one holding just 7 VM s:

HypervisorHost:~#  xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0 11102    24     r—–  214176.4
pcfrxenweb                                  11 12288     4     -b—-  247425.5
pcfrxen                                     12 16384    10     -b—-  1371621.4
windows7                                    20  4096     2     -b—-   97887.2
haproxy2                                    21  4096     2     -b—-   11806.9
jitsi-meet                                  22  2048     2     -b—-   12843.9
zabbix                                      23  2048     2     -b—-   20275.1
centos7                                     24  2040     2     -b—-   10898.2

HypervisorHost:~# xl list|grep -v 'Name ' |grep  -v 'Domain-0'  |wc -l
7


The backup strategy of the script is very simple to shutdown the running VM machine, make a copy with rsync to a backup location the image of each of the Virtual Machines in a bash shell loop for each virtual machine shown in output of xl command and backup to a preset local directory in my case this is /backups/ the backup of each virtual machine is produced within a separate backup directory with a respective timestamp. Backup VM .img files are produced in my case to mounted 2x external attached hard drives each of which is a 4 Terabyte Seagate Plus Backup (Storage). The original version of the script was made to be a slightly different by Zhiqiang Ma whose script I used for a template to come up with my xen VM backup solution. To prevent the Hypervisor's load the script is made to do it with a nice of (nice -n 10) this might be not required or you might want to modify it to better suit your needs. Below is the script itself you can fetch a copy of it /usr/sbin/xen_vm_backups.sh :

#!/bin/bash

# Author: Zhiqiang Ma (http://www.ericzma.com/)
# Modified to work with xl and OpenXen by Georgi Georgiev – https://pc-freak.net
# Original creation dateDec. 27, 2010
# Script takes all defined vms under xen_name_list and prepares backup of each
# after shutting down the machine prepares archive and copies archive in externally attached mounted /backup/disk1 HDD
# Latest update: 08.06.2021 G. Georgiev – hipo@pc-freak.net

mark_file=/backups/disk1/tmp/xen-bak-marker
log_file=/var/log/xen/backups/bak-$(date +%Y_%m_%d).log
err_log_file=/var/log/xen/backups/bak_err-$(date +%H_%M_%Y_%m_%d).log
xen_dir=/xen/domains
xen_vmconfig_dir=/etc/xen/
local_bak_dir=/backups/disk1/tmp
#bak_dir=xenbak@backup_host1:/lhome/xenbak
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains
#xen_name_list="haproxy2 pcfrxenweb jitsi-meet zabbix windows7 centos7 pcfrxenweb pcfrxen"
xen_name_list="windows7 haproxy2 jitsi-meet zabbix centos7"

if [ ! -d /var/log/xen/backups ]; then
echo mkdir -p /var/log/xen/backups
 mkdir -p /var/log/xen/backups
fi

if [ ! -d $bak_dir ]; then
echo mkdir -p $bak_dir
 mkdir -p $bak_dir

fi


# check whether bak runned last week
if [ -e $mark_file ] ; then
        echo  rm -f $mark_file
 rm -f $mark_file
else
        echo  touch $mark_file
 touch $mark_file
  # exit 0
fi

# set std and stderr to log file
        echo mv $log_file $log_file.old
       mv $log_file $log_file.old
        echo mv $err_log_file $err_log_file.old
       mv $err_log_file $err_log_file.old
        echo "exec 2> $err_log_file"
       exec 2> $err_log_file
        echo "exec > $log_file"
       exec > $log_file


# check whether the VM is running
# We only backup running VMs

echo "*** Check alive VMs"

xen_name_list_tmp=""

for i in $xen_name_list
do
        echo "/usr/sbin/xl list > /tmp/tmp-xen-list"
        /usr/sbin/xl list > /tmp/tmp-xen-list
  grepinlist=`grep $i" " /tmp/tmp-xen-list`;
  if [[ “$grepinlist” == “” ]]
  then
    echo $i is not alive.
  else
    echo $i is alive.
    xen_name_list_tmp=$xen_name_list_tmp" "$i
  fi
done

xen_name_list=$xen_name_list_tmp

echo "Alive VM list:"

for i in $xen_name_list
do
   echo $i
done

echo "End alive VM list."

###############################
date
echo "*** Backup starts"

###############################
date
echo "*** Copy VMs to local disk"

for i in $xen_name_list
do
  date
  echo "Shutdown $i"
        echo  /usr/sbin/xl shutdown $i
        /usr/sbin/xl shutdown $i
        if [ $? != ‘0’ ]; then
                echo 'Not Xen Disk image destroying …';
                /usr/sbin/xl destroy $i
        fi
  sleep 30

  echo "Copy $i"
  echo "Copy to local_bak_dir: $local_bak_dir"
      echo /usr/bin/rsync -avhW –no-compress –progress $xen_dir/$i/{disk.img,swap.img} $local_bak_dir/$i/
     time /usr/bin/rsync -avhW –no-compress –progress $xen_dir/$i/{disk.img,swap.img} $local_bak_dir/$i/
      echo /usr/bin/rsync -avhW –no-compress –progress $xen_vmconfig_dir/$i.cfg $local_bak_dir/$i.cfg
     time /usr/bin/rsync -avhW –no-compress –progress $xen_vmconfig_dir/$i.cfg $local_bak_dir/$i.cfg
  date
  echo "Create $i"
  # with vmmem=1024"
  # /usr/sbin/xm create $xen_dir/vm.run vmid=$i vmmem=1024
          echo /usr/sbin/xl create $xen_vmconfig_dir/$i.cfg
          /usr/sbin/xl create $xen_vmconfig_dir/$i.cfg
## Uncomment if you need to copy with scp somewhere
###       echo scp $log_file $bak_dir/xen-bak-111.log
###      echo  /usr/bin/rsync -avhW –no-compress –progress $log_file $local_bak_dir/xen-bak-111.log
done

####################
date
echo "*** Compress local bak vmdisks"

for i in $xen_name_list
do
  date
  echo "Compress $i"
      echo tar -z -cfv $bak_dir/$i-$(date +%Y_%m_%d).tar.gz $local_bak_dir/$i-$(date +%Y_%m_%d) $local_bak_dir/$i.cfg
     time nice -n 10 tar -z -cvf $bak_dir/$i-$(date +%Y_%m_%d).tar.gz $local_bak_dir/$i/ $local_bak_dir/$i.cfg
    echo rm -vf $local_bak_dir/$i/ $local_bak_dir/$i.cfg
    rm -vrf $local_bak_dir/$i/{disk.img,swap.img}  $local_bak_dir/$i.cfg
done

####################
date
echo "*** Copy local bak vmdisks to remote machines"

copy_remote () {
for i in $xen_name_list
do
  date
  echo "Copy to remote: vm$i"
        echo  scp $local_bak_dir/vmdisk0-$i.tar.gz $bak_dir/vmdisk0-$i.tar.gz
done

#####################
date
echo "Backup finishes"
        echo scp $log_file $bak_dir/bak-111.log

}

date
echo "Backup finished"

 

Things to configure before start using using the script to prepare backups for you is the xen_name_list variable

#  directory skele where to store already prepared backups
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains

# The configurations of the running Xen Virtual Machines
xen_vmconfig_dir=/etc/xen/
# a local directory that will be used for backup creation ( I prefer this directory to be on the backup storage location )
local_bak_dir=/backups/disk1/tmp
#bak_dir=xenbak@backup_host1:/lhome/xenbak
# the structure on the backup location where daily .img backups with be produced with rsync and tar archived with bzip2
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains

# list here all the Virtual Machines you want the script to create backups of
xen_name_list="windows7 haproxy2 jitsi-meet zabbix centos7"

If you need the script to copy the backup of Virtual Machine images to external Backup server via Local Lan or to a remote Internet located encrypted connection with a passwordless ssh authentication (once you have prepared the Machines to automatically login without pass over ssh with specific user), you can uncomment the script commented section to adapt it to copy to remote host.

Once you have placed at place /usr/sbin/xen_vm_backups.sh use a cronjob to prepare backups on a regular basis, for example I use the following cron to produce a working copy of the Virtual Machine backups everyday.
 

# crontab -u root -l 

# create windows7 haproxy2 jitsi-meet centos7 zabbix VMs backup once a month
00 06 1,2,3,4,5,6,7,8,9,10,11,12 * * /usr/sbin/xen_vm_backups.sh 2>&1 >/dev/null


I do clean up virtual machines Images that are older than 95 days with another cron job

# crontab -u root -l

# Delete xen image files older than 95 days to clear up space from backup HDD
45 06 17 * * find /backups/disk1/xen-backups/xen_images* -type f -mtime +95 -exec rm {} \; 2>&1 >/dev/null

#### Delete xen config backups older than 1 year+3 days (368 days)
45 06 17 * * find /backups/disk1/xen-backups/xen_config* -type f -mtime +368 -exec rm {} \; 2>&1 >/dev/null

 

# Delete xen image files older than 95 days to clear up space from backup HDD
45 06 17 * * find /backups/disk1/xen-backups/xen_images* -type f -mtime +95 -exec rm {} \; 2>&1 >/dev/null

#### Delete xen config backups older than 1 year+3 days (368 days)
45 06 17 * * find /backups/disk1/xen-backups/xen_config* -type f -mtime +368 -exec rm {} \; 2>&1 >/dev/null

luckyBackup Linux GUI back-up and synchronization tool

Wednesday, May 14th, 2014

luckybackup_best-linux-graphical-tool-for-backup_linux_gui-defacto-standard-tool
If you're a using GNU / Linux  for Desktop and you're already tired of creating backups by your own hacks using terminal and you want to make your life a little bit more easier and easily automate your important files back up through GUI program take a look at luckyBackup.

Luckibackup is a GUI frontend to the infamous rsync command line backup  tool. Luckibackup is available as a package in almost all modern Linux distributions its very easy to setup and can save you a lot of time especially if you have to manage a number of your Workplace Desktop Office Linux based computers.
Luckibackup is an absolute must have program for Linux Desktop start-up users. If you're migrating from Microsoft Windows realm and you're used to BackupPC, Luckibackup is probably the defacto Linux BackupPC substitute.

The sad news for Linux GNOME Desktop users is luckibackup is written in QT and it using it will load up a bit your notebook.
It is not installed by default so once a new Linux Desktop is installed you will have to install it manually on Debian and Ubuntu based Linux-es to install Luckibackup apt-get it.

debian:~# apt-get install --yes luckibackup
...

On Fedora and CentOS Linux install LuckiBackup via yum rpm package manager

[root@centos :~]# yum -y install luckibackup
.

Luckibackup is also ported for OpenSuSE Slackware, Gentoo, Mandriva and ArchLinux. In 2009 Luckibackup won the prize of Sourceforge Community Choice Awards for "best new project".

luckyBackup copies over only the changes you've made to the source directory and nothing more.
You will be surprised when your huge source is backed up in seconds (after the first backup).

Whatever changes you make to the source including adding, moving, deleting, modifying files / directories etc, will have the same effect to the destination.
Owner, group, time stamps, links and permissions of files are preserved (unless stated otherwise).

Luckibackup creates different multiple backup "snapshots".Each snapshot is an image of the source data that refers to a specific date-time.
Easy rollback to any of the snapshots is possible. Besides that luckibackup support Sync (just like rsync) od any directories keeping the files that were most recently modified on both of them.

Useful if you modify files on more than one PCs (using a flash-drive and don't want to bother remembering what did you use last. Luckibackup is capable of excluding certain files or directories from backupsExclude any file, folder or pattern from backup transfer.

After each operation a logfile is created in your home folder. You can have a look at it any time you want.

luckyBackup can run in command line if you wish not to use the gui, but you have to first create the profile that is going to be executed.
Type "luckybackup –help" at a terminal to see usage and supported options.
There is also TrayNotification – Visual feedback at the tray area informs you about what is going on.
 

 

 

Create Easy Data Backups with Rsnapshot back-up tool on GNU / Linux

Monday, April 15th, 2013

 

rsnapshot Linux and FreeBSD easy data backup tool logo
Backing up information on Linux servers is essential part of routine system adminsitrator job. Thus I decided to write for those interested in how one can easily create backups of important data through a tiny tool called rsnapshot which I prior used to make periodic data incremental backups on few of Debian Linux servers I manage. In case you wonder why use rsnapshot and not just rsync – the reasons are 2.
a. Rsnapshot is very easy to configure and use and you don't need to have deep understanding on  rsync numerous options to use it.
b. Rsnapshot does support incremental data backups – saving a lot of disk space on backup host.

 

 

 

Mentioning  incremental data backups for some those term might be a news so I will in short explain here what is Incremental Data Backups?

Incremental Data Backups are such backups which only create new backup of system scheduled files to backup only whether there are changes in files to backup or new ones are added to directory/directories set to be routinely backed up. Incremental backups are often desirable as they consume minimum storage space and are quicker to perform than normal periodic whole data archiving (differential backups). rsync has also support for incremental backups but configuring it to do so takes time and requires extra time on reading and understanding how they work, so I personally prefer simplicity rsnapshot brings.

1. Installing rsnapshot with apt-get

Here is rsnapshot debian package description;

debian:~#  apt-cache show rsnapshot|grep -i description -A 5

 

Description: local and remote filesystem snapshot utility
 rsnapshot is an rsync-based filesystem snapshot utility. It can take
 incremental backups of local and remote filesystems for any number of
 machines. rsnapshot makes extensive use of hard links, so disk space is
 only used when absolutely necessary.
Homepage: http://www.rsnapshot.org/

As you can read from description, rsnapshot is a frontend command using rsync to make data backups.

Install of rsnapshot is done through;

 debian:~# apt-get install --yes rsnapshot

Reading package lists… Done
Building dependency tree      
Reading state information… Done
The following NEW packages will be installed:
  rsnapshot
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/140 kB of archives.
After this operation, 598 kB of additional disk space will be used.
Selecting previously deselected package rsnapshot.
(Reading database … 87026 files and directories currently installed.)
Unpacking rsnapshot (from …/rsnapshot_1.3.1-1_all.deb) … –
Processing triggers for man-db …
Setting up rsnapshot (1.3.1-1) …

2. Rsnapshot  package content and Documentation

Once installed here is file content of rsnapshot deb package;

debian:~# dpkg -L rsnapshot

 

/.
/usr
/usr/share
/usr/share/doc-base
/usr/share/doc-base/rsnapshot
/usr/share/doc
/usr/share/doc/rsnapshot
/usr/share/doc/rsnapshot/TODO
/usr/share/doc/rsnapshot/changelog.gz
/usr/share/doc/rsnapshot/Upgrading_from_1.1.gz
/usr/share/doc/rsnapshot/examples
/usr/share/doc/rsnapshot/examples/rsnapshot.conf.default.gz
/usr/share/doc/rsnapshot/examples/utils
/usr/share/doc/rsnapshot/examples/utils/backup_mysql.sh
/usr/share/doc/rsnapshot/examples/utils/mysqlbackup.pl
/usr/share/doc/rsnapshot/examples/utils/random_file_verify.sh
/usr/share/doc/rsnapshot/examples/utils/rsnapreport.pl.gz
/usr/share/doc/rsnapshot/examples/utils/make_cvs_snapshot.sh
/usr/share/doc/rsnapshot/examples/utils/backup_pgsql.sh
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/CHANGES.txt
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/rsnapshotDB.pl.gz
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/INSTALL.txt
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/TODO.txt
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/rsnapshotDB.xsd
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/rsnapshotDB.conf.sample
/usr/share/doc/rsnapshot/examples/utils/rsnapshotdb/README.txt
/usr/share/doc/rsnapshot/examples/utils/rsnapshot-copy
/usr/share/doc/rsnapshot/examples/utils/backup_rsnapshot_cvsroot.sh
/usr/share/doc/rsnapshot/examples/utils/backup_dpkg.sh
/usr/share/doc/rsnapshot/examples/utils/sign_packages.sh
/usr/share/doc/rsnapshot/examples/utils/mkmakefile.sh
/usr/share/doc/rsnapshot/examples/utils/rsnaptar
/usr/share/doc/rsnapshot/examples/utils/rsnapshot_invert.sh
/usr/share/doc/rsnapshot/examples/utils/rsnapshot_if_mounted.sh
/usr/share/doc/rsnapshot/examples/utils/README
/usr/share/doc/rsnapshot/examples/utils/debug_moving_files.sh
/usr/share/doc/rsnapshot/examples/utils/backup_smb_share.sh
/usr/share/doc/rsnapshot/README.gz
/usr/share/doc/rsnapshot/changelog.Debian.gz
/usr/share/doc/rsnapshot/copyright
/usr/share/doc/rsnapshot/README.Debian
/usr/share/doc/rsnapshot/html
/usr/share/doc/rsnapshot/html/rsnapshot-HOWTO.en.html
/usr/share/doc/rsnapshot/NEWS.Debian.gz
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/rsnapshot
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/rsnapshot.1.gz
/usr/share/man/man1/rsnapshot-diff.1.gz
/usr/bin
/usr/bin/rsnapshot-diff
/usr/bin/rsnapshot
/var
/var/cache
/var/cache/rsnapshot
/etc
/etc/cron.d
/etc/cron.d/rsnapshot
/etc/rsnapshot.conf
/etc/logrotate.d
/etc/logrotate.d/rsnapshot

To get basic idea, on rsnapshot and how it can be configured and run manually as well as how it can be set-up to run periodic via a cronjob README shipped with package is a good start point.

debian:~# zless /usr/share/doc/rsnapshot/README.gz
....

It is also useful to check program documentation in HTML, whether you have some text browser installed – i.e. lynx or links:

debian:~# links /usr/share/doc/rsnapshot/html/rsnapshot-HOWTO.en.html

Note that many of information in rsnapshot-HOWTO is related to how rsnapshot is installed manually from source, so for Deb based distro users reading these sections can be safely skipped. For Debian users hence it is useful to read howto from section 4.A onwards. man rsnapshot's Examle section is very good reading too as it gives a lot of use scenarios necessary in more complicated backup situations.

3. Configuring Rsnapshot – Setting Data Directories to Backup

Configuration of Rsnapshot is done through /etc/rsnapshot.conf file. There is plenty of comments in file, so opening in text editor and taking few minutes to read commented lines is necessery. Configuration options just like with most Linux tool config files is done through config directives, not commented.

debian:~# cat /etc/rsnapshot.conf |grep -v "#"|uniq

 

 

config_version    1.2

snapshot_root    /var/cache/rsnapshot/

cmd_rm        /bin/rm

cmd_rsync    /usr/bin/rsync

cmd_logger    /usr/bin/logger

interval    hourly    6
interval    daily    7
interval    weekly    4

verbose        2

loglevel    3

lockfile    /var/run/rsnapshot.pid

backup    /home/        localhost/
backup    /etc/        localhost/
backup    /usr/local/    localhost/

 

 

Above config options are clear to understand, there is interval of backups to set (hourly, daily, weekly), verbose level of rsnapshot backup operation log file, lockfile which will be used by rsnapshot to prevent duplicate rsnapshot runs and last backup directive in which you need to specify what needs to be backed up. In config file there is also commented variable for creating rsnapshot backup once a month

#interval   monthly 3

If you need to create backups once a month uncomment it.

In backup directive add all directories from filesystem which need to have routine backup, for example I keep my Apache Web server files in /var/www/, store various install software in
/root/

and keep backup of Qmail (Vpopmail) old emails kept in
/var/vpopmail
.
To make rsnapshot backup those I add after rest of backup directives:

backup  /var/www/   localhost/
backup  /var/vpopmail/  localhost/
backup  /root/  localhost/


It is good practice to change snapshot_root directive to /root/.backups or whether you prefer to keep snapshot_root to default /var/cache/rsnapshot at least link with ln command /root/.backups to -> /root/.backups.

debian:~# ln -sf /var/cache/rsnapshot /root/.backups

If you change snapshot_root to /root/.backups, don't forget to create /root/.backups and set chmod  dir persmissions only readable to owner, i.e.:

debian:~# mkdir /root/.rsnapshot
debian:~# chmod -R 700 /root/.backups

Note that, it is important to use tab delimiters, everywhere in /etc/rsnapshot.conf, if you use space key delimiter instead of Tab you will end up with errors preventing rsnapshot to run.

4. Testing rsnapshot configuration and launching it first time

I will say it once again use Tab key for delimiters in config. It was my mistake on first time Rsnapshot launch to use spaces to delimiter my config options, thus testing my configuration, rsnapshot print an error and failed:

debian:~# rsnapshot configtest

 

———————————————————
rsnapshot encountered an error! The program was invoked with these options: /usr/bin/rsnapshot configtest ———————————————————
ERROR: /etc/rsnapshot.conf on line 199: ERROR: backup /var/www/ localhost/
ERROR: ———————————————————
ERROR: Errors were found in /etc/rsnapshot.conf, ERROR: rsnapshot can not continue. If you think an entry looks right, make
ERROR: sure you don't have spaces where only tabs should be.  

After changing, Space delimiters with Tabs and re-running rsnapshot configtest if all fine you get:

debian:~# rsnapshot configtest
Syntax OK

Once all good with config to launch Rsnapshot do its first complete incremental data backup, to display what rsnapshot will backup and what exact rsync invocations will it use type:


debian:~# rsnapshot -t hourly

echo 5644 > /var/run/rsnapshot.pid
mv /var/cache/rsnapshot/hourly.2/ /var/cache/rsnapshot/hourly.3/
mv /var/cache/rsnapshot/hourly.1/ /var/cache/rsnapshot/hourly.2/
native_cp_al("/var/cache/rsnapshot/hourly.0", \
    "/var/cache/rsnapshot/hourly.1")
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded /home \
    /var/cache/rsnapshot/hourly.0/localhost/
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded /etc \
    /var/cache/rsnapshot/hourly.0/localhost/
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded \
    /usr/local /var/cache/rsnapshot/hourly.0/localhost/
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded \
    /var/www /var/cache/rsnapshot/hourly.0/localhost/
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded \
    /var/vpopmail /var/cache/rsnapshot/hourly.0/localhost/
/usr/bin/rsync -a –delete –numeric-ids –relative –delete-excluded /root \
    /var/cache/rsnapshot/hourly.0/localhost/
touch /var/cache/rsnapshot/hourly.0/

To launch backup first time manually:

debian:~# rsnapshot hourly

Depending on backupped data (Mega/Giga/Terabytes) size and the number of files which had to be backed up, backup takes from minutes to hours.
Note that it is always good idea to create backups on separate hard disk configured in some kind of RAID array, preferrably (RAID 1 or RAID 5). Creating backups on separate hard disk has numerous advantages, the most important one is it doesn't put too much Input / Output (I/O) stress on hard disk and thus will not create server downtimes on High traffic – Busy servers slow old Hard Disks or servers with Big amount of I/O HDD read/writes .

5. Enabling Rsnapshot to create backups via scheduled cron job

On package install Rsnapshot creates a skele file for running via cronjob in /etc/cron.d/rsnapshot.

debian:~# cat /etc/cron.d/rsnapshot

 

 

# This is a sample cron file for rsnapshot.
# The values used correspond to the examples in /etc/rsnapshot.conf.
# There you can also set the backup points and many other things.
#
# To activate this cron file you have to uncomment the lines below.
# Feel free to adapt it to your needs.

# 0 */4        * * *        root    /usr/bin/rsnapshot hourly
# 30 3      * * *        root    /usr/bin/rsnapshot daily
# 0  3      * * 1        root    /usr/bin/rsnapshot weekly
# 30 2      1 * *        root    /usr/bin/rsnapshot monthly
 

To make hourly, daily, weekly, monthly backup uncomment one of above 4 lines. For paranoid admins scared to loose even a bit of data, hourly data is a good solution. For me personally I prefer configuring weekly backups for the reason I routinely monitor servers – keeping an eye regularly on dmesg and checking Linux smard / smartmontools logs to find out whether a hard disk or RAID has bad blocks

6. Checking backup size / backup difference and backup structure

Checking size of backups can be done by using standard du command on backup directory:

debian:~# du -hsc /var/cache/rsnapshot/*
4.3G /var/cache/rsnapshot/hourly.0
4.5M /var/cache/rsnapshot/hourly.1
68M /var/cache/rsnapshot/hourly.2
4.4G total

rsnapshot also has du argument via which backup size can be viewed:

debian:~# rsnapshot du 4.3G /var/cache/rsnapshot/hourly.0/
4.5M /var/cache/rsnapshot/hourly.1/
68M /var/cache/rsnapshot/hourly.2/
4.4G total

As you can see each new incremental backup is with new number after hourly{0,1,2} etc.

To check difference between two different backups:

debian:~# rsnapshot diff /var/cache/rsnapshot/hourly.0/ /var/cache/rsnapshot/hourly.1/
Comparing /var/cache/rsnapshot/hourly.1 to /var/cache/rsnapshot/hourly.0
Between /var/cache/rsnapshot/hourly.1 and /var/cache/rsnapshot/hourly.0:
660 were added, taking 3728377727 bytes;
492 were removed, saving 17623 bytes;

Structure of backed up files is identical to normal copy of files without any compression:

debian:~# cd /root/.backups/hourly.0/localhost/
debian:~/.backups/hourly.0/localhost# ls

etc/ home/ root/ usr/ var/

 

7. Restoing files or directory from rsnapshot backup

To restore lets say /var directory cd into it:

debian:~/.backups/hourly.0/localhost# cd var
debian:~/.backups/hourly.0/localhost/var#

Then use rsync as follows:

debian:~/.backups/hourly.0/localhost/var# rsync -avr * /
 

 

8. Creating rsnapshot backups from remote server via SSH protocol

In /etc/rsnapshot.conf you should have set SSH port on which remote server is accepting SSH connections. Standard port is 22, however it is wise to configure on backup server SSH to listen to some other non standard port.

In config variables to look on are:

ssh_args -p 22

and

Onwards to enable remote login via ssh uncomment in /etc/rsnapshot.conf :

# cmd_ssh /usr/bin/ssh

to

cmd_ssh /usr/bin/ssh

Before starting rsnapshot to create backups on remote host2 you need to Configure automatic SSH passwordless login by generating DSA or RSA key pair between host1 and host2. Where host1 is machine on which rsnapshot is run and to which backups will be copied from host2
Once passwordless ssh to remote host is active, to force rsnapshot create backups from host1 you will need to add near end of /etc/rsnapshot.conf .

backup  root@host2.com:/root/ host2.com/

The same way you can add a number of remote hosts from which periodic backups will be created to central host1. Only condition is on each node – host3, host4, host5.

backup  root@host3.com:/root/root/ host3.com
backup  root@host4.com:/home/ host4.com
backup  root@host4.com:/var/ host4.com

To create on host1 public key (id_dsa.pub) file with command:

debian:~# ssh-keygen -t dsa
...
....
debian:~# ssh-copy-id -i ~/.ssh/id_dsa.pub root@host3

Once all hosts that needs to get backed up to central backup host – host1. To test if backups gets uploaded manually issue:

debian:~# rsnapshot -v hourly
...

Rsnapshot has a number of other scripts which can be easily integrated with it in /usr/share/doc/rsnapshot/examples/utils.
Inside you can find example scripts on how to create MySQL / PostgreSQL database backup, Samba Share backups, backup CVS repositories and so on. The scripts can be easily modified and work with mostly any data or protocol with a bit of tweaking. Short description of each of example scripts can be found in /usr/share/doc/rsnapshot/examples/utils/README

How to solve “Incorrect key file for table ‘/tmp/#sql_9315.MYI’; try to repair it” mysql start up error

Saturday, April 28th, 2012

When a server hard disk scape gets filled its common that Apache returns empty (no content) pages…
This just happened in one server I administer. To restore the normal server operation I freed some space by deleting old obsolete backups.
Actually the whole reasons for this mess was an enormous backup files, which on the last monthly backup overfilled the disk empty space.

Though, I freed about 400GB of space on the the root filesystem and on a first glimpse the system had plenty of free hard drive space, still restarting the MySQL server refused to start up properly and spit error:

Incorrect key file for table '/tmp/#sql_9315.MYI'; try to repair it" mysql start up error

Besides that there have been corrupted (crashed) tables, which reported next to above error.
Checking in /tmp/#sql_9315.MYI, I couldn't see any MYI – (MyISAM) format file. A quick google look up revealed that this error is caused by not enough disk space. This was puzzling as I can see both /var and / partitions had plenty of space so this shouldn't be a problem. Also manally creating the file /tmp/#sql_9315.MYI with:

server:~# touch /tmp/#sql_9315.MYI

Didn't help it, though the file created fine. Anyways a bit of a closer examination I've noticed a /tmp filesystem mounted besides with the other file system mounts ????
You can guess my great amazement to find this 1 Megabyte only /tmp filesystem hanging on the server mounted on the server.

I didn't mounted this 1 Megabyte filesystem, so it was either an intruder or some kind of "weird" bug…
I digged in Googling to see, if I can find more on the error and found actually the whole mess with this 1 mb mounted /tmp partition is caused by, just recently introduced Debian init script /etc/init.d/mountoverflowtmp.
It seems this script was introduced in Debian newer releases. mountoverflowtmp is some kind of emergency script, which is triggered in case if the root filesystem/ space gets filled.
The script has only two options:

# /etc/init.d/mountoverflowtmp
Usage: mountoverflowtmp [start|stop]

Once started what it does it remounts the /tmp to be 1 megabyte in size and stops its execution like it never run. Well maybe, the developers had something in mind with introducing this script I will not argue. What I should complain though is the script design is completely broken. Once the script gets "activated" and does its job. This 1MB mount stays like this, even if hard disk space is freed on the root partition – / ….

Hence to cope with this unhandy situation, once I had freed disk space on the root partition for some reason mountoverflowtmp stop option was not working,
So I had to initiate "hard" unmount:

server:~# mount -l /tmp

Also as I had a bunch of crashed tables and to fix them, also issued on each of the broken tables reported on /etc/init.d/mysql start start-up.

server:~# mysql -u root -p
mysql> use Database_Name;
mysql> repair table Table_Name extended;
....

Then to finally solve the stupid Incorrect key file for table '/tmp/#sql_XXYYZZ33444.MYI'; try to repair it error, I had to restart once again the SQL server:

Stopping MySQL database server: mysqld.
Starting MySQL database server: mysqld.
Checking for corrupt, not cleanly closed and upgrade needing tables..
root@server:/etc/init.d#

Tadadadadam!, SQL now loads and works back as before!

Wine in The Central Park

Friday, January 11th, 2008

I and Alex Drunk Asenovgrad’s Mavrut in the central park it was very cold, but at least an experience. Right now I’m a little hot (cause of the wine). I’m urinating too often recently and it drives me mad. Also I’m a kind of lost I told Sasho about Torsion fields and stuff but he make a fun of that no matter I think that’s a serious matter here is an interview with Nikolay Palushev that may be of an interest to the reader
http://www.spiralata.net/kratko/articles.php?lng=bg&pg=128 .

Today I had English in the college pretty boring haven’t had a lot of work I had to fix few binary permissions part of a postfix also delete some old backups and create a new samba share some mail server problems for few minutes I think that’s all … I hate this world so much everything is so useless and awful. I hope to meet God soon …

Just about to go crazy

Monday, July 30th, 2007

The weekend was disastrous. A lot of problems with Valueweb. Missing servers online, missing information about them hard times connecting with their technical support. Lack of information about what’s happening server outages. Using backups of sites changing DNS-es … this is a simple sample of what’s happening and continues to some degree with our servers. Valueweb are a really terrible choice for a Dedicated hosting. But we started with them before so much time it’s hard to change the provider easily. At least the mail server is working normally now. Thanks to God at least I’m provided with ernergy to work, think and fix things Intensively. As Usual PRAISE THE LORD ! :] Hope everything would be back to normal in a few days period. END—–

Using rsync to copy / synchronize files or backups between Linux / BSD / Unix servers

Monday, November 21st, 2011

Rsync and Rsync over ssh logo picture

Many of us have already taken advantage of the powerful Rsync proggie, however I'm quite sure there are still people who never used rsync to transfer files between servers.. That's why I came with this small post to possibly introduce rsync to my blog readers.
Why Rsync and not Scp or SFTP? Well Rsync is designed from the start for large files transfer and optimized to do the file copying job really efficient. Some tests with scp against rsync will clearly show rsync's superiority.
Rsync is also handy to contiue copying of half copied files or backups and thus in many cases saves bandwidth and machine hdd i/o operations.

The most simple way to use rsync is:

server:~# rsync -avz -e ssh remoteuser@remotehost:/remote/directory /local/directory/

Where remoteuser@remotehost — is the the username and hostname of remote server to copy files to.
/remote/directory — is the directory where the rsync copied files should be stored
/local/directory — is the local directory from which files will be copied to remote directory

If not a preliminary passwordless ssh key (RSA / DSA) authentication is configured on remote server, the above command will prompt for a password otherwise the rsync will start doing the transfer.

If one needs to have a RSA or DSA (public / private key) passwordless SSH key authentication , a RSA key first should be generated and copied over to the remote server, like so:

server:~# ssh-keygen -t dsa
...
server:~# ssh-copy-id -i ~/.ssh/id_dsa.pub root@remotehost
...

That's all folks, enjoy rsyncing 😉