Archive for the ‘Monitoring’ Category

How to Install and Use Grafana Loki on Linux for Multiple Server Log Metrics Monitoring

Tuesday, March 31st, 2026

how-to-install-and-use-grafana-loki-on-linux-for-log-metrics-monitoring-for-multiple-server-observability-logo
Grafana Loki
has become a popular choice for log management on Linux systems nowadays, because it is free software (licensed under AGPLv3), lightweight, cost-efficient, and integrates seamlessly with modern observability stacks. Unlike traditional log systems, Loki focuses on indexing metadata (labels) instead of full log content, which makes it especially attractive for Linux environments where logs can grow quickly.

Grafana Loki can be used to build a fully featured logging stack. It keeps a small index and stores highly compressed chunks, which simplifies operation and significantly lowers storage costs.
Unlike other logging systems, Loki is built around the idea of indexing only metadata about your logs: labels (just like Prometheus labels).
Log data itself is then compressed and stored in chunks in object stores such as Amazon Simple Storage Service (S3) or Google Cloud Storage (GCS), or even locally on the filesystem.

In this article I will give you some real-world, practical usage of Loki on Linux, from setting it up from zero to day-to-day workflows.

Reasons to use Loki on Linux

Linux systems generate logs mainly in /var/log, but additionally installed applications often log to their own locations, so the overall log layout can easily lack a good structure (logs end up everywhere).

Some common example locations where logs are stored:

  • /var/log/syslog
  • /var/log/auth.log
  • Application logs (/opt/app/logs/*.log)
  • Container logs, kept within the respective container runtime ( Docker / Podman / Kubernetes )

Sooner or later, if you have to manage a large infrastructure of servers, it is pretty easy to end up in a log mess.

This is exactly what Loki helps you solve:

  • Centralize logs from multiple machines (within Grafana)
  • Search logs efficiently using labels
  • Correlate logs with metrics in Grafana

Loki Architecture Overview


loki-use-stack-chain-diagram-from-cloud-to-grafana

A typical Loki setup on Linux has 3 components:

  1. Loki server -> stores and queries logs
  2. Promtail -> collects logs from around the system
  3. Grafana -> used to visualize and query the logs

Promtail acts like a lightweight agent that tails log files and sends them to Loki.

I. Installing Loki on Linux

1. Download Loki

$ cd /usr/local/src
$ wget https://github.com/grafana/loki/releases/latest/download/loki-linux-amd64
$ chmod +x loki-linux-amd64
# mv loki-linux-amd64 /usr/local/bin/loki

2. Create a simple config, e.g. loki.yaml, like:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
  chunk_idle_period: 5m

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /var/lib/loki/chunks
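
Before starting Loki, make sure the storage directory referenced in the config above actually exists and is writable by whichever user will run the process (the paths are the ones assumed in this example config):

# mkdir -p /var/lib/loki/chunks
# chown -R loki: /var/lib/loki   # only if you later run Loki as a dedicated unprivileged user, e.g. 'loki'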

3. Run Loki

# loki -config.file=loki.yaml


Hopefully, if all is okay with the loki.yaml config, the service will start.

a. Installing Promtail (Log Collection)

Promtail is downloaded and installed the same way as the Loki binary (the promtail-linux-amd64 release from the same GitHub releases page). Example promtail.yaml config (to modify to your preferences):

scrape_configs:
  - job_name: linux-logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: my-linux-server
          __path__: /var/log/*.log

This collects all logs in /var/log/ and labels them.
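
Keep in mind that the snippet above only shows scrape_configs; a working promtail.yaml also needs server, positions and clients sections, so Promtail knows where to keep its read offsets and where to push the logs. A minimal sketch, assuming Loki listens locally on port 3100 (paths are just examples, adjust to your layout):

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push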

b. Run Promtail

# promtail -config.file=promtail.yaml

! Note that in the examples above loki and promtail run as root (to have permission to read the files which will be processed). This is not best practice, so for security reasons,
if you have the necessary storage, move the files out to a central log aggregator directory with a script, create an unprivileged non-root user and run the services as that user.

c. Run loki / promtail as non-root user:

Once you have tested that everything runs, it is a good idea to run both tools as a non-root user, i.e.:
Run promtail as a dedicated user (e.g., promtail).

Add that user to groups like:

  • adm (for /var/log)
  • systemd-journal (for journal logs)

Adjust file permissions if needed:

# useradd --system --no-create-home promtail
# usermod -aG adm promtail

$ loki -config.file=loki.yaml
$ promtail -config.file=promtail.yaml
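
To make this permanent and survive reboots, a simple systemd unit can run Promtail under the dedicated user. A minimal sketch (binary and config paths are assumptions, adjust to where you actually placed them), e.g. /etc/systemd/system/promtail.service:

[Unit]
Description=Promtail log shipper for Loki
After=network-online.target

[Service]
User=promtail
Group=promtail
# adjust the binary / config locations to your setup
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then reload systemd and enable it:

# systemctl daemon-reload
# systemctl enable --now promtail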

II. Practical Use Cases of Loki on Linux

1. System Troubleshooting

One good use of Loki is to search for errors in syslog:

{job="syslog"} |= "error"

With this you can quickly diagnose:

  • Boot issues
  • Service failures
  • Kernel errors

2. SSH Login Monitoring

Track login attempts from /var/log/auth.log across many VM hosts:

{job="syslog"} |= "sshd"

You can detect:

  • Failed login attempts
  • Brute-force attacks
  • Unauthorized access

3. Application Debugging (look for exceptions)

If your app logs to /var/log/app.log and Promtail is shipping it, you can get a view of thrown Java exceptions with:

{job="app"} |= "exception"

This use case can help developers to:

  • Trace bugs
  • Monitor runtime issues
  • Correlate logs with deployments

4. Multi-Server Log Aggregation

Once you run Promtail on multiple Linux servers:

labels:
  host: server1

Then you can query the collected data for each of them:

{job="syslog", host=~"server1|server2"}

This makes multiple machines behave like one unified log source.

5. Log-Based Metrics

You can extract metrics from logs:

count_over_time({job="syslog"} |= "error" [5m])

Use this for:

  • Alerting (see the ruler example below)
  • Error rate tracking
  • Incident detection
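
If you want Loki itself to evaluate such queries and fire alerts (instead of, or in addition to, Grafana alerting), the Loki ruler reads Prometheus-style rule files with LogQL expressions. A minimal sketch, assuming you have enabled the ruler in your Loki config and pointed it at a rules directory; the threshold of 10 errors per 5 minutes is just an example:

groups:
  - name: syslog-errors
    rules:
      - alert: HighSyslogErrorRate
        expr: sum by (host) (count_over_time({job="syslog"} |= "error" [5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in syslog on {{ $labels.host }}"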

III. Using Grafana for Visualization

In Grafana, you can:

  • View logs in real time
  • Build dashboards
  • Create alerts based on log patterns

Example use would be:

Create Grafana Panel showing error rate per host and Alert when errors exceed a threshold.

loki-log-drill-down-sample-in-grafana

Good Practices on Loki use

1. Always Use Meaningful Labels

A good label set should contain as many descriptive parameters as reasonable, for example:

labels:
  app: nginx
  env: prod
  virtualization: vmware
  type: Middleware
  service: proxy
  customer: customerA

Bad obscure label:

labels:
  request_id: 123456  


2. Avoid Too Many Unique Labels

Keep in mind that too many unique label values (high cardinality) lead to poor performance!

3. Rotate Logs Properly

Loki won't manage your local logs for you: it complements (but does not replace) traditional tools on the server / VM such as journalctl, grep and logrotate, and simply gives you a better overview of what is inside the logs your services spit out, based on easy-to-define criteria in Grafana.
In the best-case scenario you will usually still need to set up a central logging server (to store all infrastructure logs).
Consider also that, when shipping log data to Loki (similarly to a Zabbix client setup), it is always a good idea to have a reverse proxy like NGINX or HAProxy in front, to reduce network bandwidth and for better centralized management of the infrastructure.

4. Secure Loki Endpoint

  • Use a reverse proxy (NGINX), for example as in the snippet below
  • Enable authentication in production
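
As an illustration, a small NGINX location block in front of Loki's HTTP port with basic auth might look like the following sketch (the htpasswd file name is an assumption, adapt it to your environment):

location / {
  proxy_pass http://127.0.0.1:3100;
  proxy_set_header Host $host;
  auth_basic "Loki";
  auth_basic_user_file /etc/nginx/.htpasswd-loki;
}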

Closure Summary

On Linux, Grafana Loki can help when:

  • You have multiple servers
  • Logs are growing fast
  • You need centralized  and relatively easy observability

Loki has its downsides too: processing the logs to actually extract data causes high CPU usage. Running it across multiple machines can still be useful,
especially if your machines have a lot of unutilized (idle) CPU time and you want the log data collection to stay per-server, i.e. partially duplicated and independent from centralized logging.
For high-scale infrastructure, however, sysadmins often prefer an ELK / OpenSearch stack or log databases such as
VictoriaLogs. With an infrastructure of 100 servers or so, setting Loki up with some Ansible automation makes sense.
Loki
is not meant to replace databases or full-text search engines, but it is often great for simple log aggregation and analysis, and one of the simpler tools available today.

How to Install and Use Kibana for Log Visualization

Wednesday, February 18th, 2026

/images/kibana-logo how to install it on linux
I have come across Kibana in my professional career and I find it a very interesting tool for sysadmins, so I thought it might be helpful to someone out there to write a small article on how to install it and use it to visualize data stored in Elasticsearch.

Kibana is an open-source data visualization and exploration tool used to analyze large volumes of data, especially logs. It is part of the ELK Stack (Elasticsearch, Logstash, Kibana), and is commonly used for centralized log management, security monitoring, and observability.

Kibana is often used in the so-called ELK pipeline for log file collection, analysis and visualization:

  • Elasticsearch is for searching, analyzing, and storing your data
  • Logstash (and Beats) is for collecting and transforming data, from any source, in any format
  • Kibana is a portal for visualizing the data and to navigate within the elastic stack
     

In this article, you'll learn how to:

  • Install Kibana
  • Connect it to Elasticsearch
  • Visualize log data
  • Use its basic features

Prerequisites

Before installing Kibana, make sure you have the following:

  • A Linux server running (Ubuntu / Debian / CentOS / RHEL)
  • Elasticsearch installed and running
  • Root or sudo access

Install Kibana

I. On Debian/Ubuntu
 

  1. Import the Elastic GPG key:

# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

  2. Add the repository:

# echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list

  3. Update and install:


# apt update

# apt install kibana

II. On RHEL/CentOS Linux

  1. Create repo file:

# tee /etc/yum.repos.d/elastic.repo <<EOF
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

  2. Install Kibana:

# yum install kibana

2. Configure Kibana

The configuration file is located at:

/etc/kibana/kibana.yml

Edit the file:

# vim /etc/kibana/kibana.yml

Update or add the following:
 

# Server settings
server.port: 5601
server.host: "0.0.0.0"

# Elasticsearch connection
elasticsearch.hosts: ["http://localhost:9200"]

# Logging
logging.level: info

# Security (only if Elasticsearch security is enabled)
# elasticsearch.username: "kibana_system"
# elasticsearch.password: "your_password_here"

Optional: Set basic auth or SSL settings if needed.

 

3. Start and Enable Kibana

# systemctl enable kibana

# systemctl start kibana

Check status:

# systemctl status kibana

 

4. Access Kibana Web Interface

Open your browser and go to:

http://<your-server-ip>:5601

You’ll be welcomed with the Kibana dashboard.

5. Import and Visualize Logs

Option A: Use Filebeat to Send Logs

Install Filebeat on the server with logs and configure it to send data to Elasticsearch. Kibana will then be able to visualize it.

# apt install filebeat

# filebeat modules enable system

# filebeat setup

# systemctl start filebeat
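
The Debian/RHEL Filebeat packages ship their main config as /etc/filebeat/filebeat.yml; the relevant parts to point it at your Elasticsearch and Kibana look roughly like the sketch below (the hosts are assumptions for a local all-in-one setup):

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  # username: "elastic"            # uncomment if Elasticsearch security is enabled
  # password: "your_password_here"

setup.kibana:
  host: "http://localhost:5601"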

Option B: Ingest Logs via Logstash or Elasticsearch API

If you already have data in Elasticsearch, Kibana will automatically detect indices.
 

6. Create Index Pattern

  1. In Kibana, go to Stack Management -> Index Patterns
  2. Click Create Index Pattern
  3. Enter the name (e.g., filebeat-*)
  4. Select the timestamp field (usually @timestamp)
  5. Save

Now Kibana knows how to query and visualize your data.

7. Create Visualizations and Dashboards

  1. Go to Visualize -> Create visualization
  2. Choose a type (bar, pie, line, etc.)
  3. Select an index pattern
  4. Configure metrics and buckets

You can then save visualizations and add them to dashboards.

8. Secure Kibana

  • Configure TLS/SSL for Kibana / Elasticsearch (and Logstash, if used)
  • Use additional Elastic Security features like RBAC (Role Based Access Control) and SSO (Single Sign On)
  • Secure Kibana with a reverse proxy (e.g., Nginx + Basic Auth, or Apache / HAProxy in front)

Example Nginx config simple snippet:

location / {
  proxy_pass http://localhost:5601;
  auth_basic "Restricted";
  auth_basic_user_file /etc/nginx/.htpasswd;
}
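
The .htpasswd file referenced above does not exist by default; it can be created with the htpasswd utility from apache2-utils (httpd-tools on RHEL). The username below is just an example:

# apt install apache2-utils
# htpasswd -c /etc/nginx/.htpasswd kibanaadmin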

 

What is Kibana used for and what can it do for you?

  • Log Monitoring: Visualize system and application logs in real time
  • Security Analytics: Detect anomalies, failed logins, suspicious activity
  • DevOps Dashboards: Track uptime, error rates, and system performance
  • SIEM: Use Elastic Security for threat detection

Once Kibana is installed on a server, you typically use it to visualize and explore data stored in Elasticsearch. Here’s a practical guide with sample usage scenarios:

Access Kibana

After installation, Kibana usually runs on port 5601 by default.

http://<your-server-ip>:5601

  • Open this URL in a browser.
  • You should see the Kibana dashboard.

Connect to Elasticsearch

Kibana automatically connects to your Elasticsearch instance if installed locally.
You can verify the connection:

GET /_cluster/health

  • Go to Dev Tools → Console in Kibana.
  • Run the above query to check cluster status.

Visualize Data

Kibana allows multiple types of visualizations:

  • Bar/line chart: trends over time.
  • Pie chart: distribution of values.
  • Data table: top IP addresses or most visited URLs.
  • Maps: geolocation of IP addresses.

Create Dashboards

  • Combine multiple visualizations in a Dashboard.
  • Useful for monitoring logs, metrics, or application performance.
  • Example: Create a dashboard with:

     

    • Requests per URL (bar chart)
    • Requests over time (line chart)
    • Top client IPs (data table)
    • Errors by type (pie chart)

 Search & Query Logs

  • Use Discover to search logs interactively.
  • Example KQL query:

status:500 AND url:"/login"

This finds all failed login requests.

Set Alerts (Optional)

  • Kibana’s Alerts and Actions can trigger notifications (email, Slack, etc.) when certain thresholds are crossed.
  • Example: alert if error responses exceed 100 in 5 minutes.

kibana-sample-dashboard-screenshot

Sample Kibana dashboard
 

kibana-geo-kibana-web-traffic-by-location

Kibana with connected servers to find out Geo Location
 

Summary closing words (what we did)

  1. Install Kibana from the Elastic repo
  2. Configure it to connect to Elasticsearch
  3. Start and enable the service
  4. Access it via http://<ip>:5601
  5. Ingest log data
  6. Define an index pattern
  7. Create dashboards and visualizations

The idea of this article was just to introduce you to the existence of Elasticsearch / Kibana, Filebeat and Logstash, and not to give you a fully fine-tuned install guide. The usual way to deploy Kibana on multiple servers is, of course, using a dockerized container version of it. There is plenty to learn on how to use Kibana to monitor your machines, but the simplest use is to directly access the locally visible Kibana on a server and check the status of processes on the host instead of logging in via SSH. Kibana can do pretty much anything you need for log visualization once the data is in Elasticsearch.



How to Install and Use Netdata on Debian / Ubuntu Linux for Real-Time Server Monitoring

Wednesday, October 29th, 2025

netdata-server-monitoring-simple-tool-for-linux-bsd-windows

Monitoring system performance is one of the most overlooked aspects of server administration – until something breaks. That is absolutely true for beginners and home-brew labs used for learning how to manage servers and do basic system administration. For a novice, monitoring servers and infrastructure is not a big deal, as test machines are usually OK to break. But if you happen to self-host your own websites, a blog, video streaming servers, a torrent server, a Tor server, some kind of public proxy or whatever other community-beneficial free solution at home, then monitoring will become an important topic at some point, when your small thing becomes big.

While many sysadmins reach for Prometheus or Zabbix, those come with quite some complexity and effort: installing and setting up the server and the agent clients, as well as configuring the different monitoring checks.

Anyway, sometimes you just need a lightweight, zero-configuration server monitoring tool that gives you instant visibility, and you don't want to learn much to have basic and somewhat heuristic monitoring in the house.

That’s where Netdata shines.

What is Netdata?

Netdata is an open-source, real-time performance monitoring tool that visualizes CPU, memory, disk, network, and process metrics directly in your browser. It’s written in C and optimized for minimal overhead, making it perfect for both old hardware and modern VPS setups.

Netdata is built primarily to monitor different Linux distributions such as CentOS / RHEL / AlmaLinux / Rocky Linux / openSUSE / Arch Linux / Amazon Linux and Oracle Linux. It also supports Windows, and partially supports macOS and FreeBSD / OpenBSD / DragonFly BSD and some other BSDs, though some of its Linux features are not available on those.

1. Install Netdata real time performance monitoring tool


root@pcfreak:~# apt-cache show netdata|grep -i description -A3
Description-en: real-time performance monitoring (metapackage)
 Netdata is distributed, real-time, performance and health monitoring for
 systems and applications. It provides insights of everything happening on the
 systems it runs using interactive web dashboards.
Description-md5: 6843dd310958e94a27dd618821504b8e
Homepage: https://github.com/netdata/netdata
Section: net
Priority: optional

Netdata runs on most Linux distributions. On Debian or Ubuntu, installation is dead simple:

# apt update

# apt install curl -y


To get the latest version, since the default repositories usually provide an older release, you can install directly with the kickstart installer from Netdata:

# bash <(curl -Ss https://my-netdata.io/kickstart.sh)

This script automatically installs dependencies, sets up the Netdata daemon, and enables it as a systemd service.

Once installed, Netdata starts immediately. You can verify with:

# systemctl status netdata

2. Access the Web Dashboard


Open your browser and navigate to:

http://your-server-ip:19999/

You’ll be greeted by an intuitive dashboard showing real-time graphs of every aspect of your system — CPU load, memory usage, disk I/O, network bandwidth, and more.

If you’re running this on a public server, make sure to secure access with a firewall or reverse proxy, since by default Netdata is open to all IPs.

Example (UFW):

# ufw allow from YOUR.IP.ADDRESS.HERE to any port 19999

# ufw enable
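
Alternatively (or additionally), you can tell Netdata to listen only on localhost and publish it through a reverse proxy of your choice; the relevant option lives in the [web] section of /etc/netdata/netdata.conf:

[web]
    # listen on the loopback interface only, so the dashboard is not exposed directly
    bind to = 127.0.0.1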

3. Enable Persistent Storage

By default, Netdata stores only live data in memory. To retain historical data:
 

# vim /etc/netdata/netdata.conf

Find the [global] section and modify:

[global]

  history = 86400

  update every = 1

This keeps 24 hours of data at one-second resolution. You can also connect Netdata to a database backend for long-term archiving.

4. Secure with Basic Auth (Optional)

If you want simple password protection:

# apt install apache2-utils

# htpasswd -c /etc/netdata/.htpasswd admin

Then edit /etc/netdata/netdata.conf to include:

[web]

  mode = static-threaded

  default port = 19999

  allow connections from = localhost 192.168.1.*

  basic auth users file = /etc/netdata/.htpasswd

Restart Netdata:

# systemctl restart netdata

Why Netdata is Awesome 

Unlike Prometheus or Grafana, Netdata gives you instant insights without heavy setup. It’s ideal for:

  • Debugging high load or memory leaks in real time
  • Monitoring multiple VPS or embedded devices
  • Visualizing system resource usage with minimal CPU cost

And because it’s written in C, it’s insanely fast — often using less than 1% CPU on idle systems.

Final Thoughts

If you’re running any Linux server – whether for personal projects, web hosting, or experiments – Netdata is one of the easiest ways to visualize what’s happening under the hood.

You can even integrate it into your homelab or connect multiple nodes to Netdata Cloud for centralized monitoring. Of course, the full-featured experience and all the features of the tool are only available in the cloud version, but that is to be expected, as the Netdata developers seem to have adopted the usual business model of providing the core for free while selling other nice things on top to make cash.

Netdata is a really cool no-cloud solution for people who need to quickly monitor something like 20 or 50 servers without putting in too much effort. You can simply install it across the machines and you get plenty of monitoring: just open each machine in a separate browser tab and take a look at what is going on once or twice a day. It is a very old-fashioned way to do monitoring, but it can still make sense if you don't want to bother developing monitoring or putting much effort into it, yet still want some kind of observability over your mid-size computer infrastructure.

So, next time your VPS feels sluggish or your load average spikes, fire up Netdata — your CPU will tell you exactly what’s wrong.

How to Install and Troubleshoot Grafana Server on Linux: Complete Step-by-Step Tutorial

Saturday, October 25th, 2025

install-grafana-on-linux-logo

If you’ve ever set up monitoring for your servers or applications, you’ve probably heard about Grafana.
It’s the de facto open-source visualization and dashboard tool for time-series data — capable of turning raw metrics into beautiful and insightful charts.

In this article, I’ll walk you through how to install Grafana on Linux, configure it properly, and deal with common problems that many sysadmins hit along the way.
This guide is written with real-world experience, blending tips from the official documentation and years of self-hosting tinkering.

1. Prerequisites

You’ll need:

  • A Linux machine — Ubuntu 22.04 LTS, Debian 11+, or CentOS 8 / Rocky Linux 9.
  • A superuser or user with sudo privileges.
  • At least 512 MB of RAM and 1 CPU core (Grafana is lightweight).
  • Internet access to fetch the repository and packages.

Optionally:

  • A domain name (for example: grafana.yourdomain.com)
  • Nginx/Apache if you want to enable HTTPS later.

2. Update Your System

Before any installation, ensure your system packages are up to date:

# Debian/Ubuntu
# apt update && apt upgrade -y

# or

# dnf update -y # CentOS/RHEL/Rocky

3. Install Grafana (from the Official Repository)

Grafana provides an official APT and YUM repository. Always use it to get security and feature updates.

For Debian / Ubuntu:
 

# apt install -y software-properties-common apt-transport-https wget
# wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
# echo "deb https://packages.grafana.com/oss/deb stable main" | \
# tee /etc/apt/sources.list.d/grafana.list
# apt update
# apt install grafana -y 


For CentOS / Rocky / RHEL:

# tee /etc/yum.repos.d/grafana.repo <<EOF
[grafana]
name=Grafana OSS
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
EOF

# dnf install grafana -y

 

4. Start and Enable the Service

Once installation completes, start and enable Grafana to run on boot:
 

# systemctl daemon-reload
# systemctl enable grafana-server
# systemctl start grafana-server

Check status:

# systemctl status grafana-server


You should see Active: running.

5. Access the Grafana Web Interface

Grafana listens on port 3000 by default.

Open your browser and visit:

http://<your_server_ip>:3000

Login with the default credentials:

Username: admin
Password: admin

You’ll be asked to change the password — do it immediately for security.

6. Configure Grafana Basics

Configuration file location:

/etc/grafana/grafana.ini

Common settings to edit:

  • Port:

http_port = 8080

  • Root URL (for reverse proxy):

root_url = https://grafana.yourdomain.com/

  • Database: (change from SQLite to MySQL/PostgreSQL if desired)
[database]
type = mysql
host = 127.0.0.1:3306
name = grafana
user = grafana
password = supersecret_password582?255!#

Restart Grafana after making changes:

# systemctl restart grafana-server


7. Optional – Secure Grafana with HTTPS and Reverse Proxy

If Grafana is public-facing, always use HTTPS.

Install Nginx and Certbot:

# apt install nginx certbot python3-certbot-nginx -y
# certbot --nginx -d grafana.yourdomain.com

This automatically sets up SSL certificates and redirects traffic from HTTP → HTTPS.
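
Behind the scenes certbot edits your NGINX site config; if you prefer to write the proxy part yourself, the vhost essentially just needs to forward to Grafana on port 3000 with the usual headers. A rough sketch (the server_name is the example domain used above):

server {
    listen 80;
    server_name grafana.yourdomain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}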

8. Add a Data Source

Once logged in:

  1. Go to Connections → Data Sources → Add data source

  2. Choose Prometheus, InfluxDB, Loki, or your preferred backend

  3. Enter the connection URL (e.g., http://localhost:9090)

  4. Click “Save & Test”

If successful, you can start creating dashboards!

9. Troubleshooting Common Grafana Installation Issues

Even the smoothest installs sometimes go wrong.
Here are real-world fixes for typical problems sysadmins face:

Problem: Service won't start
Likely cause: Misconfigured grafana.ini or port in use
Fix: Run # journalctl -u grafana-server -xe to see the logs, correct the syntax or change the port, then restart.

Problem: Port 3000 already in use
Likely cause: Another app (maybe a NodeJS app or Jenkins)
Fix: # lsof -i :3000 → stop or reassign the conflicting service.

Problem: Blank login page / dashboard not loading
Likely cause: Browser cache or reverse proxy misconfig
Fix: Clear the cache, check the Nginx proxy settings (proxy_pass should point to http://localhost:3000).

Problem: APT key errors (Ubuntu 22.04+)
Likely cause: apt-key is deprecated
Fix: Use the signed-by=/usr/share/keyrings/grafana.gpg method as the Grafana docs suggest.

Problem: "Bad Gateway" via proxy
Likely cause: Missing header forwarding
Fix: In the Nginx config, ensure:

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

Problem: Can't access Grafana from LAN
Likely cause: Most commonly a firewall blocking the port
Fix: # ufw allow 3000/tcp (or adjust firewalld).
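
For the apt-key deprecation mentioned above, the keyring-based repository definition looks roughly like this (the key file name is the conventional one, adjust if you prefer another location):

# wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | tee /usr/share/keyrings/grafana.gpg > /dev/null
# echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | tee /etc/apt/sources.list.d/grafana.list
# apt update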

 

10. Maintenance and Backup Tips
 

  • Backup your dashboard database frequently:
      SQLite: /var/lib/grafana/grafana.db
      MySQL/PostgreSQL: use your normal DB backup methods

  • Watch logs regularly, e.g.:

# journalctl -u grafana-server -f

  • Upgrade grafana safely:

# apt update && apt install grafana -y

(Always back up before upgrading!)

Conclusion

Grafana is a powerful, elegant way to visualize your system metrics, application logs, and alerts.
Installing it on Linux is straightforward — but knowing where it can fail and how to fix it is what separates a beginner from a seasoned sysadmin.

By following this guide, you not only get Grafana up and running, but also understand the moving parts behind the scenes — a perfect fit for any DIY server admin or DevOps engineer who loves keeping full control.

How to Install and Use auditd for System Security Auditing on Linux

Thursday, September 25th, 2025

System auditing is essential for monitoring user activity, detecting unauthorized access, and ensuring compliance with security standards. On Linux, the Audit Daemon (auditd) provides powerful auditing capabilities for logging system events and actions.

This short article will walk you through installing, configuring, and using auditd to monitor your Linux system.

What is auditd?

auditd is the user-space component of the Linux Auditing System. It logs system calls, file access, user activity, and more — offering administrators a clear trail of what’s happening on the system.


1. Installing auditd

The auditd package is available by default in most major Linux distributions.

 On Debian/Ubuntu

# apt update
# apt install auditd audispd-plugins

 On CentOS/RHEL/Fedora

# yum install audit

After installation, start and enable the audit daemon

# systemctl start auditd

# systemctl enable auditd

Check its status

# systemctl status auditd

2. Setting Audit Rules

Once auditd is running, you need to define rules that tell it what to monitor.

Example: Monitor changes to /etc/passwd

# auditctl -w /etc/passwd -p rwxa -k passwd_monitor

Explanation:

  • -w /etc/passwd: Watch this file. When the file is accessed, the watcher will generate events.
  • -p rwxa: Monitor read, write, execute, and attribute changes
  • -k passwd_monitor: Assign a custom key name to identify logs. Later on, we could search for this (arbitrary) passwd string to identify events tagged with this key.

List active rules:

# auditctl -l

3. Common auditd Rules for Security Monitoring

Here are some common and useful auditd rules you can use to monitor system activity and enhance Linux system security. These rules are typically added to the /etc/audit/rules.d/audit.rules or /etc/audit/audit.rules file, depending on your system.

a. Monitor Access to /etc/passwd and /etc/shadow
 

-w /etc/passwd -p wa -k passwd_changes
-w /etc/shadow -p wa -k shadow_changes

  • Monitors read/write/attribute changes to password files.

b. Monitor sudoers file and directory
 

-w /etc/sudoers -p wa -k sudoers
-w /etc/sudoers.d/ -p wa -k sudoers

  • Tracks any change to sudo configuration files.

c. Monitor Use of chmod, chown, and passwd
 

-a always,exit -F arch=b64 -S chmod -S fchmod -S fchmodat -k perm_mod
-a always,exit -F arch=b64 -S chown -S fchown -S fchownat -k perm_mod
-w /usr/bin/passwd -p x -k passwd_changes

  • Watches permission and ownership changes.

d. Monitor User and Group Modifications

-w /etc/group -p wa -k group_mod
-w /etc/gshadow -p wa -k gshadow_mod
-w /etc/security/opasswd -p wa -k opasswd_mod

  • Catches user/group-related config changes.

e. Track Logins, Logouts, and Session Initiation

-w /var/log/lastlog -p wa -k logins
-w /var/run/faillock/ -p wa -k failed_login
-w /var/log/faillog -p wa -k faillog

  • Tracks login attempts and failures.

f. Monitor auditd Configuration Changes

-w /etc/audit/ -p wa -k auditconfig
-w /etc/audit/audit.rules -p wa -k auditrules

  • Watches changes to auditd configuration and rules.

g. Detect Changes to System Binaries

-w /bin/ -p wa -k bin_changes
-w /sbin/ -p wa -k sbin_changes
-w /usr/bin/ -p wa -k usr_bin_changes
-w /usr/sbin/ -p wa -k usr_sbin_changes

  • Ensures core binaries aren't tampered with.

h. Track Kernel Module Loading and Unloading

-a always,exit -F arch=b64 -S init_module -S delete_module -k kernel_mod

  • Detects dynamic kernel-level changes.

l. Monitor File Deletions

-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat -k delete

  • Tracks when files are removed or renamed.

m. Track Privilege Escalation via setuid/setgid

-a always,exit -F arch=b64 -S setuid -S setgid -k priv_esc

  • Helps detect changes in user or group privileges.

n. Track Usage of Dangerous Binaries (e.g., su, sudo, netcat)

-w /usr/bin/su -p x -k su_usage
-w /usr/bin/sudo -p x -k sudo_usage
-w /bin/nc -p x -k netcat_usage

  • Useful for catching potentially malicious command usage.

o. Monitor Cron Jobs

-w /etc/cron.allow -p wa -k cron_allow
-w /etc/cron.deny -p wa -k cron_deny
-w /etc/cron.d/ -p wa -k cron_d
-w /etc/crontab -p wa -k crontab
-w /var/spool/cron/ -p wa -k user_crontabs

  • Alerts on cron job creation/modification.

p. Track Changes to /etc/hosts and DNS Settings

-w /etc/hosts -p wa -k etc_hosts
-w /etc/resolv.conf -p wa -k resolv_conf

  • Monitors potential redirection or DNS manipulation.

q. Monitor Mounting and Unmounting of Filesystems

-a always,exit -F arch=b64 -S mount -S umount2 -k mounts

  • Useful for detecting USB or external drive activity.

r. Track Execution of New Programs

-a always,exit -F arch=b64 -S execve -k exec

  • Captures command execution (can generate a lot of logs).
     

A complete list of rules can be taken from a prepared hardening.rules auditd rules file; place it under /etc/audit/rules.d/hardening.rules
and reload auditd to load the configuration.

Tips

  • Use ausearch -k <key> to search audit logs for matching rule.
  • Use auditctl -l to list active rules.
  • Use augenrules --load after editing rules in /etc/audit/rules.d/.


4. Reading Audit Logs

Audit log events are stored in:

/var/log/audit/audit.log

This is the default location; it can be changed through /etc/audit/auditd.conf.

View recent entries:
 

# tail -f /var/log/audit/audit.log

Search by key:
 

# ausearch -k passwd_monitor

Generate a summary report:

# aureport -f

# aureport


Example: Show all user logins / IPs :

# aureport -au
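
ausearch can also narrow results down by time, which is handy during incident investigation; for example (using the key name from the earlier rule):

# ausearch -k passwd_monitor --start today -i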

 

5. Making Audit Rules Persistent

Rules added with auditctl are not persistent and will be lost on reboot. To make them permanent:

Edit the audit rules configuration:

# vim /etc/audit/rules.d/audit.rules

Add your rules, for example:

-w /etc/passwd -p rwxa -k passwd_monitor

Apply the rules:

# augenrules --load

6. Some use case examples of auditd in auditing Linux servers by sysadmins / security experts
 

Below are real-world, practical examples where auditd is actively used by sysadmins, security teams, or compliance officers to detect suspicious activity, meet compliance requirements, or conduct forensic investigations.

a. Detect Unauthorized Access to /etc/shadow

Use Case: Someone tries to read or modify password hashes.

Audit Rule:

-w /etc/shadow -p wa -k shadow_watch

Real-World Trigger:

sudo cat /etc/shadow

Check Logs:
 

# ausearch -k shadow_watch -i

Real Output:
 

type=SYSCALL msg=audit(09/18/2025 14:02:45.123:1078):
  syscall=openat
  exe="/usr/bin/cat"
  success=yes
  path="/etc/shadow"
  key="shadow_watch"

b. Detect Use of chmod to Make Files Executable

Use Case: Attacker tries to make a script executable (e.g., malware).

Audit Rule:

-a always,exit -F arch=b64 -S chmod -k chmod_detect

Real-World Trigger:
 

 # chmod +x /tmp/evil_script.sh

Check Logs:

# ausearch -k chmod_detect -i

c. Monitor Execution of nc (Netcat)

Use Case: Netcat is often used for reverse shells or unauthorized network comms.

Audit Rule:
 

-w /bin/nc -p x -k netcat_usage
 

Real-World Trigger:

nc -lvp 4444

Log Entry:

type=EXECVE msg=audit(09/18/2025 14:35:45.456:1123):
  argc=3 a0="nc" a1="-lvp" a2="4444"
  key="netcat_usage"

 

d. Alert on Kernel Module Insertion
 

Use Case: Attacker loads rootkit or malicious kernel module.

Audit Rule:

-a always,exit -F arch=b64 -S init_module -S delete_module -k kernel_mod

Real-World Trigger:

# insmod myrootkit.ko

Audit Log:
 

type=SYSCALL msg=audit(09/18/2025 15:00:13.100:1155):
  syscall=init_module
  exe="/sbin/insmod"
  key="kernel_mod"

e. Watch for Unexpected sudo Usage

Use Case: Unusual use of sudo might indicate privilege escalation.

Audit Rule:

-w /usr/bin/sudo -p x -k sudo_watch

Real-World Trigger:

sudo whoami

View Log:
 

# ausearch -k sudo_watch -i


f. Monitor Cron Job Modification

Use Case: Attacker schedules persistence via cron.

Audit Rule:

-w /etc/crontab -p wa -k cron_mod

Real-World Trigger:
 

echo "@reboot /tmp/backdoor" >> /etc/crontab

Logs:
 

type=SYSCALL msg=audit(09/18/2025 15:05:45.789:1188):
  syscall=open
  path="/etc/crontab"
  key="cron_mod"

g. Detect File Deletion or Renaming
 

Use Case: Attacker removes logs or evidence.

Audit Rule:

-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat -k file_delete

Real-World Trigger:

# rm -f /var/log/syslog

Logs:
 

type=SYSCALL msg=audit(09/18/2025 15:10:33.987:1210):
  syscall=unlink
  path="/var/log/syslog"
  key="file_delete"


h. Detect Script or Malware Execution
 

Use Case: Capture any executed command.

Audit Rule:
 

-a always,exit -F arch=b64 -S execve -k exec

Real-World Trigger:

/tmp/myscript.sh

Log View:

# ausearch -k exec -i | grep /tmp/myscript.sh

l. Detect Manual Changes to /etc/hosts

Use Case: DNS hijacking or phishing setup.

Audit Rule:

-w /etc/hosts -p wa -k etc_hosts

Real-World Trigger:
 

# echo "1.2.3.4 google.com" >> /etc/hosts

Logs:

type=SYSCALL msg=audit(09/18/2025 15:20:11.444:1234):
  path="/etc/hosts"
  syscall=open
  key="etc_hosts"


7. Enable Immutable Mode (if necessary)

For enhanced security, you can make audit rules immutable, preventing any changes until reboot:

# auditctl -e 2


To make this setting persistent, add the following to the end of /etc/audit/rules.d/audit.rules:

-e 2


Common Use Cases

Here are a few more examples of what you can monitor:

Monitor all sudo usage:

# auditctl -w /var/log/auth.log -p wa -k sudo_monitor


Monitor a directory for file access:

# auditctl -w /home/username/important_dir -p rwxa -k dir_watch

Audit execution of a specific command (e.g., rm):

# auditctl -a always,exit -F arch=b64 -S unlink,unlinkat -k delete_cmd

(Adjust arch=b64 to arch=b32 if on 32-bit system.)
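
On 64-bit systems that can also run 32-bit binaries it is common practice to add both architectures, otherwise 32-bit programs bypass the rule; a sketch for the delete example above:

-a always,exit -F arch=b64 -S unlink,unlinkat -k delete_cmd
-a always,exit -F arch=b32 -S unlink,unlinkat -k delete_cmd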

8. Managing the Audit Log Size

Audit logs can grow large over time. To manage log rotation and size, edit:
 

# vim /etc/audit/auditd.conf

Set log rotation options like:

max_log_file = 8

num_logs = 5
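
A few other auditd.conf parameters are worth knowing when sizing the log storage (the values below are just examples, defaults differ between distributions):

max_log_file_action = ROTATE
space_left_action = SYSLOG
admin_space_left_action = SUSPEND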

Then restart auditd:
 

# systemctl restart auditd

Conclusion

The Linux Audit Daemon (auditd) is a powerful tool to track system activity, enhance security, and meet compliance requirements. With just a few configuration steps, you can monitor critical files, user actions, and system behavior in real time.

 

References

  • man auditd
  • man auditctl
  • Linux Audit Wiki

 

Unlocking the Power of lnav: Logfile Navigator – an ncurses text-based tool guide to easy analysis of multiple logs on multiple servers on Linux

Saturday, September 13th, 2025

lnav-syslog-screenshot-linux-virtual-machine

If you've ever found yourself buried under a mountain of log files, tailing multiple outputs, or grepping through endless lines trying to spot an error, it's time to meet your new best friend: lnav, the Logfile Navigator.

Lightweight, terminal-based, and surprisingly powerful, lnav is one of the most underrated tools for developers, sysadmins, and anyone who regularly digs into logs. It turns your chaotic logs into something that’s not only readable—but genuinely useful.

What is lnav and why use it ?

lnav (Logfile Navigator) is a command-line tool for viewing and analyzing log files. It goes beyond tail, less, or grep by:

  • Automatically detecting and merging log formats.
  • Highlighting timestamps, log levels, and errors.
  • Providing SQL-like queries over your logs.
  • Offering interactive navigation with a UI inside the terminal.

And yes, all of that without needing to set up a database or a server.

1. Installing lnav on Linux

Installation is straightforward. On most systems, you can install it via package managers:

On Ubuntu/Debian:

# apt install lnav

On Fedora:

# dnf install lnav

On Arch Linux:

# pacman -S lnav

Or build from source via GitHub if you want the latest version.

2. Why use lnav instead of tail / grep?

Traditional tools are powerful, but they require manual work to chain together functionality. lnav gives you:

  • Automatic multi-log parsing: Drop multiple logs in, and it merges them chronologically.
  • Syntax highlighting: Errors and warnings stand out.
  • SQL querying: Run queries like SELECT * FROM syslog_log WHERE log_level = 'error';
  • Filtering and searching: Use intuitive filters and bookmarks to highlight specific entries.

3. Basic tool Usage is simple

Let’s say you want to inspect a system log:

# lnav /var/log/syslog

You'll immediately get:

  • Color-coded output (timestamps, levels, messages).
  • Scrollable view (arrow keys, PgUp, PgDn).
  • Real-time updates (like tail -f).
  • Search with /, filter with :filter-in, and even SQL queries.

Let's say you need to analyze Apache webserver logs recursively, including logs already rotated and gzipped with the *.gz extension, on CentOS / Fedora / RHEL; you can do it with:

# lnav -r /var/log/httpd

You can parse a log line and get additional information about the request, as well as print an overall summary of the log file.

Choose the line you want to parse. The selected line is always the one at the top of the window. Then press 'p' and you should see the following result:

https://pc-freak.net/images/lnav-get-extra-information-about-apache-query-with-P-press-key-screenshot-linux

Now, if you want to see a summary view of the logs by date and time, simply press 'i'.

lnav-linux-apache-log-review-summary-of-errors-warnings-normal-screenshot

To quit a screen you have chosen press 'q'.

4. LNAV helpful options and hotkeys

Once you've opened a log file (or several) for analysis, you can use a few hotkeys that allow you to move through lnav's output and the available views more easily:

e or E to jump to the next / previous error message.
w or W to jump to the next / previous warning message.
b or Backspace to move to the previous page.
Space to move to the next page.
g or G to move to the top / bottom of the current view.

To take a closer look at the way lnav operates, use the -d option; the debug information will be written to a text file:

# lnav /var/log/httpd -d lnav.txt

In this example, the debug information that is generated when lnav starts will be written to a file named lnav.txt inside the current working directory.

5. Real-World Use Cases

a. Troubleshooting application or system process Crashes

Open all relevant logs in one go:

# lnav /var/log/*.log

Errors are highlighted, and you can jump between them with the n / N keys.

b. Combining Multiple Logs

If you are working with an app that logs to different files and you need to combine them:

# lnav /var/log/nginx/access.log /var/log/nginx/error.log


Or lets say you want to combine Apache Webserver with Haproxy log and get log summaries or filter out stuff:

lnav /var/log/apache2/access.log /var/log/haproxy.log


Now you will get a single, chronological timeline of events.

 

If you want to search for a concrete occurrence of an error / warning or an IP address inside a bunch of loaded, combined logs, you can do it just like in vim: press / (slash) on the keyboard and type what you want to filter for.

c. Analyze SQL Queries Logs

Yes, you can actually do this by entering a query at lnav's SQL prompt (opened with the ';' key):

;.schema
;SELECT log_time, log_level, log_message FROM syslog_log WHERE log_level = 'error';

You get a table of filtered logs, sortable by columns.
 

6. lnav more usage command tips

  • :help — Opens the help menu.
  • :filter-in <string> — Show only lines matching <string>.
  • :filter-out <string> — Hide lines matching <string>.
  • :write-to <filename> — Export current view to a file.
  • :tag <tagname> — Tag lines for later reference.
  • q — Quit (but why would you want to?).

 

7. Using lnav as a pager for systemd-journald

journalctl | lnav
# journalctl -f | lnav
# journalctl -u ssh.service | lnav

https://pc-freak.net/images/lnav_sshservice-log-view-screenshot-linux
 

8. Use lnav to review remote ssh logs

Newer versions (after 0.10) support remote files over the SSH protocol as well, so this should theoretically work:

# lnav user@server-name-here:/var/log/file.log


To read all logs inside /var/log

# lnav root@server-name-here:/var/log/
# lnav root@server-name-here:/var/log/*.err

9. Using lnav to view docker container logs

# docker logs 811ab84aa95l | lnav
# docker logs -f application | lnav

The latest version of lnav supports even the following  simplified docker:// URL syntax:

# lnav docker://{container_id_or_name}/dir_path/to/log/file
# lnav docker://{container_id_or_name}/var/dir_path/log
# lnav docker://application/var/log/
# lnav docker://applcation/var/log/nginx/nginx.app.log

10. Monitoring compilation and command output useful for developers
 

Compilation from archived tarballs with ./configure && make etc. generates a lot of output and logs while working.
Here is where the tool can come in handy.
For example, here is how to watch the output of the make command when compiling something:

# lnav -e './configure && make'

 11. Learning the lnav tool through the online SSH demo service at lnav.org

If you're too lazy to install it and want to test it anyway:
 

# Start The Basic Tutorial:
ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password tutorial1@demo.lnav.org


# Playground:
ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password playground@demo.lnav.org


Closure

While tools like Kibana, Grafana, and ELK stacks are powerful, they can be overkill for many use cases—especially when you're SSHed into a box and just need to get answers fast. That’s where lnav shines as it is fast, lightweight, visual and can be used offline.

If you're a developer, sysadmin, SRE (Site Reliability Engineer), or just someone who cares about logs, give lnav a spin. It might just become one of your favorite sysadmin tools on Linux and save you quite a lot of time if you have to read and analyze logs on a daily basis (for example, if you're administering 20+ Linux servers).

 

How to Update / Migrate zabbix-agent 5 to zabbix-agent2 6 on Redhat / CentOS / Fedora Linux

Friday, August 9th, 2024

Upgrade-zabbix-agent1-5-to-zabbix-agent2-6-on-RHEL-CentOS-Fedora-Linux-howto-logo

If you have servers still reporting monitoring with Zabbix Agent 1 version 5.0.x but the Zabbix server has already been migrated to Zabbix 6, it is a good idea to update the agent to Zabbix Agent 2 (6.x) as soon as possible; as you know, lagging behind in versions makes updating a harder and more complicated task later.

Mine, and I guess most system administrators' experience, shows that keeping many applications at the same version level has historically reduced unexpected errors and bugs, but nowadays the rule of keeping local and remote applications (programs) at the same version level is regularly broken.

Theoretically, the Zabbix Agent (client) and Zabbix (server) are compatible across a certain range of versions (Zabbix agents 2 from version 4.4 onwards are compatible with Zabbix 7.0; Zabbix agent 2 must not be newer than 7.0 – for more on Zabbix agent -> server version compatibility check here), and a slight version difference should not really be a problem. But often you might have third-party proxies in between, such as HAProxy or zabbix-proxy, or other network oddities, and thus my personal opinion is that for interoperability it is better to keep the Zabbix clients and Zabbix servers across the DMZ-ed networks running at the same version level.

Some would say I have old-fashioned thinking, as software and technology move forward, but as I see how code writing and even software quality are constantly degrading, just a reflection of the degradation of the human element, I prefer to keep my old know-how and always stick to the same versioning whenever possible.

Some would wonder then why I would upgrade to zabbix-agent2 at all, if I want to keep the same versioning. The reason is that zabbix-agent2 is written in the Go language and is much faster and supposedly a better piece of software than Zabbix Agent 1, which is written in C.

Moreover, having Zabbix Agent 2 instead of 1 also brings benefits, as you can do a bit more with Zabbix and the machines are better prepared for future monitoring needs. To learn more about the benefits of Zabbix Agent 2 compared to Zabbix Agent 1, read the Agent vs Agent2 comparison on the Zabbix website.

 

With this little introduction, let's proceed with the exact steps to take to upgrade zabbix-agent (1) to zabbix-agent2.

1. Check the current installed Zabbix-Agent version 

[user@monitored-server ~]$ rpm -qa |grep -i zabb
zabbix-get-5.0.42-1.el8.x86_64
zabbix-sender-5.0.42-1.el8.x86_64
zabbix-agent-5.0.42-1.el8.x86_64

[user@server ~]$ 

 

2. Create backup copy of current system working zabbix_agentd.conf
 

Before messing with the working zabbix-agent, as usual create the necessary backup to prevent later surprises:

[user@monitored-server ~]$ cp -vrpf /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_agentd.conf.bak-$(date '+%Y-%m-%d_%H-%M-%S')

3. Check current configured Zabbix repos

 

[user@monitored-server ~]$ vim /etc/yum.repos.d/zabbix.repo
 

[zabbix-4.0]
name = zabbix-4.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-4.0/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-4.0/zabbix-official-repo.key
gpgcheck = 1

[zabbix-4.4]
name = zabbix-4.4 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-4.4/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-4.4/zabbix-official-repo.key
gpgcheck = 1

[zabbix-5.0]
name = zabbix-5.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-5.0/8/$basearch
enabled = 1
gpgkey = http://zabbix-repo-server.com/external/zabbix-5.0/zabbix-official-repo.key
gpgcheck = 1

[zabbix-5.4]
name = zabbix-5.4 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-5.4/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-5.4/zabbix-official-repo.key
gpgcheck = 1

[zabbix-6.0]
name = zabbix-6.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-6.0/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-6.0/zabbix-official-repo.key
gpgcheck = 1


4. Modify repositories and include the Zabbix Agent6 yum repos 
 

[user@monitored-server ~]$ cp -rpf zabbix.repo zabbix.repo.5.0.rpmsave

As we want to keep only the 6.0 version, leave only the zabbix-6.0 section and enable the repo:
 

[user@monitored-server ~]$ vim /etc/yum.repos.d/zabbix.repo

[zabbix-6.0]
name = zabbix-6.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-6.0/8/$basearch
enabled = 1
gpgkey = http://zabbix-repo-server.com/external/zabbix-6.0/zabbix-official-repo.key
gpgcheck = 1


5. Update zabbix-agent to zabbix-agent2 and update zabbix-get zabbix-sender versions

To not disrupt the reported monitoring, don't delete zabbix-agent (1); instead, install and configure zabbix-agent2 in parallel,
and then, once the configuration is migrated from Agent 1 to Agent 2, stop the old zabbix-agent and bring up the new one.

[user@monitored-server ~]$ yum check-update

[user@monitored-server ~]$ yum install zabbix-agent2 zabbix-get zabbix-sender -y

Note that if you want a precise version of zabbix-agent2, let's say 6.0.31 to correspond to zabbix-server 6.0.31 (even though newer RPM versions are available in the repositories), run:
 

[user@monitored-server ~]$ yum upgrade zabbix-agent2-6.0.31-release1.el8

 

  • Check new zabbix_agent2 installed version 


# zabbix_agent2 -V
zabbix_agent2 (Zabbix) 6.0.31
Revision b6d93755a1b 17 June 2024, compilation time: {undefined} {undefined}, built with: go1.21.3
Plugin communication protocol version is 6.0.13

Copyright (C) 2024 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <https://www.gnu.org/licenses/>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1k  FIPS 25 Mar 2021
Running with OpenSSL 1.1.1k  FIPS 25 Mar 2021

We use the library Eclipse Paho (eclipse/paho.mqtt.golang), which is
distributed under the terms of the Eclipse Distribution License 1.0 (The 3-Clause BSD License)
available at https://www.eclipse.org/org/documents/edl-v10.php

We use the library go-modbus (goburrow/modbus), which is
distributed under the terms of the 3-Clause BSD License
available at https://github.com/goburrow/modbus/blob/master/LICENSE

 

6. Migrate old /etc/zabbix/zabbix_agentd.conf to /etc/zabbix/zabbix-agent2.conf

For readability, here are the main configured variables of zabbix-agent without the tons of comments, to later include in agent2:
 

[root@monitored-server ~]# cat /etc/zabbix/zabbix_agentd.conf | grep -v '\#' | sed '/^$/d' 
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.50.37.8,127.0.0.1
ServerActive=10.50.37.8,127.0.0.1
Hostname=fqdn-of-monitored-host.domain.com
Timeout=20
Include=/etc/zabbix/zabbix_agentd.d/*.conf

The default installed zabbix-agent2 config will look similar to:

[root@monitored-server ~]# cat /etc/zabbix/zabbix_agent2.conf | grep -v '\#' | sed '/^$/d'
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=0
Server=127.0.0.1
# Specify the location of the Zabbix server host.
ServerActive=127.0.0.1
Hostname=Zabbix server
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=./zabbix_agent2.d/plugins.d/*.conf

The new migrated one should look like this:

[root@monitored-server ~]# vim /etc/zabbix/zabbix_agent2.conf
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=10
Server=10.34.89.7,127.0.0.1
ServerActive=10.34.89.7,127.0.0.1
Hostname=lqgblu02f.ffm.de.int.atosorigin.com
Timeout=20
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf


7. Add a few optimization variables for better zabbix-agent -> zabbix-proxy -> zabbix-server interaction

If you sometimes have network delays between the Zabbix server -> Zabbix client and vice versa (depending on whether the Zabbix agent is configured in Active or Passive mode), it is often useful
to add these 2 variables:

# How often the list of active checks is refreshed, in seconds
RefreshActiveChecks=60
# Refresh the active checks on start
ForceActiveChecksOnStart=1


Also it might be a good practice to let the agent itself rotate zabbix_agent2.log once the log exceeds a certain size, instead of handling it via logrotate.
 

# Perform log file rotation at the 1 MB point for the specified filepath
LogFileSize=1

 

[root@monitored-server ~]# vim /etc/zabbix/zabbix_agent2.conf
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=10
Server=10.34.89.7,127.0.0.1
ServerActive=10.34.89.7,127.0.0.1
Hostname=lqgblu02f.ffm.de.int.atosorigin.com
RefreshActiveChecks=60
ForceActiveChecksOnStart=1
Timeout=20
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf

 

8. Stop the old zabbix agent process and run the new one

# systemctl status --full zabbix-agent2
# systemctl stop zabbix-agent


Assuming that the configuration of zabbix-agent2 is correct, start zabbix-agent2 via systemctl and check its status:
 

# systemctl start zabbix-agent2
# systemctl status --full zabbix-agent2


If there are no errors in the configuration, the zabbix_agent2 process should be up and running and the above systemctl command should report it as active.
If you need details on the exact Zabbix checks, whether the currently configured UserParameter scripts throw errors, or any other warnings or errors
of zabbix_agent2 interacting with the server, check the logs further:

[root@monitored-server ~]# tail -n 10 /var/log/zabbix/zabbix_agent2.log  
2024/08/06 17:26:52.998749 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
2024/08/06 17:26:52.998760 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
2024/08/06 17:26:52.998794 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
2024/08/06 17:26:52.998804 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
2024/08/06 17:26:52.998820 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
2024/08/06 17:26:52.998993 Plugin communication protocol version is 6.0.13
2024/08/06 17:26:52.999018 Zabbix Agent2 hostname: [lqgblu02f.ffm.de.int.atosorigin.com]
2024/08/06 17:26:54.000667 [102] cannot connect to [127.0.0.1:10051]: dial tcp :0->127.0.0.1:10051: connect: connection refused
2024/08/06 17:26:54.000836 [102] active check configuration update from host [lqgblu02f.ffm.de.int.atosorigin.com] started to fail
2024/08/06 17:26:59.344837 Zabbix Agent 2 stopped. (6.0.31)
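If you want to additionally verify that the agent really answers item queries end to end, two quick checks can help; a hedged sketch, assuming the zabbix_get utility is installed (it ships in the separate zabbix-get package) and the agent listens on the default passive port 10050:

# test an item key directly through the agent binary, no server round-trip needed
[root@monitored-server ~]# zabbix_agent2 -t agent.version
agent.version                                 [s|6.0.31]

# query the running passive agent over TCP (agent.ping returns 1 if the agent is alive)
[root@monitored-server ~]# zabbix_get -s 127.0.0.1 -p 10050 -k agent.ping
1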

DNS Monitoring: Shell script to check and alert if the DNS nameserver resolvers of a Linux machine are not properly resolving. Monitor if /etc/resolv.conf DNS runs okay

Thursday, March 14th, 2024

linux-monitor-check-dns-is-resolving-fine

If you occasionally happen to have issues with DNS resolvers and you want to keep an eye on them and alert if DNS is not properly resolving domains, because sometimes you seem to have issues due to network disconnects, disturbances (modifications), whatever, and you want another means to see whether a DNS server was reachable or unreachable for a period of time, here is a little bash shell script that does the "trick".

The script's working mechanism is pretty straightforward: as you can see, we check whether the configured nameservers properly resolve; if they do, we write to the log that everything is okay, otherwise we write to the log that DNS is not properly resolvable and send an ALERT email to a preconfigured email address.

Below is the check_dns_resolver.sh script:

 

#!/bin/bash
# Simple script to monitor the DNS resolvers configured on the host for availability and trigger an alarm via a preset email if any of the nameservers cannot resolve
# Uses a configured RESOLVE_HOST and tries to resolve it via every nameserver configured in /etc/resolv.conf
# If a nameserver fails to resolve, a notification email is sent to the preconfigured email address
# The script logs "OK 1" for a working resolver or "NOT_OK 0" if there is an issue resolving $RESOLVE_HOST on $SELF_HOSTNAME, and mails $ALERT_EMAIL
# Output of the script is kept inside /var/log/dns_status.log

ALERT_EMAIL='your.email.address@email-fqdn.com';
log=/var/log/dns_status.log;
TIMEOUT=3; DNS=($(grep -R nameserver /etc/resolv.conf | cut -d ' ' -f2));

SELF_HOSTNAME=$(hostname --fqdn);
RESOLVE_HOST=$(hostname --fqdn);

for i in ${DNS[@]}; do
  dns_status=$(timeout $TIMEOUT nslookup $RESOLVE_HOST $i);

  if [[ $? -eq 0 ]]; then
    echo "$(date "+%y.%m.%d %T") $RESOLVE_HOST $i on host $SELF_HOSTNAME OK 1" | tee -a $log;
  else
    echo "$(date "+%y.%m.%d %T") $RESOLVE_HOST $i on host $SELF_HOSTNAME NOT_OK 0" | tee -a $log;
    # mail the alert to the preconfigured address
    echo "$(date "+%y.%m.%d %T") $RESOLVE_HOST $i DNS on host $SELF_HOSTNAME resolve ERROR" | mail -s "$RESOLVE_HOST /etc/resolv.conf $i DNS on host $SELF_HOSTNAME resolve ERROR" "$ALERT_EMAIL";
  fi
done
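Before putting the script in cron, it is a good idea to run it once by hand and watch the output; with working resolvers it should look roughly like this (the hostname and nameserver IPs below are just example values):

[root@monitored-server ~]# bash check_dns_resolver.sh
24.03.14 10:35:02 monitored-server.domain.com 192.168.1.10 on host monitored-server.domain.com OK 1
24.03.14 10:35:02 monitored-server.domain.com 192.168.1.11 on host monitored-server.domain.com OK 1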

Download check_dns_resolver.sh here and set the script to run via a cron job every, let's say, 5 minutes. For example, you can set a cronjob like this:
 

# crontab -u root -e
*/5 * * * *  check_dns_resolver.sh >/dev/null 2>&1

 

Then voila, if you happen to run into a service downtime, check the log /var/log/dns_status.log and correlate its output with the rest of the infrastructure components, network switch equipment, other connected services etc. That should keep you covered during an eventual RCA (Root Cause Analysis), if the complete high-availability system goes down, to prove that your managed Linux servers were not the reason for the occurring service unavailability.

A simplified variant of check_dns_resolver.sh can easily be integrated for monitoring with a Zabbix userparameter script and a DNS Check Template containing a few Triggers, Items and an Action. If I have time some day in the future, perhaps I'll blog a short article on how to configure such DNS Zabbix monitoring. The Zabbix variant of the DNS monitor script looks like this:

[root@linux-server bin]# cat check_dns_resolver.sh 
#!/bin/bash
TIMEOUT=3; DNS=($(grep -R nameserver /etc/resolv.conf | cut -d ' ' -f2));  for i in ${DNS[@]}; do dns_status=$(timeout $TIMEOUT nslookup $(hostname --fqdn) $i); if [[ $? -eq 0 ]]; then echo "$i OK 1"; else echo "$i NOT OK 0"; fi; done

[root@linux-server bin]#
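If you decide to wire this one-liner variant into Zabbix right away, the hookup follows the usual userparameter pattern; a minimal sketch, where the dns.resolver.check key name and the /usr/local/bin/ path are just example choices to adapt:

# /etc/zabbix/zabbix_agent2.d/userparameter_dns_check.conf  (example file name and item key)
UserParameter=dns.resolver.check,/usr/local/bin/check_dns_resolver.sh

Then restart zabbix-agent2 (or zabbix-agent) so it picks up the new file, and build the Items / Triggers on top of the "OK 1" / "NOT OK 0" output.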


Hope this article will help someone to improve his Unix server infrastructure monitoring.

Enjoy and Cheers !

How to count number of ESTABLISHED state TCP connections to a Windows server

Wednesday, March 13th, 2024

count-netstat-established-connections-on-windows-server-howto-windows-logo-debug-network-issues-windows

Even if you have the background of a Linux system administrator, sooner or later you will have to deal with some Windows hosts. Thus, in this article I'll blog shortly on how to count the established TCP connections, in case it happens that you have to administrate a Windows host or help a Windows sysadmin noobie 🙂

In Linux it is pretty easy to check the number of established connections, thanks to the wonderful command wc (word count), with a simple command like:
 

$ netstat -etna | grep ESTABLISHED | wc -l


Then you will get the number of established TCP connections to the machine, and based on that you can get an idea of how busy the server is.

But what if you have to deal with, let's say, a Microsoft Windows 2012 / 2016 / 2019 or 2022 Server? Assume you are logged in as Administrator and you see the machine is quite loaded and runs multiple common native Windows services such as IIS, Active Directory, Failover Clustering, a proxy server etc.
How can you identify the number of established connections via a simple command in cmd.exe?

1. Count ESTABLISHED TCP connections from the Windows Command Line

Here is the answer: simply use the native Windows netstat command and combine it with find, using the /i (ignore the case of characters when searching for the string) and /c (count the lines containing the string) options:

C:\Windows\system32>netstat -p TCP -n|  find /i "ESTABLISHED" /c
1268

Voila, here is the number of established connections, only 1268, which is relatively low.
However, if you manage Windows servers and you get some kind of hang-ups as part of the monitoring, it is a good idea to set up a script based on this simple command, at least in Windows Task Scheduler (the equivalent of Linux's crond service), to log peaks in established connections (see the small sketch below) and check whether server crashes are related to a high rise in established connections.
Even better, if the company uses Zabbix, Nagios, OpenNMS or other old legacy monitoring stuff like Joschyd (even as of today, 2024, used in some of the TOP IT companies such as SAP; they were still using it about 4 years ago for their SAP HANA Cloud), you can set the script to run and create a Monitoring template or Alerting rules to draw graphs and trigger alerts if your connections hit a peak. Then you at least might know your Windows server is under a "Hackers" Denial of Service attack, or that there is something happening on the network, like Cisco network infrastructure switch flappings or whatever.
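For the Task Scheduler idea, a minimal batch sketch could look like the one below; the file name, log path and schedule are all just example choices to adapt to your environment:

@echo off
REM log_established_conns.bat - hypothetical example, adapt the log path to your environment
REM appends a timestamped count of ESTABLISHED TCP connections to a log file
REM schedule it e.g. every 5 minutes via Windows Task Scheduler
set LOGFILE=C:\Scripts\established_connections.log
for /f %%c in ('netstat -p TCP -n ^| find /c /i "ESTABLISHED"') do set COUNT=%%c
echo %DATE% %TIME% ESTABLISHED=%COUNT% >> %LOGFILE%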

Perhaps an example script you can use if you decide to implement the little nestat established connection checks Monitoring in Zabbix is the one i've writen about in the previous article "Calculate established connection from IP address with shell script and log to zabbix graphic".

2. Few Useful netstat options for the Windows system admin
 

C:\Windows\System32> netstat -bona


netstat-useful-arguments-for-the-windows-system-administrator

Cmd.exe will list the executable files, local and external IP addresses and ports, and the connection state in list form. You immediately see which programs have created connections or are listening, so that you can find offenders quickly.

b – displays the executable involved in  creating the connection.
o – displays the owning process ID.
n – displays address and port numbers.
a – displays all connections and listening ports.

As you can see in the screenshot, by using netstat -bona you get which process is bound to which local address and its Process ID (PID), which is pretty useful for debugging.
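If you are chasing a single port rather than scanning the whole list, filtering the numeric output by port with findstr is handy too (the 3389 / RDP port and the PID below are just example values):

C:\Windows\system32>netstat -ano | findstr :3389
  TCP    0.0.0.0:3389           0.0.0.0:0              LISTENING       1044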

3. Use a Third Party GUI tool to debug more interactively connection issues

If you need to keep an eye on connections in interactive mode, the CurrPorts tool can sometimes be of great help when there are issues.

currports-windows-network-connections-diagnosis-cports

CurrPorts Tool's own Description

CurrPorts is network monitoring software that displays the list of all currently opened TCP/IP and UDP ports on your local computer. For each port in the list, information about the process that opened the port is also displayed, including the process name, full path of the process, version information of the process (product name, file description, and so on), the time that the process was created, and the user that created it.
In addition, CurrPorts allows you to close unwanted TCP connections, kill the process that opened the ports, and save the TCP/UDP ports information to an HTML file, XML file, or tab-delimited text file.
CurrPorts also automatically marks with pink color suspicious TCP/UDP ports owned by unidentified applications (applications without version information and icons).

Sum it up

What we learned is how to calculate the number of established TCP connections from the command line (useful for scripting), how you can use netstat to display the process ID and process name related to the local / remote TCP connections in use, and how you can eventually connect this to some monitoring tool to periodically report high peaks in established TCP connections (usually an indicator of severe system issues).
 

Zabbix script to track arp address cache loss (arp incomplete) from Linux server to gateway IP

Tuesday, January 30th, 2024

Zabbix_arp-network-incomplete-check-logo.svg

Some of the Linux servers I'm responsible for had a very annoying issue recently. The problem is that the ARP entry for the default configured server gateway is being lost every now and then, and it takes time for the remote CISCO router to realize the problem and resolve it. We debugged it together with the network expert colleague: while he was checking the Cisco router, we were checking the ARP table on the Linux server with the arp command. We came to the conclusion that this behavior is due to some network mess because of too many NAT address configurations on the network, or due to a Cisco bug. The colleagues asked Cisco, but Cisco does not have any solution to the issue, and the only close workaround for the gateway losing the MAC is to set a network rule on the Cisco router to flush its ARP record for the server it was losing the MAC address for.
This does not really solve the problem completely, but at least, once we run into the issue, it gets resolved within as little as 5 minutes.
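For reference, when the issue hits, the failure state in the ARP table of the affected Linux host looks roughly like this (the gateway IP and interface name are example values):

[root@linux-server ~]# arp -n 192.168.0.1
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.0.1                      (incomplete)                              eth0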

As we run a cluster environment, it is useful to monitor and know immediately once we hit the MAC gateway disappearance issue, and if the issue persists, to exclude the Linux node from the cluster so we don't lose connection traffic.
For the purpose of monitoring the MAC state from the Linux haproxy machine towards the network router GW, I have developed a small userparameter script that periodically checks the state of the MAC address of the remote gateway's IP address and logs to an external file any problems with an incomplete MAC address of the remotely configured default router.

In case you happen to need the same MAC address state monitoring for your servers, I thought this might be of help to someone out there.
To monitor the MAC address incomplete state with Zabbix, do the following:
 

1. Create the userparameter_arp_gw_check.conf Zabbix userparameter config
 

# cat userparameter_arp_gw_check.conf 
UserParameter=arp.check,/usr/local/bin/check_gw_arp.sh

 

2. Create the following shell script /usr/local/bin/check_gw_arp.sh

 

#!/bin/bash
# simple script to run via cron periodically or via zabbix userparameter
# to track arp loss issues to the gateway IP
#gw_ip='192.168.0.55';
gw_ip=$(ip route show | grep -i default | awk '{ print $3 }');
log_f='/var/log/arp_incomplete.log';
grep_word='incomplete';
inactive_status=$(arp -n "$gw_ip" | grep -i $grep_word);
# if the GW incomplete record is empty all is ok
if [[ -z "$inactive_status" ]]; then
    echo "$gw_ip OK 1";
else
    # log the inactive MAC towards gw_ip
    echo "$(date '+%Y-%m-%d %H:%M:%S')" "ARP_ERROR $inactive_status 0" | tee -a $log_f > /dev/null;
    # printout to zabbix
    echo "1 ARP FAILED: $inactive_status";
fi

You can download the check_gw_arp.sh here.

The script is supposed to automatically grep for the default gateway router IP; however, before setting it up, run it manually and make sure the detected IP corresponds to the default gateway whose MAC address you would like to monitor.
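Once the userparameter file and the script are in place, restart zabbix-agent2 so it reads the new config, and you can test the key straight through the agent; the output below is just an example of the healthy case:

# systemctl restart zabbix-agent2
# zabbix_agent2 -t arp.check
arp.check                                     [s|192.168.0.1 OK 1]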
 

3. Create New Zabbix Template for ARP incomplete monitoring
 

arp-machine-to-default-gateway-failure-monitoring-template-screenshot

Create Application 

*Name
Default Gateway ARP state

4. Create Item and Dependent Item 
 

Create Zabbix Item and Dependent Item like this

arp-machine-to-default-gateway-failure-monitoring-item-screenshot

 

arp-machine-to-default-gateway-failure-monitoring-item1-screenshot

arp-machine-to-default-gateway-failure-monitoring-item2-screenshot


5. Create Trigger to trigger WARNING or whatever you like
 

arp-machine-to-default-gateway-failure-monitoring-trigger-screenshot


arp-machine-to-default-gateway-failure-monitoring-trigger1-screenshot

arp-machine-to-default-gateway-failure-monitoring-trigger2-screenshot
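In text form, the trigger simply has to match the 'ARP FAILED' substring the script prints on error; a hedged example of such an expression in Zabbix 6.0 syntax, where the template name is a placeholder to adapt:

find(/Linux ARP gateway check/arp.check,#1,"like","ARP FAILED")=1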


6. Create Zabbix Action to notify via Email etc.
 

arp-machine-to-default-gateway-failure-monitoring-action1-screenshot

 

arp-machine-to-default-gateway-failure-monitoring-action2-screenshot

That's all. Once you set up these few little things, you can enjoy having monitoring alerts for the ARP incomplete state on your Linux / Unix servers.
Enjoy !