Infrastructure Monitoring: Unleashing the Power of Prometheus and Grafana

What is Observability?

Observability is the ability to understand the internal state of a system from the data it generates, such as metrics, logs, and traces. It gives deeper insight into what is happening inside the system.

What is Prometheus?

Prometheus is a system monitoring and alerting toolkit that was originally developed at SoundCloud. Since its inception in 2012, it has been widely adopted by numerous companies and organizations, fostering a vibrant developer and user community. Today, Prometheus operates as a standalone open-source project, maintained independently of any single company.

How does Prometheus help?
Prometheus collects metrics by scraping targets that expose them through an HTTP endpoint. The scraped metrics are stored in its built-in time-series database and can be queried with PromQL, Prometheus's query language. Prometheus can also evaluate alerting rules and generate alerts when a metric reaches a user-specified threshold.
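As an illustration of threshold-based alerting, the sketch below prints a Prometheus alerting rule that fires when a host's memory utilization stays above a threshold you pass in. The group and alert names (node-alerts, HighMemoryUsage) and the 90% default are assumptions for this example; the expression relies on Node Exporter's node_memory_* metrics, which are installed later. Prometheus only evaluates rule files listed under rule_files in prometheus.yml, and sending the actual notifications (email, chat, etc.) is the job of a separate Alertmanager.

```shell
#!/usr/bin/env bash
set -euo pipefail

# render_memory_alert [THRESHOLD_PERCENT]
# Prints a rule file (YAML) with one alert that fires when memory utilization
# stays above THRESHOLD_PERCENT (default 90) for 5 minutes.
# Assumption: group/alert names and labels are illustrative, not required names.
render_memory_alert() {
  local threshold="${1:-90}"
  cat <<EOF
groups:
  - name: node-alerts
    rules:
      - alert: HighMemoryUsage
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > ${threshold}
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above ${threshold}% on {{ \$labels.instance }}"
EOF
}
```

For example, `render_memory_alert 90 | sudo tee /etc/prometheus/node-alerts.yml`, then add `rule_files: ["/etc/prometheus/node-alerts.yml"]` to prometheus.yml and validate it with `promtool check rules /etc/prometheus/node-alerts.yml`.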

We can monitor metrics like

  • Disk utilization

  • Uptime of devices

  • CPU utilization

  • Memory utilization

  • Application-specific data
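The host-level metrics in this list map onto PromQL expressions over standard Node Exporter metric names. The sketch below keeps a few example queries and sends one to Prometheus's HTTP query API (/api/v1/query). It assumes Prometheus listens on http://localhost:9090 (override with PROM_URL, a variable name chosen for this example), that curl is installed, and that "/" is the filesystem you care about; application-specific metrics depend on what your application exposes, so they are left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example PromQL expressions (Node Exporter metric names) for the metrics listed above.
# Assumption: the disk query only looks at the "/" mount point.
declare -A QUERIES=(
  [cpu]='100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'
  [memory]='100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)'
  [disk]='100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})'
  [uptime]='node_time_seconds - node_boot_time_seconds'
)

# prom_query NAME
# Runs the named query as an instant query and prints the JSON response.
# Assumption: Prometheus is at $PROM_URL, defaulting to http://localhost:9090.
prom_query() {
  local name="$1"
  if [[ -z "${QUERIES[$name]:-}" ]]; then
    echo "unknown query: $name" >&2
    return 1
  fi
  # -G sends a GET request; --data-urlencode URL-encodes the PromQL expression.
  curl -fsS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=${QUERIES[$name]}"
}
```

With Prometheus and Node Exporter running, `prom_query memory` returns the current memory utilization per instance as JSON.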

By default, Prometheus scrapes metrics from the /metrics path of each target, but this can be changed per scrape job with the metrics_path setting in prometheus.yml.
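For instance, if an application served its metrics at a non-default path, its scrape job would set metrics_path. The sketch below prints one such entry for the scrape_configs section of prometheus.yml; the job name, the /custom/metrics path, and the target address used in the example are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# render_scrape_job JOB_NAME METRICS_PATH TARGET...
# Prints one scrape_configs entry (YAML, indented to sit under "scrape_configs:").
# Job name, path and targets are whatever you pass in; nothing here is a fixed name.
render_scrape_job() {
  if (( $# < 3 )); then
    echo "usage: render_scrape_job JOB_NAME METRICS_PATH TARGET..." >&2
    return 1
  fi
  local job="$1" path="$2" target
  shift 2
  printf '  - job_name: "%s"\n' "$job"
  printf '    metrics_path: "%s"\n' "$path"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for target in "$@"; do
    printf '          - "%s"\n' "$target"
  done
}
```

Example: `render_scrape_job myapp /custom/metrics 10.0.0.5:8080` prints a job that scrapes http://10.0.0.5:8080/custom/metrics.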

Most systems don’t collect metrics and expose them on an HTTP endpoint for the Prometheus server to consume out of the box.

Exporters fill this gap: they collect metrics from a system and expose them in a format Prometheus understands.
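Whatever the exporter, what Prometheus scrapes is plain text in the Prometheus exposition format: optional # HELP and # TYPE comment lines, then one sample per line made of the metric name, optional {labels}, and the value. The sketch below uses a small made-up sample in the style of Node Exporter output plus a hypothetical awk helper that pulls out a single series; it assumes label values contain no spaces.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sample_metrics prints a tiny, made-up scrape in the Prometheus text format.
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
EOF
}

# get_metric SERIES (hypothetical helper)
# Reads exposition text on stdin and prints the value of the first sample whose
# series (name plus labels, written exactly as exposed) equals SERIES.
# Assumption: no spaces inside label values, so awk's field splitting is safe.
get_metric() {
  awk -v series="$1" '!/^#/ && !found && $1 == series { print $2; found = 1 }'
}

sample_metrics | get_metric node_load1   # prints 0.42
```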

Commonly used exporters include

  • Node Exporter

  • Windows Exporter

  • MySQL Exporter

  • Apache Exporter

Prometheus follows a pull-based model: instead of targets pushing data to it, it scrapes a configured list of targets at regular intervals.

Installation of Prometheus

Head over to the Prometheus download page (https://prometheus.io/download/) and copy the URL of the Prometheus release for your operating system. In our case that is Linux (amd64), so we download it with wget

wget https://github.com/prometheus/prometheus/releases/download/v2.51.1/prometheus-2.51.1.linux-amd64.tar.gz

Next, extract the archive

tar xvf prometheus-2.51.1.linux-amd64.tar.gz

Then change into the extracted directory

cd prometheus-2.51.1.linux-amd64
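The download URL follows a fixed pattern on GitHub releases (project, version, OS, architecture), so the download and extract steps can be wrapped in a small helper that turns a version upgrade into a one-argument change. This is a sketch: the function names are made up, and it assumes wget and tar are installed and that a tarball named this way exists for your platform. The same pattern matches the Node Exporter URL used later.

```shell
#!/usr/bin/env bash
set -euo pipefail

# release_url PROJECT VERSION [OS] [ARCH]   (hypothetical helper)
# Builds the GitHub release tarball URL, e.g. for prometheus or node_exporter.
release_url() {
  local project="$1" version="$2" os="${3:-linux}" arch="${4:-amd64}"
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.%s-%s.tar.gz\n' \
    "$project" "$version" "$project" "$version" "$os" "$arch"
}

# fetch_release PROJECT VERSION [OS] [ARCH]   (hypothetical helper)
# Downloads and extracts the tarball into the current directory, then prints the
# extracted directory name (assumed to be the tarball name without .tar.gz).
fetch_release() {
  local url file
  url="$(release_url "$@")"
  file="${url##*/}"
  wget -q "$url"
  tar xf "$file"
  printf '%s\n' "${file%.tar.gz}"
}
```

Usage: `cd "$(fetch_release prometheus 2.51.1)"` repeats the three steps above.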

Create a system user to run the Prometheus process, with no home directory and no login shell

sudo useradd --no-create-home --shell /bin/false prometheus

Create directories for the Prometheus configuration and its data

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

Give the prometheus user ownership of both directories

sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Copy the prometheus and promtool executables to /usr/local/bin

sudo cp prometheus /usr/local/bin 
sudo cp promtool /usr/local/bin

Update their ownership

sudo chown prometheus:prometheus /usr/local/bin/prometheus 
sudo chown prometheus:prometheus /usr/local/bin/promtool

Copy the consoles and console_libraries folders, which hold the templates for the built-in console dashboards and visualizations

sudo cp -r consoles /etc/prometheus

sudo cp -r console_libraries /etc/prometheus

Update their ownership

sudo chown -R prometheus:prometheus /etc/prometheus/consoles

sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries

Copy the configuration file

sudo cp prometheus.yml /etc/prometheus/prometheus.yml

Update its ownership

sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
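The directory, copy, and ownership steps above can be collapsed into one function using install, which creates directories, copies files, and sets owner, group, and mode in a single call. This is a sketch rather than an installer shipped in the tarball: the function name and the optional ROOT prefix (useful for a dry run into a scratch directory) are assumptions, and it expects the prometheus user from the useradd step to exist already.

```shell
#!/usr/bin/env bash
set -euo pipefail

# install_prometheus_layout SRC_DIR [ROOT] [OWNER] [GROUP]   (hypothetical helper)
#   SRC_DIR  extracted prometheus-<version>.linux-amd64 directory
#   ROOT     prefix for all destination paths (default "/"); must run as root for "/"
#   OWNER    owner of installed files (default "prometheus")
#   GROUP    group of installed files (default "prometheus")
install_prometheus_layout() {
  local src="$1" root="${2:-/}" owner="${3:-prometheus}" group="${4:-prometheus}"
  root="${root%/}"  # "/" becomes "", so paths below stay /etc/..., not //etc/...

  # Config and data directories, created (if needed) and chowned in one step.
  install -d -o "$owner" -g "$group" "$root/etc/prometheus" "$root/var/lib/prometheus"

  # Binaries, executable and owned as in the manual steps.
  mkdir -p "$root/usr/local/bin"
  install -o "$owner" -g "$group" -m 0755 "$src/prometheus" "$src/promtool" "$root/usr/local/bin/"

  # Sample configuration file.
  install -o "$owner" -g "$group" -m 0644 "$src/prometheus.yml" "$root/etc/prometheus/prometheus.yml"

  # Console templates and libraries are whole directories: cp -r, then chown -R.
  cp -r "$src/consoles" "$src/console_libraries" "$root/etc/prometheus/"
  chown -R "$owner:$group" "$root/etc/prometheus/consoles" "$root/etc/prometheus/console_libraries"
}
```

For a dry run, point ROOT at a scratch directory and pass your own IDs, e.g. `install_prometheus_layout "$PWD" /tmp/prom-root "$(id -u)" "$(id -g)"`; for the real install, call it from a root shell with just the source directory.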

Now, let's create the Prometheus systemd service file

sudo nano /etc/systemd/system/prometheus.service

Paste the following into the file

[Unit]
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

Save the file, then reload systemd so it picks up the new service

sudo systemctl daemon-reload

Start the Prometheus service

sudo systemctl start prometheus

Check the status of the Prometheus service

sudo systemctl status prometheus

Enable the service so it starts on boot

sudo systemctl enable prometheus

The status output should report the service as active (running). The Prometheus web UI is then available at http://<server-ip>:9090.
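Right after systemctl start, Prometheus can take a few seconds before it answers requests. Prometheus serves readiness and health endpoints at /-/ready and /-/healthy, so a small polling helper can confirm it is really serving before you move on. The helper name and the 30-second default are assumptions, and it needs curl.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_for_url URL [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls URL about once per second until it answers with a 2xx status,
# giving up (return 1) after roughly TIMEOUT_SECONDS attempts (default 30).
wait_for_url() {
  local url="$1" timeout="${2:-30}" elapsed=0
  # -f makes HTTP errors count as failures; --max-time bounds each attempt.
  until curl -fsS -o /dev/null --max-time 2 "$url"; do
    elapsed=$((elapsed + 1))
    if (( elapsed >= timeout )); then
      echo "gave up waiting for $url after ${timeout} attempts" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage: `wait_for_url http://localhost:9090/-/ready 60 && echo "Prometheus is ready"`. The same helper works for Node Exporter with `http://localhost:9100/metrics`.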

Node Exporter

Node Exporter collects hardware and operating system metrics on a Linux host and exposes them over HTTP (port 9100 by default) so Prometheus can scrape them.

Head over to the Prometheus download page (https://prometheus.io/download/) and copy the URL for Node Exporter.

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

Extract the archive

tar xvf node_exporter-1.7.0.linux-amd64.tar.gz

Change into the extracted folder

cd node_exporter-1.7.0.linux-amd64/

Copy the executable to /usr/local/bin

sudo cp node_exporter /usr/local/bin

Create a system user for Node Exporter

sudo useradd --no-create-home --shell /bin/false node_exporter

Update its ownership

sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create the Node Exporter service file

sudo nano /etc/systemd/system/node_exporter.service

Paste the following into the file

[Unit]
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple 
ExecStart=/usr/local/bin/node_exporter 

[Install]
WantedBy=multi-user.target

Save the file, then reload systemd so it picks up the new service

sudo systemctl daemon-reload

Start the service

sudo systemctl start node_exporter

Enable the service so it starts on boot

sudo systemctl enable node_exporter

Check the status of the service

sudo systemctl status node_exporter

The status output should report the service as active (running), and the metrics are served at http://<node-ip>:9100/metrics.
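Besides systemctl status, you can check that Node Exporter is actually serving metrics. The sketch below counts the metric families (one # TYPE line each) whose names start with a given prefix. In practice you would pipe `curl -s http://localhost:9100/metrics` into it; here it reads a small made-up sample so it runs anywhere. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_families [PREFIX]   (hypothetical helper)
# Reads Prometheus text-format metrics on stdin and prints how many metric
# families ("# TYPE" lines) have names starting with PREFIX (default "node_").
count_families() {
  awk -v prefix="${1:-node_}" '
    $1 == "#" && $2 == "TYPE" && index($3, prefix) == 1 { n++ }
    END { print n + 0 }
  '
}

# Made-up sample standing in for: curl -s http://localhost:9100/metrics
count_families node_ <<'EOF'
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.253e+09
EOF
```

On the sample this prints 2; against a real Node Exporter, any non-zero count shows that node_* metrics are being exposed.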

Node Exporter has to be installed on every node whose metrics we want to scrape.

The next step is to add these nodes as targets in the Prometheus configuration file (prometheus.yml).

sudo nano /etc/prometheus/prometheus.yml

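Then add each node you want to scrape to the targets list of a scrape job, as <node-ip>:9100. After saving the file, restart Prometheus with sudo systemctl restart prometheus so the change takes effect.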
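A minimal prometheus.yml along these lines could look like the output of the sketch below: a global scrape interval, a job in which Prometheus scrapes itself, and a node job listing each Node Exporter as host:9100. The job names, the 15s interval, and the example IP addresses are assumptions; the function only prints the YAML so you can review it before writing it to /etc/prometheus/prometheus.yml.

```shell
#!/usr/bin/env bash
set -euo pipefail

# render_prometheus_config NODE_TARGET...   (hypothetical helper)
# Prints a minimal prometheus.yml: Prometheus scraping itself on localhost:9090,
# plus a "node" job whose static targets are the addresses given as arguments.
render_prometheus_config() {
  if (( $# == 0 )); then
    echo "usage: render_prometheus_config HOST:9100..." >&2
    return 1
  fi
  cat <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets:
EOF
  local target
  for target in "$@"; do
    printf '          - "%s"\n' "$target"
  done
}
```

For example, `render_prometheus_config 10.0.0.11:9100 10.0.0.12:9100 | sudo tee /etc/prometheus/prometheus.yml >/dev/null`, then check it with `promtool check config /etc/prometheus/prometheus.yml` before restarting Prometheus.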

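You can now see the list of scraped nodes, and whether each one is up, on the Targets page of the Prometheus web UI (Status > Targets, or http://<server-ip>:9090/targets).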

Grafana Setup

Grafana is what we will use to visualize the data Prometheus scrapes from the different hosts.

Let's install Grafana (the Enterprise edition's Debian package) using the commands below

sudo apt-get install -y adduser libfontconfig1 musl
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_10.4.1_amd64.deb
sudo dpkg -i grafana-enterprise_10.4.1_amd64.deb
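Once the package is installed, Grafana runs as the grafana-server systemd service and listens on port 3000 by default (initial login admin/admin, which you are prompted to change). To point it at Prometheus without clicking through the UI, Grafana can provision data sources from YAML files in /etc/grafana/provisioning/datasources/. The sketch below writes such a file; the file name, the data source name, and the assumption that Prometheus runs on the same host at localhost:9090 are choices made for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_prometheus_datasource [DIR] [PROM_URL]   (hypothetical helper)
# Writes a Grafana provisioning file registering Prometheus as the default data source.
# Assumptions: DIR defaults to Grafana's provisioning directory (needs root to write),
# PROM_URL defaults to a Prometheus on the same host.
write_prometheus_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

After writing the file as root, start Grafana with `sudo systemctl daemon-reload` and `sudo systemctl enable --now grafana-server`, then browse to http://<server-ip>:3000; the Prometheus data source should already be listed.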