Infrastructure Monitoring: Unleashing the Power of Prometheus and Grafana
What is Observability?
Observability is the ability to understand the state of a system from the data it generates. It gives deeper insight into the system's internal operations.
What is Prometheus?
Prometheus is a system monitoring and alerting toolkit that was originally developed at SoundCloud. Since its inception in 2012, it has been widely adopted by numerous companies and organizations, fostering a vibrant developer and user community. Today, Prometheus operates as a standalone open-source project, maintained independently of any single company.
How does Prometheus help?
Prometheus collects metrics by scraping targets that expose them through an HTTP endpoint. The scraped metrics are stored in a time-series database, which can be queried using PromQL, Prometheus's built-in query language. Prometheus can also generate alerts when a metric reaches a user-specified threshold.
We can monitor metrics such as:
Disk utilization
Uptime of devices
CPU utilization
Memory utilization
Application-specific data
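As a rough illustration of how the stored metrics can be queried, the sketch below wraps the instant-query endpoint of the Prometheus HTTP API (/api/v1/query) in a small shell function and lists PromQL expressions for some of the metrics above. It assumes Prometheus is reachable at http://localhost:9090 (its default port) and that the targets run Node Exporter, which is covered later in this post. The function name promql_query, the variable names, and the threshold of 80 are made up for this example.

```shell
#!/bin/sh
# Base URL of the Prometheus server (assumption: default port on this host).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# promql_query EXPR
# Hypothetical helper: sends EXPR as an instant query and prints the JSON reply.
promql_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# PromQL expressions built on Node Exporter metric names.
CPU_UTILIZATION='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
MEMORY_UTILIZATION='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'
DISK_UTILIZATION='(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100'
UPTIME_SECONDS='time() - node_boot_time_seconds'

# Appending a comparison turns an expression into a threshold check, which is
# the kind of expression an alert is built on (80 is an arbitrary example).
HIGH_CPU="${CPU_UTILIZATION} > 80"

# Example call (requires a running Prometheus server):
#   promql_query "$HIGH_CPU"
```

Nothing is sent until promql_query is called, so the expressions can be adjusted first and then tried one at a time against a running server.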
By default, Prometheus scrapes each target at the /metrics path, but this can be changed to a different path (the metrics_path setting of a scrape job).
Most systems do not, by default, collect metrics and expose them on an HTTP endpoint for the Prometheus server to consume.
Exporters fill this gap: they collect metrics and expose them in a format Prometheus understands.
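To make "a format Prometheus understands" a little more concrete, here is a small sketch of the text exposition format an exporter serves on /metrics. The metric names are real Node Exporter names, but the HELP text and values are illustrative, and the helpers sample_metrics and metric_value are hypothetical names invented for this example.

```shell
#!/bin/sh
# sample_metrics
# Prints an illustrative excerpt of the text exposition format.
# The HELP text and the values are made up for this sketch.
sample_metrics() {
  cat <<'EOF'
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 4.2e+09
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
EOF
}

# metric_value NAME
# Reads exposition text on stdin and prints the value of the first
# unlabelled sample called NAME. Lines starting with "#" never match.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

sample_metrics | metric_value node_load1   # prints 0.52
```

Each sample is a metric name followed by a value, and the lines starting with # carry the help text and the metric type.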
Several exporters are available for Prometheus, including:
Node Exporter
Windows Exporter
MySQL
Apache
Prometheus follows a pull-based model, so it needs a list of the targets it should scrape.
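The pull model can be pictured with a short sketch: the scraper holds the target list and requests each target's metrics endpoint itself. This is only a simplified picture, not how Prometheus is implemented. The target list assumes Prometheus (port 9090) and Node Exporter (port 9100) on the local host, and scrape_url is a hypothetical helper.

```shell
#!/bin/sh
# Targets as host:port pairs (assumption: both services run on this host).
TARGETS="localhost:9090 localhost:9100"

# scrape_url TARGET [PATH]
# Hypothetical helper: builds the URL a scrape would request,
# falling back to the default /metrics path.
scrape_url() {
  printf 'http://%s%s\n' "$1" "${2:-/metrics}"
}

# The scraper walks its own target list; targets never push anything.
for target in $TARGETS; do
  echo "scrape: $(scrape_url "$target")"
  # curl -s "$(scrape_url "$target")"   # uncomment to fetch the metrics for real
done
```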
Installation of Prometheus
Head over to the Prometheus download page and copy the URL of the Prometheus release for your operating system (Linux in our case). Then download it:
wget https://github.com/prometheus/prometheus/releases/download/v2.51.1/prometheus-2.51.1.linux-amd64.tar.gz
Next, extract the archive:
tar xvf prometheus-2.51.1.linux-amd64.tar.gz
Then change into the extracted directory:
cd prometheus-2.51.1.linux-amd64
Create a user to run the Prometheus process:
sudo useradd --no-create-home --shell /bin/false prometheus
Create directories for the Prometheus configuration and data:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
Give the prometheus user ownership of these directories:
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
Copy the executables:
sudo cp prometheus /usr/local/bin
sudo cp promtool /usr/local/bin
Update their ownership:
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
Copy the consoles and console_libraries folders, which are used for dashboards and visualization:
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
Update their ownership:
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
Copy the configuration file:
sudo cp prometheus.yml /etc/prometheus/prometheus.yml
Update its ownership:
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
Now, let's create the Prometheus service file:
sudo nano /etc/systemd/system/prometheus.service
Paste the following into the file:
[Unit]
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Save the file, then reload systemd so that it picks up the new service:
sudo systemctl daemon-reload
Start the Prometheus service:
sudo systemctl start prometheus
Check the status of the service:
sudo systemctl status prometheus
Enable the service to start on boot:
sudo systemctl enable prometheus
The output of the status command should report the service as active (running).
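Beyond the systemd status, the server's health endpoints can be checked over HTTP. The sketch below assumes Prometheus listens on its default port 9090 and uses the /-/healthy and /-/ready management endpoints. The helper name check_endpoint is made up for this example.

```shell
#!/bin/sh
# check_endpoint URL
# Hypothetical helper: succeeds unless the request fails or the server
# answers with an HTTP error status (400 or above).
check_endpoint() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Example calls (assumption: Prometheus runs on this host, default port 9090):
#   check_endpoint http://localhost:9090/-/healthy && echo "Prometheus is healthy"
#   check_endpoint http://localhost:9090/-/ready   && echo "Prometheus is ready"
```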
Node Exporter
Node Exporter collects metrics on a Linux host and exposes them so that Prometheus can scrape them.
Head over to the Prometheus download page and copy the URL for Node Exporter. Then download it:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
Extract the archive:
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
Change into the extracted directory:
cd node_exporter-1.7.0.linux-amd64/
Copy the executable:
sudo cp node_exporter /usr/local/bin
Create a user to run Node Exporter:
sudo useradd --no-create-home --shell /bin/false node_exporter
Update the ownership of the executable:
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
Create the Node Exporter service file:
sudo nano /etc/systemd/system/node_exporter.service
Paste the following into the file:
[Unit]
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Save the file, then reload systemd so that it picks up the new service:
sudo systemctl daemon-reload
Start the service:
sudo systemctl start node_exporter
Enable the service to start on boot:
sudo systemctl enable node_exporter
Check the status of the service:
sudo systemctl status node_exporter
The output of the status command should report the service as active (running).
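To confirm that the exporter is actually serving metrics, and not just running, its endpoint can be fetched directly. The sketch below assumes Node Exporter listens on its default port 9100. The helper count_node_metrics is a hypothetical name.

```shell
#!/bin/sh
# count_node_metrics
# Hypothetical helper: reads exposition text on stdin and counts the sample
# lines whose metric name starts with "node_" (comment lines are skipped).
count_node_metrics() {
  grep -c '^node_'
}

# Example call (assumption: Node Exporter runs on this host, default port 9100):
#   curl -s http://localhost:9100/metrics | count_node_metrics
```

A count above zero means the exporter is exposing host metrics on its /metrics endpoint.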
Node Exporter has to be installed on every node whose metrics we want to scrape.
The next step is to add the list of nodes to our Prometheus configuration file (prometheus.yml):
sudo nano /etc/prometheus/prometheus.yml
Then update the scrape_configs section with the list of nodes you want to scrape. See the sample below.
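A minimal sketch of what such a configuration might look like is given here, wrapped in a shell function so that it can be written to a scratch file and validated with promtool check config before anything under /etc/prometheus is touched. The job names, the 15s scrape interval, and the 192.0.2.x addresses are placeholders, not values from this setup. The function name sample_prometheus_config is hypothetical.

```shell
#!/bin/sh
# sample_prometheus_config
# Prints an illustrative prometheus.yml. The job names and the 192.0.2.x
# addresses are placeholders to be replaced with your own nodes.
sample_prometheus_config() {
  cat <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets:
          - "localhost:9100"
          - "192.0.2.10:9100"
          - "192.0.2.11:9100"
EOF
}

# Write the sample to a scratch file and validate it when promtool is installed.
config_file="$(mktemp)"
sample_prometheus_config > "$config_file"
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$config_file"
fi
rm -f "$config_file"
```

Each entry under targets is a host:port pair, with 9100 being the default Node Exporter port.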
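After restarting Prometheus (sudo systemctl restart prometheus) so that it loads the updated configuration, you can view the nodes being scraped on the Targets page of the Prometheus web UI (Status > Targets, on port 9090 by default).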
Grafana Setup
Grafana is what we will use to visualize the data that Prometheus scrapes from the different hosts.
Let's install Grafana using the commands below:
sudo apt-get install -y adduser libfontconfig1 musl
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_10.4.1_amd64.deb
sudo dpkg -i grafana-enterprise_10.4.1_amd64.deb