Denzel Kanyeki
A Real-Time Earthquake Monitoring Pipeline with Kafka, MySQL, PostgreSQL, and Grafana

In this project, I designed and built an end-to-end real-time data pipeline that monitors earthquakes from the USGS Earthquake API. The pipeline extracts and loads data into a MySQL database, captures changes with the Debezium MySQL CDC connector, streams them into a Kafka topic, sinks them into PostgreSQL, and visualizes them in Grafana Cloud. The workflow is automated with Apache Airflow to run hourly.

You can access the GitHub repository here

Project Workflow Chart

This pipeline ensures that earthquake data is extracted, streamed, stored, and visualized in near real time, fully automated and orchestrated with Apache Airflow on an hourly schedule.

Project Workflow

Project Walkthrough

1. Earthquake Data Extraction from the USGS Earthquake API

The pipeline begins by calling the USGS Earthquake GeoJSON API, which provides up-to-date global earthquake data. The script:

  • Pulls earthquake events for the past 24 hours

  • Extracts key fields: magnitude, place, time, alert, coordinates, and more

  • Triggers an email alert to Gmail if the quake magnitude ≥ 4.5

The extraction script is written in Python using requests, smtplib (to send the alert emails via SMTP), SQLAlchemy, and PyMySQL.
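
To make this step concrete, here is a minimal sketch of the extraction and alerting logic, assuming the public all_day GeoJSON feed and a Gmail app password; the function names, sender/recipient addresses, and credentials are placeholders for illustration.

```python
import smtplib
from email.message import EmailMessage

import requests

USGS_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"

def fetch_quakes():
    """Pull all earthquake events recorded by USGS over the past 24 hours."""
    response = requests.get(USGS_URL, timeout=30)
    response.raise_for_status()
    rows = []
    for feature in response.json()["features"]:
        props = feature["properties"]
        lon, lat, depth = feature["geometry"]["coordinates"]
        rows.append({
            "event_id": feature["id"],
            "magnitude": props["mag"],
            "place": props["place"],
            "time": props["time"],        # epoch milliseconds
            "updated": props["updated"],
            "alert": props["alert"],
            "longitude": lon,
            "latitude": lat,
            "depth": depth,
            "url": props["url"],
            "status": props["status"],
        })
    return rows

def send_alert(quake, sender, app_password, recipient):
    """Email an alert via Gmail SMTP for a significant quake (magnitude >= 4.5)."""
    msg = EmailMessage()
    msg["Subject"] = f"Earthquake alert: M{quake['magnitude']} near {quake['place']}"
    msg["From"], msg["To"] = sender, recipient
    msg.set_content(f"Details: {quake['url']}")
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```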

2. Loading into MySQL Database

The extracted data is then cleaned and loaded into a MySQL database table called earthquake_data. This serves as the staging area where new and changed data is logged before flowing downstream.

The schema includes:

  • Event ID

  • Place and location coordinates

  • Magnitude and depth

  • Alert level

  • Timestamps (time, updated)

  • USGS metadata like URL, IDs, and status
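
A minimal sketch of this loading step might look like the following, assuming pandas is used for the light cleaning and that the fetch_quakes helper from the earlier sketch supplies the rows; the connection string and database name are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine

# PyMySQL is the driver behind SQLAlchemy's mysql+pymysql dialect.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/earthquakes")

def load_to_mysql(rows):
    """Clean the extracted events and stage them in the earthquake_data table."""
    df = pd.DataFrame(rows)
    # Convert epoch-millisecond timestamps into proper datetimes before loading.
    df["time"] = pd.to_datetime(df["time"], unit="ms")
    df["updated"] = pd.to_datetime(df["updated"], unit="ms")
    df = df.drop_duplicates(subset="event_id")
    df.to_sql("earthquake_data", engine, if_exists="append", index=False)
```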

Below is a snapshot of the data in a MySQL database:

MySQL database

3. Change Data Capture with MySQL Debezium Connector

To stream changes in the database in real time, the Debezium MySQL CDC Connector on Confluent Cloud listens to the binlog (binary log) of the MySQL database. It captures:

  • New earthquake events

  • Updates (e.g., revised magnitudes or alert levels)

  • Deletions

These changes are published to a Kafka topic in JSON_SR (JSON Schema Registry) format. This matters because the downstream PostgreSQL Sink Connector expects schema-backed records such as JSON_SR and cannot consume plain JSON.
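
Schematically, each change event carries the Debezium envelope with before/after row images and an operation type. The abbreviated record below is a hypothetical example (values invented, fields trimmed) of what ends up on the topic:

```json
{
  "before": null,
  "after": {
    "event_id": "us7000abcd",
    "magnitude": 4.8,
    "place": "120 km SSE of Example Town",
    "alert": "green",
    "status": "reviewed"
  },
  "op": "c",
  "ts_ms": 1718000000000,
  "source": { "connector": "mysql", "table": "earthquake_data" }
}
```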

Below is a snapshot of the MySQL CDC connector.

MySQL connector

4. Kafka Streams Earthquake Events to a PostgreSQL Sink

Next, the PostgreSQL Sink Connector subscribes to the same Kafka topic. It:

  • Reads the after field from the change event

  • Deserializes the JSON_SR format

  • Inserts or updates the data into a PostgreSQL table for analytics

TIP: You must configure the connector to match table naming formats and use primary key fields to support upserts.
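For illustration, the relevant sink settings look roughly like this; the property names follow Confluent's JDBC/Postgres sink conventions, the topic name is a placeholder based on Debezium's prefix.database.table naming, and the exact keys shown in the Confluent Cloud UI may differ:

```json
{
  "topics": "mysql-server.earthquakes.earthquake_data",
  "input.data.format": "JSON_SR",
  "insert.mode": "UPSERT",
  "table.name.format": "earthquake_data",
  "pk.mode": "record_key",
  "pk.fields": "event_id",
  "auto.create": "true"
}
```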

Below is a snapshot of the Postgres Sink Connector and the data in the Postgres sink:

Postgres Connector

Postgres Sink

5. Real-Time Analytics & Visualization in Grafana

The final PostgreSQL database now holds real-time earthquake records, ready to be queried.

Using Grafana Cloud, several dashboards were created, including:

  • Quakes Per Minute – a time series of seismic activity

  • Alerts by Magnitude – quakes categorized into green/yellow/orange/red alert levels

  • Interactive World Map – quakes plotted by longitude/latitude using heat maps or markers

  • Top 5 Quake Locations – ranked by frequency
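
As an example of the kind of query behind these panels, the Quakes Per Minute time series can be driven by a simple aggregation over the sink table (assuming the PostgreSQL table is also named earthquake_data and the time column is stored as a timestamp):

```sql
SELECT
  date_trunc('minute', time) AS "time",
  count(*)                   AS quakes
FROM earthquake_data
WHERE $__timeFilter(time)
GROUP BY 1
ORDER BY 1;
```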

If you want to interact with the dashboard, follow this link

Below is a snapshot of the dashboard on Grafana Cloud

Grafana Dash

6. Automated Hourly Execution with Apache Airflow

The entire process is scheduled and monitored using Apache Airflow:

  • A DAG (Directed Acyclic Graph) defines the workflow

  • Runs every hour to ensure near real-time data ingestion and sends an email notification whether the run succeeds or fails

Airflow provides visibility into task failures, retries, and pipeline health via its web UI.
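
A minimal sketch of such a DAG is shown below, assuming the extraction and loading logic from the earlier sketches is importable as extract_and_load; email_on_failure covers failed runs, and a success notification can be added with an on_success_callback or a dedicated email task.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module wrapping the extraction and MySQL loading logic above.
from earthquake_pipeline import extract_and_load

default_args = {
    "owner": "airflow",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "email": ["alerts@example.com"],
    "email_on_failure": True,
}

with DAG(
    dag_id="earthquake_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",   # near real-time ingestion, once per hour
    catchup=False,
    default_args=default_args,
) as dag:
    extract_load = PythonOperator(
        task_id="extract_and_load_earthquakes",
        python_callable=extract_and_load,
    )
```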

Below is a successful DAG run for the pipeline and email alert task.

DAG run

Why are the connectors running on Confluent Cloud?

Running the Debezium CDC connector and the PostgreSQL sink connector on Confluent Cloud offers several advantages over managing connectors manually on a self-managed Kafka cluster. The table below summarizes them.

| Advantage | Confluent Cloud | Self-Managed Kafka |
| --- | --- | --- |
| Infrastructure Management | Fully managed by Confluent, no need to manage servers | Requires manual setup, scaling, and maintenance |
| Connector Management | Easy UI for deploying and managing connectors | Manual deployment and monitoring required |
| Scalability | Seamless automatic scaling based on workloads | Must manually configure and monitor scaling |
| Security | Built-in enterprise-grade security (encryption, ACLs) | Responsible for securing infrastructure yourself |
| Schema Registry | Managed Schema Registry with simple integration | Needs separate setup and maintenance |
| Monitoring & Observability | Rich monitoring dashboards and alerting included | Need to set up your own monitoring stack |
| Upgrades & Patching | Handled automatically by Confluent | You are responsible for patching/upgrading |
| High Availability & Reliability | SLA-backed uptime with high availability | Need to architect HA setups manually |
| Faster Time to Market | Quick setup, lets you focus on building data pipelines | Slower due to infrastructure and connector setup |
| Support | Access to Confluent support and managed services | Community support or paid enterprise support |

Conclusion

This project enhanced my hands-on skills in API data extraction, real-time streaming with Kafka, and Change Data Capture (CDC) using Debezium. I gained practical experience building end-to-end data pipelines, automating workflows with Airflow, and creating real-time dashboards in Grafana. Overall, it was a great opportunity to apply data engineering concepts to a real-world monitoring use case.

If you have any questions, don't hesitate to reach out. Please leave a like, comment and follow for more informative data engineering content!
