NAIROBI AIR QUALITY DATA MANAGEMENT SYSTEM
System Architecture Document
1. Introduction
1.1. Purpose
This document outlines the system architecture designed to create a robust,
scalable, and flexible air quality data management platform for Nairobi City
County Government. The system aims to collect, store, process, and display
air quality data from sensors and reference monitors deployed across
Nairobi County, such as AirQo sensors and BAM (Beta Attenuation Monitor)
reference monitors. It will ensure accurate, reliable, and real-time data
management for both public and governmental use.
1.2. Scope
The system provides two main functionalities:
1. Public Data Portal: Displays real-time and historical air quality data for
public awareness.
2. County Management Dashboard: Allows county officials to generate
reports, monitor air quality trends and make data-driven decisions.
2. System Flow Diagrams
3. System Architecture Overview
3.1 High-Level System Design
The system consists of the following core components:
● Data Collection Layer: Sensors and reference monitors are placed
strategically across Nairobi.
● Data Ingestion & Processing Layer: Ingestion and processing of the
real-time sensor data via our API.
● Storage Layer: Secure cloud database for structured and unstructured
air quality data.
● Application Layer: Public portal and county dashboard with
user-friendly interfaces.
● Security & Compliance Layer: Authentication, authorisation and data
protection measures.
3.1.1 Data Collection Layer
● Sensors & reference monitors: AirQo and BAM devices deployed at
various locations.
● Data transmission: Sensors send real-time data via GSM or Wi-Fi.
3.1.2 Data Ingestion & Processing Layer
● Streaming data pipeline: Sensor data is ingested in real time into a
streaming pipeline (Apache Kafka; see Section 4.1).
● Data cleaning & validation: Automated scripts detect and flag anomalous
readings to ensure data accuracy.
● APIs & microservices: Middleware processes the data and distributes it
to the relevant endpoints.
This design allows the system to scale and to integrate with different sensor models.
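As an illustration of the middleware's distribution role, the sketch below routes each processed reading to every consumer subscribed to its topic. It is a minimal in-process stand-in for the topic routing a broker such as Kafka would provide, not the system's actual middleware; the `Dispatcher` class, the topic name `pm25.validated` and the reading fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A reading is modelled as a plain dict; the field names are hypothetical.
Reading = Dict[str, object]
Handler = Callable[[Reading], None]


class Dispatcher:
    """Routes each processed reading to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, reading: Reading) -> int:
        """Deliver the reading to all subscribers; return how many received it."""
        # .get() avoids creating an empty entry for topics nobody listens to.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(reading)
        return len(handlers)


# Usage: the public portal and the county dashboard both consume validated PM2.5.
portal_feed: List[Reading] = []
dashboard_feed: List[Reading] = []
bus = Dispatcher()
bus.subscribe("pm25.validated", portal_feed.append)
bus.subscribe("pm25.validated", dashboard_feed.append)
delivered = bus.publish("pm25.validated", {"sensor_id": "aq_01", "pm2_5": 18.4})
```

Decoupling producers from consumers this way is what lets new endpoints be added without changing the ingestion code.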
3.1.3 Storage Layer
● Database: Scalable PostgreSQL storage for structured and
unstructured data
● Backup & redundancy: Automated backup strategies to prevent data
loss.
3.1.4 Application Layer
● Public web portal: Interactive dashboard for real-time and historical air
quality data
● County management dashboard: Advanced reporting tools for county
officials
● Mobile Access: Responsive design for mobile compatibility
3.1.5 Security & Compliance Layer
● User authentication & authorisation: Role-based access control
(RBAC) for different user levels.
● Data encryption: Secure encryption for data in transit and at rest.
● Regulatory compliance: Adherence to national and international air
quality data regulations.
● Monitoring and logging: Real-time alerts and system health checks.
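A minimal sketch of the role-based access check, assuming three hypothetical roles and illustrative permission names (the document does not define the user levels). Permissions are attached to roles rather than to individual users, so assigning a user a role grants its whole permission set.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    # Hypothetical roles; the document only states that RBAC covers different user levels.
    PUBLIC = "public"
    COUNTY_OFFICIAL = "county_official"
    ADMIN = "admin"


# Each role maps to the set of permissions it grants (names are illustrative).
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.PUBLIC: frozenset({"read:public_data"}),
    Role.COUNTY_OFFICIAL: frozenset({"read:public_data", "read:raw_data", "create:report"}),
    Role.ADMIN: frozenset(
        {"read:public_data", "read:raw_data", "create:report", "manage:users", "manage:sensors"}
    ),
}


def is_allowed(role: Role, permission: str) -> bool:
    """Return True if the role grants the requested permission."""
    return permission in PERMISSIONS.get(role, frozenset())
```

For example, `is_allowed(Role.COUNTY_OFFICIAL, "create:report")` is true, while the same check for `Role.PUBLIC` is false.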
4. Architectural Components
4.1 Data Collection & Ingestion Layer
4.1.1 Data Collection
● Sensors & Reference Monitors: AirQo and BAM devices deployed at
various locations.
● Data Transmission: Sensors send real-time data via GSM or Wi-Fi.
4.1.2 Data Ingestion
Responsibilities:
● Collect data from multiple sensor networks.
● Support multiple data formats and protocols
● Handle real-time and batch data inputs
Key Features:
● Data source authentication
● Error handling and retry mechanisms
● RS-232 serial logger/reader
● Communication with sensor endpoints
Technologies:
● Apache Kafka
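The error-handling and retry mechanism could take the form of exponential backoff with jitter around each sensor read. This is an assumed retry policy, not one the document specifies: `fetch_with_retry`, its default values and `flaky_sensor_read` are hypothetical, and the `sleep` parameter is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying failures with exponential backoff and full jitter.

    Re-raises the last exception once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:  # in practice, catch only transport-specific errors
            if attempt == max_attempts:
                raise
            # The delay ceiling doubles each attempt (capped at max_delay); full
            # jitter spreads retries so many sensors do not reconnect in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0.0, ceiling))
    raise AssertionError("unreachable")


# Usage with a hypothetical endpoint that fails twice, then succeeds.
calls = {"n": 0}


def flaky_sensor_read() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("sensor endpoint unreachable")
    return {"sensor_id": "aq_01", "pm2_5": 21.7}


delays: list = []
reading = fetch_with_retry(flaky_sensor_read, sleep=delays.append)
```

Capping the delay keeps a long outage from pushing retries out indefinitely, while the jitter avoids a burst of simultaneous reconnections when the network recovers.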
4.2 Data Validation & Normalisation
Responsibilities:
● Standardise incoming data against a universal schema.
● Perform quality checks
● Apply calibration factors
● Flag anomalous readings
Key Algorithms:
● Statistical outlier detection
● Sensor-specific calibration models
● Unit conversion utilities
Technologies:
● Python with pandas for data processing
● Custom validation rule engines
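A minimal sketch of the validation steps above, written with the Python standard library only so that it runs on its own (a production version would use pandas, as listed). It maps a raw payload onto a common schema, converts units, applies a linear calibration, and flags outliers with the modified z-score (median and median absolute deviation), which is one common choice of statistical outlier test. The payload field names and the calibration coefficients are hypothetical placeholders; real factors would come from co-locating sensors with reference monitors.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical linear calibration (slope, intercept) per sensor model.
CALIBRATION: Dict[str, Tuple[float, float]] = {
    "lowcost_v1": (0.8, 1.5),
    "bam": (1.0, 0.0),
}


@dataclass
class Measurement:
    sensor_id: str
    model: str
    pm2_5: float  # µg/m³, after calibration
    flagged: bool = False


def normalise(raw: Dict[str, object]) -> Measurement:
    """Map a raw payload onto the common schema and apply calibration."""
    value = float(raw["pm25"])  # hypothetical raw field name
    if raw.get("unit") == "mg/m3":  # unit conversion: mg/m³ to µg/m³
        value *= 1000.0
    # Raises KeyError for an unknown model, so uncalibrated data is not stored silently.
    slope, intercept = CALIBRATION[str(raw["model"])]
    return Measurement(str(raw["sensor_id"]), str(raw["model"]), slope * value + intercept)


def flag_outliers(batch: List[Measurement], threshold: float = 3.5) -> None:
    """Flag readings whose modified z-score exceeds the threshold."""
    values = [m.pm2_5 for m in batch]
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return  # no spread, so no reading can be scored
    for m in batch:
        modified_z = 0.6745 * (m.pm2_5 - median) / mad
        m.flagged = abs(modified_z) > threshold


# Usage: five plausible readings and one spike from the same sensor model.
raw_batch = [
    {"sensor_id": f"aq_{i:02d}", "model": "lowcost_v1", "pm25": v}
    for i, v in enumerate([20.0, 22.0, 21.0, 19.0, 23.0, 250.0])
]
batch = [normalise(r) for r in raw_batch]
flag_outliers(batch)
```

The median-based score is used here because a single extreme spike inflates the ordinary mean and standard deviation enough to hide itself; the median and MAD are barely affected by it.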
4.3 Central Data Storage
Database Strategy:
● TimescaleDB (time-series-optimised PostgreSQL extension)
Data Model:
● Sensor metadata table
● Time-series measurements table
Storage Considerations:
● Multi-tier storage strategy
● Automated archiving of historical data
Data Retention Policy:
● Historical data is archived and retained for 5 years.
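The data model and retention rules could be represented as follows. The two record types mirror the sensor metadata and time-series measurements tables; the 90-day "hot" window is a hypothetical value, since the document does not state the tier boundaries, and the sensor coordinates are illustrative. Inside TimescaleDB itself such rules would more typically be enforced with its built-in policies (for example `add_retention_policy`) than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical tier boundary: the document specifies multi-tier storage and a
# 5-year archive, but not how long data stays in the "hot" tier.
HOT_WINDOW = timedelta(days=90)
RETENTION = timedelta(days=5 * 365)  # approximately 5 years, ignoring leap days


@dataclass(frozen=True)
class SensorMetadata:
    """One row of the sensor metadata table."""
    sensor_id: str
    network: str  # e.g. "AirQo" or "BAM"
    latitude: float
    longitude: float


@dataclass(frozen=True)
class MeasurementRow:
    """One row of the time-series measurements table."""
    time: datetime
    sensor_id: str  # references SensorMetadata.sensor_id
    pollutant: str  # e.g. "pm2_5"
    value: float


def storage_tier(row: MeasurementRow, now: datetime) -> str:
    """Return 'hot', 'archive' or 'expired' for a row, based on its age."""
    age = now - row.time
    if age <= HOT_WINDOW:
        return "hot"
    if age <= RETENTION:
        return "archive"
    return "expired"


# Usage with illustrative values.
site = SensorMetadata("aq_01", "AirQo", -1.2921, 36.8219)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = MeasurementRow(now - timedelta(days=1), site.sensor_id, "pm2_5", 18.0)
older = MeasurementRow(now - timedelta(days=400), site.sensor_id, "pm2_5", 24.5)
ancient = MeasurementRow(now - timedelta(days=2000), site.sensor_id, "pm2_5", 30.1)
```

Keeping metadata in its own table means each measurement row carries only a sensor ID, which keeps the high-volume table narrow.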
4.4 Analytics Engine
Capabilities:
● Trend analysis
● Pollution source identification
● Predictive modelling
Technologies:
● Python for statistical analysis
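For trend analysis, one simple approach is the least-squares slope of a pollutant series against time; a positive slope indicates worsening air quality. The function below is a sketch of that idea rather than the engine's specified method, and the daily PM2.5 values in the usage line are illustrative, not measured data.

```python
from typing import Sequence


def linear_trend(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per time step)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2  # mean of the indices 0..n-1
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Usage: illustrative daily mean PM2.5 (µg/m³); the slope is in µg/m³ per day.
daily_pm25 = [31.0, 33.5, 32.0, 35.5, 36.0]
trend = linear_trend(daily_pm25)
```

On Python 3.10 and later, `statistics.linear_regression` returns the same slope.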
4.5 API Layer
Features:
● RESTful and GraphQL endpoints
● Rate limiting
● Usage tracking
● Multiple input and output formats (JSON, CSV)
Integration Points:
● OpenAQ
● Other air quality data platforms (e.g. World Air Quality Index)
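Rate limiting and usage tracking could be implemented with a per-key token bucket, sketched below under assumed limits. Each API key holds up to `capacity` tokens, refilled at `refill_per_second`; a request is allowed only if a whole token is available, and accepted requests are counted for usage tracking. The `RateLimiter` class and its parameters are hypothetical, and the clock is injectable for testing.

```python
import time
from collections import Counter
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-API-key token bucket with a simple usage counter."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # api_key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}
        self.usage: Counter = Counter()  # accepted requests per key

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(api_key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[api_key] = (tokens - 1.0, now)
            self.usage[api_key] += 1
            return True
        self._buckets[api_key] = (tokens, now)
        return False


# Usage with a fake clock: a burst of 3 is allowed, the 4th request is refused,
# and one second later a single token has been refilled.
fake_time = [0.0]
limiter = RateLimiter(capacity=3, refill_per_second=1.0, clock=lambda: fake_time[0])
burst = [limiter.allow("key-a") for _ in range(4)]
fake_time[0] = 1.0
after_refill = limiter.allow("key-a")
```

A token bucket permits short bursts (useful for dashboards loading several charts at once) while still bounding each key's sustained request rate.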
4.6 Reporting & Visualisation
Dashboard Components:
● Real-time air quality index
● Historical trend graphs
● Geospatial heatmaps
● Comparative analysis tools
Technologies:
● Next.js for frontend
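The real-time air quality index shown on the dashboard can be derived from pollutant concentrations by piecewise linear interpolation between breakpoints, I = (I_hi - I_lo) / (C_hi - C_lo) × (C - C_lo) + I_lo. The sketch below assumes the US EPA 24-hour PM2.5 breakpoint table as revised in 2024; the document does not say which index the county will adopt, so the table should be checked against the chosen standard before use.

```python
import math
from typing import List, Tuple

# (C_lo, C_hi, I_lo, I_hi) for 24-hour PM2.5 in µg/m³. Assumption: US EPA
# breakpoints as revised in 2024; verify against the adopted standard.
PM25_BREAKPOINTS: List[Tuple[float, float, int, int]] = [
    (0.0, 9.0, 0, 50),
    (9.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 125.4, 151, 200),
    (125.5, 225.4, 201, 300),
    (225.5, 325.4, 301, 500),
]


def pm25_to_aqi(concentration: float) -> int:
    """Convert a 24-hour PM2.5 concentration to an AQI value."""
    if concentration < 0:
        raise ValueError("concentration cannot be negative")
    # Truncate to one decimal place so values fall inside a breakpoint range;
    # the small epsilon guards against floating-point values like 35.39999999.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(index + 0.5)  # round half up to an integer
    return 500  # above the table: report the maximum of the scale
```

For example, a reading of 12.0 µg/m³ falls in the second band and maps to an AQI of 56.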
5. Infrastructure Specifications
5.1 Hosting
5.1.1 Hardware Requirements
Component           | Specifications                             | Purpose                             | Quantity
Application Servers | 8 vCPUs, 32 GB RAM, 500 GB SSD             | Hosting core application components | 2
Database Server     | 16 vCPUs, 64 GB RAM, 2 TB SSD with RAID 10 | Primary data storage                | 1
Load Balancer       | 4 vCPUs, 8 GB RAM                          | Traffic distribution                | 2
Backup Storage      | 8 TB storage capacity                      | System and data backups             | 1
Network Equipment   | Enterprise-grade router and firewall       | Network security and management     | 1
5.1.2 Network Requirements
● Minimum bandwidth: 100 Mbps dedicated connection
● VPN capability for secure remote administration
● Static IP addresses for production servers
● Primary and secondary DNS servers
● Organisation-validated (OV) SSL/TLS certificate for encryption in transit
and site verification, issued as a wildcard or multi-domain certificate so
that all subdomains are covered
5.1.3 Software Requirements
● Server OS: The primary server operating system should be Ubuntu
Server 22.04 LTS.
5.2 Scalability
● Load balancers to distribute traffic across the application servers
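The load balancers' job of spreading requests across application servers can be illustrated with round-robin selection that skips unhealthy servers. In practice this logic lives in the load balancer software itself (NGINX, for example, uses round-robin by default), so the sketch only shows the idea; the class and the backend names are hypothetical.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = backends
        self.unhealthy: Set[str] = set()
        self._next = 0

    def pick(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend not in self.unhealthy:
                return backend
        raise RuntimeError("no healthy backends available")


# Usage with the two application servers from the hardware table.
lb = RoundRobinBalancer(["app-1", "app-2"])
first_round = [lb.pick() for _ in range(4)]
lb.unhealthy.add("app-2")  # e.g. after a failed health check
degraded = [lb.pick() for _ in range(2)]
```

With two application servers, this is also what lets one server be taken down for maintenance while the other keeps serving traffic.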
5.3 Monitoring
● Real-time alerting system
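One alert the monitoring system is likely to need is detection of sensors that have stopped reporting. The sketch assumes a hypothetical 30-minute silence threshold and a map of each sensor's last-seen timestamp; the function name and sensor IDs are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical threshold: how long a sensor may stay silent before an alert fires.
STALE_AFTER = timedelta(minutes=30)


def stale_sensors(last_seen: Dict[str, datetime], now: datetime,
                  stale_after: timedelta = STALE_AFTER) -> List[str]:
    """Return IDs of sensors with no reading within stale_after, sorted for stable output."""
    return sorted(sid for sid, ts in last_seen.items() if now - ts > stale_after)


# Usage with illustrative last-seen times.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "aq_01": now - timedelta(minutes=5),
    "aq_02": now - timedelta(hours=2),
    "bam_01": now - timedelta(minutes=45),
}
stale = stale_sensors(last_seen, now)
alerts = [f"ALERT: no data from {sid} for over {STALE_AFTER}" for sid in stale]
```

Running this check on a schedule catches GSM or Wi-Fi dropouts that would otherwise appear only as gaps in the historical data.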
6. Security Considerations
● End-to-end encryption
● Role-based access control
● Regular security audits