# TrendLab: Production-Grade Crypto Market Intelligence Pipeline
TrendLab is an end-to-end automated pipeline designed to analyze cryptocurrency market data. It handles the complete lifecycle of financial data processing—from ingestion and technical analysis to machine learning inference and reporting.
This project serves as a reference implementation for a robust, scalable, and "production-ready" quantitative research environment. It is designed for engineers and researchers who need a stable foundation to experiment with ML strategies without the overhead of building the underlying infrastructure from scratch.
The pipeline operates in four distinct stages:
- Ingestion: Resiliently fetches historical market data (price, volume, market cap) for assets such as Bitcoin and Ethereum from external providers (CoinGecko).
- Processing: Computes standard technical indicators including RSI, Moving Averages (SMA 50/200), Volatility, and Drawdown metrics.
- Machine Learning: Trains predictive models (Logistic Regression) to forecast short-term market direction (Up/Down) using strictly validated time-series data.
- Reporting: Generates automated insights in Markdown and JSON formats, classifying market regimes (Trending/Ranging) and identifying risk signals.
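The processing stage's indicators can be sketched with pandas. This is an illustrative sketch, not TrendLab's actual implementation: the 14-day RSI window, the simple (non-Wilder) smoothing, the 30-day volatility window, and the column names are all assumptions.

```python
import pandas as pd


def compute_indicators(prices: pd.Series) -> pd.DataFrame:
    """Compute the indicators listed above from a daily close-price series."""
    df = pd.DataFrame({"close": prices})

    # Simple moving averages over 50 and 200 days
    df["sma_50"] = prices.rolling(50).mean()
    df["sma_200"] = prices.rolling(200).mean()

    # 14-day RSI (Wilder's smoothing approximated with a plain rolling mean)
    delta = prices.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # Annualized rolling volatility of daily returns (crypto trades 365 days/year)
    returns = prices.pct_change()
    df["volatility_30d"] = returns.rolling(30).std() * (365 ** 0.5)

    # Drawdown relative to the running all-time high (always <= 0)
    df["drawdown"] = prices / prices.cummax() - 1

    return df
```

Each column is computed only from past observations at every row, which keeps the feature matrix safe to feed into the walk-forward training described below.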
Key engineering highlights:
- Clean Architecture: The codebase follows a Hexagonal Architecture (Ports & Adapters) pattern, ensuring strict separation between domain logic, application orchestration, and infrastructure concerns.
- Code Quality: Enforces high standards using `ruff` for linting, `mypy` for static type checking, and `pytest` for comprehensive unit and functional testing.
- CI/CD: Fully automated GitHub Actions pipelines handle testing, linting, Docker image building, and deployment.
- Infrastructure as Code (IaC): The project is cloud-agnostic and scalable, featuring Terraform scripts for AWS EKS/Azure AKS provisioning and Helm charts for Kubernetes deployment.
- Prevention of Look-Ahead Bias: A critical flaw in many financial ML projects is training on future data. TrendLab strictly enforces `TimeSeriesSplit` and careful target shifting to ensure valid out-of-sample testing.
- Modular Feature Engineering: The system is designed to allow new technical indicators or external data sources to be plugged in without refactoring the core pipeline.
- Containerization: Fully Dockerized application ensuring consistency across development, testing, and production environments.
- Orchestration: Ready for horizontal scaling via Kubernetes, allowing parallel processing of multiple assets.
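The look-ahead-bias safeguard above can be sketched as follows. This is a minimal illustration, assuming a feature DataFrame with a `close` column; the feature set, five-fold split, and solver settings are placeholders, not TrendLab's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_accuracy(df: pd.DataFrame) -> list:
    """Train/evaluate a direction classifier on strictly chronological splits."""
    df = df.copy()

    # Target: will tomorrow's close be higher than today's?
    # shift(-1) looks one step ahead for the LABEL only; features stay in the past.
    df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)

    # Drop the final row, whose label would require unseen future data.
    df = df.iloc[:-1]

    X = df.drop(columns=["target"]).to_numpy()
    y = df["target"].to_numpy()

    scores = []
    # Each fold trains on an initial segment and tests on the segment after it,
    # so the model never sees observations from the future of its test window.
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return scores
```

Because `TimeSeriesSplit` only ever expands the training window forward in time, a shuffled train/test leak of future prices into the model is impossible by construction.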
We have identified several areas for future development to evolve TrendLab from a solid framework into a high-performance trading engine:
- Advanced ML Models: Transition from baseline Logistic Regression to non-linear models like XGBoost, LSTMs, or Transformers to capture complex market dynamics.
- Alternative Data: Integrate on-chain metrics, social sentiment analysis, and macroeconomic indicators to improve predictive signal-to-noise ratio.
- Enterprise Data Layer: Replace local Parquet persistence with a scalable solution using S3-compatible object storage and a time-series database (e.g., TimescaleDB) to handle terabytes of tick-level data.
- Backtesting Engine: Implement a full event-driven backtester to simulate PnL, slippage, and fees, providing a realistic assessment of strategy profitability beyond simple directional accuracy.
- API Scaling: Implement an internal caching proxy or upgrade data providers to handle high-frequency requests without hitting rate limits.
Prerequisites:
- Python 3.9+
- Docker & Docker Compose
- Poetry (for dependency management)
The easiest way to run the full service locally:

```bash
make build
make up
```

The API will be available at http://localhost:8080.

Trigger a pipeline run:

```bash
make run-local
```

Check logs:

```bash
docker-compose logs -f
```

To set up a local development environment without Docker:

```bash
make setup

# Run the pipeline for Bitcoin and Ethereum
poetry run trendlab run --assets btc --assets eth --days 365
```

Deployment assets:
- Kubernetes: Helm charts for the Dev, Hml, and Prd environments are located in `deploy/helm`.
- Terraform: Infrastructure definitions for AWS and Azure are available in `infra/`.
- CI/CD: Workflows are defined in `.github/workflows`.
License: MIT