Infrastructure Requirement
AlgoTrading
User Requirement
1. Create a separate Database
1. Historical for Back Testing
1. From 2023 to 2025 for 5, 15, and 60 min candles
2. Live with HA, High Performance and Low Latency.
2. Create a system that will execute an algo trade within 8 seconds
1. Derived data within 5 seconds
2. Execute buy/sell order within 3 seconds
3. MIS for trade.
1. API Development
2. UI Design and Development for Data Visualisation
3. Strategy simulation
[Logical system diagram: historical data and live data flow into the logical system, which drives trade execution and back-testing/MIS.]
User Requirement
1. Create a separate Database
1. Historical for Back Testing
2. For the 5, 15, and 60 min timeframes, data should be fetched from Jan 2023 to Jan 2025 (2 years; derived from March) and continue onward.
3. If required, historical data for the 5, 15, and 60 min timeframes should be extendable further backwards.
4. For day, week, and month timeframes, data should be fetched for the last 25 years.
5. Data should be updated nightly for 3,000 symbols.
2. Live with HA, High Performance and Low Latency.
1. 3,000 symbols for the live market.
2. System should be scalable beyond 3000 symbols with minimal effort.
3. Create a system that will execute an algo trade within 8 seconds
1. OHLC, calculated, and derived within 5 seconds for all the timeframes daily (5 min, 15 min, and 60 min); a derivation sketch follows this list
2. Execute buy/sell order within 3 seconds
4. MIS for trade.
1. API development for transactional, historic, and live data from the 1-minute candle onward (expectation is to build a WebSocket stream of data to the dashboard).
2. UI Design and Development for Data Visualisation
3. Strategy simulation
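
A minimal derivation sketch for the timeframe requirement above, assuming 1-minute candles are already available as a pandas DataFrame indexed by timestamp with open/high/low/close/volume columns (column and variable names are illustrative, not the actual schema):

# Sketch: derive 5/15/60-minute candles from 1-minute candles.
# Column names and timeframe labels are illustrative assumptions.
import pandas as pd

OHLC_RULES = {"open": "first", "high": "max", "low": "min",
              "close": "last", "volume": "sum"}

def derive_timeframes(one_min_df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """one_min_df: 1-minute candles indexed by timestamp."""
    derived = {}
    for rule in ("5min", "15min", "60min"):
        derived[rule] = (one_min_df
                         .resample(rule, label="left", closed="left")
                         .agg(OHLC_RULES)
                         .dropna(subset=["open"]))  # drop empty buckets
    return derived

The same resampling step can run for all 3,000 symbols right after each candle closes, which is what keeps the 5-second derivation budget realistic.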
Current Architecture
[Architecture diagram: KiteConnect provides WebSocket (live) and historic data; a Python WebSocket client and a historic-data fetcher run on DigitalOcean Droplet VMs under a systemd scheduler; data is stored in MongoDB on a Droplet VM; the buy & sell logic on a Droplet places orders back through KiteConnect.]
Problems we are facing with the current architecture
1. Managing multiple scheduler/services
2. System uptime/availability: in case of a failure or a temporary pause, how do we get the system
up and running again?
3. Avoiding data loss
4. Detecting scheduler/process failure
5. Cost effectiveness of the current system
6. Single database for live and historical data: should it be separated?
7. Performance issues in terms of logic execution.
8. System scalability needs to be validated in less time.
Long-term Architecture
[Proposed architecture diagram: KiteConnect WebSocket and historic data are ingested through a Kafka HA producer/consumer pipeline running on a Kubernetes cluster on DigitalOcean; Redis holds live data for derived calculations; MongoDB HA stores historic and transactional data; Apache Airflow schedules workflows; a transactional API and WebApp serve the dashboard; monitoring and the buy & sell logic run on Droplets and place orders.]
How the proposed architecture solves these problems
1. Managing multiple scheduler/services.
• Apache Airflow: Centralizes workflow orchestration, reducing the need for multiple independent
schedulers.
• Kubernetes Cluster: Manages and scales services dynamically, ensuring multiple services run
smoothly without manual intervention.
2. System uptime/availability: getting the system up and running again after a failure or a temporary pause.
• Kafka HA (High Availability): Guarantees message durability and availability even if part of the
system fails.
• Redis: Ensures quick recovery of live data, as it is cached and accessible without re-fetching from the
source.
• Kubernetes: Automatically restarts failed pods and scales services as needed, ensuring minimal
downtime.
• MongoDB HA: High-availability configuration ensures that the database remains operational during
node failures.
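
As a small illustration of the Redis recovery point, the latest tick per symbol can be cached so that a restarted service warms up without re-fetching from the source. Key layout, field names, and TTL below are assumptions:

# Sketch: cache the latest tick per symbol in Redis so a restarted
# service can resume without re-fetching from the source.
# Key names and TTL are illustrative assumptions.
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def cache_tick(tick: dict) -> None:
    key = f"tick:{tick['symbol']}"
    r.set(key, json.dumps(tick), ex=300)   # expire stale ticks after 5 min

def load_last_ticks(symbols: list[str]) -> dict[str, dict]:
    values = r.mget([f"tick:{s}" for s in symbols])
    return {s: json.loads(v) for s, v in zip(symbols, values) if v}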
3. Avoiding Data Loss
• Kafka: Acts as a durable buffer for real-time data streams. Data is persisted in Kafka topics until
consumed and processed, ensuring no data loss.
• MongoDB: Reliable storage for historical and transactional data, ensuring data persistence over time.
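
A minimal sketch of the Kafka buffering idea: every tick is published to a replicated topic and stays there until the consumers have processed it. Topic name, broker addresses, and keying are assumptions:

# Sketch: publish every tick to a Kafka topic so data survives
# downstream failures until it is consumed.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-0:9092", "kafka-1:9092", "kafka-2:9092"],
    acks="all",                      # wait for all in-sync replicas
    retries=5,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_tick(tick: dict) -> None:
    # key by symbol so ticks for one instrument stay in order
    producer.send("live-ticks", key=tick["symbol"].encode("utf-8"), value=tick)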
4. Detecting Scheduler/Process Failure
• Kubernetes Health Checks: Continuously monitors the health of deployed services and restarts failed
pods.
• Apache Airflow Monitoring: Provides task-level monitoring and alerts on failed jobs or processes.
• Kafka Consumer Lag: Detects delays or failures in data processing pipelines by monitoring lag
between producer and consumer.
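
A sketch of the consumer-lag check: lag is the difference between the latest offsets on the topic and the offsets the consumer group has committed, and a steadily growing lag means the processing pipeline is falling behind. Group id, topic, and threshold are assumptions:

# Sketch: compute total consumer lag for a topic/consumer group.
from kafka import KafkaConsumer, TopicPartition

def consumer_lag(topic: str, group_id: str, bootstrap: str) -> int:
    consumer = KafkaConsumer(bootstrap_servers=bootstrap, group_id=group_id)
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)   # latest offsets
    lag = 0
    for tp in partitions:
        committed = consumer.committed(tp) or 0      # what the group has processed
        lag += end_offsets[tp] - committed
    consumer.close()
    return lag

if consumer_lag("live-ticks", "derived-calcs", "kafka-0:9092") > 10_000:
    print("ALERT: derived-data consumer is falling behind")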
5. Cost-Effectiveness of the Current System
• DigitalOcean: Simple and predictable pricing.
• Separation of Concerns: Different tools (Kafka, Redis, MongoDB) are used for specific purposes,
optimizing performance and cost.
• Kubernetes: Enables horizontal scaling, ensuring resources are used efficiently based on workload
demands.
6. Single Database for Live and Historical: Should It Be Separated?
• Separate Databases:
• Use separate MongoDB instances or collections for live and historical data.
• Redis already offloads live data, but separating MongoDB instances further enhances
performance.
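
A sketch of what the separation could look like with two MongoDB databases (or clusters): historical candles indexed for back-testing queries, and live ticks in a time-series collection with a retention window. Connection strings, database, collection, and field names are assumptions:

# Sketch: separate MongoDB databases for live and historical data.
# Names and retention values are illustrative assumptions.
from pymongo import MongoClient, ASCENDING

historical = MongoClient("mongodb://mongo-historical:27017")["historical"]
live = MongoClient("mongodb://mongo-live:27017")["live"]

# Historical candles: indexed for back-testing queries by symbol + time.
historical["candles_5min"].create_index(
    [("symbol", ASCENDING), ("timestamp", ASCENDING)], unique=True)

# Live ticks: a time-series collection (MongoDB 5.0+) keeps writes cheap.
if "ticks" not in live.list_collection_names():
    live.create_collection(
        "ticks",
        timeseries={"timeField": "ts", "metaField": "symbol",
                    "granularity": "seconds"},
        expireAfterSeconds=7 * 24 * 3600,   # keep one week of raw ticks
    )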
Resources and skill required
Designation | Responsibility | Skill Required