🛠️ Weekend Project: Building a Simple S3 Uploader with ELK & SQLite Logging
This weekend, I gave myself a challenge — build a small Python app that uploads a document to AWS S3 and tracks everything it does in the process.
But, being me, I couldn’t stop at just uploading. I also wanted to log and persist each upload, both locally in SQLite and in Elasticsearch, where I could visualize it with Kibana.
Let me walk you through how it came together, with code and context.
🔧 What I Wanted to Build
- A simple Flask web app that allows file uploads.
- Upload the file to an S3 bucket.
- Save metadata (filename, upload time, IP, S3 URL) to:
- 🗃️ SQLite database
- 📊 Elasticsearch
- Visualize logs in Kibana (with Docker Compose).
📦 Step 1: Installing Required Packages
pip install flask sqlalchemy boto3 elasticsearch python-dotenv
These libraries do the heavy lifting:
- flask: for the web interface
- sqlalchemy: to interact with SQLite
- boto3: to talk to AWS S3
- elasticsearch: to send data to Elasticsearch
- python-dotenv: to load AWS credentials from a .env file
🧱 Step 2: Setting Up SQLite with SQLAlchemy
# models.py
from sqlalchemy import create_engine, Column, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from datetime import datetime
engine = create_engine('sqlite:///uploads.db')
Base = declarative_base()

class Upload(Base):
    __tablename__ = 'uploads'
    filename = Column(String, primary_key=True)
    s3_url = Column(String)
    upload_time = Column(DateTime, default=datetime.utcnow)
    ip = Column(String)
What’s happening here?
- We define a SQLite database called uploads.db.
- We define an Upload model/table to store info about each file:
  - filename: the name of the file
  - s3_url: the full S3 URL after upload
  - upload_time: timestamp of the upload
  - ip: IP address of the uploader
Next, we initialize the database and write the upload metadata to it:
Session = sessionmaker(bind=engine)
session = Session()

def init_db():
    Base.metadata.create_all(engine)

def save_metadata(data):
    upload = Upload(**data)
    session.add(upload)
    session.commit()
✅ init_db() sets up the table
✅ save_metadata() saves the data after each upload
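A quick sanity check, assuming models.py sits in the project root (the filename, URL, and IP below are placeholder values):
# quick local test of the helpers above
from models import init_db, save_metadata, session, Upload

init_db()
save_metadata({
    "filename": "report.pdf",
    "s3_url": "https://my-bucket.s3.amazonaws.com/report.pdf",
    "ip": "127.0.0.1",
})
print(session.query(Upload).count())  # prints 1 on a fresh uploads.db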
☁️ Step 3: Uploading Files to AWS S3
# aws.py
import boto3
import os
from dotenv import load_dotenv
load_dotenv()

AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
BUCKET = os.getenv("S3_BUCKET_NAME")
Here, we load the AWS credentials from a .env file.
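For reference, the .env file looks something like this (placeholder values; keep it out of version control):
# .env (placeholder values, never commit real credentials)
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
S3_BUCKET_NAME=your-bucket-name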
Then we initialize the S3 client:
s3 = boto3.client(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)
And define the upload function:
def upload_to_s3(file):
    s3.upload_fileobj(file, BUCKET, file.filename)
    return f"https://{BUCKET}.s3.amazonaws.com/{file.filename}"
This function takes in a file object, uploads it to S3, and returns the object's URL (publicly readable only if the bucket policy allows it).
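If the bucket stays private, a time-limited presigned URL is one alternative to the raw object URL. A minimal sketch, reusing the same s3 client and BUCKET defined in aws.py (the helper name is hypothetical):
# alternative: return a presigned URL that expires after an hour
def upload_to_s3_presigned(file):
    s3.upload_fileobj(file, BUCKET, file.filename)
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': BUCKET, 'Key': file.filename},
        ExpiresIn=3600,  # link validity in seconds
    )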
📊 Step 4: Logging to Elasticsearch
# elk_logger.py
from elasticsearch import Elasticsearch
from datetime import datetime
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])
We connect to a local Elasticsearch instance running on port 9200.
Then define the log function:
def log_to_elk(metadata):
    doc = {
        "filename": metadata["filename"],
        "s3_url": metadata["s3_url"],
        "ip": metadata["ip"],
        "upload_time": datetime.utcnow()
    }
    es.index(index="uploads", document=doc)
It stores each upload log as a document in the uploads index.
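To confirm documents are landing, here is a quick query sketch (assuming the 8.x Python client and the same local instance):
# verify documents are landing in the uploads index
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])
resp = es.search(index="uploads", query={"match_all": {}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["filename"], hit["_source"]["s3_url"])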
🐳 Step 5: Setting Up ELK Stack with Docker Compose
# docker-compose.yml
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.6.2
    container_name: es
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:8.6.2
    container_name: kibana
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch
This YAML file spins up two containers:
- elasticsearch: stores the logs
- kibana: visualizes the logs on port 5601
To run it, just:
docker-compose up -d
🔁 Final Integration with Flask
In app.py, we:
- Accept file uploads
- Upload to S3
- Collect metadata
- Log to both SQLite and Elasticsearch
Here’s a very basic structure:
from flask import Flask, request
from aws import upload_to_s3
from models import init_db, save_metadata
from elk_logger import log_to_elk
app = Flask(__name__)
init_db()

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    ip = request.remote_addr
    s3_url = upload_to_s3(file)
    metadata = {
        "filename": file.filename,
        "s3_url": s3_url,
        "ip": ip
    }
    save_metadata(metadata)
    log_to_elk(metadata)
    return {"message": "Uploaded successfully", "url": s3_url}

if __name__ == '__main__':
    app.run(debug=True)  # Flask's default port is 5000
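To try it end to end, here is a small smoke test from another terminal. This is a sketch that assumes the requests package is installed and the app is running on the default port 5000; the file name is a placeholder:
# send a test file to the running app
import requests

with open("report.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/upload",
        files={"file": ("report.pdf", f)},
    )
print(resp.json())  # {"message": "Uploaded successfully", "url": "..."}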
🧠 What I Learned
- Setting up AWS S3 with boto3 is straightforward when credentials are managed properly.
- SQLAlchemy makes managing local data easy and readable.
- Elasticsearch and Kibana offer incredible observability — even for small projects.
- Docker Compose is perfect for local ELK setups — one command, and your stack is up.
Challenges Faced
Version Mismatch Between Python Client and Elasticsearch Server:
- I initially installed an incompatible Elasticsearch Python client version (v9.0.1), which caused issues with request headers. The solution was to install the matching version (v8.6.2) to ensure compatibility.
Incompatibility with Python 3.13:
- The collections module no longer exposes the abstract base classes that some older libraries import (they now live in collections.abc), which caused import errors under Python 3.13. This was resolved by modifying the import statement to use collections.abc instead of collections.
Port Conflicts:
- The default Flask port (5000) was already in use by another application. I fixed this by either killing the conflicting process or running Flask on a different port.
S3 Bucket Policy and IAM Permissions Issues:
- The IAM user lacked proper permissions to upload to the S3 bucket. I had to attach an IAM policy to the user, in addition to the S3 bucket policy, to allow s3:PutObject actions (a minimal policy sketch follows).
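For context, here is a minimal user policy sketch granting just the upload permission (the bucket name is a placeholder; real setups usually need more than this):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}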
Incorrect Accept Headers in Elasticsearch Client Requests:
- Due to version differences between the Elasticsearch client and server, I needed to specify the correct Accept headers or align the client and server versions.
Missing Python Module:
- I encountered a ModuleNotFoundError due to missing dependencies. The fix was to install the necessary Python packages within a virtual environment.
Mixing System Python with Virtual Environment:
- I sometimes ran scripts with the global Python environment rather than the virtual environment, which led to dependency issues. Ensuring the virtual environment was active resolved this.
Lack of Documentation:
- Initially, there was no README or clear documentation for setting up or running the project. I created a detailed README and flowchart to outline the steps and architecture.
🎁 Bonus: Why This Is a Great DevOps Starter Project
This mini-project gave me exposure to:
✅ Flask + REST APIs
✅ AWS S3
✅ SQLAlchemy with SQLite
✅ Elasticsearch for centralized logging
✅ Docker & Docker Compose
✅ Environment management with .env
Want the code repo or a follow-up tutorial on deploying this to the cloud? Let me know!