@simonw
Created May 21, 2025 21:16

2025-05-21T21:13:19 conversation: 01jvtb2n08df3cry05mp402zeq id: 01jvtb33qnh2fdre8avykd39nx

Model: devstral:latest

Prompt

Write Python code to download a CSV from a URL and write it to a new table in a SQLite database, create the table

Response

Sure! To accomplish this task, you'll need to perform the following steps:

  1. Download the CSV file from the given URL
  2. Parse the CSV data
  3. Create a new table in an SQLite database with appropriate columns based on the CSV content
  4. Insert the parsed data into the newly created table

Below is a Python script that performs all these tasks using the requests and pandas libraries, plus the standard library's sqlite3 module:

import requests
import pandas as pd
import sqlite3
from io import StringIO

def download_csv(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        raise Exception(f"Failed to download CSV: HTTP {response.status_code}")

def create_table_from_csv(conn, csv_data, table_name):
    # Parse the CSV data into a pandas DataFrame
    df = pd.read_csv(StringIO(csv_data))

    # Build the CREATE TABLE statement from the DataFrame columns,
    # quoting each name so headers with spaces or SQL keywords still work
    column_definitions = ', '.join([f'"{col}" TEXT' for col in df.columns])
    create_table_sql = f"CREATE TABLE IF NOT EXISTS {table_name} ({column_definitions});"

    # Execute the CREATE TABLE statement
    with conn:
        conn.execute(create_table_sql)

def insert_csv_data_into_table(conn, csv_data, table_name):
    # Parse the CSV data into a pandas DataFrame
    df = pd.read_csv(StringIO(csv_data))

    # Insert the data into the SQLite table
    df.to_sql(table_name, conn, if_exists='append', index=False)

def main(url, db_file, table_name):
    # Step 1: Download the CSV file
    csv_data = download_csv(url)

    # Step 2: Connect to the SQLite database (create it if it doesn't exist)
    conn = sqlite3.connect(db_file)

    try:
        # Step 3: Create a new table based on the CSV data
        create_table_from_csv(conn, csv_data, table_name)

        # Step 4: Insert the CSV data into the newly created table
        insert_csv_data_into_table(conn, csv_data, table_name)
    finally:
        conn.close()

# Example usage
url = 'https://example.com/path/to/your/csvfile.csv'
db_file = 'your_database.db'  # Path to your SQLite database file
table_name = 'your_table_name'  # Name of the new table

main(url, db_file, table_name)

This script does the following:

  1. Downloads the CSV content from a given URL.
  2. Connects to an SQLite database (creates it if it doesn't exist).
  3. Creates a new table in the SQLite database based on the columns of the downloaded CSV file.
  4. Inserts the data from the CSV file into this newly created table.

Make sure you have the required libraries installed (sqlite3 ships with the Python standard library, so only the other two need installing):

pip install requests pandas

You can modify the url, db_file, and table_name variables in the example usage section to fit your specific use case.
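As an aside not covered in the response above: pandas can also create the table itself with inferred column types via to_sql, avoiding the all-TEXT schema. A minimal sketch, using an in-memory database and inline CSV purely for illustration (the table and column names here are made up):

```python
import sqlite3
from io import StringIO

import pandas as pd

# Inline CSV stands in for the downloaded text; columns mix ints, floats, strings.
csv_data = "id,name,score\n1,alice,9.5\n2,bob,7.0\n"
df = pd.read_csv(StringIO(csv_data))  # pandas infers int64 / object / float64

conn = sqlite3.connect(":memory:")
# to_sql creates the table and maps inferred dtypes to SQLite types
# (int64 -> INTEGER, float64 -> REAL, object -> TEXT).
df.to_sql("scores", conn, if_exists="replace", index=False)

schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'scores'"
).fetchone()[0]
print(schema)
conn.close()
```

With this approach the separate CREATE TABLE step becomes unnecessary, at the cost of letting pandas decide the schema for you.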

Token usage:

1,250 input, 654 output

2025-05-21T21:14:00

Prompt

use asyncio and httpx and the standard library csv module

Response

Sure, let's rewrite the script using asyncio, httpx for asynchronous HTTP requests, and the standard library's csv module for parsing CSV data.

Here's the updated code:

import asyncio
import httpx
import sqlite3
import csv
from io import StringIO

async def download_csv(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        if response.status_code == 200:
            return response.text
        else:
            raise Exception(f"Failed to download CSV: HTTP {response.status_code}")

def create_table_from_csv(conn, csv_data, table_name):
    # Read the CSV data into a list of rows and determine column names
    reader = csv.reader(StringIO(csv_data))
    columns = next(reader)

    # Build the CREATE TABLE statement from the header row,
    # quoting each name so headers with spaces or SQL keywords still work
    column_definitions = ', '.join([f'"{col}" TEXT' for col in columns])
    create_table_sql = f"CREATE TABLE IF NOT EXISTS {table_name} ({column_definitions});"

    # Execute the CREATE TABLE statement
    with conn:
        conn.execute(create_table_sql)

def insert_csv_data_into_table(conn, csv_data, table_name):
    reader = csv.reader(StringIO(csv_data))
    columns = next(reader)  # Get column names

    # Create an INSERT statement template
    placeholders = ', '.join(['?'] * len(columns))
    insert_sql = f"INSERT INTO {table_name} VALUES ({placeholders});"

    with conn:
        for row in reader:
            conn.execute(insert_sql, row)

async def main(url, db_file, table_name):
    # Step 1: Download the CSV file
    csv_data = await download_csv(url)

    # Step 2: Connect to the SQLite database (create it if it doesn't exist)
    conn = sqlite3.connect(db_file)

    try:
        # Step 3: Create a new table based on the CSV data
        create_table_from_csv(conn, csv_data, table_name)

        # Step 4: Insert the CSV data into the newly created table
        insert_csv_data_into_table(conn, csv_data, table_name)
    finally:
        conn.close()

# Example usage with asyncio event loop
if __name__ == "__main__":
    url = 'https://example.com/path/to/your/csvfile.csv'
    db_file = 'your_database.db'  # Path to your SQLite database file
    table_name = 'your_table_name'  # Name of the new table

    asyncio.run(main(url, db_file, table_name))

This script does the following:

  1. Downloads the CSV content from a given URL using httpx with asynchronous HTTP requests.
  2. Connects to an SQLite database (creates it if it doesn't exist).
  3. Creates a new table in the SQLite database based on the columns of the downloaded CSV file using the standard library's csv module.
  4. Inserts the data from the CSV file into this newly created table.

You'll need to install the httpx library:

pip install httpx

Modify the url, db_file, and table_name variables in the example usage section to fit your specific use case.
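One small refinement to the row-by-row insert loop, not part of the response above: sqlite3's executemany batches all the parameterized inserts in a single call, which is typically faster. A self-contained sketch with an in-memory database and an invented table name:

```python
import csv
import sqlite3
from io import StringIO

# Inline CSV stands in for the downloaded text.
csv_data = "id,name\n1,alice\n2,bob\n3,carol\n"
reader = csv.reader(StringIO(csv_data))
columns = next(reader)  # consume the header row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT, name TEXT)")

placeholders = ", ".join(["?"] * len(columns))
with conn:
    # executemany consumes the remaining rows from the reader in one call
    conn.executemany(f"INSERT INTO people VALUES ({placeholders})", reader)

count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # 3
conn.close()
```

Because csv.reader is an iterator and the header has already been consumed, it can be passed straight to executemany.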

Token usage:

1,919 input, 702 output

2025-05-21T21:15:13

Prompt

suggest ways I could detect the right column types, no code just describe potential approaches

Response

Detecting the correct column types for a CSV file can be a bit challenging due to the variety of data formats and structures. Here are several approaches you could consider:

1. Sample-Based Type Detection

  • Description: Analyze a sample of rows (e.g., first 100 or 1,000 rows) in the CSV file to infer column types.
  • Pros: Faster and less memory-intensive than analyzing the entire dataset.
  • Cons: May not capture all possible values if they are rare.

2. Full Dataset Analysis

  • Description: Examine every value in each column of the full dataset to determine the most likely type for each column.
  • Pros: More accurate since it considers the entire range of data.
  • Cons: Can be slow and memory-intensive for very large CSV files.

3. Schema Inference Libraries

  • Description: Use existing tools designed for schema inference, such as pandas' built-in dtype detection when reading the CSV, or csvkit's csvsql, which generates typed CREATE TABLE statements from a CSV file.
  • Pros: Leverages optimized algorithms and existing tools.
  • Cons: May not be perfect and could still require tweaks.

4. Custom Rules-Based Detection

  • Description: Implement custom logic to detect types based on predefined rules (e.g., if every value in a column matches a date pattern, it's a date; if every value matches an integer pattern such as an optional sign followed by digits, it's an integer).
  • Pros: Can be highly customized and tailored to the specific dataset.
  • Cons: Requires significant effort to implement and maintain.

5. Machine Learning-Based Detection

  • Description: Use machine learning models trained on labeled datasets to predict column types based on patterns in the data.
  • Pros: Potentially more accurate if the model is well-trained.
  • Cons: Complex to set up and may not generalize well across different CSV structures.

6. Type Hints from Metadata

  • Description: If available, use metadata or schema information provided alongside the CSV file (e.g., a schema description in JSON format).
  • Pros: Directly informed by human knowledge about the data structure.
  • Cons: Not always available and may require additional steps to process.

7. User Input and Feedback

  • Description: Provide an initial guess for column types based on automated detection, then allow users to review and adjust these types as needed.
  • Pros: Combines automation with human expertise.
  • Cons: Requires user intervention.

Each approach has its trade-offs in terms of accuracy, complexity, and performance. Depending on your specific use case, you might choose one or combine multiple approaches for the best results.
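Although the prompt asked for descriptions rather than code, a brief editorial sketch may make the combination of approaches 1 and 4 (sample-based, rules-based detection) concrete. The INTEGER → REAL → TEXT type ladder here is an assumption for illustration, not something specified in the response:

```python
def detect_column_type(values):
    """Walk a sample of values down a type ladder: INTEGER -> REAL -> TEXT.

    A value that fails the narrower type demotes the column to the next
    wider one; any non-numeric value demotes it straight to TEXT.
    """
    current = "INTEGER"
    for value in values:
        if current == "INTEGER":
            try:
                int(value)
                continue  # still consistent with INTEGER
            except ValueError:
                current = "REAL"
        if current == "REAL":
            try:
                float(value)
                continue  # still consistent with REAL
            except ValueError:
                current = "TEXT"
        if current == "TEXT":
            break  # widest type; no need to look further
    return current

# A tiny sample of rows; zip(*rows) transposes them into columns.
sample = [
    ["1", "9.5", "alice"],
    ["2", "7", "bob"],
]
types = [detect_column_type(col) for col in zip(*sample)]
print(types)  # ['INTEGER', 'REAL', 'TEXT']
```

In practice such detected types would feed into the column_definitions string in the earlier scripts, replacing the hardcoded TEXT.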

Token usage:

2,640 input, 566 output
