AI4Good Hackathon 2025 - Project Submission

Project Title

Affordable Housing Affordability Index: Data-Driven Analysis for Jacksonville and Peer Cities

Team Members

Tejaswi Vemuri, Kushul Reddy Palakala, Ros Maria Edathil Francis Stanly

Problem Case

Problem Case #2: City and Tract-level Affordability Indexes

Develop comprehensive affordability indices for cities and neighborhoods (census tract-level) using housing costs, income distribution, transportation access, and walkability data. The solution provides an interactive dashboard to:

  • Explore affordability trends across 25 U.S. cities
  • Compare regions and identify areas with greatest affordability challenges
  • Assess neighborhoods at risk of displacement
  • Highlight areas requiring targeted investment

Primary Focus: Jacksonville, FL with comparisons to 24 peer cities

Relevant Images

  1. Dashboard: City Comparison tab showing affordability rankings
  2. Jacksonville Tableau Map: Interactive tract-level affordability visualization
  3. PCA Analysis:
    • Component weights visualization (output/pca_analysis/pca_component_weights.png)
    • Variance explained chart (output/pca_analysis/pca_variance_explained.png)
    • Correlation heatmap (output/pca_analysis/pca_correlation_heatmap.png)
  4. Risk Category Distribution: Pie chart showing high-risk vs. moderate-risk tracts

Inspiration for the Project

Housing affordability is a critical issue affecting millions of Americans, particularly in rapidly growing cities like Jacksonville. We were inspired by the need for:

  • Data-driven decision making: Moving beyond anecdotal evidence to objective, quantifiable metrics
  • Equity and transparency: Identifying vulnerable neighborhoods that need support
  • Actionable insights: Providing city planners and policymakers with clear, tract-level data to target interventions
  • Comparative analysis: Understanding how Jacksonville compares to peer cities to learn from best practices

The State of JAX Initiative's focus on community well-being made this the perfect opportunity to create a tool that can drive real policy change.

What the Solution Does

Our Affordable Housing Affordability Index provides:

Core Features

  1. Comprehensive Affordability Scoring (0-100 scale)

    • 0-50: High Risk (Unaffordable)
    • 50-70: Moderate Risk
    • 70-85: Low Risk (Affordable)
    • 85-100: Very Low Risk (Highly Affordable)
  2. Multi-dimensional Analysis - Five key components:

    • Housing Cost Burden (proportion of income spent on housing)
    • Income-to-Housing Ratio (income sufficiency)
    • Transportation Accessibility (walkability, transit access)
    • Housing Quality (physical condition, overcrowding)
    • Affordable Housing Availability (subsidized units, supply)
  3. Interactive Dashboard with:

    • City Comparison: Rankings and benchmarks across 25 cities
    • Tract Analysis: Detailed neighborhood-level scoring with component breakdowns
    • Jacksonville Focus: Deep dive with embedded Tableau map for spatial visualization
    • Data Export: Download results for further analysis or Tableau integration
  4. 100% Data-Driven Methodology

    • PCA (Principal Component Analysis) variance-based weighting
    • No arbitrary percentages or subjective weights
    • Cross-correlation validation
    • Fully transparent and statistically justified
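
As a minimal sketch, the score bands listed under Comprehensive Affordability Scoring could be applied with a small lookup like the one below. The function name `risk_category` is hypothetical, and because the listed ranges share their endpoints (50, 70, 85), the sketch assumes each band includes its lower bound; the project's actual boundary handling may differ.

```python
def risk_category(score: float) -> str:
    """Map a 0-100 affordability score to one of the four risk bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # Assumption: a score on a boundary belongs to the higher band,
    # so 50 is Moderate Risk, 70 is Low Risk, and 85 is Very Low Risk.
    if score < 50:
        return "High Risk"
    if score < 70:
        return "Moderate Risk"
    if score < 85:
        return "Low Risk"
    return "Very Low Risk"
```

Raising on out-of-range input (including NaN) keeps tracts with missing scores from being silently labeled as high risk.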

How You Built the Solution

Data Processing Pipeline

  1. Data Integration: Merged 6 major datasets covering 25 cities and 1,000+ census tracts, drawn from these sources:

    • NHPD (National Housing Preservation Database)
    • HUD CHAS (Comprehensive Housing Affordability Strategy)
    • U.S. Census Bureau ACS
    • USDA Economic Research Service
  2. Component Score Calculation:

    • Threshold-based cost burden analysis (households spending more than 30% and more than 50% of income on housing)
    • Min-max normalization for fair comparisons
    • Inverse scaling for negative indicators
  3. PCA Variance Analysis:

    • Standardized all component scores
    • Performed PCA on 5 components
    • Calculated weights based on variance explained (eigenvalues)
    • Validated with cross-correlation analysis
  4. Index Aggregation:

    • Weighted combination of normalized component scores
    • Scaled to 0-100 for interpretability
    • Risk categorization based on quartiles
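
The normalization, weighting, and aggregation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's exact formula: the helper names (`min_max`, `pca_weights`, `affordability_index`) are hypothetical, and the weight rule (absolute PCA loadings scaled by each component's explained-variance ratio) is one plausible reading of "weights based on variance explained".

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def min_max(series: pd.Series, invert: bool = False) -> pd.Series:
    """Rescale to [0, 1]; invert=True flips indicators where higher is worse."""
    span = series.max() - series.min()
    if span == 0:
        # A constant column carries no information; place it at the midpoint.
        return pd.Series(0.5, index=series.index)
    scaled = (series - series.min()) / span
    return 1.0 - scaled if invert else scaled


def pca_weights(components: pd.DataFrame) -> pd.Series:
    """Weight each column by its absolute loadings times variance explained."""
    standardized = StandardScaler().fit_transform(components)
    pca = PCA().fit(standardized)
    # Rows of components_ are principal axes; columns are the input variables.
    raw = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return pd.Series(raw / raw.sum(), index=components.columns)


def affordability_index(components: pd.DataFrame) -> pd.Series:
    """Weighted sum of [0, 1] component scores, scaled to 0-100."""
    weights = pca_weights(components)
    return 100.0 * (components * weights).sum(axis=1)
```

Absolute loadings are used here because squared loadings summed over all retained components give every standardized variable the same weight; an alternative would be to keep squared loadings but retain only the leading components.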

Technology Stack

Backend/Analysis:

  • Python 3.13
  • pandas & numpy (data processing)
  • scikit-learn (PCA, standardization)
  • scipy (statistical analysis)

Visualization:

  • Streamlit (interactive dashboard framework)
  • Plotly (interactive charts)
  • Tableau Public (geographic mapping)

Data Quality:

  • Robust deduplication (city-by-city merging)
  • Missing value handling
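
A city-by-city merge with deduplication might look roughly like the sketch below. The column names (`city`, `geoid`, `year`) and the function names are assumptions made for illustration, as is the rule of keeping the most recent year for each tract.

```python
import pandas as pd


def dedupe_tracts(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per tract, preferring the most recent year."""
    return (
        df.sort_values("year")
          .drop_duplicates(subset="geoid", keep="last")
          .reset_index(drop=True)
    )


def merge_city(housing: pd.DataFrame, income: pd.DataFrame) -> pd.DataFrame:
    """Merge two sources for one city, deduplicating before and after the join."""
    merged = dedupe_tracts(housing).merge(
        dedupe_tracts(income).drop(columns="year"),
        on="geoid",
        how="left",
        validate="one_to_one",  # fail loudly if a tract still appears twice
    )
    return merged.drop_duplicates(subset="geoid")


def merge_all(housing: pd.DataFrame, income: pd.DataFrame) -> pd.DataFrame:
    """Process each city separately, then stack the per-city results."""
    frames = [
        merge_city(group, income[income["city"] == city].drop(columns="city"))
        for city, group in housing.groupby("city")
    ]
    return pd.concat(frames, ignore_index=True)
```

Merging one city at a time keeps each join small and, together with `validate="one_to_one"`, makes it easier to spot which city introduces a duplicate tract.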

Challenges Your Team Ran Into

  1. Data Quality & Deduplication

    • Challenge: Multiple datasets had overlapping census tracts with different time periods, causing duplicate entries
    • Solution: Implemented city-by-city merging strategy with multi-level deduplication at pivot, merge, and final aggregation stages
  2. Risk Category Definition

    • Challenge: Initial 5-category system (Very High, High, Moderate, Low, Very Low) resulted in sparse categories
    • Solution: Consolidated to 4 categories by combining Very High and High Risk (0-50 range)
  3. Statistical Weight Justification

    • Challenge: Our initial weighting used an arbitrary 60/40 split between variance and correlation, which lacked statistical justification
    • Solution: Switched to weights derived entirely from PCA variance, with no hand-set percentages
  4. Tableau GEOID Integration

    • Challenge: Census tract GEOIDs were interpreted as numbers instead of strings, losing their leading zeros
    • Solution: Explicit string conversion, zero-padding to 11 characters, and CSV quoting (QUOTE_ALL)
  5. Dashboard Complexity vs. Usability

    • Challenge: Balancing technical detail (PCA analysis) with user-friendly presentation
    • Solution: Moved PCA details to collapsible sidebar section; focused main tabs on actionable insights
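
The GEOID fix described in item 4 can be sketched as below. The function names and the `GEOID` column name are hypothetical; `csv.QUOTE_ALL` and `str.zfill` are the standard-library pieces doing the work.

```python
import csv

import pandas as pd


def normalize_geoid(value) -> str:
    """Return an 11-character tract GEOID string, restoring leading zeros."""
    text = str(value).strip()
    if text.endswith(".0"):  # GEOIDs that were read as floats, e.g. 1073000100.0
        text = text[:-2]
    return text.zfill(11)


def export_for_tableau(df: pd.DataFrame, target) -> None:
    """Write a CSV with every field quoted so GEOIDs stay text."""
    out = df.copy()
    out["GEOID"] = out["GEOID"].map(normalize_geoid)
    out.to_csv(target, index=False, quoting=csv.QUOTE_ALL)
```

When reading the file back with pandas, passing `dtype={"GEOID": str}` to `pd.read_csv` keeps the column as text; quoting alone does not stop pandas from parsing the values as integers.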

Accomplishments That the Team is Proud Of

  1. Statistically Rigorous Methodology

    • Developed a fully data-driven weighting system using PCA variance analysis
    • No subjective or arbitrary decisions - everything justified by the data
    • Cross-correlation validation confirms component independence
  2. Comprehensive Geographic Coverage

    • Analyzed 25 cities and 1,000+ census tracts
    • Tract-level granularity enables neighborhood-specific interventions
    • Embedded interactive Tableau map for Jacksonville
  3. Actionable Insights for Jacksonville

    • Identified specific high-risk tracts requiring intervention
    • Provided peer city comparisons to inform best practices
    • Created exportable data for policy makers and city planners
  4. Clean, Professional Dashboard

    • Intuitive three-tab interface (City Comparison, Tract Analysis, Jacksonville Focus)
    • One-click data export (CSV for analysis, Tableau for mapping)
    • Color-coded risk visualization (red for high risk, yellow for moderate)
  5. Reproducible & Extensible

    • Well-documented code and methodology (README.md + docs/ folder)
    • Can easily add new cities or update with new data
    • Modular architecture allows for custom component weights

What Your Team Learned

Technical Skills

  • Advanced Data Wrangling: Handling multi-source, multi-temporal datasets with complex joins
  • PCA Application: Using principal component analysis to derive component weights, not just for dimensionality reduction
  • Streamlit Development: Building production-ready interactive dashboards
  • Tableau Integration: Embedding Tableau Public visualizations in web applications

Domain Knowledge

  • Housing Affordability Metrics: Understanding HUD CHAS thresholds, cost burden calculations
  • Census Data Structure: Working with FIPS codes, GEOID formatting, tract boundaries
  • Spatial Analysis: Connecting quantitative metrics to geographic visualizations
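
For reference, the HUD cost burden thresholds mentioned above (more than 30% of income for cost burdened, more than 50% for severely cost burdened) can be expressed as a short function. The function name and the handling of zero or negative income are assumptions made for this sketch.

```python
def cost_burden_category(monthly_housing_cost: float, monthly_income: float) -> str:
    """Classify a household by the share of income spent on housing."""
    if monthly_income <= 0:
        # Assumption: treat zero or negative income as a separate category
        # rather than dividing by it.
        return "not computed"
    ratio = monthly_housing_cost / monthly_income
    if ratio > 0.50:
        return "severely cost burdened"
    if ratio > 0.30:
        return "cost burdened"
    return "not cost burdened"
```

A tract-level indicator would then be the share of households falling in each category.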

Problem-Solving Approaches

  • Iterative Refinement: Started with 5 risk categories, refined to 4 based on data distribution
  • Data-Driven Decision Making: Let the data guide methodology (PCA weights) rather than assumptions
  • User-Centered Design: Simplified complex technical details for non-technical stakeholders

Collaboration & Project Management

  • Balancing Depth vs. Delivery: Knowing when to move complex analysis to documentation vs. dashboard
  • Stakeholder Communication: Presenting technical methodology in accessible terms

What is the Next Step for the Project After the Hackathon

  1. Stakeholder Engagement

    • Present findings to Jacksonville city planners and housing authorities
    • Gather feedback on which metrics are most actionable
    • Identify specific policy interventions for high-risk tracts
  2. Data Expansion

    • Add more cities (target: 50+ U.S. cities)
    • Incorporate 5-year historical trends to show affordability changes over time
    • Add eviction rates and displacement indicators
  3. Enhanced Spatial Analysis

    • Integrate Census TIGER/Line shapefiles for true choropleth maps
    • Add proximity analysis (distance to jobs, schools, healthcare)
    • Include gentrification indicators
  4. Predictive Modeling

    • Forecast future affordability trends based on development pipelines
    • Identify tracts at risk of displacement in next 2-5 years
    • Simulate impact of policy interventions (e.g., rent control, subsidies)
  5. Policy Recommendations Engine

    • Develop tract-specific intervention recommendations
    • Cost-benefit analysis for different affordable housing strategies
    • Prioritization framework for resource allocation
  6. Public API & Open Data

    • Make index scores and methodology publicly available
    • API for developers, researchers, and advocacy groups
    • Quarterly updates with latest data
  7. Mobile Application

    • Community-facing app for residents to explore their neighborhood's affordability
    • Notifications for new affordable housing opportunities
    • Connection to local housing resources
  8. Expand to Other Domains

    • Apply methodology to healthcare access, food security, transportation equity
    • Create comprehensive "Quality of Life" index
    • Partner with universities for ongoing research
  9. Advocacy & Impact

    • Work with housing advocacy groups to drive policy change
    • Annual "State of Affordability" report for Jacksonville
    • Replicable toolkit for other cities to create their own indices

Technologies Used to Build the Solution

Programming Languages

  • Python 3.13

Data Processing & Analysis

  • pandas (2.0+): Data manipulation, merging, pivoting
  • numpy (1.24+): Numerical operations, array processing
  • scikit-learn (1.3+): PCA, StandardScaler, normalization
  • scipy (1.11+): Statistical analysis, correlation matrices

Visualization & Dashboard

  • Streamlit (1.25+): Interactive web dashboard framework
  • Plotly (5.14+): Interactive charts (bar, pie, scatter, heatmap)
  • Tableau Public: Geographic mapping with filled tract visualization
  • kaleido: Exporting Plotly charts to PNG for presentations

Development Tools

  • Jupyter Notebooks: Exploratory data analysis
  • VS Code/Cursor: IDE for development

Data Sources & APIs

  • NHPD: National Housing Preservation Database
  • HUD CHAS: Comprehensive Housing Affordability Strategy
  • U.S. Census Bureau ACS: American Community Survey
  • USDA ERS: Economic Research Service data

Deployment & Distribution

  • CSV Export: For data sharing and further analysis
  • Streamlit Cloud (potential): For hosting the dashboard
  • Tableau Public: For shareable interactive maps
