AI4Good Hackathon 2025 - Project Submission
Project Title
Affordable Housing Affordability Index: Data-Driven Analysis for Jacksonville and Peer Cities
Team Members
Tejaswi Vemuri, Kushul Reddy Palakala, Ros Maria Edathil Francis Stanly
Problem Case
Problem Case #2: City and Tract-level Affordability Indexes
Develop comprehensive affordability indices for cities and neighborhoods (census tract-level) using housing costs, income distribution, transportation access, and walkability data. The solution provides an interactive dashboard to:
- Explore affordability trends across 25 U.S. cities
- Compare regions and identify areas with greatest affordability challenges
- Assess neighborhoods at risk of displacement
- Highlight areas requiring targeted investment
Primary Focus: Jacksonville, FL with comparisons to 24 peer cities
Relevant Images
- Dashboard: City Comparison tab showing affordability rankings
- Jacksonville Tableau Map: Interactive tract-level affordability visualization
- PCA Analysis:
- Component weights visualization (
output/pca_analysis/pca_component_weights.png) - Variance explained chart (
output/pca_analysis/pca_variance_explained.png) - Correlation heatmap (
output/pca_analysis/pca_correlation_heatmap.png)
- Component weights visualization (
- Risk Category Distribution: Pie chart showing high-risk vs. moderate-risk tracts
Inspiration for the Project
Housing affordability is a critical issue affecting millions of Americans, particularly in rapidly growing cities like Jacksonville. We were inspired by the need for:
- Data-driven decision making: Moving beyond anecdotal evidence to objective, quantifiable metrics
- Equity and transparency: Identifying vulnerable neighborhoods that need support
- Actionable insights: Providing city planners and policymakers with clear, tract-level data to target interventions
- Comparative analysis: Understanding how Jacksonville compares to peer cities to learn from best practices
The State of JAX Initiative's focus on community well-being made this the perfect opportunity to create a tool that can drive real policy change.
What the Solution Does
Our Affordable Housing Affordability Index provides:
Core Features
Comprehensive Affordability Scoring (0-100 scale)
- 0-50: High Risk (Unaffordable)
- 50-70: Moderate Risk
- 70-85: Low Risk (Affordable)
- 85-100: Very Low Risk (Highly Affordable)
Multi-dimensional Analysis - Five key components:
- Housing Cost Burden (proportion of income spent on housing)
- Income-to-Housing Ratio (income sufficiency)
- Transportation Accessibility (walkability, transit access)
- Housing Quality (physical condition, overcrowding)
- Affordable Housing Availability (subsidized units, supply)
Interactive Dashboard with:
- City Comparison: Rankings and benchmarks across 25 cities
- Tract Analysis: Detailed neighborhood-level scoring with component breakdowns
- Jacksonville Focus: Deep dive with embedded Tableau map for spatial visualization
- Data Export: Download results for further analysis or Tableau integration
100% Data-Driven Methodology
- PCA (Principal Component Analysis) variance-based weighting
- No arbitrary percentages or subjective weights
- Cross-correlation validation
- Fully transparent and statistically justified
How You Built the Solution
Data Processing Pipeline
Data Integration: Merged 6 major datasets covering 25 cities and 1,000+ census tracts
- NHPD (National Housing Preservation Database)
- HUD CHAS (Comprehensive Housing Affordability Strategy)
- U.S. Census Bureau ACS
- USDA Economic Research Service
Component Score Calculation:
- Threshold-based cost burden analysis (>30%, >50% income spent on housing)
- Min-max normalization for fair comparisons
- Inverse scaling for negative indicators
PCA Variance Analysis:
- Standardized all component scores
- Performed PCA on 5 components
- Calculated weights based on variance explained (eigenvalues)
- Validated with cross-correlation analysis
Index Aggregation:
- Weighted combination of normalized component scores
- Scaled to 0-100 for interpretability
- Risk categorization based on quartiles
Technology Stack
Backend/Analysis:
- Python 3.x
- pandas & numpy (data processing)
- scikit-learn (PCA, standardization)
- scipy (statistical analysis)
Visualization:
- Streamlit (interactive dashboard framework)
- Plotly (interactive charts)
- Tableau Public (geographic mapping)
Data Quality:
- Robust deduplication (city-by-city merging)
- Missing value handling
Challenges Your Team Ran Into
Data Quality & Deduplication
- Challenge: Multiple datasets had overlapping census tracts with different time periods, causing duplicate entries
- Solution: Implemented city-by-city merging strategy with multi-level deduplication at pivot, merge, and final aggregation stages
Risk Category Definition
- Challenge: Initial 5-category system (Very High, High, Moderate, Low, Very Low) resulted in sparse categories
- Solution: Consolidated to 4 categories by combining Very High and High Risk (0-50 range)
Statistical Weight Justification
- Challenge: Arbitrary 60/40 split between variance and correlation lacked statistical justification
- Solution: Developed 100% PCA variance-based approach - fully data-driven with no arbitrary percentages
Tableau GEOID Integration
- Challenge: Census Tract GEOIDs being interpreted as numbers instead of strings, losing leading zeros
- Solution: Explicit string conversion with CSV quoting (QUOTE_ALL) and 11-character padding
Dashboard Complexity vs. Usability
- Challenge: Balancing technical detail (PCA analysis) with user-friendly presentation
- Solution: Moved PCA details to collapsible sidebar section; focused main tabs on actionable insights
Accomplishments That the Team is Proud Of
Statistically Rigorous Methodology
- Developed a fully data-driven weighting system using PCA variance analysis
- No subjective or arbitrary decisions - everything justified by the data
- Cross-correlation validation confirms component independence
Comprehensive Geographic Coverage
- Analyzed 25 cities and 1,000+ census tracts
- Tract-level granularity enables neighborhood-specific interventions
- Embedded interactive Tableau map for Jacksonville
Actionable Insights for Jacksonville
- Identified specific high-risk tracts requiring intervention
- Provided peer city comparisons to inform best practices
- Created exportable data for policy makers and city planners
Clean, Professional Dashboard
- Intuitive three-tab interface (City Comparison, Tract Analysis, Jacksonville Focus)
- One-click data export (CSV for analysis, Tableau for mapping)
- Color-coded risk visualization (red for high risk, yellow for moderate)
Reproducible & Extensible
- Well-documented code and methodology (README.md + docs/ folder)
- Can easily add new cities or update with new data
- Modular architecture allows for custom component weights
What Your Team Learned
Technical Skills
- Advanced Data Wrangling: Handling multi-source, multi-temporal datasets with complex joins
- PCA Application: Using principal component analysis for weight optimization, not just dimensionality reduction
- Streamlit Development: Building production-ready interactive dashboards
- Tableau Integration: Embedding Tableau Public visualizations in web applications
Domain Knowledge
- Housing Affordability Metrics: Understanding HUD CHAS thresholds, cost burden calculations
- Census Data Structure: Working with FIPS codes, GEOID formatting, tract boundaries
- Spatial Analysis: Connecting quantitative metrics to geographic visualizations
Problem-Solving Approaches
- Iterative Refinement: Started with 5 risk categories, refined to 4 based on data distribution
- Data-Driven Decision Making: Let the data guide methodology (PCA weights) rather than assumptions
- User-Centered Design: Simplified complex technical details for non-technical stakeholders
Collaboration & Project Management
- Balancing Depth vs. Delivery: Knowing when to move complex analysis to documentation vs. dashboard
- Stakeholder Communication: Presenting technical methodology in accessible terms
What is the Next Step for the Project After the Hackathon
Stakeholder Engagement
- Present findings to Jacksonville city planners and housing authorities
- Gather feedback on which metrics are most actionable
- Identify specific policy interventions for high-risk tracts
Data Expansion
- Add more cities (target: 50+ U.S. cities)
- Incorporate 5-year historical trends to show affordability changes over time
- Add eviction rates and displacement indicators
Enhanced Spatial Analysis
- Integrate Census TIGER/Line shapefiles for true choropleth maps
- Add proximity analysis (distance to jobs, schools, healthcare)
- Include gentrification indicators
Predictive Modeling
- Forecast future affordability trends based on development pipelines
- Identify tracts at risk of displacement in next 2-5 years
- Simulate impact of policy interventions (e.g., rent control, subsidies)
Policy Recommendations Engine
- Develop tract-specific intervention recommendations
- Cost-benefit analysis for different affordable housing strategies
- Prioritization framework for resource allocation
Public API & Open Data
- Make index scores and methodology publicly available
- API for developers, researchers, and advocacy groups
- Quarterly updates with latest data
Mobile Application
- Community-facing app for residents to explore their neighborhood's affordability
- Notifications for new affordable housing opportunities
- Connection to local housing resources
Expand to Other Domains
- Apply methodology to healthcare access, food security, transportation equity
- Create comprehensive "Quality of Life" index
- Partner with universities for ongoing research
Advocacy & Impact
- Work with housing advocacy groups to drive policy change
- Annual "State of Affordability" report for Jacksonville
- Replicable toolkit for other cities to create their own indices
Technologies Used to Build the Solution
Programming Languages
- Python 3.13
Data Processing & Analysis
- pandas (2.0+): Data manipulation, merging, pivoting
- numpy (1.24+): Numerical operations, array processing
- scikit-learn (1.3+): PCA, StandardScaler, normalization
- scipy (1.11+): Statistical analysis, correlation matrices
Visualization & Dashboard
- Streamlit (1.25+): Interactive web dashboard framework
- Plotly (5.14+): Interactive charts (bar, pie, scatter, heatmap)
- Tableau Public: Geographic mapping with filled tract visualization
- kaleido: Exporting Plotly charts to PNG for presentations
Development Tools
- Jupyter Notebooks: Exploratory data analysis
- VS Code/Cursor: IDE for development
Data Sources & APIs
- NHPD: National Housing Preservation Database
- HUD CHAS: Comprehensive Housing Affordability Strategy
- U.S. Census Bureau ACS: American Community Survey
- USDA ERS: Economic Research Service data
Deployment & Distribution
- CSV Export: For data sharing and further analysis
- Streamlit Cloud (potential): For hosting the dashboard
- Tableau Public: For shareable interactive maps
Log in or sign up for Devpost to join the conversation.