
Operational Analytics & Metric Investigation

Trainity SQL Project | Data Analyst Simulation


Project Description

• This project simulates a Data Analyst role at a tech firm.

• Goal: Use advanced SQL to derive insights from operational data.


• Scope: Job Data Analysis & Investigating Metric Spikes.
• Key Tasks: Analyzing trends, spotting anomalies, and reporting insights.
Tools & Technologies Used

• - MySQL Workbench 8.0 CE


• - CSV data imports
• - SQL for analysis and metrics derivation
• - Google Drive for result storage
• - GitHub for version control
Jobs Reviewed Per Hour (Nov 2020)

• Query:
• SELECT COUNT(job_id)/(30*24) AS jobs_reviewed_hourly FROM job_data;
• Output: 0.0111

• Query:
• SELECT COUNT(DISTINCT job_id)/(30*24) AS jobs_reviewed_hourly_distinct FROM job_data;
• Output: 0.0083
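A minimal sketch of the same metric with an explicit November 2020 filter; the review-date column name ds is an assumption not shown on the slide:

SELECT COUNT(DISTINCT job_id) / (30 * 24) AS jobs_reviewed_hourly  -- 30 days * 24 hours in Nov 2020
FROM job_data
WHERE ds BETWEEN '2020-11-01' AND '2020-11-30';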
Throughput: 7-Day Rolling Average

• Used window functions for rolling average:


• - COUNT(job_id) per day
• - AVG(...) OVER the current row plus the 6 preceding rows (a 7-day window)

• Reasoning: a 7-day rolling average smooths out day-to-day spikes, making genuine anomalies easier to spot than raw daily counts (see the sketch below).
• Sample Output:
• 25-11: 1.0 | 30-11: 1.33
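A sketch of the rolling-average query, again assuming a date column named ds in job_data:

SELECT ds,
       jobs_reviewed,
       -- current day plus the 6 preceding days = 7-day rolling average
       AVG(jobs_reviewed) OVER (ORDER BY ds
                                ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
FROM (
    SELECT ds, COUNT(job_id) AS jobs_reviewed
    FROM job_data
    GROUP BY ds
) AS daily;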
Language Distribution (Last 30 Days)

• Query:
• SELECT language, COUNT(*) / (SELECT COUNT(*) FROM job_data) * 100 AS percentage FROM job_data GROUP BY language;
• Results (approx.):
• - Persian: 37.5%
• - English, Arabic, Hindi, French, Italian: ~12.5% each
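A sketch with an explicit 30-day window (the slide's query assumes the table already covers only the last 30 days); the ds date column and the '2020-11-30' anchor date are assumptions:

SELECT language,
       -- share of each language among rows in the 30-day window
       COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage
FROM job_data
WHERE ds >= DATE_SUB('2020-11-30', INTERVAL 30 DAY)
GROUP BY language;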
Detecting Duplicates

• Used ROW_NUMBER() OVER (PARTITION BY job_id)


• Filtered for row_num > 1 to find duplicates.
• Sample Duplicates:
• - Job ID 23 appeared multiple times with the same content.
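A sketch of the duplicate check described above:

SELECT *
FROM (
    SELECT jd.*,
           -- number each row within its job_id group; anything beyond 1 is a duplicate
           ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY job_id) AS row_num
    FROM job_data AS jd
) AS ranked
WHERE row_num > 1;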
Weekly User Engagement

• Query:
• EXTRACT(WEEK FROM occurred_at), COUNT(DISTINCT user_id)
• Sample Output:
• Week 18: 791 | Week 31: 1685
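A sketch of the weekly engagement count; the events table name is an assumption:

SELECT EXTRACT(WEEK FROM occurred_at) AS week_num,
       COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
GROUP BY week_num
ORDER BY week_num;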
User Growth Tracking

• Query includes:
• - EXTRACT(YEAR FROM ...) and EXTRACT(WEEK FROM ...) on the activation date
• - COUNT(DISTINCT user_id) of active users per week
• - Cumulative SUM() window over the weekly counts
• Total Active Users: 9381 (2013–2014)
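A sketch of the cumulative growth query, assuming a users table with an activated_at timestamp and state = 'active' for activated accounts (both assumptions):

SELECT yr, wk, weekly_new_users,
       -- running total of active users over time
       SUM(weekly_new_users) OVER (ORDER BY yr, wk) AS cumulative_users
FROM (
    SELECT EXTRACT(YEAR FROM activated_at) AS yr,
           EXTRACT(WEEK FROM activated_at) AS wk,
           COUNT(DISTINCT user_id) AS weekly_new_users
    FROM users
    WHERE state = 'active'
    GROUP BY yr, wk
) AS weekly;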
Weekly Retention (Signup Cohorts)

• Joined each user's signup event to their later engagement events and compared the weeks.


• Filtered for 'complete_signup' and 'engagement'.
• Calculated retention_week = engagement_week - signup_week.
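A sketch of the cohort join, assuming a single events table with user_id, occurred_at, and an event_type column holding 'complete_signup' and 'engagement':

SELECT signup.user_id,
       EXTRACT(WEEK FROM signup.occurred_at) AS signup_week,
       EXTRACT(WEEK FROM engage.occurred_at) AS engagement_week,
       -- note: plain week subtraction ignores year boundaries
       EXTRACT(WEEK FROM engage.occurred_at)
         - EXTRACT(WEEK FROM signup.occurred_at) AS retention_week
FROM events AS signup
JOIN events AS engage
  ON engage.user_id = signup.user_id
 AND engage.event_type = 'engagement'
 AND engage.occurred_at >= signup.occurred_at
WHERE signup.event_type = 'complete_signup';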
Engagement By Device (Weekly)

• Grouped by year, week, and device type.


• COUNT(DISTINCT user_id) per device.
• Used to analyze device-specific engagement trends.
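A sketch of the per-device breakdown; the events table and device column names are assumptions:

SELECT EXTRACT(YEAR FROM occurred_at) AS yr,
       EXTRACT(WEEK FROM occurred_at) AS wk,
       device,
       COUNT(DISTINCT user_id) AS weekly_users
FROM events
WHERE event_type = 'engagement'
GROUP BY yr, wk, device
ORDER BY yr, wk, device;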
Email Engagement Metrics

• Defined Categories:
• - Sent: 'sent_weekly_digest', 'sent_reengagement_email'
• - Opened: 'email_open'
• - Clicked: 'email_clickthrough'

• Metrics:
• - Opening Rate ≈ SUM(opened)/SUM(sent)
• - Click Rate ≈ SUM(clicked)/SUM(sent)
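A sketch of the two funnel rates, assuming an email_events table with an action column holding the event names listed above:

SELECT 100.0 * SUM(action = 'email_open')
             / SUM(action IN ('sent_weekly_digest', 'sent_reengagement_email')) AS opening_rate_pct,
       100.0 * SUM(action = 'email_clickthrough')
             / SUM(action IN ('sent_weekly_digest', 'sent_reengagement_email')) AS click_rate_pct
FROM email_events;  -- SUM over boolean expressions counts matching rows in MySQL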
Conclusion & Learnings

• Applied SQL for operational and behavioral insights.


• Analyzed engagement trends, language shares, and anomalies.
• Improved understanding of rolling metrics, retention analysis, and duplicate checks.
