Lab - Exploring DataLake With Athena and Quicksight PDF
Table of Contents
Introduction
Query with Athena
QuickSight
Introduction
This lab introduces you to AWS Glue, Amazon Athena, and Amazon QuickSight. AWS Glue is a
fully managed data catalog and ETL service; Amazon Athena queries data; and Amazon
QuickSight provides visualization of the data you import.
Prerequisites
The DMS Lab is a prerequisite for this lab.
Getting Started
In this lab, you will complete the following tasks:
Query Data with Amazon Athena
1. In the AWS services console, search for Athena.
2. In the Query Editor, select your newly created database, e.g., "ticketdata_george".
Note: The type for fields id, sporting_event_id and ticketholder_id should be (double).
Next, we will query across the tables parquet_sporting_event, parquet_sport_team, and
parquet_sport_location.
4. Copy the following SQL syntax into the New Query 1 tab and click Run Query.
SELECT
e.id AS event_id,
e.sport_type_name AS sport,
e.start_date_time AS event_date_time,
h.name AS home_team,
a.name AS away_team,
l.name AS location,
l.city
FROM parquet_sporting_event e,
parquet_sport_team h,
parquet_sport_team a,
parquet_sport_location l
WHERE
e.home_team_id = h.id
AND e.away_team_id = a.id
AND e.location_id = l.id;
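As an aside to the lab steps, the comma-separated FROM list with join conditions in the WHERE clause is equivalent to explicit JOIN ... ON syntax. The following is a minimal sketch of that equivalence, assuming Python's built-in sqlite3 module as a local stand-in for Athena. The table and column names come from the query above; the rows are invented for illustration.

```python
import sqlite3

# Toy stand-ins for the lab's Parquet tables; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parquet_sport_team (id REAL, name TEXT);
CREATE TABLE parquet_sport_location (id REAL, name TEXT, city TEXT);
CREATE TABLE parquet_sporting_event (
    id REAL, sport_type_name TEXT, start_date_time TEXT,
    home_team_id REAL, away_team_id REAL, location_id REAL);
INSERT INTO parquet_sport_team VALUES (1, 'Team A'), (2, 'Team B');
INSERT INTO parquet_sport_location VALUES (10, 'Example Stadium', 'Example City');
INSERT INTO parquet_sporting_event VALUES
    (100, 'baseball', '2018-04-01 13:00:00', 1, 2, 10),
    (101, 'baseball', '2018-04-08 13:00:00', 2, 1, 10);
""")

# Shared column list, matching the lab query.
SELECT_LIST = """
SELECT e.id AS event_id, e.sport_type_name AS sport,
       e.start_date_time AS event_date_time,
       h.name AS home_team, a.name AS away_team,
       l.name AS location, l.city
"""

# Form used in the lab: comma-separated tables, conditions in WHERE.
implicit_join = SELECT_LIST + """
FROM parquet_sporting_event e, parquet_sport_team h,
     parquet_sport_team a, parquet_sport_location l
WHERE e.home_team_id = h.id AND e.away_team_id = a.id AND e.location_id = l.id
ORDER BY e.id
"""

# Equivalent form with explicit inner joins.
explicit_join = SELECT_LIST + """
FROM parquet_sporting_event e
JOIN parquet_sport_team h ON e.home_team_id = h.id
JOIN parquet_sport_team a ON e.away_team_id = a.id
JOIN parquet_sport_location l ON e.location_id = l.id
ORDER BY e.id
"""

implicit_rows = conn.execute(implicit_join).fetchall()
explicit_rows = conn.execute(explicit_join).fetchall()
print(implicit_rows == explicit_rows)
```

Note that parquet_sport_team is joined twice under the aliases h and a, which is how a single event row picks up both its home and away team names.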
5. Click Create view from query.
6. Name the view "sporting_event_info" and click Create.
7. Copy the following SQL syntax into the New Query 2 tab and click Run Query.
SELECT t.id AS ticket_id,
e.event_id,
e.sport,
e.event_date_time,
e.home_team,
e.away_team,
e.location,
e.city,
t.seat_level,
t.seat_section,
t.seat_row,
t.seat,
t.ticket_price,
p.full_name AS ticketholder
FROM sporting_event_info e,
parquet_sporting_event_ticket t,
parquet_person p
WHERE
t.sporting_event_id = e.event_id
AND t.ticketholder_id = p.id;
10. Copy the following SQL syntax into the New Query 3 tab and click Run Query.
SELECT
sport,
count(distinct location) as locations,
count(distinct event_id) as events,
count(*) as tickets,
avg(ticket_price) as avg_ticket_price
FROM sporting_event_ticket_info
GROUP BY 1
ORDER BY 1;
Your query returns two results in approximately five seconds. The query scans 25 MB of data,
which, prior to conversion to Parquet, would have been 1.59 GB of CSV files.
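To put the 25 MB and 1.59 GB figures in perspective, the short Python sketch below works out the reduction in data scanned. This matters because Athena bills by the amount of data a query scans. The calculation assumes 1 GB = 1024 MB, since the lab does not state which convention it uses.

```python
# Figures quoted in the lab text.
PARQUET_SCANNED_MB = 25
CSV_SIZE_GB = 1.59

# Assumption: 1 GB = 1024 MB (the lab does not specify the convention).
csv_size_mb = CSV_SIZE_GB * 1024

# How many times less data the Parquet query scans compared with the CSV size.
reduction_factor = csv_size_mb / PARQUET_SCANNED_MB

# Share of the original CSV volume that no longer needs to be scanned.
percent_saved = 100 * (1 - PARQUET_SCANNED_MB / csv_size_mb)

print(f"CSV size: {csv_size_mb:.0f} MB")
print(f"Parquet scans about {reduction_factor:.0f}x less data")
print(f"Data scanned reduced by {percent_saved:.1f}%")
```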
If this is the first time you have used QuickSight, you are prompted to create an account.
2. Click Sign up for QuickSight.
3. For account type, choose Standard.
4. Click Continue.
5. On the Create your QuickSight account page, fill out your name and email address.
6. Select your region and the check boxes to enable autodiscovery, Amazon Athena, and Amazon
S3.
7. Click Choose S3 buckets and select your DMS bucket (e.g., "dms-lab-george").
8. Click Finish.
9. On the QuickSight landing page, click Manage Data.
10. Click New Data Set.
11. On the Create a Data Set page, select Athena as the data source.
12. For Data source name, type "ticketdata" and click Validate connection.
13. Click Create data source.
14. In the Database drop-down list, select the database name you created in the AWS Glue
lab.
15. Choose the "sporting_event_ticket_info" table and click Select.
16. To finish data set creation, choose the option Import to SPICE for quicker analytics and
click Visualize.
You will now be taken to the QuickSight Visualize interface where you can start building your
dashboard.
Note: The SPICE dataset will take a few minutes to be built, but you can continue to create
some charts on the underlying data.
1. In the Fields list, click the "ticket_price" column to populate the chart.
2. Click the expand icon in the upper right corner to expand the Field wells pane.
3. In the Visual types area, choose the Vertical bar chart icon. This layout requires a value
for the X-axis. Click the "event_date_time" field and you should see the visualization
update.
4. In the Fields list, click and drag the seat_level field to the Group/Color box in the Field
wells pane. You can also use the slider below the X-axis to fit all of the data.
Let’s take this one step further by changing the chart type to "Clustered bar combo chart"
and adding the ticketholder field for the Lines.
5. In the Visual types area, choose the Clustered bar combo chart icon.
6. In the Fields list, click and drag the ticketholder field to the Lines box in the Field wells
pane.
7. In the Field wells pane, click the Lines box and choose Count Distinct for Aggregate. You
can then see the y-axis update on the right-hand side.
Feel free to experiment with other chart types and different fields to get a sense of the data.
3. For Name, type EventFrom.
4. For Data type, choose Datetime.
5. For Default value, select 2018-01-01 00:00.
6. Click Create, and then close the Parameter Added dialog box.
8. Click Create.
9. In the Parameter Added dialog box, click Filter and then click Close.
10. Click the drop-down menu for the EventFrom parameter and choose Add control.
11. For Display name, specify Event From and click Add.
12. Repeat the process to add a control for EventTo with display name Event To.
You should now be able to see and expand the Controls section above the chart.
Create a QuickSight Filter
To complete the process, we will wire up a filter to these controls for all visuals.
4. Choose to make this filter apply to All visuals.
5. For Filter type, choose Time range and Between.
6. For Start date parameter, choose EventFrom.
7. For End date parameter, choose EventTo.
8. Click Apply.
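To illustrate what a Between time-range filter does once it is driven by the two parameters, here is a minimal Python sketch. It is not QuickSight code. It assumes the filter is applied to the event_date_time field and that both endpoints are inclusive. The EventFrom value is the default from this lab, while the EventTo value and the sample timestamps are hypothetical.

```python
from datetime import datetime

# EventFrom default is taken from the lab; the EventTo value is a hypothetical example.
event_from = datetime(2018, 1, 1, 0, 0)
event_to = datetime(2018, 12, 31, 23, 59)

# Invented sample values standing in for the event_date_time column.
event_date_times = [
    datetime(2017, 12, 30, 19, 0),
    datetime(2018, 4, 1, 13, 0),
    datetime(2018, 9, 15, 18, 30),
    datetime(2019, 1, 5, 12, 0),
]


def in_range(ts, start, end):
    """Return True when ts lies between start and end (assumed inclusive)."""
    return start <= ts <= end


# Only rows inside the selected window remain visible in the visuals.
filtered = [ts for ts in event_date_times if in_range(ts, event_from, event_to)]
print(filtered)
```

Changing either control value in the dashboard corresponds to changing event_from or event_to here and re-evaluating the filter.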
Add Calculated Fields
In the next section, we will show you how to add calculated fields for "day of week" and "hour
of day" to your dataset and a new scatter plot for these two dependent variables.
1. Click the Add button on the top left and select Add a calculated field.
6. Click the Add button in the top left and choose Add visual.
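The exact QuickSight expressions for the two calculated fields are not shown in this section. As a rough illustration of what "day of week" and "hour of day" derive from a timestamp, the Python sketch below computes both values from sample datetimes. The helper names and sample values are hypothetical, and QuickSight may number or label days differently.

```python
from datetime import datetime

# Index positions match datetime.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(ts):
    """Name of the weekday on which the timestamp falls."""
    return DAY_NAMES[ts.weekday()]


def hour_of_day(ts):
    """Hour component of the timestamp, 0 through 23."""
    return ts.hour


# Invented sample values standing in for the event_date_time column.
sample_events = [
    datetime(2018, 4, 1, 13, 0),
    datetime(2018, 4, 7, 19, 30),
]

# Each pair corresponds to one point on a day-of-week versus hour-of-day scatter plot.
points = [(day_of_week(ts), hour_of_day(ts)) for ts in sample_events]
print(points)
```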
You have now completed the Amazon QuickSight part of the lab.