AirBnB Data Analysis - LLD

This low-level design document describes the architecture for analyzing Airbnb listings and reviews data. It outlines importing the CSV datasets, defining use cases, and performing exploratory data analysis and data preprocessing. The technical specifications section provides an overview of the listings and reviews datasets, including the number of records, features, and missing values. The architecture will import the datasets, analyze the data through defined use cases, and visualize insights.

Low Level Design (LLD)



Travel Data Analysis
(AirBnB Data Analysis)

Written By / Author: Lokesh Attarde
Document Version: LLD-V1.0
Last Revised Date: 15/10/2021

AIRBNB DATA ANALYSIS 1



Document Version Control

Date Issued | Version | Description | Author
15/10/2021 | LLD-V1.0 | First Version of Complete LLD | Lokesh Attarde


Contents
Document Version Control
Abstract
1 Introduction
1.1 Why this Low-Level design document?
1.2 Scope
1.3 Constraints
2 Technical Specifications
2.1 Listings Dataset
2.1.1 Listings Dataset Overview
2.2 Reviews Dataset
2.2.1 Reviews Dataset Overview
3 Architecture
3.1 Architecture Description
3.1.1 Data Description
3.1.2 Define the Use Cases
3.1.3 Import the Dataset
3.1.4 Exploratory Data Analysis (EDA)
3.1.5 Data Pre-processing, Data Cleaning & Imputation (Handling the Categorical & Numerical Variables)
3.1.6 Analyse the Data
3.1.7 Visualize & Share Meaningful Insights
4 Technology Stack


Abstract
Airbnb is an American company that operates an online marketplace for lodging, primarily homestays
for vacation rentals, and tourism activities. It connects local hosts who want to rent out their homes
with travelers who are looking for accommodation in that locality. The platform enables hosts to list
their available space and earn extra income in the form of rent, and it enables travelers to book
unique homestays from local hosts, saving them money and giving them a chance to interact with locals.
With the rise of new technology and innovation, the travel industry is advancing through Data Science
and Analytics. Data analysis can help travel businesses understand their business from a quite
different angle and improve the quality of their service by identifying its weak areas. This study
demonstrates how different analyses help make better business decisions and help analyze customer
trends and satisfaction, which can lead to new and better products and services. Analyses such as
Exploratory Data Analysis and Descriptive Analysis are performed on a variety of use cases to extract
the key insights from this data, on which business decisions will be based.


1 Introduction

1.1 Why this Low-Level design document?


The purpose of this Low-Level Design (LLD) document is to give the internal logical design of the
actual program code for the Airbnb Data Analysis project. The LLD describes the class diagrams with
the methods and relations between classes, and the program specifications. It describes the modules
so that the programmer can code the program directly from the document. This document is intended
for both the stakeholders and the developers of this project and will be proposed to higher
management for approval.
The main objective of the project is to analyse various aspects of Airbnb listings through different
use cases. This not only helps in understanding meaningful relationships between attributes but also
allows us to do our own research and come up with our own findings.

1.2 Scope
Low-level design (LLD) is a component-level design process that follows a step-by-step refinement
process. This process can be used for designing data structures, the required software architecture,
source code and, ultimately, performance algorithms. Overall, the data organization may be defined
during requirement analysis and then refined during data design work.
This study demonstrates how different analyses help make better business decisions and help
analyse customer trends and satisfaction, which can lead to new and better products and services.

1.3 Constraints
The analysis must be user-friendly, the code must be neat and clean, and EDA must be automated as
much as possible because this saves a large amount of time. Moreover, users should not need any
coding knowledge, as the insights they are looking for are described in detail with the respective
visuals.


2 Technical Specifications
2.1 Listings Dataset -

2.1.1 Listings Dataset Overview –


The Listings dataset consists of a table with 11922 records and 20 features: 10 Continuous features
and 10 Categorical features. In total, 11.4% of the records have missing values.
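The record, feature and feature-type counts above are produced by the EDA step. As a rough cross-check, a similar overview can be computed directly with pandas. This is a minimal sketch with a hypothetical helper name (`dataset_overview`), assuming that numeric dtypes count as "continuous" and every other dtype as "categorical"; dataprep applies its own type inference, so its counts may not match this simple rule exactly.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Count records, features and feature types of a DataFrame."""
    # Simplifying assumption: numeric dtypes are "continuous" and every other
    # dtype is "categorical". dataprep uses its own type inference, so its
    # counts can differ from this rule.
    continuous = df.select_dtypes(include="number").columns
    categorical = df.columns.difference(continuous)
    return {
        "records": len(df),
        "features": df.shape[1],
        "continuous": len(continuous),
        "categorical": len(categorical),
    }
```

Calling `dataset_overview` on the imported listings DataFrame returns the four counts as a dictionary, which can be compared against the figures quoted in the report.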

2.2 Reviews Dataset -

2.2.1 Reviews Dataset Overview –
The following EDA report illustrates that the Reviews dataset consists of a table with 344404 records
and 6 features: 3 Continuous features and 3 Categorical features. Only 418 cells have missing values.
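The two overviews measure missingness differently: the Listings figure is a share of records, while the Reviews figure counts cells. A record with several empty fields counts once as a record but several times as cells, so the two numbers are not directly comparable. The minimal sketch below computes both measures with pandas; the helper name `missing_summary` is hypothetical.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> dict:
    """Contrast missing cells with records that contain at least one missing value."""
    missing_mask = df.isna()
    n_rows, n_cols = df.shape
    total_cells = n_rows * n_cols
    missing_cells = int(missing_mask.sum().sum())
    rows_with_missing = int(missing_mask.any(axis=1).sum())
    return {
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / total_cells if total_cells else 0.0,
        "rows_with_missing": rows_with_missing,
        "rows_with_missing_pct": 100 * rows_with_missing / n_rows if n_rows else 0.0,
        # Per-column counts, largest first, to see which features need imputation.
        "per_column": missing_mask.sum().sort_values(ascending=False).to_dict(),
    }
```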


3 Architecture

3.1 Architecture Description –


3.1.1 Data Description –
As seen earlier, the listings dataset has around 1.19 lakh records with 20 different features,
distributed as 10 Continuous and 10 Categorical features. The reviews dataset has around 3.44 lakh
records with 6 different features, of which 3 are Continuous and 3 are Categorical. Both datasets are
provided in Comma Separated Values (.csv) format.

3.1.2 Define the Use Cases –


At this stage, based on the given dataset and business problems, we define several use cases on
which to perform the analysis. These use cases help extract the key insights from this data on which
business decisions will be based. Furthermore, they not only help in understanding meaningful
relationships between attributes but also allow us to do our own research and come up with our own
findings.

3.1.3 Import the Dataset –


As the dataset is provided in Comma Separated Values (.csv) format, we can import it using the
Pandas read_csv() function.
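A minimal sketch of the import step. Only `pandas.read_csv` comes from the text; the helper `load_datasets` and the file names `listings.csv` and `reviews.csv` are hypothetical. `low_memory=False` is a real `read_csv` option: it parses each file in one pass so every column gets a single inferred dtype, instead of pandas warning about mixed types in large files such as the 3.44 lakh-row reviews table.

```python
import pandas as pd


def load_datasets(listings_source, reviews_source):
    """Read the listings and reviews CSV data into two DataFrames.

    Each argument may be a file path or a file-like object, as accepted by pd.read_csv.
    """
    # low_memory=False parses each file in one pass so that every column gets a
    # single inferred dtype (avoids mixed-type warnings on large files).
    listings = pd.read_csv(listings_source, low_memory=False)
    reviews = pd.read_csv(reviews_source, low_memory=False)
    return listings, reviews
```

In the notebook this would be called as `listings, reviews = load_datasets("listings.csv", "reviews.csv")`, with the hypothetical paths replaced by the actual dataset locations.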

3.1.4 Exploratory Data Analysis (EDA) –
• "Exploratory Data Analysis" (EDA) is the "Data Exploration" step in the Data Analysis Process,
where a number of techniques are used to better understand the dataset being used.
• Understanding the dataset can refer to a number of things, including but not limited to:
  ◦ Extracting important "Variables".
  ◦ Identifying "Outliers", "Missing Values", or "Human Error".
  ◦ Understanding the relationships between variables.
  ◦ Ultimately, maximizing our insights into a dataset and minimizing potential "Error" that may
occur later in the process.
• In other words, it gives you a better understanding of the "Variables" and the "Relationships"
between them.
• Here, we use the dataprep module to automate our EDA process.
• It provides the following information:
  ◦ Overview: detects the types of columns in a DataFrame.
  ◦ Variables: variable type, unique values, distinct count, missing values.
  ◦ Quartile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range.
  ◦ Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation,
coefficient of variation, kurtosis, skewness.
  ◦ Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall
matrices.
  ◦ Missing Values: bar chart, heatmap and spectrum of missing values.
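The project automates this step with dataprep, whose `create_report` function produces the full report. To make the listed statistics concrete, here is a minimal pandas-only sketch that computes a subset of them; it is an illustration, not how dataprep computes them internally, and the helper name `eda_summary` is hypothetical. Kendall correlation is left out because pandas delegates it to SciPy; `num.corr(method="kendall")` can be added where SciPy is installed.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """Compute a subset of the EDA statistics listed above using only pandas."""
    num = df.select_dtypes(include="number")

    # Variables: type, distinct count and missing values for every column.
    variables = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "distinct": df.nunique(),
            "missing": df.isna().sum(),
        }
    )

    # Quartile statistics, plus range and interquartile range derived from them.
    quartiles = num.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
    quartiles.index = ["min", "q1", "median", "q3", "max"]
    quartiles.loc["range"] = quartiles.loc["max"] - quartiles.loc["min"]
    quartiles.loc["iqr"] = quartiles.loc["q3"] - quartiles.loc["q1"]

    # Descriptive statistics for each numerical column.
    descriptive = pd.DataFrame(
        {
            "mean": num.mean(),
            "mode": num.mode().iloc[0],  # first mode if there are ties
            "std": num.std(),
            "sum": num.sum(),
            # Median absolute deviation: median of |x - median(x)|.
            "mad": (num - num.median()).abs().median(),
            # Coefficient of variation: standard deviation relative to the mean.
            "cv": num.std() / num.mean(),
            "skewness": num.skew(),
            "kurtosis": num.kurt(),
        }
    )

    return {
        "variables": variables,
        "quartiles": quartiles,
        "descriptive": descriptive,
        # Correlation matrices (Kendall omitted: pandas needs SciPy for it).
        "pearson": num.corr(method="pearson"),
        "spearman": num.corr(method="spearman"),
    }
```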

3.1.5 Data Pre-processing, Data Cleaning & Imputation (Handling the Categorical & Numerical Variables) –
Data pre-processing is the process of preparing the raw data and making it suitable for analysis.
This involves a lot of data cleaning and handling missing values with an imputation technique suited
to the nature of each variable, i.e. whether it is categorical or numerical. In this project, missing
values were substituted (imputed) using the mean, median or mode according to the nature of each
variable. We also removed the columns that do not take part in our analysis.
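A minimal sketch of the cleaning and imputation rules described above: drop the unused columns, then fill numerical columns with the mean or median and categorical columns with the mode. The helper name `clean_and_impute`, its parameters and the median default are assumptions; the document does not say which statistic was used for which column.

```python
import pandas as pd


def clean_and_impute(df: pd.DataFrame, drop_columns=(), numeric_strategy: str = "median") -> pd.DataFrame:
    """Drop unused columns, then fill missing values according to each variable's nature."""
    if numeric_strategy not in {"mean", "median"}:
        raise ValueError("numeric_strategy must be 'mean' or 'median'")
    # errors="ignore" lets the same column list be reused across datasets.
    out = df.drop(columns=list(drop_columns), errors="ignore").copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical variable: fill with the mean or the median.
            fill = out[col].mean() if numeric_strategy == "mean" else out[col].median()
        else:
            # Categorical variable: fill with the most frequent value (mode).
            modes = out[col].mode()
            if modes.empty:
                continue  # column is entirely missing; nothing to impute from
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out
```

The median is the safer default for skewed variables such as prices, because a few very expensive listings pull the mean upward; the input DataFrame is left unchanged.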

3.1.6 Analyse the Data –
Once pre-processing is done, we can move on to the actual analysis, where we write the code and
logic that prepares our data according to the defined use cases.
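As an illustration of how one use case might be coded, the sketch below summarises nightly price by room type. The use case, the helper name `price_by_room_type` and the column names `room_type` and `price` are all assumptions (such columns are common in Airbnb listings data but are not named in this document), and `price` is assumed to be numeric already, i.e. any currency formatting was removed during pre-processing.

```python
import pandas as pd


def price_by_room_type(listings: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical use case: how does the nightly price vary by room type?"""
    summary = (
        listings.groupby("room_type")["price"]
        .agg(listings_count="count", mean_price="mean", median_price="median")
        .sort_values("median_price", ascending=False)
    )
    # Share of all listings that each room type represents, in percent.
    summary["share_pct"] = 100 * summary["listings_count"] / summary["listings_count"].sum()
    return summary.round(2)
```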

3.1.7 Visualize & Share Meaningful Insights –


Finally, it's time to turn our data into visual representations. In short, data visualization is the
process of translating large datasets and metrics into charts, graphs and other visuals such as bar
plots, pie charts, heat maps, box plots, scatter plots and many more. The resulting visual
representation makes it easier to identify and share insights about the information represented in
the data.
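A minimal sketch of turning one analysis result into a bar plot with Matplotlib, which is part of the technology stack. The helper `save_bar_plot` and any output path are hypothetical. The sketch selects the non-interactive "Agg" backend so the chart can be written to a file outside a notebook; inside Jupyter Notebook that line can be dropped and the figure displayed instead of closed.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the figure can be written to a file
import matplotlib.pyplot as plt
import pandas as pd


def save_bar_plot(values: pd.Series, title: str, ylabel: str, path: str) -> None:
    """Render a pandas Series as a bar plot and save it as an image file."""
    fig, ax = plt.subplots(figsize=(8, 4))
    values.plot(kind="bar", ax=ax, color="tab:red")
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.set_xlabel(values.index.name or "")
    ax.tick_params(axis="x", labelrotation=30)
    fig.tight_layout()
    fig.savefig(path, dpi=120)
    plt.close(fig)  # free memory when generating many charts in a loop
```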

Here is a glimpse of one of our visuals:

All these different analyses help make better business decisions and analyse customer trends and
satisfaction, which can lead to new and better products and services.


4 Technology Stack

Data Manipulation & Mathematical Computation Library | Pandas, NumPy
Visualization Library | Matplotlib, Seaborn, Plotly, WordCloud, etc.
EDA | dataprep
Dataset | .CSV Format
IDE | Jupyter Notebook
