Analysing Ad Budget

The document describes analyzing an advertising budget dataset to predict sales. It includes:
1. Importing and analyzing the dataset, which contains advertising budgets and sales data.
2. Creating feature and target variables for a linear regression model that predicts sales from advertising budgets.
3. Splitting the data into training and test sets, fitting the model, and predicting sales for the test set.
4. Calculating the mean squared error to evaluate the model's performance.

Uploaded by

Srikanth


12/12/2019 Assignment 01

Assignment 01: Evaluate the Ad Budget Dataset of XYZ Firm

The comments/sections provided are your cues to perform the assignment. You don't need to limit yourself to the
number of rows/cells provided. You can add additional rows in each section to add more lines of code.

If at any point in time you need help on solving this assignment, view our demo video to understand the different
steps of the code.

Happy coding!

1: Import the dataset

In [1]: #Import the required libraries

import pandas as pd

In [3]: #Import the advertising dataset

df_data = pd.read_csv('C:\\Users\\srikanth.ganji\\Desktop\\@SG\\OLD_Users_srikanth.ganji_Desktp\\Desktop\\CDS\\lilsmp\\ASSIGNMENTS\\Lesson 8\\Advertising_Budget_and_Sales\\Advertising Budget and Sales.csv')

2: Analyze the dataset

In [9]: #View the initial few records of the dataset

df_data.head()

Out[9]:
   Unnamed: 0  TV Ad Budget ($)  Radio Ad Budget ($)  Newspaper Ad Budget ($)  Sales ($)
0           1             230.1                 37.8                     69.2       22.1
1           2              44.5                 39.3                     45.1       10.4
2           3              17.2                 45.9                     69.3        9.3
3           4             151.5                 41.3                     58.5       18.5
4           5             180.8                 10.8                     58.4       12.9
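
The `Unnamed: 0` column in the output above looks like a row index that was written into the CSV. A minimal sketch, assuming that is the case, of reading it as the index with `index_col=0`. The in-memory CSV reproduces only the five records shown and stands in for the real file; `csv_text` and `df_sample` are names introduced here for illustration.

```python
import io
import pandas as pd

# The five records shown above, as an in-memory CSV (stand-in for the real file).
csv_text = """,TV Ad Budget ($),Radio Ad Budget ($),Newspaper Ad Budget ($),Sales ($)
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
"""

# index_col=0 treats the first (unnamed) column as the row index
# instead of loading it as a data column called 'Unnamed: 0'.
df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_sample.columns.tolist())
print(df_sample.shape)
```

Read this way, the frame holds only the three budget columns and the sales column, so the index cannot be picked up as a feature by accident.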

In [10]: #Check the total number of elements in the dataset

df_data.size

Out[10]: 1000


3: Find the features or media channels used by the firm

In [7]: #Check the number of observations (rows) and attributes (columns) in the dataset
df_data.shape

Out[7]: (200, 5)
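
The `size` and `shape` outputs agree with each other: `size` counts every cell, so it equals rows times columns. A small sketch of that relationship, using a hypothetical 3 x 2 frame (`df_demo`) rather than the advertising data:

```python
import pandas as pd

# Hypothetical 3 x 2 frame, used only to illustrate the relationship.
df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df_demo.shape
# size counts every cell, so it equals rows * columns.
assert df_demo.size == n_rows * n_cols

# Applied to the outputs in this notebook: 200 rows * 5 columns.
print(200 * 5)  # 1000
```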

In [8]: #View the names of each of the attributes

df_data.columns

Out[8]: Index(['Unnamed: 0', 'TV Ad Budget ($)', 'Radio Ad Budget ($)',
       'Newspaper Ad Budget ($)', 'Sales ($)'],
      dtype='object')

4: Create objects to train and test the model; find the sales figures for each channel

In [11]: #Create a feature object from the columns

X_feature = df_data[['Newspaper Ad Budget ($)','Radio Ad Budget ($)','TV Ad Budget ($)']]

In [12]: #View the feature object

X_feature.head()

Out[12]:
   Newspaper Ad Budget ($)  Radio Ad Budget ($)  TV Ad Budget ($)
0                     69.2                 37.8             230.1
1                     45.1                 39.3              44.5
2                     69.3                 45.9              17.2
3                     58.5                 41.3             151.5
4                     58.4                 10.8             180.8

In [13]: #Create a target object (Hint: use the sales column as it is the response of the dataset)
Y_target = df_data['Sales ($)']

In [14]: #View the target object

Y_target.head()

Out[14]: 0 22.1
1 10.4
2 9.3
3 18.5
4 12.9
Name: Sales ($), dtype: float64


In [15]: #Verify if all the observations have been captured in the feature object
X_feature.shape

Out[15]: (200, 3)

In [16]: #Verify if all the observations have been captured in the target object
Y_target.shape

Out[16]: (200,)

5: Split the original dataset into training and testing datasets for the model

In [17]: #Split the dataset (by default, 75% is the training data and 25% is the testing data)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_feature, Y_target, random_state=1)
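
The 75/25 default mentioned in the comment can be checked on stand-in data, and the ratio can be changed with the `test_size` parameter. The sketch below assumes hypothetical arrays (`X_demo`, `y_demo`) with the same shapes as `X_feature` and `Y_target`; it does not use the advertising data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same shape as X_feature / Y_target.
X_demo = np.arange(600).reshape(200, 3)
y_demo = np.arange(200)

# Default split: test_size falls back to 0.25 when neither size is given.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)
print(X_tr.shape, X_te.shape)  # (150, 3) (50, 3)

# Explicit 80/20 split.
X_tr80, X_te20, y_tr80, y_te20 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)
print(X_tr80.shape, X_te20.shape)  # (160, 3) (40, 3)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the training and testing sets on every run.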

In [19]: #Verify if the training and testing datasets are split correctly (Hint: use the shape attribute)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

(150, 3)
(50, 3)
(150,)
(50,)

6: Create a model to predict the sales outcome

In [21]: #Create a linear regression model

from sklearn.linear_model import LinearRegression
linReg = LinearRegression()
linReg.fit(X_train,Y_train)

Out[21]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [24]: #Print the intercept and coefficients

print(linReg.intercept_)
print(linReg.coef_)

2.8769666223179176
[0.00345046 0.17915812 0.04656457]
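
The coefficients are printed in the same order as the columns of `X_feature`: Newspaper, Radio, TV. As a rough worked example, the sketch below pairs each printed coefficient with its column name and applies the fitted equation by hand to record 0 from the earlier output. The variable names are introduced here for illustration, and the coefficients are the rounded values as printed, so the result may differ slightly from `linReg.predict`.

```python
# Column order used when X_feature was built.
feature_names = ["Newspaper Ad Budget ($)", "Radio Ad Budget ($)", "TV Ad Budget ($)"]

# Values copied from the printed output above.
intercept = 2.8769666223179176
coefficients = [0.00345046, 0.17915812, 0.04656457]

# Pair each coefficient with its feature name.
for name, coef in zip(feature_names, coefficients):
    print(f"{name}: {coef}")

# Worked prediction for record 0 (Newspaper 69.2, Radio 37.8, TV 230.1):
# sales = intercept + sum(coefficient * budget)
record_0 = [69.2, 37.8, 230.1]
predicted_sales = intercept + sum(c * x for c, x in zip(coefficients, record_0))
print(round(predicted_sales, 2))  # 20.6
```

The hand-computed value of about 20.6 compares with recorded sales of 22.1 for that record. The radio coefficient is the largest of the three and the newspaper coefficient is close to zero.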


In [27]: #Predict the outcome for the testing dataset

y_pred = linReg.predict(X_test)
y_pred

Out[27]: array([21.70910292, 16.41055243, 7.60955058, 17.80769552, 18.6146359 ,
23.83573998, 16.32488681, 13.43225536, 9.17173403, 17.333853 ,
14.44479482, 9.83511973, 17.18797614, 16.73086831, 15.05529391,
15.61434433, 12.42541574, 17.17716376, 11.08827566, 18.00537501,
9.28438889, 12.98458458, 8.79950614, 10.42382499, 11.3846456 ,
14.98082512, 9.78853268, 19.39643187, 18.18099936, 17.12807566,
21.54670213, 14.69809481, 16.24641438, 12.32114579, 19.92422501,
15.32498602, 13.88726522, 10.03162255, 20.93105915, 7.44936831,
3.64695761, 7.22020178, 5.9962782 , 18.43381853, 8.39408045,
14.08371047, 15.02195699, 20.35836418, 20.57036347, 19.60636679])
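
The summary at the top lists mean squared error as the final evaluation step, which is not shown in this part of the notebook. A minimal sketch of the calculation follows, using hypothetical values (`y_true_demo`, `y_pred_demo`) for illustration only. In the notebook, the inputs would be `Y_test` and `y_pred`.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only;
# in the notebook these would be Y_test and y_pred.
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.0, 5.0, 9.0, 9.0])

# Mean squared error: average of the squared residuals.
mse = np.mean((y_true_demo - y_pred_demo) ** 2)
# Root mean squared error is in the same units as the target.
rmse = np.sqrt(mse)

print(mse)   # 1.25
print(rmse)
```

scikit-learn's `sklearn.metrics.mean_squared_error` computes the same quantity when given the actual and predicted values.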