Predictive Modeling Lab Manual
EX:1 Collect initial data for the telecom firm
AIM:
To write a program for collecting initial data for the telecom firm.
ALGORITHM:
Import from Microsoft Excel:
STEP 1: From the Sources palette, place an Excel node on the stream canvas.
STEP 2: Edit the Excel node. Click the Data tab, if not already selected.
STEP 3: In the File type box, ensure that Excel 2007-2016 (*.xlsx) is selected.
STEP 4: In the Import file box, select telco x customer data.xlsx from the location where it is stored.
STEP 5: Ensure that the option First row has column names is enabled.
Import from a text file:
STEP 1: From the Sources palette, add a Var. File node to the stream canvas.
STEP 2: Edit the Var. File node. Click the File tab, if not already selected.
STEP 3: In the File box, select telco x products.tab from the location where it is stored.
STEP 4: Ensure that the option Read field names from file is enabled.
STEP 5: In the Field delimiters section, click the Comma check box to disable it.
STEP 6: In the Field delimiters section, click the Tab check box to enable it.
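The Var. File settings above (field names read from the first row; Tab enabled and Comma disabled as delimiters) amount to parsing a tab-delimited text file. As a rough illustration outside Modeler, the Python sketch below uses the standard csv module; the columns CUSTOMER_ID and PRODUCT and their values are hypothetical stand-ins for the real contents of telco x products.tab.

```python
import csv
import io

# Hypothetical sample standing in for "telco x products.tab";
# the real file's columns and values may differ.
SAMPLE = "CUSTOMER_ID\tPRODUCT\n1001\tINTERNET\n1001\tTV\n1002\tPHONE\n"


def read_tab_delimited(text):
    """Parse tab-delimited text whose first row holds the field names.

    This mirrors 'Read field names from file' with only the Tab delimiter
    enabled; every value is returned as a string.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_tab_delimited(SAMPLE)
print(rows[0])  # {'CUSTOMER_ID': '1001', 'PRODUCT': 'INTERNET'}
```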
Import from an IBM SPSS Statistics file:
STEP 1: From the Sources palette, add a Statistics File node to the stream canvas.
STEP 2: Edit the Statistics File node. Click the Data tab, if not already selected.
STEP 3: In the File box, select telco x tariffs.sav from the location where it is stored.
STEP 4: Click the Use field format information to determine the storage check box to enable it.
Set measurement levels:
STEP 1: From the Field Ops palette, add a Type node downstream from the Microsoft
Excel node.
STEP 5: Click the cell in the POSTAL_CODE row, Measurement column, and then
STEP 6: Click the cell in the REGION row, Measurement column, and then
OUTPUT:
RESULT:
Thus, the program for collecting initial data for the telecom firm has been executed successfully.
EX:2 Understand the telecommunications data
AIM:
To write a program for understanding the telecommunications data.
ALGORITHM:
STEP 6: From the Output palette, add a Data Audit node downstream from
the Type node.
STEP 9: In the Distribution output window, click the Count column header twice so that the values are sorted in descending order of frequency. The Count column will then show the button
STEP 11: From the Edit menu, select Copy Microsoft Office Graphic Object.
STEP 12: The content will be copied to the clipboard. When you switch to a Microsoft
Office application
STEP 14: In the Data Audit output window, double-click the thumbnail for AGE.
Note: If the Data Audit output window is not displayed, use Alt+Tab to scroll through the
open windows until you locate it.
STEP 1: Edit the Type node and then click the Types tab, if not already selected.
STEP 2: Click the cell in the Values column, AGE row (where it reads [-1.0, 82.0]) and then
click Specify.
STEP 4: Click the Specify values and labels option, and then set Lower to 12 and Upper to 90.
STEP 6: Click the cell in the Check column, AGE row, and then click Warn from the drop-down list.
STEP 7: Close the Type dialog box.
STEP 10: From the Tools menu, click Stream Properties and then click Messages.
(Alternatively, double-click the Show stream messages button in the status bar in the
bottom right corner of the window.)
STEP 12: Edit the Type node, click the cell in the Check column in the END_DATE row
and then set the action to Discard.
STEP 14: Run the Table node that is downstream from the Type node.
STEP 15: Scroll to the right so that you can view END_DATE and then scroll down to
verify that END_DATE is never $null$.
STEP 17: Edit the Type node, and in the Check column for END_DATE, set the action
to None.
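Conceptually, the range check on AGE (Lower 12, Upper 90) with Check set to Warn or Discard behaves like the simplified Python sketch below. It is an illustration under stated assumptions, not Modeler's implementation: the function name check_range and the sample AGE values are hypothetical, missing values are treated as violations (consistent with END_DATE never being $null$ after Discard), and the sketch collects one message per violation, whereas Modeler reports violations in the stream messages.

```python
def check_range(records, field, lower, upper, action="warn"):
    """Apply a simplified Type-node style range check to one field.

    action="warn":    keep every record, but collect a message for each
                      missing or out-of-range value (like Check = Warn).
    action="discard": drop records whose value is missing or out of range
                      (like Check = Discard).
    """
    kept, messages = [], []
    for rec in records:
        value = rec.get(field)
        valid = value is not None and lower <= value <= upper
        if valid:
            kept.append(rec)
        elif action == "discard":
            continue  # the invalid record is removed from the data
        else:
            messages.append(f"{field}={value!r} outside [{lower}, {upper}]")
            kept.append(rec)  # Warn keeps the record
    return kept, messages


# Hypothetical AGE values: -1 and 95 violate the 12..90 range.
customers = [{"AGE": -1}, {"AGE": 45}, {"AGE": 95}]
kept_warn, messages = check_range(customers, "AGE", 12, 90)
print(len(kept_warn), len(messages))  # 3 2
kept_discard, _ = check_range(customers, "AGE", 12, 90, action="discard")
print(len(kept_discard))  # 1
```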
Define blank values:
STEP 2: Click the cell in the Missing column, AGE row and then click Specify.
STEP 2: Right-click any field and then click Select All from the context menu.
STEP 3: Right-click any field, click Set Missing from the context menu and then click On
(*).
STEP 4: Click the cell in the Missing column, REGION row and then select Specify.
STEP 5: The Define blanks option is enabled, and so is the Null option.
STEP 6: Close the REGION Values sub-dialog box and then close the Type dialog box.
OUTPUT:
RESULT:
Thus, the program for understanding the telecommunications data has been executed successfully.
EX:3 Set the unit of analysis for the telecommunications data
AIM:
To write a program to set the unit of analysis for the telecommunications data:
a) Remove duplicate records
b) Aggregate transactional data
c) Create flag fields and aggregate the data
ALGORITHM:
STEP 1: Import the data file telco x customer data.xlsx using an Excel source node.
STEP 2: From the Record Ops palette, add a Distinct node downstream from
the Excel node named telco x customer data.xlsx.
STEP 6: From the Mode drop down, click Create a composite record for each group.
Another way:
STEP 9: From the Mode drop down, click Include only the first record in each group.
STEP 10: Click the Pick from the set of available fields button, click All, and then click OK.
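Include only the first record in each group, with all fields selected as keys, keeps one copy of every fully identical record. A minimal Python sketch of that logic; the function name distinct_first and the sample records are hypothetical.

```python
def distinct_first(records, key_fields):
    """Keep the first record for each combination of key-field values.

    When every field is a key, only records that are identical in all
    fields count as duplicates.
    """
    seen = set()
    result = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result


# Hypothetical sample: the second record duplicates the first.
customers = [
    {"CUSTOMER_ID": 1001, "REGION": "North"},
    {"CUSTOMER_ID": 1001, "REGION": "North"},
    {"CUSTOMER_ID": 1002, "REGION": "South"},
]
unique = distinct_first(customers, ["CUSTOMER_ID", "REGION"])
print(len(unique))  # 2
```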
STEP 1: Import the data file telco x products.dat, and attach a Table node to it.
STEP 4: From the Record Ops palette, add an Aggregate node downstream
STEP 10: Click the check box in the Mean column so that it is disabled.
STEP 11: Ensure that the Include record count in field check box is enabled and then
type NUMBER_OF_PRODUCTS in the text box.
STEP 13: Click the Keys are contiguous check box to enable it.
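The Aggregate settings above produce one record per key together with a record count stored in NUMBER_OF_PRODUCTS. The Python sketch below illustrates the idea, assuming a hypothetical key field CUSTOMER_ID; itertools.groupby merges only adjacent records with equal keys, which is the same assumption Keys are contiguous makes about the input order.

```python
from itertools import groupby


def count_per_key(records, key_field, count_field="NUMBER_OF_PRODUCTS"):
    """Return one record per key with the number of input records.

    groupby only merges *adjacent* records with equal keys, so the input
    must already be grouped (for example, sorted) by the key field.
    """
    return [
        {key_field: key, count_field: sum(1 for _ in group)}
        for key, group in groupby(records, key=lambda rec: rec[key_field])
    ]


# Hypothetical transactional data: one record per product held.
products = [
    {"CUSTOMER_ID": 1001, "PRODUCT": "INTERNET"},
    {"CUSTOMER_ID": 1001, "PRODUCT": "TV"},
    {"CUSTOMER_ID": 1002, "PRODUCT": "PHONE"},
]
print(count_per_key(products, "CUSTOMER_ID"))
# [{'CUSTOMER_ID': 1001, 'NUMBER_OF_PRODUCTS': 2}, {'CUSTOMER_ID': 1002, 'NUMBER_OF_PRODUCTS': 1}]
```

If the input is not grouped by key, the same customer can appear in several output records, which is why Keys are contiguous should only be enabled when the data is already grouped by the key.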
STEP 1: From the Field Ops palette, add a Type node downstream from the Var. File node
named telco x products.
STEP 2: Edit the Type node. Click the Types tab, if not already selected.
STEP 5: From the Field Ops palette, add a SetToFlag node downstream from
STEP 8: Click the Set fields drop down and then click PRODUCT.
STEP 9: Select all values in the Available set values box and then move them into
the Create flag fields box.
STEP 12: Click the Aggregate keys check box to enable it.
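The SetToFlag node with Aggregate keys enabled turns each PRODUCT value into its own flag field, with one record per key. A hedged Python sketch, assuming a hypothetical key field CUSTOMER_ID, flag values T and F, and a hypothetical <field>_<value> naming pattern for the new fields (check the names Modeler actually generates):

```python
def set_to_flag(records, key_field, set_field, values,
                true_value="T", false_value="F"):
    """One flag field per value of set_field, aggregated to one record per key.

    A flag is true when any record for that key has that value.
    The "<set_field>_<value>" field names are a hypothetical convention.
    """
    result = {}
    for rec in records:
        key = rec[key_field]
        if key not in result:
            # First record for this key: start with every flag false.
            result[key] = {key_field: key}
            for value in values:
                result[key][f"{set_field}_{value}"] = false_value
        if rec[set_field] in values:
            result[key][f"{set_field}_{rec[set_field]}"] = true_value
    return list(result.values())


# Hypothetical transactional data.
products = [
    {"CUSTOMER_ID": 1001, "PRODUCT": "INTERNET"},
    {"CUSTOMER_ID": 1001, "PRODUCT": "TV"},
    {"CUSTOMER_ID": 1002, "PRODUCT": "PHONE"},
]
flags = set_to_flag(products, "CUSTOMER_ID", "PRODUCT", ["INTERNET", "PHONE", "TV"])
print(flags[0])
# {'CUSTOMER_ID': 1001, 'PRODUCT_INTERNET': 'T', 'PRODUCT_PHONE': 'F', 'PRODUCT_TV': 'T'}
```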
OUTPUT:
Create flag fields and aggregate the data:
RESULT:
Thus, the program for setting the unit of analysis (removing duplicate records, aggregating transactional data, and creating flag fields) has been executed successfully.
EX:4 Relationships in the telecommunications data
AIM:
To write a program for examining relationships in the telecommunications data.
ALGORITHM:
STEP 2: From the Output palette, add a Matrix node downstream from the Type node.
STEP 5: In the Rows box, select HANDSET. In the Columns box, select CHURN.
STEP 6: Click the Include missing values check box to disable it.
STEP 9: Click the Include row and column totals check box to enable it.
STEP 10: Click Run.
STEP 12: From the Graphs palette, add a Distribution node downstream from
the Type node.
STEP 15: In the Field box, select HANDSET. In the Color box, select CHURN.
STEP 16: Click the Normalize by color check box to enable it.
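The Matrix node output with row and column totals, and the Distribution graph with Normalize by color, can both be derived from a simple cross-tabulation. The Python sketch below is illustrative only; the HANDSET codes and CHURN values in the sample are hypothetical, and records with a missing value are skipped to match Include missing values being disabled.

```python
from collections import Counter


def crosstab(records, row_field, col_field):
    """Cross-tabulate two categorical fields with row and column totals."""
    counts = Counter(
        (rec[row_field], rec[col_field])
        for rec in records
        if rec[row_field] is not None and rec[col_field] is not None
    )
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = sum(table[r]["Total"] for r in rows)
    return table


def row_percentages(table):
    """Percentage of each column value within each row: the proportions a
    Distribution graph shows when 'Normalize by color' is enabled."""
    return {
        r: {c: 100.0 * n / cells["Total"] for c, n in cells.items() if c != "Total"}
        for r, cells in table.items()
        if r != "Total"
    }


# Hypothetical records; the last one has a missing HANDSET and is skipped.
pairs = [("A", "T"), ("A", "F"), ("A", "F"), ("B", "T"), ("B", "T"), (None, "F")]
records = [{"HANDSET": h, "CHURN": c} for h, c in pairs]
table = crosstab(records, "HANDSET", "CHURN")
print(table["Total"])  # {'F': 2, 'T': 3, 'Total': 5}
print(row_percentages(table)["B"])  # {'F': 0.0, 'T': 100.0}
```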
STEP 1: From the Output palette, add a Means node downstream from the Type node.
STEP 7: Click Run.
STEP 9: From the Graphs palette, add a Histogram node downstream from
the Type node.
STEP 12: In the Field box, select DROPPED_CALLS. In the Color box, select CHURN.
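The intermediate Means node settings are not listed above. Assuming CHURN is the grouping field and DROPPED_CALLS is a test field (the same pair used in the Histogram step), the core comparison is the mean of DROPPED_CALLS within each CHURN group, sketched below in Python with hypothetical values. The Means node also reports significance statistics, which this sketch omits.

```python
def group_means(records, group_field, test_field):
    """Mean of a continuous field within each category of a grouping field."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_field], []).append(rec[test_field])
    return {group: sum(values) / len(values) for group, values in groups.items()}


# Hypothetical values for illustration only.
records = [
    {"CHURN": "T", "DROPPED_CALLS": 4},
    {"CHURN": "T", "DROPPED_CALLS": 6},
    {"CHURN": "F", "DROPPED_CALLS": 1},
    {"CHURN": "F", "DROPPED_CALLS": 2},
    {"CHURN": "F", "DROPPED_CALLS": 3},
]
print(group_means(records, "CHURN", "DROPPED_CALLS"))  # {'T': 5.0, 'F': 2.0}
```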
OUTPUT:
Examine the relationship between categorical and continuous fields:
RESULT:
Thus, the program for examining relationships in the data has been executed successfully.
EX:5 Predict Customer churn in the telecom dataset
AIM:
To write a program to predict customer churn in the telecom dataset:
a) Build a model using CHAID
b) Examine the CHAID model
c) Apply the model to new data
ALGORITHM:
Import Dataset:
STEP 2: Insert a Select node that keeps only the valid records. You can insert a Table node to check the output.
STEP 3: From the Field Ops palette, add a Type node downstream from the Select node.
STEP 5: Click the Types tab, if not already selected. Click the Read Values button.
STEP 6: Click the cell in the CHURN row, Role column and then click Target from the
drop down.
STEP 7: Click the cell in the RETENTION row, Role column and then click None from
the drop down.
STEP 8: Click the cell in the DATA_KNOWN row, Role column and then
click None from the drop down.
Build Model:
STEP 1: Click the Modeling tab, and then add the CHAID node, located at the far right of the palette, downstream from the Type node.
STEP 2: Run the CHAID node (right-click it and then click Run).
STEP 3: Click the Viewer tab. Navigate to the root of the tree.
STEP 5: Scroll all the way to the right in the Table output window.
STEP 6: Close the CHAID model nugget; you will return to the stream.
STEP 7: You can also add an Analysis node from the Output palette to check the model's accuracy.
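The Analysis node's headline figure is the share of records whose predicted value matches the actual value. A minimal Python sketch, assuming the prediction is stored in a field named $R-CHURN (Modeler's usual naming for a tree model's predicted value; verify it against your stream) and using hypothetical scored records:

```python
def accuracy(records, target="CHURN", prediction="$R-CHURN"):
    """Share of records whose predicted value equals the actual value."""
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(1 for rec in records if rec[target] == rec[prediction])
    return correct / len(records)


# Hypothetical scored records: three of the four predictions are correct.
scored = [
    {"CHURN": "T", "$R-CHURN": "T"},
    {"CHURN": "F", "$R-CHURN": "F"},
    {"CHURN": "F", "$R-CHURN": "T"},
    {"CHURN": "T", "$R-CHURN": "T"},
]
print(accuracy(scored))  # 0.75
```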
OUTPUT:
Apply the model to new data:
RESULT:
Thus, the program to predict customer churn in the telecom dataset has been executed successfully.
EX:6 Create a segmentation Model
AIM:
To write a Program for Creating a Segmentation Model.
a) Create homogeneous groups (clusters) of customers based on usage patterns
ALGORITHM:
Create homogeneous groups of customers:
STEP 5: Click the Modeling palette, if not already selected. Click the Segmentation sub-palette on the left side.
STEP 6: Add a TwoStep node downstream from the Type node in the lower stream.
STEP 9: At the bottom of the left pane, beside View, click Clusters from the drop-down list.
OUTPUT:
RESULT:
Thus, the program for creating a segmentation model has been executed successfully.
EX:7 Using Functions in IBM SPSS Modeler
AIM:
To write a program for using functions in IBM SPSS Modeler:
a) Date and time functions
b) String functions
c) Statistical functions
d) Missing value functions
ALGORITHM:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream canvas.
STEP 2: Double-click the Var. File node to edit the Var. File node.
STEP 3: To the right of the File field, click the Browse for file button, navigate to the
relevant folder, click the telco x subset.csv file, and then click Open to import the data. Do
not close the Var. File dialog box.
STEP 4: In the Var. File dialog box, click Preview, and then scroll to the last fields in
the Preview output window.
STEP 7: From the Field Ops palette, double-click the Derive node to add it downstream
from the Var. File node.
STEP 8: Note: Placing node B downstream from node A means that the data flows from A
to B.
STEP 12: Note: Type the expression or use the Expression Builder to construct the
expression. In this course "enter" refers to typing or using the Expression Builder, according
to your preference. Here, when you use the Expression Builder, look for
the date_months_difference function in the Date and Time function group.
STEP 13: Click Preview, and then scroll to the last fields in the Preview output window.
The new field stores the number of months that elapsed between the two dates, as a real
number. If you want the result as an integer, use a function such as round, intof or to_integer.
STEP 16: From the Field Ops palette, add a Derive node downstream from
the Derive node named MONTHS_CUSTOMER.
STEP 17: Edit the Derive node, and then set the Mode to Multiple.
STEP 18: The Derive dialog box reflects the change. The Derive field box is replaced by a
Derive from box where the source fields are selected.
STEP 19: Under Derive from, click the Pick from the set of available fields button,
Ctrl+click CONNECT_DATE and END_DATE, and then click OK.
STEP 20: Beside Field name extension, replace the current extension with _MONTH.
STEP 22: Click Preview, and then scroll to the last fields in the Preview output window.
STEP 1: From the Output palette, add a Table node downstream from the Derive node named _MONTH.
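The MONTHS_CUSTOMER field derived with date_months_difference is a real number rather than a whole count of calendar months. The Python sketch below approximates it by dividing the number of days by an average month length of 30.4375 days (365.25 / 12); that constant is an assumption about how the CLEM function computes months, so check the Modeler documentation if exact values matter. The dates are hypothetical.

```python
from datetime import date

# Assumed average month length: 365.25 days / 12 months = 30.4375 days.
DAYS_PER_MONTH = 365.25 / 12


def months_difference(start, end):
    """Approximate number of months from start to end, as a real number."""
    return (end - start).days / DAYS_PER_MONTH


# 2020 is a leap year, so this span is 366 days.
months = months_difference(date(2020, 1, 1), date(2021, 1, 1))
print(round(months, 2))  # 12.02
print(round(months))  # 12, the integer version
```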
STEP 9: Click Preview, and then move E-MAIL ADDRESS OK next to E-MAIL ADDRESS in the Preview output window. (Note: move E-MAIL ADDRESS OK by dragging it to the left until it is just to the right of E-MAIL ADDRESS.)
STEP 12: From the Field Ops palette, add a Derive node downstream from
the Derive node named E-MAIL ADDRESS OK.
STEP 15: Beside Derive as, click Flag from the list.
STEP 16: Under True when, enter length ('E-MAIL ADDRESS') = 0. If you use
the Expression Builder, locate the length function in the String function group.
STEP 19: Under True when, enter length (trim ('E-MAIL ADDRESS')) = 0. If you use the Expression Builder, locate the length and trim functions in the String function group.
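The difference between STEP 16 and STEP 19 is that length alone treats a value made only of spaces as a non-empty address, while length(trim(...)) does not. A small Python sketch of both checks; str.strip stands in for CLEM's trim as an approximation (it also removes tabs and newlines), and the sample values are hypothetical.

```python
def no_email_length_only(address):
    """True when the string is empty, like length(...) = 0."""
    return len(address) == 0


def no_email_trimmed(address):
    """True when the string is empty after removing leading and trailing
    whitespace, like length(trim(...)) = 0."""
    return len(address.strip()) == 0


for value in ["", "   ", "a.b@example.com"]:
    print(repr(value), no_email_length_only(value), no_email_trimmed(value))
# '' True True
# '   ' False True
# 'a.b@example.com' False False
```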
STEP 23: From the Field Ops palette, add a Derive node downstream from
the Derive node named NO E-MAIL ADDRESS.
STEP 26: Under Formula, enter locchar_back (`.`, length ('E-MAIL ADDRESS'), 'E-MAIL ADDRESS'). If you use the Expression Builder, locate the locchar_back and length functions in the String function group.
STEP 27: Click Preview, and then move POSITION PERIOD next to E-MAIL
ADDRESS.
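locchar_back searches backward from the given position, so starting at length('E-MAIL ADDRESS') finds the last period in the address. A Python sketch of the same idea using str.rfind; positions are 1-based as in CLEM, and returning 0 when there is no period is this sketch's choice, so check how Modeler handles that case. The addresses are hypothetical.

```python
def position_last_period(address):
    """1-based position of the last '.' in address, or 0 if there is none."""
    # str.rfind returns a 0-based index, or -1 when the character is absent,
    # so adding 1 gives a 1-based position, or 0 when not found.
    return address.rfind(".") + 1


print(position_last_period("john@mail.example.com"))  # 18
print(position_last_period("no-period-here"))  # 0
```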
STEP 30: From the Field Ops palette, add a Derive node downstream from
the Derive node named POSITION PERIOD.
STEP 33: Beside Derive as, click Conditional from the list.
STEP 36: Under Else, type undef.
STEP 37: Click Preview, and then scroll to the last fields in the Preview output window.
STEP 1: From the Field Ops palette, add a Derive node downstream from the Derive node
named DOMAIN NAME.
STEP 5: Click Preview, and then move MEAN REVENUES next to D_REVENUES.
STEP 6: From the Field Ops palette, add a Derive node downstream from the Derive node
named MEAN REVENUES.
STEP 10: Click Preview, and then move SUM REVENUES next to D_REVENUES.
STEP 1: From the Field Ops palette, add a Derive node downstream from the Derive node
named SUM REVENUES.
STEP 9: From the Field Ops palette, add a Derive node downstream from the Derive node
named SUM REVENUES OK.
STEP 12: Beside Derive As, click Flag from the list.
STEP 13: Under True when, enter not (@NULL (END_DATE)). If you use
the Expression Builder, locate the @NULL function in the @ Functions function group.
STEP 14: Click Preview, and then move CHURN next to END_DATE.
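The CHURN flag marks a customer as churned whenever END_DATE has a value. In Python terms, with None standing in for $null$ and T/F assumed as the flag values, the expression not(@NULL(END_DATE)) reduces to the sketch below; the date is hypothetical.

```python
def churn_flag(end_date):
    """'T' if END_DATE has a value, 'F' if it is missing (None)."""
    return "F" if end_date is None else "T"


print(churn_flag("2021-03-31"))  # T
print(churn_flag(None))  # F
```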
OUTPUT:
Use date functions to derive fields:
Use statistical functions to derive fields:
RESULT:
Thus, the program for using functions in IBM SPSS Modeler has been executed successfully.
EX:8 Add fields to the telecommunications data
AIM:
To write a program for adding fields to the telecommunications data:
a) Derive fields as formula
b) Derive fields as flag or nominal
c) Reclassify categorical fields
d) Bin a continuous field into a categorical field with equal counts
ALGORITHM:
STEP 2: From the Field Ops palette, add a Derive node downstream from the Type node.
STEP 7: From the Derive as drop down, click Formula, if not already selected.
STEP 11: Click the Check button.
STEP 13: From the Field Ops palette, add a Derive node downstream from
the Derive node named BILL_PEAK.
STEP 18: Field type: <Default>; the field will then be auto-typed as Continuous
STEP 21: From the Field Ops palette, add a Derive node downstream from
the Derive node named BILL_OFFPEAK.
STEP 26: Field type:<Default>; the field will then be auto-typed as Continuous
STEP 1: From the Field Ops palette, add a Derive node downstream from the Derive node
named BILL_TOTAL.
STEP 6: Field type: Flag (should be set automatically to Flag when you choose Derive as:
Flag)
STEP 11: From the Field Ops palette, add a Derive node downstream from
the Derive node named BILL_GT_0.
STEP 17: Click the cell in the Set field to column and then type 1.
STEP 18: Click the cell in the If this condition is true column and then
type BILL_TOTAL <= 100.
STEP 19: Repeat the previous two steps for the following values and expressions:
STEP 20: Set field to: 2 If this condition is true: BILL_TOTAL <= 200
STEP 21: Set field to: 3 If this condition is true: BILL_TOTAL > 200
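The three DISCOUNT rules only make sense if they are checked in order, so that BILL_TOTAL <= 200 applies only to totals above 100. A Python sketch, assuming that the first condition that holds decides the value; the function name discount is hypothetical.

```python
def discount(bill_total):
    """Return the DISCOUNT value for a bill total.

    Rules are checked from top to bottom and the first condition that
    holds decides the value, so rule 2 only sees totals above 100.
    """
    rules = [
        (1, lambda total: total <= 100),
        (2, lambda total: total <= 200),
        (3, lambda total: total > 200),
    ]
    for value, condition in rules:
        if condition(bill_total):
            return value
    return None  # reached only if no rule holds, e.g. for float("nan")


print([discount(t) for t in (50, 100, 150, 200, 250)])  # [1, 1, 2, 2, 3]
```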
STEP 1: From the Field Ops palette, add a Reclassify node downstream from
the Derive node named DISCOUNT.
STEP 3: Click the Settings tab, if not already selected.
STEP 7: Click the Get button to populate the ORIGINAL VALUE column with values
(this requires that HANDSET is instantiated upstream, which is the case here).
STEP 8: Beside For unspecified values use, click the Default value option and then ensure
that the value is undef.
STEP 1: From the Field Ops palette, add a Binning node downstream from
the Reclassify node named BRAND.
STEP 5: Click the Binning method drop down to view the options.
STEP 6: In the Binning method drop-down list, select Tiles (equal count).
STEP 8: Click the Custom N check box to enable it.
STEP 11: Connect the Binning node to the Distribution node which is already on the
canvas, so that the Distribution node is downstream from the Binning node.
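Tiles (equal count) binning ranks the values and splits them into N groups of (nearly) equal size. The field being binned and the Custom N value are not given above, so the sketch below uses hypothetical values and N = 4; it also splits tied values arbitrarily, whereas Modeler offers its own tie-handling options.

```python
def equal_count_tiles(values, n_tiles):
    """Assign each value a tile number 1..n_tiles with (nearly) equal counts.

    Values are ranked in ascending order; the value at rank i goes to
    tile floor(i * n_tiles / len(values)) + 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for rank, index in enumerate(order):
        tiles[index] = rank * n_tiles // len(values) + 1
    return tiles


# Hypothetical values; each of the 4 tiles receives 2 of the 8 records.
print(equal_count_tiles([120, 15, 300, 75, 40, 220, 90, 180], 4))
# [3, 1, 4, 2, 1, 4, 2, 3]
```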
OUTPUT:
Derive fields as formula:
Reclassify categorical fields:
RESULT:
Thus, the program for adding fields to the telecommunications data has been executed successfully.
EX:9 Create a Linear Regression Model to Predict Employee Salaries
AIM:
ALGORITHM:
STEP 1: From the Sources palette, add a Var. File node to a blank stream canvas, edit the node, point to employee_data.txt, and then close the Var. File dialog box.
STEP 2: From the Output palette, add a Table node downstream from the Var. File node, run it, and then examine the output. The dataset contains 474 employees.
STEP 4: From the Output palette, add a Data Audit node downstream from the Var.
File node, run it, and then examine the output.
STEP 1: From Field Ops, add a Type node downstream from the Var. File node.
STEP 5: The Role for the fields from gender to months_previous_experience is set to Input.
STEP 1: From the Modeling palette, add a Linear node downstream from the Type node.
STEP 4: Click the Basics item and clear the Automatically prepare data check box.
STEP 5: Click the Model Selection item and set the Model Selection method to Include all predictors.
STEP 7: Edit the generated model nugget, and then click the Model Summary item in the
pane on the left.
STEP 8: Click the Predictor Importance item in the pane on the left.
STEP 9: The job_category field is by far the most important predictor. Gender is the second most important field. Region and age are the least important.
STEP 10: Click the Predicted by Observed item in the pane on the left.
STEP 11: The points are not scattered around the diagonal, and the predicted values seem to break up into two groups.
STEP 12: Click the Coefficients item in the pane on the left, and then, from the Style list, select Table.
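The Linear node fits all inputs jointly, but the least-squares idea behind the coefficients and the Predicted by Observed view is easiest to see with a single predictor. The Python sketch below fits salary = intercept + slope × months of experience and computes R², a common measure of fit. It is not the Linear node's algorithm, and the numbers are hypothetical and deliberately lie on a straight line, so real data will not fit this exactly.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """Proportion of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    predicted = [intercept + slope * a for a in x]  # as in Predicted by Observed
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: months of previous experience and salary.
months = [0, 12, 24, 36]
salary = [30000, 33000, 36000, 39000]
intercept, slope = fit_simple_linear(months, salary)
print(intercept, slope)  # 30000.0 250.0
print(r_squared(months, salary, intercept, slope))  # 1.0
```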
OUTPUT:
RESULT:
Thus, the program for creating a linear regression model to predict employee salaries has been executed successfully.
EX:10 Use Logistic Regression to Predict Response to a Charity Promotion Campaign
AIM:
To write a program that uses logistic regression to predict response to a charity promotion campaign.
ALGORITHM:
STEP 1: From the Sources palette, double-click the Var. File node to add it to the stream, and then import the dataset charity.csv.
STEP 2: From the Output palette, add a Data Audit node downstream from the Var. File node, and then run the Data Audit node.
STEP 3: Double-click the Sample Graph for the response to campaign field.
STEP 2: Set the Training partition size to 70% and the Testing partition size to 30%.
Ensure that the Repeatable partition assignment option is enabled, with seed
value 1234567.
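A repeatable 70/30 partition assigns each record at random to Training or Testing using a fixed seed, so rerunning the stream gives the same split. A Python sketch of that idea; Python's random generator is not Modeler's, so the individual assignments will differ from the Partition node's even with the same seed 1234567, and the function name partition is hypothetical.

```python
import random


def partition(n_records, train_share=0.7, seed=1234567):
    """Label each record 'Training' or 'Testing' at random, repeatably.

    The same seed always produces the same sequence of labels.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return ["Training" if rng.random() < train_share else "Testing"
            for _ in range(n_records)]


labels = partition(10)
print(labels == partition(10))  # True: same seed, same split
```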
STEP 3: From the Field Ops palette, add a Type node downstream from the Partition node.
STEP 5: Click the Read Values button. The Values column is populated with values from the data.
STEP 6: Set the Role for gender, age, mosaic bands, pre-campaign expenditure, and pre-campaign visits to Input.
STEP 8: Ensure that the Role for the Partition field is set to Partition.
STEP 1: From the Modeling palette, add a Logistic node downstream from
the Type node.
STEP 6: Add a second Logistic node downstream from the Type node.
STEP 9: For Procedure, select the Binomial option.
STEP 10: Below Categorical inputs, select mosaic bands, and for Base Category, select First.
STEP 11: Click the Annotations tab, select the Custom option, type custom, and then close the Logistic dialog box.
STEP 12: Select the two Logistic nodes, right-click one of them, and click Run Selection.
STEP 13: Edit the Logistic model nugget named response to campaign, click the
Advanced tab, and scroll down to the Variables in the Equation table (the last table in the
output).
STEP 15: Edit the Logistic model nugget named custom, click the Advanced tab, and
scroll to the Categorical Variables Codings table.
STEP 16: You can add an Analysis node at the end to check the accuracy levels.
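The Categorical Variables Codings table and the Variables in the Equation table rest on two ideas: indicator coding with the first category as the base (all zeros), and the logistic formula that turns the B coefficients into a probability. A Python sketch with hypothetical mosaic band labels and coefficients; it illustrates the formulas, not Modeler's estimation procedure.

```python
import math


def indicator_coding(categories):
    """Indicator coding with the first category as the base: the base is
    all zeros and every other category gets its own 0/1 column."""
    others = categories[1:]
    return {cat: [1 if cat == other else 0 for other in others]
            for cat in categories}


def logistic_probability(intercept, coefficients, inputs):
    """P(response) = 1 / (1 + exp(-(B0 + B1*x1 + ... + Bk*xk)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-z))


print(indicator_coding(["band 1", "band 2", "band 3"]))
# {'band 1': [0, 0], 'band 2': [1, 0], 'band 3': [0, 1]}
print(logistic_probability(0.0, [], []))  # 0.5
```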
OUTPUT:
RESULT: