Expanding Toolsets for Prediction:
Analysis Using Larger Datasets With Many Variables

Regression vs. Decision Trees vs. Neural Networks

Spring 2006

Serge Herzog, PhD
Director, Institutional Analysis
University of Nevada, Reno
Reno, NV 89557
Serge@[Link]

Sponsored by: Center for Research Design and Analysis & the Office of the Vice President for Research

Outline

- Introduction
- Models
- Data
- Variables
- Exploratory analysis
- Model results
- Conclusion
- Clementine® software

Introduction
From Traditional Statistics to Data Mining

Definition of data mining (DM):
"…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data…, and by using pattern recognition technologies, as well as statistical and mathematical techniques." (The Gartner Group, Inc.)

DM uses algorithms (i.e., a finite set of well-defined instructions) to:
- Classify
- Categorize
- Estimate (predict)
- Visualize
events or outcomes of interest.

Types of DM algorithms:
- Decision trees
- Artificial neural networks (ANNs)
- Cluster analysis
- Traditional techniques (e.g., regression, PCA)

Typically, DM projects follow the steps of the Cross-Industry Standard Process for Data Mining (CRISP-DM): Data Understanding & Preparation → Modeling → Evaluation → Deployment.

Introduction
From Traditional Statistics to Data Mining

Unsupervised DM:
- Explores and examines as-yet unknown patterns in data via classification and grouping techniques (e.g., clustering).

Supervised DM:
- Predictive models are built, or trained, using data for which the response variable is already known (e.g., rule-induction decision trees, neural networks, regression).
- Generated models use the learned information (based on the initial test dataset) to predict outcomes with validation data (or a holdout sample) and to adjust for overprediction in the test data; once acceptable levels of prediction are attained, the model can be applied to predict outcomes with new data.

Introduction
Purpose of Study

- Evaluate the accuracy of predicting a multinomial outcome of a grouped (categorical) variable using:
  - Logistic regression
  - Decision trees
  - Neural networks
- Highlight potential operational benefits in the context of the case study used.

Introduction
Focus of the Analysis

- Weigh the comparative accuracy of each analytical approach in predicting retention and time to degree completion (TTD) of undergraduate students at UNR.
- Though most predictors (IVs) are correlated with each outcome (and informed by scholarship), the analytical focus is not on explaining retention/TTD (i.e., model fit), but merely on predicting it.

Introduction
Assumptions in Regression Analysis

- The dependent (outcome) variable is continuous in ordinary least-squares regression, dichotomous in logistic regression.
- Independent (predictor) variables are uncorrelated with each other (i.e., no multicollinearity), though this is less of an issue in prediction.
- Independent variables are uncorrelated with the error term (i.e., the actual-to-predicted difference).
- Especially with small samples: the error term has a mean of zero and constant variance, and errors are uncorrelated with each other. Violations occur when error variance is correlated with one or more predictors, generating heteroscedasticity in cross-sectional data or autocorrelation in time-series data. (A diagnostic sketch for checking two of these assumptions follows below.)
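Two of these assumptions (no multicollinearity, homoscedastic errors) are routinely checked with standard diagnostics; a minimal sketch using statsmodels follows. The dataset, file name, and column names are illustrative, not from the study.

```python
# Sketch: checking multicollinearity (VIF) and heteroscedasticity
# (Breusch-Pagan) for an OLS model; data and columns are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("students.csv")                  # hypothetical dataset
X = sm.add_constant(df[["hs_gpa", "act_english", "act_math"]])
ols = sm.OLS(df["gpa_fall"], X).fit()

# VIF above ~10 is a common rule of thumb for problematic multicollinearity
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))

# Breusch-Pagan: a small p-value suggests error variance tied to predictors
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(ols.resid, X)
print("Breusch-Pagan p-value:", lm_pval)
```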

Introduction
Definition: Decision Trees

1) "A way to represent rules underlying data with hierarchical, sequential structures that recursively partition the data." (Murthy, 2005)
2) Iterative splitting of data into discrete groups, with the goal of maximizing the 'distance' between groups at each split. (Two Crows Corp., 1999)

Advantages (a rule-induction sketch follows after this slide):
- Exploratory
- Non-parametric
- Transparent process of rule induction
- Handles continuous and categorical data
- Computational efficiency in classification via hierarchical decomposition

[Figure: types of tree generation]

Introduction
Artificial Neural Networks

Definition:
"System composed of many processing elements operating in parallel whose function is determined by network structure, connection strengths, and processing performed at computing elements or nodes." (DARPA, 1988)

Advantages:
- Handles both linear and non-linear complex, interactive relationships among many variables
- No data-distribution or variance-homogeneity assumptions required
- Adaptive learning based on initial training data
- Handles continuous and categorical data

[Figure: typical backpropagation network; types of ANN generation. Source: [Link]]
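Since the slide emphasizes that a fitted tree is a transparent set of induced rules, a minimal sketch with scikit-learn may help make that concrete (the study itself used SPSS Clementine; the dataset and column names below are made up):

```python
# Sketch: recursive partitioning and readable rule induction with a CART tree.
# Data, file name, and columns are illustrative, not from the study.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("retention.csv")                 # hypothetical dataset
X = df[["hs_gpa", "credit_load", "on_campus"]]    # predictors
y = df["retained"]                                # outcome: returned for 2nd year?

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(X, y)

# The fitted tree is a human-readable hierarchy of if/then split rules:
print(export_text(tree, feature_names=list(X.columns)))
```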

Introduction
Artificial Neural Networks

(Definition and advantages as on the previous slide.) Synaptic weights are derived from:

$$O_j = f\Big(\sum_{i=1}^{n} o_i\, w_{ji}\Big), \qquad f(x) = \frac{1}{1+e^{-x}}, \qquad f'(x_i) = -\big(1+e^{-x}\big)^{-2}\, e^{-x}\,(-1) = \frac{e^{-x}}{\big(1+e^{-x}\big)^{2}}$$

where $O_j$ is the outcome, $x_i$ is the input vector, and $w_{ji}$ is the synaptic weight (randomly set for the first record processed). Vectors consist of one or more input variables (predictors). (A numerical sketch of this activation follows after the next slide.)

[Figure: typical backpropagation network; types of ANN generation]

Introduction
Training Tree and ANN Models

Controlling tree size with settings for:
- Maximum depth
- Limit on the number of records in a node
- Pruning of the full-size tree
- Adjusting for the 'downstream' effect of upper-level splits (still experimental?)
- 'Binning', i.e., converting continuous predictors into categorical ones
- 'Boosting', i.e., re-sampling misclassified records and combining weak classification nodes (with low purity based on error rate, Gini index, or cross-entropy) to reduce the noise-to-signal ratio

Calculating connection weights and shaping ANN architecture:
- Choosing the weighting factor: backpropagation, radial basis function, quasi-Newton, Levenberg-Marquardt, genetic algorithms
- Number of hidden layers and networks of the ANN topology
- Parameter settings to avoid local optima via the Alpha momentum term
- Eta decay rate to control the learning rate of the model
- Running continuous data to avoid 'categorical explosion'
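To make the notation concrete, here is a minimal numerical sketch of one node's forward pass, assuming f is the logistic sigmoid implied by the derivative above; the input and weight values are made up:

```python
# Sketch: one node's forward pass O_j = f(sum_i o_i * w_ji) with the logistic
# activation f and its derivative f' (the slope used by backpropagation).
import numpy as np

def f(x):
    """Logistic sigmoid activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):
    """Derivative: e^-x / (1 + e^-x)^2, as in the formula above."""
    return np.exp(-x) / (1.0 + np.exp(-x)) ** 2

o = np.array([0.2, 0.7, 1.0])    # inputs o_i (made-up values)
w = np.array([0.5, -0.3, 0.8])   # synaptic weights w_ji (randomly initialized)

net = o @ w                      # weighted sum over the input vector
print("output:", f(net), "gradient:", f_prime(net))
```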

Models
Analytical Approach

Compare prediction accuracy based on:
- Logistic regression (LR): baseline
- Decision trees based on rule induction:
  - Classification and regression tree (C&RT)
  - Chi-squared Automatic Interaction Detector (CHAID)
  - C5.0 algorithm
- Backpropagation perceptron neural nets:
  - Neural net, simple topology (Quick)
  - Neural net, parallel topologies (Multi)
  - Neural net, 3-layer topology (Prune)

Test generated models on a 'holdout' sample after a randomized 50/50 data partition (see the comparison sketch after this slide).

Models
Model Description

- CHAID: Uses chi-squared statistics to identify optimal splits with two or more subgroups. Starts with the most significant predictor, determines multiple-group differences, and collapses groups with no significance; the merging process stops at the preset testing level.
- C&RT: Generates splits based on maximizing orthogonality between subgroups (measured via the impurity index); all splits are binary, and the outcome variable can be continuous or categorical.
- C5.0: Uses the C5.0 algorithm to generate a decision tree or ruleset based on the predictor that provides maximum information gain. The split process continues until the sample is exhausted. Lowest-level splits are removed if they don't contribute to model significance.
- Neural net, multiple topologies: Creates several networks in parallel based on the specified number of hidden layers and units (nodes) in each layer. The learning rate is a function of the specified number of cycles and the Eta decay rate.
- Neural net, prune method: Starts with a large network of layers and nodes as specified and removes the weakest units during training; the learning rate is determined via the specified Eta decay rate.
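As a rough analogue of this design, the sketch below fits one representative of each model family and scores it on a randomized 50/50 holdout partition. scikit-learn's CART tree and MLP stand in for Clementine's C&RT/CHAID/C5.0 trees and Quick/Multi/Prune nets; the dataset and columns are hypothetical.

```python
# Sketch: logistic regression baseline vs. decision tree vs. neural net,
# each scored on a randomized 50/50 holdout partition as in the study design.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("ttd.csv")                            # hypothetical data
X, y = df.drop(columns="ttd_group"), df["ttd_group"]   # multinomial outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)               # 50/50 partition

models = {
    "logit (baseline)": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, min_samples_leaf=25),
    "neural net": MLPClassifier(hidden_layer_sizes=(8,), max_iter=500),
}
for name, m in models.items():
    m.fit(X_train, y_train)
    print(name, "holdout accuracy:", m.score(X_test, y_test))
```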

Models
Models Tested

Second-year retention (based on previously established models):
- New freshmen at end of fall term
- Spring-retained at end of spring term
- Both measured at end of second year
- Measured outcome: 1) returned for second year; 2) transferred out within 1 year; 3) dropped out/stopped out

Degree completion:
- All graduates (including transfers)
- Graduates who started as new freshmen
- Measured outcome: 1) 3 years or less; 2) 4 years at the institution; 3) 5 years; 4) 6 years or more

Data
Samples and Sources

Data samples drawn from the U. of Nevada-Reno (a land-grant, Carnegie Extensive Research Institution):
- Retention: 8,018 new full-time freshmen entering fall semesters 2000 through 2003 (96% of total cohort population*)
- Degree completion: 15,457 undergraduate degree recipients graduating in spring 1995 through summer 2005 (99% of total undergraduate-level graduates after listwise deletion of incomplete records; 85 multiple-degree holders counted once)

Data sources:
- Student Information System
- Human Resources System
- ACT Student Profile Section

* Excluding athletes, non-degree-seeking, and foreign students

Data
Data Types Used in Both Cases

- Flag/binomial, e.g., yes/no, 0/1
- Set/grouped, e.g., NV, other US, foreign
- Ordinal/rank, e.g., high/medium/low
- Range/scale, i.e., continuous numerical

(An encoding sketch follows this slide.)

Data
Number of Predictor Variables Used

- Retention: 40-50 (depending on model)
- Time to Degree: 80-100 (depending on model)
- In regression, more variables typically increase model complexity and the difficulty of estimating and interpreting partial effect sizes of individual predictors
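These four measurement levels map onto column types in a typical analysis stack; purely as an illustration (the columns are made up), a pandas encoding might look like:

```python
# Sketch: encoding the four data types above in pandas; columns are made up.
import pandas as pd

df = pd.DataFrame({
    "on_campus": [1, 0, 1],                      # flag/binomial (0/1)
    "residency": ["NV", "other US", "foreign"],  # set/grouped (unordered)
    "class_size": ["low", "high", "medium"],     # ordinal/rank
    "hs_gpa": [3.2, 2.8, 3.9],                   # range/scale (continuous)
})
df["on_campus"] = df["on_campus"].astype(bool)
df["residency"] = df["residency"].astype("category")
df["class_size"] = pd.Categorical(
    df["class_size"], categories=["low", "medium", "high"], ordered=True)
print(df.dtypes)
```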

Variables
Variables Examined: Retention
(*range/scale   ~flag/binomial   ^ordinal/rank   ``set/multinomial)

Student demographics:
- Gender ~
- Age *^
- Ethnicity/race ``
- Residency ``
- Parent income `` (incl. missing category)

Pre-collegiate experience:
- High school GPA *
- ACT English/math scores * (SAT converted)
- ACT/SAT test date *^
- Academic preparation index *^
- Pre-fall summer enrollment ~
- AP/IB credits ~
- Graduate degree aspiration ~

Campus experience:
- On-campus living ~
- Use of athletic facilities ~
- Dual enrollment w/ community colleges ~
- Program major type ``
- Fall entry term ``
- Attempted registrations ^
- Average class size ^

Academic experience:
- Academic peer challenge ^
- Fall/1st-year GPA *^
- Credit load (<15) ~
- Major requires Calc 1 ~
- Nat/Phys science courses ^
- Remedial math taken ~
- Remedial English taken ~
- Math credits earned ^
- All and math transfer credits ~
- Fall/spring math grades ``
- Math 'D'/'F' grades ~
- Math 'I'/'W' grades ~
- Passed 1st-year math ~
- English 101/102 grades ``

Financial aid:
- Fall/spring package by type of aid included ``: no aid; package with loans and/or work study; grants and/or scholarships only; Millennium Scholarship only
- 2nd-year package offer by type of aid included ``: as above
- Fall/spring institutional aid amount ($) *
- 2nd-year institutional aid amount offered ($) *
- Fall/spring Pell Grant aid ~
- Millennium aid status (fall/spring) ``: never had it; received it, maintains eligibility; lost eligibility, continues eligibility
- Financial aid need ($) *: fall and spring remaining; 1st-year total remaining; fall and spring total need before calculated aid offered

Variables
Variables Examined: Time-to-Degree
(*range/scale   ~flag/binomial   ^ordinal/rank   ``set/multinomial)

Student demographics:
- Gender ~
- Age *
- Ethnicity/race ``
- Residency ``

General experience:
- Initial status (new vs. transfer) ~
- Initial program major ``
- Graduating major ``
- Number of program major changes *
- Graduated with minor ~
- Completed a senior thesis ~
- Attempted registrations *
- Participated in varsity sports ~
- Stopout time since first enrollment (%) *^

Pre-collegiate experience:
- ACT English *
- ACT math *
- ACT Composite *
- Took remedial math/English ~

Outside course experiences:
- Took overseas courses (USAC) ~
- Took Continuing Education courses ~
- Took courses at TMCC *
- Took courses at WNCC *
- Took internships ^
- English courses transferred in ^
- Math courses transferred in ^
- English distance courses ~
- Math distance courses ~
- Core Humanities transferred in ^

Campus course experiences:
- Took honors courses ~
- Took independent studies ^
- Repeated a course ~
- Number of D/F grades (%) *
- Number of I/W grades (%) *
- Number of replacement grades *
- Capstone courses taken *
- 'Diversity' courses taken *
- Natural science courses in three core areas (3 variables) *
- Cumulative GPA *^
- GPA trend *

Course grades:
- Remedial math ``
- College algebra ``
- College general math ``
- College trigonometry ``
- Intro to statistics ``
- Business calculus ``
- Calculus 1 ``
- English 101 ``
- English 102 ``
- Core Humanities 201-203 ``
- General capstone ``
- Program major capstone ``

Variables
Variables Examined: Time-to-Degree (cont.)
(*range/scale)

Credit hours:
- Total credits accumulated *
- Total transfer credits articulated *
- Total campus credits *
- Total math credits *
- Total upper-division science credits *
- Total credits transferred in *
- Earned/attempted credits (%) *
- Average credit load per enrolled term *

Faculty teaching courses taken:
- Percent female *
- Percent ethnic/racial minority *
- Percent part-time faculty *
- Percent adjunct faculty *
- Percent at full-professor rank *
- Average faculty age *

Financial aid:
- Total aid received *
- Loans *
- Grants *
- Work study *
- Merit-based aid *
- Need-based aid *
- General fund aid *
- Outside aid *
- UNR Foundation aid *
- Academic department-based aid *
- Grants-in-Aid *
- Millennium Scholarship *
- Pell Grant aid *

Variables
Imputation of Missing Values

Ratio of earned to attempted credits (%) derived via linear regression for 493 cases (a sketch of this approach follows this slide).

Model Summary:

  Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
  1       .837a   .701       .700                4.06135

a. Predictors: (Constant), Repeated a course, ACT Math (miss=mean), Number of WNCC courses taken, Number of replacement grades, Number of TMCC courses taken, Number of diversity courses taken, % of I/W grades received, Number of math credits taken, Took honors courses, % of D/F grades received, ACT English (miss=mean), Number of upper-division science credits taken, ACT Composite (miss=mean)
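A rough analogue of this two-pronged imputation (regression-based for the credit ratio, mean substitution for ACT scores), with illustrative column names, follows:

```python
# Sketch: regression-based imputation of the earned/attempted credit ratio,
# plus mean substitution for missing ACT scores; columns are illustrative.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("graduates.csv")                 # hypothetical dataset
predictors = ["repeated_course", "math_credits", "honors", "pct_df_grades"]

# Fit on complete cases, then predict the ratio for cases missing it
known = df["earned_attempted_pct"].notna()
reg = LinearRegression().fit(df.loc[known, predictors],
                             df.loc[known, "earned_attempted_pct"])
df.loc[~known, "earned_attempted_pct"] = reg.predict(df.loc[~known, predictors])

# Mean value substitution for missing ACT scores
for col in ["act_english", "act_math", "act_composite"]:
    df[col] = df[col].fillna(df[col].mean())
```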

Variables
Imputation of Missing Values (cont.)

- Total campus credits derived via a general linear model (R² = 0.682; 1 factor, 18 covariates) for 804 cases
- Mean value substitution of missing ACT scores for 230 cases

Variable definitions:
- Initial study major: declared/pre-major, undeclared, non-degree seeking, Intensive English
- Attempted registrations: registration attempt at the time of a fully subscribed class section during the registration period
- Stopout time: number of fall/spring semesters not in attendance after first campus-based course enrollment
- Number of replacement grades: students may repeat up to 12 lower-division credits to replace original UNR grades
- GPA trend: ratio of 24-credit GPA to final cumulative GPA
- Natural sciences core course offerings geared to 3 groups of majors: a) social science, b) natural science, c) engineering

Exploratory Analysis

Learn about variable relationships during the exploratory stage.

[Scatterplot: merit-based aid vs. grades and completion time. Does merit-based aid go to fast completers with higher grades?]

Exploratory Analysis

[Charts: exploratory plots, including a pattern of declining GPA]

[Charts: additional exploratory plots]

[Charts: faster completion is associated with exposure to adjunct faculty]
Model Results
Freshmen Retention Measured at End of Fall: Prediction Accuracy

[Bar chart: % of cases correctly predicted by the Quick, Multi, and Prune neural nets, the CHAID, C&RT, and C5.0 decision trees, and the logistic regression baseline, for the training (N=4,079) and validation (N=3,939) partitions. Quick-Cont = Quick net run on continuous variables.]

Model Results
Freshmen Retention Measured at End of Spring: Prediction Accuracy

[Bar chart: same models and partitions; accuracy increases by almost 10 percentage points compared to the fall measure.]

(The accuracy computation behind these charts is sketched below.)
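The accuracy plotted here is simply the share of correctly classified cases in each partition; as a minimal sketch (assuming fitted scikit-learn-style classifiers and the partition arrays from the earlier 50/50 split sketch):

```python
# Sketch: percent of cases correctly predicted on the training vs. validation
# partitions for a fitted classifier; inputs follow the earlier 50/50 split.
from sklearn.metrics import accuracy_score

def partition_accuracy(model, X_train, y_train, X_test, y_test):
    """Return (training, validation) accuracy as fractions in [0, 1]."""
    return (accuracy_score(y_train, model.predict(X_train)),
            accuracy_score(y_test, model.predict(X_test)))
```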

Model Results
Time to Degree Measured with All Students: Prediction Accuracy

[Bar chart: % of cases correctly predicted, same models, for the training (N=7,859) and validation (N=7,598) partitions; roughly a 50% improvement over the logit model.]

Model Results
Time to Degree Measured with New Students: Prediction Accuracy

[Bar chart: training (N=4,727) and validation (N=4,564) partitions; accuracy improves when the sample is restricted to new students only.]
Model Results
Time to Degree Measured with New Students: Prediction Accuracy with Validation Data, 'Six Years or More' Outcome

[Bar chart: % of cases correctly predicted, same models, for new and transfer students (N=7,598) vs. new students only (N=4,564).]

Model Results
Freshmen Retention Measured at End of Fall: Confidence Level of Correctly Predicted Cases

[Bar chart: mean confidence of correctly predicted cases, same models plus Quick-Cont (continuous variables), for the training (N=4,079) and validation (N=3,939) partitions.]

Model Results
Freshmen Retention Measured at End of Spring: Confidence Level of Correctly Predicted Cases

[Bar chart: mean confidence, same models, for the training (N=4,079) and validation (N=3,939) partitions.]

Model Results
Time to Degree Measured with All Students: Confidence Level of Correctly Predicted Cases

[Bar chart: mean confidence, same models, for the training (N=7,859) and validation (N=7,598) partitions.]

(A sketch of this confidence measure follows.)
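The 'confidence level' in these charts appears to be the model's probability for the predicted class, averaged over correctly classified cases; under that reading (an interpretation of the charts, not Clementine's documented definition), a sketch is:

```python
# Sketch: mean confidence of correctly predicted cases, reading "confidence"
# as the predicted-class probability of a scikit-learn-style classifier.
import numpy as np

def mean_confidence_correct(model, X, y):
    """Average max class probability over the correctly classified cases."""
    proba = model.predict_proba(X)               # shape (n_cases, n_classes)
    pred = model.classes_[np.argmax(proba, axis=1)]
    correct = pred == np.asarray(y)
    return proba[correct].max(axis=1).mean()
```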

Model Results
Time to Degree Measured with New Students: Confidence Level of Correctly Predicted Cases

[Bar chart: mean confidence, same models, for the training (N=4,727) and validation (N=4,564) partitions.]

Conclusion
Comparison of Model Accuracy

Mean accuracy level:
- Marginal improvement over regression when using more 'matured' variables in the established retention model
- Significant improvement over regression when using a greater number of exploratory variables in the time-to-degree model, especially with the multi-layer, pruned neural net and the C5.0 decision tree

Confidence level of correct prediction:
- Better results for the decision trees and the regression model compared to the neural nets, except for the more complex model (i.e., time-to-degree with new and transfer students): high for the pruned neural net, low for C5.0

Conclusion
Potential Operational Benefits

- Fifteen-percentage-point improvement in correctly estimating time to degree
  - End-of-sophomore-year model: improved classification of 525 second-year students at the examined institution
- Likely enhancement of institutional enrollment projections, which are based on a class-standing flow model
- Better targeting of students 'at risk' prior to choosing/commencing a program major (mitigates the chance of subsequent changes in major)
- Accelerated degree completion for estimated 6-year-plus graduates
  - The net present cost of a four-year degree to the average student entering college in 2003 is ~$107,000 (opportunity cost minus total attendance cost) (Barrow and Rouse, 2005). Faster completion reduces the net cost and the time needed to recoup the investment.
  - Speeding up time to graduation by one year may save a student around $28,000 in forgone earnings (not counting the higher increment of tuition and fees for a 6-year graduate compared to a 5-year graduate).

Clementine®
SPSS Clementine® Data-Stream Pane

[Screenshot: Clementine data-stream pane]

Clementine®
SPSS Clementine® Data-Mining Application

Generated ANN model characteristics:

[Screenshot: ANN model browser]

Generated CHAID ruleset (left) and coefficients for the logistic regression model (right); a scoring sketch follows:

Equation for '5 years':
   -0.397       * GPAfinal +
    0.05718     * CAPST +
   -0.03169     * MACRS +
    0.0005202   * UDSCI +
   -0.04871     * TMCC +
    0.01273     * WNCC +
   -0.02395     * DIVCL +
   -0.1026      * AGE +
   -0.0001391   * TOTAL +
    0.000129    * LOANS +
    0.00009678  * GRANT +
    0.0002089   * WORKS +
    0.0001765   * MERIT +
    0.000006031 * NEEDB +
   -0.00002301  * GENFN +
   -0.00008531  * OUTSI +
   -0.00007588  * UNRFN + …
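These are multinomial-logit coefficients for one outcome category; scoring a case means taking the coefficient-weighted sum of its variable values. A sketch restricted to the coefficients visible above (the screenshot is truncated, so the intercept and any remaining terms are missing and the result is illustrative only):

```python
# Sketch: the '5 years' linear predictor from the visible coefficients.
# The screenshot is truncated: the intercept and later terms are not shown,
# so this reproduces only part of the equation.
coefs = {
    "GPAfinal": -0.397, "CAPST": 0.05718, "MACRS": -0.03169,
    "UDSCI": 0.0005202, "TMCC": -0.04871, "WNCC": 0.01273,
    "DIVCL": -0.02395, "AGE": -0.1026, "TOTAL": -0.0001391,
    "LOANS": 0.000129, "GRANT": 0.00009678, "WORKS": 0.0002089,
    "MERIT": 0.0001765, "NEEDB": 0.000006031, "GENFN": -0.00002301,
    "OUTSI": -0.00008531, "UNRFN": -0.00007588,
}

def linear_predictor(case):
    """Coefficient-weighted sum over one student's variable values."""
    return sum(w * case.get(name, 0.0) for name, w in coefs.items())
```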

Clementine®
SPSS Clementine® Data-Mining Application

Generated C&RT tree structure:

[Screenshot: C&RT tree]

Generated C&RT model characteristics:

[Screenshot: C&RT model browser]
Clementine®
Model Characteristics

- Logistic regression: main effects, simple entry; Cox & Snell = 0.728; run time 23 sec; strongest variables: average credit load, transfer credits, residency, stopout time, English 102 grade, English transfer
- C5.0 decision tree: tree depth: 1; no boosting; rules for each outcome: 38, 23, 40, 53 (default: 4 years); run time: 7 sec
- C&R decision tree: tree depth: 5; run time 39 sec
- CHAID decision tree: tree depth: 4; run time 21 sec
- Neural net, quick method: neurons per layer: 158 at input, 8 in hidden layer, 4 in output; run time 22 sec
- Neural net, multi-topology: neurons per layer: 158 at input, 5 in hidden layer, 4 in output; run time 29 min 8 sec; strongest variables: average credit load, age, transfer credits, stopout time, earned/attempted credits, English transfer, starting major, change of major; topology settings: 2 20 3; 2 25 5; 2 22
- Neural net, multilayer pruned: neurons per layer: 38 at input, 4 in 1st hidden layer, 2 in 2nd hidden layer, 2 in 3rd hidden layer, 4 in output; run time 45 min; strongest variables: average credit load, age, application status, stopout time, GPA trend, transfer credits, earned/attempted credits, residency, starting major, change of major, % irregular faculty (a topology sketch follows)

Link to presentation: [Link]

Acknowledgement: The author thanks SPSS Inc. for providing the demo version of its Clementine® software used in this study.
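For comparison, the 38-4-2-2-4 shape reported for the pruned net could be written in scikit-learn roughly as below; note this only mirrors the final topology, since Clementine's prune method starts large and removes weak units during training, while sklearn fixes the architecture up front:

```python
# Sketch: an MLP with the 38-4-2-2-4 shape reported for the pruned net.
# This mirrors only the final topology; Clementine's prune method arrives
# at it by removing weak units from a larger starting network.
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(
    hidden_layer_sizes=(4, 2, 2),  # three hidden layers, as on the slide
    activation="logistic",         # sigmoid units, as in backpropagation nets
    solver="sgd",
    momentum=0.9,                  # loosely analogous to the Alpha momentum term
    learning_rate="adaptive",      # loosely analogous to the Eta decay rate
    max_iter=1000,
)
# net.fit(X, y) would expect 38 input features and 4 outcome classes.
```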

References

Ampazis, N. Introduction to neural networks. [Link] Downloaded 10/3/05.
Baker, B. D., and Richards, C. E. (1999). A comparison of conventional linear regression methods and neural networks for forecasting educational spending. Economics of Education Review 18: 405-415.
Barrow, L., and Rouse, C. E. (2005). Does college still pay? The Economists' Voice 2(4): 1-8.
Byers Gonzalez, J. M., and DesJardins, S. L. (April 2002). Artificial neural networks: A new approach for predicting application behavior. Research in Higher Education 43(2): 235-258.
Defense Advanced Research Projects Agency [DARPA] (1988). Neural Network Study. AFCEA International Press.
Eno, D., McLaughlin, G. W., Brozovsky, P., and Sheldon, P. (May 1998). Predicting freshman success based on high school record and other measures. Paper presented at the Association for Institutional Research Forum, Minneapolis, MN.
Everson, H. T., Chance, D., and Lykins, S. (April 1994). Using artificial neural networks in educational research: Some comparisons with linear statistical models. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Goodman, P. H., and Harrell, F. E. Neural networks: Advantages and limitations for biostatistical modeling. Washoe Medical Center, Reno, NV. Paper available at [Link]/nevprop.
Luan, J. (June 2002). Data mining and knowledge management in higher education: Potential applications. Paper presented at the Association for Institutional Research Forum, Toronto, Canada.
Murthy, S. K. (1998). Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery 2: 345-389.
Porter, S. R. (June 1999). Viewing one-year retention as a continuum: The use of dichotomous logistic regression, ordered logit and multinomial logit. Paper presented at the Association for Institutional Research, Seattle, WA.
Song, Q., and Chissom, B. S. (April 1993). New models for forecasting enrollments: Fuzzy time series and neural network approaches. Paper presented at the American Educational Research Association, Atlanta, GA.
Stergiou, C. What is a neural network? [Link] Downloaded 10/3/05.
Thomas, E., Dawes, W., and Reznik, G. (2001). Using predictive modeling to target student recruitment: Theory and practice. AIR Professional File, Number 78.
Using data mining to detect fraud. SPSS white paper available at [Link].
Van Nelson, C., and Neff, K. J. (October 1990). Comparing and contrasting neural network solutions to classical statistical solutions. Paper presented at the Midwestern Educational Research Association, Chicago, IL.
Vijayaraman, B. S., and Osyk, B. (1997). A survey of neural network publications. ERIC ED-422942. (Original in: Proceedings of the International Academy for Information Management Annual Conference, Atlanta, GA, December 12-14, 1997.)
Wilkie, P., and Pugh, D. Understanding your financial customers with Clementine: Neural networks in Royal SunAlliance Life and Pensions. SPSS white paper available at [Link].