Data Mining Tutorial: Stack Overflow Insights

The document describes preprocessing a Stack Overflow dataset for use in an Apriori frequent itemset mining experiment. It involves decompressing the dataset files without extracting to disk, streaming XML processing to select only question posts and their tags, and outputting the tags to a file. This preprocessing reduces the dataset size to around 50 MB uncompressed or 15 MB gzipped, suitable for the Apriori analysis.

Uploaded by

Manigandan Sundaram

Data Mining Tutorial
Session 2: Stack Overflow data set

Erich Schubert, Eirini Ntoutsi
Ludwig-Maximilians-Universität München
2012-05-10, KDD class tutorial

Outline: Introduction · Downloading · Preprocessing · Apriori FIM · Conclusions

Stack Overflow
Introduction to “SO”

Stack Overflow is a programming Q&A website:
[Link]

- Users post programming questions
- Other users post answers
- Up- and downvotes on questions and answers
- Awards for good answers, questions and active users
- Tags to organize questions
- Moderation by users with high reputation
- Size: 2.8m questions, 5.8m answers, 11m comments, 22m votes, 30k tags

(Yes, you could post your homework questions there. This is not recommended, as teachers usually want you to solve the problems yourself and learn from the problem, not from the solution. Plus, a good question there should already contain source code.)
Stack Overflow
Screenshot of first question

First (non-deleted) question on SO:
[Screenshot: the first question on Stack Overflow]
Stack Overflow data set
Getting the data

StackOverflow publishes data dumps:
[Link]

- A torrent download of about 5 GB, 7zip-compressed.
- About 4 GB of that is for the main stackoverflow site.
- The main .xml file is 8 GB, the post history is 11 GB (uncompressed).

So this means:

- That is pretty big!
- Maybe not everyone here should download it.
- I will not demo this live, but I will provide the result data for you.
- You cannot load the XML into your DOM parser: a DOM parser builds the whole document tree in memory.
- In fact, you might not even be able to decompress it (due to the 4 GB file size limit of some file systems, such as FAT32).
Stack Overflow data set
A first peek inside the 7zip file

> 7z l [Link].7z.001

   Date      Time           Size   Compressed Name
------------------- ------------ ------------ ------------------------
2011-09-06 [Link]      170594039    457479414 092011 Stack Overflow/[Link]
2011-09-06 [Link]     1916999879              092011 Stack Overflow/[Link]
2011-09-06 [Link]    10958639384   1841985260 092011 Stack Overflow/[Link]
2011-09-06 [Link]     7569879502   1454543990 092011 Stack Overflow/[Link]
2011-09-06 [Link]      193250161    132626278 092011 Stack Overflow/[Link]
2011-09-06 [Link]     1346527241              092011 Stack Overflow/[Link]
2011-06-13 [Link]           1786              092011 Stack Overflow/[Link]
2011-09-06 [Link]           4780              092011 Stack Overflow/[Link]
2011-09-06 [Link]              0            0 092011 Stack Overflow
------------------- ------------ ------------ ------------------------
                     22155896772   3886634942 8 files, 1 folders

(The archive is solid-compressed: 7z reports the compressed size only once per block, on the block's first file.)

We are interested in the [Link] file for our Apriori experiment.
Stack Overflow data set
Preprocessing: the plan

So we will need to preprocess the data to get it into a “workable” size. Here is what we have to do:

- Decompress into a stream, not to the hard disk.
- Process the XML as a stream, using an XML pull parser (to avoid loading the full XML file at once).
- Select only the interesting data, and dump only these parts.
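A minimal Python 3 sketch of the first step, assuming only the standard `subprocess` module: the hypothetical helper `stream_lines` starts a command and yields its standard output line by line as it arrives, so the decompressed data never touches the disk. With the slides' command it would be called as `stream_lines(["7z", "x", "-so", archive, xmlfile])`; the demo below uses the Python interpreter itself as the child process, so it runs without 7z installed.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield the stdout of `cmd` line by line (as bytes) while it runs.

    Hypothetical helper: nothing is written to a temporary file; the
    data is consumed through a pipe as the child process produces it.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # If the consumer stops early, closing the pipe lets the child
        # exit (it gets a broken pipe); then reap the process.
        proc.stdout.close()
        proc.wait()


# Demo: a child that prints two lines (stands in for `7z x -so ...`).
demo = [line.strip() for line in
        stream_lines([sys.executable, "-c", "print('a'); print('b')"])]
print(demo)  # [b'a', b'b']
```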
So, let us have a peek into the [Link] file.
Stack Overflow data set
Preprocessing: [Link] file

Inspect the file with:
7z x -so [Link].7z.001 "092011 Stack Overflow/[Link]" | less

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7"
    CreationDate="2008-07-31T[Link].667" Score="83" ViewCount="7351"
    Body="I’m new to C#, and I want to use a track-bar to change a form’s opacity.

This is my code:

<pre><code>decimal trans = [Link] / 5000
[Link] = trans
</code></pre>

When I try to build it, I get this error:

<blockquote>
 Cannot implicitly convert type ’decimal’ to ’double’
</blockquote>

I tried making <code>trans</code> a double, but then the control doesn’t work. This code worked fine for me in [Link].

What do I need to do differently?
" OwnerUserId="8"
    LastEditorUserId="140328" LastEditorDisplayName="Rich B"
    LastEditDate="2011-08-20T[Link].213"
    LastActivityDate="2011-08-31T[Link].077"
    Title="When setting a form’s opacity should I use a decimal or double?"
    Tags="<c#><winforms>"
    AnswerCount="12" CommentCount="19" FavoriteCount="13" />
[...]
</posts>
Stack Overflow data set
Preprocessing: [Link] processing

Looks worse than it is: the Tags="..." attribute is quite simple, it decodes to:
<c#><winforms>

- Stream via 7z x -so
- Process one <row .../> element at a time
- Extract the Tags="..." attribute
- Extract the individual tags with the pattern <([^>]+)> (a greedy <.*> would match the whole string <c#><winforms> at once)
- Output the tags for use in Apriori into an ASCII file
- Do not include anything else: no ID, no title; we do not need them here
- Worst-case size estimate: 2.8m questions × up to 5 tags × 20 characters = 280 MB
- Reality: 50 MB uncompressed, 15 MB gzipped.
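The difference between the two patterns can be checked quickly in Python 3 with the standard `re` module. `tagre` is the same pattern as in the processing function of these slides; the sample string is the decoded Tags attribute of the first question shown above.

```python
import re

tags_attr = "<c#><winforms>"  # decoded Tags attribute of the first question

# A greedy .* runs to the LAST ">", so both tags end up in one match.
greedy = re.findall(r"<.*>", tags_attr)

# [^>]+ cannot cross a ">", so every tag is matched separately;
# the group (...) drops the <> wrappers.
tagre = re.compile(r"<([^>]+)>")
tags = tagre.findall(tags_attr)

print(greedy)          # ['<c#><winforms>']
print(" ".join(tags))  # c# winforms
```

A non-greedy `<(.*?)>` would work as well here; the character class simply makes the intent explicit.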
Stack Overflow data set
Preprocessing: [Link] in python

#!/usr/bin/env python2
import subprocess, re
from xml.dom import pulldom

archive = "[Link].7z.001"
xmlfile = "092011 Stack Overflow/[Link]"

# 7z x -so: decompress to stdout, so nothing is written to disk
cmd = ["7z", "x", "-so", archive, xmlfile]
proc = subprocess.Popen(cmd, stdin=None,
                        stdout=subprocess.PIPE, shell=False)
# Pull parser: yields (event, node) pairs instead of building the whole DOM
events = pulldom.parse(proc.stdout)

for event, node in events:
    if event == pulldom.START_ELEMENT:
        if node.tagName == "row":
            events.expandNode(node)
            processRow(node) # NEXT SLIDE (define it before this loop)
            # print node.toxml()

proc.wait()
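For comparison, a Python 3 sketch of the same streaming step using `xml.etree.ElementTree.iterparse` instead of `pulldom` (a swapped-in technique, not the code of the slides; the function name `question_tags` is hypothetical). `iterparse` still attaches every parsed `row` to the root element, so the sketch clears the root after each row to keep memory flat. It accepts any binary file object, for example `proc.stdout` of the 7z process. In well-formed XML the `<` characters inside an attribute value are escaped as `&lt;`; the parser decodes them.

```python
import io
import xml.etree.ElementTree as ET


def question_tags(stream):
    """Yield the decoded Tags attribute of every question (PostTypeId 1)."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event: start of <posts>
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            tags = elem.get("Tags")
            if elem.get("PostTypeId") == "1" and tags:
                yield tags
            root.clear()  # drop finished rows so memory does not grow


sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" Tags="&lt;c#&gt;&lt;winforms&gt;" />
  <row Id="7" PostTypeId="2" />
  <row Id="6" PostTypeId="1" Tags="&lt;html&gt;&lt;css&gt;" />
</posts>"""

found = list(question_tags(io.BytesIO(sample)))
print(found)  # ['<c#><winforms>', '<html><css>']
```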
Stack Overflow data set
Preprocessing: [Link] in python

And the main processing function:

tagre = re.compile("<([^>]+)>")

def processRow(node):
    # Questions (type 1) only
    typ = node.getAttribute("PostTypeId")
    if typ != "1":
        return
    # Get Tags attribute ("" if it is missing)
    tags = node.getAttribute("Tags")
    if not tags:
        return
    # Remove the <> wrappers, separate by space.
    print " ".join(tagre.findall(tags))
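As an alternative to printing, a Python 3 sketch can write the same lines gzip-compressed directly with the standard `gzip` module. The function `write_transactions` and the file name `tags.txt.gz` are hypothetical; the regular expression is the one from the slide.

```python
import gzip
import os
import re
import tempfile

TAGRE = re.compile(r"<([^>]+)>")  # same pattern as tagre on the slide


def write_transactions(tag_attrs, path):
    """Write one space-separated line of tags per question, gzip-compressed.

    `tag_attrs` is any iterable of raw Tags attributes such as "<c#><winforms>".
    Returns the number of lines written.
    """
    n = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for attr in tag_attrs:
            out.write(" ".join(TAGRE.findall(attr)) + "\n")
            n += 1
    return n


# Round trip through a temporary file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "tags.txt.gz")
count = write_transactions(["<c#><winforms>", "<html><css><internet-explorer-7>"], path)
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()
print(count, lines)
```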
Stack Overflow data set
Preprocessed data file: [Link]

Resulting data set:

c# winforms
html css internet-explorer-7
c# conversion j#
c# datetime
c# .net datetime timespan
html browser time timezone
c# math
c# linq web-services .net-3.5
mysql database
performance algorithm language-agnostic unix pi
php
mysql database triggers
c++ c sockets mainframe zos
flex actionscript-3
sql-server datatable
c# .net [Link] timer

Now, let us do some Apriori on this data set!

Planning Apriori

Weka unfortunately does not scale up well to this data set size. Plus, we would first need to convert it into the .arff format for Weka.

Why not just write Apriori ourselves?

Choosing appropriate minsup values might be tricky, too, so we will just look at the top itemsets in each run.

With just 50 MB of uncompressed data, we should be able to keep the whole data set in memory!
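One hypothetical way to pick a starting threshold without guessing: take the support of the n-th most frequent item, so that at least the top n items survive (more if there are ties). A Python 3 sketch with `heapq.nlargest`; the function name and the toy counts are made up for illustration.

```python
import heapq
from collections import Counter


def minsupport_for_top(counts, n):
    """Largest threshold that keeps at least the n most frequent items."""
    top = heapq.nlargest(n, counts.values())
    return top[-1] if top else 0


counts = Counter({"c#": 5, "java": 4, "php": 4, "sql": 1})  # toy supports
ms = minsupport_for_top(counts, 2)
kept = sorted(t for t, c in counts.items() if c >= ms)
print(ms, kept)  # 4 ['c#', 'java', 'php']
```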
Loading the data with python

Python is for lazy people. Loading text data is easy:

#!/usr/bin/env python2
import gzip

db = []
for line in gzip.open("[Link]"):
    # Sort the tags of each record, so that the subsets enumerated later
    # come out in the same (sorted) order as the candidate itemsets
    db.append(sorted(line.split()))

print "Database size:", len(db)

Output:

Database size: 2012348

Itemset class

Class to represent an itemset:

class itemset(object):
    def __init__(self, tokens, support=0):
        self.tokens = list(tokens)
        self.support = support

    def tokenstr(self):
        return "+".join(self.tokens)

    def __str__(self):
        return self.tokenstr() + ": " + str(self.support)

    # Itemsets compare and hash by their tags only, so a fresh
    # itemset can be used to look up the stored one in a dict
    def __cmp__(self, other):
        return cmp(self.tokenstr(), other.tokenstr())

    def __hash__(self):
        return hash(self.tokenstr())
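The class relies on Python 2's `__cmp__`, which Python 3 ignores. A hypothetical Python 3 port (not part of the original tutorial) has to define `__eq__` explicitly and keep it consistent with `__hash__`; comparing token tuples instead of the joined string also gives a proper lexicographic order by tags.

```python
from functools import total_ordering


@total_ordering
class Itemset:
    """Hypothetical Python 3 port of the slides' itemset class."""

    def __init__(self, tokens, support=0):
        self.tokens = tuple(tokens)  # tuple: immutable, hashable, ordered
        self.support = support

    def tokenstr(self):
        return "+".join(self.tokens)

    def __str__(self):
        return self.tokenstr() + ": " + str(self.support)

    # Equality, ordering and hash all use the tags only (not the support),
    # so a fresh Itemset finds the stored one in a dict or set.
    def __eq__(self, other):
        if not isinstance(other, Itemset):
            return NotImplemented
        return self.tokens == other.tokens

    def __lt__(self, other):
        if not isinstance(other, Itemset):
            return NotImplemented
        return self.tokens < other.tokens

    def __hash__(self):
        return hash(self.tokens)


a = Itemset(["c#", "winforms"], support=3)
lookup = {a: a}
found = lookup.get(Itemset(("c#", "winforms")))  # equal tags, new object
print(found)  # c#+winforms: 3
order = [str(x) for x in sorted([Itemset(["php"]), Itemset(["c#"])])]
print(order)  # ['c#: 0', 'php: 0']
```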
Computing the 1-Itemsets

oneitems = dict()
for rec in db:
    for tag in rec:
        item = itemset([tag])
        # Reuse the stored itemset if this tag was seen before
        item = oneitems.setdefault(item, item)
        item.support += 1
oneitems = list(oneitems.keys())

# Inspect (sorted by descending support):
oneitems.sort(lambda a,b: cmp(b.support, a.support))
print len(oneitems), map(str, oneitems[:10])
print str(oneitems[100]), str(oneitems[200])

Output:

29551 ['c#: 211338', 'java: 153561', 'php: 142125',
'javascript: 126296', 'jquery: 109129', 'iphone: 96748',
'android: 93247', '.net: 89646', '[Link]: 88938',
'c++: 84777']
[Link]-mvc-2: 7219 c#-4.0: 4055
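In Python 3, the same counting of 1-itemsets can be written with `collections.Counter`. A sketch on a made-up toy database (the slides use the full data set):

```python
from collections import Counter

db = [["c#", "winforms"], ["c#", "datetime"], ["php"],
      ["c#", ".net", "datetime"]]  # toy records, one list of tags each

counts = Counter()
for rec in db:
    counts.update(set(rec))  # set(): count each tag at most once per record

print(len(counts), counts.most_common(2))  # 5 [('c#', 3), ('datetime', 2)]
```

`most_common` plays the role of the descending sort on the slide.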
Computing the 1-Itemsets

So we will try with minsupport = 1000:

minsupport = 1000
oneitems = filter(
    lambda x: x.support >= minsupport,
    oneitems)
itemsets = [oneitems]

print len(oneitems)

Output:

777
Apriori-Gen

Generating candidates:

def apriorigen(curitems):
    curitems.sort(key=lambda x: x.tokens) # by tags
    known = set(curitems) # for the pruning test
    for i in range(0, len(curitems) - 1):
        toka = curitems[i].tokens
        for j in range(i + 1, len(curitems)):
            tokb = curitems[j].tokens
            # Prefix test (sorted, so the first mismatch ends the run):
            if not toka[:-1] == tokb[:-1]: break
            cand = toka + tokb[-1:] # Extend with last
            # Pruning test: dropping one of the first len(cand)-2 tokens
            # must also give a frequent itemset (dropping either of the
            # last two gives toka or tokb, which are frequent)
            ok = True
            for k in range(len(cand) - 2):
                t = cand[:k] + cand[k+1:] # without k
                if itemset(t) not in known:
                    ok = False
                    break
            if ok: yield itemset(cand) # generate itemset
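A Python 3 sketch of the same candidate generation, assuming itemsets are represented as sorted tuples of tags rather than `itemset` objects (the name `apriori_gen` and the toy input are hypothetical). The pruning test becomes a set lookup.

```python
def apriori_gen(frequent):
    """Candidate (k+1)-itemsets from the frequent k-itemsets.

    Sketch: itemsets are sorted tuples of tags, e.g. ("c#", "wpf").
    """
    items = sorted(frequent)  # tuples sort lexicographically by tag
    known = set(items)
    for i in range(len(items) - 1):
        a = items[i]
        for j in range(i + 1, len(items)):
            b = items[j]
            if a[:-1] != b[:-1]:
                break  # sorted: no later item shares a's prefix
            cand = a + b[-1:]
            # Prune: dropping any of the first len(cand)-2 tags must leave
            # a frequent itemset (dropping one of the last two gives a or b).
            if all(cand[:k] + cand[k + 1:] in known
                   for k in range(len(cand) - 2)):
                yield cand


frequent2 = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}
print(list(apriori_gen(frequent2)))  # [('a', 'b', 'c')]
```

Here ("a", "b", "d") and ("a", "c", "d") are pruned because ("b", "d") and ("c", "d") are not frequent.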
Apriori FIM

Main loop for FIM:

import itertools

while True:
    size = len(itemsets) + 1
    cand = dict()
    for c in apriorigen(itemsets[-1]): cand[c] = c
    if len(cand) == 0: break
    # Count support: look up every size-subset of every (sorted) record
    for rec in db:
        for subset in itertools.combinations(rec, size):
            subset = cand.get(itemset(subset))
            if subset is not None: subset.support += 1
    itemsets.append(filter(
        lambda i: i.support >= minsupport, cand))
    itemsets[-1].sort(lambda a,b: cmp(b.support, a.support))
    print size, map(str, itemsets[-1][:5])

Output:

['ruby+ruby-on-rails: 12266', 'c#+winforms: 12236',
'android+java: 11252', 'c#+wpf: 11143', 'ios+iphone: 10431']
['c#+wpf+xaml: 1600', 'mysql+php+sql: 1064']
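The support counting inside the loop can be sketched the same way in Python 3, with sorted tag tuples as dictionary keys (hypothetical function `count_support`, toy data). Sorting each record first makes `itertools.combinations` produce tuples in the same order as the candidates.

```python
from itertools import combinations


def count_support(db, candidates, size, minsupport):
    """Count candidate itemsets of one size; return the frequent ones,
    most frequent first (ties broken by the tags)."""
    support = dict.fromkeys(candidates, 0)
    for rec in db:
        for subset in combinations(sorted(rec), size):
            if subset in support:  # only candidates are counted
                support[subset] += 1
    frequent = [(c, s) for c, s in support.items() if s >= minsupport]
    return sorted(frequent, key=lambda cs: (-cs[1], cs[0]))


db = [["c#", "winforms"], ["wpf", "c#", "xaml"], ["c#", "wpf"],
      ["php", "mysql", "sql"], ["mysql", "sql"]]
candidates = [("c#", "winforms"), ("c#", "wpf"), ("c#", "xaml"), ("mysql", "sql")]
result = count_support(db, candidates, 2, minsupport=2)
print(result)  # [(('c#', 'wpf'), 2), (('mysql', 'sql'), 2)]
```

Enumerating all subsets of a record is cheap here because a question carries at most 5 tags, so there are at most 10 subsets of any given size.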
More frequent itemsets

Setting minsupport = 500 finds more itemsets:

Output:

['ruby+ruby-on-rails: 12266', 'c#+winforms: 12236', 'android+java: 11252',
'c#+wpf: 11143', 'ios+iphone: 10431', 'cocoa-touch+iphone: 10371',
'c#+linq: 8184', 'c+c++: 8131', 'ajax+javascript: 7951', 'java+swing: 7296',
'mysql+sql: 7018', 'cocoa+objective-c: 6670', 'cocoa-touch+objective-c: 6598',
'jquery+php: 6564', 'ipad+iphone: 6311', '[Link]+javascript: 5761',
'sql-server+tsql: 5728', 'jquery+jquery-ui: 5728', 'eclipse+java: 5537',
'hibernate+java: 5336']

['c#+wpf+xaml: 1600', 'mysql+php+sql: 1064', 'cocoa-touch+ios+iphone: 935',
'cocoa-touch+iphone+uikit: 879', 'hibernate+java+orm: 838',
'activerecord+ruby+ruby-on-rails: 783', 'c#+databinding+wpf: 754',
'gui+java+swing: 730', 'oracle+plsql+sql: 698',
'cocoa-touch+iphone+uitableview: 693', 'cocoa+cocoa-touch+iphone: 652',
'cocoa+cocoa-touch+objective-c: 570', 'database+mysql+sql: 567',
'database+database-design+mysql: 532', 'ios+iphone+uitableview: 514',
'c#+silverlight+windows-phone-7: 513', 'cocoa-touch+ipad+iphone: 500']
Conclusions

- The results were okay, but not very surprising (in fact, most results are very obvious!)
- The data contains redundant tags (mysql, sql)
- The limit of 5 tags per question affects the output
- Data mining does not guarantee new results, unfortunately