Data Modeling March 16
MY NOTES
What is modeling?
Let’s explore some foundational knowledge for modeling.
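[Figure: The Data Science Venn Diagram. Circles: Hacking Skills, Math & Statistics Knowledge, Substantive Expertise. Overlaps: Machine Learning, Traditional Research, Danger Zone; Data Science (DS) at the center.]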
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Sample size N
For statistical inference N < All
For big data N == All
For some atypical big data analysis N == 1
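A minimal Python sketch of the N < All vs N == All distinction. The population of 100,000 made-up numbers, the sample size, and the seed are illustrative assumptions, not from these notes. With all the data we compute the mean exactly; with a random sample we only estimate it and attach an approximate confidence interval.

```python
import random
import statistics

# Hypothetical population (made-up data): N == All means every record is available.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]

# N == All: compute the statistic exactly over the full population.
true_mean = statistics.fmean(population)

# N < All: estimate the same statistic from a simple random sample.
sample = random.sample(population, k=1_000)
estimate = statistics.fmean(sample)

# Standard error of the mean: how uncertain the sample estimate is.
std_err = statistics.stdev(sample) / len(sample) ** 0.5

print(f"N == All: mean = {true_mean:.2f}")
print(f"N <  All: mean ~ {estimate:.2f} +/- {1.96 * std_err:.2f} (approx. 95% CI)")
```

For inference, the 1,000-record sample already pins the mean down to within a fraction of a unit, which is why sampling is often enough.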
World model through the eyes of a prolific Twitter user:
Followers of Ashton Kutcher: if you analyze their Twitter data, you
may get a view of the world from his point of view.
For analysis with inference purposes, you don’t need all the data.
At Google (one of the originators of big data algorithms), people
sample all the time.
However, if you want to render, you cannot sample.
For some DNA-based searches, you cannot sample either.
If we draw conclusions from samples of Twitter data, we cannot
extend them beyond the population that uses Twitter.
Yet this is what is happening now: be aware of biases.
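A small Python simulation of this bias. Everything in it is invented for illustration: the population size, the assumed 20% Twitter share, and the assumption that Twitter users score differently on some opinion. A large Twitter-only sample still describes Twitter users, not everyone, while a random sample of the same size lands near the true population mean.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: (uses_twitter, opinion_score) pairs with made-up numbers.
people = []
for _ in range(50_000):
    uses_twitter = random.random() < 0.2  # assumption: about 20% use Twitter
    # Assumption: Twitter users lean differently on this opinion than everyone else.
    score = random.gauss(70 if uses_twitter else 50, 10)
    people.append((uses_twitter, score))

all_scores = [score for _, score in people]
population_mean = statistics.fmean(all_scores)

# Sampling only Twitter users: a big sample, but of the wrong population.
twitter_scores = [score for uses, score in people if uses]
twitter_mean = statistics.fmean(random.sample(twitter_scores, k=2_000))

# A simple random sample of the same size from the whole population.
random_mean = statistics.fmean(random.sample(all_scores, k=2_000))

print(f"population mean:     {population_mean:.1f}")
print(f"Twitter-only sample: {twitter_mean:.1f}")
print(f"random sample:       {random_mean:.1f}")
```

Making the Twitter-only sample larger does not fix the gap; only sampling from the population you want to draw conclusions about does.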
Another example is the tweets before and after Hurricane Sandy.
Yelp example.
APW corpus: Associated Press Worldstream, 5.7 GB of data
https://data.world/associatedpress
Design, code, deploy