Links
=====

## Preface

[Data Science Venn Diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)

## 1. Introduction

[OkCupid Questions](http://blog.okcupid.com/index.php/the-best-questions-for-first-dates/)

[Facebook on coordinated migration](https://www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859)

[Facebook on NFL fandom](https://www.facebook.com/notes/facebook-data-science/nfl-fans-on-facebook/10151298370823859)

[Target's predictive modeling](http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html)

[Making government more effective](http://www.marketplace.org/topics/tech/beyond-ad-clicks-using-big-data-social-good)

[Helping homelessness](http://dssg.io/2014/08/20/paths-homelessness.html)

[Improving public health](https://plus.google.com/communities/109572103057302114737)

## 2. A Crash Course in Python

[Python](http://python.org)

[Anaconda](https://store.continuum.io/cshop/anaconda/)

[pip](https://pypi.python.org/pypi/pip)

[IPython](http://ipython.org/)

[the Zen of Python](http://legacy.python.org/dev/peps/pep-0020/)

[official Python tutorial](https://docs.python.org/2/tutorial/)

[official IPython tutorial](http://ipython.org/ipython-doc/2/interactive/tutorial.html)

[IPython videos and presentations](http://ipython.org/videos.html)

[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do)

## 3. Visualizing Data

[matplotlib](http://matplotlib.org/)

[seaborn](http://www.stanford.edu/~mwaskom/software/seaborn/)

[D3.js](http://d3js.org/)

[Bokeh](http://bokeh.pydata.org/)

[ggplot](https://pypi.python.org/pypi/ggplot)

## 4. Linear Algebra

[Linear Algebra, from UC Davis](https://www.math.ucdavis.edu/~linear/)

[Linear Algebra, from Saint Michael's College](http://joshua.smcvt.edu/linearalgebra/)

[Linear Algebra Done Wrong](http://www.math.brown.edu/~treil/papers/LADW/LADW.html)

[SciPy linear algebra module](http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html)

## 5. Statistics

[Non-obvious tricks for computing medians](http://en.wikipedia.org/wiki/Quickselect)

[Almost "average squared deviation from the mean"](http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation)

["angrily accused of experimenting on your users"](http://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html)

[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html)

[pandas](http://pandas.pydata.org/)

[StatsModels](http://statsmodels.sourceforge.net/)

[OpenIntro Statistics](https://www.openintro.org/stat/textbook.php)

[OpenStax Introductory Statistics](http://openstaxcollege.org/textbooks/introductory-statistics)

## 6. Probability

[the Monty Hall Problem](http://en.wikipedia.org/wiki/Monty_Hall_problem)

[error function](http://en.wikipedia.org/wiki/Error_function)

[binary search](http://en.wikipedia.org/wiki/Binary_search_algorithm)

[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html)

[Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf)

## 7. Hypothesis and Inference

[continuity correction](http://en.wikipedia.org/wiki/Continuity_correction)

[P-hacking](http://www.nature.com/news/scientific-method-statistical-errors-1.14700)

["The Earth Is Round (p < .05)"](http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf)

[conjugate priors](http://www.johndcook.com/blog/conjugate_prior_diagram/)

[Coursera -- Data Analysis and Statistical Inference](https://www.coursera.org/course/statistics)

## 8. Gradient Descent

[Active Calculus](http://gvsu.edu/s/xr/)

[scikit-learn stochastic gradient descent](http://scikit-learn.org/stable/modules/sgd.html)

## 9. Getting Data

[running Python scripts without the Python command](http://stackoverflow.com/questions/15587877/run-a-python-script-in-terminal-without-the-python-command)

[opening csv files in binary mode](http://stackoverflow.com/questions/4249185/using-python-to-append-csv-files)

[BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/)

[requests](http://docs.python-requests.org/en/latest/)

[GitHub API](http://developer.github.com/v3/)

[Python API](http://www.pythonapi.com/)

[list of Python APIs](http://www.pythonforbeginners.com/development/list-of-python-apis/)

[ProgrammableWeb](http://www.programmableweb.com/)

[Twython](https://github.com/ryanmcgrath/twython)

[Twitter apps](https://apps.twitter.com/)

[Twitter Search API](https://dev.twitter.com/docs/api/1.1/get/search/tweets)

[unicode](https://docs.python.org/2/howto/unicode.html)

[Twitter Streaming API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample)

[Scrapy](http://scrapy.org/)

[pandas](http://pandas.pydata.org/)

## 10. Working With Data

[pandas](http://pandas.pydata.org/)

[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do)

[scikit-learn matrix decomposition](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)

## 11. Machine Learning

[prevalence of "Luke"](http://www.babycenter.com/babyNameAllPops.htm?babyNameId=2918)

[prevalence of leukemia](http://seer.cancer.gov/statfacts/html/leuks.html)

[harmonic mean](http://en.wikipedia.org/wiki/Harmonic_mean)

[Coursera -- Machine Learning](https://www.coursera.org/course/ml)

[Caltech -- Machine Learning](https://work.caltech.edu/telecourse.html)

[The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)

## 12. Nearest Neighbors

[the length represented by a degree of longitude](http://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude)

[scikit-learn nearest neighbor models](http://scikit-learn.org/stable/modules/neighbors.html)

## 13. Naive Bayes

[SpamAssassin public corpus](https://spamassassin.apache.org/publiccorpus/)

[7-Zip](http://www.7-zip.org/)

[the Porter stemmer](http://tartarus.org/martin/PorterStemmer/)

["A Plan for Spam"](http://www.paulgraham.com/spam.html)

["Better Bayesian Filtering"](http://www.paulgraham.com/better.html)

[scikit-learn Naive Bayes](http://scikit-learn.org/stable/modules/naive_bayes.html)

## 14. Simple Linear Regression

## 15. Multiple Regression

[scikit-learn linear model](http://scikit-learn.org/stable/modules/linear_model.html)

[StatsModels](http://statsmodels.sourceforge.net/)

## 16. Logistic Regression

[scikit-learn logistic regression](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)

[scikit-learn support vector machines](http://scikit-learn.org/stable/modules/svm.html)

[libsvm](http://www.csie.ntu.edu.tw/~cjlin/libsvm/)

## 17. Decision Trees

[Twenty Questions](http://en.wikipedia.org/wiki/Twenty_Questions)

[scikit-learn decision trees](http://scikit-learn.org/stable/modules/tree.html)

[scikit-learn ensembles](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble)

[decision tree learning](http://en.wikipedia.org/wiki/Decision_tree_learning)

## 18. Neural Networks

[Coursera -- Neural Networks for Machine Learning](https://www.coursera.org/course/neuralnets)

[Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)

[PyBrain](http://pybrain.org/)

## 19. Clustering

[RGB color model](http://en.wikipedia.org/wiki/RGB_color_model)

[SciPy](http://www.scipy.org/)

## 20. Natural Language Processing

["What is Data Science"](http://radar.oreilly.com/2010/06/what-is-data-science.html)

[Natural Language Toolkit](http://www.nltk.org/)

[NLTK book](http://www.nltk.org/book/)

[gensim](http://radimrehurek.com/gensim/)

## 21. Network Analysis

[Centrality](http://en.wikipedia.org/wiki/Centrality)

[NetworkX](http://networkx.github.io/)

[Gephi](http://gephi.github.io/)

## 22. Recommender Systems

[Crab](http://muricoca.github.io/crab/)

[GraphLab recommender toolkit](http://graphlab.com/products/create/docs/graphlab.toolkits.recommender.html)

[Netflix prize](http://www.netflixprize.com/)

## 23. Databases

[SQLite](http://www.sqlite.org/)

[MySQL](http://www.mysql.com/)

[PostgreSQL](http://www.postgresql.org/)

[MongoDB](http://www.mongodb.org/)

[NoSQL](http://en.wikipedia.org/wiki/NoSQL)

## 24. Map-Reduce

[Hadoop](http://hadoop.apache.org/)

[Elastic MapReduce](http://aws.amazon.com/elasticmapreduce/)

[mrjob](https://github.com/Yelp/mrjob)

[Spark](http://spark.apache.org/)

[Storm](http://storm.incubator.apache.org/)

## 25. Go Forth And Do Data Science

[IPython](http://ipython.org/)

[NumPy](http://www.numpy.org/)

[pandas](http://pandas.pydata.org/)

[scikit-learn](http://scikit-learn.org/)

[many, many scikit-learn examples](http://scikit-learn.org/stable/auto_examples/)

[matplotlib examples](http://matplotlib.org/examples/)

[matplotlib gallery](http://matplotlib.org/gallery.html)

[seaborn](http://web.stanford.edu/~mwaskom/software/seaborn/)

[D3.js](http://d3js.org/)

[D3 gallery](https://github.com/mbostock/d3/wiki/Gallery)

[Bokeh](http://bokeh.pydata.org/)

[Data.gov](http://www.data.gov/)

[r/datasets](http://www.reddit.com/r/datasets) and [r/data](http://www.reddit.com/r/data)

[Amazon public data sets](http://aws.amazon.com/public-data-sets/)

[100 Interesting Data Sets](http://rs.io/100-interesting-data-sets-for-statistics/)

[Kaggle](https://www.kaggle.com/)

[Hacker News](https://news.ycombinator.com/news)

[Hacker News Story Classifier](https://github.com/joelgrus/hackernews)

[Seattle Real-Time 911](http://www2.seattle.gov/fire/realtime911/getRecsForDatePub.asp?action=Today)

[social network analysis of fire trucks](https://github.com/joelgrus/fire)

[machine learning on t-shirts](https://github.com/joelgrus/shirts)