# Books

## Machine Learning

- An Introduction to Machine Learning with Python
- Python Machine Learning – Sebastian Raschka
- Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More
- Logistic Regression: From Introductory to Advanced Concepts and Applications – Scott Menard
- Introduction to Linear Regression Analysis (Wiley Series in Probability and Statistics)
- Introduction to Time Series and Forecasting – Brockwell, Peter J., Davis, Richard A.
- Pattern Recognition in Practice IV: Multiple Paradigms, Comparative Studies and Hybrid Systems (… Intelligence and Pattern Recognition)
- The Nature of Statistical Learning Theory (Information Science and Statistics) by Vladimir Vapnik
- Learning From Data – by Yaser S. Abu-Mostafa
- Exploratory Analysis of Spatial and Temporal Data: A Systematic Approach by Natalia Andrienko
- From Big Data to Big Profits: Success with Data and Analytics by Russell Walker
- Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran
- Data Science from Scratch: First Principles with Python by Joel Grus
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
- Pattern Recognition and Machine Learning by Christopher M. Bishop
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale – by Sandy Ryza
- Hadoop: The Definitive Guide
- Natural Language Processing with Python
- Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data
- How to Solve It: A New Aspect of Mathematical Method (Princeton Science Library)
- Doing Data Science
- Data Science from Scratch

## Big Data Books

- Big Data – Principles and best practices of scalable realtime data systems by Nathan Marz and James Warren
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale

## Python

- Head First Python
- Mastering pandas
- Think Python
- Web Scraping with Python: Collecting Data from the Modern Web by Ryan Mitchell
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
- Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization (Treading on Python Book 3)

## Entrepreneurs in Data Science

- Data Science For Business
- Big Data at Work
- Lean Analytics
- Moneyball
- Elon Musk
- Keeping up with the Quants
- The Signal and the Noise
- When Genius Failed
- Lean Startup
- Web Analytics 2.0
- Predictive Analytics
- Freakonomics
- Founders at Work
- Bootstrapping a Business
- Analytics at Work

## Visualization

- Visualization Analysis and Design by Tamara Munzner
- Information Visualization: Perception for Design by Colin Ware
- Envisioning Information by Edward R. Tufte
- Visual Explanations: Images and Quantities, Evidence and Narrative by Edward R. Tufte
- The Visual Display of Quantitative Information by Edward R. Tufte
- Visualization Handbook – by Charles D. Hansen
- Readings in Information Visualization: Using Vision to Think by Stuart K. Card

## Mathematics

- Linear Algebra and Its Applications by Gilbert Strang
- How to Solve It: A New Aspect of Mathematical Method (Princeton Science Library)

## General

- If You’re So Smart, Why Aren’t You Happy?: The Surprising Path from Career Success to Life Success
- The Element: How Finding Your Passion Changes Everything
- Working Identity: Unconventional Strategies for Reinventing Your Career – by Herminia Ibarra
- The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail (Management of Innovation and Change) by Clayton M. Christensen
- The Last Lecture – by Randy Pausch
- The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change by Stephen R. Covey
- Rich Dad Poor Dad: What the Rich Teach Their Kids About Money That the Poor and Middle Class Do Not!

# Machine Learning

The methods having the most impact in industry today are:

- Logistic regression
- Decision trees
- Boosting
- Deep Learning
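Logistic regression, the first method on this list, fits in a few lines of plain Python. The sketch below is a minimal illustration, assuming batch gradient descent on the mean log-loss with a fixed learning rate and no regularization; `fit_logistic` and `predict_proba` are hypothetical helper names, not the API of any library mentioned here. In practice, scikit-learn's `LogisticRegression` adds regularization and faster solvers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Hypothetical helper: batch gradient descent on the mean log-loss.

    Returns the weight vector w and the bias b.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that x belongs to class 1 under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy one-feature dataset: small values are class 0, large values are class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.5]), predict_proba(w, b, [2.5]))
```

The fixed step size and epoch count are illustrative choices; on this toy data they are small enough to be stable and large enough to place the decision boundary between the two classes.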

## Tools and Libraries

- AdversariaLib – AdversariaLib is an open-source Python library for the security evaluation of machine learning (ML)-based classifiers under adversarial attacks.
- Distributed (Deep) Machine Learning Community – A Community of Awesome Distributed Machine Learning Projects
- eXtreme Gradient Boosting (XGBoost) – An optimized general-purpose gradient boosting library. The library is parallelized and also provides an optimized distributed version.
- Ensemble methods – The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability and robustness over a single estimator.
- Java Machine Learning Tools & Libraries
- Machine Learning Research Group at the University of Oxford
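The ensemble-methods entry above describes combining the predictions of several base estimators. The simplest such combiner is hard majority voting, sketched here in plain Python. The decision-stump base estimators and the names `majority_vote` and `make_stump` are hypothetical illustrations, not a library API; scikit-learn's `VotingClassifier` offers a full-featured version of the same idea.

```python
from collections import Counter
from typing import Callable, Sequence

# Here a "classifier" is any function that maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> str:
    """Combine base classifiers by hard (majority) voting.

    Ties go to the label first encountered among the votes, because
    Counter.most_common orders equal counts by first appearance.
    """
    votes = [clf(x) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label


def make_stump(feature: int, threshold: float) -> Classifier:
    """Hypothetical base estimator: a one-feature threshold rule (decision stump)."""
    def predict(x: Sequence[float]) -> str:
        return "pos" if x[feature] > threshold else "neg"
    return predict


stumps = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0.5)]
print(majority_vote(stumps, [0.9, 0.2, 0.7]))  # two of three stumps vote "pos", so this prints pos
```

Each stump alone is a weak rule that looks at only one feature. The vote lets the majority override any single stump's mistake, which is the robustness gain the entry refers to.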

## Courses / Certifications / Competitions

- Machine Learning – Stanford University – Coursera
- Machine Learning Specialization – University of Washington – Coursera
- Intro to Data Analysis – Data Analysis Using NumPy and Pandas
- CS109 Data Science – Harvard
- Data Science and Engineering with Spark – edX
- CS229 – Machine Learning – Autumn 2015 – Stanford
- Data School – Data Science for beginners – Videos
- Learning from Data – Yaser Abu-Mostafa – YouTube
- Intro to Machine Learning – Udacity
- Machine Learning Nanodegree – Udacity
- Scalable Machine Learning – Berkeley – edX
- Social Network Analysis – University of Michigan – Coursera
- Deep Learning – Google – Udacity
- CCP Data Scientist Exams
- http://nborwankar.github.io/LearnDataScience/

## Deep Learning / Neural Network

- Deep Learning – by Yann LeCun, Yoshua Bengio & Geoffrey Hinton
- Chris Olah’s neural network blog
- YouTube channel with two-minute paper descriptions
- Tom Scott’s video Automated Weapons and the Battlefield of 2050
- Hidden Technical Debt in Machine Learning Systems
- Talking Machines podcast
- Deep Learning with Spark and TensorFlow
- The Neural Network That Remembers – With short-term memory, recurrent neural networks gain some amazing abilities

## Articles / Papers

- XGBoost: A Scalable Tree Boosting System – Tianqi Chen, Carlos Guestrin
- Naive Bayes and Text Classification I
- Six reasons why I recommend scikit-learn – by Ben Lorica
- API design for machine learning software: experiences from the scikit-learn project
- Machine Learning: The High-Interest Credit Card of Technical Debt
- Boosted Trees
- A Tutorial on Support Vector Machines for Pattern Recognition – Christopher J.C. Burges
- Comparative Study of Techniques for Large Scale Feature Selection, F. Ferri, P. Pudil, M. Hatef, and J. Kittler
- Feature Selection in scikit-learn
- A Guide to Bayesian Statistics

## Computer Vision

- SIFT [Lowe ’99]: Object Recognition from Local Scale-Invariant Features
- Spin Images [Johnson & Hebert ’99]: Spin-Images: A Representation for 3-D Surface Matching
- Textons [Malik et al. ’99]: Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons
- RIFT [Lazebnik ’04]: A sparse texture representation using local affine regions
- GLOH [Mikolajczyk & Schmid ’05]: A performance evaluation of local descriptors
- HoG [Dalal & Triggs ’05]: Histograms of Oriented Gradients for Human Detection
- SURF [Bay et al. ’06]: SURF: Speeded Up Robust Features
- ImageNet [Krizhevsky ’12]: ImageNet Classification with Deep Convolutional Neural Networks

## Science fiction

- Stephenson’s Seveneves
- Ancillary Justice
- Hannu Rajaniemi’s The Quantum Thief

## Data Science

- District Data Labs: Hands-on data science tutorials, lessons, and other awesome content
- What is data science?: The future belongs to the companies and people that turn data into products.
- Six qualities of a great data scientist
- Analyzing the Analyzers
- 4 trends in security data science: From intelligent investigation to cloud “security-as-a-service,” what you need to know for 2016
- How to Become a Data Scientist for Free
- Religiously follow this infographic on how to become a data scientist
- The Open Source Data Science Masters

## Competitions

## Data Visualization

- Seaborn: statistical data visualization
- Data-Driven Documents: d3.js
- Newbie to D3.js Expert: Complete path to create interactive visualization using D3.js
- PhiloGL
- Data Visualization – Coursera (part of the Data Mining Specialization)

## Graph Analytics

- Graph Database Neo4j books
- Paper – Pregel: A System for Large-Scale Graph Processing
- Graph Analytics—Lessons Learned and Challenges Ahead – tutorial
- Graphviz – Graph Visualization Software
- Apache TinkerPop™ – A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)

## Statistics

- OpenIntro
- An Introduction to Statistical Learning – Free Download
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, Jerome Friedman – Free Download
- Think Stats – Allen B. Downey – Free Download
- From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science – Free Download
- Introduction to Bayesian Statistics – William M. Bolstad – Free Download
- Discovering Statistics using R – @amazon
- Convex Optimization by Stephen Boyd – Book @amazon.in
- R in a Nutshell by Joseph Adler – Book@amazon.in
- R for Everyone: Advanced Analytics and Graphics by Jared Lander (Addison-Wesley Data and Analytics) – Book@amazon.in
- The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff – Book@amazon.in
- Statistical Inference by George Casella and Roger L. Berger – Book@amazon.in
- Bayesian Data Analysis, Third Edition (Chapman & Hall/CRC Texts in Statistical Science) by Andrew Gelman – Book@amazon.in
- Data Analysis Using Regression and Multilevel/Hierarchical Models (Analytical Methods for Social Research) by Andrew Gelman – Book@amazon.in
- Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi – Link

## Mathematics

- Introduction to Linear Algebra – Gilbert Strang @amazon
- Linear Algebra Review and Reference – Zico Kolter – CMU
- Matrix Computations – Gene H. Golub and Charles F. Van Loan – Free Download
- A Probabilistic Theory of Pattern Recognition – Luc Devroye, Laszlo Gyorfi and Gabor Lugosi – Free Download
- Introduction to the Math of Neural Networks – Jeff Heaton @amazon
- Advanced Engineering Mathematics – Erwin Kreyszig – Free Download
- Cookbook on Probability and Statistics – Matthias Vallentin – Free Download
- Linear Algebra and Its Applications (2007) by Gilbert Strang – Book @amazon.in
- A First Course in Probability by Sheldon Ross – Book @amazon.in
- Additional Resources

## Artificial Intelligence and Machine Learning

- Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher Bishop Book@amazon.in
- Bayesian Reasoning and Machine Learning by David Barber – Book@amazon.in
- Programming Collective Intelligence: Building Smart Web 2.0 Applications – Book@amazon.in
- Artificial Intelligence: A Modern Approach by Stuart Russell – Book@amazon.in
- Foundations of Machine Learning (Adaptive Computation and Machine Learning series) by Mehryar Mohri – Book@amazon.in
- Introduction to Machine Learning (Adaptive Computation and Machine Learning series) by Ethem Alpaydin – Book@amazon.in
- Field Experiments: Design, Analysis, and Interpretation by Alan S. Gerber – Book@amazon.in
- Statistics for Experimenters: Design, Innovation, and Discovery (Wiley Series in Probability and Statistics) by George E. P. Box – Book@amazon.in
- The Elements of Graphing Data by William S. Cleveland – Book@amazon.in
- Visualize This: The FlowingData Guide to Design, Visualization, and Statistics by Nathan Yau – Book@amazon.in
- The Visual Display of Quantitative Information by Edward R. Tufte – Book@amazon.in

## Data Mining

- Mining of Massive Datasets: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
- 27 Free Data Mining Books

## Natural Language Processing

- NLP @google
- Natural Language Processing – Stanford University – Coursera
- Natural Language Processing by Michael Collins – Columbia University – Coursera
- A Primer on Neural Network Models for Natural Language Processing – by Yoav Goldberg

## Python

- Learning Python by Mark Lutz – Book@amazon.in
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney – Book@amazon.in
- Fluent Python (1st Edition) by Luciano Ramalho

## Data / Datasets

## Collecting Data for Intelligence

## Others

- Profiling Top Kagglers: Leustagos, Current #7 / Highest #1
- Gregreda
- Rise of the Data Scientist
- Data Science from Scratch: First Principles with Python by Joel Grus @amazon
- Using Apache Spark for Massively Parallel NLP
- PRA Lab

## Reading / Watching List

- Using Apache Spark to predict attack vectors among billions of users and trillions of events
- Secure Because Math? Challenges on Applying Machine Learning to Security
- Scalable Machine Learning – Complex Data Analysis at Scale
- The Security Data Lake – Leveraging Big Data Technologies to Build a Common Data Repository for Security

## Data Science Conferences

# Big Data

## Videos

## Articles

- XGBoost4J: Portable Distributed XGBoost in Spark, Flink and Dataflow
- How to beat the CAP theorem – Nathan Marz
- Questioning the Lambda Architecture – The Lambda Architecture has its merits, but alternatives are worth exploring.

## Tools

- **Apache Flink** is an open-source platform for distributed stream and batch data processing.

## Courses / Training

# Sentiment Analysis

- Naive Bayes and Text Classification I – Introduction and Theory by Sebastian Raschka
- Sentiment Analysis of Twitter Data: A Survey of Techniques
- Stanford University – Natural Language Processing – Coursera
- Sentiment Symposium
- Must-read Sentiment Analysis from hadyelsahar
- Deeply Moving: Deep Learning for Sentiment Analysis – Stanford University
- Sentiment Analysis: Mining Opinions, Sentiments, and Emotions by Bing Liu
- Opinion Mining and Sentiment Analysis by Bo Pang and Lillian Lee

# Scala

- Structure and Interpretation of Computer Programs, Second Edition – Book
- Functional Programming Principles in Scala by Martin Odersky – Coursera
- Principles of Reactive Programming – by Martin Odersky, Erik Meijer, Roland Kuhn – Coursera
- Programming in Scala: A Comprehensive Step-by-Step Guide, Third Edition by Martin Odersky, Lex Spoon, Bill Venners

# Spark

- http://spark.meetup.com/
- http://spark.apache.org/community.html
- http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
- http://hortonworks.com/blog/category/spark/
- http://spark-packages.org/
- http://www.spark.tc/blog/
- https://spark.apache.org/docs/latest/
- http://research.google.com/archive/mapreduce.html
- https://forums.databricks.com/
- http://blog.cloudera.com/blog/category/spark/
- http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
- https://github.com/apache/spark/
- http://www.jcmit.com/mem2014.htm
- https://databricks.com/blog/category/engineering
- https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf
- http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
- http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
- https://en.wikipedia.org/wiki/SQL
- http://sqlzoo.net/
- http://www.w3schools.com/sql/
- http://www.sql-tutorial.net/
- https://www.1keydata.com/sql/sql.html
- http://www.sqlcourse.com/intro.html
- http://quickbase.intuit.com/articles/ultimate-web-guide-to-sql-database-language
- http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive
- https://en.wikipedia.org/wiki/Join_(SQL)
- https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
- http://www.w3schools.com/sql/sql_join.asp