## Python Tools for Machine Learning

- scikit-learn (Python Machine Learning Library): A very popular machine learning library in the Python community, with sample applications, tutorials, and code examples available online. It is built on two other Python scientific computing libraries, NumPy and SciPy.
- SciPy Library (Scientific Computing Tools): This library supports statistical distributions, function optimization, linear algebra, and a variety of specialized mathematical functions. SciPy also provides sparse matrices, a compact way to store large tables that contain mostly zeros.
- NumPy Library: A Python library for scientific computing that provides fundamental data structures such as multi-dimensional arrays.
- Pandas (Python Data Analysis Library): This is a Python library for data manipulation and analysis.
  - The main data structure pandas supports is called a DataFrame, which is essentially a spreadsheet-like table with rows and named columns. Unlike a NumPy array, in which every element has the same type, each column of a DataFrame can have a different type.
  - Pandas also supports reading and writing data in a variety of formats, including CSV files, SQL, and more.
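The data structures described above can be illustrated with a minimal sketch. The array values, column names, and the in-memory CSV buffer are made up for illustration, and compressed sparse row (CSR) is only one of several sparse formats SciPy offers.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse

# NumPy: a homogeneous 2-D array (every element shares one dtype).
dense = np.array([[0.0, 0.0, 3.0],
                  [0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0]])

# SciPy: keep only the non-zero entries, here in compressed sparse row format.
compact = sparse.csr_matrix(dense)
stored_values = compact.nnz  # number of explicitly stored entries

# pandas: a DataFrame whose named columns have different types.
frame = pd.DataFrame({
    "name": ["a", "b", "c"],
    "count": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
})
column_kinds = {col: frame[col].dtype.kind for col in frame.columns}

# pandas: write the table as CSV to an in-memory buffer and read it back.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

The sparse matrix stores only the two non-zero values rather than all nine cells, which is what makes the format attractive for mostly-zero tables.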

- Matplotlib is a widely used Python 2D plotting library that produces publication-quality figures in a variety of formats and interactive environments across platforms. Its matplotlib.pyplot module is widely used for data analysis because it can create histograms, bar charts, error charts, scatter plots, and so forth with just a few lines of code.
- Seaborn is a Python visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks.
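As a rough illustration of the "few lines of code" claim for matplotlib.pyplot, the sketch below draws a histogram. The data is synthetic, the labels are placeholders, and the non-interactive Agg backend is chosen only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data: 500 draws from a standard normal distribution.
rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=500)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=20)  # 20 equal-width bins
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Histogram of 500 synthetic samples")

# Render the figure as PNG into memory; a file path would work the same way.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
png_bytes = png_buffer.getvalue()
plt.close(fig)
```

Swapping `ax.hist` for `ax.bar`, `ax.errorbar`, or `ax.scatter` produces the other chart types mentioned above with a similar amount of code.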

## Choosing the right estimator

Often the hardest part of solving a machine learning problem is finding the right estimator for the job. Different estimators are better suited to different types of data and different problems. The flowchart below is a rough guide to which estimators to try on your data. [1]
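One practical complement to such a guide is to try a few candidate estimators and compare them empirically. The sketch below is a minimal example under stated assumptions: the iris dataset, the three classifiers, and 5-fold cross-validation are illustrative choices, not a recommendation from the flowchart.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Small built-in classification dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Candidate estimators chosen only for illustration.
candidates = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
mean_accuracy = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5)
    mean_accuracy[name] = scores.mean()

best_name = max(mean_accuracy, key=mean_accuracy.get)
print(best_name, round(mean_accuracy[best_name], 3))
```

Because every scikit-learn estimator exposes the same fit/predict interface, adding another candidate is a one-line change to the dictionary.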

## Learn

- Videos from dataschool.io
  - What is *machine learning*, and *how does it work*? – video – ipynb
  - Setting up *Python for machine learning*: scikit-learn and IPython Notebook – video – ipynb
  - *Getting started in scikit-learn* with the famous iris dataset – video – ipynb
  - *Training* a machine learning model with scikit-learn – video – ipynb
  - *Comparing* machine learning *models* in scikit-learn – video – ipynb
  - *Data science* in Python: *pandas*, *seaborn*, scikit-learn – video – ipynb
  - Selecting the *best model* in scikit-learn using *cross-validation* – video – ipynb
  - How to find the *best model parameters* in scikit-learn – video – ipynb
  - How to *evaluate a classifier* in scikit-learn – video – ipynb

## Other Pointers

- Six reasons why I recommend scikit-learn
- API design for machine learning software: experiences from the scikit-learn project
- scikit-learn