Machine Learning with scikit-learn

Python Tools for Machine Learning:

  • scikit-learn (Python Machine Learning Library): scikit-learn is a widely used machine learning library in the Python community. Sample applications, tutorials, and code examples for it are available online. It builds on two other Python scientific computing libraries, SciPy and NumPy.
  • SciPy Library (Scientific Computing Tools): This library supports statistical distributions, function optimization, linear algebra, and a variety of specialized mathematical functions. SciPy also provides sparse matrices, a compact way to store large tables that contain mostly zeros.
  • NumPy Library: A Python library for scientific computing that provides fundamental data structures such as multi-dimensional arrays.
  • Pandas (Python Data Analysis Library): This is a Python library for data manipulation and analysis.
    • The main data structure pandas supports is called a DataFrame, which is essentially a spreadsheet-like table with rows and named columns. Unlike a NumPy array, which holds a single type, each column in a DataFrame can have a different type.
    • Pandas also supports reading and writing data in a variety of formats, including CSV files, SQL, and more.
  • Matplotlib is a widely used Python 2D plotting library that produces publication-quality figures in a variety of formats and interactive environments across platforms. matplotlib.pyplot is widely used for data analysis because it can create histograms, bar charts, error charts, scatter plots, and so forth with just a few lines of code.
  • Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks.
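
The data structures described above can be compared side by side. The following is a minimal sketch, not taken from the original post; the array values, the column names (name, score, passed), and the variable names are made up for illustration.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse

# NumPy: a multi-dimensional array holds one dtype for every element.
dense = np.array([[0, 0, 3],
                  [0, 0, 0],
                  [1, 0, 0]])

# SciPy: the CSR sparse format stores only the non-zero entries
# of a table that is mostly zeros.
compact = sparse.csr_matrix(dense)
stored_values = compact.nnz  # number of explicitly stored entries

# pandas: a DataFrame has named columns, and each column can have its own type.
# The column names here are illustrative only.
frame = pd.DataFrame({
    "name": ["a", "b", "c"],
    "score": [0.5, 1.5, 2.5],
    "passed": [False, True, True],
})
column_kinds = {col: frame[col].dtype.kind for col in frame.columns}

# pandas can also write and read CSV; an in-memory buffer stands in for a file.
csv_text = frame.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))
```

Here `dense` keeps all nine integers in memory, `compact` keeps only the non-zero ones, and `frame` mixes text, floating-point, and boolean columns in one table.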

Choosing the right estimator

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. The flowchart below is designed to give users a bit of a rough guide on how to approach problems with regard to which estimators to try on your data. [1]

[Figure: Choosing the right estimator (flowchart)]
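
One practical complement to the flowchart is to try a few candidate estimators on the same data and compare their cross-validated scores. The sketch below is an assumption about how that might look, not a procedure from the original post: the choice of the iris dataset, the two candidate classifiers, their parameters, and the variable names are all illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load a small, well-known classification dataset.
X, y = load_iris(return_X_y=True)

# Two candidate estimators; these picks are illustrative, not a recommendation.
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
}

# Score each candidate with 5-fold cross-validation (mean accuracy).
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Pick the candidate with the highest mean accuracy.
best_name = max(mean_accuracy, key=mean_accuracy.get)
```

Because every estimator in scikit-learn shares the same fit/predict interface, swapping another candidate into the dictionary requires no other changes.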

Learn

  • Videos from dataschool.io
    • What is machine learning, and how does it work? – video, ipynb
    • Setting up Python for machine learning: scikit-learn and IPython Notebook – video, ipynb
    • Getting started in scikit-learn with the famous iris dataset – video, ipynb
    • Training a machine learning model with scikit-learn – video, ipynb
    • Comparing machine learning models in scikit-learn – video, ipynb
    • Data science in Python: pandas, seaborn, scikit-learn – video, ipynb
    • Selecting the best model in scikit-learn using cross-validation – video, ipynb
    • How to find the best model parameters in scikit-learn – video, ipynb
    • How to evaluate a classifier in scikit-learn – video, ipynb

Other Pointers
