eugen/mlstudy


Welcome

Syllabus

1: Intro

Getting to know each other, discussing the plan, the guidelines for sharing the assignments, etc.

Going through a small demo/tutorial for Jupyter Notebooks. See [demo notebook here](1. Intro to Jupyter Notebooks - Python.ipynb).

2: Linear Regression

Create simple prediction models from small datasets. See [demo notebook here](2. Linear Regression - Python.ipynb)

Recommended datasets:

  • House prices: Predict the price of a house based on surface, lot size, #bathrooms, #bedrooms, etc.
  • Titanic survivability: Predict the likelihood of someone surviving the sinking of the Titanic based on their gender, age, passenger class and some other variables.
  • Video Game sales with ratings: Predict how well a game will sell based on the critic rating, user rating, publisher and genre.
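As a rough illustration of what such a prediction model does, here is a minimal from-scratch sketch of single-feature least-squares regression. The surface/price numbers are made up for the example, and `fit_line` and `predict_price` are hypothetical helper names; the demo notebook may well use a library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data: house surface (m^2) and price (thousands).
surfaces = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(surfaces, prices)


def predict_price(surface):
    """Predict a price from the fitted line."""
    return slope * surface + intercept


print(predict_price(100))
```

Real datasets such as the ones listed above have several features, so in practice you would move to multivariate regression, but the idea of minimizing squared error stays the same.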

3: Binary Classification

Go over binary classification problems and some algorithms for solving them, e.g. logistic regression. See [demo notebook here](3. Binary Classification - Python.ipynb).
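To make logistic regression a bit more concrete, the sketch below trains a one-feature model with plain gradient descent. The data (hours studied versus pass/fail) is invented, and `train_logistic`, `predict_proba`, the learning rate and the epoch count are illustrative assumptions, not something taken from the demo notebook.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the average log-loss with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up toy data: hours studied -> passed the exam (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, passed)


def predict_proba(x):
    """Estimated probability of the positive class for input x."""
    return sigmoid(w * x + b)


print(predict_proba(1), predict_proba(8))
```

A prediction is usually turned into a class label by thresholding the probability at 0.5.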

Recommended datasets:

4: Clustering

Solve some simple clustering problems with K-nearest neighbors/K-means. See [demo notebook here](4. Clustering - Python.ipynb).
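For reference, a bare-bones K-means loop might look like the following sketch. It assumes 2D points, uses the first `k` points as initial centroids (a naive choice made here only to keep the example deterministic), and runs on made-up data; `kmeans` is a hypothetical helper, not the notebook's code.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [points[i] for i in range(k)]  # naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up toy data: two visibly separated groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Production implementations typically use smarter initialization and a convergence check instead of a fixed iteration count.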

5: Recommendations

Create a model for product recommendations with collaborative filtering. See [demo notebook here](5. Collaborative Filtering - Python.ipynb).
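One common flavor of collaborative filtering is user-based: predict a user's rating for an item from the ratings of similar users. The sketch below assumes that variant, with cosine similarity computed over co-rated items only. The ratings, user names and the `predict_rating` helper are all invented for illustration.

```python
import math

# Made-up ratings: user -> {product: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        similarity_sum += similarity
    if similarity_sum == 0:
        return None  # no similar user has rated this item
    return weighted_sum / similarity_sum


# Alice has not rated product D; estimate how she would rate it.
prediction = predict_rating("alice", "D")
print(prediction)
```

Because Alice's tastes are closer to Bob's than to Carol's, the estimate leans toward Bob's rating of D.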

Datasets

There's no machine learning without something to learn. This section lists places where you can find datasets useful for an ML study group, course, or training.

Theory

There are many sources that cover the theory of machine learning.

Full courses

Books

Cheatsheets

Diagrams that assist you in choosing the correct model to train:

Note: these only hint at the correct algorithm for a particular situation, and they remain useful regardless of the platform you use.

Libraries

Integrated offline environments

  • Anaconda: A simple way to install Python, Jupyter Notebooks and all required libraries for data science & machine learning for offline use. It should also work for other languages besides Python (R, Ruby, Scala, Java, JS), but this is untested. Feel free to add details here if you've tried it.
  • RStudio: Very nice IDE for R

Online Environments

  • Kaggle: Online hosting of Jupyter Notebooks. Supports Python (2?) and R.
  • Azure Notebooks: Online hosting of Jupyter Notebooks. Supports Python 2&3, R and F#
  • Anaconda Cloud: Packages must be developed offline, but can then be uploaded to Anaconda Cloud and shared with everyone.

Python

Java/.NET/R/Lua/Others

To anyone interested in using any of these: Feel free to add dedicated sections.

Other Tools

  • Gist: Preferred way of sharing code snippets.

  • Jupyter Notebook viewer: Allows viewing of Jupyter notebooks from any URL, GitHub repo or gist.

Related Subjects

Statistics

Highly recommended course available for free on Coursera: Basic Statistics, by University of Amsterdam

Statistics cheatsheets:

About

Repo used as a base of operations for a machine learning study group
