This repo contains the teaching material for the Introduction to Python (and useful libraries) masterclass at the Data Science Retreat.
- About Me
- The Python Programming Language
- Why Python?
- Python for DS Components
- Python 2 vs. Python 3
- Installing Python and all useful packages
- Running the IPython interpreter and a Python file
- Jupyter Notebook
- Python basics
- Pandas
- Intro tutorial on pandas basics
- Data Munging with Pandas
- Scikit-learn
- Your first data analysis case
Slides for this section can be found here.
Slide deck for this entire section is available here.
Slides on this topic start here
Slides on this topic start here
Slides on this topic start here
A great notebook covering the main differences has been written by Sebastian Raschka.
To keep your code compatible with both Python 2 and Python 3, you might also want to use this Cheat Sheet.
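As a minimal sketch of what compatible code can look like, the standard `__future__` imports below make `print` and division behave the same way under Python 2 and Python 3. The `mean` helper is a hypothetical example, not taken from the linked notebook or cheat sheet.

```python
# These imports are no-ops on Python 3 and enable Python 3 behaviour on Python 2.
from __future__ import print_function, division


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    With `division` imported, `/` is true division on both Python versions,
    so mean([1, 2]) is 1.5 rather than 1.
    """
    return sum(values) / len(values)


# print is a function on both versions thanks to `print_function`.
print("mean:", mean([1, 2]))
```

Without the `division` import, the same call would return `1` on Python 2 because `/` between integers truncates there.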
Slides on this topic start here
Slides on this topic start here
A live demo will be given during the masterclass.
Experiment further with the IPython Notebook environment using this Jupyter Notebook. Clone or download it, then open it and run and modify its cells.
Many more Jupyter features are covered in this blog post.
Time to get your hands dirty. Read and test for yourself the examples provided in: The SciPy Lectures -- The Python Language.
Practice those examples by alternating between Python files, the IPython interpreter, and an IPython Notebook.
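One way to practice in all three environments is to keep a small script that works both when run directly and when imported. The sketch below assumes a hypothetical file name, `hello.py`, and a hypothetical `greet` function; neither comes from the course material.

```python
# hello.py (hypothetical file name used for illustration)


def greet(name):
    """Build a greeting string for the given name."""
    return "Hello, {}!".format(name)


# This guard is true only when the file is executed directly,
# not when it is imported from the interpreter or a notebook.
if __name__ == "__main__":
    print(greet("Data Science Retreat"))
```

From a shell, `python hello.py` runs the script. Inside IPython or a notebook cell, `%run hello.py` executes it in the current session, and `from hello import greet` makes the function available without triggering the `print` call.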
To go further:
- Tutorial: Data structures
- Tutorial: Working with dataframes
- Tutorial: Using pandas on the MovieLens dataset
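Before opening the tutorials, it may help to see the two core pandas data structures side by side. This is a minimal sketch on toy data invented for illustration (it is not the MovieLens dataset); the column names `movie` and `rating` are assumptions.

```python
import pandas as pd

# Toy ratings table, made up for illustration only.
ratings = pd.DataFrame({
    "movie": ["A", "A", "B", "B", "B"],
    "rating": [4, 5, 3, 2, 4],
})

# Selecting a single column of a DataFrame yields a Series.
rating_series = ratings["rating"]

# Boolean indexing keeps only the rows that satisfy a condition.
high_ratings = ratings[ratings["rating"] >= 4]

# Split-apply-combine: average rating per movie.
mean_by_movie = ratings.groupby("movie")["rating"].mean()

print(high_ratings)
print(mean_by_movie)
```

The same three operations (column selection, filtering, and `groupby`) cover a large share of everyday data munging.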
- Introduction to machine learning with scikit-learn slides
- Doing machine learning with scikit-learn slides
- Tutorial: Introduction to scikit-learn
- To go further
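The scikit-learn workflow introduced in these materials usually follows the same pattern: split the data, fit an estimator, predict, and score. The sketch below shows that pattern on the iris dataset bundled with scikit-learn; the choice of a k-nearest-neighbours classifier and its settings is an assumption made for illustration, not necessarily what the slides use.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load features X and labels y from the bundled iris dataset.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate generalisation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn estimator exposes the same fit / predict interface.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set
print("Held-out accuracy:", accuracy)
```

Swapping in another classifier only requires changing the line that constructs `model`, since the `fit`, `predict`, and `score` calls stay the same.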
A great source of data problems nowadays is the Kaggle platform. We'll be starting today with a simple but representative dataset: Titanic: Machine Learning from Disaster.
- Orientation guide for approaching the problem
IMPORTANT: you will find plenty of material analyzing this data. However, you'll learn the most if you give the problem some thought and try out several things before resorting to ready-made answers.
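A possible first exploration step, rather than a solution, is to compare survival rates across groups. The sketch below assumes the `Survived` and `Sex` column names used in Kaggle's Titanic training file; the `survival_rate_by` helper is hypothetical, and the four rows are toy values made up so that the example runs without the real data.

```python
import pandas as pd


def survival_rate_by(passengers, column):
    """Return the fraction of survivors for each value of `column`.

    Assumes `passengers` has a 0/1 column named "Survived".
    """
    return passengers.groupby(column)["Survived"].mean()


# Toy rows invented for illustration; they are not real passenger records.
toy = pd.DataFrame({
    "Sex": ["female", "male", "male", "female"],
    "Survived": [1, 0, 1, 1],
})

rates = survival_rate_by(toy, "Sex")
print(rates)
```

With the real data downloaded, the same helper could be applied to the result of `pd.read_csv("train.csv")`, assuming the file sits in the working directory under that name.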
This repository contains a variety of content: some developed by Amélie Anglade, some derived from or largely inspired by third parties' work, and some entirely from third parties.
The third-party content is distributed under the license provided by those parties. Any derivative work respects the original licenses, and credits its initial authors.
Original content developed by Amélie Anglade is distributed under the MIT license.