diff --git a/00-README.ipynb b/00-README.ipynb new file mode 100644 index 0000000..4a343ee --- /dev/null +++ b/00-README.ipynb @@ -0,0 +1,124 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f3151511", + "metadata": {}, + "source": [ + "# Python in High Performance Computing\n", + "\n", + "This binder image includes several exercices from the CSC course \"Python in High Performance Computing\". The course is part of PRACE Training activity at CSC (https://www.futurelearn.com/courses/python-in-hpc). \n", + "\n", + "Also, it includes material from a Dask tutorial given at SciPy 2020 conference.\n", + "\n", + "## NOTE : Exercices with \"-->\" are suggestions to start\n", + "\n", + "## Exercises\n", + "\n", + "\n", + "\n", + "### Performance analysis\n", + "\n", + "1. Read the doc at [Profiling apps](performance/cprofile.ipynb)\n", + "2. Open a terminal and do the following exercice\n", + "\n", + " - **[--> Using cProfile](performance/cprofile)**\n", + "\n", + "### Multiprocessing\n", + "\n", + "1. Read the doc at [Python Multiprocessing](multiprocessing/Multiprocessing.ipynb)\n", + "2. Open a terminal and do the following exercices\n", + "\n", + " - [Simple calculation](multiprocessing/simple-calculation)\n", + " - [Work distribution](multiprocessing/work-distribution)\n", + "\n", + "### Parallel programming with mpi4py\n", + "\n", + "1. Read the doc at [MPI on Python](mpi/MPI_on_Python.ipynb)\n", + "2. Open a terminal and do the following exercices\n", + "\n", + " - **[--> Hello World](mpi/hello-world)**\n", + " - [Simple message exchange](mpi/message-exchange)\n", + " - [Message chain](mpi/message-chain)\n", + " - **[--> Non-blocking communication](mpi/non-blocking)**\n", + " - **[--> Collective operations](mpi/collectives)**\n", + "\n", + "### Dask\n", + "\n", + "1. 
Open the following notebooks to experiment (no need of a terminal)\n", + "\n", + " - **[--> Delayed](dask/01_dask.delayed.ipynb)**\n", + " - [Understanding 'Lazy'](dask/01x_lazy.ipynb)\n", + " - [Bags](dask/02_bag.ipynb)\n", + " - [Arrays](dask/03_array.ipynb)\n", + " - **[--> Dataframe](dask/04_dataframe.ipynb)**\n", + " - **[--> Distributed mode](dask/05_distributed.ipynb)**\n", + " - [Distributed advanced](dask/06_distributed_advanced.ipynb)\n", + " - [Storage optimization](dask/07_dataframe_storage.ipynb)\n", + " - [Machine Learning](dask/08_machine_learning.ipynb)\n", + "\n", + "### Bonus exercises\n", + "\n", + " - [Game of life](numpy/game-of-life)\n", + " - [Rotation with broadcasting](numpy/broadcast-rotation)\n", + " - [Two dimensional heat equation](numpy/heat-equation)\n", + " - [Parallel heat equation](mpi/heat-equation)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b4dbf0a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/00-overview.md b/00-overview.md new file mode 100644 index 0000000..51887f0 --- /dev/null +++ b/00-overview.md @@ -0,0 +1,39 @@ +--- +title: Python and High-Performance Computing +lang: en +--- + +# Efficiency + +- Python is an interpreted language + - no pre-compiled binaries, all code is translated on-the-fly to + machine instructions + - byte-code as a middle step which may be stored (.pyc) + +- All objects are dynamic in Python + - nothing is fixed == optimisation nightmare + - lot of overhead from metadata + +- Flexibility is good, but comes with a cost! + + +# Improving Python performance + +- Array based computations with NumPy +- Using extended Cython programming language +- Embed compiled code in a Python program + - C/C++, Fortran +- Utilize parallel processing + + +# Parallelisation strategies for Python + +- Global Interpreter Lock (GIL) + - CPython's memory management is not thread-safe + - no threads possible, except for I/O etc. + - affects overall performance of threading + +- Process-based "threading" with multiprocessing + - fork independent processes that have a limited way to communicate + +- **Message-passing** is the Way to Go to achieve true parallelism in Python diff --git a/02-performance-analysis.md b/02-performance-analysis.md new file mode 100644 index 0000000..bce003a --- /dev/null +++ b/02-performance-analysis.md @@ -0,0 +1,147 @@ +--- +title: Performance analysis +lang: en +--- + +# Performance measurement {.section} + +# Measuring application performance + +- Correctness is the most import factor in any application + - Premature optimization is the root of all evil\! +- Before starting to optimize application, one should measure where time is + spent + - Typically 90 % of time is spent in 10 % of application + +
+- Mind the algorithm!
+    - Recursive calculation of Fibonacci numbers (see the sketch after the table)
+ +
+
+
+| Implementation                 | Speedup |
+|--------------------------------|---------|
+| Pure Python                    | 1       |
+| Pure C                         | 126     |
+| Pure Python (better algorithm) | 24e6    |
+
+
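+
+As a rough illustration of the algorithmic point behind the table (a hypothetical
+snippet, not part of the course material):
+
+```python
+# Naive recursion recomputes the same subproblems over and over
+# (an exponential number of calls), while the iterative version
+# needs only n additions.
+from timeit import timeit
+
+def fib_recursive(n):
+    if n < 2:
+        return n
+    return fib_recursive(n - 1) + fib_recursive(n - 2)
+
+def fib_iterative(n):
+    a, b = 0, 1
+    for _ in range(n):
+        a, b = b, a + b
+    return a
+
+print(timeit(lambda: fib_recursive(30), number=1))  # slow: exponential work
+print(timeit(lambda: fib_iterative(30), number=1))  # fast: 30 additions
+```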
+
+
+
+# Measuring application performance
+
+- Application's own timers
+- **timeit** module
+- **cProfile** module
+- Full-fledged profiling tools: TAU, Intel Vtune, Python Tools for Visual
+  Studio ...
+
+
+# Measuring application performance
+
+- Python **time** module can be used for measuring time spent in a specific
+  part of the program
+    - `time.perf_counter()` : includes time spent in other processes
+    - `time.process_time()` : only time spent in the current process
+
+```python
+import time
+
+t0 = time.process_time()
+for n in range(niter):
+    heavy_calculation()
+t1 = time.process_time()
+
+print('Time spent in heavy calculation', t1-t0)
+```
+
+
+# timeit module
+
+- Easy timing of small bits of Python code
+- Tries to avoid common pitfalls in measuring execution times
+- Command line interface and Python interface
+- `%timeit` magic in IPython
+
+```python
+In [1]: from mymodule import func
+In [2]: %timeit func()
+
+10 loops, best of 3: 433 msec per loop
+```
+```bash
+$ python -m timeit -s "from mymodule import func" "func()"
+
+10 loops, best of 3: 433 msec per loop
+```
+
+
+# cProfile
+
+- Execution profile of a Python program
+    - Time spent in different parts of the program
+    - Call graphs
+- Python API:
+- Profiling a whole program from the command line
+
+```python
+import cProfile
+...
+
+# profile statement and save results to a file func.prof
+cProfile.run('func()', 'func.prof')
+```
+```bash
+$ python -m cProfile -o myprof.prof myprogram.py
+```
+
+
+# Investigating profile with pstats
+
+- Printing execution time of selected functions
+- Sorting by function name, time, cumulative time, ...
+- Python module interface and interactive browser
+
+ +``` +In [1]: from pstats import Stats +In [2]: p = Stats('myprof.prof') +In [3]: p.strip_dirs() +In [4]: p.sort_stats('time') +In [5]: p.print_stats(5) + +Mon Oct 12 10:11:00 2016 my.prof +... +``` + +
+
+ +```bash +$ python -m pstats myprof.prof + +Welcome to the profile statistics +% strip +% sort time +% stats 5 + +Mon Oct 12 10:11:00 2016 my.prof +... +``` + +
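+
+Two more views that are often useful are cumulative time and caller information;
+a small sketch assuming the same `myprof.prof` file and a hypothetical hot
+function called `evolve`:
+
+```python
+from pstats import Stats
+
+p = Stats('myprof.prof')
+p.strip_dirs()
+p.sort_stats('cumulative')   # which high-level calls dominate the runtime
+p.print_stats(5)
+p.print_callers('evolve')    # where the hot function is called from
+```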
+ + +# Summary + +- Python has various built-in tools for measuring application performance +- **time** module +- **timeit** module +- **cProfile** and **pstats** modules diff --git a/03-multiprocessing.md b/03-multiprocessing.md new file mode 100644 index 0000000..d468e43 --- /dev/null +++ b/03-multiprocessing.md @@ -0,0 +1,270 @@ +--- +title: Multiprocessing +lang: en +--- + +# Processes and threads + +![](img/processes-threads.png) + +
+ +## Process + +- Independent execution units +- Have their own state information and *own memory* address space + +
+
+ +## Thread + +- A single process may contain multiple threads +- Have their own state information, but *share* the *same memory* + address space + +
+ + +# Processes and threads + +![](img/processes-threads.png) + +
+ +## Process + +- Long-lived: created when parallel program started, killed when + program is finished +- Explicit communication between processes + +
+
+ +## Thread + +- Short-lived: created when entering a parallel region, destroyed + (joined) when region ends +- Communication through shared memory + +
+ +# Processes and threads + +![](img/processes-threads.png) + +
+ +## Process + +- MPI + - good performance + - scales from a laptop to a supercomputer + +
+
+ +## Thread + +- OpenMP + - C / Fortran, not Python +- threading module + - only for I/O bound tasks (maybe) + - Global Interpreter Lock (GIL) limits usability + +
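+
+A quick way to see the effect of the GIL (a hypothetical snippet, not one of the
+course exercises): a CPU-bound task gains nothing from threads, but scales with
+processes.
+
+```python
+import time
+from threading import Thread
+from multiprocessing import Process
+
+def work():
+    s = 0
+    for i in range(10_000_000):   # pure CPU work, no I/O
+        s += i * i
+
+if __name__ == '__main__':
+    for cls in (Thread, Process):
+        workers = [cls(target=work) for _ in range(4)]
+        t0 = time.perf_counter()
+        for w in workers:
+            w.start()
+        for w in workers:
+            w.join()
+        # threads run the loops one at a time (GIL), processes run them in parallel
+        print(cls.__name__, round(time.perf_counter() - t0, 2), 's')
+```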
+ + +# Processes and threads + +![](img/processes-threads.png) + +
+ +## Process + +- MPI + - good performance + - scales from a laptop to a supercomputer + +
+
+ +## ~~Thread~~ Process + +- multiprocessing module + - relies on OS for forking worker processes that mimic threads + - limited communication between the parallel processes + +
+ + +# Multiprocessing + +- Underlying OS used to spawn new independent subprocesses +- processes are independent and execute code in an asynchronous manner + - no guarantee on the order of execution +- Communication possible only through dedicated, shared communication + channels + - Queues, Pipes + - must be created before a new process is forked + + +# Spawn a process + +```python +from multiprocessing import Process +import os + +def hello(name): + print 'Hello', name + print 'My PID is', os.getpid() + print "My parent's PID is", os.getppid() + +# Create a new process +p = Process(target=hello, args=('Alice', )) + +# Start the process +p.start() +print 'Spawned a new process from PID', os.getpid() + +# End the process +p.join() +``` + + +# Communication + +- Sharing data + - shared memory, data manager +- Pipes + - direct communication between two processes +- Queues + - work sharing among a group of processes +- Pool of workers + - offloading tasks to a group of worker processes + + +# Queues + +- FIFO (*first-in-first-out*) task queues that can be used to distribute + work among processes +- Shared among all processes + - all processes can add and retrieve data from the queue +- Automatically takes care of locking, so can be used safely with minimal + hassle + + +# Queues + +```python +from multiprocessing import Process, Queue + +def f(q): + while True: + x = q.get() + if x is None: + break + print(x**2) + +q = Queue() +for i in range(100): + q.put(i) +# task queue: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 99] + +for i in range(3): + q.put(None) + p = Process(target=f, args=(q, )) + p.start() +``` + + +# Queues + +```python +from multiprocessing import Process, Queue + +def f(q): + while True: + x = q.get() + if x is None: # if sentinel, stop execution + break + print(x**2) + +q = Queue() +for i in range(100): + q.put(i) +# task queue: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 99] + +for i in range(3): + q.put(None) # add sentinels to the queue to signal STOP + p = Process(target=f, args=(q, )) + p.start() +``` + + +# Pool of workers + +- Group of processes that carry out tasks assigned to them + 1. Master process submits tasks to the pool + 2. Pool of worker processes perform the tasks + 3. Master process retrieves the results from the pool +- Blocking and non-blocking (= asynchronous) calls available + + +# Pool of workers + +```python +from multiprocessing import Pool +import time + +def f(x): + return x**2 + +pool = Pool(8) + +# Blocking execution (with a single process) +result = pool.apply(f, (4,)) +print(result) + +# Non-blocking execution "in the background" +result = pool.apply_async(f, (12,)) +while not result.ready(): + time.sleep(1) +print(result.get()) +# an alternative to "sleeping" is to use e.g. 
result.get(timeout=1) +``` + + +# Pool of workers + +```python +from multiprocessing import Pool +import time + +def f(x): + return x**2 + +pool = Pool(8) + +# calculate x**2 in parallel for x in 0..9 +result = pool.map(f, range(10)) +print(result) + +# non-blocking alternative +result = pool.map_async(f, range(10)) +while not result.ready(): + time.sleep(1) +print(result.get()) +``` + + +# Summary + +- Parallelism achieved by launching new OS processes +- Only limited communication possible + - work sharing: queues / pool of workers +- Non-blocking execution available + - do something else while waiting for results +- Further information: + https://docs.python.org/2/library/multiprocessing.html diff --git a/04-mpi4py.md b/04-mpi4py.md new file mode 100644 index 0000000..9e7d533 --- /dev/null +++ b/04-mpi4py.md @@ -0,0 +1,801 @@ +--- +title: MPI for Python +lang: en +--- + +# Message Passing Interface {.section} + +# Message passing interface + +- MPI is an application programming interface (API) for communication + between separate processes +- MPI programs are portable and scalable + - the same program can run on different types of computers, from PC's + to supercomputers + - the most widely used approach for distributed parallel computing +- MPI is flexible and comprehensive + - large (over 300 procedures) + - concise (often only 6 procedures are needed) +- MPI standard defines C and Fortran interfaces + - MPI for Python (mpi4py) provides an unofficial Python interface + + +# Processes and threads + +![](img/processes-threads-highlight-proc.svg){.center width=80%} + + +
+ +## Process + +- Independent execution units +- Have their own state information and *own memory* address space + +
+
+ +## Thread + +- A single process may contain multiple threads +- Have their own state information, but *share* the *same memory* + address space + +
+ + +# Execution model + +- MPI program is launched as a set of *independent*, *identical processes* + - execute the same program code and instructions + - can reside in different nodes (or even in different computers) +- The way to launch a MPI program depends on the system + - mpiexec, mpirun, srun, aprun, ... + - mpiexec/mpirun in training class + - srun on puhti.csc.fi + + +# MPI rank + +- Rank: ID number given to a process + - it is possible to query for rank + - processes can perform different tasks based on their rank + +```python +if (rank == 0): + # do something +elif (rank == 1): + # do something else +else: + # all other processes do something different +``` + + +# Data model + +- Each MPI process has its own *separate* memory space, i.e. all + variables and data structures are *local* to the process +- Processes can exchange data by sending and receiving messages + +![](img/data-model.svg){.center width=90%} + + +# MPI communicator + +- Communicator: a group containing all the processes that will participate + in communication + - in mpi4py most MPI calls are implemented as methods of a + communicator object + - `MPI_COMM_WORLD` contains all processes (`MPI.COMM_WORLD` in + mpi4py) + - user can define custom communicators + + +# Routines in MPI for Python + +- Communication between processes + - sending and receiving messages between two processes + - sending and receiving messages between several processes +- Synchronization between processes +- Communicator creation and manipulation +- Advanced features (e.g. user defined datatypes, one-sided communication + and parallel I/O) + + +# Getting started + +- Basic methods of communicator object + - `Get_size()` Number of processes in communicator + - `Get_rank()` rank of this process + +```python +from mpi4py import MPI + +comm = MPI.COMM_WORLD # communicator object containing all processes + +size = comm.Get_size() +rank = comm.Get_rank() + +print("I am rank %d in group of %d processes" % (rank, size)) +``` + + +# Running an example program + +```bash +$ mpiexec -n 4 python3 hello.py + +I am rank 2 in group of 4 processes +I am rank 0 in group of 4 processes +I am rank 3 in group of 4 processes +I am rank 1 in group of 4 processes +``` + +```python +from mpi4py import MPI + +comm = MPI.COMM_WORLD # communicator object containing all processes + +size = comm.Get_size() +rank = comm.Get_rank() + +print("I am rank %d in group of %d processes" % (rank, size)) +``` + + +# Point-to-Point Communication {.section} + +# MPI communication + +
+ +- Data is local to the MPI processes + - They need to *communicate* to coordinate work +- Point-to-point communication + - Messages are sent between two processes +- Collective communication + - Involving a number of processes at the same time + +
+ +
+ +![](img/communication-schematic.svg){.center width=50%} + +
+ + +# MPI point-to-point operations + +- One process *sends* a message to another process that *receives* it +- Sends and receives in a program should match - one receive per send +- Each message contains + - The actual *data* that is to be sent + - The *datatype* of each element of data + - The *number of elements* the data consists of + - An identification number for the message (*tag*) + - The ranks of the *source* and *destination* process +- With **mpi4py** it is often enough to specify only *data* and + *source* and *destination* + +# Sending and receiving data + +- Sending and receiving a dictionary + +```python +from mpi4py import MPI + +comm = MPI.COMM_WORLD # communicator object containing all processes +rank = comm.Get_rank() + +if rank == 0: + data = {'a': 7, 'b': 3.14} + comm.send(data, dest=1) +elif rank == 1: + data = comm.recv(source=0) +``` + + +# Sending and receiving data + +- Arbitrary Python objects can be communicated with the send and + receive methods of a communicator + +
+ +`.send(data, dest)` + : `data`{.input} + : Python object to send + + `dest`{.input} + : destination rank + +
+
+ +`.recv(source)` + : `source`{.input} + : source rank + : note: data is provided as return value + +
+ +- Destination and source ranks have to match! + + +# Blocking routines & deadlocks + +- `send()` and `recv()` are *blocking* routines + - the functions exit only once it is safe to use the data (memory) + involved in the communication +- Completion depends on other processes => risk for *deadlocks* + - for example, if all processes call `recv()` there is no-one left to + call a corresponding `send()` and the program is *stuck forever* + + +# Typical point-to-point communication patterns + +![](img/comm_patt.svg){.center width=100%} + +
+ +- Incorrect ordering of sends and receives may result in a deadlock + + +# Case study: parallel sum + +
+![](img/parallel-sum-0.svg){.center width=70%} +
+ +
+## Initial state
+
+An array A containing floating point numbers read from a file by the first
+MPI task (rank 0).
+
+## Goal
+
+Calculate the total sum of all elements in array A in parallel.
+
+ + +# Case study: parallel sum + +
+![](img/parallel-sum-0.svg){.center width=70%} +
+ +
+## Parallel algorithm + +
+1. Scatter the data
+   1.1. receive operation for scatter
+   1.2. send operation for scatter
+2. Compute partial sums in parallel
+3. Gather the partial sums
+   3.1. receive operation for gather
+   3.2. send operation for gather
+4. Compute the total sum (a code sketch of these steps follows below)
+
+ +
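+
+Before going through the steps figure by figure, here is a minimal mpi4py sketch
+of the whole algorithm (an illustration only, not the model answer; the input
+file name `array.dat` is assumed):
+
+```python
+from mpi4py import MPI
+import numpy
+
+comm = MPI.COMM_WORLD
+rank = comm.Get_rank()
+size = comm.Get_size()
+
+if rank == 0:
+    data = numpy.loadtxt('array.dat')      # assumed input file
+    chunks = numpy.array_split(data, size)
+    local = chunks[0]
+    for i in range(1, size):               # 1.2 send operation for scatter
+        comm.send(chunks[i], dest=i)
+else:
+    local = comm.recv(source=0)            # 1.1 receive operation for scatter
+
+partial = local.sum()                      # 2. compute partial sums in parallel
+
+if rank == 0:
+    total = partial
+    for i in range(1, size):               # 3.1 receive operation for gather
+        total += comm.recv(source=i)
+    print("Total sum:", total)             # 4. compute the total sum
+else:
+    comm.send(partial, dest=0)             # 3.2 send operation for gather
+```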
+ + +# Step 1.1: Receive operation for scatter + +![](img/parallel-sum-1.1.png){.center width=55%} + + +# Step 1.2: Send operation for scatter + +![](img/parallel-sum-1.2.png){.center width=55%} + + +# Step 2: Compute partial sums in parallel + +![](img/parallel-sum-2.png){.center width=55%} + + +# Step 3.1: Receive operation for gather + +![](img/parallel-sum-3.1.png){.center width=55%} + + +# Step 3.2: Send operation for gather + +![](img/parallel-sum-3.2.png){.center width=55%} + + +# Step 4: Compute the total sum + +![](img/parallel-sum-4.png){.center width=55%} + + +# Communicating NumPy arrays + +- Arbitrary Python objects are converted to byte streams (pickled) when + sending and back to Python objects (unpickled) when receiving + - these conversions may be a serious overhead to communication +- Contiguous memory buffers (such as NumPy arrays) can be communicated + with very little overhead using upper case methods: + - `Send(data, dest)` + - `Recv(data, source)` + - note the difference in receiving: the data array has to exist at the + time of call + + +# Send/receive a NumPy array + +- Note the difference between upper/lower case! + - send/recv: general Python objects, slow + - Send/Recv: continuous arrays, fast + +```python +from mpi4py import MPI +import numpy + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() + +data = numpy.empty(100, dtype=float) +if rank == 0: + data[:] = numpy.arange(100, dtype=float) + comm.Send(data, dest=1) +elif rank == 1: + comm.Recv(data, source=0) +``` + + +# Combined send and receive + +- Send one message and receive another with a single command + - reduces risk for deadlocks +- Destination and source ranks can be same or different + - `MPI.PROC_NULL` can be used for *no destination/source* + +```python +data = numpy.arange(10, dtype=float) * (rank + 1) +buffer = numpy.empty(data.shape, dtype=data.dtype) + +if rank == 0: + dest, source = 1, 1 +elif rank == 1: + dest, source = 0, 0 + +comm.Sendrecv(data, dest=dest, recvbuf=buffer, source=source) +``` + + +# MPI datatypes + +- MPI has a number of predefined datatypes to represent data + - e.g. `MPI.INT` for integer and `MPI.DOUBLE` for float +- No need to specify the datatype for Python objects or Numpy arrays + - objects are serialised as byte streams + - automatic detection for NumPy arrays +- If needed, one can also define custom datatypes + - for example to use non-contiguous data buffers + +# Summary + +- Point-to-point communication = messages are sent between two MPI + processes +- Point-to-point operations enable any parallel communication pattern (in + principle) +- Arbitrary Python objects (that can be pickled!) 
+ - `send` / `recv` + - `sendrecv` +- Memory buffers such as Numpy arrays + - `Send` / `Recv` + - `Sendrecv` + + +# Non-blocking Communication {.section} + +# Non-blocking communication + +- Non-blocking sends and receives + - `isend` & `irecv` + - returns immediately and sends/receives in background + - return value is a Request object +- Enables some computing concurrently with communication +- Avoids many common dead-lock situations + + +# Non-blocking communication + +- Have to finalize send/receive operations + - `wait()` + - Waits for the communication started with `isend` or `irecv` to + finish (blocking) + - `test()` + - Tests if the communication has finished (non-blocking) +- You can mix non-blocking and blocking p2p routines + - e.g., receive `isend` with `recv` + + +# Example: non-blocking send/receive + +```python +rank = comm.Get_rank() +size = comm.Get_size() + +if rank == 0: + data = arange(size, dtype=float) * (rank + 1) + req = comm.Isend(data, dest=1) # start a send + calculate_something(rank) # .. do something else .. + req.wait() # wait for send to finish + # safe to read/write data again + +elif rank == 1: + data = empty(size, float) + req = comm.Irecv(data, source=0) # post a receive + calculate_something(rank) # .. do something else .. + req.wait() # wait for receive to finish + # data is now ready for use +``` + + +# Multiple non-blocking operations + +- Methods `waitall()` and `waitany()` may come handy when dealing with + multiple non-blocking operations (available in the `MPI.Request` class) + - `Request.waitall(requests)` + - wait for all initiated requests to complete + - `Request.waitany(requests)` + - wait for any initiated request to complete +- For example, assuming `requests` is a list of request objects, one can wait + for all of them to be finished with: + +~~~python +MPI.Request.waitall(requests) +~~~ + + +# Example: non-blocking message chain + + + +~~~python +from mpi4py import MPI +import numpy + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() +size = comm.Get_size() + +data = numpy.arange(10, dtype=float) * (rank + 1) # send buffer +buffer = numpy.zeros(10, dtype=float) # receive buffer + +tgt = rank + 1 +src = rank - 1 +if rank == 0: + src = MPI.PROC_NULL +if rank == size - 1: + tgt = MPI.PROC_NULL + +req = [] +req.append(comm.Isend(data, dest=tgt)) +req.append(comm.Irecv(buffer, source=src)) + +MPI.Request.waitall(req) +~~~ + + + + +# Overlapping computation and communication + +
+~~~python +request_in = comm.Irecv(ghost_data) +request_out = comm.Isend(border_data) + +compute(ghost_independent_data) +request_in.wait() + +compute(border_data) +request_out.wait() +~~~ +
+ +
+![](img/non-blocking-pattern.png) +
+ + +# Summary + +- Non-blocking communication is usually the smart way to do point-to-point + communication in MPI +- Non-blocking communication realization + - `isend` / `Isend` + - `irecv` / `Irecv` + - `request.wait()` + + +# Communicators {.section} + +# Communicators + +- The communicator determines the "communication universe" + - The source and destination of a message is identified by process rank + *within* the communicator +- So far: `MPI.COMM_WORLD` +- Processes can be divided into subcommunicators + - Task level parallelism with process groups performing separate tasks + - Collective communication within a group of processes + - Parallel I/O + + +# Communicators + +
+- Communicators are dynamic +- A task can belong simultaneously to several communicators + - Unique rank in each communicator +
+
+![](img/communicator.svg){.center width=80%} +
+ + + +# User-defined communicators + +- By default a single, universal communicator exists to which all + processes belong (`MPI.COMM_WORLD`) +- One can create new communicators, e.g. by splitting this into + sub-groups + +```python +comm = MPI.COMM_WORLD +rank = comm.Get_rank() + +color = rank % 4 + +local_comm = comm.Split(color) +local_rank = local_comm.Get_rank() + +print("Global rank: %d Local rank: %d" % (rank, local_rank)) +``` + + +# Collective Communication {.section} + +# Collective communication + +- Collective communication transmits data among all processes in a process + group (communicator) + - these routines must be called by all the processes in the group + - amount of sent and received data must match +- Collective communication includes + - data movement + - collective computation + - synchronization +- Example + - `comm.barrier()` makes every task hold until all tasks in the + communicator `comm` have called it + + +# Collective communication + +- Collective communication typically outperforms point-to-point + communication +- Code becomes more compact (and efficient!) and easier to maintain: + - For example, communicating a Numpy array of 1M elements from task 0 to all + other tasks: + +
+ +```python +if rank == 0: + for i in range(1, size): + comm.Send(data, i) +else: + comm.Recv(data, 0) +``` + +
+
+ +```python +comm.Bcast(data, 0) +``` + +
+ + +# Broadcast + +- Send the same data from one process to all the other + +![](img/mpi-bcast.svg){.center width=80%} + + +# Broadcast + +- Broadcast sends same data to all processes + +```python +from mpi4py import MPI +import numpy + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() + +if rank == 0: + py_data = {'key1' : 0.0, 'key2' : 11} # Python object + data = np.arange(8) / 10. # NumPy array +else: + py_data = None + data = np.zeros(8) + +new_data = comm.bcast(py_data, root=0) + +comm.Bcast(data, root=0) +``` + + +# Scatter + +- Send equal amount of data from one process to others +- Segments A, B, ... may contain multiple elements + +![](img/mpi-scatter.svg){.center width=80%} + + +# Scatter + +- Scatter distributes data to processes + +```python +from mpi4py import MPI +from numpy import arange, empty + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() +size = comm.Get_size() +if rank == 0: + py_data = range(size) + data = arange(size**2, dtype=float) +else: + py_data = None + data = None + +new_data = comm.scatter(py_data, root=0) # returns the value + +buffer = empty(size, float) # prepare a receive buffer +comm.Scatter(data, buffer, root=0) # in-place modification +``` + + +# Gather + +- Collect data from all the process to one process +- Segments A, B, ... may contain multiple elements + +![](img/mpi-gather.svg){.center width=80%} + + +# Gather + +- Gather pulls data from all processes + +```python +from mpi4py import MPI +from numpy import arange, zeros + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() +size = comm.Get_size() + +data = arange(10, dtype=float) * (rank + 1) +buffer = zeros(size * 10, float) + +n = comm.gather(rank, root=0) # returns the value +comm.Gather(data, buffer, root=0) # in-place modification +``` + + +# Reduce + +- Applies an operation over set of processes and places result in + single process + +![](img/mpi-reduce.svg){.center width=80%} + +# Reduce + +- Reduce gathers data and applies an operation on it + +```python +from mpi4py import MPI +from numpy import arange, empty + +comm = MPI.COMM_WORLD +rank = comm.Get_rank() +size = comm.Get_size() + +data = arange(10 * size, dtype=float) * (rank + 1) +buffer = zeros(size * 10, float) + +n = comm.reduce(rank, op=MPI.SUM, root=0) # returns the value +comm.Reduce(data, buffer, op=MPI.SUM, root=0) # in-place modification +``` + + +# Other common collective operations + +Scatterv + : each process receives different amount of data + +Gatherv + : each process sends different amount of data + +Allreduce + : all processes receive the results of reduction + +Alltoall + : each process sends and receives to/from each other + +Alltoallv + : each process sends and receives different amount of data + + + +# Non-blocking collectives + +- New in MPI 3: no support in mpi4py +- Non-blocking collectives enable the overlapping of communication and + computation together with the benefits of collective communication +- Restrictions + - have to be called in same order by all ranks in a communicator + - mixing of blocking and non-blocking collectives is not allowed + + + + +# Common mistakes with collectives + +1. Using a collective operation within one branch of an if-else test based on + the rank of the process + - for example: `if rank == 0: comm.bcast(...)` + - all processes in a communicator must call a collective routine! +2. Assuming that all processes making a collective call would complete at + the same time. +3. 
Using the input buffer also as an output buffer: + - for example: `comm.Scatter(a, a, MPI.SUM)` + - always use different memory locations (arrays) for input and output! + + +# Summary + +- Collective communications involve all the processes within a + communicator + - all processes must call them +- Collective operations make code more transparent and compact +- Collective routines allow optimizations by MPI library +- MPI-3 contains also non-blocking collectives, but these are currently + not supported by MPI for Python + + +# On-line resources + +- Documentation for mpi4py is quite limited + - short on-line manual available at + [https://mpi4py.readthedocs.io/](https://mpi4py.readthedocs.io/) +- Some good references: + - "A Python Introduction to Parallel Programming with MPI" *by Jeremy + Bejarano* [http://materials.jeremybejarano.com/MPIwithPython/](http://materials.jeremybejarano.com/MPIwithPython/) + - "mpi4py examples" *by Jörg Bornschein* [https://github.com/jbornschein/mpi4py-examples](https://github.com/jbornschein/mpi4py-examples) + + +# Summary + +- mpi4py provides Python interface to MPI +- MPI calls via communicator object +- Possible to communicate arbitrary Python objects +- NumPy arrays can be communicated with nearly same speed as in C/Fortran diff --git a/README.md b/README.md index 22d94d9..87d93d2 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,13 @@ # Python in High Performance Computing -Exercise material and model answers for the CSC course "Python in High Performance Computing". The course is part of PRACE Training activity at CSC. +This binder image includes several exercices from the CSC course "Python in High Performance Computing". The course is part of PRACE Training activity at CSC (https://www.futurelearn.com/courses/python-in-hpc). -This master branch contains always the material for latest course, past -courses are stored in tags. +Also, it includes material from a Dask tutorial given at SciPy 2020 conference. -Online version of the course is run regularly in [FutureLearn](https://www.futurelearn.com/courses/python-in-hpc). - -Articles and videos of the course are also available in a simple form in this [site](docs/mooc/index.md). 
+## NOTE : Exercices with "-->" are suggestions to start ## Exercises -[General instructions](exercise-instructions.md) - ### Basic array manipulation @@ -41,18 +36,7 @@ Articles and videos of the course are also available in a simple form in this [s ### Performance analysis - - [Using cProfile](performance/cprofile) - -### Optimising with Cython - - - [Creating simple extension](cython/simple-extension) - - [Using static typing](cython/static-typing) - - [Using C-functions](cython/c-functions) - - [Optimising heat equation](cython/heat-equation) - -### Interfacing with libraries - - - [C libraries](interface/c) + - **[--> Using cProfile](performance/cprofile)** ### Multiprocessing @@ -61,11 +45,23 @@ Articles and videos of the course are also available in a simple form in this [s ### Parallel programming with mpi4py - - [Hello World](mpi/hello-world) + - **[--> Hello World](mpi/hello-world)** - [Simple message exchange](mpi/message-exchange) - [Message chain](mpi/message-chain) - - [Non-blocking communication](mpi/non-blocking) - - [Collective operations](mpi/collectives) + - **[--> Non-blocking communication](mpi/non-blocking)** + - **[--> Collective operations](mpi/collectives)** + +### Dask + + - **[--> Delayed](dask/01_dask.delayed.ipynb)** + - [Understanding 'Lazy'](dask/01x_lazy.ipynb) + - [Bags](dask/02_bag.ipynb) + - [Arrays](dask/03_array.ipynb) + - **[--> Dataframe](dask/04_dataframe.ipynb)** + - **[--> Distributed mode](dask/05_distributed.ipynb)** + - [Distributed advanced](dask/06_distributed_advanced.ipynb) + - [Storage optimization](dask/07_dataframe_storage.ipynb) + - [Machine Learning](dask/08_machine_learning.ipynb) ### Bonus exercises diff --git a/binder/apt.txt b/binder/apt.txt new file mode 100644 index 0000000..4d95609 --- /dev/null +++ b/binder/apt.txt @@ -0,0 +1 @@ +graphviz diff --git a/binder/environment.yml b/binder/environment.yml new file mode 100644 index 0000000..b85ec04 --- /dev/null +++ b/binder/environment.yml @@ -0,0 +1,41 @@ +name: env-hpc + +channels: + - conda-forge + - williamfgc + +dependencies: + - python=3.8 + - mpi4py + - openmpi + - cython + - cffi + - numexpr + - nodejs + - jupyterlab>=2.0.0,<3 + - numpy>=1.18.1 + - h5py + - scipy>=1.3.0 + - toolz + - bokeh>=2.0.0 + - dask=2021.08.0 + - dask-labextension>=2.0.0 + - distributed=2021.08.0 + - notebook + - matplotlib + - Pillow + - pandas>=1.0.1 + - pandas-datareader + - pytables + - scikit-learn>=0.22.1 + - scikit-image>=0.15.0 + - snakeviz + - ujson + - pip + - s3fs + - fastparquet + - dask-ml + - ipywidgets>=7.5 + - cachey + - python-graphviz + - zarr diff --git a/binder/jupyterlab-workspace.json b/binder/jupyterlab-workspace.json new file mode 100644 index 0000000..0a5ad1b --- /dev/null +++ b/binder/jupyterlab-workspace.json @@ -0,0 +1,94 @@ +{ + "data": { + "file-browser-filebrowser:cwd": { + "path": "" + }, + "dask-dashboard-launcher:individual-progress": { + "data": { + "route": "individual-progress", + "label": "Progress" + } + }, + "dask-dashboard-launcher:individual-task-stream": { + "data": { + "route": "individual-task-stream", + "label": "Task Stream" + } + }, + "layout-restorer:data": { + "main": { + "dock": { + "type": "split-area", + "orientation": "horizontal", + "sizes": [ + 0.5, + 0.5 + ], + "children": [ + { + "type": "tab-area", + "currentIndex": 0, + "widgets": [ + "notebook:00_overview.ipynb" + ] + }, + { + "type": "split-area", + "orientation": "vertical", + "sizes": [ + 0.5, + 0.5 + ], + "children": [ + { + "type": "tab-area", + "currentIndex": 0, + "widgets": [ + 
"dask-dashboard-launcher:individual-task-stream" + ] + }, + { + "type": "tab-area", + "currentIndex": 0, + "widgets": [ + "dask-dashboard-launcher:individual-progress" + ] + } + ] + } + ] + }, + "mode": "multiple-document", + "current": "notebook:00_overview.ipynb" + }, + "left": { + "collapsed": false, + "current": "filebrowser", + "widgets": [ + "filebrowser", + "running-sessions", + "dask-dashboard-launcher", + "command-palette", + "tab-manager" + ] + }, + "right": { + "collapsed": true, + "widgets": [] + } + }, + "dask-dashboard-launcher": { + "url": "DASK_DASHBOARD_URL", + "cluster": "" + }, + "notebook:00_overview.ipynb": { + "data": { + "path": "00_overview.ipynb", + "factory": "Notebook" + } + } + }, + "metadata": { + "id": "/lab" + } +} diff --git a/binder/postBuild b/binder/postBuild new file mode 100755 index 0000000..4925180 --- /dev/null +++ b/binder/postBuild @@ -0,0 +1,6 @@ +#!/bin/bash + +# Install the JupyterLab dask-labextension +jupyter labextension install dask-labextension +jupyter labextension install @jupyter-widgets/jupyterlab-manager +jupyter labextension install @bokeh/jupyter_bokeh diff --git a/binder/start b/binder/start new file mode 100755 index 0000000..792ee7a --- /dev/null +++ b/binder/start @@ -0,0 +1,10 @@ +#!/bin/bash + +# Replace DASK_DASHBOARD_URL with the proxy location +sed -i -e "s|DASK_DASHBOARD_URL|/user/${JUPYTERHUB_USER}/proxy/8787|g" binder/jupyterlab-workspace.json + +# Import the workspace +jupyter lab workspaces import binder/jupyterlab-workspace.json +export DASK_TUTORIAL_SMALL=1 + +exec "$@" diff --git a/cython/c-functions/README.md b/cython/c-functions/README.md deleted file mode 100644 index c79f63c..0000000 --- a/cython/c-functions/README.md +++ /dev/null @@ -1,36 +0,0 @@ -## Using C-functions - -Fibonacci numbers are a sequence of integers defined by the recurrence -relation - - Fn = Fn-1 + Fn-2 - -with the initial values F0=0, F1=1. - -The module [fib.py](fib.py) contains a function `fibonacci(n)` that -calculates recursively Fn. The function can be used e.g. as - -```python -from fib import fibonacci - -fibonacci(30) -``` - -Make a Cython version of the module, and investigate how adding type -information and making `fibonacci` a C-function affects performance -(hint: function needs to be called both from Python and C). Use -`timeit` for performance measurements, either from command line - -```bash -$ python3 -m timeit -s "from fib import fibonacci" "fibonacci(30)" -``` - -or within IPython - -```python -In []: %timeit fibonacci(30) -``` - -**Note:** this recursive algorithm is very inefficient way of calculating -Fibonacci numbers and pure Python implemention of better algorithm -outperforms Cython implementation drastically. 
diff --git a/cython/c-functions/fib.py b/cython/c-functions/fib.py deleted file mode 100644 index 0b2bfa4..0000000 --- a/cython/c-functions/fib.py +++ /dev/null @@ -1,4 +0,0 @@ -def fibonacci(n): - if n < 2: - return n - return fibonacci(n-2) + fibonacci(n-1) diff --git a/cython/c-functions/solution/fib.pyx b/cython/c-functions/solution/fib.pyx deleted file mode 100644 index f8946d6..0000000 --- a/cython/c-functions/solution/fib.pyx +++ /dev/null @@ -1,9 +0,0 @@ -cpdef int fibonacci(int n): - if n < 2: - return n - return fibonacci(n-2) + fibonacci(n-1) - -def fibonacci_py(n): - if n < 2: - return n - return fibonacci_py(n-2) + fibonacci_py(n-1) diff --git a/cython/c-functions/solution/fib_py.py b/cython/c-functions/solution/fib_py.py deleted file mode 100644 index 5b8692a..0000000 --- a/cython/c-functions/solution/fib_py.py +++ /dev/null @@ -1,12 +0,0 @@ -from functools import lru_cache - -def fibonacci(n): - if n < 2: - return n - return fibonacci(n-2) + fibonacci(n-1) - -@lru_cache(maxsize=None) -def fibonacci_cached(n): - if n < 2: - return n - return fibonacci_cached(n-2) + fibonacci_cached(n-1) diff --git a/cython/c-functions/solution/setup.py b/cython/c-functions/solution/setup.py deleted file mode 100644 index cb4073b..0000000 --- a/cython/c-functions/solution/setup.py +++ /dev/null @@ -1,11 +0,0 @@ -from distutils.core import setup, Extension -from Cython.Build import cythonize - -ext = Extension("fib", - sources=["fib.pyx"], - ) - -setup( - ext_modules=cythonize(ext) -) - diff --git a/cython/c-functions/solution/test_fib.py b/cython/c-functions/solution/test_fib.py deleted file mode 100644 index e333721..0000000 --- a/cython/c-functions/solution/test_fib.py +++ /dev/null @@ -1,26 +0,0 @@ -from fib import fibonacci -from fib_py import fibonacci as fibonacci_py, fibonacci_cached -from timeit import repeat - -ncython = 100 -npython = 10 -ncached = 10000000 - -# Pure Python -time_python = repeat("fibonacci_py(30)", number=npython, globals=locals()) -time_python = min(time_python) / npython - -# Cython -time_cython = repeat("fibonacci(30)", number=ncython, globals=locals()) -time_cython = min(time_cython) / ncython - -# Python, cached -time_cached = repeat("fibonacci_cached(30)", number=ncached, globals=locals()) -time_cached = min(time_cached) / ncached - -print("Pure Python: {:5.4f} s".format(time_python)) -print("Cython: {:5.4f} ms".format(time_cython*1.e3)) -print("Speedup: {:5.1f}".format(time_python / time_cython)) -print("Pure Python cached: {:5.4f} us".format(time_cached*1.e6)) -print("Speedup over Cython: {:5.1e}".format(time_cython / time_cached)) - diff --git a/cython/heat-equation/README.md b/cython/heat-equation/README.md deleted file mode 100644 index 6f3a72b..0000000 --- a/cython/heat-equation/README.md +++ /dev/null @@ -1,29 +0,0 @@ -## Optimising heat equation with Cython - -### Creating a Cython extension - -Write a `setup.py` for creating a Cython version of [heat.py](heat.py) -module, and use it from the main program [heat_main.py](heat_main.py). -How much does simple Cythonization (i.e. diminishing the interpreting -overhead) improve the performance? - -### Optimising - -Based on the profile in the performance measurement -[exercise](../../performance/cprofile) optimise the most time -consuming part of the algorithm. If you did not finish the profiling -exercise, you can look at example profile [here](profile.md). - -Utilize all the tricks you have learned so far (type declarations, -fast array indexing, compiler directives, C functions, ...). 
- -Investigate how the different optimizations affect the performance. You -can use applications own timers and/or **timeit**. Annotated HTML-report with -`cython -a …` can be useful when tuning performance. - -When finished with the optimisation, compare performance to -Python/NumPy model solution (in -[numpy/heat-equation](../../numpy/heat-equation)), which uses array -operations. You can play around also with larger input data as provided in -[bottle_medium.dat](bottle_medium.dat) and [bottle_large.dat](bottle_large.dat). - diff --git a/cython/heat-equation/bottle.dat b/cython/heat-equation/bottle.dat deleted file mode 120000 index fcc9630..0000000 --- a/cython/heat-equation/bottle.dat +++ /dev/null @@ -1 +0,0 @@ -../../numpy/heat-equation/bottle.dat \ No newline at end of file diff --git a/cython/heat-equation/bottle_large.dat b/cython/heat-equation/bottle_large.dat deleted file mode 120000 index 39f979b..0000000 --- a/cython/heat-equation/bottle_large.dat +++ /dev/null @@ -1 +0,0 @@ -../../numpy/heat-equation/bottle_large.dat \ No newline at end of file diff --git a/cython/heat-equation/bottle_medium.dat b/cython/heat-equation/bottle_medium.dat deleted file mode 120000 index d730869..0000000 --- a/cython/heat-equation/bottle_medium.dat +++ /dev/null @@ -1 +0,0 @@ -../../numpy/heat-equation/bottle_medium.dat \ No newline at end of file diff --git a/cython/heat-equation/heat.py b/cython/heat-equation/heat.py deleted file mode 100644 index cd6a03d..0000000 --- a/cython/heat-equation/heat.py +++ /dev/null @@ -1,54 +0,0 @@ -import numpy as np -import matplotlib -matplotlib.use('Agg') -import matplotlib.pyplot as plt - -# Set the colormap -plt.rcParams['image.cmap'] = 'BrBG' - -def evolve(u, u_previous, a, dt, dx2, dy2): - """Explicit time evolution. - u: new temperature field - u_previous: previous field - a: diffusion constant - dt: time step. 
""" - - n, m = u.shape - - for i in range(1, n-1): - for j in range(1, m-1): - u[i, j] = u_previous[i, j] + a * dt * ( \ - (u_previous[i+1, j] - 2*u_previous[i, j] + \ - u_previous[i-1, j]) / dx2 + \ - (u_previous[i, j+1] - 2*u_previous[i, j] + \ - u_previous[i, j-1]) / dy2 ) - u_previous[:] = u[:] - -def iterate(field, field0, a, dx, dy, timesteps, image_interval): - """Run fixed number of time steps of heat equation""" - - dx2 = dx**2 - dy2 = dy**2 - - # For stability, this is the largest interval possible - # for the size of the time-step: - dt = dx2*dy2 / ( 2*a*(dx2+dy2) ) - - for m in range(1, timesteps+1): - evolve(field, field0, a, dt, dx2, dy2) - if m % image_interval == 0: - write_field(field, m) - -def init_fields(filename): - # Read the initial temperature field from file - field = np.loadtxt(filename) - field0 = field.copy() # Array for field of previous time step - return field, field0 - -def write_field(field, step): - plt.gca().clear() - plt.imshow(field) - plt.axis('off') - plt.savefig('heat_{0:03d}.png'.format(step)) - - diff --git a/cython/heat-equation/heat_main.py b/cython/heat-equation/heat_main.py deleted file mode 100644 index b129e5c..0000000 --- a/cython/heat-equation/heat_main.py +++ /dev/null @@ -1,55 +0,0 @@ -from __future__ import print_function -import time -import argparse - -from heat import init_fields, write_field, iterate - - -def main(input_file='bottle.dat', a=0.5, dx=0.1, dy=0.1, - timesteps=200, image_interval=4000): - - # Initialise the temperature field - field, field0 = init_fields(input_file) - - print("Heat equation solver") - print("Diffusion constant: {}".format(a)) - print("Input file: {}".format(input_file)) - print("Parameters") - print("----------") - print(" nx={} ny={} dx={} dy={}".format(field.shape[0], field.shape[1], - dx, dy)) - print(" time steps={} image interval={}".format(timesteps, - image_interval)) - - # Plot/save initial field - write_field(field, 0) - # Iterate - t0 = time.time() - iterate(field, field0, a, dx, dy, timesteps, image_interval) - t1 = time.time() - # Plot/save final field - write_field(field, timesteps) - - print("Simulation finished in {0} s".format(t1-t0)) - -if __name__ == '__main__': - - # Process command line arguments - parser = argparse.ArgumentParser(description='Heat equation') - parser.add_argument('-dx', type=float, default=0.01, - help='grid spacing in x-direction') - parser.add_argument('-dy', type=float, default=0.01, - help='grid spacing in y-direction') - parser.add_argument('-a', type=float, default=0.5, - help='diffusion constant') - parser.add_argument('-n', type=int, default=200, - help='number of time steps') - parser.add_argument('-i', type=int, default=4000, - help='image interval') - parser.add_argument('-f', type=str, default='bottle.dat', - help='input file') - - args = parser.parse_args() - - main(args.f, args.a, args.dx, args.dy, args.n, args.i) - diff --git a/cython/heat-equation/profile.md b/cython/heat-equation/profile.md deleted file mode 100644 index b846704..0000000 --- a/cython/heat-equation/profile.md +++ /dev/null @@ -1,21 +0,0 @@ -## Example profile for heat equation solver - -``` - 591444 function calls (582598 primitive calls) in 15.498 seconds - - Ordered by: internal time - List reduced from 3224 to 10 due to restriction <10> - - ncalls tottime percall cumtime percall filename:lineno(function) - 200 14.837 0.074 14.837 0.074 heat.py:9(evolve) - 2 0.070 0.035 0.070 0.035 {built-in method matplotlib._png.write_png} - 241 0.052 0.000 0.052 0.000 {built-in method 
marshal.loads} - 2467 0.023 0.000 0.040 0.000 inspect.py:614(cleandoc) - 1052/959 0.023 0.000 0.066 0.000 {built-in method builtins.__build_class__} - 33/31 0.018 0.001 0.023 0.001 {built-in method _imp.create_dynamic} - 3228 0.014 0.000 0.014 0.000 {built-in method numpy.array} - 40000 0.014 0.000 0.017 0.000 npyio.py:771(floatconv) - 274/1 0.013 0.000 15.498 15.498 {built-in method builtins.exec} - 556 0.011 0.000 0.011 0.000 :78(acquire) - -``` diff --git a/cython/heat-equation/solution/heat.pyx b/cython/heat-equation/solution/heat.pyx deleted file mode 100644 index 6d4a0cb..0000000 --- a/cython/heat-equation/solution/heat.pyx +++ /dev/null @@ -1,70 +0,0 @@ -import numpy as np -cimport numpy as cnp -import cython - -import matplotlib -matplotlib.use('Agg') -import matplotlib.pyplot as plt - -# Set the colormap -plt.rcParams['image.cmap'] = 'BrBG' - -@cython.boundscheck(False) -@cython.wraparound(False) -@cython.cdivision(True) -@cython.profile(True) -cdef evolve(cnp.ndarray[cnp.double_t, ndim=2] u, - cnp.ndarray[cnp.double_t, ndim=2] u_previous, - double a, double dt, double dx2, double dy2): - """Explicit time evolution. - u: new temperature field - u_previous: previous field - a: diffusion constant - dt: time step. """ - - cdef int n = u.shape[0] - cdef int m = u.shape[1] - - cdef int i,j - - # Multiplication is more efficient than division - cdef double dx2inv = 1. / dx2 - cdef double dy2inv = 1. / dy2 - - for i in range(1, n-1): - for j in range(1, m-1): - u[i, j] = u_previous[i, j] + a * dt * ( \ - (u_previous[i+1, j] - 2*u_previous[i, j] + \ - u_previous[i-1, j]) * dx2inv + \ - (u_previous[i, j+1] - 2*u_previous[i, j] + \ - u_previous[i, j-1]) * dy2inv ) - u_previous[:] = u[:] - -def iterate(field, field0, a, dx, dy, timesteps, image_interval): - """Run fixed number of time steps of heat equation""" - - dx2 = dx**2 - dy2 = dy**2 - - # For stability, this is the largest interval possible - # for the size of the time-step: - dt = dx2*dy2 / ( 2*a*(dx2+dy2) ) - - for m in range(1, timesteps+1): - evolve(field, field0, a, dt, dx2, dy2) - if m % image_interval == 0: - write_field(field, m) - -def init_fields(filename): - # Read the initial temperature field from file - field = np.loadtxt(filename) - field0 = field.copy() # Array for field of previous time step - return field, field0 - -def write_field(field, step): - plt.gca().clear() - plt.imshow(field) - plt.axis('off') - plt.savefig('heat_{0:03d}.png'.format(step)) - - diff --git a/cython/heat-equation/solution/heat_main.py b/cython/heat-equation/solution/heat_main.py deleted file mode 100644 index b129e5c..0000000 --- a/cython/heat-equation/solution/heat_main.py +++ /dev/null @@ -1,55 +0,0 @@ -from __future__ import print_function -import time -import argparse - -from heat import init_fields, write_field, iterate - - -def main(input_file='bottle.dat', a=0.5, dx=0.1, dy=0.1, - timesteps=200, image_interval=4000): - - # Initialise the temperature field - field, field0 = init_fields(input_file) - - print("Heat equation solver") - print("Diffusion constant: {}".format(a)) - print("Input file: {}".format(input_file)) - print("Parameters") - print("----------") - print(" nx={} ny={} dx={} dy={}".format(field.shape[0], field.shape[1], - dx, dy)) - print(" time steps={} image interval={}".format(timesteps, - image_interval)) - - # Plot/save initial field - write_field(field, 0) - # Iterate - t0 = time.time() - iterate(field, field0, a, dx, dy, timesteps, image_interval) - t1 = time.time() - # Plot/save final field - write_field(field, 
timesteps) - - print("Simulation finished in {0} s".format(t1-t0)) - -if __name__ == '__main__': - - # Process command line arguments - parser = argparse.ArgumentParser(description='Heat equation') - parser.add_argument('-dx', type=float, default=0.01, - help='grid spacing in x-direction') - parser.add_argument('-dy', type=float, default=0.01, - help='grid spacing in y-direction') - parser.add_argument('-a', type=float, default=0.5, - help='diffusion constant') - parser.add_argument('-n', type=int, default=200, - help='number of time steps') - parser.add_argument('-i', type=int, default=4000, - help='image interval') - parser.add_argument('-f', type=str, default='bottle.dat', - help='input file') - - args = parser.parse_args() - - main(args.f, args.a, args.dx, args.dy, args.n, args.i) - diff --git a/cython/heat-equation/solution/setup.py b/cython/heat-equation/solution/setup.py deleted file mode 100644 index 3906cb8..0000000 --- a/cython/heat-equation/solution/setup.py +++ /dev/null @@ -1,6 +0,0 @@ -from distutils.core import setup, Extension -from Cython.Build import cythonize - -setup( - ext_modules=cythonize("heat.pyx"), -) diff --git a/cython/simple-extension/README.md b/cython/simple-extension/README.md deleted file mode 100644 index d4b33d5..0000000 --- a/cython/simple-extension/README.md +++ /dev/null @@ -1,21 +0,0 @@ -## Simple Cython extension - -### Creating a Cython extension -Create a simple Cython module (you can name it e.g. `cyt_module.pyx`) -containing the following function: -``` -def subtract(x, y): - result = x - y - return result -``` - -Create then a **setup.py** for building the extension module. -Try to utilize the module e.g. as -``` -from cyt_module import subtract - -subtract(4.5, 2) -``` -in interactive interpreter or in a simple script. Try different argument -types. - diff --git a/cython/simple-extension/solution/cyt_module.pyx b/cython/simple-extension/solution/cyt_module.pyx deleted file mode 100644 index 0059fa1..0000000 --- a/cython/simple-extension/solution/cyt_module.pyx +++ /dev/null @@ -1,3 +0,0 @@ -def subtract(x, y): - result = x - y - return result diff --git a/cython/simple-extension/solution/setup.py b/cython/simple-extension/solution/setup.py deleted file mode 100644 index c944b64..0000000 --- a/cython/simple-extension/solution/setup.py +++ /dev/null @@ -1,6 +0,0 @@ -from distutils.core import setup, Extension -from Cython.Build import cythonize - -setup( - ext_modules=cythonize("cyt_module.pyx") -) diff --git a/cython/static-typing/README.md b/cython/static-typing/README.md deleted file mode 100644 index 1c98277..0000000 --- a/cython/static-typing/README.md +++ /dev/null @@ -1,21 +0,0 @@ -## Using static typing - -Continue with the simple Cython module for subtracting two numbers: -``` -def subtract(x, y): - result = x - y - return result -``` - -Declare the function internal variable `result` as integer. Try to call the -function with different types of arguments (integers and floats), what kind of -results you do get? - -Next, declare also the function arguments as integers, and rebuild the module -(Note: if working with interactive interpreter you need to exit or -reload the module). What happens when you now call the function with -floating point arguments? - -Finally, try to declare arguments as floating point numbers (while keeping -`result` as integer), what happens? 
- diff --git a/cython/static-typing/cyt_module.pyx b/cython/static-typing/cyt_module.pyx deleted file mode 100644 index 0059fa1..0000000 --- a/cython/static-typing/cyt_module.pyx +++ /dev/null @@ -1,3 +0,0 @@ -def subtract(x, y): - result = x - y - return result diff --git a/cython/static-typing/setup.py b/cython/static-typing/setup.py deleted file mode 100644 index c944b64..0000000 --- a/cython/static-typing/setup.py +++ /dev/null @@ -1,6 +0,0 @@ -from distutils.core import setup, Extension -from Cython.Build import cythonize - -setup( - ext_modules=cythonize("cyt_module.pyx") -) diff --git a/cython/static-typing/solution/cyt_module.pyx b/cython/static-typing/solution/cyt_module.pyx deleted file mode 100644 index 942222b..0000000 --- a/cython/static-typing/solution/cyt_module.pyx +++ /dev/null @@ -1,4 +0,0 @@ -def subtract(int x, int y): - cdef int result - result = x - y - return result diff --git a/cython/static-typing/solution/setup.py b/cython/static-typing/solution/setup.py deleted file mode 100644 index c944b64..0000000 --- a/cython/static-typing/solution/setup.py +++ /dev/null @@ -1,6 +0,0 @@ -from distutils.core import setup, Extension -from Cython.Build import cythonize - -setup( - ext_modules=cythonize("cyt_module.pyx") -) diff --git a/dask/01_dask.delayed.ipynb b/dask/01_dask.delayed.ipynb new file mode 100644 index 0000000..e21bad8 --- /dev/null +++ b/dask/01_dask.delayed.ipynb @@ -0,0 +1,824 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "\n", + "#0 - Prepare the environment\n", + "\n" + ], + "metadata": { + "id": "D__H_IQO1CYl" + } + }, + { + "cell_type": "code", + "source": [ + "!python -m pip install \"dask[complete]\"" + ], + "metadata": { + "id": "RnIBnGqRys4P" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5Ty3fHi3yjYM" + }, + "source": [ + "\"Dask\n", + "\n", + "# Parallelize code with `dask.delayed`\n", + "\n", + "In this section we parallelize simple for-loop style code with Dask and `dask.delayed`. Often, this is the only function that you will need to convert functions for use with Dask.\n", + "\n", + "This is a simple way to use `dask` to parallelize existing codebases or build [complex systems](https://blog.dask.org/2018/02/09/credit-models-with-dask). This will also help us to develop an understanding for later sections.\n", + "\n", + "**Related Documentation**\n", + "\n", + "* [Delayed documentation](https://docs.dask.org/en/latest/delayed.html)\n", + "* [Delayed screencast](https://www.youtube.com/watch?v=SHqFmynRxVU)\n", + "* [Delayed API](https://docs.dask.org/en/latest/delayed-api.html)\n", + "* [Delayed examples](https://examples.dask.org/delayed.html)\n", + "* [Delayed best practices](https://docs.dask.org/en/latest/delayed-best-practices.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bkfgAv-myjYN" + }, + "source": [ + "As we'll see in the [distributed scheduler notebook](05_distributed.ipynb), Dask has several ways of executing code in parallel. We'll use the distributed scheduler by creating a `dask.distributed.Client`. For now, this will provide us with some nice diagnostics. We'll talk about schedulers in depth later." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MkkdG6SAyjYO" + }, + "outputs": [], + "source": [ + "from dask.distributed import Client\n", + "\n", + "client = Client(n_workers=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fFDIJJKEyjYO" + }, + "source": [ + "## Basics\n", + "\n", + "First let's make some toy functions, `inc` and `add`, that sleep for a while to simulate work. We'll then time running these functions normally.\n", + "\n", + "In the next section we'll parallelize this code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OL49VTeJyjYO" + }, + "outputs": [], + "source": [ + "from time import sleep\n", + "\n", + "def inc(x):\n", + " sleep(1)\n", + " return x + 1\n", + "\n", + "def add(x, y):\n", + " sleep(1)\n", + " return x + y" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KgZ2W0cVyjYP" + }, + "source": [ + "We time the execution of this normal code using the `%%time` magic, which is a special function of the Jupyter Notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JcECfN17yjYP" + }, + "outputs": [], + "source": [ + "%%time\n", + "# This takes three seconds to run because we call each\n", + "# function sequentially, one after the other\n", + "\n", + "x = inc(1)\n", + "y = inc(2)\n", + "z = add(x, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uUdJNtaiyjYP" + }, + "source": [ + "### Parallelize with the `dask.delayed` decorator\n", + "\n", + "Those two increment calls *could* be called in parallel, because they are totally independent of one-another.\n", + "\n", + "We'll transform the `inc` and `add` functions using the `dask.delayed` function. When we call the delayed version by passing the arguments, exactly as before, the original function isn't actually called yet - which is why the cell execution finishes very quickly.\n", + "Instead, a *delayed object* is made, which keeps track of the function to call and the arguments to pass to it.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eFTHIdtuyjYP" + }, + "outputs": [], + "source": [ + "from dask import delayed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vMTAK61lyjYP" + }, + "outputs": [], + "source": [ + "%%time\n", + "# This runs immediately, all it does is build a graph\n", + "\n", + "x = delayed(inc)(1)\n", + "y = delayed(inc)(2)\n", + "z = delayed(add)(x, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cKC-tohgyjYQ" + }, + "source": [ + "This ran immediately, since nothing has really happened yet.\n", + "\n", + "To get the result, call `compute`. Notice that this runs faster than the original code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YDZWoImxyjYQ" + }, + "outputs": [], + "source": [ + "%%time\n", + "# This actually runs our computation using a local thread pool\n", + "\n", + "z.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "x0t4hp3myjYQ" + }, + "source": [ + "## What just happened?\n", + "\n", + "The `z` object is a lazy `Delayed` object. This object holds everything we need to compute the final result, including references to all of the functions that are required and their inputs and relationship to one-another. We can evaluate the result with `.compute()` as above or we can visualize the task graph for this value with `.visualize()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WVHdtHC4yjYQ" + }, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "anspmi42yjYQ" + }, + "outputs": [], + "source": [ + "# Look at the task graph for `z`\n", + "z.visualize()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xt0LXrFbyjYQ" + }, + "source": [ + "Notice that this includes the names of the functions from before, and the logical flow of the outputs of the `inc` functions to the inputs of `add`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuFdCk99yjYQ" + }, + "source": [ + "### Some questions to consider:\n", + "\n", + "- Why did we go from 3s to 2s? Why weren't we able to parallelize down to 1s?\n", + "- What would have happened if the inc and add functions didn't include the `sleep(1)`? Would Dask still be able to speed up this code?\n", + "- What if we have multiple outputs or also want to get access to x or y?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5BJ346pXyjYQ" + }, + "source": [ + "## Exercise: Parallelize a for loop\n", + "\n", + "`for` loops are one of the most common things that we want to parallelize. Use `dask.delayed` on `inc` and `sum` to parallelize the computation below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tHhIBtD-yjYQ" + }, + "outputs": [], + "source": [ + "data = [1, 2, 3, 4, 5, 6, 7, 8]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Sr9Yh9CJyjYR" + }, + "outputs": [], + "source": [ + "%%time\n", + "# Sequential code\n", + "\n", + "results = []\n", + "for x in data:\n", + " y = inc(x)\n", + " results.append(y)\n", + "\n", + "total = sum(results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TWakOv6DyjYR" + }, + "outputs": [], + "source": [ + "total" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uZkEHmiYyjYR" + }, + "outputs": [], + "source": [ + "%%time\n", + "# Your parallel code here..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JQHBoqOxyjYR" + }, + "source": [ + "How do the graph visualizations compare with the given solution, compared to a version with the `sum` function used directly rather than wrapped with `delayed`? Can you explain the latter version? You might find the result of the following expression illuminating\n", + "```python\n", + "delayed(inc)(1) + delayed(inc)(2)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Vklm4jM2yjYR" + }, + "source": [ + "## Exercise: Parallelizing a for-loop code with control flow\n", + "\n", + "Often we want to delay only *some* functions, running a few of them immediately. This is especially helpful when those functions are fast and help us to determine what other slower functions we should call. This decision, to delay or not to delay, is usually where we need to be thoughtful when using `dask.delayed`.\n", + "\n", + "In the example below we iterate through a list of inputs. If that input is even then we want to call `inc`. If the input is odd then we want to call `double`. This `is_even` decision to call `inc` or `double` has to be made immediately (not lazily) in order for our graph-building Python code to proceed." 
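For reference, here is one possible way to complete the for-loop exercise above. This is only a sketch (it assumes `inc` and `data` from the earlier cells); wrapping the final `sum` in `delayed` as well keeps the whole reduction inside a single task graph.

```python
from dask import delayed

# Sketch of the parallel for-loop exercise above
# (inc and data are defined in the earlier cells).
results = []
for x in data:
    y = delayed(inc)(x)        # each call becomes an independent task
    results.append(y)

total = delayed(sum)(results)  # the reduction is delayed too
print(total.compute())         # executes all inc calls in parallel
```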
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IvT-dT0gyjYR" + }, + "outputs": [], + "source": [ + "def double(x):\n", + " sleep(1)\n", + " return 2 * x\n", + "\n", + "def is_even(x):\n", + " return not x % 2\n", + "\n", + "data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "g61VYVJDyjYR" + }, + "outputs": [], + "source": [ + "%%time\n", + "# Sequential code\n", + "\n", + "results = []\n", + "for x in data:\n", + " if is_even(x):\n", + " y = double(x)\n", + " else:\n", + " y = inc(x)\n", + " results.append(y)\n", + "\n", + "total = sum(results)\n", + "print(total)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ximAhsaAyjYR" + }, + "outputs": [], + "source": [ + "%%time\n", + "# Your parallel code here...\n", + "# TODO: parallelize the sequential code above using dask.delayed\n", + "# You will need to delay some functions, but not all" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hYEZxOBUyjYR" + }, + "outputs": [], + "source": [ + "%time total.compute()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0sX_AzlSyjYR" + }, + "outputs": [], + "source": [ + "total.visualize()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h-eN3QMeyjYS" + }, + "source": [ + "### Some questions to consider:\n", + "\n", + "- What are other examples of control flow where we can't use delayed?\n", + "- What would have happened if we had delayed the evaluation of `is_even(x)` in the example above?\n", + "- What are your thoughts on delaying `sum`? This function is both computational but also fast to run." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C0WLxFhYyjYS" + }, + "source": [ + "## Exercise: Parallelizing a Pandas Groupby Reduction\n", + "\n", + "In this exercise we read several CSV files and perform a groupby operation in parallel. We are given sequential code to do this and parallelize it with `dask.delayed`.\n", + "\n", + "The computation we will parallelize is to compute the mean departure delay per airport from some historical flight data. We will do this by using `dask.delayed` together with `pandas`. In a future section we will do this same exercise with `dask.dataframe`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FMMozPrkyjYS" + }, + "source": [ + "## Create data\n", + "\n", + "Run this code to prep some data.\n", + "\n", + "This downloads and extracts some historical flight data for flights out of NYC between 1990 and 2000. The data is originally from [here](http://stat-computing.org/dataexpo/2009/the-data.html)." 
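Before moving on to the flight data, here is one possible sketch for the control-flow exercise above (again assuming `inc`, `double`, `is_even` and `data` from the earlier cells). The key point is that `is_even` runs immediately, while `inc` and `double` are delayed.

```python
from dask import delayed

# Sketch of the control-flow exercise above: the is_even test is evaluated
# eagerly so ordinary Python control flow can choose which task to build.
results = []
for x in data:
    if is_even(x):               # not delayed: decides the graph structure
        y = delayed(double)(x)
    else:
        y = delayed(inc)(x)
    results.append(y)

total = delayed(sum)(results)
print(total.compute())           # should match the sequential total
```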
+ ] + }, + { + "cell_type": "code", + "source": [ + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/prep.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/accounts.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/config.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/sources.py\n", + "!mkdir data\n" + ], + "metadata": { + "id": "cfpI4Qs3zHhD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B0Tk0KDwyjYS" + }, + "outputs": [], + "source": [ + "%run prep.py -d flights" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L8imuytMyjYS" + }, + "source": [ + "### Inspect data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "E510UHCJyjYS" + }, + "outputs": [], + "source": [ + "import os\n", + "sorted(os.listdir(os.path.join('data', 'nycflights')))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WRgQis8xyjYV" + }, + "source": [ + "### Read one file with `pandas.read_csv` and compute mean departure delay" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Wf-74OJOyjYV" + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv(os.path.join('data', 'nycflights', '1990.csv'))\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "VIKTRQDVyjYV" + }, + "outputs": [], + "source": [ + "# What is the schema?\n", + "df.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WB1MMhQGyjYV" + }, + "outputs": [], + "source": [ + "# What originating airports are in the data?\n", + "df.Origin.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OgojpaNxyjYW" + }, + "outputs": [], + "source": [ + "# Mean departure delay per-airport for one year\n", + "df.groupby('Origin').DepDelay.mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PlPr5RS-yjYW" + }, + "source": [ + "### Sequential code: Mean Departure Delay Per Airport\n", + "\n", + "The above cell computes the mean departure delay per-airport for one year. Here we expand that to all years using a sequential for loop." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "E3Q-VEQByjYW" + }, + "outputs": [], + "source": [ + "from glob import glob\n", + "filenames = sorted(glob(os.path.join('data', 'nycflights', '*.csv')))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AK6iiYPtyjYW" + }, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "sums = []\n", + "counts = []\n", + "for fn in filenames:\n", + " # Read in file\n", + " df = pd.read_csv(fn)\n", + "\n", + " # Groupby origin airport\n", + " by_origin = df.groupby('Origin')\n", + "\n", + " # Sum of all departure delays by origin\n", + " total = by_origin.DepDelay.sum()\n", + "\n", + " # Number of flights by origin\n", + " count = by_origin.DepDelay.count()\n", + "\n", + " # Save the intermediates\n", + " sums.append(total)\n", + " counts.append(count)\n", + "\n", + "# Combine intermediates to get total mean-delay-per-origin\n", + "total_delays = sum(sums)\n", + "n_flights = sum(counts)\n", + "mean = total_delays / n_flights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KdCgIehhyjYW" + }, + "outputs": [], + "source": [ + "mean" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gJ-3vJYiyjYW" + }, + "source": [ + "### Parallelize the code above\n", + "\n", + "Use `dask.delayed` to parallelize the code above. Some extra things you will need to know.\n", + "\n", + "1. Methods and attribute access on delayed objects work automatically, so if you have a delayed object you can perform normal arithmetic, slicing, and method calls on it and it will produce the correct delayed calls.\n", + "\n", + " ```python\n", + " x = delayed(np.arange)(10)\n", + " y = (x + 1)[::2].sum() # everything here was delayed\n", + " ```\n", + "2. Calling the `.compute()` method works well when you have a single output. When you have multiple outputs you might want to use the `dask.compute` function:\n", + "\n", + " ```python\n", + " >>> from dask import compute\n", + " >>> x = delayed(np.arange)(10)\n", + " >>> y = x ** 2\n", + " >>> min_, max_ = compute(y.min(), y.max())\n", + " >>> min_, max_\n", + " (0, 81)\n", + " ```\n", + " \n", + " This way Dask can share the intermediate values (like `y = x**2`)\n", + " \n", + "So your goal is to parallelize the code above (which has been copied below) using `dask.delayed`. You may also want to visualize a bit of the computation to see if you're doing it correctly." 
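For reference, one possible shape of the parallel version is sketched below (assuming `pd` and `filenames` from the cells above). It is not necessarily the intended solution; in particular, you could also call `.compute()` at different points and compare the resulting graphs.

```python
import pandas as pd
from dask import delayed, compute

# Sketch: one lazy read + groupby chain per file, computed together at the
# end so Dask can share intermediates (filenames comes from the earlier cell).
sums = []
counts = []
for fn in filenames:
    df = delayed(pd.read_csv)(fn)        # lazy read of one CSV
    by_origin = df.groupby('Origin')     # method calls on Delayed objects stay lazy
    sums.append(by_origin.DepDelay.sum())
    counts.append(by_origin.DepDelay.count())

total_delays, n_flights = compute(sum(sums), sum(counts))
mean = total_delays / n_flights
```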
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_DoKsD4CyjYW" + }, + "outputs": [], + "source": [ + "from dask import compute" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "VdmkgZwqyjYW" + }, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "# copied sequential code\n", + "\n", + "sums = []\n", + "counts = []\n", + "for fn in filenames:\n", + " # Read in file\n", + " df = pd.read_csv(fn)\n", + "\n", + " # Groupby origin airport\n", + " by_origin = df.groupby('Origin')\n", + "\n", + " # Sum of all departure delays by origin\n", + " total = by_origin.DepDelay.sum()\n", + "\n", + " # Number of flights by origin\n", + " count = by_origin.DepDelay.count()\n", + "\n", + " # Save the intermediates\n", + " sums.append(total)\n", + " counts.append(count)\n", + "\n", + "# Combine intermediates to get total mean-delay-per-origin\n", + "total_delays = sum(sums)\n", + "n_flights = sum(counts)\n", + "mean = total_delays / n_flights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BCm_RNynyjYW" + }, + "outputs": [], + "source": [ + "mean" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yVdfUNAsyjYW" + }, + "outputs": [], + "source": [ + "%%time\n", + "# your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MDNhnnokyjYX" + }, + "outputs": [], + "source": [ + "# ensure the results still match\n", + "mean" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q3yaVgSZyjYX" + }, + "source": [ + "### Some questions to consider:\n", + "\n", + "- How much speedup did you get? Is this how much speedup you'd expect?\n", + "- Experiment with where to call `compute`. What happens when you call it on `sums` and `counts`? What happens if you wait and call it on `mean`?\n", + "- Experiment with delaying the call to `sum`. What does the graph look like if `sum` is delayed? What does the graph look like if it isn't?\n", + "- Can you think of any reason why you'd want to do the reduction one way over the other?\n", + "\n", + "### Learn More\n", + "\n", + "Visit the [Delayed documentation](https://docs.dask.org/en/latest/delayed.html). In particular, this [delayed screencast](https://www.youtube.com/watch?v=SHqFmynRxVU) will reinforce the concepts you learned here and the [delayed best practices](https://docs.dask.org/en/latest/delayed-best-practices.html) document collects advice on using `dask.delayed` well." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4MZwEJeUyjYX" + }, + "source": [ + "## Close the Client\n", + "\n", + "Before moving on to the next exercise, make sure to close your client or stop this kernel." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "V1HkMqOyyjYX" + }, + "outputs": [], + "source": [ + "client.close()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.12" + }, + "colab": { + "provenance": [], + "include_colab_link": true + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/dask/01x_lazy.ipynb b/dask/01x_lazy.ipynb new file mode 100644 index 0000000..eb33178 --- /dev/null +++ b/dask/01x_lazy.ipynb @@ -0,0 +1,727 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#0 - Prepare the environment" + ], + "metadata": { + "id": "tYclpfd31yB4" + } + }, + { + "cell_type": "code", + "source": [ + "!python -m pip install \"dask[complete]\"" + ], + "metadata": { + "id": "yfU09xHb1tVq" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/prep.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/accounts.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/config.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/dask/sources.py\n", + "!wget https://raw.githubusercontent.com/lsteffenel/hpc-python/refs/heads/master/README.md\n", + "!mkdir data" + ], + "metadata": { + "id": "_Dh1Hsed2GQx" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dhWwKZYh1sPg" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gg9ecWva1sPg" + }, + "source": [ + "# Lazy execution" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lqqrSo9J1sPg" + }, + "source": [ + "Here we discuss some of the concepts behind dask, and lazy execution of code. You do not need to go through this material if you are eager to get on with the tutorial, but it may help understand the concepts underlying dask, how these things fit in with techniques you might already be using, and how to understand things that can go wrong." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6emxpbi61sPh" + }, + "source": [ + "## Prelude" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-s2T32mH1sPh" + }, + "source": [ + "As Python programmers, you probably already perform certain *tricks* to enable computation of larger-than-memory datasets, parallel execution or delayed/background execution. Perhaps with this phrasing, it is not clear what we mean, but a few examples should make things clearer. 
The point of Dask is to make simple things easy and complex things possible!\n", + "\n", + "Aside from the [detailed introduction](http://dask.pydata.org/en/latest/), we can summarize the basics of Dask as follows:\n", + "\n", + "- process data that doesn't fit into memory by breaking it into blocks and specifying task chains\n", + "- parallelize execution of tasks across cores and even nodes of a cluster\n", + "- move computation to the data rather than the other way around, to minimize communication overhead\n", + "\n", + "All of this allows you to get the most out of your computation resources, but program in a way that is very familiar: for-loops to build basic tasks, Python iterators, and the NumPy (array) and Pandas (dataframe) functions for multi-dimensional or tabular data, respectively.\n", + "\n", + "The remainder of this notebook will take you through the first of these programming paradigms. This is more detail than some users will want, who can skip ahead to the iterator, array and dataframe sections; but there will be some data processing tasks that don't easily fit into those abstractions and need to fall back to the methods here.\n", + "\n", + "We include a few examples at the end of the notebooks showing that the ideas behind how Dask is built are not actually that novel, and experienced programmers will have met parts of the design in other situations before. Those examples are left for the interested." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Oouyiz9v1sPh" + }, + "source": [ + "## Dask is a graph execution engine" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y9K-tLwy1sPh" + }, + "source": [ + "Dask allows you to construct a prescription for the calculation you want to carry out. That may sound strange, but a simple example will demonstrate that you can achieve this while programming with perfectly ordinary Python functions and for-loops. We saw this in the previous notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rbpgtBdF1sPi" + }, + "outputs": [], + "source": [ + "from dask import delayed\n", + "\n", + "@delayed\n", + "def inc(x):\n", + " return x + 1\n", + "\n", + "@delayed\n", + "def add(x, y):\n", + " return x + y" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6azvl-Gs1sPi" + }, + "source": [ + "Here we have used the delayed annotation to show that we want these functions to operate lazily — to save the set of inputs and execute only on demand. `dask.delayed` is also a function which can do this, without the annotation, leaving the original function unchanged, e.g.,\n", + "```python\n", + " delayed_inc = delayed(inc)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c1KTz3Ow1sPi" + }, + "outputs": [], + "source": [ + "# this looks like ordinary code\n", + "x = inc(15)\n", + "y = inc(30)\n", + "total = add(x, y)\n", + "# x, y and total are all delayed objects.\n", + "# They contain a prescription of how to carry out the computation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dgp-SLJ71sPi" + }, + "source": [ + "Calling a delayed function created a delayed object (`x, y, total`) which can be examined interactively. Making these objects is somewhat equivalent to constructs like the `lambda` or function wrappers (see below). 
Each holds a simple dictionary describing the task graph, a full specification of how to carry out the computation.\n", + "\n", + "We can visualize the chain of calculations that the object `total` corresponds to as follows; the circles are functions, rectangles are data/results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HNpSaktp1sPj" + }, + "outputs": [], + "source": [ + "total.visualize()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iYZ0kfwM1sPj" + }, + "source": [ + "But so far, no functions have actually been executed. This demonstrated the division between the graph-creation part of Dask (`delayed()`, in this example) and the graph execution part of Dask.\n", + "\n", + "To run the \"graph\" in the visualization, and actually get a result, do:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dzsqSzja1sPj" + }, + "outputs": [], + "source": [ + "# execute all tasks\n", + "total.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kcFdr2Bq1sPj" + }, + "source": [ + "**Why should you care about this?**\n", + "\n", + "By building a specification of the calculation we want to carry out before executing anything, we can pass the specification to an *execution engine* for evaluation. In the case of Dask, this execution engine could be running on many nodes of a cluster, so you have access to the full number of CPU cores and memory across all the machines. Dask will intelligently execute your calculation with care for minimizing the amount of data held in memory, while parallelizing over the tasks that make up a graph. Notice that in the animated diagram below, where four workers are processing the (simple) graph, execution progresses vertically up the branches first, so that intermediate results can be expunged before moving onto a new branch.\n", + "\n", + "With `delayed` and normal pythonic looped code, very complex graphs can be built up and passed on to Dask for execution. See a nice example of [simulated complex ETL](https://blog.dask.org/2017/01/24/dask-custom) work flow.\n", + "\n", + "![this](https://github.com/lsteffenel/hpc-python/blob/master/dask/images/grid_search_schedule.gif?raw=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3IY506_61sPj" + }, + "source": [ + "### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mOf6qGMH1sPj" + }, + "source": [ + "We will apply `delayed` to a real data processing task, albeit a simple one.\n", + "\n", + "Consider reading three CSV files with `pd.read_csv` and then measuring their total length. We will consider how you would do this with ordinary Python code, then build a graph for this process using delayed, and finally execute this graph using Dask, for a handy speed-up factor of more than two (there are only three inputs to parallelize over)." 
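The next cells create the data and show the sequential version first. For reference, one possible delayed version of this exercise is sketched here; it is only a sketch, and the `data/accounts.*.csv` files it reads are the ones produced by `prep.py` in the cells below.

```python
import os
import pandas as pd
from dask import delayed

# Sketch of the delayed three-CSV length count described above
# (the CSV files are created by prep.py in the following cells).
filenames = [os.path.join('data', 'accounts.%d.csv' % i) for i in [0, 1, 2]]

lengths = [delayed(len)(delayed(pd.read_csv)(fn)) for fn in filenames]
total = delayed(sum)(lengths)

total.visualize()        # inspect the task graph
print(total.compute())   # run the three reads (and len calls) in parallel
```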
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sMErX_hH1sPj" + }, + "outputs": [], + "source": [ + "%run prep.py -d accounts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Bgjwxj9-1sPj" + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import os\n", + "filenames = [os.path.join('data', 'accounts.%d.csv' % i) for i in [0, 1, 2]]\n", + "filenames" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gRV-xnMU1sPj" + }, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "# normal, sequential code\n", + "a = pd.read_csv(filenames[0])\n", + "b = pd.read_csv(filenames[1])\n", + "c = pd.read_csv(filenames[2])\n", + "\n", + "na = len(a)\n", + "nb = len(b)\n", + "nc = len(c)\n", + "\n", + "total = sum([na, nb, nc])\n", + "print(total)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kl48ZPeD1sPj" + }, + "source": [ + "Your task is to recreate this graph again using the delayed function on the original Python code. The three functions you want to delay are `pd.read_csv`, `len` and `sum`.." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MMKgFwQX1sPk" + }, + "source": [ + "```python\n", + "delayed_read_csv = delayed(pd.read_csv)\n", + "a = delayed_read_csv(filenames[0])\n", + "...\n", + "\n", + "total = ...\n", + "\n", + "# execute\n", + "%time total.compute() \n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XEdUQZsx1sPk" + }, + "outputs": [], + "source": [ + "# your verbose code here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DCb168xA1sPk" + }, + "source": [ + "Next, repeat this using loops, rather than writing out all the variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lDNqFku51sPk" + }, + "outputs": [], + "source": [ + "# your concise code here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rWSTsFCU1sPk" + }, + "source": [ + "**Notes**\n", + "\n", + "Delayed objects support various operations:\n", + "```python\n", + " x2 = x + 1\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ui6Ca7mt1sPk" + }, + "source": [ + "if `x` was a delayed result (like `total`, above), then so is `x2`. Supported operations include arithmetic operators, item or slice selection, attribute access and method calls - essentially anything that could be phrased as a `lambda` expression.\n", + "\n", + "Operations which are *not* supported include mutation, setter methods, iteration (for) and bool (predicate)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KotMwxJ11sPk" + }, + "source": [ + "## Appendix: Further detail and examples" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6V9pjoNU1sPk" + }, + "source": [ + "The following examples show that the kinds of things Dask does are not so far removed from normal Python programming when dealing with big data. These examples are **only meant for experts**, typical users can continue with the next notebook in the tutorial." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1Azz3yWX1sPk" + }, + "source": [ + "### Example 1: simple word count" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tEF1NXEx1sPk" + }, + "source": [ + "This directory contains a file called `README.md`. 
How would you count the number of words in that file?\n", + "\n", + "The simplest approach would be to load all the data into memory, split on whitespace and count the number of results. Here we use a regular expression to split words." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ABzNlj1q1sPl" + }, + "outputs": [], + "source": [ + "import re\n", + "splitter = re.compile('\\w+')\n", + "with open('README.md', 'r') as f:\n", + " data = f.read()\n", + "result = len(splitter.findall(data))\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eOu1upyq1sPl" + }, + "source": [ + "The trouble with this approach is that it does not scale - if the file is very large, it, and the generated list of words, might fill up memory. We can easily avoid that, because we only need a simple sum, and each line is totally independent of the others. Now we evaluate each piece of data and immediately free up the space again, so we could perform this on arbitrarily-large files. Note that there is often a trade-off between time-efficiency and memory footprint: the following uses very little memory, but may be slower for files that do not fill a large faction of memory. In general, one would like chunks small enough not to stress memory, but big enough for efficient use of the CPU." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_28zNCG61sPl" + }, + "outputs": [], + "source": [ + "result = 0\n", + "with open('README.md', 'r') as f:\n", + " for line in f:\n", + " result += len(splitter.findall(line))\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uuh1D-k71sPl" + }, + "source": [ + "### Example 2: background execution" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F2USbXgU1sPl" + }, + "source": [ + "There are many tasks that take a while to complete, but don't actually require much of the CPU, for example anything that requires communication over a network, or input from a user. In typical sequential programming, execution would need to halt while the process completes, and then continue execution. That would be dreadful for user experience (imagine the slow progress bar that locks up the application and cannot be canceled), and wasteful of time (the CPU could have been doing useful work in the meantime).\n", + "\n", + "For example, we can launch processes and get their output as follows:\n", + "```python\n", + " import subprocess\n", + " p = subprocess.Popen(command, stdout=subprocess.PIPE)\n", + " p.returncode\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "43Lga1h01sPl" + }, + "source": [ + "The task is run in a separate process, and the return-code will remain `None` until it completes, when it will change to `0`. To get the result back, we need `out = p.communicate()[0]` (which would block if the process was not complete)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AtB2jv0r1sPm" + }, + "source": [ + "Similarly, we can launch Python processes and threads in the background. Some methods allow mapping over multiple inputs and gathering the results, more on that later. The thread starts and the cell completes immediately, but the data associated with the download only appears in the queue object some time later." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oNKk21Ol1sPm" + }, + "outputs": [], + "source": [ + "# Edit sources.py to configure source locations\n", + "import sources\n", + "sources.lazy_url" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "44RX6ORx1sPm" + }, + "outputs": [], + "source": [ + "import threading\n", + "import queue\n", + "import urllib\n", + "\n", + "def get_webdata(url, q):\n", + " u = urllib.request.urlopen(url)\n", + " # raise ValueError\n", + " q.put(u.read())\n", + "\n", + "q = queue.Queue()\n", + "t = threading.Thread(target=get_webdata, args=(sources.lazy_url, q))\n", + "t.start()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nrF1w-S-1sPm" + }, + "outputs": [], + "source": [ + "# fetch result back into this thread. If the worker thread is not done, this would wait.\n", + "q.get()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Iug-wjFt1sPm" + }, + "source": [ + "Consider: what would you see if there had been an exception within the `get_webdata` function? You could uncomment the `raise` line, above, and re-execute the two cells. What happens? Is there any way to debug the execution to find the root cause of the error?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MX_6U4uL1sPm" + }, + "source": [ + "### Example 3: delayed execution" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "emys9SFf1sPm" + }, + "source": [ + "There are many ways in Python to specify the computation you want to execute, but only run it *later*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gLKPLF7s1sPm" + }, + "outputs": [], + "source": [ + "def add(x, y):\n", + " return x + y\n", + "\n", + "# Sometimes we defer computations with strings\n", + "x = 15\n", + "y = 30\n", + "z = \"add(x, y)\"\n", + "eval(z)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "J6NZkGHb1sPm" + }, + "outputs": [], + "source": [ + "# we can use lambda or other \"closure\"\n", + "x = 15\n", + "y = 30\n", + "z = lambda: add(x, y)\n", + "z()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9pjtVE0M1sPn" + }, + "outputs": [], + "source": [ + "# A very similar thing happens in functools.partial\n", + "\n", + "import functools\n", + "z = functools.partial(add, x, y)\n", + "z()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "D11ZABMw1sPn" + }, + "outputs": [], + "source": [ + "# Python generators are delayed execution by default\n", + "# Many Python functions expect such iterable objects\n", + "\n", + "def gen():\n", + " res = x\n", + " yield res\n", + " res += y\n", + " yield res\n", + "\n", + "g = gen()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8cGva_Fh1sPn" + }, + "outputs": [], + "source": [ + "# run once: we get one value and execution halts within the generator\n", + "# run again and the execution completes\n", + "next(g)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s5KWeb0y1sPn" + }, + "source": [ + "### Dask graphs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "spwcyqcl1sPn" + }, + "source": [ + "Any Dask object, such as `total`, above, has an attribute which describes the calculations necessary to produce that result. 
Indeed, this is exactly the graph that we have been talking about, which can be visualized. We see that it is a simple dictionary, in which the keys are unique task identifiers, and the values are the functions and inputs for calculation.\n", + "\n", + "`delayed` is a handy mechanism for creating the Dask graph, but the adventurous may wish to play with the full fexibility afforded by building the graph dictionaries directly. Detailed information can be found [here](http://dask.pydata.org/en/latest/graphs.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "W1JTK90u1sPn" + }, + "outputs": [], + "source": [ + "total.dask" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CRac1LyO1sPn" + }, + "outputs": [], + "source": [ + "dict(total.dask)" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.6" + }, + "colab": { + "provenance": [], + "include_colab_link": true + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/dask/02_bag.ipynb b/dask/02_bag.ipynb new file mode 100644 index 0000000..dc5da4a --- /dev/null +++ b/dask/02_bag.ipynb @@ -0,0 +1,717 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Bag: Parallel Lists for semi-structured data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dask-bag excels in processing data that can be represented as a sequence of arbitrary inputs. We'll refer to this as \"messy\" data, because it can contain complex nested structures, missing fields, mixtures of data types, etc. The *functional* programming style fits very nicely with standard Python iteration, such as can be found in the `itertools` module.\n", + "\n", + "Messy data is often encountered at the beginning of data processing pipelines when large volumes of raw data are first consumed. The initial set of data might be JSON, CSV, XML, or any other format that does not enforce strict structure and datatypes.\n", + "For this reason, the initial data massaging and processing is often done with Python `list`s, `dict`s, and `set`s.\n", + "\n", + "These core data structures are optimized for general-purpose storage and processing. Adding streaming computation with iterators/generator expressions or libraries like `itertools` or [`toolz`](https://toolz.readthedocs.io/en/latest/) let us process large volumes in a small space. If we combine this with parallel processing then we can churn through a fair amount of data.\n", + "\n", + "Dask.bag is a high level Dask collection to automate common workloads of this form. 
In a nutshell\n", + "\n", + " dask.bag = map, filter, toolz + parallel execution\n", + " \n", + "**Related Documentation**\n", + "\n", + "* [Bag documentation](https://docs.dask.org/en/latest/bag.html)\n", + "* [Bag screencast](https://youtu.be/-qIiJ1XtSv0)\n", + "* [Bag API](https://docs.dask.org/en/latest/bag-api.html)\n", + "* [Bag examples](https://examples.dask.org/bag.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run prep.py -d accounts" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Again, we'll use the distributed scheduler. Schedulers will be explained in depth [later](05_distributed.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from dask.distributed import Client\n", + "\n", + "client = Client(n_workers=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can create a `Bag` from a Python sequence, from files, from data on S3, etc.\n", + "We demonstrate using `.take()` to show elements of the data. (Doing `.take(1)` results in a tuple with one element)\n", + "\n", + "Note that the data are partitioned into blocks, and there are many items per block. In the first example, the two partitions contain five elements each, and in the following two, each file is partitioned into one or more bytes blocks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# each element is an integer\n", + "import dask.bag as db\n", + "b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], npartitions=2)\n", + "b.take(3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# each element is a text file, where each line is a JSON object\n", + "# note that the compression is handled automatically\n", + "import os\n", + "b = db.read_text(os.path.join('data', 'accounts.*.json.gz'))\n", + "b.take(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Edit sources.py to configure source locations\n", + "import sources\n", + "sources.bag_url" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Requires `s3fs` library\n", + "# each partition is a remote CSV text file\n", + "b = db.read_text(sources.bag_url,\n", + " storage_options={'anon': True})\n", + "b.take(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Manipulation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`Bag` objects hold the standard functional API found in projects like the Python standard library, `toolz`, or `pyspark`, including `map`, `filter`, `groupby`, etc..\n", + "\n", + "Operations on `Bag` objects create new bags. Call the `.compute()` method to trigger execution, as we saw for `Delayed` objects. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def is_even(n):\n", + " return n % 2 == 0\n", + "\n", + "b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n", + "c = b.filter(is_even).map(lambda x: x ** 2)\n", + "c" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# blocking form: wait for completion (which is very fast in this case)\n", + "c.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example: Accounts JSON data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We've created a fake dataset of gzipped JSON data in your data directory. This is like the example used in the `DataFrame` example we will see later, except that it has bundled up all of the entires for each individual `id` into a single record. This is similar to data that you might collect off of a document store database or a web API.\n", + "\n", + "Each line is a JSON encoded dictionary with the following keys\n", + "\n", + "* id: Unique identifier of the customer\n", + "* name: Name of the customer\n", + "* transactions: List of `transaction-id`, `amount` pairs, one for each transaction for the customer in that file" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "filename = os.path.join('data', 'accounts.*.json.gz')\n", + "lines = db.read_text(filename)\n", + "lines.take(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our data comes out of the file as lines of text. Notice that file decompression happened automatically. We can make this data look more reasonable by mapping the `json.loads` function onto our bag." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "js = lines.map(json.loads)\n", + "# take: inspect first few elements\n", + "js.take(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Basic Queries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once we parse our JSON data into proper Python objects (`dict`s, `list`s, etc.) we can perform more interesting queries by creating small Python functions to run on our data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# filter: keep only some elements of the sequence\n", + "js.filter(lambda record: record['name'] == 'Alice').take(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def count_transactions(d):\n", + " return {'name': d['name'], 'count': len(d['transactions'])}\n", + "\n", + "# map: apply a function to each element\n", + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .map(count_transactions)\n", + " .take(5))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pluck: select a field, as from a dictionary, element[field]\n", + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .map(count_transactions)\n", + " .pluck('count')\n", + " .take(5))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Average number of transactions for all of the Alice entries\n", + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .map(count_transactions)\n", + " .pluck('count')\n", + " .mean()\n", + " .compute())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Use `flatten` to de-nest" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the example below we see the use of `.flatten()` to flatten results. We compute the average amount for all transactions for all Alices." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .pluck('transactions')\n", + " .take(3))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .pluck('transactions')\n", + " .flatten()\n", + " .take(3))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .pluck('transactions')\n", + " .flatten()\n", + " .pluck('amount')\n", + " .take(3))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(js.filter(lambda record: record['name'] == 'Alice')\n", + " .pluck('transactions')\n", + " .flatten()\n", + " .pluck('amount')\n", + " .mean()\n", + " .compute())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Groupby and Foldby" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Often we want to group data by some function or key. We can do this either with the `.groupby` method, which is straightforward but forces a full shuffle of the data (expensive) or with the harder-to-use but faster `.foldby` method, which does a streaming combined groupby and reduction.\n", + "\n", + "* `groupby`: Shuffles data so that all items with the same key are in the same key-value pair\n", + "* `foldby`: Walks through the data accumulating a result per key\n", + "\n", + "*Note: the full groupby is particularly bad. 
In actual workloads you would do well to use `foldby` or switch to `DataFrame`s if possible.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `groupby`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Groupby collects items in your collection so that all items with the same value under some function are collected together into a key-value pair." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b = db.from_sequence(['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank'])\n", + "b.groupby(len).compute() # names grouped by length" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b = db.from_sequence(list(range(10)))\n", + "b.groupby(lambda x: x % 2).compute()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b.groupby(lambda x: x % 2).starmap(lambda k, v: (k, max(v))).compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `foldby`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Foldby can be quite odd at first. It is similar to the following functions from other libraries:\n", + "\n", + "* [`toolz.reduceby`](http://toolz.readthedocs.io/en/latest/streaming-analytics.html#streaming-split-apply-combine)\n", + "* [`pyspark.RDD.combineByKey`](http://abshinn.github.io/python/apache-spark/2014/10/11/using-combinebykey-in-apache-spark/)\n", + "\n", + "When using `foldby` you provide \n", + "\n", + "1. A key function on which to group elements\n", + "2. A binary operator such as you would pass to `reduce` that you use to perform reduction per each group\n", + "3. A combine binary operator that can combine the results of two `reduce` calls on different parts of your dataset.\n", + "\n", + "Your reduction must be associative. It will happen in parallel in each of the partitions of your dataset. Then all of these intermediate results will be combined by the `combine` binary operator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b.foldby(lambda x: x % 2, binop=max, combine=max).compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example with account data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We find the number of people with the same name." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Warning, this one takes a while...\n", + "result = js.groupby(lambda item: item['name']).starmap(lambda k, v: (k, len(v))).compute()\n", + "print(sorted(result))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# This one is comparatively fast and produces the same result.\n", + "from operator import add\n", + "def incr(tot, _):\n", + " return tot + 1\n", + "\n", + "result = js.foldby(key='name', \n", + " binop=incr, \n", + " initial=0, \n", + " combine=add, \n", + " combine_initial=0).compute()\n", + "print(sorted(result))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: compute total amount per name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to groupby (or foldby) the `name` key, then add up the all of the amounts for each name.\n", + "\n", + "Steps\n", + "\n", + "1. Create a small function that, given a dictionary like \n", + "\n", + " {'name': 'Alice', 'transactions': [{'amount': 1, 'id': 123}, {'amount': 2, 'id': 456}]}\n", + " \n", + " produces the sum of the amounts, e.g. `3`\n", + " \n", + "2. Slightly change the binary operator of the `foldby` example above so that the binary operator doesn't count the number of entries, but instead accumulates the sum of the amounts." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrames" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the same reasons that Pandas is often faster than pure Python, `dask.dataframe` can be faster than `dask.bag`. We will work more with DataFrames later, but from the point of view of a Bag, it is frequently the end-point of the \"messy\" part of data ingestion—once the data can be made into a data-frame, then complex split-apply-combine logic will become much more straight-forward and efficient.\n", + "\n", + "You can transform a bag with a simple tuple or flat dictionary structure into a `dask.dataframe` with the `to_dataframe` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df1 = js.to_dataframe()\n", + "df1.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This now looks like a well-defined DataFrame, and we can apply Pandas-like computations to it efficiently." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using a Dask DataFrame, how long does it take to do our prior computation of numbers of people with the same name? It turns out that `dask.dataframe.groupby()` beats `dask.bag.groupby()` by more than an order of magnitude; but it still cannot match `dask.bag.foldby()` for this case." 
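Going back to the "total amount per name" exercise above, one possible sketch looks like the following. It assumes the `js` bag from the earlier cells; the helper names `sum_amounts` and `add_record` are just illustrative.

```python
from operator import add

# Sketch of the "total amount per name" exercise above (js comes from the
# earlier cells; sum_amounts and add_record are illustrative helper names).
def sum_amounts(record):
    # sum of the amounts in one customer record
    return sum(t['amount'] for t in record['transactions'])

def add_record(total, record):
    # binary operator for foldby: accumulate the per-record sum
    return total + sum_amounts(record)

result = js.foldby(key='name',
                   binop=add_record,
                   initial=0,
                   combine=add,        # combine partial totals from different partitions
                   combine_initial=0).compute()
print(sorted(result)[:5])
```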
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%time df1.groupby('name').id.count().compute().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Denormalization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This DataFrame format is less-than-optimal because the `transactions` column is filled with nested data so Pandas has to revert to `object` dtype, which is quite slow in Pandas. Ideally we want to transform to a dataframe only after we have flattened our data so that each record is a single `int`, `string`, `float`, etc.." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def denormalize(record):\n", + " # returns a list for each person, one item per transaction\n", + " return [{'id': record['id'], \n", + " 'name': record['name'], \n", + " 'amount': transaction['amount'], \n", + " 'transaction-id': transaction['transaction-id']}\n", + " for transaction in record['transactions']]\n", + "\n", + "transactions = js.map(denormalize).flatten()\n", + "transactions.take(3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = transactions.to_dataframe()\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# number of transactions per name\n", + "# note that the time here includes the data load and ingestion\n", + "df.groupby('name')['transaction-id'].count().compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Limitations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bags provide very general computation (any Python function.) This generality\n", + "comes at cost. Bags have the following known limitations\n", + "\n", + "1. Bag operations tend to be slower than array/dataframe computations in the\n", + " same way that Python tends to be slower than NumPy/Pandas\n", + "2. ``Bag.groupby`` is slow. You should try to use ``Bag.foldby`` if possible.\n", + " Using ``Bag.foldby`` requires more thought. Even better, consider creating\n", + " a normalised dataframe." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Learn More\n", + "\n", + "* [Bag documentation](https://docs.dask.org/en/latest/bag.html)\n", + "* [Bag screencast](https://youtu.be/-qIiJ1XtSv0)\n", + "* [Bag API](https://docs.dask.org/en/latest/bag-api.html)\n", + "* [Bag examples](https://examples.dask.org/bag.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Shutdown" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client.shutdown()" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/dask/03_array.ipynb b/dask/03_array.ipynb new file mode 100644 index 0000000..9ab5068 --- /dev/null +++ b/dask/03_array.ipynb @@ -0,0 +1,994 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Arrays" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Dask array provides a parallel, larger-than-memory, n-dimensional array using blocked algorithms. Simply put: distributed Numpy.\n", + "\n", + "* **Parallel**: Uses all of the cores on your computer\n", + "* **Larger-than-memory**: Lets you work on datasets that are larger than your available memory by breaking up your array into many small pieces, operating on those pieces in an order that minimizes the memory footprint of your computation, and effectively streaming data from disk.\n", + "* **Blocked Algorithms**: Perform large computations by performing many smaller computations\n", + "\n", + "In this notebook, we'll build some understanding by implementing some blocked algorithms from scratch.\n", + "We'll then use Dask Array to analyze large datasets, in parallel, using a familiar NumPy-like API.\n", + "\n", + "**Related Documentation**\n", + "\n", + "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n", + "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n", + "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n", + "* [Array examples](https://examples.dask.org/array.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run prep.py -d random" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from dask.distributed import Client\n", + "\n", + "client = Client(n_workers=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Blocked Algorithms" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A *blocked algorithm* executes on a large dataset by breaking it up into many small blocks.\n", + "\n", + "For example, consider taking the sum of a billion numbers. 
We might instead break up the array into 1,000 chunks, each of size 1,000,000, take the sum of each chunk, and then take the sum of the intermediate sums.\n", + "\n", + "We achieve the intended result (one sum on one billion numbers) by performing many smaller results (one thousand sums on one million numbers each, followed by another sum of a thousand numbers.)\n", + "\n", + "We do exactly this with Python and NumPy in the following example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load data with h5py\n", + "# this creates a pointer to the data, but does not actually load\n", + "import h5py\n", + "import os\n", + "f = h5py.File(os.path.join('data', 'random.hdf5'), mode='r')\n", + "dset = f['/x']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Compute sum using blocked algorithm**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before using dask, let's consider the concept of blocked algorithms. We can compute the sum of a large number of elements by loading them chunk-by-chunk, and keeping a running total.\n", + "\n", + "Here we compute the sum of this large array on disk by \n", + "\n", + "1. Computing the sum of each 1,000,000 sized chunk of the array\n", + "2. Computing the sum of the 1,000 intermediate sums\n", + "\n", + "Note that this is a sequential process in the notebook kernel, both the loading and summing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compute sum of large array, one million numbers at a time\n", + "sums = []\n", + "for i in range(0, 1_000_000_000, 1_000_000):\n", + " chunk = dset[i: i + 1_000_000] # pull out numpy array\n", + " sums.append(chunk.sum())\n", + "\n", + "total = sum(sums)\n", + "print(total)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Compute the mean using a blocked algorithm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've seen the simple example above, try doing a slightly more complicated problem. Compute the mean of the array, assuming for a moment that we don't happen to already know how many elements are in the data. You can do this by changing the code above with the following alterations:\n", + "\n", + "1. Compute the sum of each block\n", + "2. Compute the length of each block\n", + "3. Compute the sum of the 1,000 intermediate sums and the sum of the 1,000 intermediate lengths and divide one by the other\n", + "\n", + "This approach is overkill for our case but does nicely generalize if we don't know the size of the array or individual blocks beforehand." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Compute the mean of the array" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/03_mean_by_block.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`dask.array` contains these algorithms\n", + "--------------------------------------------\n", + "\n", + "Dask.array is a NumPy-like library that does these kinds of tricks to operate on large datasets that don't fit into memory. It extends beyond the linear problems discussed above to full N-Dimensional algorithms and a decent subset of the NumPy interface." 
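+ ,"\n",
+ "As a teaser of what this buys us, the hand-written blocked sum above collapses to a couple of lines (built up properly in the next cells):\n",
+ "\n",
+ "```python\n",
+ "import dask.array as da\n",
+ "x = da.from_array(dset, chunks=(1_000_000,))\n",
+ "x.sum().compute()   # same blocked strategy, but constructed and scheduled for us\n",
+ "```"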
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Create `dask.array` object**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can create a `dask.array` `Array` object with the `da.from_array` function. This function accepts\n", + "\n", + "1. `data`: Any object that supports NumPy slicing, like `dset`\n", + "2. `chunks`: A chunk size to tell us how to block up our array, like `(1_000_000,)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import dask.array as da\n", + "x = da.from_array(dset, chunks=(1_000_000,))\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Manipulate `dask.array` object as you would a numpy array**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have an `Array` we perform standard numpy-style computations like arithmetic, mathematics, slicing, reductions, etc..\n", + "\n", + "The interface is familiar, but the actual work is different. `dask_array.sum()` does not do the same thing as `numpy_array.sum()`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**What's the difference?**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`dask_array.sum()` builds an expression of the computation. It does not do the computation yet. `numpy_array.sum()` computes the sum immediately." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Why the difference?*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dask arrays are split into chunks. Each chunk must have computations run on that chunk explicitly. If the desired answer comes from a small slice of the entire dataset, running the computation over all data would be wasteful of CPU and memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = x.sum()\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Compute result**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dask.array objects are lazily evaluated. Operations like `.sum` build up a graph of blocked tasks to execute. \n", + "\n", + "We ask for the final result with a call to `.compute()`. This triggers the actual computation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Compute the mean" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And the variance, std, etc.. This should be a small change to the example above.\n", + "\n", + "Look at what other operations you can do with the Jupyter notebook's tab-completion." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Does this match your result from before?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Performance and Parallelism\n", + "-------------------------------\n", + "\n", + "\n", + "\n", + "In our first examples we used `for` loops to walk through the array one block at a time. For simple operations like `sum` this is optimal. However for complex operations we may want to traverse through the array differently. 
In particular we may want the following:\n", + "\n", + "1. Use multiple cores in parallel\n", + "2. Chain operations on a single blocks before moving on to the next one\n", + "\n", + "`Dask.array` translates your array operations into a graph of inter-related tasks with data dependencies between them. Dask then executes this graph in parallel with multiple threads. We'll discuss more about this in the next section.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. Construct a 20000x20000 array of normally distributed random values broken up into 1000x1000 sized chunks\n", + "2. Take the mean along one axis\n", + "3. Take every 100th element" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import dask.array as da\n", + "\n", + "x = da.random.normal(10, 0.1, size=(20000, 20000), # 400 million element array \n", + " chunks=(1000, 1000)) # Cut into 1000x1000 sized chunks\n", + "y = x.mean(axis=0)[::100] # Perform NumPy-style operations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.nbytes / 1e9 # Gigabytes of the input processed lazily" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "y.compute() # Time to compute the result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Performance comparison\n", + "---------------------------\n", + "\n", + "The following experiment was performed on a heavy personal laptop. Your performance may vary. If you attempt the NumPy version then please ensure that you have more than 4GB of main memory." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**NumPy: 19s, Needs gigabytes of memory**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python\n", + "import numpy as np\n", + "\n", + "%%time \n", + "x = np.random.normal(10, 0.1, size=(20000, 20000)) \n", + "y = x.mean(axis=0)[::100] \n", + "y\n", + "\n", + "CPU times: user 19.6 s, sys: 160 ms, total: 19.8 s\n", + "Wall time: 19.7 s\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Dask Array: 4s, Needs megabytes of memory**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python\n", + "import dask.array as da\n", + "\n", + "%%time\n", + "x = da.random.normal(10, 0.1, size=(20000, 20000), chunks=(1000, 1000))\n", + "y = x.mean(axis=0)[::100] \n", + "y.compute() \n", + "\n", + "CPU times: user 29.4 s, sys: 1.07 s, total: 30.5 s\n", + "Wall time: 4.01 s\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Discussion**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the Dask array computation ran in 4 seconds, but used 29.4 seconds of user CPU time. The numpy computation ran in 19.7 seconds and used 19.6 seconds of user CPU time.\n", + "\n", + "Dask finished faster, but used more total CPU time because Dask was able to transparently parallelize the computation because of the chunk size." 
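+ ,"\n",
+ "When reasoning about the questions below, it helps to peek at the chunk layout of `x` (the numbers assume the `size=(20000, 20000)`, `chunks=(1000, 1000)` example above):\n",
+ "\n",
+ "```python\n",
+ "x.numblocks      # (20, 20): 400 blocks, so plenty of independent tasks\n",
+ "x.chunks         # ((1000, ..., 1000), (1000, ..., 1000))\n",
+ "x.nbytes / 1e9   # 3.2 GB of float64, but only a few blocks are in memory at once\n",
+ "```"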
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Questions*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* What happens if the dask chunks=(20000,20000)?\n", + " * Will the computation run in 4 seconds?\n", + " * How much memory will be used?\n", + "* What happens if the dask chunks=(25,25)?\n", + " * What happens to CPU and memory?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Meteorological data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is 2GB of somewhat artifical weather data in HDF5 files in `data/weather-big/*.hdf5`. We'll use the `h5py` library to interact with this data and `dask.array` to compute on it.\n", + "\n", + "Our goal is to visualize the average temperature on the surface of the Earth for this month. This will require a mean over all of this data. We'll do this in the following steps\n", + "\n", + "1. Create `h5py.Dataset` objects for each of the days of data on disk (`dsets`)\n", + "2. Wrap these with `da.from_array` calls \n", + "3. Stack these datasets along time with a call to `da.stack`\n", + "4. Compute the mean along the newly stacked time axis with the `.mean()` method\n", + "5. Visualize the result with `matplotlib.pyplot.imshow`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run prep.py -d weather" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import h5py\n", + "from glob import glob\n", + "import os\n", + "\n", + "filenames = sorted(glob(os.path.join('data', 'weather-big', '*.hdf5')))\n", + "dsets = [h5py.File(filename, mode='r')['/t2m'] for filename in filenames]\n", + "dsets[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dsets[0][:5, :5] # Slicing into h5py.Dataset object gives a numpy array" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "\n", + "fig = plt.figure(figsize=(16, 8))\n", + "plt.imshow(dsets[0][::4, ::4], cmap='RdBu_r');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Integrate with `dask.array`**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make a list of `dask.array` objects out of your list of `h5py.Dataset` objects using the `da.from_array` function with a chunk size of `(500, 500)`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "arrays = [da.from_array(dset, chunks=(500, 500)) for dset in dsets]\n", + "arrays" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Stack this list of `dask.array` objects into a single `dask.array` object with `da.stack`**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Stack these along the first axis so that the shape of the resulting array is `(31, 5760, 11520)`." 
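+ ,"\n",
+ "Hint, in case it helps: `da.stack` creates a new axis, whereas `da.concatenate` joins along an existing one (a tiny sketch with a made-up array):\n",
+ "\n",
+ "```python\n",
+ "a = da.ones((2, 3), chunks=(2, 3))\n",
+ "da.stack([a, a], axis=0).shape        # (2, 2, 3): new leading axis\n",
+ "da.concatenate([a, a], axis=0).shape  # (4, 3): the existing axis grows\n",
+ "```"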
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "x = da.stack(arrays, axis=0)\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Plot the mean of this array along the time (`0th`) axis**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "raises-exception" + ] + }, + "outputs": [], + "source": [ + "# complete the following:\n", + "fig = plt.figure(figsize=(16, 8))\n", + "plt.imshow(..., cmap='RdBu_r')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "result = x.mean(axis=0)\n", + "fig = plt.figure(figsize=(16, 8))\n", + "plt.imshow(result, cmap='RdBu_r');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Plot the difference of the first day from the mean**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "result = x[0] - x.mean(axis=0)\n", + "fig = plt.figure(figsize=(16, 8))\n", + "plt.imshow(result, cmap='RdBu_r');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Subsample and store" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above exercise the result of our computation is small, so we can call `compute` safely. Sometimes our result is still too large to fit into memory and we want to save it to disk. In these cases you can use one of the following two functions\n", + "\n", + "1. `da.store`: Store dask.array into any object that supports numpy setitem syntax, e.g.\n", + "\n", + " f = h5py.File('myfile.hdf5')\n", + " output = f.create_dataset(shape=..., dtype=...)\n", + " \n", + " da.store(my_dask_array, output)\n", + " \n", + "2. `da.to_hdf5`: A specialized function that creates and stores a `dask.array` object into an `HDF5` file.\n", + "\n", + " da.to_hdf5('data/myfile.hdf5', '/output', my_dask_array)\n", + " \n", + "The task in this exercise is to **use numpy step slicing to subsample the full dataset by a factor of two in both the latitude and longitude direction and then store this result to disk** using one of the functions listed above.\n", + "\n", + "As a reminder, Python slicing takes three elements\n", + "\n", + " start:stop:step\n", + "\n", + " >>> L = [1, 2, 3, 4, 5, 6, 7]\n", + " >>> L[::3]\n", + " [1, 4, 7]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "import h5py\n", + "from glob import glob\n", + "import os\n", + "import dask.array as da\n", + "\n", + "filenames = sorted(glob(os.path.join('data', 'weather-big', '*.hdf5')))\n", + "dsets = [h5py.File(filename, mode='r')['/t2m'] for filename in filenames]\n", + "\n", + "arrays = [da.from_array(dset, chunks=(500, 500)) for dset in dsets]\n", + "\n", + "x = da.stack(arrays, axis=0)\n", + "\n", + "result = x[:, ::2, ::2]\n", + "\n", + "da.to_zarr(result, os.path.join('data', 'myfile.zarr'), overwrite=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example: Lennard-Jones potential" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The [Lennard-Jones potential](https://en.wikipedia.org/wiki/Lennard-Jones_potential) is used in partical simuluations in physics, chemistry and engineering. It is highly parallelizable.\n", + "\n", + "First, we'll run and profile the Numpy version on 7,000 particles." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# make a random collection of particles\n", + "def make_cluster(natoms, radius=40, seed=1981):\n", + " np.random.seed(seed)\n", + " cluster = np.random.normal(0, radius, (natoms,3))-0.5\n", + " return cluster\n", + "\n", + "def lj(r2):\n", + " sr6 = (1./r2)**3\n", + " pot = 4.*(sr6*sr6 - sr6)\n", + " return pot\n", + "\n", + "# build the matrix of distances\n", + "def distances(cluster):\n", + " diff = cluster[:, np.newaxis, :] - cluster[np.newaxis, :, :]\n", + " mat = (diff*diff).sum(-1)\n", + " return mat\n", + "\n", + "# the lj function is evaluated over the upper traingle\n", + "# after removing distances near zero\n", + "def potential(cluster):\n", + " d2 = distances(cluster)\n", + " dtri = np.triu(d2)\n", + " energy = lj(dtri[dtri > 1e-6]).sum()\n", + " return energy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cluster = make_cluster(int(7e3), radius=500)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%time potential(cluster)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the most time consuming function is `distances`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# this would open in another browser tab\n", + "# %load_ext snakeviz\n", + "# %snakeviz potential(cluster)\n", + "\n", + "# alternative simple version given text results in this tab\n", + "%prun -s tottime potential(cluster)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dask version" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's the Dask version. 
Only the `potential` function needs to be rewritten to best utilize Dask.\n", + "\n", + "Note that `da.nansum` has been used over the full $NxN$ distance matrix to improve parallel efficiency.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import dask.array as da\n", + "\n", + "# compute the potential on the entire\n", + "# matrix of distances and ignore division by zero\n", + "def potential_dask(cluster):\n", + " d2 = distances(cluster)\n", + " energy = da.nansum(lj(d2))/2.\n", + " return energy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's convert the NumPy array to a Dask array. Since the entire NumPy array fits in memory it is more computationally efficient to chunk the array by number of CPU cores." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from os import cpu_count\n", + "\n", + "dcluster = da.from_array(cluster, chunks=cluster.shape[0]//cpu_count())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This step should scale quite well with number of cores. The warnings are complaining about dividing by zero, which is why we used `da.nansum` in `potential_dask`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e = potential_dask(dcluster)\n", + "%time e.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Limitations\n", + "-----------\n", + "\n", + "Dask Array does not implement the entire numpy interface. Users expecting this\n", + "will be disappointed. Notably Dask Array has the following failings:\n", + "\n", + "1. Dask does not implement all of ``np.linalg``. This has been done by a\n", + " number of excellent BLAS/LAPACK implementations and is the focus of\n", + " numerous ongoing academic research projects.\n", + "2. Dask Array does not support some operations where the resulting shape\n", + " depends on the values of the array. For those that it does support\n", + " (for example, masking one Dask Array with another boolean mask),\n", + " the chunk sizes will be unknown, which may cause issues with other\n", + " operations that need to know the chunk sizes.\n", + "3. Dask Array does not attempt operations like ``sort`` which are notoriously\n", + " difficult to do in parallel and are of somewhat diminished value on very\n", + " large data (you rarely actually need a full sort).\n", + " Often we include parallel-friendly alternatives like ``topk``.\n", + "4. Dask development is driven by immediate need, and so many lesser used\n", + " functions, like ``np.sometrue`` have not been implemented purely out of\n", + " laziness. 
These would make excellent community contributions.\n", + " \n", + "* [Array documentation](https://docs.dask.org/en/latest/array.html)\n", + "* [Array screencast](https://youtu.be/9h_61hXCDuI)\n", + "* [Array API](https://docs.dask.org/en/latest/array-api.html)\n", + "* [Array examples](https://examples.dask.org/array.html)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client.shutdown()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/dask/04_dataframe.ipynb b/dask/04_dataframe.ipynb new file mode 100644 index 0000000..2296a9b --- /dev/null +++ b/dask/04_dataframe.ipynb @@ -0,0 +1,836 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Dask\n", + "\n", + "\n", + "# Dask DataFrames\n", + "\n", + "We finished Chapter 1 by building a parallel dataframe computation over a directory of CSV files using `dask.delayed`. In this section we use `dask.dataframe` to automatically build similiar computations, for the common case of tabular computations. Dask dataframes look and feel like Pandas dataframes but they run on the same infrastructure that powers `dask.delayed`.\n", + "\n", + "In this notebook we use the same airline data as before, but now rather than write for-loops we let `dask.dataframe` construct our computations for us. The `dask.dataframe.read_csv` function can take a globstring like `\"data/nycflights/*.csv\"` and build parallel computations on all of our data at once.\n", + "\n", + "## When to use `dask.dataframe`\n", + "\n", + "Pandas is great for tabular datasets that fit in memory. Dask becomes useful when the dataset you want to analyze is larger than your machine's RAM. The demo dataset we're working with is only about 200MB, so that you can download it in a reasonable time, but `dask.dataframe` will scale to datasets much larger than memory." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "The `dask.dataframe` module implements a blocked parallel `DataFrame` object that mimics a large subset of the Pandas `DataFrame` API. One Dask `DataFrame` is comprised of many in-memory pandas `DataFrames` separated along the index. One operation on a Dask `DataFrame` triggers many pandas operations on the constituent pandas `DataFrame`s in a way that is mindful of potential parallelism and memory constraints.\n", + "\n", + "**Related Documentation**\n", + "\n", + "* [DataFrame documentation](https://docs.dask.org/en/latest/dataframe.html)\n", + "* [DataFrame screencast](https://youtu.be/AT2XtFehFSQ)\n", + "* [DataFrame API](https://docs.dask.org/en/latest/dataframe-api.html)\n", + "* [DataFrame examples](https://examples.dask.org/dataframe.html)\n", + "* [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)\n", + "\n", + "**Main Take-aways**\n", + "\n", + "1. Dask DataFrame should be familiar to Pandas users\n", + "2. 
The partitioning of dataframes is important for efficient execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run prep.py -d flights" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from dask.distributed import Client\n", + "\n", + "client = Client(n_workers=4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We create artifical data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from prep import accounts_csvs\n", + "accounts_csvs()\n", + "\n", + "import os\n", + "import dask\n", + "filename = os.path.join('data', 'accounts.*.csv')\n", + "filename" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Filename includes a glob pattern `*`, so all files in the path matching that pattern will be read into the same Dask DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import dask.dataframe as dd\n", + "df = dd.read_csv(filename)\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load and count number of rows\n", + "len(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happened here?\n", + "- Dask investigated the input path and found that there are three matching files \n", + "- a set of jobs was intelligently created for each chunk - one per original CSV file in this case\n", + "- each file was loaded into a pandas dataframe, had `len()` applied to it\n", + "- the subtotals were combined to give you the final grand total." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Real Data\n", + "\n", + "Lets try this with an extract of flights in the USA across several years. This data is specific to flights out of the three airports in the New York City area." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = dd.read_csv(os.path.join('data', 'nycflights', '*.csv'),\n", + " parse_dates={'Date': [0, 1, 2]})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the respresentation of the dataframe object contains no data - Dask has just done enough to read the start of the first file, and infer the column names and dtypes." 
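+ ,"\n",
+ "A few cheap ways to see what was inferred, without loading any data (a quick sketch):\n",
+ "\n",
+ "```python\n",
+ "df.npartitions   # number of pandas pieces (typically one or more per CSV file)\n",
+ "df.columns       # read from the header of the first file\n",
+ "df.dtypes        # inferred from a sample at the start of the data\n",
+ "```"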
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can view the start and end of the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "raises-exception" + ] + }, + "outputs": [], + "source": [ + "df.tail() # this fails" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### What just happened?\n", + "\n", + "Unlike `pandas.read_csv` which reads in the entire file before inferring datatypes, `dask.dataframe.read_csv` only reads in a sample from the beginning of the file (or first file if using a glob). These inferred datatypes are then enforced when reading all partitions.\n", + "\n", + "In this case, the datatypes inferred in the sample are incorrect. The first `n` rows have no value for `CRSElapsedTime` (which pandas infers as a `float`), and later on turn out to be strings (`object` dtype). Note that Dask gives an informative error message about the mismatch. When this happens you have a few options:\n", + "\n", + "- Specify dtypes directly using the `dtype` keyword. This is the recommended solution, as it's the least error prone (better to be explicit than implicit) and also the most performant.\n", + "- Increase the size of the `sample` keyword (in bytes)\n", + "- Use `assume_missing` to make `dask` assume that columns inferred to be `int` (which don't allow missing values) are actually floats (which do allow missing values). In our particular case this doesn't apply.\n", + "\n", + "In our case we'll use the first option and directly specify the `dtypes` of the offending columns. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = dd.read_csv(os.path.join('data', 'nycflights', '*.csv'),\n", + " parse_dates={'Date': [0, 1, 2]},\n", + " dtype={'TailNum': str,\n", + " 'CRSElapsedTime': float,\n", + " 'Cancelled': bool})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.tail() # now works" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computations with `dask.dataframe`\n", + "\n", + "We compute the maximum of the `DepDelay` column. With just pandas, we would loop over each file to find the individual maximums, then find the final maximum over all the individual maximums\n", + "\n", + "```python\n", + "maxes = []\n", + "for fn in filenames:\n", + " df = pd.read_csv(fn)\n", + " maxes.append(df.DepDelay.max())\n", + " \n", + "final_max = max(maxes)\n", + "```\n", + "\n", + "We could wrap that `pd.read_csv` with `dask.delayed` so that it runs in parallel. Regardless, we're still having to think about loops, intermediate results (one per file) and the final reduction (`max` of the intermediate maxes). This is just noise around the real task, which pandas solves with\n", + "\n", + "```python\n", + "df = pd.read_csv(filename, dtype=dtype)\n", + "df.DepDelay.max()\n", + "```\n", + "\n", + "`dask.dataframe` lets us write pandas-like code, that operates on larger than memory datasets in parallel." 
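+ ,"\n",
+ "For comparison, the `dask.delayed` wrapping mentioned above might look roughly like this (a sketch only; `filenames` stands for the list of CSV paths from the pseudo-code and is not defined in this notebook):\n",
+ "\n",
+ "```python\n",
+ "import dask\n",
+ "import pandas as pd\n",
+ "\n",
+ "@dask.delayed\n",
+ "def file_max(fn):\n",
+ "    return pd.read_csv(fn).DepDelay.max()   # one pandas task per file\n",
+ "\n",
+ "final_max = dask.delayed(max)([file_max(fn) for fn in filenames]).compute()\n",
+ "```"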
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%time df.DepDelay.max().compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This writes the delayed computation for us and then runs it. \n", + "\n", + "Some things to note:\n", + "\n", + "1. As with `dask.delayed`, we need to call `.compute()` when we're done. Up until this point everything is lazy.\n", + "2. Dask will delete intermediate results (like the full pandas dataframe for each file) as soon as possible.\n", + " - This lets us handle datasets that are larger than memory\n", + " - This means that repeated computations will have to load all of the data in each time (run the code above again, is it faster or slower than you would expect?)\n", + " \n", + "As with `Delayed` objects, you can view the underlying task graph using the `.visualize` method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# notice the parallelism\n", + "df.DepDelay.max().visualize()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercises\n", + "\n", + "In this section we do a few `dask.dataframe` computations. If you are comfortable with Pandas then these should be familiar. You will have to think about when to call `compute`.\n", + "\n", + "### 1.) How many rows are in our dataset?\n", + "\n", + "If you aren't familiar with pandas, how would you check how many records are in a list of tuples?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_exo1.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.) In total, how many non-canceled flights were taken?\n", + "\n", + "With pandas, you would use [boolean indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_exo2.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.) In total, how many non-cancelled flights were taken from each airport?\n", + "\n", + "*Hint*: use [`df.groupby`](https://pandas.pydata.org/pandas-docs/stable/groupby.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_exo3.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4.) 
What was the average departure delay from each airport?\n", + "\n", + "Note, this is the same computation you did in the previous notebook (is this approach faster or slower?)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_exo4.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 5.) What day of the week has the worst average departure delay?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_exo5.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sharing Intermediate Results\n", + "\n", + "When computing all of the above, we sometimes did the same operation more than once. For most operations, `dask.dataframe` hashes the arguments, allowing duplicate computations to be shared, and only computed once.\n", + "\n", + "For example, lets compute the mean and standard deviation for departure delay of all non-canceled flights. Since dask operations are lazy, those values aren't the final results yet. They're just the recipe required to get the result.\n", + "\n", + "If we compute them with two calls to compute, there is no sharing of intermediate computations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "non_cancelled = df[~df.Cancelled]\n", + "mean_delay = non_cancelled.DepDelay.mean()\n", + "std_delay = non_cancelled.DepDelay.std()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "mean_delay_res = mean_delay.compute()\n", + "std_delay_res = std_delay.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But let's try by passing both to a single `compute` call." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "mean_delay_res, std_delay_res = dask.compute(mean_delay, std_delay)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using `dask.compute` takes roughly 1/2 the time. This is because the task graphs for both results are merged when calling `dask.compute`, allowing shared operations to only be done once instead of twice. In particular, using `dask.compute` only does the following once:\n", + "\n", + "- the calls to `read_csv`\n", + "- the filter (`df[~df.Cancelled]`)\n", + "- some of the necessary reductions (`sum`, `count`)\n", + "\n", + "To see what the merged task graphs between multiple results look like (and what's shared), you can use the `dask.visualize` function (we might want to use `filename='graph.pdf'` to save the graph to disk so that we can zoom in more easily):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dask.visualize(mean_delay, std_delay)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How does this compare to Pandas?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pandas is more mature and fully featured than `dask.dataframe`. If your data fits in memory then you should use Pandas. The `dask.dataframe` module gives you a limited `pandas` experience when you operate on datasets that don't fit comfortably in memory.\n", + "\n", + "During this tutorial we provide a small dataset consisting of a few CSV files. This dataset is 45MB on disk that expands to about 400MB in memory. This dataset is small enough that you would normally use Pandas.\n", + "\n", + "We've chosen this size so that exercises finish quickly. Dask.dataframe only really becomes meaningful for problems significantly larger than this, when Pandas breaks with the dreaded \n", + "\n", + " MemoryError: ...\n", + " \n", + "Furthermore, the distributed scheduler allows the same dataframe expressions to be executed across a cluster. To enable massive \"big data\" processing, one could execute data ingestion functions such as `read_csv`, where the data is held on storage accessible to every worker node (e.g., amazon's S3), and because most operations begin by selecting only some columns, transforming and filtering the data, only relatively small amounts of data need to be communicated between the machines.\n", + "\n", + "Dask.dataframe operations use `pandas` operations internally. Generally they run at about the same speed except in the following two cases:\n", + "\n", + "1. Dask introduces a bit of overhead, around 1ms per task. This is usually negligible.\n", + "2. When Pandas releases the GIL `dask.dataframe` can call several pandas operations in parallel within a process, increasing speed somewhat proportional to the number of cores. For operations which don't release the GIL, multiple processes would be needed to get the same speedup." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dask DataFrame Data Model\n", + "\n", + "For the most part, a Dask DataFrame feels like a pandas DataFrame.\n", + "So far, the biggest difference we've seen is that Dask operations are lazy; they build up a task graph instead of executing immediately (more details coming in [Schedulers](05_distributed.ipynb)).\n", + "This lets Dask do operations in parallel and out of core.\n", + "\n", + "In [Dask Arrays](03_array.ipynb), we saw that a `dask.array` was composed of many NumPy arrays, chunked along one or more dimensions.\n", + "It's similar for `dask.dataframe`: a Dask DataFrame is composed of many pandas DataFrames. For `dask.dataframe` the chunking happens only along the index.\n", + "\n", + "\n", + "\n", + "We call each chunk a *partition*, and the upper / lower bounds are *divisions*.\n", + "Dask *can* store information about the divisions. 
For now, partitions come up when you write custom functions to apply to Dask DataFrames" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Converting `CRSDepTime` to a timestamp\n", + "\n", + "This dataset stores timestamps as `HHMM`, which are read in as integers in `read_csv`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "crs_dep_time = df.CRSDepTime.head(10)\n", + "crs_dep_time" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To convert these to timestamps of scheduled departure time, we need to convert these integers into `pd.Timedelta` objects, and then combine them with the `Date` column.\n", + "\n", + "In pandas we'd do this using the `pd.to_timedelta` function, and a bit of arithmetic:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# Get the first 10 dates to complement our `crs_dep_time`\n", + "date = df.Date.head(10)\n", + "\n", + "# Get hours as an integer, convert to a timedelta\n", + "hours = crs_dep_time // 100\n", + "hours_timedelta = pd.to_timedelta(hours, unit='h')\n", + "\n", + "# Get minutes as an integer, convert to a timedelta\n", + "minutes = crs_dep_time % 100\n", + "minutes_timedelta = pd.to_timedelta(minutes, unit='m')\n", + "\n", + "# Apply the timedeltas to offset the dates by the departure time\n", + "departure_timestamp = date + hours_timedelta + minutes_timedelta\n", + "departure_timestamp" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom code and Dask Dataframe\n", + "\n", + "We could swap out `pd.to_timedelta` for `dd.to_timedelta` and do the same operations on the entire dask DataFrame. But let's say that Dask hadn't implemented a `dd.to_timedelta` that works on Dask DataFrames. What would you do then?\n", + "\n", + "`dask.dataframe` provides a few methods to make applying custom functions to Dask DataFrames easier:\n", + "\n", + "- [`map_partitions`](http://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_partitions)\n", + "- [`map_overlap`](http://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.map_overlap)\n", + "- [`reduction`](http://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.reduction)\n", + "\n", + "Here we'll just be discussing `map_partitions`, which we can use to implement `to_timedelta` on our own:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Look at the docs for `map_partitions`\n", + "\n", + "help(df.CRSDepTime.map_partitions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The basic idea is to apply a function that operates on a DataFrame to each partition.\n", + "In this case, we'll apply `pd.to_timedelta`." 
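+ ,"\n",
+ "One extra keyword worth knowing about is `meta`: when Dask cannot infer what your function returns, you can describe the expected output yourself (a sketch with a hypothetical Dask series `s`):\n",
+ "\n",
+ "```python\n",
+ "s_td = s.map_partitions(pd.to_timedelta, unit='h',\n",
+ "                        meta=pd.Series(dtype='timedelta64[ns]'))  # declares the output type\n",
+ "```"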
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hours = df.CRSDepTime // 100\n", + "# hours_timedelta = pd.to_timedelta(hours, unit='h')\n", + "hours_timedelta = hours.map_partitions(pd.to_timedelta, unit='h')\n", + "\n", + "minutes = df.CRSDepTime % 100\n", + "# minutes_timedelta = pd.to_timedelta(minutes, unit='m')\n", + "minutes_timedelta = minutes.map_partitions(pd.to_timedelta, unit='m')\n", + "\n", + "departure_timestamp = df.Date + hours_timedelta + minutes_timedelta" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "departure_timestamp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "departure_timestamp.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Rewrite above to use a single call to `map_partitions`\n", + "\n", + "This will be slightly more efficient than two separate calls, as it reduces the number of tasks in the graph." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compute_departure_timestamp(df):\n", + " pass # TODO: implement this" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "raises-exception" + ] + }, + "outputs": [], + "source": [ + "departure_timestamp = df.map_partitions(compute_departure_timestamp)\n", + "\n", + "departure_timestamp.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [], + "source": [ + "%load solutions/04_map_partitions.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Limitations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### What doesn't work?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dask.dataframe only covers a small but well-used portion of the Pandas API.\n", + "This limitation is for two reasons:\n", + "\n", + "1. The Pandas API is *huge*\n", + "2. Some operations are genuinely hard to do in parallel (e.g. sort)\n", + "\n", + "Additionally, some important operations like ``set_index`` work, but are slower\n", + "than in Pandas because they include substantial shuffling of data, and may write out to disk." 
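+ ,"\n",
+ "If you do pay for a `set_index`, it is worth checking that Dask has recorded the new partition boundaries, since that is what later index-based operations exploit (a sketch):\n",
+ "\n",
+ "```python\n",
+ "df.known_divisions          # False for a freshly read CSV\n",
+ "df2 = df.set_index('Date')  # expensive: shuffles data between partitions\n",
+ "df2.known_divisions         # True: the divisions are now recorded\n",
+ "```"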
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Learn More\n", + "\n", + "\n", + "* [DataFrame documentation](https://docs.dask.org/en/latest/dataframe.html)\n", + "* [DataFrame screencast](https://youtu.be/AT2XtFehFSQ)\n", + "* [DataFrame API](https://docs.dask.org/en/latest/dataframe-api.html)\n", + "* [DataFrame examples](https://examples.dask.org/dataframe.html)\n", + "* [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client.shutdown()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/dask/05_distributed.ipynb b/dask/05_distributed.ipynb new file mode 100644 index 0000000..fcfe2b6 --- /dev/null +++ b/dask/05_distributed.ipynb @@ -0,0 +1,424 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Distributed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As we have seen so far, Dask allows you to simply construct graphs of tasks with dependencies, as well as have graphs created automatically for you using functional, Numpy or Pandas syntax on data collections. None of this would be very useful, if there weren't also a way to execute these graphs, in a parallel and memory-aware way. So far we have been calling `thing.compute()` or `dask.compute(thing)` without worrying what this entails. Now we will discuss the options available for that execution, and in particular, the distributed scheduler, which comes with additional functionality.\n", + "\n", + "Dask comes with four available schedulers:\n", + "- \"threaded\" (aka \"threading\"): a scheduler backed by a thread pool\n", + "- \"processes\": a scheduler backed by a process pool\n", + "- \"single-threaded\" (aka \"sync\"): a synchronous scheduler, good for debugging\n", + "- distributed: a distributed scheduler for executing graphs on multiple machines, see below.\n", + "\n", + "To select one of these for computation, you can specify at the time of asking for a result, e.g.,\n", + "```python\n", + "myvalue.compute(scheduler=\"single-threaded\") # for debugging\n", + "```\n", + "\n", + "You can also set a default scheduler either temporarily\n", + "```python\n", + "with dask.config.set(scheduler='processes'):\n", + " # set temporarily for this block only\n", + " # all compute calls within this block will use the specified scheduler\n", + " myvalue.compute()\n", + " anothervalue.compute()\n", + "```\n", + "\n", + "Or globally\n", + "```python\n", + "# set until further notice\n", + "dask.config.set(scheduler='processes')\n", + "```\n", + "\n", + "Let's try out a few schedulers on the familiar case of the flights data." 
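+ ,"\n",
+ "If you are ever unsure which scheduler a computation will use, you can also inspect the configuration (a small sketch; by default nothing is set and each collection falls back to its own default):\n",
+ "\n",
+ "```python\n",
+ "import dask\n",
+ "dask.config.get('scheduler', None)       # None: collections pick their own default\n",
+ "with dask.config.set(scheduler='processes'):\n",
+ "    print(dask.config.get('scheduler'))  # 'processes' inside this block only\n",
+ "```"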
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%run prep.py -d flights" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dd.Scalar" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import dask.dataframe as dd\n", + "import os\n", + "df = dd.read_csv(os.path.join('data', 'nycflights', '*.csv'),\n", + " parse_dates={'Date': [0, 1, 2]},\n", + " dtype={'TailNum': object,\n", + " 'CRSElapsedTime': float,\n", + " 'Cancelled': bool})\n", + "\n", + "# Maximum average non-cancelled delay grouped by Airport\n", + "largest_delay = df[~df.Cancelled].groupby('Origin').DepDelay.mean().max()\n", + "largest_delay" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " threading, 0.2795 s; result, 17.05 hours\n", + " processes, 1.9635 s; result, 17.05 hours\n", + " sync, 0.2051 s; result, 17.05 hours\n" + ] + } + ], + "source": [ + "# each of the following gives the same results (you can check!)\n", + "# any surprises?\n", + "import time\n", + "for sch in ['threading', 'processes', 'sync']:\n", + " t0 = time.time()\n", + " r = largest_delay.compute(scheduler=sch)\n", + " t1 = time.time()\n", + " print(f\"{sch:>10}, {t1 - t0:0.4f} s; result, {r:0.2f} hours\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Some Questions to Consider:\n", + "\n", + "- How much speedup is possible for this task (hint, look at the graph).\n", + "- Given how many cores are on this machine, how much faster could the parallel schedulers be than the single-threaded scheduler.\n", + "- How much faster was using threads over a single thread? Why does this differ from the optimal speedup?\n", + "- Why is the multiprocessing scheduler so much slower here?\n", + "\n", + "The `threaded` scheduler is a fine choice for working with large datasets out-of-core on a single machine, as long as the functions being used release the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock) most of the time. NumPy and pandas release the GIL in most places, so the `threaded` scheduler is the default for `dask.array` and `dask.dataframe`. The distributed scheduler, perhaps with `processes=False`, will also work well for these workloads on a single machine.\n", + "\n", + "For workloads that do hold the GIL, as is common with `dask.bag` and custom code wrapped with `dask.delayed`, we recommend using the distributed scheduler, even on a single machine. Generally speaking, it's more intelligent and provides better diagnostics than the `processes` scheduler.\n", + "\n", + "https://docs.dask.org/en/latest/scheduling.html provides some additional details on choosing a scheduler.\n", + "\n", + "For scaling out work across a cluster, the distributed scheduler is required." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Making a cluster" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simple method" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `dask.distributed` system is composed of a single centralized scheduler and one or more worker processes. [Deploying](https://docs.dask.org/en/latest/setup.html) a remote Dask cluster involves some additional effort. 
But doing things locally just involves creating a `Client` object, which lets you interact with the \"cluster\" (local threads or processes on your machine). For more information see [here](https://docs.dask.org/en/latest/setup/single-distributed.html). \n",
+ "\n",
+ "Note that `Client()` takes a lot of optional [arguments](https://distributed.dask.org/en/latest/local-cluster.html#api), to configure the number of processes/threads, memory limits and other settings."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "d42d8fb969e9440ea17267bfa2fc6373",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "VBox(children=(HTML(value='

LocalCluster

'), HBox(children=(HTML(value='\\n
\\n