Understanding parallelism in Keras

Frédéric Branchaud-Charron edited this page Jun 29, 2018 · 1 revision

Keras uses parallelism extensively in its codebase. Most tasks are embarrassingly parallel: data loading, data augmentation, and other preprocessing techniques are all easily parallelized.

Feeding process

To feed a Model, Keras takes a batch of inputs from an infinite generator and feeds it to the Model.

# Simplified example of the feeding loop
for batch in infinite_gen:
    model.train_on_batch(*batch)  # batch is typically an (inputs, targets) tuple

The generator can be anything, but two types are currently supported: plain Python generators and keras.utils.Sequence objects.

Python generator

A generator is an object that lets you iterate over the values a function yields. Keras requires these generators to be infinite. Example:

while True:
    for batch in my_batches:
        yield batch

To manipulate those generators, we use a keras.utils.GeneratorEnqueuer.

A GeneratorEnqueuer creates a pool of processes and sends the generator object to each process. Generator objects are NOT shareable between processes, so every process gets its own copy of the original; as a result, the same batches are produced multiple times.

The enqueuer then asks each process for a new input. The results are stored inside a queue, which Keras pops to get new batches.

Sequences

A keras.utils.Sequence is a safer way to feed a Model. To manipulate Sequences, we use a keras.utils.OrderedEnqueuer. This enqueuer works much like the GeneratorEnqueuer, but it asks each process for a specific index. This is feasible because a keras.utils.Sequence is indexable, so duplicates are avoided.
