Understanding parallelism in Keras

Frédéric Branchaud-Charron edited this page Jun 29, 2018 · 1 revision

Keras uses parallelism extensively in its codebase. Most tasks are embarrassingly parallel: data loading, data augmentation, and other preprocessing techniques are all easily parallelized.

Feeding process

To feed a Model, Keras takes a batch of inputs from an infinite generator and feeds it to the Model.

# Simplified example of the feeding loop
for batch in infinite_gen:
    model.train_on_batch(*batch)  # batch is typically an (inputs, targets) tuple

The generator can be anything, but two types are currently supported: plain Python generators and keras.utils.Sequence objects.

Python generator

A generator is an object that lets you iterate over the values a function yields. Keras requires these generators to be infinite. Example:

while True:
    for batch in my_batches:
        yield batch

To manipulate those generators, we use a keras.utils.GeneratorEnqueuer.

A GeneratorEnqueuer creates a pool of processes and sends the generator object to each process. Generator objects are NOT shareable between processes, so every process gets its own copy of the original; as a result, the same batches are produced multiple times.

The enqueuer then asks each process for a new input. The results are stored inside a queue, which Keras pops to get new batches.

Sequences

A keras.utils.Sequence is a safer way to feed a Model. To manipulate Sequences, we use a keras.utils.OrderedEnqueuer. This enqueuer works much like the GeneratorEnqueuer, but it asks each process for a specific index. This is feasible because a keras.utils.Sequence is indexable, so duplicates are avoided.
