"on fit_generator() / fit() and thread-safety" Code Answer


During my research on this, I came across some information that answers my questions.

Note: as updated in the question, fit_generator() is deprecated in newer TensorFlow/Keras versions (TF > 2). Instead, it is recommended to pass the generator to fit(). The answer below applies to fit() with a generator as well.


1. Does Keras emit this warning only because the generator does not inherit from Sequence, or does Keras also check whether a generator is thread-safe in general?

In Keras' Git repository (training_generators.py), lines 46-52, I found the following:

use_sequence_api = is_sequence(generator)
if not use_sequence_api and use_multiprocessing and workers > 1:
    warnings.warn(
        UserWarning('Using a generator with `use_multiprocessing=True`'
                    ' and multiple workers may duplicate your data.'
                    ' Please consider using the `keras.utils.Sequence'
                    ' class.'))

The definition of is_sequence(), taken from training_utils.py, lines 624-635, is:

def is_sequence(seq):
    """Determine if an object follows the Sequence API.
    # Arguments
        seq: a possible Sequence object
    # Returns
        boolean, whether the object follows the Sequence API.
    """
    # TODO Dref360: Decide which pattern to follow. First needs a new TF Version.
    return (getattr(seq, 'use_sequence_api', False)
            or set(dir(Sequence())).issubset(set(dir(seq) + ['use_sequence_api'])))

Judging from this code, Keras only checks whether the passed generator is a Keras Sequence (or rather, whether it follows Keras' Sequence API). It does not check whether a generator is thread-safe in general.
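
To illustrate how that check behaves, here is a minimal sketch that runs the same is_sequence() logic against a simplified stand-in for keras.utils.Sequence. The stand-in class, plain_generator and Batches are assumptions made for this example so that it runs without Keras installed; only the body of is_sequence() is taken from the snippet above.

```python
class Sequence:
    """Simplified stand-in for keras.utils.Sequence (assumption, not the real class)."""
    use_sequence_api = True

    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def on_epoch_end(self):
        pass


def is_sequence(seq):
    # Same logic as the Keras helper quoted above.
    return (getattr(seq, 'use_sequence_api', False)
            or set(dir(Sequence())).issubset(set(dir(seq) + ['use_sequence_api'])))


def plain_generator():
    # An ordinary Python generator: no __getitem__, __len__ or on_epoch_end.
    while True:
        yield 0


class Batches(Sequence):
    # Hypothetical Sequence subclass with two batches.
    def __len__(self):
        return 2

    def __getitem__(self, index):
        return index


generator_result = is_sequence(plain_generator())  # generator lacks the Sequence API
sequence_result = is_sequence(Batches())           # subclass follows the Sequence API
print(generator_result, sequence_result)
```

Nothing in the check inspects locks or any other synchronization, so a generator wrapped in a lock would be treated exactly like plain_generator() here.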


2. Is the approach I chose as thread-safe as the GeneratorClass(Sequence) version from the Keras docs?

As Omer Zohar has shown on GitHub, his decorator is thread-safe, and I don't see any reason why it shouldn't be just as thread-safe for Keras (even though Keras will still warn, as shown in 1.). The implementation of threading.Lock() can be considered thread-safe according to the docs:

A factory function that returns a new primitive lock object. Once a thread has acquired it, subsequent attempts to acquire it block, until it is released; any thread may release it.
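
For reference, a lock-based wrapper of the kind discussed here can be sketched as follows. This is a minimal sketch and not necessarily the exact decorator from the GitHub post; the names ThreadSafeIterator, threadsafe_generator, count_up_to and consume are hypothetical and chosen for this example.

```python
import functools
import threading


class ThreadSafeIterator:
    """Wrap an iterator so that only one thread at a time can advance it."""

    def __init__(self, iterator):
        self._iterator = iterator
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock serializes calls to next() on the wrapped generator.
        with self._lock:
            return next(self._iterator)


def threadsafe_generator(func):
    """Decorator: make a generator function return a lock-protected iterator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(func(*args, **kwargs))
    return wrapper


@threadsafe_generator
def count_up_to(n):
    for i in range(n):
        yield i


def consume(iterator, sink):
    # Each worker thread pulls items from the shared iterator.
    for item in iterator:
        sink.append(item)


shared = count_up_to(1000)
results = []
threads = [threading.Thread(target=consume, args=(shared, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads calling next() on the same generator at the same moment can raise "ValueError: generator already executing". With it, every value is handed out exactly once across the four threads.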

The data yielded by the generator is also picklable, which can be tested like this (see this SO Q&A for further information):

import pickle

# Dump each yielded item to check that it is picklable
with open("test.pickle", "wb") as outfile:
    for yielded_data in generator(data):
        pickle.dump(yielded_data, outfile, protocol=pickle.HIGHEST_PROTOCOL)

To sum up, I would even suggest using threading.Lock() when you extend Keras' Sequence, like this:

import threading

import numpy as np
from keras.utils import Sequence  # or tensorflow.keras.utils in TF 2

class GeneratorClass(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.lock = threading.Lock()   # one lock per instance

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        with self.lock:                # only one thread builds a batch at a time
            batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

            return ...

Edit 24/04/2020:

By using self.lock = threading.Lock() you might run into the following error:

TypeError: can't pickle _thread.lock objects

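If this happens, try replacing "with self.lock:" inside __getitem__ with "with threading.Lock():" and comment out or delete the "self.lock = threading.Lock()" line in __init__. Keep in mind that a lock created inside __getitem__ is new on every call, so this avoids the pickling error but concurrent calls no longer wait for each other.

The problem seems to be that a lock object stored as an instance attribute cannot be pickled, which becomes necessary when the instance is handed to worker processes (see for example this Q&A).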
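
One common Python pattern for keeping a shared lock on the instance while still allowing pickling is to leave the lock out of the pickled state and recreate it on unpickling. The sketch below is an assumption about how this could be applied, not something taken from the Keras docs; LockedBatches is a hypothetical stand-in for a Sequence subclass so that the example runs without Keras.

```python
import pickle
import threading


class LockedBatches:
    """Hypothetical stand-in for a Sequence subclass that holds a lock."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.lock = threading.Lock()

    def __getitem__(self, idx):
        with self.lock:
            start = idx * self.batch_size
            return self.data[start:start + self.batch_size]

    def __getstate__(self):
        # Copy the instance state and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the state and create a fresh lock for the new copy.
        self.__dict__.update(state)
        self.lock = threading.Lock()


original = LockedBatches(list(range(10)), batch_size=4)
restored = pickle.loads(pickle.dumps(original))  # no TypeError raised
second_batch = restored[1]
```

Each unpickled copy gets its own lock, so this only synchronizes threads that share the same copy within one process; it does not coordinate separate worker processes.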


3. Are there any other approaches, different from these two examples, that lead to a thread-safe generator Keras can deal with?

During my research I did not encounter any other method. Of course, I cannot say this with 100% certainty.

By lander16 on October 14 2022
