file_descriptor sharing strategy may be leaking FDs, resulting in DataLoader causing RuntimeError: received 0 items of ancdata #973

@jfsantos

Description

Editorial note: If you are having this problem, try running `torch.multiprocessing.set_sharing_strategy('file_system')` right after your import of torch.
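A minimal sketch of where to put that call (nothing beyond the standard torch import is assumed; the rest of your script stays unchanged):

```python
import torch
import torch.multiprocessing

# Switch shared-memory tensors from the default 'file_descriptor' strategy
# to 'file_system', so workers share storages via file names instead of
# passing file descriptors over sockets.
torch.multiprocessing.set_sharing_strategy('file_system')
```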


I am using a DataLoader in my code with a custom Dataset class, and it worked fine during training for several epochs. However, when testing my model, after a bit less than 1k iterations, I'm getting the following error:

RuntimeError                              Traceback (most recent call last)
/home/jfsantos/src/pytorch_models/test_model.py in <module>()
     82
     83 print('Generating samples...')
---> 84 for k, batch in tqdm(enumerate(test_loader)):
     85     f = G_test.audio_paths[k]
     86     spec, phase = spectrogram_from_file(f, window=window, step=step)

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/site-packages/tqdm/_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/utils/data/dataloader.py in __next__(self)
    166         while True:
    167             assert (not self.shutdown and self.batches_outstanding > 0)
--> 168             idx, batch = self.data_queue.get()
    169             self.batches_outstanding -= 1
    170             if idx != self.rcvd_idx:

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/queues.py in get(self)
    343             res = self._reader.recv_bytes()
    344         # unserialize the data after having released the lock
--> 345         return ForkingPickler.loads(res)
    346
    347     def put(self, obj):

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/multiprocessing/reductions.py in rebuild_storage_fd(cls, df, size)
     68         fd = multiprocessing.reduction.rebuild_handle(df)
     69     else:
---> 70         fd = df.detach()
     71     try:
     72         storage = storage_from_cache(cls, fd_id(fd))

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/resource_sharer.py in detach(self)
     56             '''Get the fd.  This should only be called once.'''
     57             with _resource_sharer.get_connection(self._id) as conn:
---> 58                 return reduction.recv_handle(conn)
     59
     60

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/reduction.py in recv_handle(conn)
    179         '''Receive a handle over a local connection.'''
    180         with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
--> 181             return recvfds(s, 1)[0]
    182
    183     def DupFd(fd):

/home/jfsantos/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/reduction.py in recvfds(sock, size)
    158             if len(ancdata) != 1:
    159                 raise RuntimeError('received %d items of ancdata' %
--> 160                                    len(ancdata))
    161             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
    162             if (cmsg_level == socket.SOL_SOCKET and

RuntimeError: received 0 items of ancdata

However, if I just do `idxs = [k for k, batch in tqdm(enumerate(test_loader))]` I do not have this issue.

I don't really know how to debug this, as my knowledge of how PyTorch handles data loading internally is currently very limited, but I could help dig into it given some instructions. Does anyone have an idea of where I could start?
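One possible starting point (my assumption, not something confirmed in this report): since the title suggests the file_descriptor sharing strategy may be leaking FDs, you could log how many file descriptors the main process has open while iterating the loader and see whether the count climbs toward the ulimit. A Linux-only sketch, reusing the `test_loader` from the script above:

```python
import os
import resource

def open_fd_count():
    # /proc/self/fd has one entry per file descriptor currently open
    # in this process (Linux only).
    return len(os.listdir('/proc/self/fd'))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('FD limit (soft/hard): %d / %d' % (soft, hard))

for k, batch in enumerate(test_loader):
    if k % 100 == 0:
        print('iteration %d, open FDs: %d' % (k, open_fd_count()))
```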


Metadata

Labels: has workaround, high priority, module: crash, module: dataloader, triaged
