torch.cuda.comm.reduce_add_coalesced

torch.cuda.comm.reduce_add_coalesced(inputs, destination=None, buffer_size=10485760)

Sum tensors from multiple GPUs.

Small tensors are first coalesced into a buffer to reduce the number of synchronizations.
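The idea of coalescing can be illustrated with a minimal CPU-only sketch. This is not PyTorch's actual implementation: the helper name `coalesced_sum` is hypothetical, it assumes every list holds tensors of matching shapes and a single dtype, and it ignores the `buffer_size` limit by packing everything into one buffer.

```python
import torch


def coalesced_sum(per_device):
    """Sum matching tensors across several lists using one flat buffer per list.

    Hypothetical helper for illustration only; assumes matching shapes and dtypes.
    """
    shapes = [t.shape for t in per_device[0]]
    sizes = [t.numel() for t in per_device[0]]

    # Flatten each list of small tensors into a single 1-D buffer.
    flat = [torch.cat([t.reshape(-1) for t in tensors]) for tensors in per_device]

    # One reduction over the whole buffers instead of one per small tensor.
    total = torch.stack(flat).sum(dim=0)

    # Split the summed buffer back into the original shapes.
    return tuple(c.reshape(s) for c, s in zip(total.split(sizes), shapes))
```

On real GPUs the benefit comes from issuing one transfer and one reduction per buffer rather than one per tensor; the sketch only shows the flatten, reduce, and split steps.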

Parameters
  • inputs (Iterable[Iterable[Tensor]]) – an iterable of iterables, each of which contains tensors from a single device.

  • destination (int, optional) – a device on which the output will be placed (default: current device).

  • buffer_size (int) – maximum size, in bytes, of the buffer used for coalescing (default: 10485760, i.e. 10 MiB).

Returns

A tuple of tensors containing an elementwise sum of each group of inputs, placed on the destination device.
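A usage sketch, assuming a machine with at least two CUDA devices. The wrapper `sum_across_gpus` and the tensor shapes and values are made up for illustration; the function returns `None` when fewer than two GPUs are available.

```python
import torch
import torch.cuda.comm as comm


def sum_across_gpus():
    """Sum two groups of tensors held on GPU 0 and GPU 1.

    Hypothetical wrapper for illustration; returns None without two GPUs.
    """
    if torch.cuda.device_count() < 2:
        return None

    # One inner list per device; position i in every list forms group i,
    # so tensors at the same position must have the same shape.
    inputs = [
        [torch.ones(4, device="cuda:0"), torch.full((2, 3), 2.0, device="cuda:0")],
        [torch.ones(4, device="cuda:1"), torch.full((2, 3), 5.0, device="cuda:1")],
    ]

    # destination is a device index; the sums come back as a tuple on that device.
    return comm.reduce_add_coalesced(inputs, destination=0)


if __name__ == "__main__":
    out = sum_across_gpus()
    if out is None:
        print("This sketch needs at least two CUDA devices.")
    else:
        for t in out:
            print(t.device, tuple(t.shape))
```

With these inputs the returned tuple has two tensors on device 0: the elementwise sum of the two length-4 tensors, and the elementwise sum of the two 2x3 tensors.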
