Releases: NVIDIA-Merlin/distributed-embeddings

v23.06.00 (v1.0)

13 Jul 06:14

What’s Changed

New Features

  • Added support for row-slicing as a parallel strategy
  • Added support for data-parallel as a parallel strategy
  • Allow mixing and matching the data-parallel, table-parallel, row-slicing, and column-slicing strategies. Refer to the User Guide for more details.
  • Added an IntegerLookup layer that supports on-the-fly vocabulary building on both CPU and GPU
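The difference between the two slicing strategies above can be sketched in plain Python. This is an illustrative stand-in, not the library's API: an embedding table of shape `(vocab_size, embedding_dim)` is partitioned across workers either by rows (ID ranges) or by columns (embedding width).

```python
# Illustrative sketch of the two slicing strategies (hypothetical helpers,
# not distributed-embeddings code). Each worker's share is described by
# the index ranges it owns.

def row_slice(vocab_size, embedding_dim, num_workers):
    """Row-slicing: each worker owns a contiguous range of rows (IDs)."""
    rows_per_worker = -(-vocab_size // num_workers)  # ceiling division
    return [
        (min(w * rows_per_worker, vocab_size),        # first owned row
         min((w + 1) * rows_per_worker, vocab_size),  # one past last row
         embedding_dim)                               # full width kept
        for w in range(num_workers)
    ]

def column_slice(vocab_size, embedding_dim, num_workers):
    """Column-slicing: each worker owns a slice of the embedding width."""
    cols_per_worker = -(-embedding_dim // num_workers)
    return [
        (vocab_size,                                   # all rows kept
         min(w * cols_per_worker, embedding_dim),      # first owned column
         min((w + 1) * cols_per_worker, embedding_dim))
        for w in range(num_workers)
    ]

# 4 workers: row-slicing gives each 250 rows x 16 cols,
# column-slicing gives each 1000 rows x 4 cols.
print(row_slice(1000, 16, 4))
print(column_slice(1000, 16, 4))
```

Row-slicing shards the lookup by ID range (each worker resolves only the IDs it owns), while column-slicing shards the embedding vector itself, so every worker participates in every lookup.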

Breaking Changes

  • Added NVIDIA cuCollections as a submodule for GPU hash map support
  • Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.9 and earlier.

Improvements

  • Improved package import

Bug Fixes

  • Fixed an input offset overflow caused by automatic table concatenation
  • Fixed potential graph-mismatch problems in broadcast
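The offset-overflow fix is easiest to see with a sketch. This is an assumed, simplified model of the concatenation mechanic (not the library's code): when tables of the same width are concatenated into one, lookups into table *i* are shifted by the total vocabulary size of tables 0..*i-1*, and if that running total is accumulated in a 32-bit integer it can overflow for large tables.

```python
# Hedged sketch (hypothetical helper): index offsets for concatenated
# embedding tables, and why a 32-bit accumulator can overflow.

INT32_MAX = 2**31 - 1

def concat_offsets(vocab_sizes):
    """Offset added to each table's inputs after concatenation."""
    offsets, total = [], 0
    for v in vocab_sizes:
        offsets.append(total)  # inputs to this table shift by 'total'
        total += v
    return offsets, total

# Two large tables: the combined index space exceeds int32 range.
vocab_sizes = [1_500_000_000, 1_500_000_000]
offsets, total = concat_offsets(vocab_sizes)
print(offsets)            # second table's inputs shift by 1.5 billion
print(total > INT32_MAX)  # True: shifted indices need a wider dtype
```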

Full Changelog: v23.03.00...v23.06.00

v23.03.00

19 Apr 13:13

What’s Changed

New Features

  • Added support for the NVIDIA Hopper™ architecture family (compute capability 9.0).
  • Added support for the Keras Model.fit API.
  • Added support for Horovod callbacks in the hybrid data/model-parallel case.

Breaking Changes

  • Now supports TensorFlow 2.12. Note that this change breaks the build with TF 2.9 and earlier.
  • Now requires Horovod 0.27 or later.

Improvements

  • Improved unit tests

Bug Fixes

  • Use tf.shape for graph mode support by @edknv in #6

New Contributors

  • @edknv made their first contribution in #6

Full Changelog: v0.3...v23.03.00

v0.3

13 Feb 06:13

What’s Changed

New Features

  • CUDA 12 support
  • Automatic concatenation of multiple embedding tables for greatly improved speed
  • Support for model parallelism with user-defined custom Keras layers through the DistributedEmbedding wrapper

Improvements

  • Support cases where the number of workers is greater than the number of tables.
  • In corner cases where different slices of a table are placed on the same worker, they are now merged into a single slice.
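The slice-merging improvement above can be sketched in a few lines. This is an assumed, simplified model of the behavior (not the library's placement code): if placement assigns two adjacent row ranges of the same table to one worker, the worker holds one contiguous slice instead of two separate allocations.

```python
# Sketch (assumed behavior, simplified): merge adjacent or overlapping
# (start, end) row ranges of one table assigned to the same worker.

def merge_slices(slices):
    """Collapse touching row ranges into contiguous slices."""
    merged = []
    for start, end in sorted(slices):
        if merged and start <= merged[-1][1]:
            # This range touches the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two adjacent slices of the same table on one worker become one slice.
print(merge_slices([(0, 250), (250, 500)]))  # [(0, 500)]
```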

Breaking Changes

  • Moved submodule from CUB to NVIDIA Thrust for better compatibility

Bug Fixes

  • Better error handling in set_weight() when weights are not initialized
  • Better error handling when the global batch size is not divisible by the number of workers
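The divisibility requirement behind the second fix is simple to state in code. A minimal sketch, with illustrative names (not the library's API): under data parallelism the global batch is split evenly across workers, so it must be divisible by the worker count.

```python
# Minimal sketch of the divisibility check (hypothetical helper).

def per_worker_batch(global_batch_size, num_workers):
    """Split a global batch evenly across workers, or fail loudly."""
    if global_batch_size % num_workers != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by {num_workers} workers")
    return global_batch_size // num_workers

print(per_worker_batch(1024, 8))  # 128
```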

Full Changelog: v0.2...v0.3

v0.2

09 Feb 08:06

What’s Changed

New Features

  • SparseTensor is now supported as embedding input, in addition to dense and ragged tensors.
  • Added support and an example for the Keras model.fit() API through a custom train_step() function
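For readers unfamiliar with the sparse input format, a plain-Python sketch (no TensorFlow, purely illustrative): a SparseTensor-style input is a list of (row, col) coordinates plus values, and for an embedding lookup what matters is the category IDs (the values) grouped by batch row, much like a ragged tensor.

```python
# Sketch (hypothetical helper, not TF code): grouping SparseTensor-style
# values by batch row, mimicking a SparseTensor -> RaggedTensor conversion.

def sparse_to_ragged(indices, values, num_rows):
    """Group sparse values by their row coordinate."""
    rows = [[] for _ in range(num_rows)]
    for (r, _c), v in zip(indices, values):
        rows[r].append(v)
    return rows

# Batch of 3 samples with multi-hot category IDs; sample 1 is empty.
indices = [(0, 0), (0, 1), (2, 0)]
values = [7, 3, 9]
print(sparse_to_ragged(indices, values, 3))  # [[7, 3], [], [9]]
```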

Improvements

  • Improved embedding lookup speed when the input is multi-hot with a combiner.
  • Improved embedding lookup speed when the input is one-hot, regardless of its combiner and format (Tensor, SparseTensor, or RaggedTensor).
  • Added data-parallel input, CPU embedding, and the TF-native embedding API as options in the benchmark.
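What a combiner does in a multi-hot lookup can be shown with a pure-Python stand-in (the library performs this on GPU; names here are illustrative): each sample's list of IDs is looked up in the table and the resulting vectors are reduced, e.g. summed or averaged.

```python
# Illustrative sketch of an embedding lookup with a combiner
# (plain Python stand-in, not the library's implementation).

def embedding_lookup(table, ragged_ids, combiner="sum"):
    """Look up each sample's IDs and reduce the vectors per sample."""
    out = []
    for ids in ragged_ids:
        vecs = [table[i] for i in ids]
        combined = [sum(col) for col in zip(*vecs)]  # element-wise sum
        if combiner == "mean" and ids:
            combined = [x / len(ids) for x in combined]
        out.append(combined)
    return out

table = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
# Multi-hot sample [0, 2] sums two vectors; one-hot sample [1] passes through.
print(embedding_lookup(table, [[0, 2], [1]]))  # [[6.0, 8.0], [3.0, 4.0]]
```

A one-hot input is the degenerate case where every sample has exactly one ID, which is why it can be fast regardless of the combiner: there is nothing to reduce.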

Bug Fixes

  • Fixed the build with TensorFlow 2.10+
  • Fixed a bug where the batch dimension could be None at an early stage in graph mode

Full Changelog: v0.1...v0.2

v0.1

09 Feb 07:40
Pre-release

Initial release
