Changelog History
v2.4.0-rc4 Changes
December 04, 2020

Release 2.4.0
Major Features and Improvements
- `tf.distribute` introduces experimental support for asynchronous training of Keras models via the `tf.distribute.experimental.ParameterServerStrategy` API. Please see below for additional details.
- `MultiWorkerMirroredStrategy` is now a stable API and is no longer considered experimental. Some of the major improvements involve handling peer failure and many bug fixes. Please check out the detailed tutorial on Multi-worker training with Keras.
- Introduces experimental support for a new module named `tf.experimental.numpy`, which is a NumPy-compatible API for writing TF programs. See the detailed guide to learn more. Additional details below.
- Adds support for TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs and is enabled by default.
- A major refactoring of the internals of the Keras Functional API has been completed, which should improve the reliability, stability, and performance of constructing Functional models.
- The Keras mixed precision API `tf.keras.mixed_precision` is no longer experimental and allows the use of 16-bit floating point formats during training, improving performance by up to 3x on GPUs and 60% on TPUs. Please see below for additional details.
- TensorFlow Profiler now supports profiling `MultiWorkerMirroredStrategy` and tracing multiple workers using the sampling mode API.
- TFLite Profiler for Android is available. See the detailed guide to learn more.
- TensorFlow pip packages are now built with CUDA 11 and cuDNN 8.0.2.
Breaking Changes
TF Core:
- Certain float32 ops run in lower precision on Ampere based GPUs, including matmuls and convolutions, due to the use of TensorFloat-32. Specifically, inputs to such ops are rounded from 23 bits of precision to 10 bits of precision. This is unlikely to cause issues in practice for deep learning models. In some cases, TensorFloat-32 is also used for complex64 ops. TensorFloat-32 can be disabled by running `tf.config.experimental.enable_tensor_float_32_execution(False)`.
- The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of `tensorflow::tstring`/`TF_TString`s.
- C-API functions `TF_StringDecode`, `TF_StringEncode`, and `TF_StringEncodedSize` are no longer relevant and have been removed; see `core/platform/ctstring.h` for string access/modification in C.
- `tensorflow.python`, `tensorflow.core` and `tensorflow.compiler` modules are now hidden. These modules are not part of the TensorFlow public API.
- `tf.raw_ops.Max` and `tf.raw_ops.Min` no longer accept inputs of type `tf.complex64` or `tf.complex128`, because the behavior of these ops is not well defined for complex types.
- XLA:CPU and XLA:GPU devices are no longer registered by default. Use `TF_XLA_FLAGS=--tf_xla_enable_xla_devices` if you really need them, but this flag will eventually be removed in subsequent releases.

tf.keras:

- The `steps_per_execution` argument in `compile()` is no longer experimental; if you were passing `experimental_steps_per_execution`, rename it to `steps_per_execution` in your code. This argument controls the number of batches to run during each `tf.function` call when calling `fit()`. Running multiple batches inside a single `tf.function` call can greatly improve performance on TPUs or small models with a large Python overhead.
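  A minimal sketch of the rename (the model and settings are illustrative):

  ```python
  import tensorflow as tf

  model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
  # Previously: compile(..., experimental_steps_per_execution=50)
  model.compile(optimizer="sgd", loss="mse", steps_per_execution=50)
  # fit() will now run 50 batches inside each tf.function call.
  ```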
- A major refactoring of the internals of the Keras Functional API may affect code that relies on certain internal details:
  - Code that uses `isinstance(x, tf.Tensor)` instead of `tf.is_tensor` when checking Keras symbolic inputs/outputs should switch to using `tf.is_tensor`.
  - Code that is overly dependent on the exact names attached to symbolic tensors (e.g. assumes there will be ":0" at the end of the inputs, treats names as unique identifiers instead of using `tensor.ref()`, etc.)
  - Code that uses `get_concrete_function` to trace Keras symbolic inputs directly should switch to building matching `tf.TensorSpec`s directly and tracing the `TensorSpec` objects.
  - Code that relies on the exact number and names of the op layers that TensorFlow operations were converted into may have changed.
  - Code that uses `tf.map_fn`/`tf.cond`/`tf.while_loop`/control flow as op layers and happens to work before TF 2.4. These will explicitly be unsupported now. Converting these ops to Functional API op layers was unreliable before TF 2.4, and prone to erroring incomprehensibly or being silently buggy.
  - Code that directly asserts on a Keras symbolic value in cases where ops like `tf.rank` used to return a static or symbolic value depending on whether the input had a fully static shape. Now these ops always return symbolic values.
  - Code already susceptible to leaking tensors outside of graphs becomes slightly more likely to do so now.
  - Code that tries directly getting gradients with respect to symbolic Keras inputs/outputs. Use `GradientTape` on the actual Tensors passed to the already-constructed model instead.
  - Code that requires very tricky shape manipulation via converted op layers in order to work, where the Keras symbolic shape inference proves insufficient.
  - Code that tries manually walking a `tf.keras.Model` layer by layer and assumes layers only ever have one positional argument. This assumption doesn't hold true before TF 2.4 either, but is more likely to cause issues now.
  - Code that manually enters `keras.backend.get_graph()` before building a functional model is no longer needed.
- Input shape assumptions are now enforced when calling Functional API Keras models. This may break some users, in case there is a mismatch between the shape used when creating `Input` objects in a Functional model and the shape of the data passed to that model. You can fix this mismatch by either calling the model with correctly-shaped data, or by relaxing `Input` shape assumptions (note that you can pass shapes with `None` entries for axes that are meant to be dynamic). You can also disable the input checking entirely by setting `model.input_spec = None`.
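  For instance, a sketch of declaring a dynamic axis so variable-length inputs pass the new check (layer sizes are illustrative):

  ```python
  import tensorflow as tf

  # None marks an axis as dynamic, so the new input check accepts any length.
  inputs = tf.keras.Input(shape=(None, 8))
  outputs = tf.keras.layers.Dense(4)(inputs)
  model = tf.keras.Model(inputs, outputs)

  model(tf.zeros([2, 10, 8]))  # OK
  model(tf.zeros([2, 37, 8]))  # also OK, thanks to the dynamic axis
  ```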
- Several changes have been made to `tf.keras.mixed_precision.experimental`. Note that it is now recommended to use the non-experimental `tf.keras.mixed_precision` API.
  - `AutoCastVariable.dtype` now refers to the actual variable dtype, not the dtype it will be cast to.
  - When mixed precision is enabled, `tf.keras.layers.Embedding` now outputs a float16 or bfloat16 tensor instead of a float32 tensor.
  - The property `tf.keras.mixed_precision.experimental.LossScaleOptimizer.loss_scale` is now a tensor, not a `LossScale` object. This means that to get the loss scale of a `LossScaleOptimizer` as a tensor, you must now call `opt.loss_scale` instead of `opt.loss_scale()`.
  - The property `should_cast_variables` has been removed from `tf.keras.mixed_precision.experimental.Policy`.
  - When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the `DynamicLossScale`'s multiplier must be 2.
  - When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the weights of the `DynamicLossScale` are copied into the `LossScaleOptimizer` instead of being reused. This means modifying the weights of the `DynamicLossScale` will no longer affect the weights of the `LossScaleOptimizer`, and vice versa.
  - The global policy can no longer be set to a non-floating point policy in `tf.keras.mixed_precision.experimental.set_policy`.
  - In `Layer.call`, `AutoCastVariable`s will no longer be cast within `MirroredStrategy.run` or `ReplicaContext.merge_call`. This is because a thread-local variable is used to determine whether `AutoCastVariable`s are cast, and those two functions run on a different thread. Note this only applies if one of these two functions is called within `Layer.call`; if one of those two functions calls `Layer.call`, `AutoCastVariable`s will still be cast.
tf.data:

- `tf.data.experimental.service.DispatchServer` now takes a config tuple instead of individual arguments. Usages should be updated to `tf.data.experimental.service.DispatchServer(dispatcher_config)`.
- `tf.data.experimental.service.WorkerServer` now takes a config tuple instead of individual arguments. Usages should be updated to `tf.data.experimental.service.WorkerServer(worker_config)`.

tf.distribute:

- Removes `tf.distribute.Strategy.experimental_make_numpy_dataset`. Please use `tf.data.Dataset.from_tensor_slices` instead.
- Renames `experimental_hints` in `tf.distribute.StrategyExtended.reduce_to`, `tf.distribute.StrategyExtended.batch_reduce_to`, and `tf.distribute.ReplicaContext.all_reduce` to `options`.
- Renames `tf.distribute.experimental.CollectiveHints` to `tf.distribute.experimental.CommunicationOptions`.
- Renames `tf.distribute.experimental.CollectiveCommunication` to `tf.distribute.experimental.CommunicationImplementation`.
- Renames `tf.distribute.Strategy.experimental_distribute_datasets_from_function` to `distribute_datasets_from_function` as it is no longer experimental.
- Removes the `tf.distribute.Strategy.experimental_run_v2` method, which was deprecated in TF 2.2.

tf.lite:

- `tf.quantization.quantize_and_dequantize_v2` has been introduced, which updates the gradient definition for quantization that is outside the range to be 0. To simulate the V1 behavior of `tf.quantization.quantize_and_dequantize(...)`, use `tf.grad_pass_through(tf.quantization.quantize_and_dequantize_v2)(...)`.
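A sketch of the migration described in the tf.lite item above; the tensors and ranges are illustrative:

```python
import tensorflow as tf

x = tf.constant([-1.5, 0.3, 2.0])

# V2 op: gradient is defined to be 0 outside [input_min, input_max].
y = tf.quantization.quantize_and_dequantize_v2(x, input_min=-1.0, input_max=1.0)

# To keep the V1 gradient behavior, pass gradients straight through:
qdq_v1_like = tf.grad_pass_through(
    lambda t: tf.quantization.quantize_and_dequantize_v2(
        t, input_min=-1.0, input_max=1.0))
z = qdq_v1_like(x)
```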
Bug Fixes and Other Changes
TF Core:
- Introduces experimental support for a new module named `tf.experimental.numpy`, which is a NumPy-compatible API for writing TF programs. This module provides class `ndarray`, which mimics the `ndarray` class in NumPy, and wraps an immutable `tf.Tensor` under the hood. A subset of NumPy functions (e.g. `numpy.add`) are provided. Their interoperation with TF facilities is seamless in most cases. See tensorflow/python/ops/numpy_ops/README.md for details of which operations are supported and how they differ from NumPy.
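  A small usage sketch (the shapes and values are illustrative):

  ```python
  import tensorflow as tf
  import tensorflow.experimental.numpy as tnp

  x = tnp.ones([2, 3])          # ndarray wrapping an immutable tf.Tensor
  y = tnp.add(x, 1.5)           # NumPy-style function
  print(y.shape)                # (2, 3)
  print(tf.reduce_sum(y))       # interoperates with regular TF ops
  ```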
- `tf.types.experimental.TensorLike` is a new `Union` type that can be used as a type annotation for variables representing a Tensor or a value that can be converted to a Tensor by `tf.convert_to_tensor`.
- Calling ops with Python constants or NumPy values is now consistent with `tf.convert_to_tensor` behavior. This avoids operations like `tf.reshape` truncating inputs such as from int64 to int32.
- Adds `tf.sparse.map_values` to apply a function to the `.values` of `SparseTensor` arguments.
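  A usage sketch (values are illustrative):

  ```python
  import tensorflow as tf

  st = tf.sparse.from_dense([[1, 0, 2], [0, 3, 0]])
  # The function is applied only to the stored values, preserving sparsity.
  doubled = tf.sparse.map_values(tf.multiply, st, 2)
  print(tf.sparse.to_dense(doubled))  # [[2 0 4], [0 6 0]]
  ```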
- The Python bitwise operators for `Tensor` (`__and__`, `__or__`, `__xor__`, and `__invert__`) now support non-`bool` arguments and apply the corresponding bitwise ops. `bool` arguments continue to be supported and dispatch to logical ops. This brings them more in line with Python and NumPy behavior.
- Adds `tf.SparseTensor.with_values`. This returns a new `SparseTensor` with the same sparsity pattern, but with newly provided values. It is similar to the `with_values` function of `RaggedTensor`.
- Adds the `StatelessCase` op, and uses it if none of the case branches has stateful ops.
- Adds `tf.config.experimental.get_memory_usage` to return the total memory usage of the device.
- Adds gradients for `RaggedTensorToVariant` and `RaggedTensorFromVariant`.
- Improves shape inference of nested function calls by supporting constant folding across Arg nodes, which makes more static values available to shape inference functions.
tf.debugging:

- `tf.debugging.assert_shapes()` now works on `SparseTensor`s (fixes #36268).
GPU:

- Adds support for TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs which causes certain float32 ops, such as matrix multiplications and convolutions, to run much faster on Ampere GPUs but with reduced precision. This reduced precision has not been found to affect the convergence quality of deep learning models in practice. TensorFloat-32 is enabled by default, but can be disabled with `tf.config.experimental.enable_tensor_float_32_execution`.
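A sketch of disabling TF32, using the API named above:

```python
import tensorflow as tf

# TF32 is on by default on Ampere GPUs; disable it to keep full float32 precision.
tf.config.experimental.enable_tensor_float_32_execution(False)
print(tf.config.experimental.tensor_float_32_execution_enabled())  # False
```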
tf.math:

- Adds `tf.math.erfcinv`, the inverse of `tf.math.erfc`.

tf.nn:

- `tf.nn.max_pool2d` now supports explicit padding.
tf.image:

- Adds deterministic `tf.image.stateless_random_*` functions for each `tf.image.random_*` function (see the sketch after this list). Adds a new op `stateless_sample_distorted_bounding_box`, which is a deterministic version of the `sample_distorted_bounding_box` op. Given the same seed, these stateless functions/ops produce the same results independent of how many times the function is called, and independent of global seed settings.
- Adds deterministic `tf.image.resize` backprop CUDA kernels for `method=ResizeMethod.BILINEAR` (the default method). Enable by setting the environment variable `TF_DETERMINISTIC_OPS` to `"true"` or `"1"`.
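A sketch of the determinism guarantee, assuming `stateless_random_brightness` as one of the generated stateless variants:

```python
import tensorflow as tf

image = tf.zeros([64, 64, 3])
seed = (1, 2)  # stateless ops take an explicit pair of seed values

# Same seed -> same result, independent of call count and global seeds.
a = tf.image.stateless_random_brightness(image, max_delta=0.2, seed=seed)
b = tf.image.stateless_random_brightness(image, max_delta=0.2, seed=seed)
assert tf.reduce_all(tf.equal(a, b))
```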
tf.print:

- Fixes a bug in `tf.print()` with `OrderedDict`: if an `OrderedDict` didn't have its keys sorted, the keys and values were not printed in accordance with their correct mapping.
tf.train.Checkpoint:

- Now accepts a `root` argument in the initialization, which generates a checkpoint with a root object. This allows users to create a `Checkpoint` object that is compatible with Keras `model.save_weights()` and `model.load_weights()`. The checkpoint is also compatible with the checkpoint saved in the `variables/` folder in the SavedModel.
- When restoring, `save_path` can be a path to a SavedModel. The function will automatically find the checkpoint in the SavedModel.
tf.data:

- Adds new `tf.data.experimental.service.register_dataset` and `tf.data.experimental.service.from_dataset_id` APIs to enable one process to register a dataset with the tf.data service, and another process to consume data from the dataset.
- Adds support for dispatcher fault tolerance. To enable fault tolerance, configure a `work_dir` when running your dispatcher server and set `dispatcher_fault_tolerance=True`. The dispatcher will store its state to `work_dir`, so that on restart it can continue from its previous state.
- Adds support for sharing dataset graphs via a shared filesystem instead of over RPC. This reduces load on the dispatcher, improving the performance of distributing datasets. For this to work, the dispatcher's `work_dir` must be accessible from workers. If a worker fails to read from the `work_dir`, it falls back to using RPC for dataset graph transfer.
- Adds support for a new "distributed_epoch" processing mode. This processing mode distributes a dataset across all tf.data workers, instead of having each worker process the full dataset. See the tf.data service docs to learn more.
- Adds an optional `exclude_cols` parameter to `CsvDataset`. This parameter is the complement of `select_cols`; at most one of these should be specified.
- Implements an optimization which reorders data-discarding transformations such as `take` and `shard` to happen earlier in the dataset when it is safe to do so. The optimization can be disabled via the `experimental_optimization.reorder_data_discarding_ops` dataset option.
- `tf.data.Options` were previously immutable and can now be overridden.
- `tf.data.Dataset.from_generator` now supports Ragged and Sparse tensors with a new `output_signature` argument, which allows `from_generator` to produce any type describable by a `tf.TypeSpec` (see the sketch after this list).
- `tf.data.experimental.AUTOTUNE` is now available in the core API as `tf.data.AUTOTUNE`.
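A sketch of the new `output_signature` argument with a ragged generator (shapes are illustrative):

```python
import tensorflow as tf

def gen():
    yield tf.ragged.constant([[1, 2], [3]])
    yield tf.ragged.constant([[4, 5, 6], [7]])

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.RaggedTensorSpec(shape=[2, None], dtype=tf.int32))

for element in ds:
    print(element)
```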
tf.distribute:

- Introduces experimental support for asynchronous training of Keras models via `tf.distribute.experimental.ParameterServerStrategy`:
  - Replaces the existing `tf.distribute.experimental.ParameterServerStrategy` symbol with a new class that is for parameter server training in TF2. Usage of the old symbol, usually with Estimator API, should be replaced with `tf.compat.v1.distribute.experimental.ParameterServerStrategy`.
  - Adds the `tf.distribute.experimental.coordinator.*` namespace, including the main API `ClusterCoordinator` for coordinating the training cluster, and the related data structures `RemoteValue` and `PerWorkerValue`.
- Adds `tf.distribute.Strategy.gather` and `tf.distribute.ReplicaContext.all_gather` APIs to support gathering dense distributed values.
- Fixes various issues with saving a distributed model.
tf.keras:

- Improvements from the Functional API refactoring:
  - Functional model construction does not need to maintain a global workspace graph, removing memory leaks especially when building many models or very large models.
  - Functional model construction should be ~8-10% faster on average.
  - Functional models can now contain non-symbolic values in their call inputs inside of the first positional argument.
  - Several classes of TF ops that were not reliably converted to Keras layers during functional API construction should now work, e.g. `tf.image.ssim_multiscale`.
  - Error messages when Functional API construction goes wrong (and when ops cannot be converted to Keras layers automatically) should be clearer and easier to understand.
- `Optimizer.minimize` can now accept a loss `Tensor` and a `GradientTape` as an alternative to accepting a callable loss.
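  A sketch of the new calling convention (the variable and learning rate are illustrative):

  ```python
  import tensorflow as tf

  var = tf.Variable(2.0)
  opt = tf.keras.optimizers.SGD(learning_rate=0.1)

  with tf.GradientTape() as tape:
      loss = var ** 2              # a loss Tensor rather than a callable
  opt.minimize(loss, var_list=[var], tape=tape)
  ```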
- Adds a `beta` hyperparameter to FTRL optimizer classes (Keras and others) to match the FTRL paper.
- `Optimizer.__init__` now accepts a `gradient_aggregator` to allow for customization of how gradients are aggregated across devices, as well as `gradients_transformers` to allow for custom gradient transformations (such as gradient clipping).
- Improvements to Keras preprocessing layers:
  - `TextVectorization` can now accept a vocabulary list or file as an init arg.
  - `Normalization` can now accept mean and variance values as init args.
- In the `Attention` and `AdditiveAttention` layers, the `call()` method now accepts a `return_attention_scores` argument. When set to `True`, the layer returns the attention scores as an additional output argument.
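  A usage sketch (shapes are illustrative):

  ```python
  import tensorflow as tf

  query = tf.random.normal([2, 4, 8])
  value = tf.random.normal([2, 6, 8])

  layer = tf.keras.layers.Attention()
  output, scores = layer([query, value], return_attention_scores=True)
  print(output.shape)  # (2, 4, 8)
  print(scores.shape)  # (2, 4, 6): weights over value positions
  ```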
- Adds `tf.metrics.log_cosh` and `tf.metrics.logcosh` API entry points with the same implementation as their `tf.losses` equivalents.
- For Keras models, an individual call of `Model.evaluate` uses no cached data for evaluation, while `Model.fit` uses cached data when a `validation_data` arg is provided, for better performance.
- Adds a `save_traces` argument to `model.save`/`tf.keras.models.save_model` which determines whether the SavedModel format stores the Keras model/layer call functions. The traced functions allow Keras to revive custom models and layers without the original class definition, but if this isn't required the tracing can be disabled with the added option.
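  A sketch (the model and path are illustrative):

  ```python
  import tensorflow as tf

  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
  # Smaller SavedModel; reloading custom layers then requires their
  # original class definitions.
  model.save("my_model", save_traces=False)
  ```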
- The `tf.keras.mixed_precision` API is now non-experimental. The non-experimental API differs from the experimental API in several ways.
  - `tf.keras.mixed_precision.Policy` no longer takes in a `tf.mixed_precision.experimental.LossScale` in the constructor, and no longer has a `LossScale` associated with it. Instead, `Model.compile` will automatically wrap the optimizer with a `LossScaleOptimizer` using dynamic loss scaling if `Policy.name` is "mixed_float16".
  - `tf.keras.mixed_precision.LossScaleOptimizer`'s constructor takes in different arguments. In particular, it no longer takes in a `LossScale`, and there is no longer a `LossScale` associated with the `LossScaleOptimizer`. Instead, `LossScaleOptimizer` directly implements fixed or dynamic loss scaling. See the documentation of `tf.keras.mixed_precision.experimental.LossScaleOptimizer` for details on the differences between the experimental `LossScaleOptimizer` and the new non-experimental `LossScaleOptimizer`.
  - `tf.mixed_precision.experimental.LossScale` and its subclasses are deprecated, as all of their functionality now exists within `tf.keras.mixed_precision.LossScaleOptimizer`.
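A sketch of the non-experimental API wiring described above (the model is illustrative):

```python
import tensorflow as tf

# compile() wraps the optimizer in a LossScaleOptimizer automatically
# when the global policy is "mixed_float16".
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
print(type(model.optimizer))  # tf.keras.mixed_precision.LossScaleOptimizer
```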
tf.lite:

- TFLiteConverter:
  - Supports optional flags `inference_input_type` and `inference_output_type` for full integer quantized models. This allows users to modify the model input and output type to integer types (`tf.int8`, `tf.uint8`) instead of defaulting to float type (`tf.float32`).
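    A conversion sketch; the SavedModel path and the `my_representative_dataset` generator are placeholders:

    ```python
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = my_representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8    # new flag
    converter.inference_output_type = tf.int8   # new flag
    tflite_model = converter.convert()
    ```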
- NNAPI:
  - Adds NNAPI delegation support for requantization use cases by converting the operation into a dequantize-quantize pair.
  - Removes the deprecated `Interpreter.setUseNNAPI(boolean)` Java API. Use `Interpreter.Options.setUseNNAPI` instead.
  - Deprecates the `Interpreter::UseNNAPI(bool)` C++ API. Use `NnApiDelegate()` and related delegate configuration methods directly.
  - Deprecates the `Interpreter::SetAllowFp16PrecisionForFp32(bool)` C++ API. Prefer controlling this via delegate options, e.g. `tflite::StatefulNnApiDelegate::Options::allow_fp16` or `TfLiteGpuDelegateOptionsV2::is_precision_loss_allowed`.
- GPU:
  - GPU acceleration now supports quantized models by default.
- `DynamicBuffer::AddJoinedString()` will now add a separator if the first string to be joined is empty.
- Adds support for cumulative sum (cumsum), both as a builtin op and in MLIR conversion.
TensorRT:

- Issues a warning when the `session_config` parameter for the TF1 converter is used or the `rewrite_config_template` field in the TF2 converter parameter object is used.
TPU Enhancements:
- Adds support for the `beta` parameter of the FTRL optimizer for TPU embeddings. Users of other TensorFlow platforms can implement equivalent behavior by adjusting the `l2` parameter.
XLA Support:

- `xla.experimental.compile` is deprecated; use `tf.function(experimental_compile=True)` instead.
- Adds `tf.function.experimental_get_compiler_ir`, which returns compiler IR (currently 'hlo' and 'optimized_hlo') for a given function at given inputs.
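A sketch of both APIs; the function is illustrative and `experimental_get_compiler_ir` is an experimental interface:

```python
import tensorflow as tf

@tf.function(experimental_compile=True)  # compile with XLA
def square_sum(x):
    return tf.reduce_sum(x * x)

x = tf.constant([1.0, 2.0, 3.0])
print(square_sum(x))

# Inspect the compiler IR for these inputs (experimental interface):
print(square_sum.experimental_get_compiler_ir(x)(stage="hlo"))
```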
Security:

- Fixes an undefined behavior causing a segfault in `tf.raw_ops.Switch` (CVE-2020-15190)
- Fixes three vulnerabilities in conversion to DLPack format
- Fixes two vulnerabilities in `SparseFillEmptyRowsGrad`
- Fixes several vulnerabilities in `RaggedCountSparseOutput` and `SparseCountSparseOutput` operations
- Fixes an integer truncation vulnerability in code using the work sharder API (CVE-2020-15202)
- Fixes a format string vulnerability in `tf.strings.as_string` (CVE-2020-15203)
- Fixes a segfault raised by calling session-only ops in eager mode (CVE-2020-15204)
- Fixes a data leak and potential ASLR violation from `tf.raw_ops.StringNGrams` (CVE-2020-15205)
- Fixes segfaults caused by incomplete `SavedModel` validation (CVE-2020-15206)
- Fixes data corruption due to a bug in negative indexing support in TFLite (CVE-2020-15207)
- Fixes data corruption due to a dimension mismatch in TFLite (CVE-2020-15208)
- Fixes several vulnerabilities in the TFLite saved model format
- Fixes several vulnerabilities in the TFLite implementation of segment sum
- Fixes a segfault in `tf.quantization.quantize_and_dequantize` (CVE-2020-15265)
- Fixes an undefined behavior float cast causing a crash (CVE-2020-15266)
Other:

- We have replaced uses of "whitelist" and "blacklist" with "allowlist" and "denylist" where possible. Please see this list for more context.
- Adds `tf.config.experimental.mlir_bridge_rollout`, which will help us roll out the new MLIR TPU bridge.
- Adds `tf.experimental.register_filesystem_plugin` to load modular filesystem plugins from Python.
Thanks to our Contributors
This release contains contributions from many people at Google and external contributors.
8bitmp3, aaa.jq, Abhineet Choudhary, Abolfazl Shahbazi, acxz, Adam Hillier, Adrian Garcia Badaracco, Ag Ramesh, ahmedsabie, Alan Anderson, Alexander Grund, Alexandre Lissy, Alexey Ivanov, Amedeo Cavallo, anencore94, Aniket Kumar Singh, Anthony Platanios, Ashwin Phadke, Balint Cristian, Basit Ayantunde, bbbboom, Ben Barsdell, Benjamin Chetioui, Benjamin Peterson, bhack, Bhanu Prakash Bandaru Venkata, Biagio Montaruli, Brent M. Spell, bubblebooy, bzhao, cfRod, Cheng Chen, Cheng(Kit) Chen, Chris Tessum, Christian, chuanqiw, codeadmin_peritiae, COTASPAR, CuiYifeng, danielknobe, danielyou0230, dannyfriar, daria, DarrenZhang01, Denisa Roberts, dependabot[bot], Deven Desai, Dmitry Volodin, Dmitry Zakharov, drebain, Duncan Riach, Eduard Feicho, Ehsan Toosi, Elena Zhelezina, emlaprise2358, Eugene Kuznetsov, Evaderan-Lab, Evgeniy Polyakov, Fausto Morales, Felix Johnny, fo40225, Frederic Bastien, Fredrik Knutsson, fsx950223, Gaurav Singh, Gauri1 Deshpande, George Grzegorz Pawelczak, gerbauz, Gianluca Baratti, Giorgio Arena, Gmc2, Guozhong Zhuang, Hannes Achleitner, Harirai, HarisWang, Harsh188, hedgehog91, Hemal Mamtora, Hideto Ueno, Hugh Ku, Ian Beauregard, Ilya Persky, jacco, Jakub Beránek, Jan Jongboom, Javier Montalt Tordera, Jens Elofsson, Jerry Shih, jerryyin, jgehw, Jinjing Zhou, jma, jmsmdy, Johan Nordström, John Poole, Jonah Kohn, Jonathan Dekhtiar, jpodivin, Jung Daun, Kai Katsumata, Kaixi Hou, Kamil Rakoczy, Kaustubh Maske Patil, Kazuaki Ishizaki, Kedar Sovani, Koan-Sin Tan, Koki Ibukuro, Krzysztof Laskowski, Kushagra Sharma, Kushan Ahmadian, Lakshay Tokas, Leicong Li, levinxo, Lukas Geiger, Maderator, Mahmoud Abuzaina, Mao Yunfei, Marius Brehler, markf, Martin Hwasser, Martin Kubovčík, Matt Conley, Matthias, mazharul, mdfaijul, Michael137, MichelBr, Mikhail Startsev, Milan Straka, Ml-0, Myung-Hyun Kim, Måns Nilsson, Nathan Luehr, ngc92, nikochiko, Niranjan Hasabnis, nyagato_00, Oceania2018, Oleg Guba, Ongun Kanat, OscarVanL, Patrik Laurell, Paul Tanger, Peter Sobot, Phil Pearl, PlusPlusUltra, Poedator, Prasad Nikam, Rahul-Kamat, Rajeshwar Reddy T, redwrasse, Rickard, Robert Szczepanski, Rohan Lekhwani, Sam Holt, Sami Kama, Samuel Holt, Sandeep Giri, sboshin, Sean Settle, settle, Sharada Shiddibhavi, Shawn Presser, ShengYang1, Shi,Guangyong, Shuxiang Gao, Sicong Li, Sidong-Wei, Srihari Humbarwadi, Srinivasan Narayanamoorthy, Steenu Johnson, Steven Clarkson, stjohnso98, Tamas Bela Feher, Tamas Nyiri, Tarandeep Singh, Teng Lu, Thibaut Goetghebuer-Planchon, Tim Bradley, Tomasz Strejczek, Tongzhou Wang, Torsten Rudolf, Trent Lo, Ty Mick, Tzu-Wei Sung, Varghese, Jojimon, Vignesh Kothapalli, Vishakha Agrawal, Vividha, Vladimir Menshakov, Vladimir Silyaev, VoVAllen, Võ Văn Nghĩa, wondertx, xiaohong1031, Xiaoming (Jason) Cui, Xinan Jiang, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yimei Sun, Yiwen Li, Yixing, Yoav Ramon, Yong Tang, Yong Wu, yuanbopeng, Yunmo Koo, Zhangqiang, Zhou Peng, ZhuBaohe, zilinzhu, zmx
v2.4.0-rc3 Changes
November 24, 2020

Release 2.4.0

(Release notes identical to v2.4.0-rc4 above.)
v2.4.0-rc2 Changes
November 18, 2020π Release 2.4.0
Major Features and Improvements
π
tf.distributeintroduces experimental support for asynchronous training of Keras models via thetf.distribute.experimental.ParameterServerStrategyAPI. Please see below for additional details.π
MultiWorkerMirroredStrategyis now a stable API and is no longer considered experimental. Some of the major improvements involve handling peer failure and many bug fixes. Please check out the detailed tutorial on Multi-worker training with Keras.π Introduces experimental support for a new module named
tf.experimental.numpywhich is a NumPy-compatible API for writing TF programs. See the detailed guide to learn more. Additional details below.β Adds Support for
π TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs and is enabled by default.π A major refactoring of the internals of the Keras Functional API has been completed, that should improve the reliability, stability, and performance of constructing Functional models.
Keras mixed precision API
tf.keras.mixed_precisionis no longer experimental and allows the use of 16-bit floating point formats during training, improving performance by up to 3x on GPUs and 60% on TPUs. Please see below for additional details.π· TensorFlow Profiler now supports profiling
MultiWorkerMirroredStrategyand tracing multiple workers using the sampling mode API.TFLite Profiler for Android is available. See the detailed guide to learn more.
π¦ TensorFlow pip packages are now built with CUDA11 and cuDNN 8.0.2.
π₯ Breaking Changes
TF Core:
- Certain float32 ops run in lower precsion on Ampere based GPUs, including matmuls and convolutions, due to the use of TensorFloat-32. Specifically, inputs to such ops are rounded from 23 bits of precision to 10
bits of precision. This is unlikely to cause issues in practice for deep learning models. In some cases, TensorFloat-32 is also used for complex64 ops.
TensorFloat-32 can be disabled by runningtf.config.experimental.enable_tensor_float_32_execution(False). - The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of
tensorflow::tstring/TF_TStrings. - C-API functions
TF_StringDecode,TF_StringEncode, andTF_StringEncodedSizeare no longer relevant and have been removed; seecore/platform/ctstring.hfor string access/modification in C. tensorflow.python,tensorflow.coreandtensorflow.compilermodules are now hidden. These modules are not part of TensorFlow public API.tf.raw_ops.Maxandtf.raw_ops.Minno longer accept inputs of typetf.complex64ortf.complex128, because the behavior of these ops is not well defined for complex types.
- XLA:CPU and XLA:GPU devices are no longer registered by default. Use
TF_XLA_FLAGS=--tf_xla_enable_xla_devicesif you really need them, but this flag will eventually be removed in subsequent releases.tf.keras:- The
tf.keras:
- The `steps_per_execution` argument in `compile()` is no longer experimental; if you were passing `experimental_steps_per_execution`, rename it to `steps_per_execution` in your code. This argument controls the number of batches to run during each `tf.function` call when calling `fit()`. Running multiple batches inside a single `tf.function` call can greatly improve performance on TPUs or small models with a large Python overhead.
- A major refactoring of the internals of the Keras Functional API may affect code that relies on certain internal details:
  - Code that uses `isinstance(x, tf.Tensor)` instead of `tf.is_tensor` when checking Keras symbolic inputs/outputs should switch to using `tf.is_tensor`.
  - Code that is overly dependent on the exact names attached to symbolic tensors (e.g. assumes there will be ":0" at the end of the inputs, treats names as unique identifiers instead of using `tensor.ref()`, etc.)
  - Code that uses `get_concrete_function` to trace Keras symbolic inputs directly should switch to building matching `tf.TensorSpec`s directly and tracing the `TensorSpec` objects.
  - Code that relies on the exact number and names of the op layers that TensorFlow operations were converted into may have changed.
  - Code that uses `tf.map_fn`/`tf.cond`/`tf.while_loop`/control flow as op layers and happens to work before TF 2.4. These will explicitly be unsupported now. Converting these ops to Functional API op layers was unreliable before TF 2.4, and prone to erroring incomprehensibly or being silently buggy.
  - Code that directly asserts on a Keras symbolic value in cases where ops like `tf.rank` used to return a static or symbolic value depending on whether the input had a fully static shape. Now these ops always return symbolic values.
  - Code already susceptible to leaking tensors outside of graphs becomes slightly more likely to do so now.
  - Code that tries directly getting gradients with respect to symbolic Keras inputs/outputs. Use `GradientTape` on the actual Tensors passed to the already-constructed model instead.
  - Code that requires very tricky shape manipulation via converted op layers in order to work, where the Keras symbolic shape inference proves insufficient.
  - Code that tries manually walking a `tf.keras.Model` layer by layer and assumes layers only ever have one positional argument. This assumption doesn't hold true before TF 2.4 either, but is more likely to cause issues now.
  - Code that manually enters `keras.backend.get_graph()` before building a functional model is no longer needed.
- Input shape assumptions are now enforced when calling Functional API Keras models. This may break users where there is a mismatch between the shape used when creating `Input` objects in a Functional model and the shape of the data passed to that model. You can fix the mismatch by either calling the model with correctly-shaped data, or by relaxing `Input` shape assumptions (note that you can pass shapes with `None` entries for axes that are meant to be dynamic). You can also disable the input checking entirely by setting `model.input_spec = None`; see the sketch after this list.
- Several changes have been made to `tf.keras.mixed_precision.experimental`. Note that it is now recommended to use the non-experimental `tf.keras.mixed_precision` API.
  - `AutoCastVariable.dtype` now refers to the actual variable dtype, not the dtype it will be casted to.
  - When mixed precision is enabled, `tf.keras.layers.Embedding` now outputs a float16 or bfloat16 tensor instead of a float32 tensor.
  - The property `tf.keras.mixed_precision.experimental.LossScaleOptimizer.loss_scale` is now a tensor, not a `LossScale` object. This means to get the loss scale of a `LossScaleOptimizer` as a tensor, you must now call `opt.loss_scale` instead of `opt.loss_scale()`; see the sketch after this list.
  - The property `should_cast_variables` has been removed from `tf.keras.mixed_precision.experimental.Policy`.
  - When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the `DynamicLossScale`'s multiplier must be 2.
  - When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the weights of the `DynamicLossScale` are copied into the `LossScaleOptimizer` instead of being reused. This means modifying the weights of the `DynamicLossScale` will no longer affect the weights of the `LossScaleOptimizer`, and vice versa.
  - The global policy can no longer be set to a non-floating point policy in `tf.keras.mixed_precision.experimental.set_policy`.
  - In `Layer.call`, `AutoCastVariable`s will no longer be casted within `MirroredStrategy.run` or `ReplicaContext.merge_call`. This is because a thread-local variable is used to determine whether `AutoCastVariable`s are casted, and those two functions run with a different thread. Note this only applies if one of these two functions is called within `Layer.call`; if one of those two functions calls `Layer.call`, `AutoCastVariable`s will still be casted.
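For example, reading the loss scale under the experimental API now looks like this (a minimal sketch; the SGD optimizer and 'dynamic' loss scale are arbitrary choices):

```python
import tensorflow as tf

opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), loss_scale='dynamic')

# loss_scale is now a tensor-valued property, not a LossScale object:
scale = opt.loss_scale   # previously: opt.loss_scale()
print(float(scale))
```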
tf.data:
- `tf.data.experimental.service.DispatchServer` now takes a config tuple instead of individual arguments. Usages should be updated to `tf.data.experimental.service.DispatchServer(dispatcher_config)`.
- `tf.data.experimental.service.WorkerServer` now takes a config tuple instead of individual arguments. Usages should be updated to `tf.data.experimental.service.WorkerServer(worker_config)`; see the sketch below.
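A sketch of the updated, config-based constructors (assumes a single-machine test setup; port 5050 is an arbitrary choice):

```python
import tensorflow as tf

dispatcher = tf.data.experimental.service.DispatchServer(
    tf.data.experimental.service.DispatcherConfig(port=5050))

# dispatcher.target looks like "grpc://localhost:5050"; the worker config
# takes the bare host:port address.
worker = tf.data.experimental.service.WorkerServer(
    tf.data.experimental.service.WorkerConfig(
        dispatcher_address=dispatcher.target.split("://")[1]))
```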
tf.distribute:
- Removes `tf.distribute.Strategy.experimental_make_numpy_dataset`. Please use `tf.data.Dataset.from_tensor_slices` instead.
- Renames `experimental_hints` in `tf.distribute.StrategyExtended.reduce_to`, `tf.distribute.StrategyExtended.batch_reduce_to`, and `tf.distribute.ReplicaContext.all_reduce` to `options`.
- Renames `tf.distribute.experimental.CollectiveHints` to `tf.distribute.experimental.CommunicationOptions`.
- Renames `tf.distribute.experimental.CollectiveCommunication` to `tf.distribute.experimental.CommunicationImplementation`.
- Renames `tf.distribute.Strategy.experimental_distribute_datasets_from_function` to `distribute_datasets_from_function` as it is no longer experimental.
- Removes the `tf.distribute.Strategy.experimental_run_v2` method, which was deprecated in TF 2.2.

tf.lite:
- `tf.quantization.quantize_and_dequantize_v2` has been introduced, which updates the gradient definition for quantization that is outside the range to be 0. To simulate the V1 behavior of `tf.quantization.quantize_and_dequantize(...)`, use `tf.grad_pass_through(tf.quantization.quantize_and_dequantize_v2)(...)`; see the sketch after this list.
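A short sketch of the two gradient behaviors (the range endpoints are arbitrary):

```python
import tensorflow as tf

x = tf.constant([-1.5, 0.2, 3.7])

# New op: gradients are 0 for inputs outside [input_min, input_max].
y = tf.quantization.quantize_and_dequantize_v2(
    x, input_min=-1.0, input_max=1.0)

# V1-style pass-through gradients, per the note above:
y_v1 = tf.grad_pass_through(tf.quantization.quantize_and_dequantize_v2)(
    x, input_min=-1.0, input_max=1.0)
```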
Bug Fixes and Other Changes
TF Core:
- Introduces experimental support for a new module named `tf.experimental.numpy`, which is a NumPy-compatible API for writing TF programs. This module provides class `ndarray`, which mimics the `ndarray` class in NumPy, and wraps an immutable `tf.Tensor` under the hood. A subset of NumPy functions (e.g. `numpy.add`) are provided. Their inter-operation with TF facilities is seamless in most cases. See tensorflow/python/ops/numpy_ops/README.md for details of what operations are supported and what the differences from NumPy are. A small sketch follows.
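A minimal sketch of mixing the new NumPy-compatible API with ordinary TF ops:

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

x = tnp.ones([2, 3], dtype=tnp.float32)  # an ndarray wrapping a tf.Tensor
y = tnp.add(x, 1.0)                      # NumPy-style function
z = tf.reduce_sum(y)                     # interoperates with regular TF ops
print(z)                                 # tf.Tensor(12.0, ...)
```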
- `tf.types.experimental.TensorLike` is a new `Union` type that can be used as a type annotation for variables representing a Tensor or a value that can be converted to a Tensor by `tf.convert_to_tensor`.
- Calling ops with Python constants or NumPy values is now consistent with `tf.convert_to_tensor` behavior. This avoids operations like `tf.reshape` truncating inputs such as from int64 to int32.
- Adds `tf.sparse.map_values` to apply a function to the `.values` of `SparseTensor` arguments.
- The Python bitwise operators for `Tensor` (`__and__`, `__or__`, `__xor__` and `__invert__`) now support non-`bool` arguments and apply the corresponding bitwise ops. `bool` arguments continue to be supported and dispatch to logical ops. This brings them more in line with Python and NumPy behavior.
- Adds `tf.SparseTensor.with_values`. This returns a new SparseTensor with the same sparsity pattern, but with new provided values. It is similar to the `with_values` function of `RaggedTensor`.
- Adds the `StatelessCase` op, and uses it if none of the case branches has stateful ops.
- Adds `tf.config.experimental.get_memory_usage` to return total memory usage of the device.
- Adds gradients for `RaggedTensorToVariant` and `RaggedTensorFromVariant`.
- Improves shape inference of nested function calls by supporting constant folding across Arg nodes, which makes more static values available to shape inference functions.
tf.debugging:
- `tf.debugging.assert_shapes()` now works on `SparseTensor`s (fixes #36268).

GPU:
- Adds support for TensorFloat-32 on Ampere-based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere-based GPUs which causes certain float32 ops, such as matrix multiplications and convolutions, to run much faster on Ampere GPUs but with reduced precision. This reduced precision has not been found to affect convergence quality of deep learning models in practice. TensorFloat-32 is enabled by default, but can be disabled with `tf.config.experimental.enable_tensor_float_32_execution`.
tf.math:
- Adds `tf.math.erfcinv`, the inverse of `tf.math.erfc`.
tf.nn:
- `tf.nn.max_pool2d` now supports explicit padding; see the sketch below.
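A small sketch of the explicit padding form (shapes are arbitrary):

```python
import tensorflow as tf

x = tf.random.normal([1, 8, 8, 3])
# One (before, after) pair per dimension: [batch, height, width, channels].
y = tf.nn.max_pool2d(x, ksize=2, strides=2,
                     padding=[[0, 0], [1, 1], [1, 1], [0, 0]])
print(y.shape)  # (1, 5, 5, 3)
```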
tf.image:
- Adds deterministic `tf.image.stateless_random_*` functions for each `tf.image.random_*` function, and a new op `stateless_sample_distorted_bounding_box` which is a deterministic version of the `sample_distorted_bounding_box` op. Given the same seed, these stateless functions/ops produce the same results independent of how many times the function is called, and independent of global seed settings; see the sketch below.
- Adds deterministic `tf.image.resize` backprop CUDA kernels for `method=ResizeMethod.BILINEAR` (the default method). Enable by setting the environment variable `TF_DETERMINISTIC_OPS` to `"true"` or `"1"`.
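With the stateless variants, the seed alone determines the result; a tiny sketch (the flip op stands in for any of the `stateless_random_*` family):

```python
import tensorflow as tf

img = tf.random.normal([64, 64, 3])
seed = (1, 2)  # stateless image ops take an explicit 2-element seed

a = tf.image.stateless_random_flip_left_right(img, seed=seed)
b = tf.image.stateless_random_flip_left_right(img, seed=seed)
tf.debugging.assert_equal(a, b)  # same seed, same result, on every call
```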
tf.print:
- Fixes a bug in `tf.print()` with `OrderedDict`: if an `OrderedDict` didn't have its keys sorted, the keys and values were not printed with their correct mapping.
tf.train.Checkpoint:
- Now accepts a `root` argument in the initialization, which generates a checkpoint with a root object; see the sketch below. This allows users to create a `Checkpoint` object that is compatible with Keras `model.save_weights()` and `model.load_weights`. The checkpoint is also compatible with the checkpoint saved in the `variables/` folder in the SavedModel.
- When restoring, `save_path` can be a path to a SavedModel. The function will automatically find the checkpoint in the SavedModel.
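A brief sketch of the new `root` argument (the model and path are placeholders):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
ckpt = tf.train.Checkpoint(root=model)   # new `root` argument
path = ckpt.save('/tmp/demo_ckpt/ckpt')

model.load_weights(path)  # Keras-compatible, per the notes above
# ckpt.restore() also accepts a SavedModel directory and finds its checkpoint.
```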
tf.data:
- Adds new `tf.data.experimental.service.register_dataset` and `tf.data.experimental.service.from_dataset_id` APIs to enable one process to register a dataset with the tf.data service, and another process to consume data from the dataset; see the sketch after this list.
- Adds support for dispatcher fault tolerance. To enable fault tolerance, configure a `work_dir` when running your dispatcher server and set `dispatcher_fault_tolerance=True`. The dispatcher will store its state to `work_dir`, so that on restart it can continue from its previous state.
- Adds support for sharing dataset graphs via a shared filesystem instead of over RPC. This reduces load on the dispatcher, improving performance of distributing datasets. For this to work, the dispatcher's `work_dir` must be accessible from workers. If a worker fails to read from the `work_dir`, it falls back to using RPC for dataset graph transfer.
- Adds support for a new "distributed_epoch" processing mode. This processing mode distributes a dataset across all tf.data workers, instead of having each worker process the full dataset. See the tf.data service docs to learn more.
- Adds an optional `exclude_cols` parameter to `CsvDataset`. This parameter is the complement of `select_cols`; at most one of these should be specified.
- We have implemented an optimization which reorders data-discarding transformations such as `take` and `shard` to happen earlier in the dataset when it is safe to do so. The optimization can be disabled via the `experimental_optimization.reorder_data_discarding_ops` dataset option.
- `tf.data.Options` were previously immutable and can now be overridden.
- `tf.data.Dataset.from_generator` now supports Ragged and Sparse tensors with a new `output_signature` argument, which allows `from_generator` to produce any type describable by a `tf.TypeSpec`.
- `tf.data.experimental.AUTOTUNE` is now available in the core API as `tf.data.AUTOTUNE`.
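The register/consume split mentioned at the top of this list might look like the following (assumes a dispatcher is already running; the address is a placeholder):

```python
import tensorflow as tf

service = 'grpc://localhost:5050'  # placeholder dispatcher address

# Process A: register a dataset with the tf.data service.
dataset = tf.data.Dataset.range(10)
dataset_id = tf.data.experimental.service.register_dataset(service, dataset)

# Process B: consume the registered dataset by id.
consumed = tf.data.experimental.service.from_dataset_id(
    processing_mode='parallel_epochs',
    service=service,
    dataset_id=dataset_id,
    element_spec=dataset.element_spec)
```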
tf.distribute:
- Introduces experimental support for asynchronous training of Keras models via `tf.distribute.experimental.ParameterServerStrategy`; see the sketch after this list.
  - Replaces the existing `tf.distribute.experimental.ParameterServerStrategy` symbol with a new class that is for parameter server training in TF2. Usage of the old symbol, usually with the Estimator API, should be replaced with `tf.compat.v1.distribute.experimental.ParameterServerStrategy`.
  - Adds the `tf.distribute.experimental.coordinator.*` namespace, including the main API `ClusterCoordinator` for coordinating the training cluster, and the related data structures `RemoteValue` and `PerWorkerValue`.
- Adds `tf.distribute.Strategy.gather` and `tf.distribute.ReplicaContext.all_gather` APIs to support gathering dense distributed values.
- Fixes various issues with saving a distributed model.
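A rough sketch of how the new pieces fit together (assumes a cluster with 'worker' and 'ps' jobs is already described by the TF_CONFIG environment variable; this is an outline, not a complete training loop):

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(resolver)
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    optimizer = tf.keras.optimizers.SGD()

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(model(tf.ones([2, 4])) ** 2)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Steps are dispatched asynchronously; the returned RemoteValue resolves later.
remote_loss = coordinator.schedule(train_step)
coordinator.join()
```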
tf.keras:
- Improvements from the Functional API refactoring:
  - Functional model construction does not need to maintain a global workspace graph, removing memory leaks especially when building many models or very large models.
  - Functional model construction should be ~8-10% faster on average.
  - Functional models can now contain non-symbolic values in their call inputs inside of the first positional argument.
  - Several classes of TF ops that were not reliably converted to Keras layers during functional API construction should now work, e.g. `tf.image.ssim_multiscale`.
  - Error messages when Functional API construction goes wrong (and when ops cannot be converted to Keras layers automatically) should be clearer and easier to understand.
- `Optimizer.minimize` can now accept a loss `Tensor` and a `GradientTape` as an alternative to accepting a `callable` loss.
- Adds the `beta` hyperparameter to FTRL optimizer classes (Keras and others) to match the FTRL paper.
- `Optimizer.__init__` now accepts a `gradient_aggregator` to allow for customization of how gradients are aggregated across devices, as well as `gradients_transformers` to allow for custom gradient transformations (such as gradient clipping).
- Improvements to Keras preprocessing layers (see the sketch after this list):
  - `TextVectorization` can now accept a vocabulary list or file as an init arg.
  - `Normalization` can now accept mean and variance values as init args.
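A minimal sketch of the new init args (the vocabulary and statistics are made up):

```python
import tensorflow as tf

vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
    vocabulary=['small', 'medium', 'large'])

norm = tf.keras.layers.experimental.preprocessing.Normalization(
    mean=3.0, variance=4.0)
print(norm(tf.constant([[5.0]])))  # (5 - 3) / sqrt(4) = 1.0
```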
- In the `Attention` and `AdditiveAttention` layers, the `call()` method now accepts a `return_attention_scores` argument. When set to `True`, the layer returns the attention scores as an additional output argument.
- Adds `tf.metrics.log_cosh` and `tf.metrics.logcosh` API entrypoints with the same implementation as their `tf.losses` equivalents.
- For Keras models, an individual call of `Model.evaluate` uses no cached data for evaluation, while `Model.fit` uses cached data when a `validation_data` arg is provided, for better performance.
- Adds a `save_traces` argument to `model.save`/`tf.keras.models.save_model` which determines whether the SavedModel format stores the Keras model/layer call functions. The traced functions allow Keras to revive custom models and layers without the original class definition, but if this isn't required the tracing can be disabled with the added option.
- The `tf.keras.mixed_precision` API is now non-experimental. The non-experimental API differs from the experimental API in several ways (see the sketch after this list):
  - `tf.keras.mixed_precision.Policy` no longer takes in a `tf.mixed_precision.experimental.LossScale` in the constructor, and no longer has a `LossScale` associated with it. Instead, `Model.compile` will automatically wrap the optimizer with a `LossScaleOptimizer` using dynamic loss scaling if `Policy.name` is "mixed_float16".
  - `tf.keras.mixed_precision.LossScaleOptimizer`'s constructor takes in different arguments. In particular, it no longer takes in a `LossScale`, and there is no longer a `LossScale` associated with the `LossScaleOptimizer`. Instead, `LossScaleOptimizer` directly implements fixed or dynamic loss scaling. See the documentation of `tf.keras.mixed_precision.experimental.LossScaleOptimizer` for details on the differences between the experimental `LossScaleOptimizer` and the new non-experimental `LossScaleOptimizer`.
  - `tf.mixed_precision.experimental.LossScale` and its subclasses are deprecated, as all of their functionality now exists within `tf.keras.mixed_precision.LossScaleOptimizer`.
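A minimal sketch of the non-experimental workflow described above:

```python
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.Dense(1, dtype='float32'),  # keep outputs in float32
])
# compile() wraps the optimizer in a LossScaleOptimizer automatically
# because the global policy is mixed_float16.
model.compile(optimizer='sgd', loss='mse')
```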
tf.lite:
- `TFLiteConverter`:
  - Supports optional flags `inference_input_type` and `inference_output_type` for full integer quantized models. This allows users to modify the model input and output type to integer types (`tf.int8`, `tf.uint8`) instead of defaulting to float type (`tf.float32`). A sketch follows this list.
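The flags might be used like this (the SavedModel path is a placeholder, and a `representative_dataset` generator, not shown, is required for full-integer quantization):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/my_saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.representative_dataset = my_representative_dataset
converter.inference_input_type = tf.int8    # new flag
converter.inference_output_type = tf.int8   # new flag
tflite_model = converter.convert()
```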
- NNAPI:
  - Adds NNAPI delegation support for requantization use cases by converting the operation into a dequantize-quantize pair.
  - Removes the deprecated `Interpreter.setUseNNAPI(boolean)` Java API. Use `Interpreter.Options.setUseNNAPI` instead.
  - Deprecates the `Interpreter::UseNNAPI(bool)` C++ API. Use `NnApiDelegate()` and related delegate configuration methods directly.
  - Deprecates the `Interpreter::SetAllowFp16PrecisionForFp32(bool)` C++ API. Prefer controlling this via delegate options, e.g. `tflite::StatefulNnApiDelegate::Options::allow_fp16` or `TfLiteGpuDelegateOptionsV2::is_precision_loss_allowed`.
- GPU:
  - GPU acceleration now supports quantized models by default.
- `DynamicBuffer::AddJoinedString()` will now add a separator if the first string to be joined is empty.
- Adds support for cumulative sum (cumsum), both as a builtin op and MLIR conversion.
- TensorRT:
  - Issues a warning when the `session_config` parameter for the TF1 converter is used or the `rewrite_config_template` field in the TF2 converter parameter object is used.
TPU Enhancements:
- Adds support for the `beta` parameter of the FTRL optimizer for TPU embeddings. Users of other TensorFlow platforms can implement equivalent behavior by adjusting the `l2` parameter.
XLA Support:
- `xla.experimental.compile` is deprecated; use `tf.function(experimental_compile=True)` instead. See the sketch below.
- Adds `tf.function.experimental_get_compiler_ir`, which returns the compiler IR (currently 'hlo' and 'optimized_hlo') for a given function on a given input.
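For example, the replacement for `xla.experimental.compile` is a one-line decorator change (the function body is arbitrary):

```python
import tensorflow as tf

@tf.function(experimental_compile=True)  # XLA-compiles this function
def fused(x):
    return tf.math.tanh(x) * 2.0 + 1.0

print(fused(tf.ones([2, 2])))
```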
Security:
- Fixes an undefined behavior causing a segfault in `tf.raw_ops.Switch` (CVE-2020-15190)
- Fixes three vulnerabilities in conversion to DLPack format
- Fixes two vulnerabilities in `SparseFillEmptyRowsGrad`
- Fixes several vulnerabilities in the `RaggedCountSparseOutput` and `SparseCountSparseOutput` operations
- Fixes an integer truncation vulnerability in code using the work sharder API (CVE-2020-15202)
- Fixes a format string vulnerability in `tf.strings.as_string` (CVE-2020-15203)
- Fixes a segfault raised by calling session-only ops in eager mode (CVE-2020-15204)
- Fixes a data leak and potential ASLR violation from `tf.raw_ops.StringNGrams` (CVE-2020-15205)
- Fixes segfaults caused by incomplete `SavedModel` validation (CVE-2020-15206)
- Fixes data corruption due to a bug in negative indexing support in TFLite (CVE-2020-15207)
- Fixes data corruption due to a dimension mismatch in TFLite (CVE-2020-15208)
- Fixes several vulnerabilities in the TFLite saved model format
- Fixes several vulnerabilities in the TFLite implementation of segment sum
- Fixes a segfault in `tf.quantization.quantize_and_dequantize` (CVE-2020-15265)
- Fixes an undefined behavior float cast causing a crash (CVE-2020-15266)
Other:
- We have replaced uses of "whitelist" and "blacklist" with "allowlist" and "denylist" where possible. Please see this list for more context.
- Adds `tf.config.experimental.mlir_bridge_rollout`, which will help us roll out the new MLIR TPU bridge.
- Adds `tf.experimental.register_filesystem_plugin` to load modular filesystem plugins from Python.
Thanks to our Contributors

This release contains contributions from many people at Google and external contributors.
8bitmp3, aaa.jq, Abhineet Choudhary, Abolfazl Shahbazi, acxz, Adam Hillier, Adrian Garcia Badaracco, Ag Ramesh, ahmedsabie, Alan Anderson, Alexander Grund, Alexandre Lissy, Alexey Ivanov, Amedeo Cavallo, anencore94, Aniket Kumar Singh, Anthony Platanios, Ashwin Phadke, Balint Cristian, Basit Ayantunde, bbbboom, Ben Barsdell, Benjamin Chetioui, Benjamin Peterson, bhack, Bhanu Prakash Bandaru Venkata, Biagio Montaruli, Brent M. Spell, bubblebooy, bzhao, cfRod, Cheng Chen, Cheng(Kit) Chen, Chris Tessum, Christian, chuanqiw, codeadmin_peritiae, COTASPAR, CuiYifeng, danielknobe, danielyou0230, dannyfriar, daria, DarrenZhang01, Denisa Roberts, dependabot[bot], Deven Desai, Dmitry Volodin, Dmitry Zakharov, drebain, Duncan Riach, Eduard Feicho, Ehsan Toosi, Elena Zhelezina, emlaprise2358, Eugene Kuznetsov, Evaderan-Lab, Evgeniy Polyakov, Fausto Morales, Felix Johnny, fo40225, Frederic Bastien, Fredrik Knutsson, fsx950223, Gaurav Singh, Gauri1 Deshpande, George Grzegorz Pawelczak, gerbauz, Gianluca Baratti, Giorgio Arena, Gmc2, Guozhong Zhuang, Hannes Achleitner, Harirai, HarisWang, Harsh188, hedgehog91, Hemal Mamtora, Hideto Ueno, Hugh Ku, Ian Beauregard, Ilya Persky, jacco, Jakub BerΓ‘nek, Jan Jongboom, Javier Montalt Tordera, Jens Elofsson, Jerry Shih, jerryyin, jgehw, Jinjing Zhou, jma, jmsmdy, Johan NordstrΓΆm, John Poole, Jonah Kohn, Jonathan Dekhtiar, jpodivin, Jung Daun, Kai Katsumata, Kaixi Hou, Kamil Rakoczy, Kaustubh Maske Patil, Kazuaki Ishizaki, Kedar Sovani, Koan-Sin Tan, Koki Ibukuro, Krzysztof Laskowski, Kushagra Sharma, Kushan Ahmadian, Lakshay Tokas, Leicong Li, levinxo, Lukas Geiger, Maderator, Mahmoud Abuzaina, Mao Yunfei, Marius Brehler, markf, Martin Hwasser, Martin KubovΔΓk, Matt Conley, Matthias, mazharul, mdfaijul, Michael137, MichelBr, Mikhail Startsev, Milan Straka, Ml-0, Myung-Hyun Kim, MΓ₯ns Nilsson, Nathan Luehr, ngc92, nikochiko, Niranjan Hasabnis, nyagato_00, Oceania2018, Oleg Guba, Ongun Kanat, OscarVanL, Patrik Laurell, Paul Tanger, Peter Sobot, Phil Pearl, PlusPlusUltra, Poedator, Prasad Nikam, Rahul-Kamat, Rajeshwar Reddy T, redwrasse, Rickard, Robert Szczepanski, Rohan Lekhwani, Sam Holt, Sami Kama, Samuel Holt, Sandeep Giri, sboshin, Sean Settle, settle, Sharada Shiddibhavi, Shawn Presser, ShengYang1, Shi,Guangyong, Shuxiang Gao, Sicong Li, Sidong-Wei, Srihari Humbarwadi, Srinivasan Narayanamoorthy, Steenu Johnson, Steven Clarkson, stjohnso98, Tamas Bela Feher, Tamas Nyiri, Tarandeep Singh, Teng Lu, Thibaut Goetghebuer-Planchon, Tim Bradley, Tomasz Strejczek, Tongzhou Wang, Torsten Rudolf, Trent Lo, Ty Mick, Tzu-Wei Sung, Varghese, Jojimon, Vignesh Kothapalli, Vishakha Agrawal, Vividha, Vladimir Menshakov, Vladimir Silyaev, VoVAllen, VΓ΅ VΔn NghΔ©a, wondertx, xiaohong1031, Xiaoming (Jason) Cui, Xinan Jiang, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yimei Sun, Yiwen Li, Yixing, Yoav Ramon, Yong Tang, Yong Wu, yuanbopeng, Yunmo Koo, Zhangqiang, Zhou Peng, ZhuBaohe, zilinzhu, zmx
v2.4.0-rc1 Changes
November 09, 2020
Release 2.4.0
Major Features and Improvements
π
tf.distributeintroduces experimental support for asynchronous training of Keras models via thetf.distribute.experimental.ParameterServerStrategyAPI. Please see below for additional details.π
MultiWorkerMirroredStrategyis now a stable API and is no longer considered experimental. Some of the major improvements involve handling peer failure and many bug fixes. Please check out the detailed tutorial on Multi-worker training with Keras.π Introduces experimental support for a new module named
tf.experimental.numpywhich is a NumPy-compatible API for writing TF programs. See the detailed guide to learn more. Additional details below.β Adds Support for
π TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs and is enabled by default.π A major refactoring of the internals of the Keras Functional API has been completed, that should improve the reliability, stability, and performance of constructing Functional models.
Keras mixed precision API
`tf.keras.mixed_precision` is no longer experimental and allows the use of 16-bit floating point formats during training, improving performance by up to 3x on GPUs and 60% on TPUs. Please see below for additional details.

TF Profiler now supports profiling multiple workers using the sampling mode API.

TFLite Profiler for Android is available. See the detailed guide to learn more.

TensorFlow pip packages are now built with CUDA 11 and cuDNN 8.0.2.
Breaking Changes
TF Core:
- Certain float32 ops run in lower precision on Ampere based GPUs, including matmuls and convolutions, due to the use of TensorFloat-32. Specifically, inputs to such ops are rounded from 23 bits of precision to 10
bits of precision. This is unlikely to cause issues in practice for deep learning models. In some cases, TensorFloat-32 is also used for complex64 ops.
TensorFloat-32 can be disabled by runningtf.config.experimental.enable_tensor_float_32_execution(False). - The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of
tensorflow::tstring/TF_TStrings. - C-API functions
TF_StringDecode,TF_StringEncode, andTF_StringEncodedSizeare no longer relevant and have been removed; seecore/platform/ctstring.hfor string access/modification in C. tensorflow.python,tensorflow.coreandtensorflow.compilermodules are now hidden. These modules are not part of TensorFlow public API.tf.raw_ops.Maxandtf.raw_ops.Minno longer accept inputs of typetf.complex64ortf.complex128, because the behavior of these ops is not well defined for complex types.
- XLA:CPU and XLA:GPU devices are no longer registered by default. Use
TF_XLA_FLAGS=--tf_xla_enable_xla_devicesif you really need them, but this flag will eventually be removed in subsequent releases.tf.keras:- The
steps_per_executionargument incompile()is no longer experimental; if you were passingexperimental_steps_per_execution, rename it tosteps_per_executionin your code. This argument controls the number of batches to run during eachtf.functioncall when callingfit(). Running multiple batches inside a singletf.functioncall can greatly improve performance on TPUs or small models with a large Python overhead. - A major refactoring of the internals of the Keras Functional API may affect code that
is relying on certain internal details:- Code that uses
isinstance(x, tf.Tensor)instead oftf.is_tensorwhen checking Keras symbolic inputs/outputs should switch to usingtf.is_tensor. - Code that is overly dependent on the exact names attached to symbolic tensors (e.g. assumes there will be ":0" at the end of the inputs, treats names as unique identifiers instead of using
tensor.ref(), etc.) - Code that uses
get_concrete_functionto trace Keras symbolic inputs directly should switch to building matchingtf.TensorSpecs directly and tracing theTensorSpecobjects. - Code that relies on the exact number and names of the op layers that TensorFlow operations were converted into may have changed.
- Code that uses
tf.map_fn/tf.cond/tf.while_loop/control flow as op layers and happens to work before TF 2.4. These will explicitly be unsupported now. Converting these ops to Functional API op layers was unreliable before TF 2.4, and prone to erroring incomprehensibly or being silently buggy. - Code that directly asserts on a Keras symbolic value in cases where ops like
tf.rankused to return a static or symbolic value depending on if the input had a fully static shape or not. Now these ops always return symbolic values. - Code already susceptible to leaking tensors outside of graphs becomes slightly more likely to do so now.
- Code that tries directly getting gradients with respect to symbolic Keras inputs/outputs. Use GradientTape on the actual Tensors passed to the already- constructed model instead.
- Code that requires very tricky shape manipulation via converted op layers in order to work, where the Keras symbolic shape inference proves insufficient.
- Code that tries manually walking a
tf.keras.Modellayer by layer and assumes layers only ever have one positional argument. This assumption doesn't hold true before TF 2.4 either, but is more likely to cause issues now. - Code that manually enters
keras.backend.get_graph()before building a functional model is no longer needed. - Start enforcing input shape assumptions when calling Functional API Keras models. This may potentially break some users, in case there is a mismatch between the shape used when creating
Inputobjects in a Functional model, and the shape of the data passed to that model. You can fix this mismatch by either calling the model with correctly-shaped data, or by relaxingInputshape assumptions (note that you can pass shapes withNoneentries for axes
that are meant to be dynamic). You can also disable the input checking entirely by settingmodel.input_spec = None.
- Code that uses
- Several changes have been made to
tf.keras.mixed_precision.experimental. Note that it is now recommended to use the non-experimentaltf.keras.mixed_precisionAPI.AutoCastVariable.dtypenow refers to the actual variable dtype, not the
dtype it will be casted to.- When mixed precision is enabled,
tf.keras.layers.Embeddingnow outputs a
float16 or bfloat16 tensor instead of a float32 tensor. - The property
`tf.keras.mixed_precision.experimental.LossScaleOptimizer.loss_scale` is now a tensor, not a `LossScale` object. This means to get the loss scale of a `LossScaleOptimizer` as a tensor, you must now call `opt.loss_scale` instead of `opt.loss_scale()`.
- The property `should_cast_variables` has been removed from `tf.keras.mixed_precision.experimental.Policy`.
- When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the `DynamicLossScale`'s multiplier must be 2.
- When passing a `tf.mixed_precision.experimental.DynamicLossScale` to `tf.keras.mixed_precision.experimental.LossScaleOptimizer`, the weights of the `DynamicLossScale` are copied into the `LossScaleOptimizer` instead of being reused. This means modifying the weights of the `DynamicLossScale` will no longer affect the weights of the `LossScaleOptimizer`, and vice versa.
- The global policy can no longer be set to a non-floating point policy in `tf.keras.mixed_precision.experimental.set_policy`.
- In `Layer.call`, `AutoCastVariable`s will no longer be casted within `MirroredStrategy.run` or `ReplicaContext.merge_call`. This is because a thread-local variable is used to determine whether `AutoCastVariable`s are casted, and those two functions run with a different thread. Note this only applies if one of these two functions is called within `Layer.call`; if one of those two functions calls `Layer.call`, `AutoCastVariable`s will still be casted.
tf.data:tf.data.experimental.service.DispatchServernow takes a config tuple instead of individual arguments. Usages should be updated totf.data.experimental.service.DispatchServer(dispatcher_config).
-
tf.data.experimental.service.WorkerServernow takes a config tuple instead of individual arguments. Usages should be updated totf.data.experimental.service.WorkerServer(worker_config).tf.distribute:- Removes
tf.distribute.Strategy.experimental_make_numpy_dataset. Please usetf.data.Dataset.from_tensor_slicesinstead. - Renames
experimental_hintsintf.distribute.StrategyExtended.reduce_to,tf.distribute.StrategyExtended.batch_reduce_to,tf.distribute.ReplicaContext.all_reducetooptions: - Renames
tf.distribute.experimental.CollectiveHintstotf.distribute.experimental.CommunicationOptions. - Renames
tf.distribute.experimental.CollectiveCommunicationtotf.distribute.experimental.CommunicationImplementation. - Renames
tf.distribute.Strategy.experimental_distribute_datasets_from_functiontodistribute_datasets_from_functionas it is no longer experimental.
- Removes
tf.distribute.Strategy.experimental_run_v2method, which was deprecated in TF 2.2.tf.lite:tf.quantization.quantize_and_dequantize_v2has been introduced, which updates the gradient definition for quantization which is outside the range
to be 0. To simulate the V1 behavior of `tf.quantization.quantize_and_dequantize(...)`, use `tf.grad_pass_through(tf.quantization.quantize_and_dequantize_v2)(...)`.
Bug Fixes and Other Changes
TF Core:
- π Introduces experimental support for a new module named
tf.experimental.numpy, which
is a NumPy-compatible API for writing TF programs. This module provides classndarray, which mimics thendarrayclass in NumPy, and wraps an immutabletf.Tensorunder the hood. A subset of NumPy functions (e.g.numpy.add) are provided. Their inter-operation with TF facilities is seamless in most cases.
See tensorflow/python/ops/numpy_ops/README.md
π for details of what operations are supported and what are the differences from NumPy. tf.types.experimental.TensorLikeis a newUniontype that can be used as type annotation for variables representing a Tensor or a value
that can be converted to a Tensor by `tf.convert_to_tensor`.
- Calling ops with Python constants or NumPy values is now consistent with `tf.convert_to_tensor` behavior. This avoids operations like
tf.reshape truncating inputs such as from int64 to int32. - β Adds
tf.sparse.map_valuesto apply a function to the.values ofSparseTensorarguments. - The Python bitwise operators for
Tensor(__and__,__or__,__xor__and__invert__now support non-boolarguments and apply
π the corresponding bitwise ops.boolarguments continue to be supported and dispatch to logical ops. This brings them more in line with
Python and NumPy behavior. - β Adds
tf.SparseTensor.with_values. This returns a new SparseTensor with the same sparsity pattern, but with new provided values. It is
similar to thewith_valuesfunction ofRaggedTensor. - β Adds
`StatelessCase` op, and uses it if none of the case branches has stateful ops.
- Adds
tf.config.experimental.get_memory_usageto return total memory usage of the device. - β Adds gradients for
RaggedTensorToVariantandRaggedTensorFromVariant. - π Improve shape inference of nested function calls by supporting constant folding across Arg nodes which makes more static values available to shape inference functions.
tf.debugging:tf.debugging.assert_shapes()now works onSparseTensors (Fixes #36268).
- GPU
- Adds Support for TensorFloat-32 on Ampere based GPUs.
TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs which causes certain float32 ops, such as matrix
multiplications and convolutions, to run much faster on Ampere GPUs but with reduced precision. This reduced precision has not been found
to affect convergence quality of deep learning models in practice. TensorFloat-32 is enabled by default, but can be disabled with `tf.config.experimental.enable_tensor_float_32_execution`.
- Adds Support for TensorFloat-32 on Ampere based GPUs.
tf.math:- Adds
tf.math.erfcinv, the inverse totf.math.erfc.
- Adds
tf.nn:tf.nn.max_pool2dnow supports explicit padding.
tf.image:- Adds deterministic
tf.image.stateless_random_*functions for eachtf.image.random_*function. Added a new opstateless_sample_distorted_bounding_boxwhich is a deterministic version ofsample_distorted_bounding_boxop. Given the same seed, these stateless functions/ops produce the same results independent of how many times the function is called, and independent of global seed settings.
- Adds deterministic
- π¨
tf.print:- Bug fix in
`tf.print()` with `OrderedDict`: if an `OrderedDict` didn't have its keys sorted, the keys and values were not printed with their correct mapping.
- Bug fix in
tf.train.Checkpoint:- Now accepts a
rootargument in the initialization, which generates a checkpoint with a root object. This allows users to create aCheckpointobject that is compatible with Kerasmodel.save_weights()andmodel.load_weights. The checkpoint is also compatible with the checkpoint saved in thevariables/folder in the SavedModel. - When restoring,
save_pathcan be a path to a SavedModel. The function will automatically find the checkpoint in the SavedModel.
- Now accepts a
tf.data:- Adds new
tf.data.experimental.service.register_datasetandtf.data.experimental.service.from_dataset_idAPIs to enable one
π¨ process to register a dataset with the tf.data service, and another process to consume data from the dataset. - β Adds support for dispatcher fault tolerance. To enable fault tolerance, configure a
work_dirwhen running your dispatcher server and set
dispatcher_fault_tolerance=True. The dispatcher will store its state towork_dir, so that on restart it can continue from its previous
state.
- Adds support for sharing dataset graphs via a shared filesystem instead of over RPC. This reduces load on the dispatcher, improving performance
π· of distributing datasets. For this to work, the dispatcher'swork_dirmust be accessible from workers. If the worker fails to read from the
work_dir, it falls back to using RPC for dataset graph transfer. - β Adds support for a new "distributed_epoch" processing mode. This processing mode distributes a dataset across all tf.data workers,
π instead of having each worker process the full dataset. See the tf.data service docs to learn more. - Adds optional
exclude_colsparameter to CsvDataset. This parameter is the complement ofselect_cols; at most one of these should be specified. - We have implemented an optimization which reorders data-discarding transformations such as
takeandshardto happen earlier in the dataset when it is safe to do so. The optimization can be disabled via theexperimental_optimization.reorder_data_discarding_opsdataset option. tf.data.Optionswere previously immutable and can now be overridden.- π
tf.data.Dataset.from_generatornow supports Ragged and Sparse tensors with a newoutput_signatureargument, which allowsfrom_generatorto
produce any type describable by atf.TypeSpec. tf.data.experimental.AUTOTUNEis now available in the core API astf.data.AUTOTUNE.
tf.distribute:- π Introduces experimental support for asynchronous training of Keras models via
tf.distribute.experimental.ParameterServerStrategy:- Replaces the existing
tf.distribute.experimental.ParameterServerStrategysymbol with a new class that is for parameter server training in TF2. Usage of
the old symbol, usually with Estimator API, should be replaced with [tf.compat.v1.distribute.experimental.ParameterServerStrategy]. - Added
tf.distribute.experimental.coordinator.*namespace, including the main APIClusterCoordinatorfor coordinating the training cluster, the related data structureRemoteValueandPerWorkerValue.
- Replaces the existing
- β Adds
tf.distribute.Strategy.gatherandtf.distribute.ReplicaContext.all_gatherAPIs to support gathering dense distributed values. - π Fixes various issues with saving a distributed model.
tf.keras:- π Improvements from the Functional API refactoring:
- Functional model construction does not need to maintain a global workspace graph, removing memory leaks especially when building many
models or very large models. - Functional model construction should be ~8-10% faster on average.
- Functional models can now contain non-symbolic values in their call inputs inside of the first positional argument.
- Several classes of TF ops that were not reliably converted to Keras layers during functional API construction should now work, e.g.
tf.image.ssim_multiscale - Error messages when Functional API construction goes wrong (and when ops cannot be converted to Keras layers automatically) should be
clearer and easier to understand.
- Functional model construction does not need to maintain a global workspace graph, removing memory leaks especially when building many
- β‘οΈ
Optimizer.minimizecan now accept a lossTensorand aGradientTapeas an alternative to accepting acallableloss. - β Adds
betahyperparameter to FTRL optimizer classes (Keras and others) to match FTRL paper. Optimizer. __init__now accepts agradient_aggregatorto allow for customization of how gradients are aggregated across devices, as well as
gradients_transformersto allow for custom gradient transformations (such as gradient clipping).- π Improvements to Keras preprocessing layers:
- TextVectorization can now accept a vocabulary list or file as an init arg.
- Normalization can now accept mean and variance values as init args.
- In
AttentionandAdditiveAttentionlayers, thecall()method now accepts areturn_attention_scoresargument. When set to
True, the layer returns the attention scores as an additional output argument. - β Adds
tf.metrics.log_coshandtf.metrics.logcoshAPI entrypoints with the same implementation as theirtf.lossesequivalent. - For Keras model, the individual call of
Model.evaluateuses no cached data for evaluation, whileModel.fituses cached data when
πvalidation_dataarg is provided for better performance. - Adds a
save_tracesargument tomodel.save/tf.keras.models.save_modelwhich determines whether the SavedModel format stores the Keras model/layer call functions. The traced functions allow Keras to revive custom models and layers without the original class definition, but if this isn't required the tracing can be disabled with the added option. - The
`tf.keras.mixed_precision` API is now non-experimental. The
non-experimental API differs from the experimental API in several ways.tf.keras.mixed_precision.Policyno longer takes in a
tf.mixed_precision.experimental.LossScalein the constructor, and no
longer has aLossScaleassociated with it. Instead,Model.compile
β‘οΈ will automatically wrap the optimizer with aLossScaleOptimizerusing
dynamic loss scaling ifPolicy.nameis "mixed_float16".tf.keras.mixed_precision.LossScaleOptimizer's constructor takes in
different arguments. In particular, it no longer takes in aLossScale,
and there is no longer aLossScaleassociated with the
β‘οΈLossScaleOptimizer. Instead,LossScaleOptimizerdirectly implements
π fixed or dynamic loss scaling. See the documentation of
β‘οΈtf.keras.mixed_precision.experimental.LossScaleOptimizer
for details on the differences between the experimental
β‘οΈLossScaleOptimizerand the new non-experimentalLossScaleOptimizer.tf.mixed_precision.experimental.LossScaleand its subclasses are
π deprecated, as all of its functionality now exists within
β‘οΈtf.keras.mixed_precision.LossScaleOptimizer
tf.lite:TFLiteConverter:- Support optional flags
inference_input_typeandinference_output_typefor full integer quantized models. This allows users to modify the model input and output type to integer types (tf.int8,tf.uint8) instead of defaulting to float type (tf.float32).
- Support optional flags
- NNAPI
- Adds NNAPI Delegation support for requantization use cases by converting the operation into a dequantize-quantize pair.
- Removes deprecated
Interpreter.setUseNNAPI(boolean)Java API. UseInterpreter.Options.setUseNNAPIinstead. - Deprecates
Interpreter::UseNNAPI(bool)C++ API. UseNnApiDelegate()and related delegate configuration methods directly. - Deprecates
Interpreter::SetAllowFp16PrecisionForFp32(bool)C++ API. Prefer controlling this via delegate options, e.g.tflite::StatefulNnApiDelegate::Options::allow_fp16' orTfLiteGpuDelegateOptionsV2::is_precision_loss_allowed`.
- GPU
- GPU acceleration now supports quantized models by default
DynamicBuffer::AddJoinedString()will now add a separator if the first string to be joined is empty.- β Adds support for cumulative sum (cumsum), both as builtin op and MLIR conversion.
TensorRT- Issues a warning when the
session_configparameter for the TF1 converter is used or therewrite_config_templatefield in the TF2
converter parameter object is used.
TPU Enhancements:
- β Adds support for the
betaparameter of the FTRL optimizer for TPU embeddings. Users of other TensorFlow platforms can implement equivalent
behavior by adjusting thel2parameter.
XLA Support:
- π xla.experimental.compile is deprecated, use
tf.function(experimental_compile=True)instead. - Adds
tf.function.experimental_get_compiler_irwhich returns compiler IR (currently 'hlo' and 'optimized_hlo') for given input for given function.
Security:
- π Fixes an undefined behavior causing a segfault in
tf.raw_ops.Switch, (CVE-2020-15190) - π Fixes three vulnerabilities in conversion to DLPack format
- π Fixes two vulnerabilities in
SparseFillEmptyRowsGrad - π Fixes several vulnerabilities in
RaggedCountSparseOutputandSparseCountSparseOutputoperations - π Fixes an integer truncation vulnerability in code using the work sharder API, (CVE-2020-15202)
- π Fixes a format string vulnerability in
tf.strings.as_string, (CVE-2020-15203) - π Fixes segfault raised by calling session-only ops in eager mode, (CVE-2020-15204)
- π Fixes data leak and potential ASLR violation from
tf.raw_ops.StringNGrams, (CVE-2020-15205) - π Fixes segfaults caused by incomplete
SavedModelvalidation, (CVE-2020-15206) - π Fixes a data corruption due to a bug in negative indexing support in TFLite, (CVE-2020-15207)
- π Fixes a data corruption due to dimension mismatch in TFLite, (CVE-2020-15208)
- π Fixes several vulnerabilities in TFLite saved model format
- π Fixes several vulnerabilities in TFLite implementation of segment sum
- Fixes a segfault in
tf.quantization.quantize_and_dequantize, (CVE-2020-15265) - π Fixes an undefined behavior float cast causing a crash, (CVE-2020-15266)
Other:
- π We have replaced uses of "whitelist" and "blacklist" with "allowlist" and "denylist" where possible. Please see this list for more context.
- Adds
tf.config.experimental.mlir_bridge_rolloutwhich will help us rollout the new MLIR TPU bridge. - Adds
tf.experimental.register_filesystem_pluginto load modular filesystem plugins from Python
Thanks to our Contributors
π This release contains contributions from many people at Google and external contributors.
8bitmp3, aaa.jq, Abhineet Choudhary, Abolfazl Shahbazi, acxz, Adam Hillier, Adrian Garcia Badaracco, Ag Ramesh, ahmedsabie, Alan Anderson, Alexander Grund, Alexandre Lissy, Alexey Ivanov, Amedeo Cavallo, anencore94, Aniket Kumar Singh, Anthony Platanios, Ashwin Phadke, Balint Cristian, Basit Ayantunde, bbbboom, Ben Barsdell, Benjamin Chetioui, Benjamin Peterson, bhack, Bhanu Prakash Bandaru Venkata, Biagio Montaruli, Brent M. Spell, bubblebooy, bzhao, cfRod, Cheng Chen, Cheng(Kit) Chen, Chris Tessum, Christian, chuanqiw, codeadmin_peritiae, COTASPAR, CuiYifeng, danielknobe, danielyou0230, dannyfriar, daria, DarrenZhang01, Denisa Roberts, dependabot[bot], Deven Desai, Dmitry Volodin, Dmitry Zakharov, drebain, Duncan Riach, Eduard Feicho, Ehsan Toosi, Elena Zhelezina, emlaprise2358, Eugene Kuznetsov, Evaderan-Lab, Evgeniy Polyakov, Fausto Morales, Felix Johnny, fo40225, Frederic Bastien, Fredrik Knutsson, fsx950223, Gaurav Singh, Gauri1 Deshpande, George Grzegorz Pawelczak, gerbauz, Gianluca Baratti, Giorgio Arena, Gmc2, Guozhong Zhuang, Hannes Achleitner, Harirai, HarisWang, Harsh188, hedgehog91, Hemal Mamtora, Hideto Ueno, Hugh Ku, Ian Beauregard, Ilya Persky, jacco, Jakub BerΓ‘nek, Jan Jongboom, Javier Montalt Tordera, Jens Elofsson, Jerry Shih, jerryyin, jgehw, Jinjing Zhou, jma, jmsmdy, Johan NordstrΓΆm, John Poole, Jonah Kohn, Jonathan Dekhtiar, jpodivin, Jung Daun, Kai Katsumata, Kaixi Hou, Kamil Rakoczy, Kaustubh Maske Patil, Kazuaki Ishizaki, Kedar Sovani, Koan-Sin Tan, Koki Ibukuro, Krzysztof Laskowski, Kushagra Sharma, Kushan Ahmadian, Lakshay Tokas, Leicong Li, levinxo, Lukas Geiger, Maderator, Mahmoud Abuzaina, Mao Yunfei, Marius Brehler, markf, Martin Hwasser, Martin KubovΔΓk, Matt Conley, Matthias, mazharul, mdfaijul, Michael137, MichelBr, Mikhail Startsev, Milan Straka, Ml-0, Myung-Hyun Kim, MΓ₯ns Nilsson, Nathan Luehr, ngc92, nikochiko, Niranjan Hasabnis, nyagato_00, Oceania2018, Oleg Guba, Ongun Kanat, OscarVanL, Patrik Laurell, Paul Tanger, Peter Sobot, Phil Pearl, PlusPlusUltra, Poedator, Prasad Nikam, Rahul-Kamat, Rajeshwar Reddy T, redwrasse, Rickard, Robert Szczepanski, Rohan Lekhwani, Sam Holt, Sami Kama, Samuel Holt, Sandeep Giri, sboshin, Sean Settle, settle, Sharada Shiddibhavi, Shawn Presser, ShengYang1, Shi,Guangyong, Shuxiang Gao, Sicong Li, Sidong-Wei, Srihari Humbarwadi, Srinivasan Narayanamoorthy, Steenu Johnson, Steven Clarkson, stjohnso98, Tamas Bela Feher, Tamas Nyiri, Tarandeep Singh, Teng Lu, Thibaut Goetghebuer-Planchon, Tim Bradley, Tomasz Strejczek, Tongzhou Wang, Torsten Rudolf, Trent Lo, Ty Mick, Tzu-Wei Sung, Varghese, Jojimon, Vignesh Kothapalli, Vishakha Agrawal, Vividha, Vladimir Menshakov, Vladimir Silyaev, VoVAllen, VΓ΅ VΔn NghΔ©a, wondertx, xiaohong1031, Xiaoming (Jason) Cui, Xinan Jiang, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yimei Sun, Yiwen Li, Yixing, Yoav Ramon, Yong Tang, Yong Wu, yuanbopeng, Yunmo Koo, Zhangqiang, Zhou Peng, ZhuBaohe, zilinzhu, zmx
v2.4.0-rc0 Changes
November 02, 2020
Release 2.4.0
Major Features and Improvements
π
tf.distributeintroduces experimental support for asynchronous training of Keras models via thetf.distribute.experimental.ParameterServerStrategyAPI. Please see below for additional details.π
MultiWorkerMirroredStrategyis now a stable API and is no longer considered experimental. Some of the major improvements involve handling peer failure and many bug fixes. Please check out the detailed tutorial on Multi-worker training with Keras.π Introduces experimental support for a new module named
tf.experimental.numpywhich is a NumPy-compatible API for writing TF programs. See the detailed guide to learn more. Additional details below.β Adds Support for TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs and is enabled by default.
π A major refactoring of the internals of the Keras Functional API has been completed, that should improve the reliability, stability, and performance of constructing Functional models.
Keras mixed precision API
`tf.keras.mixed_precision` is no longer experimental and allows the use of 16-bit floating point formats during training, improving performance by up to 3x on GPUs and 60% on TPUs.

TF Profiler now supports profiling multiple workers using the sampling mode API.

TFLite Profiler for Android is available. See the detailed guide to learn more.

TensorFlow pip packages are now built with CUDA 11 and cuDNN 8.0.2.
Breaking Changes
TF Core:
- The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of
tensorflow::tstring/TF_TStrings. - C-API functions
`TF_StringDecode`, `TF_StringEncode`, and `TF_StringEncodedSize` are no longer relevant and have been removed; see `core/platform/ctstring.h` for string access/modification in C.
- `tensorflow.python`, `tensorflow.core` and `tensorflow.compiler` modules are now hidden. These modules are not part of the TensorFlow public API.
- `tf.raw_ops.Max` and `tf.raw_ops.Min` no longer accept inputs of type `tf.complex64` or `tf.complex128`, because the behavior of these ops is not well defined for complex types.
- Certain float32 ops run in lower precision on Ampere based GPUs, including matmuls and convolutions, due to the use of TensorFloat-32. Specifically, inputs to such ops are rounded from 23 bits of precision to 10
bits of precision. This is unlikely to cause issues in practice for deep learning models. In some cases, TensorFloat-32 is also used for complex64 ops.
TensorFloat-32 can be disabled by running `tf.config.experimental.enable_tensor_float_32_execution(False)`.
- XLA:CPU and XLA:GPU devices are no longer registered by default. Use
TF_XLA_FLAGS=--tf_xla_enable_xla_devicesif you really need them, but this flag will eventually be removed in subsequent releases.tf.keras:- The
steps_per_executionargument incompile()is no longer experimental; if you were passingexperimental_steps_per_execution, rename it tosteps_per_executionin your code. This argument controls the number of batches to run during eachtf.functioncall when callingfit(). Running multiple batches inside a singletf.functioncall can greatly improve performance on TPUs or small models with a large Python overhead. - A major refactoring of the internals of the Keras Functional API may affect code that
is relying on certain internal details:- Code that uses
isinstance(x, tf.Tensor)instead oftf.is_tensorwhen checking Keras symbolic inputs/outputs should switch to usingtf.is_tensor. - Code that is overly dependent on the exact names attached to symbolic tensors (e.g. assumes there will be ":0" at the end of the inputs, treats names as unique identifiers instead of using
tensor.ref(), etc.) - Code that uses
get_concrete_functionto trace Keras symbolic inputs directly should switch to building matchingtf.TensorSpecs directly and tracing theTensorSpecobjects. - Code that relies on the exact number and names of the op layers that TensorFlow operations were converted into may have changed.
- Code that uses
tf.map_fn/tf.cond/tf.while_loop/control flow as op layers and happens to work before TF 2.4. These will explicitly be unsupported now. Converting these ops to Functional API op layers was unreliable before TF 2.4, and prone to erroring incomprehensibly or being silently buggy. - Code that directly asserts on a Keras symbolic value in cases where ops like
tf.rankused to return a static or symbolic value depending on if the input had a fully static shape or not. Now these ops always return symbolic values. - Code already susceptible to leaking tensors outside of graphs becomes slightly more likely to do so now.
- Code that tries directly getting gradients with respect to symbolic Keras inputs/outputs. Use GradientTape on the actual Tensors passed to the already- constructed model instead.
- Code that requires very tricky shape manipulation via converted op layers in order to work, where the Keras symbolic shape inference proves insufficient.
- Code that tries manually walking a
tf.keras.Modellayer by layer and assumes layers only ever have one positional argument. This assumption doesn't hold true before TF 2.4 either, but is more likely to cause issues now. - Code that manually enters
keras.backend.get_graph()before building a functional model is no longer needed. - Start enforcing input shape assumptions when calling Functional API Keras models. This may potentially break some users, in case there is a mismatch between the shape used when creating
Inputobjects in a Functional model, and the shape of the data passed to that model. You can fix this mismatch by either calling the model with correctly-shaped data, or by relaxingInputshape assumptions (note that you can pass shapes withNoneentries for axes
that are meant to be dynamic). You can also disable the input checking entirely by settingmodel.input_spec = None.
- Code that uses
tf.data:tf.data.experimental.service.DispatchServernow takes a config tuple instead of individual arguments. Usages should be updated totf.data.experimental.service.DispatchServer(dispatcher_config).
-
tf.data.experimental.service.WorkerServernow takes a config tuple instead of individual arguments. Usages should be updated totf.data.experimental.service.WorkerServer(worker_config).tf.distribute:- Removes
tf.distribute.Strategy.experimental_make_numpy_dataset. Please usetf.data.Dataset.from_tensor_slicesinstead. - Renames
experimental_hintsintf.distribute.StrategyExtended.reduce_to,tf.distribute.StrategyExtended.batch_reduce_to,tf.distribute.ReplicaContext.all_reducetooptions: - Renames
tf.distribute.experimental.CollectiveHintstotf.distribute.experimental.CommunicationOptions. - Renames
tf.distribute.experimental.CollectiveCommunicationtotf.distribute.experimental.CommunicationImplementation. - Renames
tf.distribute.Strategy.experimental_distribute_datasets_from_functiontodistribute_datasets_from_functionas it is no longer experimental.
- Removes
tf.distribute.Strategy.experimental_run_v2method, which was deprecated in TF 2.2.tf.lite:tf.quantization.quantize_and_dequantize_v2has been introduced, which updates the gradient definition for quantization which is outside the range
to be 0. To simulate the V1 behavior of `tf.quantization.quantize_and_dequantize(...)`, use `tf.grad_pass_through(tf.quantization.quantize_and_dequantize_v2)(...)`.
Bug Fixes and Other Changes
TF Core:
- π Introduces experimental support for a new module named
tf.experimental.numpy, which
is a NumPy-compatible API for writing TF programs. This module provides classndarray, which mimics thendarrayclass in NumPy, and wraps an immutabletf.Tensorunder the hood. A subset of NumPy functions (e.g.numpy.add) are provided. Their inter-operation with TF facilities is seamless in most cases.
See tensorflow/python/ops/numpy_ops/README.md
π for details of what operations are supported and what are the differences from NumPy. tf.types.experimental.TensorLikeis a newUniontype that can be used as type annotation for variables representing a Tensor or a value
that can be converted to a Tensor by `tf.convert_to_tensor`.
- Calling ops with Python constants or NumPy values is now consistent with `tf.convert_to_tensor` behavior. This avoids operations like
tf.reshape truncating inputs such as from int64 to int32. - β Adds
tf.sparse.map_valuesto apply a function to the.values ofSparseTensorarguments. - The Python bitwise operators for
Tensor(__and__,__or__,__xor__and__invert__now support non-boolarguments and apply
π the corresponding bitwise ops.boolarguments continue to be supported and dispatch to logical ops. This brings them more in line with
Python and NumPy behavior. - β Adds
tf.SparseTensor.with_values. This returns a new SparseTensor with the same sparsity pattern, but with new provided values. It is
similar to thewith_valuesfunction ofRaggedTensor. - β Adds
`StatelessCase` op, and uses it if none of the case branches has stateful ops.
- Adds
tf.config.experimental.get_memory_usageto return total memory usage of the device. - β Adds gradients for
RaggedTensorToVariantandRaggedTensorFromVariant. - π Improve shape inference of nested function calls by supporting constant folding across Arg nodes which makes more static values available to shape inference functions.
`tf.debugging`:
- `tf.debugging.assert_shapes()` now works on `SparseTensor`s (fixes #36268).

GPU:
- Adds support for TensorFloat-32 on Ampere-based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere-based GPUs which causes certain float32 ops, such as matrix multiplications and convolutions, to run much faster on Ampere GPUs but with reduced precision. This reduced precision has not been found to affect the convergence quality of deep learning models in practice. TensorFloat-32 is enabled by default but can be disabled with `tf.config.experimental.enable_tensor_float_32_execution` (see the sketch below).
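A minimal sketch of toggling and checking TF32 execution:

```python
import tensorflow as tf

# TF32 is on by default on Ampere GPUs; disable it to run matmuls and
# convolutions in full float32 precision.
tf.config.experimental.enable_tensor_float_32_execution(False)
print(tf.config.experimental.tensor_float_32_execution_enabled())  # False
```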
`tf.math`:
- Adds `tf.math.erfcinv`, the inverse of `tf.math.erfc`.

`tf.nn`:
- `tf.nn.max_pool2d` now supports explicit padding.

`tf.image`:
- Adds deterministic `tf.image.stateless_random_*` functions for each `tf.image.random_*` function. Also adds a new op, `stateless_sample_distorted_bounding_box`, a deterministic version of the `sample_distorted_bounding_box` op. Given the same seed, these stateless functions/ops produce the same results independent of how many times the function is called, and independent of global seed settings. A short example follows this list.
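A minimal illustration of the stateless image ops (the image shape and seed are illustrative):

```python
import tensorflow as tf

image = tf.zeros([64, 64, 3])
seed = (1, 2)  # stateless ops take an explicit 2-element seed

# Same seed -> same result, regardless of call count or global seeds.
a = tf.image.stateless_random_flip_left_right(image, seed=seed)
b = tf.image.stateless_random_brightness(image, max_delta=0.2, seed=seed)
```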
`tf.print`:
- Bug fix in `tf.print()` with `OrderedDict`: if an `OrderedDict` didn't have its keys sorted, the keys and values were not being printed in accordance with their correct mapping.

`tf.train.Checkpoint`:
- Now accepts a `root` argument in the initialization, which generates a checkpoint with a root object. This allows users to create a `Checkpoint` object that is compatible with Keras `model.save_weights()` and `model.load_weights`. The checkpoint is also compatible with the checkpoint saved in the `variables/` folder in the SavedModel.
- When restoring, `save_path` can be a path to a SavedModel. The function will automatically find the checkpoint in the SavedModel. A short sketch follows this list.
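A minimal sketch of the new `root` argument and SavedModel-aware restore (the paths are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

# A checkpoint rooted at the model is weight-compatible with
# model.save_weights() / model.load_weights().
ckpt = tf.train.Checkpoint(root=model)
ckpt.save("/tmp/ckpt/weights")

# restore() also accepts a SavedModel directory and will locate the
# checkpoint stored under its variables/ folder.
tf.saved_model.save(model, "/tmp/saved_model")
ckpt.restore("/tmp/saved_model")
```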
`tf.data`:
- Adds new `tf.data.experimental.service.register_dataset` and `tf.data.experimental.service.from_dataset_id` APIs to enable one process to register a dataset with the tf.data service and another process to consume data from the dataset (see the sketch after this list).
- Adds support for dispatcher fault tolerance. To enable fault tolerance, configure a `work_dir` when running your dispatcher server and set `dispatcher_fault_tolerance=True`. The dispatcher will store its state to `work_dir` so that on restart it can continue from its previous state.
- Adds support for sharing dataset graphs via a shared filesystem instead of over RPC. This reduces load on the dispatcher, improving the performance of distributing datasets. For this to work, the dispatcher's `work_dir` must be accessible from workers. If a worker fails to read from the `work_dir`, it falls back to using RPC for dataset graph transfer.
- Adds support for a new "distributed_epoch" processing mode. This processing mode distributes a dataset across all tf.data workers, instead of having each worker process the full dataset. See the tf.data service docs to learn more.
- Adds an optional `exclude_cols` parameter to `CsvDataset`. This parameter is the complement of `select_cols`; at most one of these should be specified.
- Implements an optimization that reorders data-discarding transformations such as `take` and `shard` to happen earlier in the dataset when it is safe to do so. The optimization can be disabled via the `experimental_optimization.reorder_data_discarding_ops` dataset option.
- `tf.data.Options` were previously immutable and can now be overridden.
- `tf.data.Dataset.from_generator` now supports Ragged and Sparse tensors with a new `output_signature` argument, which allows `from_generator` to produce any type describable by a `tf.TypeSpec`.
- `tf.data.experimental.AUTOTUNE` is now available in the core API as `tf.data.AUTOTUNE`.
`tf.distribute`:
- Introduces experimental support for asynchronous training of Keras models via `tf.distribute.experimental.ParameterServerStrategy`:
  - Replaces the existing `tf.distribute.experimental.ParameterServerStrategy` symbol with a new class intended for parameter server training in TF2. Usage of the old symbol, usually with Estimator API, should be replaced with `tf.compat.v1.distribute.experimental.ParameterServerStrategy`.
  - Adds the `tf.distribute.experimental.coordinator.*` namespace, including the main API `ClusterCoordinator` for coordinating the training cluster, and the related data structures `RemoteValue` and `PerWorkerValue`. A short sketch follows this list.
- Adds `tf.distribute.Strategy.gather` and `tf.distribute.ReplicaContext.all_gather` APIs to support gathering dense distributed values.
- Fixes various issues with saving a distributed model.
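A hedged sketch of the coordinator-based parameter server training flow; it assumes a cluster is configured via `TF_CONFIG` and the training body is elided:

```python
import tensorflow as tf

# Assumes TF_CONFIG describes the chief/worker/ps cluster.
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])

@tf.function
def train_step():
    # ... per-worker training logic elided ...
    return 1.0

# schedule() returns a RemoteValue; join() blocks until scheduled work is done.
remote_value = coordinator.schedule(train_step)
coordinator.join()
```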
`tf.keras`:
- Improvements from the Functional API refactoring:
  - Functional model construction no longer needs to maintain a global workspace graph, removing memory leaks especially when building many models or very large models.
  - Functional model construction should be ~8-10% faster on average.
  - Functional models can now contain non-symbolic values in their call inputs inside of the first positional argument.
  - Several classes of TF ops that were not reliably converted to Keras layers during functional API construction should now work, e.g. `tf.image.ssim_multiscale`.
  - Error messages when Functional API construction goes wrong (and when ops cannot be converted to Keras layers automatically) should be clearer and easier to understand.
- `Optimizer.minimize` can now accept a loss `Tensor` and a `GradientTape` as an alternative to accepting a `callable` loss (see the sketch after this list).
- Adds a `beta` hyperparameter to FTRL optimizer classes (Keras and others) to match the FTRL paper.
- `Optimizer.__init__` now accepts a `gradient_aggregator` to allow for customization of how gradients are aggregated across devices, as well as `gradients_transformers` to allow for custom gradient transformations (such as gradient clipping).
- Improvements to Keras preprocessing layers:
  - `TextVectorization` can now accept a vocabulary list or file as an init arg.
  - `Normalization` can now accept mean and variance values as init args.
- In `Attention` and `AdditiveAttention` layers, the `call()` method now accepts a `return_attention_scores` argument. When set to `True`, the layer returns the attention scores as an additional output argument.
- Adds `tf.metrics.log_cosh` and `tf.metrics.logcosh` API entry points with the same implementation as their `tf.losses` equivalents.
- For Keras models, an individual call of `Model.evaluate` uses no cached data for evaluation, while `Model.fit` uses cached data when the `validation_data` arg is provided, for better performance.
- Adds a `save_traces` argument to `model.save`/`tf.keras.models.save_model`, which determines whether the SavedModel format stores the Keras model/layer call functions. The traced functions allow Keras to revive custom models and layers without the original class definition, but if this isn't required the tracing can be disabled with the added option.
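A minimal sketch of the new `Optimizer.minimize` overload that takes a loss tensor plus a tape (the model and data are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
x, y = tf.random.normal([8, 3]), tf.random.normal([8, 1])

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))  # a loss Tensor, not a callable

# Previously minimize() required a callable loss; it now also accepts a
# computed loss Tensor together with the GradientTape that recorded it.
opt.minimize(loss, model.trainable_variables, tape=tape)
```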
`tf.lite`:
- `TFLiteConverter`:
  - Supports optional flags `inference_input_type` and `inference_output_type` for full-integer quantized models. This allows users to modify the model input and output type to integer types (`tf.int8`, `tf.uint8`) instead of defaulting to float type (`tf.float32`). A conversion sketch follows this list.
- NNAPI:
  - Adds NNAPI delegation support for requantization use cases by converting the operation into a dequantize-quantize pair.
  - Removes the deprecated `Interpreter.setUseNNAPI(boolean)` Java API. Use `Interpreter.Options.setUseNNAPI` instead.
  - Deprecates the `Interpreter::UseNNAPI(bool)` C++ API. Use `NnApiDelegate()` and related delegate configuration methods directly.
  - Deprecates the `Interpreter::SetAllowFp16PrecisionForFp32(bool)` C++ API. Prefer controlling this via delegate options, e.g. `tflite::StatefulNnApiDelegate::Options::allow_fp16` or `TfLiteGpuDelegateOptionsV2::is_precision_loss_allowed`.
- GPU:
  - GPU acceleration now supports quantized models by default.
- `DynamicBuffer::AddJoinedString()` will now add a separator if the first string to be joined is empty.
- Adds support for cumulative sum (`cumsum`), both as a builtin op and an MLIR conversion.
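A hedged sketch of full-integer quantization with integer I/O; the SavedModel path, input shape, and representative dataset are assumed placeholders:

```python
import tensorflow as tf

def representative_dataset():
    # Assumed calibration data matching the model's input shape.
    for _ in range(100):
        yield [tf.random.normal([1, 224, 224, 3])]

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# New in this release: integer input/output instead of float32.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```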
TensorRT:
- Issues a warning when the `session_config` parameter for the TF1 converter is used, or when the `rewrite_config_template` field in the TF2 converter parameter object is used.

TPU Enhancements:
- Adds support for the `beta` parameter of the FTRL optimizer for TPU embeddings. Users of other TensorFlow platforms can implement equivalent behavior by adjusting the `l2` parameter.
XLA Support:
- `xla.experimental.compile` is deprecated; use `tf.function(experimental_compile=True)` instead (see the sketch below).
- Adds `tf.function.experimental_get_compiler_ir`, which returns the compiler IR (currently 'hlo' and 'optimized_hlo') for a given input to a given function.
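A minimal sketch of XLA-compiling a function and inspecting its IR:

```python
import tensorflow as tf

@tf.function(experimental_compile=True)  # replaces xla.experimental.compile
def f(x):
    return tf.math.square(x) + 1.0

x = tf.ones([4])
print(f(x))

# Inspect the compiler IR for this input signature.
print(f.experimental_get_compiler_ir(x)(stage="hlo"))
```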
Security:
- Fixes an undefined behavior causing a segfault in `tf.raw_ops.Switch` (CVE-2020-15190)
- Fixes three vulnerabilities in conversion to DLPack format
- Fixes two vulnerabilities in `SparseFillEmptyRowsGrad`
- Fixes several vulnerabilities in `RaggedCountSparseOutput` and `SparseCountSparseOutput` operations
- Fixes an integer truncation vulnerability in code using the work sharder API (CVE-2020-15202)
- Fixes a format string vulnerability in `tf.strings.as_string` (CVE-2020-15203)
- Fixes a segfault raised by calling session-only ops in eager mode (CVE-2020-15204)
- Fixes a data leak and potential ASLR violation from `tf.raw_ops.StringNGrams` (CVE-2020-15205)
- Fixes segfaults caused by incomplete `SavedModel` validation (CVE-2020-15206)
- Fixes data corruption due to a bug in negative indexing support in TFLite (CVE-2020-15207)
- Fixes data corruption due to a dimension mismatch in TFLite (CVE-2020-15208)
- Fixes several vulnerabilities in the TFLite saved model format
- Fixes several vulnerabilities in the TFLite implementation of segment sum
- Fixes a segfault in `tf.quantization.quantize_and_dequantize` (CVE-2020-15265)
- Fixes an undefined behavior float cast causing a crash (CVE-2020-15266)
Other:
- We have replaced uses of "whitelist" and "blacklist" with "allowlist" and "denylist" where possible. Please see this list for more context.
- Adds `tf.config.experimental.mlir_bridge_rollout`, which will help us roll out the new MLIR TPU bridge.
- Adds `tf.experimental.register_filesystem_plugin` to load modular filesystem plugins from Python.
Thanks to our Contributors
This release contains contributions from many people at Google and external contributors.
8bitmp3, aaa.jq, Abhineet Choudhary, Abolfazl Shahbazi, acxz, Adam Hillier, Adrian Garcia Badaracco, Ag Ramesh, ahmedsabie, Alan Anderson, Alexander Grund, Alexandre Lissy, Alexey Ivanov, Amedeo Cavallo, anencore94, Aniket Kumar Singh, Anthony Platanios, Ashwin Phadke, Balint Cristian, Basit Ayantunde, bbbboom, Ben Barsdell, Benjamin Chetioui, Benjamin Peterson, bhack, Bhanu Prakash Bandaru Venkata, Biagio Montaruli, Brent M. Spell, bubblebooy, bzhao, cfRod, Cheng Chen, Cheng(Kit) Chen, Chris Tessum, Christian, chuanqiw, codeadmin_peritiae, COTASPAR, CuiYifeng, danielknobe, danielyou0230, dannyfriar, daria, DarrenZhang01, Denisa Roberts, dependabot[bot], Deven Desai, Dmitry Volodin, Dmitry Zakharov, drebain, Duncan Riach, Eduard Feicho, Ehsan Toosi, Elena Zhelezina, emlaprise2358, Eugene Kuznetsov, Evaderan-Lab, Evgeniy Polyakov, Fausto Morales, Felix Johnny, fo40225, Frederic Bastien, Fredrik Knutsson, fsx950223, Gaurav Singh, Gauri1 Deshpande, George Grzegorz Pawelczak, gerbauz, Gianluca Baratti, Giorgio Arena, Gmc2, Guozhong Zhuang, Hannes Achleitner, Harirai, HarisWang, Harsh188, hedgehog91, Hemal Mamtora, Hideto Ueno, Hugh Ku, Ian Beauregard, Ilya Persky, jacco, Jakub BerΓ‘nek, Jan Jongboom, Javier Montalt Tordera, Jens Elofsson, Jerry Shih, jerryyin, jgehw, Jinjing Zhou, jma, jmsmdy, Johan NordstrΓΆm, John Poole, Jonah Kohn, Jonathan Dekhtiar, jpodivin, Jung Daun, Kai Katsumata, Kaixi Hou, Kamil Rakoczy, Kaustubh Maske Patil, Kazuaki Ishizaki, Kedar Sovani, Koan-Sin Tan, Koki Ibukuro, Krzysztof Laskowski, Kushagra Sharma, Kushan Ahmadian, Lakshay Tokas, Leicong Li, levinxo, Lukas Geiger, Maderator, Mahmoud Abuzaina, Mao Yunfei, Marius Brehler, markf, Martin Hwasser, Martin KubovΔΓk, Matt Conley, Matthias, mazharul, mdfaijul, Michael137, MichelBr, Mikhail Startsev, Milan Straka, Ml-0, Myung-Hyun Kim, MΓ₯ns Nilsson, Nathan Luehr, ngc92, nikochiko, Niranjan Hasabnis, nyagato_00, Oceania2018, Oleg Guba, Ongun Kanat, OscarVanL, Patrik Laurell, Paul Tanger, Peter Sobot, Phil Pearl, PlusPlusUltra, Poedator, Prasad Nikam, Rahul-Kamat, Rajeshwar Reddy T, redwrasse, Rickard, Robert Szczepanski, Rohan Lekhwani, Sam Holt, Sami Kama, Samuel Holt, Sandeep Giri, sboshin, Sean Settle, settle, Sharada Shiddibhavi, Shawn Presser, ShengYang1, Shi,Guangyong, Shuxiang Gao, Sicong Li, Sidong-Wei, Srihari Humbarwadi, Srinivasan Narayanamoorthy, Steenu Johnson, Steven Clarkson, stjohnso98, Tamas Bela Feher, Tamas Nyiri, Tarandeep Singh, Teng Lu, Thibaut Goetghebuer-Planchon, Tim Bradley, Tomasz Strejczek, Tongzhou Wang, Torsten Rudolf, Trent Lo, Ty Mick, Tzu-Wei Sung, Varghese, Jojimon, Vignesh Kothapalli, Vishakha Agrawal, Vividha, Vladimir Menshakov, Vladimir Silyaev, VoVAllen, VΓ΅ VΔn NghΔ©a, wondertx, xiaohong1031, Xiaoming (Jason) Cui, Xinan Jiang, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yimei Sun, Yiwen Li, Yixing, Yoav Ramon, Yong Tang, Yong Wu, yuanbopeng, Yunmo Koo, Zhangqiang, Zhou Peng, ZhuBaohe, zilinzhu, zmx
-
v2.3.1 Changes
September 24, 2020: Release 2.3.1
Bug Fixes and Other Changes
- Fixes an undefined behavior causing a segfault in `tf.raw_ops.Switch` (CVE-2020-15190)
- Fixes three vulnerabilities in conversion to DLPack format (CVE-2020-15191, CVE-2020-15192, CVE-2020-15193)
- Fixes two vulnerabilities in `SparseFillEmptyRowsGrad` (CVE-2020-15194, CVE-2020-15195)
- Fixes several vulnerabilities in `RaggedCountSparseOutput` and `SparseCountSparseOutput` operations (CVE-2020-15196, CVE-2020-15197, CVE-2020-15198, CVE-2020-15199, CVE-2020-15200, CVE-2020-15201)
- Fixes an integer truncation vulnerability in code using the work sharder API (CVE-2020-15202)
- Fixes a format string vulnerability in `tf.strings.as_string` (CVE-2020-15203)
- Fixes a segfault raised by calling session-only ops in eager mode (CVE-2020-15204)
- Fixes a data leak and potential ASLR violation from `tf.raw_ops.StringNGrams` (CVE-2020-15205)
- Fixes segfaults caused by incomplete `SavedModel` validation (CVE-2020-15206)
- Fixes data corruption due to a bug in negative indexing support in TFLite (CVE-2020-15207)
- Fixes data corruption due to a dimension mismatch in TFLite (CVE-2020-15208)
- Fixes several vulnerabilities in the TFLite saved model format (CVE-2020-15209, CVE-2020-15210, CVE-2020-15211)
- Fixes several vulnerabilities in the TFLite implementation of segment sum (CVE-2020-15212, CVE-2020-15213, CVE-2020-15214)
- Updates `sqlite3` to `3.33.00` to handle CVE-2020-15358.
- Fixes deprecated usage of the `collections` API.
- Removes the `scipy` dependency from `setup.py` since TensorFlow does not need it to install the pip package.
-
v2.3.0 Changes
July 27, 2020: Release 2.3.0
Major Features and Improvements
- `tf.data` adds two new mechanisms to solve input pipeline bottlenecks and save resources. In addition, check out the detailed guide for analyzing input pipeline performance with TF Profiler.
- `tf.distribute.TPUStrategy` is now a stable API and no longer considered experimental for TensorFlow (formerly `tf.distribute.experimental.TPUStrategy`).
- TF Profiler introduces two new tools: a memory profiler to visualize your model's memory usage over time, and a Python tracer that allows you to trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity level.
- Introduces experimental support for the Keras Preprocessing Layers API (`tf.keras.layers.experimental.preprocessing.*`) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
- TFLite now properly supports dynamic shapes during conversion and inference. We've also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
- Libtensorflow packages are available in GCS starting this release. We have also started to release a nightly version of these packages.
- The experimental Python API `tf.debugging.experimental.enable_dump_debug_info()` now allows you to instrument a TensorFlow program and dump debugging information to a directory on the file system. The directory can be read and visualized by a new interactive dashboard in TensorBoard 2.3 called Debugger V2, which reveals the details of the TensorFlow program, including graph structures, history of op executions at the Python (eager) and intra-graph levels, the runtime dtype, shape, and numerical composition of tensors, as well as their code locations.

Breaking Changes
- Increases the minimum bazel version required to build TF to 3.1.0.
- `tf.data`:
  - Makes the following (breaking) changes to the `tf.data` C++ API:
    - `IteratorBase::RestoreInternal`, `IteratorBase::SaveInternal`, and `DatasetBase::CheckExternalState` become pure-virtual, and subclasses are now expected to provide an implementation.
    - The deprecated `DatasetBase::IsStateful` method is removed in favor of `DatasetBase::CheckExternalState`.
    - Deprecated overrides of `DatasetBase::MakeIterator` and `MakeIteratorFromInputElement` are removed.
  - The signatures of `tensorflow::data::IteratorBase::SaveInternal` and `tensorflow::data::IteratorBase::SaveInput` have been extended with a `SerializationContext` argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backwards-compatible change, and all subclasses of `IteratorBase` need to be updated accordingly.
- `tf.keras`:
  - Adds a new `BackupAndRestore` callback for handling distributed training failures and restarts. Please take a look at this tutorial for details on how to use the callback. A short sketch follows this section.
- `tf.image.extract_glimpse` has been updated to correctly process the case where `centered=False` and `normalized=False`. This is a breaking change, as the output is different from the (incorrect) previous versions. Note this breaking change only impacts the `tf.image.extract_glimpse` and `tf.compat.v2.image.extract_glimpse` API endpoints. The behavior of `tf.compat.v1.image.extract_glimpse` does not change. The behavior of the existing C++ kernel `ExtractGlimpse` does not change either, so saved models using `tf.raw_ops.ExtractGlimpse` will not be impacted.
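A minimal sketch of the new fault-tolerance callback; the strategy, data, and backup directory are illustrative:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")

# On restart after an interruption, fit() resumes from the last backed-up epoch.
backup = tf.keras.callbacks.experimental.BackupAndRestore(backup_dir="/tmp/backup")
x, y = tf.random.normal([32, 4]), tf.random.normal([32, 1])
model.fit(x, y, epochs=3, callbacks=[backup])
```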
Known Caveats
- `tf.lite`:
  - Keras-based LSTM models must be converted with an explicit batch size in the input layer.
Bug Fixes and Other Changes

TF Core:
- Set `tf2_behavior` to 1 to enable V2 for early loading cases.
- Add an `execute_fn_for_device` function to dynamically choose the implementation based on underlying device placement.
- Eager:
  - Add a `reduce_logsumexp` benchmark with experimental compile.
  - Give `EagerTensor`s a meaningful `__array__` implementation.
  - Add another version of defun matmul for performance analysis.
- `tf.function`/AutoGraph:
  - AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
  - Functions returned by the `get_concrete_function` method of `tf.function` objects can now be called with arguments consistent with the original arguments or type specs passed to `get_concrete_function`. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors. Please check the guide for more details on `concrete_function`.
  - Update `tf.function`'s `experimental_relax_shapes` to handle composite tensors appropriately.
  - Optimize `tf.function` invocation by removing a redundant list converter.
  - `tf.function` will retrace when called with a different variable instead of simply using its `dtype` and `shape`.
  - Improve support for dynamically-sized TensorArray inside `tf.function`.
- `tf.math`:
  - Narrow down the `argmin`/`argmax` contract to always return the smallest index for ties.
  - `tf.math.reduce_variance` and `tf.math.reduce_std` return correct computation for complex types and no longer support integer types.
  - Add Bessel functions of order 0 and 1 to `tf.math.special`.
  - `tf.divide` now always returns a tensor to be consistent with documentation and other APIs.
- `tf.image`:
  - Replaced `tf.image.non_max_suppression_padded` with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
- `tf.linalg`:
  - Add `tf.linalg.banded_triangular_solve`.
- `tf.random`:
  - Add `tf.random.stateless_parameterized_truncated_normal`.
- `tf.ragged`:
  - Add `tf.ragged.cross` and `tf.ragged.cross_hashed` operations.
- `tf.RaggedTensor`:
  - `RaggedTensor.to_tensor()` now preserves static shape.
  - Add `tf.strings.format()` and `tf.print()` support for RaggedTensors.
- `tf.saved_model`:
  - `@tf.function` from SavedModel no longer ignores args after a `RaggedTensor` when selecting the concrete function to run.
  - Fix a saved-model issue for ops with a list of functions.
  - Add `tf.saved_model.LoadOptions` with `experimental_io_device` as an arg with default value `None` to choose the I/O device for loading models and weights.
  - Update `tf.saved_model.SaveOptions` with `experimental_io_device` as an arg with default value `None` to choose the I/O device for saving models and weights.
  - Mutable tables now restore checkpointed values when loaded from SavedModel.
- GPU:
  - TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.
- Others:
  - Retain the parent namescope for ops added inside `tf.while_loop`/`tf.cond`/`tf.switch_case`.
  - Update `tf.vectorized_map` to support vectorizing `tf.while_loop` and TensorList operations.
  - `tf.custom_gradient` can now be applied to functions that accept nested structures of `tensors` as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won't be treated as tensors, so if you still want them to be treated that way, you need to wrap them with `tf.convert_to_tensor`.
  - No lowering on the gradient case op when the input is a `DeviceIndex` op.
  - Extend the ragged version of `tf.gather` to support `batch_dims` and `axis` args.
  - Update `tf.map_fn` to support RaggedTensors and SparseTensors.
  - Deprecate `tf.group`; it is not useful in eager mode.
  - Add CPU and GPU implementations of a modified variation of `FTRL`/`FTRLV2`, triggered by `multiply_linear_by_lr`, that allows a learning rate of zero.

`tf.data`:
- `tf.data.experimental.dense_to_ragged_batch` works correctly with tuples.
- `tf.data.experimental.dense_to_ragged_batch` to output variable ragged rank.
- `tf.data.experimental.cardinality` is now a method on `tf.data.Dataset`.
- `tf.data.Dataset` now supports `len(Dataset)` when the cardinality is finite.
`tf.distribute`:
- Expose experimental `tf.distribute.DistributedDataset` and `tf.distribute.DistributedIterator` to distribute input data when using `tf.distribute` to scale training on multiple devices.
  - Added a `get_next_as_optional` method for the `tf.distribute.DistributedIterator` class to return a `tf.experimental.Optional` instance that contains the next value for all replicas, or none, instead of raising an out-of-range error. Also see the new guide on input distribution.
- Allow `var.assign` on MirroredVariables with `aggregation=NONE` in replica context. Previously this would raise an error. We now allow this because many users and library writers find using `.assign` in replica context more convenient, instead of having to use `Strategy.extended.update`, which was the previous way of updating variables in this situation.
- `tf.distribute.experimental.MultiWorkerMirroredStrategy` adds support for partial batches. Workers running out of data now continue to participate in the training with empty inputs, instead of raising an error. Learn more about partial batches here.
- Improve the performance of reading metrics eagerly under `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
- Fix the issue that `strategy.reduce()` inside `tf.function` may raise exceptions when the values to reduce are from loops or if-clauses.
- Fix the issue that `tf.distribute.MirroredStrategy` cannot be used together with `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
- Add a `tf.distribute.cluster_resolver.TPUClusterResolver.connect` API to simplify TPU initialization.
`tf.keras`:
- Introduces an experimental preprocessing layers API (`tf.keras.layers.experimental.preprocessing`) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API, and support composite tensor inputs (see the sketch after this list).
- Added categorical data processing layers:
  - `IntegerLookup` & `StringLookup`: build an index of categorical feature values
  - `CategoryEncoding`: turn integer-encoded categories into one-hot, multi-hot, or TF-IDF encoded representations
  - `CategoryCrossing`: create new categorical features representing co-occurrences of previous categorical feature values
  - `Hashing`: the hashing trick, for large-vocabulary categorical features
  - `Discretization`: turn continuous numerical features into categorical features by binning their values
- Improved image preprocessing layers: `CenterCrop`, `Rescaling`
- Improved image augmentation layers: `RandomCrop`, `RandomFlip`, `RandomTranslation`, `RandomRotation`, `RandomHeight`, `RandomWidth`, `RandomZoom`, `RandomContrast`
- Improved the `TextVectorization` layer, which handles string tokenization, n-gram generation, and token encoding:
  - The `TextVectorization` layer now accounts for the `mask_token` as part of the vocabulary size when `output_mode='int'`. This means that, if you have a `max_tokens` value of 5000, your output will have 5000 unique values (not 5001 as before).
  - Change the return value of `TextVectorization.get_vocabulary()` from `byte` to `string`. Users who previously were calling `decode` on the output of this method should no longer need to do so.
- Introduce new Keras dataset generation utilities:
  - `image_dataset_from_directory` is a utility based on `tf.data.Dataset`, meant to replace the legacy `ImageDataGenerator`. It takes you from a structured directory of images to a labeled dataset, in one function call. Note that it doesn't perform image data augmentation (which is meant to be done using preprocessing layers).
  - `text_dataset_from_directory` takes you from a structured directory of text files to a labeled dataset, in one function call.
  - `timeseries_dataset_from_array` is a `tf.data.Dataset`-based replacement of the legacy `TimeseriesGenerator`. It takes you from an array of timeseries data to a dataset of shifting windows with their targets.
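A minimal sketch of adapting a preprocessing layer to data (the toy features below are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

features = np.array([[10.0, 0.5], [20.0, 1.5], [30.0, 2.5]], dtype="float32")

# Learn the mean/variance from data, then normalize inputs in-model.
norm = preprocessing.Normalization()
norm.adapt(features)

model = tf.keras.Sequential([norm, tf.keras.layers.Dense(1)])
```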
- Added an `experimental_steps_per_execution` arg to `model.compile` to indicate the number of batches to run per `tf.function` call. This can speed up Keras models on TPUs by up to 3x (see the sketch after this list).
- Extends `tf.keras.layers.Lambda` layers to support multi-argument lambdas, and keyword arguments when calling the layer.
- Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a Keras input. Previously the functional API would only work if all of the elements in the first argument to the layer came from a Keras input.
- Clean up the `BatchNormalization` layer's `trainable` property to act like standard Python state when it's used inside `tf.function`s (frozen at tracing time), instead of acting like a pseudo-variable whose updates sometimes get reflected in already-traced `tf.function` traces.
- Add the `Conv1DTranspose` layer.
- Refine the semantics of `SensitivitySpecificityBase`-derived metrics. See the updated API docstrings for `tf.keras.metrics.SensitivityAtSpecificity` and `tf.keras.metrics.SpecificityAtSensitivity`.
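A minimal sketch of the new compile flag (the value 50 is illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Run 50 batches per tf.function call; especially helpful on TPUs or
# small models with large Python overhead.
model.compile(optimizer="sgd", loss="mse",
              experimental_steps_per_execution=50)
```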
`tf.lite`:
- Converter:
  - Restored `inference_input_type` and `inference_output_type` flags in the TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (`tf.int8`, `tf.uint8`) input and output types in post-training full-integer quantized models.
  - Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
  - Enabled experimental support for a new quantization mode with 16-bit activations and 8-bit weights. See `lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`.
- CPU:
  - Fix an issue with dynamic weights and `Conv2D` on x86.
  - Add a runtime Android flag for enabling `XNNPACK` for optimized CPU performance.
  - Add a runtime iOS flag for enabling `XNNPACK` for optimized CPU performance.
  - Add a compiler flag to enable building a TFLite library that applies the `XNNPACK` delegate automatically when the model has an `fp32` operation.
- GPU:
  - Allow GPU acceleration starting with internal graph nodes.
  - Experimental support for quantized models with the Android GPU delegate.
  - Add a GPU delegate whitelist.
  - Rename GPU whitelist -> compatibility (list).
  - Improve GPU compatibility list entries from crash reports.
- NNAPI:
  - Set the default value for `StatefulNnApiDelegate::Options::max_number_delegated_partitions` to 3.
  - Add the capability to disable `NNAPI` CPU and to check `NNAPI` Errno.
  - Fix crashes when using `NNAPI` with a target accelerator specified and a model containing Conv2d, FullyConnected, or LSTM nodes with quantized weights.
  - Fix `ANEURALNETWORKS_BAD_DATA` execution failures with `sum`/`max`/`min`/`reduce` operations with `scalar` inputs.
- Hexagon:
  - TFLite Hexagon Delegate is out of experimental.
  - Experimental `int8` support for most Hexagon ops.
  - Experimental per-channel quantization support for `conv` in the Hexagon delegate.
  - Support dynamic batch size in the C++ API.
- CoreML:
  - Open-source the CoreML delegate.
- Misc:
  - Enable building Android TFLite targets on Windows.
  - Add support for `BatchMatMul`.
  - Add support for `half_pixel_centers` with `ResizeNearestNeighbor`.
  - Add 3D support for `BatchToSpaceND`.
  - Add 5D support for `BroadcastSub`, `Maximum`, `Minimum`, `Transpose`, and `BroadcastDiv`.
  - Rename `kTfLiteActRelu1` to `kTfLiteActReluN1To1`.
  - Enable the flex delegate in the tensorflow.lite.Interpreter Python package.
  - Add `Buckettize`, `SparseCross`, and `BoostedTreesBucketize` to the flex whitelist.
  - Add support for selective registration of flex ops.
  - Add missing kernels for flex delegate whitelisted ops.
  - Fix an issue when using direct `ByteBuffer` inputs with graphs that have dynamic shapes.
  - Fix error checking of supported operations in a model containing `HardSwish`.
Packaging Support
- Added `tf.sysconfig.get_build_info()`. Returns a dict that describes the build environment of the currently installed TensorFlow package, e.g. the NVIDIA CUDA and NVIDIA cuDNN versions used when TensorFlow was built (see the sketch below).
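A minimal sketch of querying the build info; the dict keys shown are assumptions typical of CUDA-enabled builds:

```python
import tensorflow as tf

info = tf.sysconfig.get_build_info()
# Keys such as "cuda_version" and "cudnn_version" are assumed here;
# they appear on CUDA-enabled builds.
print(info.get("cuda_version"), info.get("cudnn_version"))
```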
Profiler
- Fix a subtle use-after-free issue in `XStatVisitor::RefValue()`.

TPU Enhancements
- Adds 3D mesh support in TPU configuration ops.
- Added TPU code for `FTRL` with `multiply_linear_by_lr`.
- Silently adds a new file system registry at `gstpu`.
- Support `restartType` in cloud TPU client.
- Depend on a specific version of google-api-python-client.
- Fixes apiclient import.

Tracing and Debugging
- Add a `TFE_Py_Execute` traceme.

XLA Support
- Implement stable `argmin` and `argmax`.

Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
π 902449@58880@bigcat_chen@ASIC, Abdul Baseer Khan, Abhineet Choudhary, Abolfazl Shahbazi, Adam Hillier, ag.ramesh, Agoniii, Ajay P, Alex Hoffman, Alexander Bayandin, Alexander Grund, Alexandre Abadie, Alexey Rogachevskiy, amoitra, Andrew Stevens, Angus-Luo, Anshuman Tripathy, Anush Elangovan, Artem Mavrin, Ashutosh Hathidara, autoih, Ayushman Kumar, ayushmankumar7, Bairen Yi, Bas Aarts, Bastian Eichenberger, Ben Barsdell, bhack, Bharat Raghunathan, Biagio Montaruli, Bigcat-Himax, blueyi, Bryan Cutler, Byambaa, Carlos Hernandez-Vaquero, Chen Lei, Chris Knorowski, Christian Clauss, chuanqiw, CuiYifeng, Daniel Situnayake, Daria Zhuravleva, Dayananda-V, Deven Desai, Devi Sandeep Endluri, Dmitry Zakharov, Dominic Jack, Duncan Riach, Edgar Liberis, Ehsan Toosi, ekuznetsov139, Elena Zhelezina, Eugene Kuznetsov, Eugene Mikhantiev, Evgenii Zheltonozhskii, Fabio Di Domenico, Fausto Morales, Fei Sun, feihugis, Felix E. Klee, flyingcat, Frederic Bastien, Fredrik Knutsson, frreiss, fsx950223, ganler, Gaurav Singh, Georgios Pinitas, Gian Marco Iodice, Giorgio Arena, Giuseppe Rossini, Gregory Keith, Guozhong Zhuang, gurushantj, Hahn Anselm, Harald Husum, Harjyot Bagga, Hristo Vrigazov, Ilya Persky, Ir1d, Itamar Turner-Trauring, jacco, Jake Tae, Janosh Riebesell, Jason Zaman, jayanth, Jeff Daily, Jens Elofsson, Jinzhe Zeng, JLZ, Jonas Skog, Jonathan Dekhtiar, Josh Meyer, Joshua Chia, Judd, justkw, Kaixi Hou, Kam D Kasravi, Kamil Rakoczy, Karol Gugala, Kayou, Kazuaki Ishizaki, Keith Smiley, Khaled Besrour, Kilaru Yasaswi Sri Chandra Gandhi, Kim, Young Soo, Kristian Hartikainen, Kwabena W. Agyeman, Leslie-Fang, Leslie-Fang-Intel, Li, Guizi, Lukas Geiger, Lutz Roeder, M\U00E5Ns Nilsson, Mahmoud Abuzaina, Manish, Marcel Koester, Marcin Sielski, marload, Martin Jul, Matt Conley, mdfaijul, Meng, Peng, Meteorix, Michael KΓ€ufl, Michael137, Milan Straka, Mitchell Vitez, Ml-0, Mokke Meguru, Mshr-H, nammbash, Nathan Luehr, naumkin, Neeraj Bhadani, ngc92, Nick Morgan, nihui, Niranjan Hasabnis, Niranjan Yadla, Nishidha Panpaliya, Oceania2018, oclyke, Ouyang Jin, OverLordGoldDragon, Owen Lyke, Patrick Hemmer, Paul Andrey, Peng Sun, periannath, Phil Pearl, Prashant Dandriyal, Prashant Kumar, Rahul Huilgol, Rajan Singh, Rajeshwar Reddy T, rangjiaheng, Rishit Dagli, Rohan Reddy, rpalakkal, rposts, Ruan Kunliang, Rushabh Vasani, Ryohei Ikegami, Semun Lee, Seo-Inyoung, Sergey Mironov, Sharada Shiddibhavi, ShengYang1, Shraiysh Vaishay, Shunya Ueta, shwetaoj, Siyavash Najafzade, Srinivasan Narayanamoorthy, Stephan Uphoff, storypku, sunchenggen, sunway513, Sven-Hendrik Haase, Swapnil Parekh, Tamas Bela Feher, Teng Lu, tigertang, tomas, Tomohiro Ubukata, tongxuan.ltx, Tony Tonev, Tzu-Wei Huang, TΓ©o Bouvard, Uday Bondhugula, Vaibhav Jade, Vijay Tadikamalla, Vikram Dattu, Vincent Abriou, Vishnuvardhan Janapati, Vo Van Nghia, VoVAllen, Will Battel, William D. Irons, wyzhao, Xiaoming (Jason) Cui, Xiaoquan Kong, Xinan Jiang, xutianming, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yixing Fu, Yong Tang, Yuan Tang, zhaozheng09, Zilin Zhu, zilinzhu, εΌ εΏθ±ͺ
-
v2.3.0-rc2 Changes
July 18, 2020: Release 2.3.0
Major Features and Improvements
- `tf.data` adds two new mechanisms to solve input pipeline bottlenecks and save resources. In addition, check out the detailed guide for analyzing input pipeline performance with TF Profiler.
- `tf.distribute.TPUStrategy` is now a stable API and no longer considered experimental for TensorFlow (formerly `tf.distribute.experimental.TPUStrategy`).
- TF Profiler introduces two new tools: a memory profiler to visualize your model's memory usage over time, and a Python tracer that allows you to trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity level.
- Introduces experimental support for the Keras Preprocessing Layers API (`tf.keras.layers.experimental.preprocessing.*`) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
- TFLite now properly supports dynamic shapes during conversion and inference. We've also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
- Libtensorflow packages are available in GCS starting this release. We have also started to release a nightly version of these packages.
- The experimental Python API `tf.debugging.experimental.enable_dump_debug_info()` now allows you to instrument a TensorFlow program and dump debugging information to a directory on the file system. The directory can be read and visualized by a new interactive dashboard in TensorBoard 2.3 called Debugger V2, which reveals the details of the TensorFlow program, including graph structures, history of op executions at the Python (eager) and intra-graph levels, the runtime dtype, shape, and numerical composition of tensors, as well as their code locations.

Breaking Changes
- Increases the minimum bazel version required to build TF to 3.1.0.
- `tf.data`:
  - Makes the following (breaking) changes to the `tf.data` C++ API:
    - `IteratorBase::RestoreInternal`, `IteratorBase::SaveInternal`, and `DatasetBase::CheckExternalState` become pure-virtual, and subclasses are now expected to provide an implementation.
    - The deprecated `DatasetBase::IsStateful` method is removed in favor of `DatasetBase::CheckExternalState`.
    - Deprecated overrides of `DatasetBase::MakeIterator` and `MakeIteratorFromInputElement` are removed.
  - The signatures of `tensorflow::data::IteratorBase::SaveInternal` and `tensorflow::data::IteratorBase::SaveInput` have been extended with a `SerializationContext` argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backwards-compatible change, and all subclasses of `IteratorBase` need to be updated accordingly.
- `tf.keras`:
  - Adds a new `BackupAndRestore` callback for handling distributed training failures and restarts. Please take a look at this tutorial for details on how to use the callback.
- `tf.image.extract_glimpse` has been updated to correctly process the case where `centered=False` and `normalized=False`. This is a breaking change, as the output is different from the (incorrect) previous versions. Note this breaking change only impacts the `tf.image.extract_glimpse` and `tf.compat.v2.image.extract_glimpse` API endpoints. The behavior of `tf.compat.v1.image.extract_glimpse` does not change. The behavior of the existing C++ kernel `ExtractGlimpse` does not change either, so saved models using `tf.raw_ops.ExtractGlimpse` will not be impacted.
Bug Fixes and Other Changes

TF Core:
- Set `tf2_behavior` to 1 to enable V2 for early loading cases.
- Add an `execute_fn_for_device` function to dynamically choose the implementation based on underlying device placement.
- Eager:
  - Add a `reduce_logsumexp` benchmark with experimental compile.
  - Give `EagerTensor`s a meaningful `__array__` implementation.
  - Add another version of defun matmul for performance analysis.
- `tf.function`/AutoGraph:
  - AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
  - Functions returned by the `get_concrete_function` method of `tf.function` objects can now be called with arguments consistent with the original arguments or type specs passed to `get_concrete_function`. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors. Please check the guide for more details on `concrete_function`.
  - Update `tf.function`'s `experimental_relax_shapes` to handle composite tensors appropriately.
  - Optimize `tf.function` invocation by removing a redundant list converter.
  - `tf.function` will retrace when called with a different variable instead of simply using its `dtype` and `shape`.
  - Improve support for dynamically-sized TensorArray inside `tf.function`.
- `tf.math`:
  - Narrow down the `argmin`/`argmax` contract to always return the smallest index for ties.
  - `tf.math.reduce_variance` and `tf.math.reduce_std` return correct computation for complex types and no longer support integer types.
  - Add Bessel functions of order 0 and 1 to `tf.math.special`.
  - `tf.divide` now always returns a tensor to be consistent with documentation and other APIs.
- `tf.image`:
  - Replaced `tf.image.non_max_suppression_padded` with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
- `tf.linalg`:
  - Add `tf.linalg.banded_triangular_solve`.
- `tf.random`:
  - Add `tf.random.stateless_parameterized_truncated_normal`.
- `tf.ragged`:
  - Add `tf.ragged.cross` and `tf.ragged.cross_hashed` operations.
- `tf.RaggedTensor`:
  - `RaggedTensor.to_tensor()` now preserves static shape.
  - Add `tf.strings.format()` and `tf.print()` support for RaggedTensors.
- `tf.saved_model`:
  - `@tf.function` from SavedModel no longer ignores args after a `RaggedTensor` when selecting the concrete function to run.
  - Fix a saved-model issue for ops with a list of functions.
  - Add `tf.saved_model.LoadOptions` with `experimental_io_device` as an arg with default value `None` to choose the I/O device for loading models and weights.
  - Update `tf.saved_model.SaveOptions` with `experimental_io_device` as an arg with default value `None` to choose the I/O device for saving models and weights.
- GPU:
  - No longer includes PTX kernels for GPU except for sm_70 to reduce binary size.
- Others:
  - Retain the parent namescope for ops added inside `tf.while_loop`/`tf.cond`/`tf.switch_case`.
  - Update `tf.vectorized_map` to support vectorizing `tf.while_loop` and TensorList operations.
  - `tf.custom_gradient` can now be applied to functions that accept nested structures of `tensors` as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won't be treated as tensors, so if you still want them to be treated that way, you need to wrap them with `tf.convert_to_tensor`.
  - No lowering on the gradient case op when the input is a `DeviceIndex` op.
  - Extend the ragged version of `tf.gather` to support `batch_dims` and `axis` args.
  - Update `tf.map_fn` to support RaggedTensors and SparseTensors.
  - Deprecate `tf.group`; it is not useful in eager mode.
  - Add CPU and GPU implementations of a modified variation of `FTRL`/`FTRLV2`, triggered by `multiply_linear_by_lr`, that allows a learning rate of zero.

`tf.data`:
- `tf.data.experimental.dense_to_ragged_batch` works correctly with tuples.
- `tf.data.experimental.dense_to_ragged_batch` to output variable ragged rank.
- `tf.data.experimental.cardinality` is now a method on `tf.data.Dataset`.
- `tf.data.Dataset` now supports `len(Dataset)` when the cardinality is finite.
`tf.distribute`:
- Expose experimental `tf.distribute.DistributedDataset` and `tf.distribute.DistributedIterator` to distribute input data when using `tf.distribute` to scale training on multiple devices.
  - Added a `get_next_as_optional` method for the `tf.distribute.DistributedIterator` class to return a `tf.experimental.Optional` instance that contains the next value for all replicas, or none, instead of raising an out-of-range error. Also see the new guide on input distribution.
- Allow `var.assign` on MirroredVariables with `aggregation=NONE` in replica context. Previously this would raise an error. We now allow this because many users and library writers find using `.assign` in replica context more convenient, instead of having to use `Strategy.extended.update`, which was the previous way of updating variables in this situation.
- `tf.distribute.experimental.MultiWorkerMirroredStrategy` adds support for partial batches. Workers running out of data now continue to participate in the training with empty inputs, instead of raising an error. Learn more about partial batches here.
- Improve the performance of reading metrics eagerly under `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
- Fix the issue that `strategy.reduce()` inside `tf.function` may raise exceptions when the values to reduce are from loops or if-clauses.
- Fix the issue that `tf.distribute.MirroredStrategy` cannot be used together with `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
- Add a `tf.distribute.cluster_resolver.TPUClusterResolver.connect` API to simplify TPU initialization.
`tf.keras`:
- Introduces an experimental preprocessing layers API (`tf.keras.layers.experimental.preprocessing`) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API, and support composite tensor inputs.
- Added categorical data processing layers:
  - `IntegerLookup` & `StringLookup`: build an index of categorical feature values
  - `CategoryEncoding`: turn integer-encoded categories into one-hot, multi-hot, or TF-IDF encoded representations
  - `CategoryCrossing`: create new categorical features representing co-occurrences of previous categorical feature values
  - `Hashing`: the hashing trick, for large-vocabulary categorical features
  - `Discretization`: turn continuous numerical features into categorical features by binning their values
- Improved image preprocessing layers: `CenterCrop`, `Rescaling`
- Improved image augmentation layers: `RandomCrop`, `RandomFlip`, `RandomTranslation`, `RandomRotation`, `RandomHeight`, `RandomWidth`, `RandomZoom`, `RandomContrast`
- Improved the `TextVectorization` layer, which handles string tokenization, n-gram generation, and token encoding:
  - The `TextVectorization` layer now accounts for the `mask_token` as part of the vocabulary size when `output_mode='int'`. This means that, if you have a `max_tokens` value of 5000, your output will have 5000 unique values (not 5001 as before).
  - Change the return value of `TextVectorization.get_vocabulary()` from `byte` to `string`. Users who previously were calling `decode` on the output of this method should no longer need to do so.
- Introduce new Keras dataset generation utilities:
  - `image_dataset_from_directory` is a utility based on `tf.data.Dataset`, meant to replace the legacy `ImageDataGenerator`. It takes you from a structured directory of images to a labeled dataset, in one function call. Note that it doesn't perform image data augmentation (which is meant to be done using preprocessing layers).
  - `text_dataset_from_directory` takes you from a structured directory of text files to a labeled dataset, in one function call.
  - `timeseries_dataset_from_array` is a `tf.data.Dataset`-based replacement of the legacy `TimeseriesGenerator`. It takes you from an array of timeseries data to a dataset of shifting windows with their targets.
- Added an `experimental_steps_per_execution` arg to `model.compile` to indicate the number of batches to run per `tf.function` call. This can speed up Keras models on TPUs by up to 3x.
- Extends `tf.keras.layers.Lambda` layers to support multi-argument lambdas, and keyword arguments when calling the layer.
- Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a Keras input. Previously the functional API would only work if all of the elements in the first argument to the layer came from a Keras input.
- Clean up the `BatchNormalization` layer's `trainable` property to act like standard Python state when it's used inside `tf.function`s (frozen at tracing time), instead of acting like a pseudo-variable whose updates sometimes get reflected in already-traced `tf.function` traces.
- Add the `Conv1DTranspose` layer.
- Refine the semantics of `SensitivitySpecificityBase`-derived metrics. See the updated API docstrings for `tf.keras.metrics.SensitivityAtSpecificity` and `tf.keras.metrics.SpecificityAtSensitivity`.
`tf.lite`:
- Converter:
  - Restored `inference_input_type` and `inference_output_type` flags in the TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (`tf.int8`, `tf.uint8`) input and output types in post-training full-integer quantized models.
  - Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
  - Enabled experimental support for a new quantization mode with 16-bit activations and 8-bit weights. See `lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`.
- CPU:
  - Fix an issue with dynamic weights and `Conv2D` on x86.
  - Add a runtime Android flag for enabling `XNNPACK` for optimized CPU performance.
  - Add a runtime iOS flag for enabling `XNNPACK` for optimized CPU performance.
  - Add a compiler flag to enable building a TFLite library that applies the `XNNPACK` delegate automatically when the model has an `fp32` operation.
- GPU:
  - Allow GPU acceleration starting with internal graph nodes.
  - Experimental support for quantized models with the Android GPU delegate.
  - Add a GPU delegate whitelist.
  - Rename GPU whitelist -> compatibility (list).
  - Improve GPU compatibility list entries from crash reports.
- NNAPI:
  - Set the default value for `StatefulNnApiDelegate::Options::max_number_delegated_partitions` to 3.
  - Add the capability to disable `NNAPI` CPU and to check `NNAPI` Errno.
  - Fix crashes when using `NNAPI` with a target accelerator specified and a model containing Conv2d, FullyConnected, or LSTM nodes with quantized weights.
  - Fix `ANEURALNETWORKS_BAD_DATA` execution failures with `sum`/`max`/`min`/`reduce` operations with `scalar` inputs.
- Hexagon:
  - TFLite Hexagon Delegate is out of experimental.
  - Experimental `int8` support for most Hexagon ops.
  - Experimental per-channel quantization support for `conv` in the Hexagon delegate.
  - Support dynamic batch size in the C++ API.
- CoreML:
  - Open-source the CoreML delegate.
- Misc:
  - Enable building Android TFLite targets on Windows.
  - Add support for `BatchMatMul`.
  - Add support for `half_pixel_centers` with `ResizeNearestNeighbor`.
  - Add 3D support for `BatchToSpaceND`.
  - Add 5D support for `BroadcastSub`, `Maximum`, `Minimum`, `Transpose`, and `BroadcastDiv`.
  - Rename `kTfLiteActRelu1` to `kTfLiteActReluN1To1`.
  - Enable the flex delegate in the tensorflow.lite.Interpreter Python package.
  - Add `Buckettize`, `SparseCross`, and `BoostedTreesBucketize` to the flex whitelist.
  - Add support for selective registration of flex ops.
  - Add missing kernels for flex delegate whitelisted ops.
  - Fix an issue when using direct `ByteBuffer` inputs with graphs that have dynamic shapes.
  - Fix error checking of supported operations in a model containing `HardSwish`.
Packaging Support
- Added `tf.sysconfig.get_build_info()`. Returns a dict that describes the currently installed TensorFlow package, e.g. the NVIDIA CUDA and NVIDIA cuDNN versions that the package was built to support.
Profiler
- Fix a subtle use-after-free issue in `XStatVisitor::RefValue()`.

TPU Enhancements
- Adds 3D mesh support in TPU configuration ops.
- Added TPU code for `FTRL` with `multiply_linear_by_lr`.
- Silently adds a new file system registry at `gstpu`.
- Support `restartType` in cloud TPU client.
- Depend on a specific version of google-api-python-client.
- Fixes apiclient import.

Tracing and Debugging
- Add a `TFE_Py_Execute` traceme.

XLA Support
- Implement stable `argmin` and `argmax`.

Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
π 902449@58880@bigcat_chen@ASIC, Abdul Baseer Khan, Abhineet Choudhary, Abolfazl Shahbazi, Adam Hillier, ag.ramesh, Agoniii, Ajay P, Alex Hoffman, Alexander Bayandin, Alexander Grund, Alexandre Abadie, Alexey Rogachevskiy, amoitra, Andrew Stevens, Angus-Luo, Anshuman Tripathy, Anush Elangovan, Artem Mavrin, Ashutosh Hathidara, autoih, Ayushman Kumar, ayushmankumar7, Bairen Yi, Bas Aarts, Bastian Eichenberger, Ben Barsdell, bhack, Bharat Raghunathan, Biagio Montaruli, Bigcat-Himax, blueyi, Bryan Cutler, Byambaa, Carlos Hernandez-Vaquero, Chen Lei, Chris Knorowski, Christian Clauss, chuanqiw, CuiYifeng, Daniel Situnayake, Daria Zhuravleva, Dayananda-V, Deven Desai, Devi Sandeep Endluri, Dmitry Zakharov, Dominic Jack, Duncan Riach, Edgar Liberis, Ehsan Toosi, ekuznetsov139, Elena Zhelezina, Eugene Kuznetsov, Eugene Mikhantiev, Evgenii Zheltonozhskii, Fabio Di Domenico, Fausto Morales, Fei Sun, feihugis, Felix E. Klee, flyingcat, Frederic Bastien, Fredrik Knutsson, frreiss, fsx950223, ganler, Gaurav Singh, Georgios Pinitas, Gian Marco Iodice, Giorgio Arena, Giuseppe Rossini, Gregory Keith, Guozhong Zhuang, gurushantj, Hahn Anselm, Harald Husum, Harjyot Bagga, Hristo Vrigazov, Ilya Persky, Ir1d, Itamar Turner-Trauring, jacco, Jake Tae, Janosh Riebesell, Jason Zaman, jayanth, Jeff Daily, Jens Elofsson, Jinzhe Zeng, JLZ, Jonas Skog, Jonathan Dekhtiar, Josh Meyer, Joshua Chia, Judd, justkw, Kaixi Hou, Kam D Kasravi, Kamil Rakoczy, Karol Gugala, Kayou, Kazuaki Ishizaki, Keith Smiley, Khaled Besrour, Kilaru Yasaswi Sri Chandra Gandhi, Kim, Young Soo, Kristian Hartikainen, Kwabena W. Agyeman, Leslie-Fang, Leslie-Fang-Intel, Li, Guizi, Lukas Geiger, Lutz Roeder, M\U00E5Ns Nilsson, Mahmoud Abuzaina, Manish, Marcel Koester, Marcin Sielski, marload, Martin Jul, Matt Conley, mdfaijul, Meng, Peng, Meteorix, Michael KΓ€ufl, Michael137, Milan Straka, Mitchell Vitez, Ml-0, Mokke Meguru, Mshr-H, nammbash, Nathan Luehr, naumkin, Neeraj Bhadani, ngc92, Nick Morgan, nihui, Niranjan Hasabnis, Niranjan Yadla, Nishidha Panpaliya, Oceania2018, oclyke, Ouyang Jin, OverLordGoldDragon, Owen Lyke, Patrick Hemmer, Paul Andrey, Peng Sun, periannath, Phil Pearl, Prashant Dandriyal, Prashant Kumar, Rahul Huilgol, Rajan Singh, Rajeshwar Reddy T, rangjiaheng, Rishit Dagli, Rohan Reddy, rpalakkal, rposts, Ruan Kunliang, Rushabh Vasani, Ryohei Ikegami, Semun Lee, Seo-Inyoung, Sergey Mironov, Sharada Shiddibhavi, ShengYang1, Shraiysh Vaishay, Shunya Ueta, shwetaoj, Siyavash Najafzade, Srinivasan Narayanamoorthy, Stephan Uphoff, storypku, sunchenggen, sunway513, Sven-Hendrik Haase, Swapnil Parekh, Tamas Bela Feher, Teng Lu, tigertang, tomas, Tomohiro Ubukata, tongxuan.ltx, Tony Tonev, Tzu-Wei Huang, TΓ©o Bouvard, Uday Bondhugula, Vaibhav Jade, Vijay Tadikamalla, Vikram Dattu, Vincent Abriou, Vishnuvardhan Janapati, Vo Van Nghia, VoVAllen, Will Battel, William D. Irons, wyzhao, Xiaoming (Jason) Cui, Xiaoquan Kong, Xinan Jiang, xutianming, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yixing Fu, Yong Tang, Yuan Tang, zhaozheng09, Zilin Zhu, zilinzhu, εΌ εΏθ±ͺ
-
v2.3.0-rc1 Changes
July 09, 2020: Release 2.3.0
Major Features and Improvements
- `tf.data` adds two new mechanisms to solve input pipeline bottlenecks and save resources. In addition, check out the detailed guide for analyzing input pipeline performance with TF Profiler.
- `tf.distribute.TPUStrategy` is now a stable API and no longer considered experimental for TensorFlow (formerly `tf.distribute.experimental.TPUStrategy`).
- TF Profiler introduces two new tools: a memory profiler to visualize your model's memory usage over time, and a Python tracer that allows you to trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity level.
- Introduces experimental support for the Keras Preprocessing Layers API (`tf.keras.layers.experimental.preprocessing.*`) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
- TFLite now properly supports dynamic shapes during conversion and inference. We've also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
- Libtensorflow packages are available in GCS starting this release. We have also started to release a nightly version of these packages.
Breaking Changes
- Increases the minimum bazel version required to build TF to 3.1.0.
- `tf.data`:
  - Makes the following (breaking) changes to the `tf.data` C++ API:
    - `IteratorBase::RestoreInternal`, `IteratorBase::SaveInternal`, and `DatasetBase::CheckExternalState` become pure-virtual, and subclasses are now expected to provide an implementation.
    - The deprecated `DatasetBase::IsStateful` method is removed in favor of `DatasetBase::CheckExternalState`.
    - Deprecated overrides of `DatasetBase::MakeIterator` and `MakeIteratorFromInputElement` are removed.
  - The signatures of `tensorflow::data::IteratorBase::SaveInternal` and `tensorflow::data::IteratorBase::SaveInput` have been extended with a `SerializationContext` argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backwards-compatible change, and all subclasses of `IteratorBase` need to be updated accordingly.
- `tf.keras`:
  - Adds a new `BackupAndRestore` callback for handling distributed training failures and restarts. Please take a look at this tutorial for details on how to use the callback.
- `tf.image.extract_glimpse` has been updated to correctly process the case where `centered=False` and `normalized=False`. This is a breaking change, as the output is different from the (incorrect) previous versions. Note this breaking change only impacts the `tf.image.extract_glimpse` and `tf.compat.v2.image.extract_glimpse` API endpoints. The behavior of `tf.compat.v1.image.extract_glimpse` does not change. The behavior of the existing C++ kernel `ExtractGlimpse` does not change either, so saved models using `tf.raw_ops.ExtractGlimpse` will not be impacted.
π Bug Fixes and Other Changes
TF Core:
- Set tf2_behavior to 1 to enable V2 for early loading cases.
- β Add a function to dynamically choose the implementation based on underlying device placement.
- Eager:
  - Add reduce_logsumexp benchmark with experimental compile.
  - Give EagerTensors a meaningful __array__ implementation.
  - Add another version of defun matmul for performance analysis.
- tf.function/AutoGraph:
  - AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
  - Functions returned by the get_concrete_function method of tf.function objects can now be called with arguments consistent with the original arguments or type specs passed to get_concrete_function. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors. Please check the guide for more details on concrete functions; a short sketch follows this block.
  - Update tf.function's experimental_relax_shapes to handle composite tensors appropriately.
  - Optimize tf.function invocation by removing redundant list converter.
  - tf.function will retrace when called with a different variable instead of simply using the dtype & shape.
  - Improve support for dynamically-sized TensorArray inside tf.function.
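A short sketch of the concrete-function calling convention described above, using a ragged (composite) tensor; the function and input spec are illustrative:

```python
import tensorflow as tf

@tf.function
def add_one(x):
    return x + 1

# Request a concrete function for a ragged int32 input...
cf = add_one.get_concrete_function(
    tf.RaggedTensorSpec(shape=[None, None], dtype=tf.int32))
# ...and call it with a value matching that spec, instead of the flattened
# component tensors that older versions required.
print(cf(tf.ragged.constant([[1, 2], [3]])))
```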
tf.math:
  - Narrow down argmin/argmax contract to always return the smallest index for ties.
  - tf.math.reduce_variance and tf.math.reduce_std return correct computation for complex types and no longer support integer types.
  - Add Bessel functions of order 0, 1 to tf.math.special.
  - tf.divide now always returns a tensor to be consistent with documentation and other APIs.
tf.image:
  - Replaced tf.image.non_max_suppression_padded with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
tf.linalg:
  - Add tf.linalg.banded_triangular_solve.
tf.random:
  - Add tf.random.stateless_parameterized_truncated_normal.
tf.ragged:
  - Add tf.ragged.cross and tf.ragged.cross_hashed operations; see the sketch below.
tf.RaggedTensor:
  - RaggedTensor.to_tensor() now preserves static shape.
  - Add tf.strings.format() and tf.print() to support RaggedTensors.
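A sketch of the new ragged cross ops; the feature values are made up:

```python
import tensorflow as tf

colors = tf.ragged.constant([["red", "blue"], ["green"]])
sizes = tf.ragged.constant([["S"], ["M", "L"]])

# Crosses every combination of values within each row.
print(tf.ragged.cross([colors, sizes]))
# tf.ragged.cross_hashed additionally hashes each cross into num_buckets.
print(tf.ragged.cross_hashed([colors, sizes], num_buckets=100))
```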
tf.saved_model:
  - @tf.function from SavedModel no longer ignores args after a RaggedTensor when selecting the concrete function to run.
  - Fix save model issue for ops with a list of functions.
  - Add tf.saved_model.LoadOptions with experimental_io_device as arg with default value None to choose the I/O device for loading models and weights.
  - Update tf.saved_model.SaveOptions with experimental_io_device as arg with default value None to choose the I/O device for saving models and weights. A usage sketch follows this block.
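A sketch of pinning SavedModel I/O to a device via the new options; the paths and device string are illustrative, and passing options to tf.saved_model.load is assumed to be supported as of this release:

```python
import tensorflow as tf

module = tf.Module()
module.v = tf.Variable(1.0)

# Useful e.g. in multi-worker settings where only one job sees the filesystem.
save_opts = tf.saved_model.SaveOptions(experimental_io_device="/job:localhost")
tf.saved_model.save(module, "/tmp/module", options=save_opts)

load_opts = tf.saved_model.LoadOptions(experimental_io_device="/job:localhost")
restored = tf.saved_model.load("/tmp/module", options=load_opts)
```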
- GPU
  - No longer includes PTX kernels for GPU except for sm_70 to reduce binary size.
- Others
  - Retain parent namescope for ops added inside tf.while_loop/tf.cond/tf.switch_case.
  - Update tf.vectorized_map to support vectorizing tf.while_loop and TensorList operations.
  - tf.custom_gradient can now be applied to functions that accept nested structures of tensors as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won't be treated as tensors, so if you still want them to be treated that way, you need to wrap them with tf.convert_to_tensor.
  - No lowering on gradient case op when input is DeviceIndex op.
  - Fix in c_api DEFINE_GETATTR.
  - Extend the ragged version of tf.gather to support batch_dims and axis args.
  - Update tf.map_fn to support RaggedTensors and SparseTensors.
  - Deprecate tf.group. It is not useful in eager mode.
  - Add a new variant of FTRL allowing a learning rate of zero.
tf.data:
  - tf.data.experimental.dense_to_ragged_batch works correctly with tuples.
  - tf.data.experimental.dense_to_ragged_batch to output variable ragged rank.
  - tf.data.experimental.cardinality is now a method on tf.data.Dataset.
  - π tf.data.Dataset now supports len(Dataset) when the cardinality is finite; see the sketch below.
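A quick sketch of the two tf.data conveniences above:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10).batch(3)
print(ds.cardinality().numpy())  # 4 -- cardinality() is now a Dataset method
print(len(ds))                   # 4 -- len() works when cardinality is finite
```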
tf.distribute:
  - π Expose experimental tf.distribute.DistributedDataset and tf.distribute.DistributedIterator to distribute input data when using tf.distribute to scale training on multiple devices.
    - Added a get_next_as_optional method for the tf.distribute.DistributedIterator class to return a tf.experimental.Optional instance that contains the next value for all replicas or none instead of raising an out of range error. Also see the new guide on input distribution; a usage sketch follows this block.
  - π Allow var.assign on MirroredVariables with aggregation=NONE in replica context. Previously this would raise an error since there was no way to confirm that the values being assigned to the MirroredVariables were in fact identical.
  - π· tf.distribute.experimental.MultiWorkerMirroredStrategy adds support for partial batches. Workers running out of data now continue to participate in the training with empty inputs, instead of raising an error.
  - π Improve the performance of reading metrics eagerly under tf.distribute.experimental.MultiWorkerMirroredStrategy.
  - π Fix the issue that strategy.reduce() inside tf.function may raise exceptions when the values to reduce are from loops or if-clauses.
  - π Fix the issue that tf.distribute.MirroredStrategy cannot be used together with tf.distribute.experimental.MultiWorkerMirroredStrategy.
  - β Add a tf.distribute.cluster_resolver.TPUClusterResolver.connect API to simplify TPU initialization.
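A sketch of consuming a distributed iterator with get_next_as_optional, as mentioned above; the dataset is a toy example:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
dist_ds = strategy.experimental_distribute_dataset(
    tf.data.Dataset.range(6).batch(2))

iterator = iter(dist_ds)
# Returns a tf.experimental.Optional instead of raising an out-of-range error.
opt = iterator.get_next_as_optional()
while opt.has_value():
    print(opt.get_value())
    opt = iterator.get_next_as_optional()
```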
tf.keras:
  - π Introduces experimental preprocessing layers API (tf.keras.layers.experimental.preprocessing) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API, and support composite tensor inputs; see the sketch after this list.
  - Added categorical data processing layers:
    - IntegerLookup & StringLookup: build an index of categorical feature values
    - CategoryEncoding: turn integer-encoded categories into one-hot, multi-hot, or tf-idf encoded representations
    - CategoryCrossing: create new categorical features representing co-occurrences of previous categorical feature values
    - Hashing: the hashing trick, for large-vocabulary categorical features
    - Discretization: turn continuous numerical features into categorical features by binning their values
  - Improved image preprocessing layers: CenterCrop, Rescaling
  - Improved image augmentation layers: RandomCrop, RandomFlip, RandomTranslation, RandomRotation, RandomHeight, RandomWidth, RandomZoom, RandomContrast
  - Improved TextVectorization layer, which handles string tokenization, n-gram generation, and token encoding
    - The TextVectorization layer now accounts for the mask_token as part of the vocabulary size when output_mode='int'. This means that, if you have a max_tokens value of 5000, your output will have 5000 unique values (not 5001 as before).
    - Change the return value of TextVectorization.get_vocabulary() from byte to string. Users who previously were calling 'decode' on the output of this method should no longer need to do so.
  - Introduce new Keras dataset generation utilities:
    - image_dataset_from_directory is a utility based on tf.data.Dataset, meant to replace the legacy ImageDataGenerator. It takes you from a structured directory of images to a labeled dataset, in one function call. Note that it doesn't perform image data augmentation (which is meant to be done using preprocessing layers).
    - text_dataset_from_directory takes you from a structured directory of text files to a labeled dataset, in one function call.
    - timeseries_dataset_from_array is a tf.data.Dataset-based replacement of the legacy TimeseriesGenerator. It takes you from an array of timeseries data to a dataset of shifting windows with their targets.
  - Added experimental_steps_per_execution arg to model.compile to indicate the number of batches to run per tf.function call. This can speed up Keras Models on TPUs up to 3x; see the sketch after this list.
  - π Extends tf.keras.layers.Lambda layers to support multi-argument lambdas, and keyword arguments when calling the layer.
  - Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a Keras input. Previously the functional API would only work if all of the elements in the first argument to the layer came from a Keras input.
  - Clean up BatchNormalization layer's trainable property to act like standard Python state when it's used inside tf.functions (frozen at tracing time), instead of acting like a pseudo-variable whose updates kind of sometimes get reflected in already-traced tf.function traces.
  - β Add the Conv1DTranspose layer.
  - π Fix bug in SensitivitySpecificityBase derived metrics.
  - Blacklist Case op from callback.
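A sketch tying together two of the Keras additions above: an experimental preprocessing layer adapted to data, plus the new experimental_steps_per_execution compile argument (toy data, illustrative values):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

data = np.random.rand(64, 4).astype("float32")
labels = np.random.randint(0, 2, size=(64, 1))

norm = preprocessing.Normalization()
norm.adapt(data)  # learn per-feature mean/variance from the data

inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(norm(inputs))
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy",
              experimental_steps_per_execution=8)  # batches per tf.function call
model.fit(data, labels, batch_size=8, epochs=1)
```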
tf.lite:
  - Converter
    - Restored inference_input_type and inference_output_type flags in TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (tf.int8, tf.uint8) input and output types in post training full integer quantized models; see the sketch after this list.
    - Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
  - CPU
    - Fix an issue w/ dynamic weights and Conv2D on x86.
    - Add a runtime Android flag for enabling XNNPACK for optimized CPU performance.
    - Add a runtime iOS flag for enabling XNNPACK for optimized CPU performance.
    - Add a compiler flag to enable building a TFLite library that applies XNNPACK delegate automatically when the model has a fp32 operation.
  - GPU
    - Allow GPU acceleration starting with internal graph nodes
    - Experimental support for quantized models with the Android GPU delegate
    - Add GPU delegate whitelist.
    - Rename GPU whitelist -> compatibility (list).
    - Improve GPU compatibility list entries from crash reports.
  - NNAPI
    - Set default value for StatefulNnApiDelegate::Options::max_number_delegated_partitions to 3.
    - Add capability to disable NNAPI CPU and check NNAPI Errno.
    - Fix crashes when using NNAPI with target accelerator specified with model containing Conv2d or FullyConnected or LSTM nodes with quantized weights.
    - Fix ANEURALNETWORKS_BAD_DATA execution failures with sum/max/min/reduce operations with scalar inputs.
  - Hexagon
    - TFLite Hexagon Delegate out of experimental.
    - Experimental int8 support for most hexagon ops.
    - Experimental per-channel quant support for conv in Hexagon delegate.
    - Support dynamic batch size in C++ API.
  - CoreML
    - Open source the CoreML delegate
  - Misc
    - Enable building Android TFLite targets on Windows
    - Add support for BatchMatMul.
    - Add support for half_pixel_centers with ResizeNearestNeighbor.
    - Add 3D support for BatchToSpaceND.
    - Add 5D support for BroadcastSub, Maximum, Minimum, Transpose and BroadcastDiv.
    - Rename kTfLiteActRelu1 to kTfLiteActReluN1To1.
    - Enable flex delegate on tensorflow.lite.Interpreter Python package.
    - Add Buckettize, SparseCross and BoostedTreesBucketize to the flex whitelist.
    - Add support for selective registration of flex ops.
    - Add missing kernels for flex delegate whitelisted ops.
    - Fix issue when using direct ByteBuffer inputs with graphs that have dynamic shapes.
    - Fix error checking supported operations in a model containing HardSwish.
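A sketch of the restored converter flags in a post-training full-integer quantization flow; the model and calibration generator are placeholders:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

def rep_data_gen():
    # Calibration samples; real code should yield representative inputs.
    for _ in range(10):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # restored flag
converter.inference_output_type = tf.int8  # restored flag
tflite_model = converter.convert()
```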
Profiler
  - Fix a subtle use-after-free issue in `XStatVisitor::RefValue()`.
TPU Enhancements
- π 3D mesh support
- Added TPU code for FTRL with multiply_linear_by_lr.
- Silently adds a new file system registry at gstpu.
- π Support restartType in cloud tpu client.
- Depend on a specific version of google-api-python-client.
- π Fixes apiclient import.
π XLA Support
- Implement stable argmin and argmax.
Tracing and Debugging
- Add a TFE_Py_Execute traceme.
π Packaging Support
- π Added tf.sysconfig.get_build_info(). Returns a dict that describes the currently installed TensorFlow package, e.g. the NVIDIA CUDA and NVIDIA cuDNN versions that the package was built to support; see the sketch below.
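For example (key names vary by build; CUDA wheels expose entries such as cuda_version and cudnn_version):

```python
import tensorflow as tf

info = tf.sysconfig.get_build_info()
for key, value in sorted(info.items()):
    print(key, "=", value)
```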
Thanks to our Contributors
π This release contains contributions from many people at Google, as well as:
π 902449@58880@bigcat_chen@ASIC, Abdul Baseer Khan, Abhineet Choudhary, Abolfazl Shahbazi, Adam Hillier, ag.ramesh, Agoniii, Ajay P, Alex Hoffman, Alexander Bayandin, Alexander Grund, Alexandre Abadie, Alexey Rogachevskiy, amoitra, Andrew Stevens, Angus-Luo, Anshuman Tripathy, Anush Elangovan, Artem Mavrin, Ashutosh Hathidara, autoih, Ayushman Kumar, ayushmankumar7, Bairen Yi, Bas Aarts, Bastian Eichenberger, Ben Barsdell, bhack, Bharat Raghunathan, Biagio Montaruli, Bigcat-Himax, blueyi, Bryan Cutler, Byambaa, Carlos Hernandez-Vaquero, Chen Lei, Chris Knorowski, Christian Clauss, chuanqiw, CuiYifeng, Daniel Situnayake, Daria Zhuravleva, Dayananda-V, Deven Desai, Devi Sandeep Endluri, Dmitry Zakharov, Dominic Jack, Duncan Riach, Edgar Liberis, Ehsan Toosi, ekuznetsov139, Elena Zhelezina, Eugene Kuznetsov, Eugene Mikhantiev, Evgenii Zheltonozhskii, Fabio Di Domenico, Fausto Morales, Fei Sun, feihugis, Felix E. Klee, flyingcat, Frederic Bastien, Fredrik Knutsson, frreiss, fsx950223, ganler, Gaurav Singh, Georgios Pinitas, Gian Marco Iodice, Giorgio Arena, Giuseppe Rossini, Gregory Keith, Guozhong Zhuang, gurushantj, Hahn Anselm, Harald Husum, Harjyot Bagga, Hristo Vrigazov, Ilya Persky, Ir1d, Itamar Turner-Trauring, jacco, Jake Tae, Janosh Riebesell, Jason Zaman, jayanth, Jeff Daily, Jens Elofsson, Jinzhe Zeng, JLZ, Jonas Skog, Jonathan Dekhtiar, Josh Meyer, Joshua Chia, Judd, justkw, Kaixi Hou, Kam D Kasravi, Kamil Rakoczy, Karol Gugala, Kayou, Kazuaki Ishizaki, Keith Smiley, Khaled Besrour, Kilaru Yasaswi Sri Chandra Gandhi, Kim, Young Soo, Kristian Hartikainen, Kwabena W. Agyeman, Leslie-Fang, Leslie-Fang-Intel, Li, Guizi, Lukas Geiger, Lutz Roeder, Måns Nilsson, Mahmoud Abuzaina, Manish, Marcel Koester, Marcin Sielski, marload, Martin Jul, Matt Conley, mdfaijul, Meng, Peng, Meteorix, Michael Käufl, Michael137, Milan Straka, Mitchell Vitez, Ml-0, Mokke Meguru, Mshr-H, nammbash, Nathan Luehr, naumkin, Neeraj Bhadani, ngc92, Nick Morgan, nihui, Niranjan Hasabnis, Niranjan Yadla, Nishidha Panpaliya, Oceania2018, oclyke, Ouyang Jin, OverLordGoldDragon, Owen Lyke, Patrick Hemmer, Paul Andrey, Peng Sun, periannath, Phil Pearl, Prashant Dandriyal, Prashant Kumar, Rahul Huilgol, Rajan Singh, Rajeshwar Reddy T, rangjiaheng, Rishit Dagli, Rohan Reddy, rpalakkal, rposts, Ruan Kunliang, Rushabh Vasani, Ryohei Ikegami, Semun Lee, Seo-Inyoung, Sergey Mironov, Sharada Shiddibhavi, ShengYang1, Shraiysh Vaishay, Shunya Ueta, shwetaoj, Siyavash Najafzade, Srinivasan Narayanamoorthy, Stephan Uphoff, storypku, sunchenggen, sunway513, Sven-Hendrik Haase, Swapnil Parekh, Tamas Bela Feher, Teng Lu, tigertang, tomas, Tomohiro Ubukata, tongxuan.ltx, Tony Tonev, Tzu-Wei Huang, Téo Bouvard, Uday Bondhugula, Vaibhav Jade, Vijay Tadikamalla, Vikram Dattu, Vincent Abriou, Vishnuvardhan Janapati, Vo Van Nghia, VoVAllen, Will Battel, William D. Irons, wyzhao, Xiaoming (Jason) Cui, Xiaoquan Kong, Xinan Jiang, xutianming, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yixing Fu, Yong Tang, Yuan Tang, zhaozheng09, Zilin Zhu, zilinzhu, 张志豪
-
v2.3.0-rc0 Changes
June 26, 2020
π Release 2.3.0
Major Features and Improvements
tf.data adds two new mechanisms to solve input pipeline bottlenecks and save resources: snapshot and tf.data service. In addition, check out the detailed guide for analyzing input pipeline performance with TF Profiler.
tf.distribute.TPUStrategy is now a stable API and no longer considered experimental for TensorFlow (earlier tf.distribute.experimental.TPUStrategy).
π TF Profiler introduces two new tools: a memory profiler to visualize your model's memory usage over time and a Python tracer which allows you to trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity level.
Introduces experimental support for the Keras Preprocessing Layers API (tf.keras.layers.experimental.preprocessing.*) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
π TFLite now properly supports dynamic shapes during conversion and inference. We've also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
π Libtensorflow packages are available in GCS starting this release. We have also started to release a nightly version of these packages.
π₯ Breaking Changes
- Increases the minimum Bazel version required to build TF to 3.1.0.
- tf.data
  - Makes the following (breaking) changes to the tf.data C++ API:
    - IteratorBase::RestoreInternal, IteratorBase::SaveInternal, and DatasetBase::CheckExternalState become pure-virtual and subclasses are now expected to provide an implementation.
    - The deprecated DatasetBase::IsStateful method is removed in favor of DatasetBase::CheckExternalState.
    - Deprecated overrides of DatasetBase::MakeIterator and MakeIteratorFromInputElement are removed.
    - The signatures of tensorflow::data::IteratorBase::SaveInternal and tensorflow::data::IteratorBase::SaveInput have been extended with a SerializationContext argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backwards compatible change and all subclasses of IteratorBase need to be updated accordingly.
- tf.keras
  - Add a new BackupAndRestore callback for handling distributed training failures & restarts. Please take a look at this tutorial for details on how to use the callback.
- β‘οΈ tf.image.extract_glimpse has been updated to correctly process the case where centered=False and normalized=False. This is a π₯ breaking change as the output is different from (incorrect) previous versions. Note this breaking change only impacts tf.image.extract_glimpse and tf.compat.v2.image.extract_glimpse API endpoints. The behavior of tf.compat.v1.image.extract_glimpse does not change. The behavior of the existing C++ kernel ExtractGlimpse does not change either, so saved models will not be impacted.
π Bug Fixes and Other Changes
TF Core:
- Set tf2_behavior to 1 to enable V2 for early loading cases.
- β Add a function to dynamically choose the implementation based on underlying device placement.
- Eager:
  - Add reduce_logsumexp benchmark with experimental compile.
  - Give EagerTensors a meaningful __array__ implementation.
  - Add another version of defun matmul for performance analysis.
- tf.function/AutoGraph:
  - AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
  - Functions returned by the get_concrete_function method of tf.function objects can now be called with arguments consistent with the original arguments or type specs passed to get_concrete_function. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors. Please check the guide for more details on concrete functions.
  - Update tf.function's experimental_relax_shapes to handle composite tensors appropriately.
  - Optimize tf.function invocation by removing redundant list converter.
  - tf.function will retrace when called with a different variable instead of simply using the dtype & shape.
  - Improve support for dynamically-sized TensorArray inside tf.function.
tf.math:
  - Narrow down argmin/argmax contract to always return the smallest index for ties.
  - tf.math.reduce_variance and tf.math.reduce_std return correct computation for complex types and no longer support integer types.
  - Add Bessel functions of order 0, 1 to tf.math.special.
  - tf.divide now always returns a tensor to be consistent with documentation and other APIs.
tf.image:
  - Replaced tf.image.non_max_suppression_padded with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
tf.linalg:
  - Add tf.linalg.banded_triangular_solve.
tf.random:
  - Add tf.random.stateless_parameterized_truncated_normal.
tf.ragged:
  - Add tf.ragged.cross and tf.ragged.cross_hashed operations.
tf.RaggedTensor:
  - RaggedTensor.to_tensor() now preserves static shape.
  - Add tf.strings.format() and tf.print() to support RaggedTensors.
tf.saved_model:
  - @tf.function from SavedModel no longer ignores args after a RaggedTensor when selecting the concrete function to run.
  - Fix save model issue for ops with a list of functions.
  - Add tf.saved_model.LoadOptions with experimental_io_device as arg with default value None to choose the I/O device for loading models and weights.
  - Update tf.saved_model.SaveOptions with experimental_io_device as arg with default value None to choose the I/O device for saving models and weights.
- GPU
  - No longer includes PTX kernels for GPU except for sm_70 to reduce binary size.
- Profiler
  - Fix a subtle use-after-free issue in XStatVisitor::RefValue().
- Others
  - Retain parent namescope for ops added inside tf.while_loop/tf.cond/tf.switch_case.
  - Update tf.vectorized_map to support vectorizing tf.while_loop and TensorList operations.
  - tf.custom_gradient can now be applied to functions that accept nested structures of tensors as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won't be treated as tensors, so if you still want them to be treated that way, you need to wrap them with tf.convert_to_tensor.
  - No lowering on gradient case op when input is DeviceIndex op.
  - Fix in c_api DEFINE_GETATTR.
  - Extend the ragged version of tf.gather to support batch_dims and axis args.
  - Update tf.map_fn to support RaggedTensors and SparseTensors.
  - Deprecate tf.group. It is not useful in eager mode.
  - Add a new variant of FTRL allowing a learning rate of zero.
tf.data:
  - tf.data.experimental.dense_to_ragged_batch works correctly with tuples.
  - tf.data.experimental.dense_to_ragged_batch to output variable ragged rank.
  - tf.data.experimental.cardinality is now a method on tf.data.Dataset.
  - π tf.data.Dataset now supports len(Dataset) when the cardinality is finite.
tf.distribute:
  - π Expose experimental tf.distribute.DistributedDataset and tf.distribute.DistributedIterator to distribute input data when using tf.distribute to scale training on multiple devices.
    - Added a get_next_as_optional method for the tf.distribute.DistributedIterator class to return a tf.experimental.Optional instance that contains the next value for all replicas or none instead of raising an out of range error. Also see the new guide on input distribution.
  - π Allow var.assign on MirroredVariables with aggregation=NONE in replica context. Previously this would raise an error since there was no way to confirm that the values being assigned to the MirroredVariables were in fact identical.
  - π· tf.distribute.experimental.MultiWorkerMirroredStrategy adds support for partial batches. Workers running out of data now continue to participate in the training with empty inputs, instead of raising an error.
  - π Improve the performance of reading metrics eagerly under tf.distribute.experimental.MultiWorkerMirroredStrategy.
  - π Fix the issue that strategy.reduce() inside tf.function may raise exceptions when the values to reduce are from loops or if-clauses.
  - π Fix the issue that tf.distribute.MirroredStrategy cannot be used together with tf.distribute.experimental.MultiWorkerMirroredStrategy.
  - β Add a tf.distribute.cluster_resolver.TPUClusterResolver.connect API to simplify TPU initialization.
tf.keras:
  - π Introduces experimental preprocessing layers API (tf.keras.layers.experimental.preprocessing) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API, and support composite tensor inputs.
  - Added categorical data processing layers:
    - IntegerLookup & StringLookup: build an index of categorical feature values
    - CategoryEncoding: turn integer-encoded categories into one-hot, multi-hot, or tf-idf encoded representations
    - CategoryCrossing: create new categorical features representing co-occurrences of previous categorical feature values
    - Hashing: the hashing trick, for large-vocabulary categorical features
    - Discretization: turn continuous numerical features into categorical features by binning their values
  - Improved image preprocessing layers: CenterCrop, Rescaling
  - Improved image augmentation layers: RandomCrop, RandomFlip, RandomTranslation, RandomRotation, RandomHeight, RandomWidth, RandomZoom, RandomContrast
  - Improved TextVectorization layer, which handles string tokenization, n-gram generation, and token encoding
    - The TextVectorization layer now accounts for the mask_token as part of the vocabulary size when output_mode='int'. This means that, if you have a max_tokens value of 5000, your output will have 5000 unique values (not 5001 as before).
    - Change the return value of TextVectorization.get_vocabulary() from byte to string. Users who previously were calling 'decode' on the output of this method should no longer need to do so.
  - Introduce new Keras dataset generation utilities:
    - image_dataset_from_directory is a utility based on tf.data.Dataset, meant to replace the legacy ImageDataGenerator. It takes you from a structured directory of images to a labeled dataset, in one function call. Note that it doesn't perform image data augmentation (which is meant to be done using preprocessing layers).
    - text_dataset_from_directory takes you from a structured directory of text files to a labeled dataset, in one function call.
    - timeseries_dataset_from_array is a tf.data.Dataset-based replacement of the legacy TimeseriesGenerator. It takes you from an array of timeseries data to a dataset of shifting windows with their targets.
  - Added experimental_steps_per_execution arg to model.compile to indicate the number of batches to run per tf.function call. This can speed up Keras Models on TPUs up to 3x.
  - Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a Keras input. Previously the functional API would only work if all of the elements in the first argument to the layer came from a Keras input.
  - Clean up BatchNormalization layer's trainable property to act like standard Python state when it's used inside tf.functions (frozen at tracing time), instead of acting like a pseudo-variable whose updates kind of sometimes get reflected in already-traced tf.function traces.
  - β Add the Conv1DTranspose layer.
  - π Fix bug in SensitivitySpecificityBase derived metrics.
  - Blacklist Case op from callback.
tf.lite:
  - Converter
    - Restored inference_input_type and inference_output_type flags in TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (tf.int8, tf.uint8) input and output types in post training full integer quantized models.
    - Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
  - CPU
    - Fix an issue w/ dynamic weights and Conv2D on x86.
    - Add a runtime Android flag for enabling XNNPACK for optimized CPU performance.
    - Add a runtime iOS flag for enabling XNNPACK for optimized CPU performance.
    - Add a compiler flag to enable building a TFLite library that applies XNNPACK delegate automatically when the model has a fp32 operation.
  - GPU
    - Allow GPU acceleration starting with internal graph nodes
    - Experimental support for quantized models with the Android GPU delegate
    - Add GPU delegate whitelist.
    - Rename GPU whitelist -> compatibility (list).
    - Improve GPU compatibility list entries from crash reports.
  - NNAPI
    - Set default value for StatefulNnApiDelegate::Options::max_number_delegated_partitions to 3.
    - Add capability to disable NNAPI CPU and check NNAPI Errno.
    - Fix crashes when using NNAPI with target accelerator specified with model containing Conv2d or FullyConnected or LSTM nodes with quantized weights.
    - Fix ANEURALNETWORKS_BAD_DATA execution failures with sum/max/min/reduce operations with scalar inputs.
  - Hexagon
    - TFLite Hexagon Delegate out of experimental.
    - Experimental int8 support for most hexagon ops.
    - Experimental per-channel quant support for conv in Hexagon delegate.
    - Support dynamic batch size in C++ API.
  - CoreML
    - Open source the CoreML delegate
  - Misc
    - Enable building Android TFLite targets on Windows
    - Add support for BatchMatMul.
    - Add support for half_pixel_centers with ResizeNearestNeighbor.
    - Add 3D support for BatchToSpaceND.
    - Add 5D support for BroadcastSub, Maximum, Minimum, Transpose and BroadcastDiv.
    - Rename kTfLiteActRelu1 to kTfLiteActReluN1To1.
    - Enable flex delegate on tensorflow.lite.Interpreter Python package.
    - Add Buckettize, SparseCross and BoostedTreesBucketize to the flex whitelist.
    - Add support for selective registration of flex ops.
    - Add missing kernels for flex delegate whitelisted ops.
    - Fix issue when using direct ByteBuffer inputs with graphs that have dynamic shapes.
    - Fix error checking supported operations in a model containing HardSwish.
TPU Enhancements
- π 3D mesh support
- Added TPU code for FTRL with multiply_linear_by_lr.
- Silently adds a new file system registry at gstpu.
- π Support restartType in cloud tpu client.
- Depend on a specific version of google-api-python-client.
- π Fixes apiclient import.
π XLA Support
- Implement stable argmin and argmax.
Tracing and Debugging
- Add a TFE_Py_Execute traceme.
Thanks to our Contributors
π This release contains contributions from many people at Google, as well as:
π 902449@58880@bigcat_chen@ASIC, Abdul Baseer Khan, Abhineet Choudhary, Abolfazl Shahbazi, Adam Hillier, ag.ramesh, Agoniii, Ajay P, Alex Hoffman, Alexander Bayandin, Alexander Grund, Alexandre Abadie, Alexey Rogachevskiy, amoitra, Andrew Stevens, Angus-Luo, Anshuman Tripathy, Anush Elangovan, Artem Mavrin, Ashutosh Hathidara, autoih, Ayushman Kumar, ayushmankumar7, Bairen Yi, Bas Aarts, Bastian Eichenberger, Ben Barsdell, bhack, Bharat Raghunathan, Biagio Montaruli, Bigcat-Himax, blueyi, Bryan Cutler, Byambaa, Carlos Hernandez-Vaquero, Chen Lei, Chris Knorowski, Christian Clauss, chuanqiw, CuiYifeng, Daniel Situnayake, Daria Zhuravleva, Dayananda-V, Deven Desai, Devi Sandeep Endluri, Dmitry Zakharov, Dominic Jack, Duncan Riach, Edgar Liberis, Ehsan Toosi, ekuznetsov139, Elena Zhelezina, Eugene Kuznetsov, Eugene Mikhantiev, Evgenii Zheltonozhskii, Fabio Di Domenico, Fausto Morales, Fei Sun, feihugis, Felix E. Klee, flyingcat, Frederic Bastien, Fredrik Knutsson, frreiss, fsx950223, ganler, Gaurav Singh, Georgios Pinitas, Gian Marco Iodice, Giorgio Arena, Giuseppe Rossini, Gregory Keith, Guozhong Zhuang, gurushantj, Hahn Anselm, Harald Husum, Harjyot Bagga, Hristo Vrigazov, Ilya Persky, Ir1d, Itamar Turner-Trauring, jacco, Jake Tae, Janosh Riebesell, Jason Zaman, jayanth, Jeff Daily, Jens Elofsson, Jinzhe Zeng, JLZ, Jonas Skog, Jonathan Dekhtiar, Josh Meyer, Joshua Chia, Judd, justkw, Kaixi Hou, Kam D Kasravi, Kamil Rakoczy, Karol Gugala, Kayou, Kazuaki Ishizaki, Keith Smiley, Khaled Besrour, Kilaru Yasaswi Sri Chandra Gandhi, Kim, Young Soo, Kristian Hartikainen, Kwabena W. Agyeman, Leslie-Fang, Leslie-Fang-Intel, Li, Guizi, Lukas Geiger, Lutz Roeder, Måns Nilsson, Mahmoud Abuzaina, Manish, Marcel Koester, Marcin Sielski, marload, Martin Jul, Matt Conley, mdfaijul, Meng, Peng, Meteorix, Michael Käufl, Michael137, Milan Straka, Mitchell Vitez, Ml-0, Mokke Meguru, Mshr-H, nammbash, Nathan Luehr, naumkin, Neeraj Bhadani, ngc92, Nick Morgan, nihui, Niranjan Hasabnis, Niranjan Yadla, Nishidha Panpaliya, Oceania2018, oclyke, Ouyang Jin, OverLordGoldDragon, Owen Lyke, Patrick Hemmer, Paul Andrey, Peng Sun, periannath, Phil Pearl, Prashant Dandriyal, Prashant Kumar, Rahul Huilgol, Rajan Singh, Rajeshwar Reddy T, rangjiaheng, Rishit Dagli, Rohan Reddy, rpalakkal, rposts, Ruan Kunliang, Rushabh Vasani, Ryohei Ikegami, Semun Lee, Seo-Inyoung, Sergey Mironov, Sharada Shiddibhavi, ShengYang1, Shraiysh Vaishay, Shunya Ueta, shwetaoj, Siyavash Najafzade, Srinivasan Narayanamoorthy, Stephan Uphoff, storypku, sunchenggen, sunway513, Sven-Hendrik Haase, Swapnil Parekh, Tamas Bela Feher, Teng Lu, tigertang, tomas, Tomohiro Ubukata, tongxuan.ltx, Tony Tonev, Tzu-Wei Huang, Téo Bouvard, Uday Bondhugula, Vaibhav Jade, Vijay Tadikamalla, Vikram Dattu, Vincent Abriou, Vishnuvardhan Janapati, Vo Van Nghia, VoVAllen, Will Battel, William D. Irons, wyzhao, Xiaoming (Jason) Cui, Xiaoquan Kong, Xinan Jiang, xutianming, Yair Ehrenwald, Yasir Modak, Yasuhiro Matsumoto, Yixing Fu, Yong Tang, Yuan Tang, zhaozheng09, Zilin Zhu, zilinzhu, 张志豪