This repository was archived by the owner on Jun 30, 2022. It is now read-only.

Releases: GoogleCloudPlatform/DataflowPythonSDK

Future Releases

31 May 17:16

All future releases will be announced in the Release Notes: Dataflow SDK for Python, and will be available on PyPI.

See README.md for more information.

Version 0.2.7

14 Jun 06:22

The 0.2.7 release includes the following changes:

  • Introduce OperationCounters.should_sample for sampling for size estimation.
  • Implement fixed sharding in TextFileSink.
  • Use multiple file rename threads in finalize_write method.
  • Retry idempotent I/O operations on GCS timeout.
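
The retry item above can be illustrated with a minimal sketch. This is not the SDK's implementation: `TransientTimeout`, `retry_on_timeout`, and their parameters are hypothetical names chosen for illustration, and the exponential backoff is an assumption rather than something the release note states. The sketch only shows why retrying is safe for idempotent operations: repeating the call has the same effect as making it once.

```python
import time


class TransientTimeout(Exception):
    """Hypothetical stand-in for a storage timeout error."""


def retry_on_timeout(operation, max_attempts=3, initial_delay=0.01, sleep=time.sleep):
    """Call an idempotent operation, retrying when it times out.

    The delay doubles after each failed attempt (an assumed policy).
    The last timeout is re-raised once max_attempts is reached.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientTimeout:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2


# Usage: an operation that times out twice before succeeding.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTimeout("simulated timeout")
    return "contents"


result = retry_on_timeout(flaky_read, sleep=lambda seconds: None)
```

Only operations that are safe to repeat, such as reads or overwriting writes, should be wrapped this way.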

Version 0.2.6

10 Jun 16:01

The 0.2.6 release includes the following changes:

  • Allow Pipeline objects to be used in Python with statements.
  • Several bug fixes.
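
To show what using an object in a with statement involves, here is a minimal sketch built on a stand-in class. `SketchPipeline` and its methods are hypothetical and are not the SDK's `Pipeline` API. Running the pipeline automatically when the block exits cleanly is an assumption made for illustration; the release note does not say what the SDK does on exit.

```python
class SketchPipeline:
    """Hypothetical stand-in for a pipeline object; not the SDK's Pipeline class."""

    def __init__(self):
        self.steps = []
        self.has_run = False

    def apply(self, step_name):
        # Record the step so run() has something to execute.
        self.steps.append(step_name)
        return self

    def run(self):
        self.has_run = True

    def __enter__(self):
        # The object bound by "with ... as p" is the pipeline itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumed behavior: run only if the block finished without an exception.
        if exc_type is None:
            self.run()
        # Returning False lets any exception propagate to the caller.
        return False


# Usage: steps are added inside the block; the pipeline runs on exit.
with SketchPipeline() as p:
    p.apply("read").apply("transform").apply("write")
```

With this protocol in place, the caller no longer needs an explicit `run()` call after building the pipeline.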

Version 0.2.5

31 May 21:55

The 0.2.5 release includes the following changes:

  • Support for creating custom sources, and reading them with DirectRunner and DataflowRunner.
  • DiskCachedPipelineRunner as a disk backed alternative to DirectRunner.
  • Ignore undeclared side outputs of DoFns in cloud executor.
  • Fix pickling issue when the Seaborn package is loaded.
  • Enable gzip compression on text files sink.
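
As a rough illustration of gzip-compressed text output, the sketch below writes lines to a gzip file using only the Python standard library and reads them back. It does not use the SDK's text file sink; `write_gzip_lines` and `read_gzip_lines` are hypothetical helper names introduced for this example.

```python
import gzip
import os
import tempfile


def write_gzip_lines(path, lines):
    """Write each string as one newline-terminated line of a gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_gzip_lines(path):
    """Read a gzip-compressed text file back into a list of lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Usage: round-trip a few records through a temporary file.
with tempfile.TemporaryDirectory() as temp_dir:
    output_path = os.path.join(temp_dir, "output.txt.gz")
    write_gzip_lines(output_path, ["alpha", "beta", "gamma"])
    round_trip = read_gzip_lines(output_path)
    with open(output_path, "rb") as raw:
        magic = raw.read(2)  # gzip files start with the bytes 0x1f 0x8b
```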

Version 0.2.4

11 May 22:12

The 0.2.4 release includes the following changes:

  • Support for large iterable side inputs.
  • Enable support for all supported counter types.
  • Modify --requirements_file behavior to locally cache packages.
  • Support for non-native TextFileSink.
  • Several fixes.

Version 0.2.3

19 Apr 23:30

The 0.2.3 release includes several fixes:

  • Removed version pin for google-apitools package.
  • Removed version pin for oauth2client package.
  • Better interoperability with the gcloud package.
  • Raising correct exception for failures in start/finish DoFn methods.

Version 0.2.2

01 Apr 00:48

The 0.2.2 release includes the following changes:

  • Improved memory footprint for DirectPipelineRunner.
  • Multiple bug fixes (BigQuerySink schema handling for record field types, clearer error messages for missing files, etc.).
  • Several performance improvements (cythonized some files, reduced debug logging, etc.).
  • New example using more complex BigQuery schemas.

This release supports only batch execution. Streaming processing is not available yet.
Batch execution can run locally (for development and testing) or on Google Cloud using the Cloud Dataflow service. Running on Google Cloud requires whitelisting using this form.

Version 0.2.1

21 Mar 18:39

The 0.2.1 release includes the following changes:

  • Optimized performance for the following features:
    • Logging
    • Shuffle Writing
    • Using Coders
    • Compiling some of the worker modules with Cython
  • Changed the default behavior for Cloud execution: Instead of downloading the SDK from a Cloud Storage bucket, you now download the SDK as a tarball from GitHub. When you run jobs using the Dataflow service, the SDK version used will match the version you've downloaded (to your local environment). You can use the --sdk_location pipeline option to override this behavior and provide an explicit tarball location (Cloud Storage path or URL).
  • Fixed several pickling issues related to how Dataflow serializes user functions and data.
  • Fixed several worker lease expiration issues experienced when processing large datasets.
  • Improved validation to detect various common errors, such as access issues and invalid parameter combinations, much earlier.

Version 0.2.0

03 Mar 07:25

Initial release of the open-sourced Dataflow SDK for Python.