Apache Beam provides a simple, powerful programming model for building both batch and streaming parallel data processing pipelines.
The Apache Beam SDK for Python provides access to Apache Beam capabilities from the Python programming language.
The key concepts in this programming model are
PCollection: represents a collection of data,
which could be bounded or unbounded in size.
PTransform: represents a
computation that transforms input PCollections into output PCollections.
Pipeline: manages a directed acyclic graph of
PTransform s and
PCollection s that is ready for execution.
PipelineRunner: specifies where and how
the pipeline should execute.
Read: read from an external source.
Write: write to an external data sink.
At the top of your source file:
import apache_beam as beam
After this import statement
Transform classes are available as
beam.FlatMap,
beam.GroupByKey, etc.
Pipeline class is available as
beam.Pipeline
Text read/write transforms are available as
beam.io.ReadFromText,
beam.io.WriteToText.
Examples
The examples subdirectory has some examples.
CoderAvroGenericCoderBooleanCoderBytesCoderCloudpickleCoderDillCoderFastPrimitivesCoderFloatCoderIterableCoderListCoderMapCoderNullableCoderPickleCoderProtoCoderProtoPlusCoderShardedKeyCoderSinglePrecisionFloatCoderSingletonCoderStrUtf8CoderTimestampCoderTupleCoderTupleSequenceCoderVarIntCoderWindowedValueCoderParamWindowedValueCoderBigIntegerCoderDecimalCoderPaneInfoCoderextract_counter_value()extract_gauge_value()extract_distribution()extract_string_set_value()extract_bounded_trie_value()extract_histogram_value()create_labels()int64_user_counter()int64_counter()int64_user_distribution()int64_distribution()int64_user_gauge()int64_gauge()user_set_string()user_histogram()user_bounded_trie()create_monitoring_info()is_counter()is_gauge()is_distribution()is_histogram()is_string_set()is_bounded_trie()is_user_monitoring_info()extract_metric_result_map_value()parse_namespace_and_name()get_step_name()to_key()sum_payload_combiner()distribution_payload_combiner()consolidate()get_generator()parse_byte_size()div_round_up()rotate_key()initial_splitting_zipf()SyntheticStepNonLiquidShardingOffsetRangeTrackerSyntheticSDFStepRestrictionProviderget_synthetic_sdf_step()SyntheticSourceSyntheticSDFSourceRestrictionProviderSyntheticSDFAsSourceShuffleBarrierSideInputBarriermerge_using_gbk()merge_using_side_input()expand_using_gbk()expand_using_second_output()parse_args()run()StatefulLoadGeneratorconvert_to_typing_type()iter_urns()PayloadBuilderSchemaBasedPayloadBuilderImplicitSchemaPayloadBuilderNamedTupleBasedPayloadBuilderSchemaTransformPayloadBuilderExplicitSchemaTransformPayloadBuilderJavaClassLookupPayloadBuilderSchemaTransformsConfigManagedReplacementSchemaAwareExternalTransformJavaExternalTransformAnnotationBasedPayloadBuilderDataclassBasedPayloadBuilderExternalTransformExpansionAndArtifactRetrievalStubJavaJarExpansionServiceBeamJarExpansionServicememoize()StateSpecReadModifyWriteStateSpecBagStateSpecSetStateSpecCombiningValueStateSpecOrderedListStateSpecTimerTimerSpecon_timer()get_dofn_specs()is_stateful_dofn()validate_stateful_dofn()BaseTimerRuntimeTimerRuntimeStateReadModifyWriteRuntimeStateAccumulatingRuntimeStateBagRuntimeStateSetRuntimeStateCombiningValueRuntimeStateOrderedListRuntimeStateUserStateContextmatch_is_named_tuple()extract_optional_type()is_any()is_new_type()is_forward_ref()convert_builtin_to_typing()convert_typing_to_builtin()convert_collections_to_typing()is_builtin()TypedWindowedValueconvert_to_beam_type()convert_to_beam_types()convert_to_python_type()convert_to_python_types()pop_one()pop_two()pop_three()push_value()nop()resume()pop_top()end_for()end_send()copy()rot_n()rot_two()rot_three()rot_four()dup_top()unary()unary_positive()unary_negative()unary_invert()unary_not()unary_convert()get_iter()symmetric_binary_op()binary_power()inplace_power()binary_multiply()inplace_multiply()binary_divide()inplace_divide()binary_floor_divide()inplace_floor_divide()binary_true_divide()inplace_true_divide()binary_modulo()inplace_modulo()binary_add()inplace_add()binary_subtract()inplace_subtract()binary_subscr()binary_lshift()inplace_lshift()binary_rshift()inplace_rshift()binary_and()inplace_and()binary_xor()inplace_xor()binary_or()inplace_or()binary_op()store_subscr()binary_slice()store_slice()print_item()print_newline()list_append()set_add()map_add()load_locals()exec_stmt()build_class()unpack_sequence()dup_topx()store_attr()delete_attr()store_global()delete_global()load_const()load_name()build_tuple()build_list()build_set()build_map()build_const_key_map()list_to_tuple()build_string()list_extend()set_update()dict_update()dict_merge()load_attr()load_method()compare_op()is_op()contains_op()import_name()import_from()load_global()store_map()load_fast()load_fast_load_fast()load_fast_check()load_fast_and_clear()store_fast()store_fast_store_fast()store_fast_load_fast()delete_fast()swap()reraise()gen_start()load_closure()load_deref()make_function()set_function_attribute()make_closure()build_slice()to_bool()format_value()convert_value()format_simple()format_with_spec()build_list_unpack()build_set_unpack()build_tuple_unpack()build_tuple_unpack_with_call()build_map_unpack()named_fields_to_schema()named_fields_from_schema()typing_to_runner_api()typing_from_runner_api()value_to_runner_api()value_from_runner_api()option_to_runner_api()option_from_runner_api()schema_field()SchemaTranslationnamed_tuple_from_schema()named_tuple_to_schema()schema_from_element_type()named_fields_from_element_type()union_schema_type()LogicalTypeRegistryLogicalTypeNoArgumentLogicalTypePassThroughLogicalTypeMicrosInstantRepresentationMillisInstantMicrosInstantPythonCallableFixedPrecisionDecimalArgumentRepresentationDecimalLogicalTypeFixedPrecisionDecimalLogicalTypeFixedBytesVariableBytesFixedStringVariableStringJdbcDateTypeJdbcTimeTypePermanentExceptionFuzzedExponentialIntervalsretry_on_server_errors_filter()retry_on_server_errors_and_notfound_filter()retry_on_server_errors_and_timeout_filter()retry_on_server_errors_timeout_or_quota_issues_filter()retry_on_beam_io_error_filter()retry_if_valid_input_but_server_error_and_timeout_filter()Clockno_retries()with_exponential_backoff()NotAvailableWithReasonProvideras_provider()as_provider_list()ExternalProviderjava_jar()maven_jar()beam_jar()docker()RemoteProviderExternalJavaProviderpython()ExternalPythonProviderYamlProviderfix_pycallable()InlineProviderMetaInlineProviderget_default_sql_provider()SqlBackedProviderelement_to_rows()dicts_to_rows()YamlProvidersTranslatingProvidercreate_java_builtin_provider()PypiExpansionServiceRenamingProviderload_providers()parse_providers()merge_providers()standard_providers()PipelinePipeline.runner_implemented_transforms()Pipeline.display_data()Pipeline.optionsPipeline.allow_unsafe_triggersPipeline.transform_annotations()Pipeline.replace_all()Pipeline.run()Pipeline.visit()Pipeline.apply()Pipeline.to_runner_api()Pipeline.merge_compatible_environments()Pipeline.from_runner_api()transform_annotations()