Feat: handle invalid parquet data source configs #152

Merged
prakharmathur82 merged 239 commits into raystack:dagger-parquet-file-processing from Meghajit:feat/issue#150-handle-invalid-parquet-data-source-configs
May 30, 2022

Conversation

@Meghajit
Member

PR for #150

- added split assigners to assign splits based on
timestamp in url and based on index in filepaths array

[raystack#99]
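
To illustrate the two assignment orders this commit describes, here is a minimal sketch, not the PR's actual split assigner. It assumes a hypothetical path layout of the form `dt=YYYY-MM-DD/hr=HH`; the class and method names (`SplitOrderingSketch`, `timestampOf`, `orderByTimestamp`, `orderByIndex`) are made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitOrderingSketch {
    // ASSUMPTION: paths embed a timestamp as .../dt=2022-05-06/hr=11/...
    // The hour segment is treated as optional and defaults to 00.
    private static final Pattern TIMESTAMP =
            Pattern.compile("dt=(\\d{4}-\\d{2}-\\d{2})(?:/hr=(\\d{2}))?");

    // Extracts the timestamp embedded in a file path, or fails loudly.
    static LocalDateTime timestampOf(String path) {
        Matcher matcher = TIMESTAMP.matcher(path);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No timestamp found in path: " + path);
        }
        LocalDate date = LocalDate.parse(matcher.group(1));
        int hour = matcher.group(2) == null ? 0 : Integer.parseInt(matcher.group(2));
        return date.atTime(hour, 0);
    }

    // Strategy 1: earliest timestamp in the URL first.
    static List<String> orderByTimestamp(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator.comparing(SplitOrderingSketch::timestampOf));
        return sorted;
    }

    // Strategy 2: keep the order in which paths appear in the configured array.
    static List<String> orderByIndex(List<String> paths) {
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        paths.add("gs://bucket/dt=2022-05-07/hr=01/a.parquet");
        paths.add("gs://bucket/dt=2022-05-06/hr=23/b.parquet");
        System.out.println(orderByTimestamp(paths));
        System.out.println(orderByIndex(paths));
    }
}
```

Both methods return a copy, so the configured list is never reordered in place.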
- add methods to get FileSplitAssigner and FileRecordFormat based
on configs
- pass StencilClientOrchestrator to SourceFactory as well when
creating the source

[raystack#99]
- this is required for parsing the parquet SimpleGroup data
structure into Java objects.

[raystack#99]
- implement parsers for int32, int64 and boolean
parquet data types

[raystack#99]
- remove abstract method serializer from the interface
as it is not required

[raystack#99]
- return DaggerDeserializationException instead of
ClassCastException when logical type is incorrect

[raystack#99]
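
The change above can be sketched roughly as follows. This is an illustration only: `DaggerDeserializationException` here is a local stand-in defined for the example (the real class and its constructors are not shown in this PR page), and `parseTimestampMillis` is a hypothetical parser method.

```java
final class LogicalTypeCheckSketch {
    // Stand-in for the project's exception type; defined here only so the sketch compiles.
    static class DaggerDeserializationException extends RuntimeException {
        DaggerDeserializationException(String message) {
            super(message);
        }
    }

    // Hypothetical parser: expects the raw value to be a long holding epoch milliseconds.
    // Checking the type up front avoids leaking a ClassCastException to the caller.
    static long parseTimestampMillis(String fieldName, Object rawValue) {
        if (!(rawValue instanceof Long)) {
            String actual = rawValue == null ? "null" : rawValue.getClass().getSimpleName();
            throw new DaggerDeserializationException(
                    "Field '" + fieldName + "' has an unexpected type: " + actual);
        }
        return (Long) rawValue;
    }

    public static void main(String[] args) {
        System.out.println(parseTimestampMillis("event_timestamp", 1651835280000L));
    }
}
```

A domain-specific exception lets callers distinguish bad input data from programming errors.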
- return DaggerDeserializationException instead of
ClassCastException when logical type is incorrect

[raystack#99]
- change the class to a usual class instead of
a factory class

[raystack#99]
- ParquetDataTypeParser.getValueOrDefault() now returns
the default value only if the deserialized value is null.

[raystack#99]
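
A minimal sketch of the behaviour described above, assuming the method takes a supplier for the deserialized value plus a default; the real signature of `ParquetDataTypeParser.getValueOrDefault()` is not shown here, so the parameter types are assumptions. In this sketch an exception thrown during deserialization propagates instead of being replaced by the default.

```java
import java.util.function.Supplier;

final class ValueOrDefaultSketch {
    // ASSUMPTION: the value is produced lazily by a supplier.
    // The default is used only when the supplier yields null;
    // any exception thrown by the supplier is left to propagate.
    static Object getValueOrDefault(Supplier<Object> valueSupplier, Object defaultValue) {
        Object value = valueSupplier.get();
        return value == null ? defaultValue : value;
    }

    public static void main(String[] args) {
        System.out.println(getValueOrDefault(() -> null, 0));
        System.out.println(getValueOrDefault(() -> 5, 0));
    }
}
```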
Meghajit added 14 commits May 6, 2022 11:08
- add validation methods to check if SimpleGroup map
schema follows Apache Parquet LogicalTypes spec or legacy one
- official spec: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules-1
- add some tests

[raystack#137]
- add tests
- refactor implementation of the original method into smaller
modular methods

[raystack#137]
- remove proto keyword
- update usages
- this addresses review comments raystack#138 (comment) and raystack#138 (comment)

[raystack#138]
…00-parquet-complex-and-repeated-datatype-deserialization
- replace transformFromKafka with transformFromProto
- addresses review comment raystack#140 (comment)

[raystack#140]
…serialization' into feat/issue#137-parquet-map-and-group-timestamp-deserialization
…00-parquet-complex-and-repeated-datatype-deserialization
…serialization' into feat/issue#137-parquet-map-and-group-timestamp-deserialization
Meghajit added 7 commits May 17, 2022 13:22
- replace `KafkaTransform` keyword

[raystack#100]
…' into feat/issue#100-parquet-complex-and-repeated-datatype-deserialization
…serialization' into feat/issue#137-parquet-map-and-group-timestamp-deserialization
…' into feat/issue#137-parquet-map-and-group-timestamp-deserialization
…ization' into feat/issue#150-handle-invalid-parquet-data-source-configs
…' into feat/issue#150-handle-invalid-parquet-data-source-configs

# Conflicts:
#	dagger-core/src/main/java/io/odpf/dagger/core/source/config/StreamConfig.java
#	dagger-core/src/main/java/io/odpf/dagger/core/source/config/adapter/FileDateRangeAdaptor.java
#	dagger-core/src/main/java/io/odpf/dagger/core/source/parquet/splitassigner/ChronologyOrderedSplitAssigner.java
#	dagger-core/src/test/java/io/odpf/dagger/core/source/config/StreamConfigTest.java
#	dagger-core/src/test/java/io/odpf/dagger/core/source/config/adapter/FileDateRangeAdaptorTest.java
#	dagger-core/src/test/java/io/odpf/dagger/core/source/parquet/ParquetFileSourceTest.java
#	dagger-core/src/test/java/io/odpf/dagger/core/source/parquet/splitassigner/ChronologyOrderedSplitAssignerTest.java
prakharmathur82 merged commit 1cf812a into raystack:dagger-parquet-file-processing on May 30, 2022
Development

Successfully merging this pull request may close these issues.

Handle missing configs, extra whitespaces or incorrect config values for Parquet Source
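
The three failure modes named in the linked issue (missing configs, extra whitespace, incorrect values) could be handled along these lines. This is a sketch under stated assumptions, not the code merged in this PR: the `ReadOrder` enum, its constants, and the config key used in `main` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ParquetConfigValidationSketch {
    // Hypothetical set of allowed values for a read-order config.
    enum ReadOrder { TIMESTAMP_IN_URL, INDEX_IN_ARRAY }

    static ReadOrder parseReadOrder(Map<String, String> config, String key) {
        String raw = config.get(key);
        // Missing config: absent key, or a value that is blank after trimming.
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required config: " + key);
        }
        // Extra whitespace: strip it before matching against the allowed values.
        String trimmed = raw.trim();
        try {
            return ReadOrder.valueOf(trimmed);
        } catch (IllegalArgumentException e) {
            // Incorrect value: report the key and the offending value.
            throw new IllegalArgumentException(
                    "Invalid value '" + trimmed + "' for config " + key, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("READ_ORDER", "  INDEX_IN_ARRAY ");  // hypothetical key
        System.out.println(parseReadOrder(config, "READ_ORDER"));
    }
}
```

Failing at configuration time with the key name in the message keeps a bad value from surfacing later as an unrelated runtime error.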

2 participants
