Polars Extractor

A simple data extraction script that can be deployed as a container. It uses Python with Polars, ConnectorX, and PyArrow. Polars and ConnectorX are implemented in Rust, and PyArrow in C++. These libraries do the heavy lifting, while Python binds it all together.

This can be far less resource intensive than running PySpark (which requires a JVM), and much faster than using Pandas (a Python implementation).

Parameters:

| NAME | DESCRIPTION | DEFAULT |
| --- | --- | --- |
| DATABASE_URL | Database connection URL with credentials | |
| TABLE | DB table | |
| SCHEMA | DB schema | public |
| WRITE_PATH | Destination path or fsspec URL | |
| QUERY_OVERWRITE | Query string to overwrite the default `select * from schema.table` | |
| WRITE_PARTITIONED | If set, uses this column as a Hive-like partition when writing | |
| READ_PARTITIONED | Makes the script read the data in parts (in parallel) | |
| PARTITION_NUMBER | If READ_PARTITIONED is set, the number of parallel partitions | 4 |
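A minimal sketch of how these parameters could drive the read side. The helper name and the exact mapping are assumptions for illustration, not the script's actual implementation; the resulting keyword arguments match the signature of Polars' ConnectorX-backed `read_database_uri`.

```python
# Hypothetical helper: builds reader arguments from the environment variables
# described in the table above (an assumption about how main.py wires them).
def build_extraction_args(env):
    schema = env.get("SCHEMA", "public")
    table = env["TABLE"]
    # QUERY_OVERWRITE replaces the default "select * from schema.table"
    query = env.get("QUERY_OVERWRITE") or f"select * from {schema}.{table}"
    kwargs = {"query": query, "uri": env["DATABASE_URL"]}
    # READ_PARTITIONED (assumed here to hold a column name) splits the read
    # into PARTITION_NUMBER parallel chunks, handled by ConnectorX.
    if env.get("READ_PARTITIONED"):
        kwargs["partition_on"] = env["READ_PARTITIONED"]
        kwargs["partition_num"] = int(env.get("PARTITION_NUMBER", "4"))
    return kwargs

# The result would feed the ConnectorX-backed reader, e.g.:
#   df = pl.read_database_uri(**build_extraction_args(os.environ))
```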

Running it locally:

DATABASE_URL=postgresql://user:password@localhost:5432/cast_concursos TABLE=table_name WRITE_PATH="." WRITE_PARTITIONED="partition_column" python main.py

About

A sample repository showing how to extract data from a database with Polars.
