Polars Extractor

A simple data extraction script that can be deployed as a container. It uses Python with Polars, ConnectorX, and PyArrow. Polars and ConnectorX are implemented in Rust, and PyArrow in C++. These libraries do the heavy lifting, while Python binds it all together.

This can be far less resource intensive than running PySpark (which requires a JVM), and much faster than using Pandas (a Python implementation).

Parameters:

| NAME | DESCRIPTION | DEFAULT |
| --- | --- | --- |
| DATABASE_URL | Database connection URL with credentials | |
| TABLE | DB table | |
| SCHEMA | DB schema | public |
| WRITE_PATH | Destination path or fsspec URL | |
| QUERY_OVERWRITE | Query string to overwrite the default `select * from schema.table` | |
| WRITE_PARTITIONED | If set, uses this column as a Hive-like partition when writing | |
| READ_PARTITIONED | Makes the script read the data in parts (in parallel) | |
| PARTITION_NUMBER | If READ_PARTITIONED is set, the number of parallel partitions | 4 |
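A minimal sketch of how these parameters could drive the read side. The helper name and the exact mapping are assumptions for illustration, not the script's actual implementation; the resulting keyword arguments match the signature of Polars' ConnectorX-backed `read_database_uri`.

```python
# Hypothetical helper: builds reader arguments from the environment variables
# described in the table above (an assumption about how main.py wires them).
def build_extraction_args(env):
    schema = env.get("SCHEMA", "public")
    table = env["TABLE"]
    # QUERY_OVERWRITE replaces the default "select * from schema.table"
    query = env.get("QUERY_OVERWRITE") or f"select * from {schema}.{table}"
    kwargs = {"query": query, "uri": env["DATABASE_URL"]}
    # READ_PARTITIONED (assumed here to hold a column name) splits the read
    # into PARTITION_NUMBER parallel chunks, handled by ConnectorX.
    if env.get("READ_PARTITIONED"):
        kwargs["partition_on"] = env["READ_PARTITIONED"]
        kwargs["partition_num"] = int(env.get("PARTITION_NUMBER", "4"))
    return kwargs

# The result would feed the ConnectorX-backed reader, e.g.:
#   df = pl.read_database_uri(**build_extraction_args(os.environ))
```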

Running it locally:

DATABASE_URL=postgresql://user:password@localhost:5432/cast_concursos TABLE=table_name WRITE_PATH="." WRITE_PARTITIONED="partition_column" python main.py

About

A sample repository showing how to extract data from a database with Polars.
