Functions to extract all Darwin cut notes based on image dimensions, written in Python and Spark for efficient parallel processing


HackTheStacks/darwin-image-preprocessing


darwin-image-preprocessing

Functions to extract all Darwin cut notes based on image dimensions and discard full-page notes (those that are not cut notes). The approach compares each image's dimensions to the mean image dimensions within its folder. Written in PySpark for efficient parallel processing, since the dataset is about 350 GB and about 60k images.
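
The comparison against per-folder mean dimensions could look roughly like the sketch below. It is plain Python rather than PySpark so it runs without a cluster, and it is not the repository's code: the function name `find_cut_notes`, the record layout `(folder, filename, width, height)`, and the rule "smaller than `ratio` times the folder mean in either dimension" are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def find_cut_notes(records, ratio=0.9):
    """Return (folder, filename) pairs flagged as cut notes.

    records: iterable of (folder, filename, width, height) tuples.
    ratio:   hypothetical tolerance; an image is flagged when its width
             or height is below ratio * the mean for its folder.
    """
    # Group image dimensions by the folder they belong to.
    by_folder = defaultdict(list)
    for folder, name, width, height in records:
        by_folder[folder].append((name, width, height))

    cut_notes = []
    for folder, images in by_folder.items():
        # Mean dimensions are computed per folder, not globally.
        mean_w = mean(w for _, w, _ in images)
        mean_h = mean(h for _, _, h in images)
        for name, w, h in images:
            # Assumed rule: noticeably smaller than the folder mean.
            if w < ratio * mean_w or h < ratio * mean_h:
                cut_notes.append((folder, name))
    return cut_notes


if __name__ == "__main__":
    sample = [
        ("vol1", "page_001.jpg", 1000, 1400),
        ("vol1", "page_002.jpg", 1000, 1400),
        ("vol1", "slip_003.jpg", 1000, 300),
        ("vol2", "page_004.jpg", 800, 1200),
    ]
    print(find_cut_notes(sample))
```

The grouping step is the part that parallelizes naturally, since each folder's mean and comparisons are independent of every other folder.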
