Description
🌟 Feature Description
Integrate Bodo DataFrames or the Bodo compiler into qlib as an optional execution backend. Bodo DataFrames are a drop-in replacement for pandas DataFrames and offer expression-tree optimization with lazy evaluation and parallel execution that can scale to multiple nodes. They automatically fall back to pandas if an unsupported operation is encountered. Bodo's compiler is built on Numba and automatically parallelizes NumPy/pandas/Python code across one or more nodes. This could accelerate qlib expression execution or data transformations.
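To make "optional execution backend" concrete, here is a minimal sketch of how the backend could be selected at import time. The helper name `load_dataframe_backend` is hypothetical, qlib has no such hook today, and the module path `bodo.pandas` is my assumption based on Bodo's documentation for its DataFrame library; treat this as an illustration rather than a proposed implementation.

```python
import importlib
from types import ModuleType


def load_dataframe_backend(
    preferred: str = "bodo.pandas",  # assumed module path for Bodo DataFrames
    fallback: str = "pandas",
) -> ModuleType:
    """Return the preferred dataframe module, or the fallback if it is not installed.

    Hypothetical helper: not part of qlib or Bodo.
    """
    try:
        return importlib.import_module(preferred)
    except ImportError:
        # The optional backend is not installed, so use the fallback module.
        return importlib.import_module(fallback)
```

With something like this, calling code would do `pd = load_dataframe_backend()` once and keep using `pd.DataFrame(...)` unchanged, so the backend stays optional for users who do not install Bodo.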
Motivation
- Application scenario
Accelerating/scaling qlib data transformations, potentially to multiple nodes
- Related works (Papers, Github repos etc.):
https://github.com/bodo-ai/bodo
Alternatives
There are other options for high-performance/parallel dataframes, such as Polars and PySpark. However, Polars isn't pandas-compatible, which increases the integration effort, and PySpark can be complicated to configure and deploy and doesn't offer competitive performance in our experience.
Additional Notes
My main purpose in creating this issue is to gauge whether there's interest and, if so, to find the best places to start looking for a POC that I would work on. Thanks!