Problem
The current load_bulk() method calls add_child() or add_root() in a loop, so each node insertion triggers an individual database save. For large tree structures (thousands of nodes), this becomes a significant performance bottleneck.
Background: I'm using django-treebeard in a production enterprise application with high tree write throughput. We frequently need to load large tree structures (10,000-100,000+ nodes) from external sources, and the current load_bulk() implementation was unusable for these use cases. I've implemented a custom mixin using treebeard's utility methods to batch the inserts, which reduced our load time by 100x+. I'd love to contribute an implementation back to the community.
Current Behavior
```python
# In load_bulk()
if parent:
    node_obj = parent.add_child(**node_data)
else:
    node_obj = cls.add_root(**node_data)
```
Each iteration performs a separate database round trip, which makes loading large trees prohibitively slow.
Proposed Solution
Add a true bulk loading method that:
- Collects all node data in memory (or in batches)
- Calculates tree structure metadata (path/depth/etc.) upfront
- Uses Django's bulk_create() for efficient batch inserts
This could be:
- A new method like bulk_create_tree() or load_bulk_optimized()
- A parameter on the existing load_bulk(): load_bulk(..., use_bulk_insert=True)
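For context, here is a rough sketch of the approach, reduced to its core. It targets MP_Node-based models, assumes an empty tree and no node_order_by, and writes the path, depth, and numchild fields directly before handing everything to bulk_create(), so it deliberately skips the per-node bookkeeping that add_child()/add_root() perform. The bulk_load_tree() and _encode_step() names are placeholders; the input is the same nested data/children format that load_bulk() already accepts.

```python
def _encode_step(cls, position):
    # Encode a 1-based sibling position as one fixed-width path step using the
    # model's alphabet and steplen (MP_Node defaults: base 36, 4 characters).
    base = len(cls.alphabet)
    chars = []
    while position:
        position, rem = divmod(position, base)
        chars.append(cls.alphabet[rem])
    return ''.join(reversed(chars)).rjust(cls.steplen, cls.alphabet[0])


def bulk_load_tree(cls, bulk_data, batch_size=1000):
    # Walk the nested structure once, computing path/depth/numchild in memory,
    # then insert all nodes with a handful of bulk_create() queries.
    instances = []

    def walk(children, parent_path, depth):
        for position, node in enumerate(children, start=1):
            path = parent_path + _encode_step(cls, position)
            grandchildren = node.get('children', [])
            instances.append(cls(path=path, depth=depth,
                                 numchild=len(grandchildren), **node['data']))
            walk(grandchildren, path, depth + 1)

    walk(bulk_data, '', 1)
    return cls.objects.bulk_create(instances, batch_size=batch_size)
```

Called as bulk_load_tree(MyNode, data), a 100,000-node tree goes in with roughly 100 batched INSERTs (at batch_size=1000) instead of 100,000 individual saves. Whether this ends up as a new method or behind a load_bulk() flag, the core path/depth computation would be the same.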
Questions
- Would the maintainers be interested in this feature?
- Any preference on API design (new method vs. parameter)?
- I'm happy to implement this if it aligns with the project's direction