RANGER-5423: Bulk policy evaluation holds read lock for entire batch,… #770

vyommani · Dec 18, 2025

Bulk policy evaluation holds read lock for entire batch, causing writer starvation and delayed policy updates

What changes were proposed in this pull request?

When delta sync is disabled (deltaEnabled=false), policy evaluations now use a lock-free snapshot instead of holding locks during the evaluation loop. The existing locked path remains unchanged when delta sync is enabled.
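
The difference between the two paths can be illustrated with a minimal sketch. It is not the Ranger code: `SketchEngine`, `BulkEvaluatorSketch`, and every method name below are hypothetical stand-ins, and the sketch assumes the engine object is immutable once published (which is what `inPlaceUpdates=false` suggests for the snapshot path).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Hypothetical, immutable stand-in for a policy engine (not a Ranger class).
final class SketchEngine {
    private final long version;
    private final Predicate<String> allowRule;

    SketchEngine(long version, Predicate<String> allowRule) {
        this.version = version;
        this.allowRule = allowRule;
    }

    long version() { return version; }

    boolean isAllowed(String resource) { return allowRule.test(resource); }
}

// Hypothetical holder showing both evaluation paths side by side.
final class BulkEvaluatorSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicReference<SketchEngine> current;

    BulkEvaluatorSketch(SketchEngine initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Locked path: the read lock is held for the whole batch, so a writer
    // calling update() has to wait until the loop finishes.
    List<Boolean> evaluateLocked(List<String> batch) {
        lock.readLock().lock();
        try {
            return evaluateAll(current.get(), batch);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Snapshot path: capture the engine reference once and evaluate without
    // a lock. Only safe because the captured engine is never mutated.
    List<Boolean> evaluateSnapshot(List<String> batch) {
        return evaluateAll(current.get(), batch);
    }

    // Writer: swaps in a new engine. The write lock only excludes readers
    // that use the locked path; snapshot readers keep their old reference.
    void update(SketchEngine next) {
        lock.writeLock().lock();
        try {
            current.set(next);
        } finally {
            lock.writeLock().unlock();
        }
    }

    private static List<Boolean> evaluateAll(SketchEngine engine, List<String> batch) {
        List<Boolean> results = new ArrayList<>(batch.size());
        for (String resource : batch) {
            results.add(engine.isAllowed(resource));
        }
        return results;
    }
}
```

In this sketch a batch evaluated through `evaluateSnapshot` sees one consistent engine version from start to finish, and an update published mid-batch becomes visible to the next batch rather than blocking behind the current one.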

How was this patch tested?

mvn clean install completes cleanly, and a benchmark run shows improved throughput with the snapshot path.

vyommani · Dec 18, 2025

Below is the output of the benchmark I ran.

=== BULK EVALUATION PERFORMANCE BENCHMARK ===

Configuration:
Policies: 50,000
Threads: 24
Batches per thread: 30,000
CPU cores: 12

Scenario: Lock-free Snapshot
Configuration: deltaEnabled=false, inPlaceUpdates=false
isPolicyEngineMutable=false

Batch Size | Total Requests | Duration (s) | Throughput | vs Baseline | Memory | P95 Latency

       1 |         720,000 |        0.349 |    2,061,681 r/s |            - |    41.1 MB |      0.01 ms
      10 |       7,200,000 |        2.214 |    3,252,033 r/s |            - |    33.2 MB |      0.32 ms
     100 |      72,000,000 |       19.145 |    3,760,747 r/s |            - |    33.7 MB |      1.34 ms
   1,000 |     720,000,000 |      191.461 |    3,760,563 r/s |            - |    42.2 MB |      9.14 ms
   5,000 |   3,600,000,000 |      962.866 |    3,738,838 r/s |            - |    33.0 MB |     42.23 ms
  10,000 |   7,200,000,000 |     2003.881 |    3,593,028 r/s |            - |    33.0 MB |     87.71 ms

Scenario: Legacy Locked
Configuration: deltaEnabled=true, inPlaceUpdates=true
isPolicyEngineMutable=true

Batch Size | Total Requests | Duration (s) | Throughput | vs Baseline | Memory | P95 Latency

       1 |         720,000 |        0.483 |    1,489,744 r/s |        0.72x |    33.4 MB |      0.02 ms
      10 |       7,200,000 |        2.389 |    3,014,398 r/s |        0.93x |    33.2 MB |      0.37 ms
     100 |      72,000,000 |       20.235 |    3,558,228 r/s |        0.95x |    33.2 MB |      1.50 ms
   1,000 |     720,000,000 |      201.901 |    3,566,102 r/s |        0.95x |    33.0 MB |      9.85 ms
   5,000 |   3,600,000,000 |     1018.552 |    3,534,429 r/s |        0.95x |    33.0 MB |     45.05 ms
  10,000 |   7,200,000,000 |     2100.223 |    3,428,207 r/s |        0.95x |    33.0 MB |     91.15 ms

========================================================================================================================
Legend: = 2x or greater speedup | r/s = requests per second
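
The derived columns in the two tables above can be cross-checked with a few lines of arithmetic. The sketch below assumes that Total Requests is threads × batches per thread × batch size, and that "vs Baseline" is the locked throughput divided by the snapshot throughput at the same batch size; both assumptions are consistent with the printed numbers but are not confirmed by the benchmark source. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that recomputes derived benchmark columns.
final class BenchmarkMathSketch {

    // Total requests for one row. The cast to long matters: the 10,000-batch
    // row (7.2 billion requests) would overflow a 32-bit int.
    static long totalRequests(int threads, int batchesPerThread, int batchSize) {
        return (long) threads * batchesPerThread * batchSize;
    }

    // Assumed definition of the "vs Baseline" column.
    static double vsBaseline(double lockedThroughput, double snapshotThroughput) {
        return lockedThroughput / snapshotThroughput;
    }

    public static void main(String[] args) {
        // Batch size 1 and batch size 10,000 rows from the tables above.
        System.out.println(totalRequests(24, 30_000, 1));
        System.out.println(totalRequests(24, 30_000, 10_000));
        System.out.println(String.format(Locale.ROOT, "%.2fx",
                vsBaseline(1_489_744, 2_061_681)));
        System.out.println(String.format(Locale.ROOT, "%.2fx",
                vsBaseline(3_428_207, 3_593_028)));
    }
}
```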

=== SCALABILITY TEST: Performance vs Policy Count ===
Configuration: batch=1000, threads=24, iterations=10000, runs=5

Mode | Policy Count | Avg Duration (s) | Avg Throughput | StdDev % | vs Locked

Snapshot | 1,000 | 2.793 | 3,597,980 r/s | 6.63% | -
Locked | 1,000 | 2.907 | 3,454,039 r/s | 6.00% | -
→ Speedup | 1,000 | - | - | - | 1.04x

Snapshot | 10,000 | 2.873 | 3,506,523 r/s | 8.35% | -
Locked | 10,000 | 2.903 | 3,455,737 r/s | 5.48% | -
→ Speedup | 10,000 | - | - | - | 1.01x

Snapshot | 50,000 | 2.740 | 3,653,690 r/s | 3.30% | -
Locked | 50,000 | 2.884 | 3,478,907 r/s | 5.40% | -
→ Speedup | 50,000 | - | - | - | 1.05x

Snapshot | 100,000 | 3.094 | 3,249,175 r/s | 7.74% | -
Locked | 100,000 | 2.948 | 3,406,937 r/s | 6.52% | -
→ Speedup | 100,000 | - | - | - | 0.95x

=== CORRECTNESS TEST: Snapshot vs Locked Evaluation ===
All 500 requests produced identical results

Allowed: 0
Denied: 500

=== CONCURRENCY STRESS TEST: Policy Updates During Evaluation ===
Starting 9 evaluation threads + 1 update thread...
All threads completed successfully
No race conditions or crashes detected
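
For readers who want to reproduce the shape of this stress test, here is a minimal sketch of a harness with several evaluation threads and one update thread. It is not the test used in this PR: `SnapshotStressSketch`, `Snapshot`, and `run` are hypothetical names, and the check it performs (no evaluation thread ever observes an older version after a newer one) is an assumption about what is worth asserting, not the PR's actual assertion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stress harness: N snapshot readers plus one updater.
final class SnapshotStressSketch {

    // Immutable snapshot carrying only a version number.
    static final class Snapshot {
        final long version;
        Snapshot(long version) { this.version = version; }
    }

    // Returns the number of times a reader saw the version go backwards.
    static long run(int evalThreads, int updates, int batchSize) {
        AtomicReference<Snapshot> current = new AtomicReference<>(new Snapshot(0));
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicLong violations = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(evalThreads + 1);
        List<Future<?>> futures = new ArrayList<>();

        for (int i = 0; i < evalThreads; i++) {
            futures.add(pool.submit(() -> {
                long lastSeen = -1;
                do {
                    Snapshot s = current.get(); // captured once per batch
                    if (s.version < lastSeen) {
                        violations.incrementAndGet();
                    }
                    lastSeen = s.version;
                    for (int j = 0; j < batchSize; j++) {
                        // Placeholder for per-request evaluation against s.
                        if (s.version < 0) {
                            violations.incrementAndGet();
                        }
                    }
                } while (!done.get());
            }));
        }

        // Single update thread publishing new snapshots without blocking readers.
        futures.add(pool.submit(() -> {
            for (long v = 1; v <= updates; v++) {
                current.set(new Snapshot(v));
            }
            done.set(true);
        }));

        try {
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception thrown inside a worker
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return violations.get();
    }
}
```

Calling `SnapshotStressSketch.run(9, 10_000, 1_000)` mirrors the 9 + 1 thread layout reported above. A zero return value is only a smoke-test signal; it does not prove the absence of races.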

vyommani · Dec 18, 2025

Summary – Apache Ranger Bulk‑Policy Evaluation Improvement

What we changed
• When delta sync is disabled (deltaEnabled=false) we now evaluate policies from a lock-free snapshot instead of holding the read lock for the whole evaluation loop.
• When delta sync is enabled the existing “legacy” locked path remains unchanged.

Benchmark environment
• 50 000 policies, 24 threads, 30 000 batches per thread, 12 CPU cores.

Key results

Scenario | Throughput @ 1k batch | Throughput @ 10k batch | Memory | P95 Latency

Lock-free Snapshot (deltaEnabled=false) | ~3.76 M r/s | ~3.59 M r/s | 33-42 MB | 9-88 ms
Legacy Locked (deltaEnabled=true) | ~3.57 M r/s | ~3.43 M r/s | ~33 MB | 10-91 ms

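•	Speedup: roughly 5-38% higher throughput with the snapshot. The largest gain (about 38%) is at batch size 1; from batch size 100 upward the gain settles at about 5%.
•	Scalability: The snapshot is 1-5% faster with up to 50k policies and about 5% slower at 100k policies. These differences are within the reported run-to-run standard deviation (3-8%).
•	Correctness: 500 requests gave identical decisions in both modes (all 500 were denied).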
•	Concurrency: 9 evaluation threads + 1 update thread ran without crashes or race conditions.

RANGER-5423: Bulk policy evaluation holds read lock for entire batch,…

8afef0a

… causing writer starvation and delayed policy updates
