
Hi Sentry team,

We’ve been testing the new feature flag functionality (currently in beta) and ran into an issue that appears to have been addressed by #4034.

The fix introduces a lock to prevent concurrency issues, but we were wondering why FlagBuffered is shared across contexts rather than being thread-local. Wouldn’t it make more sense for it to be per-context to avoid the need for locking?

Thanks in advance for your help!


Thanks for reaching out!

we were wondering why FlagBuffered is shared across contexts rather than being thread-local. Wouldn’t it make more sense for it to be per-context to avoid the need for locking?

The first-order question is "why lock?" It solved a problem in a way I knew how to solve it; there's no deeper meaning to it than that. If this has caused deadlocks or performance degradation for you, please open an issue and document the problem as best you can. We'll address it as quickly as possible.
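For concreteness, the lock-guarded approach can be sketched roughly like this (a minimal illustration, not the SDK's actual implementation; the class name, method names, and capacity are made up):

```python
import threading

class FlagBufferSketch:
    """Sketch of a capped, insertion-ordered flag buffer guarded by a
    lock. Illustrative only; not Sentry's implementation."""

    def __init__(self, capacity: int = 100):
        self._capacity = capacity
        self._lock = threading.Lock()
        self._flags: dict[str, bool] = {}

    def set(self, flag: str, result: bool) -> None:
        # The lock makes concurrent writers from multiple threads safe.
        with self._lock:
            self._flags.pop(flag, None)  # refresh recency on re-insert
            self._flags[flag] = result
            if len(self._flags) > self._capacity:
                # Evict the least recently set flag (dicts keep insertion order).
                self._flags.pop(next(iter(self._flags)))
```

The lock is held only for the duration of a single insert, so contention is brief unless flags are evaluated extremely frequently.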

The original question is "why share FlagBuffer across contexts?" I don't think it should be shared, but sharing it is a choice (or mistake) people make. In my opinion, the Scope should be forked when you enter a new thread or async context, but that assumes something about the way your program works.

"Why not make this thread-local?" Two reasons: 1) we need it to work in async code, and 2) we need the state of the parent to be copied to the child. Thread-locals don't solve for either of those conditions.
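The second point is easy to demonstrate: a thread-local value set in the parent thread is simply absent in a child thread.

```python
import threading

local = threading.local()
local.value = "parent"

seen = []

def child():
    # A new thread starts with an empty thread-local namespace;
    # the parent's value is not inherited.
    seen.append(getattr(local, "value", None))

t = threading.Thread(target=child)
t.start()
t.join()
# seen is [None]: the child never saw the parent's value
```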

Phrasing the question differently: "why not use a contextvar?" That gives us the state of the parent in the child, and it works in async code. But now we run into a new set of problems: mutable data structures which are mutated in the child are also mutated in the parent and in other children. These are the ideas I came up with to solve this problem:

  1. Use immutable data structures (maybe a linked-list).
  2. Copy the contextvar (flag buffer) on each call to get.
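The mutation problem with a plain contextvar can be seen in a few lines (illustrative names, not SDK code): the child task inherits the variable, but both contexts point at the same mutable object.

```python
import asyncio
import contextvars

flag_buffer: contextvars.ContextVar[list] = contextvars.ContextVar("flag_buffer")

async def child():
    # The child's context is a copy of the parent's *mapping*, not the
    # values, so this append mutates the parent's list as well.
    flag_buffer.get().append("child-flag")

async def main():
    flag_buffer.set([])
    await asyncio.create_task(child())
    return flag_buffer.get()

leaked = asyncio.run(main())
# leaked contains "child-flag": the child's mutation is visible in the parent
```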

There may be more options, but this is what I could think of. Copying the contextvar (flag buffer) on get sounds more expensive than a lock, but I haven't measured. Immutable data structures are interesting; I've not implemented an LRU using them before. Maybe there's something to be gained from more research in this area. Will an immutable LRU be faster or slower than a lock? I don't know; I haven't measured. Should we use an LRU at all? If these locks are causing issues, maybe we've made a mistake in our choice of data structure. We can and should reassess if that's the case.
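Option 2 might look something like this (a sketch under the simplifying assumption that copying on every access is acceptable; a real implementation would copy only once per context):

```python
import contextvars
import copy

_buffer: contextvars.ContextVar = contextvars.ContextVar("flag_buffer")

def get_buffer() -> list:
    # Copy the inherited buffer and rebind it in the current context so
    # that mutations stay local. (Simplified: copies on every access.)
    try:
        buf = _buffer.get()
    except LookupError:
        buf = []
    local = copy.copy(buf)
    _buffer.set(local)
    return local

get_buffer().append("parent-flag")
child_ctx = contextvars.copy_context()
child_ctx.run(lambda: get_buffer().append("child-flag"))
parent_view = get_buffer()
# parent_view is ["parent-flag"]: the child's append stayed in its context
```

The trade-off is a copy per access instead of a lock per write, which is exactly the cost question above that hasn't been measured.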

Having said all that: is it valuable to scope the FlagBuffer to a thread or async context? I can write code like this in Python.

import threading

x = 1

def increment():
    global x
    x = x + 1

t = threading.Thread(target=increment)
t.start()
t.join()

if x == 2:
    raise Exception("oops")

x is mutated within the child and has an effect on the remaining program execution. If mutating x was guarded by a feature flag, should Sentry capture that flag? I think so. But doing so (in a developer-friendly format) requires the programmer to mutate a flag buffer which is shared between parent and child.

At the moment I'm erring on the side of leaving the lock in. That said, I only know what I know. If I've made a mistake, missed something, or you know a different way, please correct me and I can update the code. Thanks again for reaching out!

@antoniobg

Thank you very much for taking the time to write such a detailed answer.

To give more context on our case, we observed issues in two scenarios:

  1. Dramatiq jobs (we were using the sentry-dramatiq library, which I just noticed has now been integrated into the official SDK).
  2. An async job structured similarly to this one:
import asyncio
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

from sentry_sdk import configure_scope

class CustomJob:
    observability_name = "custom-job"  # defined elsewhere in our codebase

    def __init__(
        self,
        max_threads: int = 10,
        pool_size: int = 1000,
    ):
        self.__max_threads = max_threads
        self.__pool_size = pool_size
        self.__event_loop = asyncio.new_event_loop()
        self.__executor = ThreadPoolExecutor(max_workers=self.__max_threads)

    @contextmanager
    def _sentry_metadata(self):
        with configure_scope() as scope:
            scope.set_tag('job.name', self.observability_name)
            yield

    def perform(self, *args, **kwargs):
        with self._sentry_metadata():
            self.__event_loop.run_until_complete(self.__do_async_stuff())
            self.__event_loop.close()

    async def __do_async_stuff(self):
        # ...
        pass

I suspect that we might not be handling Sentry’s scope correctly in this job. From my understanding, we should be forking the scope with new_scope() in each async task to ensure proper isolation. Could you confirm whether this is the correct approach?

Again, thank you for your time. We’ll test the feature with the latest version and monitor its behavior.

@cmanallen

@antoniobg Sorry for the delay! Yes, that's the correct implementation.

@antoniobg

Thanks, @cmanallen! By the way, we've given it another try with the newer SDK version and haven't seen the issue anymore (also no deadlocks or performance degradation).

Answer selected by antoniobg