
Hi Sentry team,

We’ve been testing the new feature flag functionality (currently in beta) and ran into an issue that appears to have been addressed by #4034.

The fix introduces a lock to prevent concurrency issues, but we were wondering why FlagBuffered is shared across contexts rather than being thread-local. Wouldn’t it make more sense for it to be per-context to avoid the need for locking?

Thanks in advance for your help!


Thanks for reaching out!

we were wondering why FlagBuffered is shared across contexts rather than being thread-local. Wouldn’t it make more sense for it to be per-context to avoid the need for locking?

The first-order question is "why lock?" It solved a problem in a way I knew how to solve it; there's no deeper meaning to it than that. If this has caused deadlocks or performance degradation for you, please open an issue and document the problem as best you can. We'll address it as quickly as possible.
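For concreteness, the lock-guarded approach can be sketched roughly like this (a minimal illustration, not the SDK's actual implementation; the class name, method names, and capacity are made up):

```python
import threading

class FlagBufferSketch:
    """Sketch of a capped, insertion-ordered flag buffer guarded by a
    lock. Illustrative only; not Sentry's implementation."""

    def __init__(self, capacity: int = 100):
        self._capacity = capacity
        self._lock = threading.Lock()
        self._flags: dict[str, bool] = {}

    def set(self, flag: str, result: bool) -> None:
        # The lock makes concurrent writers from multiple threads safe.
        with self._lock:
            self._flags.pop(flag, None)  # refresh recency on re-insert
            self._flags[flag] = result
            if len(self._flags) > self._capacity:
                # Evict the least recently set flag (dicts keep insertion order).
                self._flags.pop(next(iter(self._flags)))
```

The lock is held only for the duration of a single insert, so contention is brief unless flags are evaluated extremely frequently.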

The original question is "why share FlagBuffer across contexts?" I don't think it should be shared, but sharing it is a choice (or mistake) people make. In my opinion, the Scope should be forked when you enter a new thread or async context, but that assumes something about the way your program works.

"Why not make this thread-local?" Two reasons: 1) we need it to work in async code, and 2) we need the state of the parent to be copied to the child. Thread-locals don't solve for either of those conditions.
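The second point is easy to demonstrate: a thread-local value set in the parent thread is simply absent in a child thread.

```python
import threading

local = threading.local()
local.value = "parent"

seen = []

def child():
    # A new thread starts with an empty thread-local namespace;
    # the parent's value is not inherited.
    seen.append(getattr(local, "value", None))

t = threading.Thread(target=child)
t.start()
t.join()
# seen is [None]: the child never saw the parent's value
```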

Phrasing the question differently: "why not use a contextvar?" That gives us the state of the parent in the child, and it works in async code. But now we run into a new set of problems: mutable data structures which are mutated in the child are also mutated in the parent and in other children. These are the ideas I came up with to solve this problem:

  1. Use immutable data structures (maybe a linked-list).
  2. Copy the contextvar (flag buffer) on each call to get.
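The mutation problem with a plain contextvar can be seen in a few lines (illustrative names, not SDK code): the child task inherits the variable, but both contexts point at the same mutable object.

```python
import asyncio
import contextvars

flag_buffer: contextvars.ContextVar[list] = contextvars.ContextVar("flag_buffer")

async def child():
    # The child's context is a copy of the parent's *mapping*, not the
    # values, so this append mutates the parent's list as well.
    flag_buffer.get().append("child-flag")

async def main():
    flag_buffer.set([])
    await asyncio.create_task(child())
    return flag_buffer.get()

leaked = asyncio.run(main())
# leaked contains "child-flag": the child's mutation is visible in the parent
```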

There may be more options, but this is what I could think of. Copying the contextvar (flag buffer) on get sounds more expensive than a lock, but I haven't measured. Immutable data structures are interesting; I've not implemented an LRU using them before. Maybe there's something to be gained from more research in this area. Will an immutable LRU be faster or slower than a lock? I don't know; I haven't measured. Should we use an LRU at all? If these locks are causing issues, maybe we've made a mistake in our choice of data structure. We can and should reassess if that's the case.
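Option 2 might look something like this (a sketch under the simplifying assumption that copying on every access is acceptable; a real implementation would copy only once per context):

```python
import contextvars
import copy

_buffer: contextvars.ContextVar = contextvars.ContextVar("flag_buffer")

def get_buffer() -> list:
    # Copy the inherited buffer and rebind it in the current context so
    # that mutations stay local. (Simplified: copies on every access.)
    try:
        buf = _buffer.get()
    except LookupError:
        buf = []
    local = copy.copy(buf)
    _buffer.set(local)
    return local

get_buffer().append("parent-flag")
child_ctx = contextvars.copy_context()
child_ctx.run(lambda: get_buffer().append("child-flag"))
parent_view = get_buffer()
# parent_view is ["parent-flag"]: the child's append stayed in its context
```

The trade-off is a copy per access instead of a lock per write, which is exactly the cost question above that hasn't been measured.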

Having said all that: is it valuable to scope the FlagBuffer to a thread or async context? I can write code like this in Python.

import threading

x = 1

def increment():
    global x
    x = x + 1

t = threading.Thread(target=increment)
t.start()
t.join()

if x == 2:
    raise Exception("oops")

x is mutated within the child and has an effect on the remaining program execution. If mutating x was guarded by a feature flag, should Sentry capture that flag? I think so. But doing so (in a developer-friendly format) requires the programmer to mutate a flag buffer which is shared between parent and child.

At the moment I'm erring on the side of leaving the lock in. That said, I only know what I know. If I've made a mistake, missed something, or you know a different way, please correct me and I can update the code. Thanks again for reaching out!

@antoniobg

Thank you very much for taking the time to write such a detailed answer.

To give more context on our case, we observed issues in two scenarios:

  1. Dramatiq jobs (we were using the sentry-dramatiq library, which I just noticed has now been integrated into the official SDK).
  2. An async job structured similarly to this one:
import asyncio
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

from sentry_sdk import configure_scope

class CustomJob:
    observability_name = "custom-job"  # defined elsewhere in our codebase

    def __init__(
        self,
        max_threads: int = 10,
        pool_size: int = 1000,
    ):
        self.__max_threads = max_threads
        self.__pool_size = pool_size
        self.__event_loop = asyncio.new_event_loop()
        self.__executor = ThreadPoolExecutor(max_workers=self.__max_threads)

    @contextmanager
    def _sentry_metadata(self):
        with configure_scope() as scope:
            scope.set_tag('job.name', self.observability_name)
            yield

    def perform(self, *args, **kwargs):
        with self._sentry_metadata():
            self.__event_loop.run_until_complete(self.__do_async_stuff())
            self.__event_loop.close()

    async def __do_async_stuff(self):
        # ...
        pass

I suspect that we might not be handling Sentry’s scope correctly in this job. From my understanding, we should be forking the scope with new_scope() in each async task to ensure proper isolation. Could you confirm whether this is the correct approach?

Again, thank you for your time. We’ll test the feature with the latest version and monitor its behavior.

@cmanallen

@antoniobg Sorry for the delay! Yes, that's the correct implementation.

@antoniobg

Thanks, @cmanallen! By the way, we've given it another try with the newer SDK version and haven't seen the issue anymore (also no deadlocks or performance degradation).

Answer selected by antoniobg