Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

dridi
Copy link
Member

@dridi dridi commented Oct 3, 2025

The changes from #4073 increases the likelihood of a data race between a BOC state check followed immediately by an objcore flag check:

ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (oc->boc->state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

Once we reach BOS_STREAM, we touch oc->flags to check OC_F_BUSY.

If the backend fetch failed immediately after reaching BOS_STREAM, we may then observe BOS_FAILED when we touch oc->boc->state but still see oc->flags from before the failure in this thread.

Because oc->flags and oc->boc->state are guarded by different locks, we can't safely rely on this construct. One option could be to move the OC_F_BUSY check down, and another, could be to capture the BOC state:

state = ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

This pull request implements the state capture.

dridi added 2 commits October 3, 2025 10:18
This will allow safe checks related to BOC states that are not guarded
by the BOC lock.
After returning from ObjWaitState(), replace any access of
oc->boc->state with the one that was captured by the wait
loop.
dridi added 2 commits October 3, 2025 12:58
It should be safe as it was because there shouldn't be a transition from
BOS_FINISHED to BOS_FAILED, but now that it can be atomic, it should not
only be safe but also future-proof. It originally matched ignore_state
from the coccinelle patch because of the separate boc variable, and
had to be manually edited.
@dridi dridi force-pushed the atomic_boc_state branch from 1a0f32e to ad2a668 Compare October 3, 2025 10:59
@nigoroll
Copy link
Member

nigoroll commented Oct 8, 2025

I understand that we should read the boc state consistently, but don't we have an outdated read left on oc->flags here?

@dridi
Copy link
Member Author

dridi commented Oct 8, 2025

I'm not sure what "here" is referring to, the link you shared is highlighting if (state == BOS_FAILED).

Around this line there are two accesses to oc->flags:

state = ObjWaitState(oc, BOS_STREAM);
AZ(oc->flags & OC_F_BUSY);
if (state == BOS_FAILED)
        AN(oc->flags & OC_F_FAILED);

The first one is always true so it's not a problem, and the second is safe too as long as the observed state is BOS_FAILED. We first fail the oc before failing the boc, so capturing BOS_FAILED is our warranty.

Whereas before, the sequence could have been:

  • client resumes upon BOS_STREAM
  • client observes no OC_F_BUSY
  • backend fetch fails
  • client observes BOS_FAILED
  • client cannot observe OC_F_FAILED

@nigoroll
Copy link
Member

@dridi I clarified my earlier comment.
My question was regarding outdated reads of OC_F_*. We set the oc flags before taking the boc lock so, while common architectures are likely to have a read of the oc flags updated, there is no guarantee, IIUC.

@nigoroll
Copy link
Member

But anyway, irrespective of a possible outdated read, this patch is an improvement, so we should merge it.

@dridi
Copy link
Member Author

dridi commented Oct 13, 2025

Oh, we may still observe an outdated oc->flags, but because we observed BOS_FAILED at resumption time it must contain the OC_F_FAILED bit. That bit is raised prior to reaching BOS_FAILED.

@dridi
Copy link
Member Author

dridi commented Oct 13, 2025

while common architectures are likely to have a read of the oc flags updated, there is no guarantee, IIUC.

I guess the alternative is to either drop the check altogether or check while holding the objhead lock. I would expect that the boc critical section on the client side is enough to invalidate the oc pointer and warrant a refresh.

@nigoroll
Copy link
Member

I would expect that the boc critical section on the client side is enough to invalidate the oc pointer and warrant a refresh.

That's what I guessed with similarly structured code and had to learn that the answer is no. But we can still see if it happens here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.