-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[Impeller] when decoding image on iOS, block on command buffer scheduling and log gpu errors. #169378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
<< "GPU Error submitting image decoding command buffer."; | ||
} | ||
}, | ||
/*block_on_schedule=*/true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be inefficient, needlessly synchronizing the cpu and the gpu in most cases.
I think the better way to implement this would be for ImageDecoderImpeller
to maintain a receipt from submitting the command buffer. It will add a fml::SyncSwitch::Observer
to the fml::SyncSwitch
, when that notifies the ImageDecoderImpeller
that we are losing access to the GPU it uses the receipt to get access to the buffer and call waitUntilScheduled
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not what block_on_scheduled does. in fact, scheduling has nothing to do with gpu usage at all (that would be waitUntilCompleted). All it blocks on is the queue acknowledging the buffer, which is not synchronous unlike GLES/Vulkan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an asynchronous operation happening which waitUntilScheduled
is causing the calling thread to block on. The approach I'm proposing will only block the thread in the case of applicationDidEnterBackground:
which is my understanding of what Apple is recommending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but blocking a background worker for a ms or two as part of an operation (image decoding) that regularly takes 100s of ms hardly seems like a problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not. I'm not quite sure why we would prefer it though when we know of an alternative. Plus the alternative doesn't make everyone that submits a command buffer learn what that flag means when it's only relevant on one platform, in one usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus we're dealing with a bit of a black box with the driver. I'd say it's preferred we follow Apple's recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and I'll add that we don't know if it fixes it for real or not. If we want to engineer a better solution if it works, I'm down for that. But if we end up back at square one because it doesn't fix it I plan to revert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From offline chat: Calling waitForScheduled when we don't have access to the gpu will cause deadlock, but this is happening only inside of a syncswitch block to guarantee we have the GPU.
<< "GPU Error submitting image decoding command buffer."; | ||
} | ||
}, | ||
/*block_on_schedule=*/true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only going to be an issue for the decoder? Seems like command buffers submitted on the raster thread could run into the same problem too, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never gotten a report of that. Perhaps if it did happen it isn't noticable that the encoding fails since by the time the app is actually foregrounded another frame will have rendered. Whereas if the upload fails for image decoding that failure sticks around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm fine going ahead with this to test out once we've cleaned it up. Add documentation and file an tech debt issue for the extra thread blocking we could remove.
@@ -107,7 +107,8 @@ class CommandBuffer { | ||
|
||
virtual std::shared_ptr<BlitPass> OnCreateBlitPass() = 0; | ||
|
||
[[nodiscard]] virtual bool OnSubmitCommands(CompletionCallback callback) = 0; | ||
[[nodiscard]] virtual bool OnSubmitCommands(bool block_on_schedule, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring for block_on_schedule. (just link to SubmitCommands)
@@ -127,9 +128,10 @@ class CommandBuffer { | ||
/// | ||
/// @param[in] callback The completion callback. | ||
/// | ||
[[nodiscard]] bool SubmitCommands(const CompletionCallback& callback); | ||
[[nodiscard]] bool SubmitCommands(bool block_on_schedule, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring to block_on_schedule.
Attempted fix for #166668 . Plan is to ask for testing once rolled into g3 since we weren't given a repro.
Explaination: we can only submit command buffers to the metal command queue when the device is in the foreground. push notifications temporarily wake up the app and can trigger images to begin decoding. This decoding can fail, possibly due to the app re-backgrounding. In theory, the sync switch should prevent this from happening - but because we don't actually block on the scheduling of the command buffer there is technically a race that can happen.
Add the ability to block submission on scheduling for metal. note that other graphics APIs do not have a distinction for commited vs scheduled and so it no-ops for them.
See https://developer.apple.com/documentation/metal/preparing-your-metal-app-to-run-in-the-background