Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi @markandrus and other maintainers (new to the project, not sure who all might be looking at this!),
Thanks for your work on this project. I can tell from working on this PR that it has been a big undertaking and I really appreciate that.
I'm submitting this PR in reference to one of the issues discussed in #645 - specifically the second issue mentioned where
getTransceivers()
,getReceivers()
andgetSenders()
on theRTCPeerConnection
can sometimes (seemingly randomly) throw aInvalid argument
up the stack back to NodeJS / V8.This issue has been seemingly random at first, but frequent / reproducible enough on our project that I decided to dive in and see if I couldn't figure it out / fix it. I am quite confident at this point that I have worked around / "fixed" the issue for our use case, and we are working on deploying it now for our own use via a manual github linkage in our
package.json
plus hosting our own built binaries.However, that is obviously not ideal long-term, and I'm also not 100% confident that the way I fixed the issue is correct, so I'd love to have you look at this and help determine if we should merge this fix to mainline or fix it another way instead.
The issue, I think, is that the
Wrap()
/BidiMap
structure being used for most of theRtc...
objects, in conjunction with the use ofNapi::HandleScope
in theCreate()
functions for these objects, is technically allowing Node / V8 to GC these value items despite the BidiMap holding a pointer to it. This GC can even happen within a single callstack (although that has been much rarer), because theHandleScope
created right before we instantiate theseValue
objects inCreate()
means that they are technically "out of scope" immediately afterCreate()
completes. Under sufficient memory pressure or other conditions, the VM can choose to GC them which appears to set_value
tonullptr
which in turn means theseValue
objects now showIsEmpty() == true
AKA they areundefined
for all intents and purposes beyond that point.Depending on how they are used from here, this can result in either the
Invalid argument
referenced by @mlogan (#645 (comment)) and @dxiao2003 (#645 (comment)), or it can in some cases propagate all the way to JS code which will then find e.g.transceiver.receiver === undefined
(which shouldn't be possible), or it can hard-segfault the entire VM immediately (e.g. inRTCPeerConnection::OnTrack
, whenConvert
ing aMediaStream
that has experienced this issue).Where my confidence begins to decline a bit is in my fix - I know it works, but I'm not sure if it's the best way to fix it long-term or if there is a better way to fix it. As you can see in the code in this PR, calling
->Ref()
on the value in question ensures that it does not get GC'd even once outside of theHandleScope
in which it was created. However, my concern is that while this effectively solves converting theWrap
BidiMap
from a weak-reference map to a strong reference map, it may unintentionally cause memory leaks for theValue
s in question. I'm not familiar enough with NAPI to know who is responsible for freeing objects / calling the destructors in question and so while I'm trying to follow your lead by calling->Unref()
where I seewrap()->Release(this)
, there are some places I cannot easily do so and I'm also not sure if this actually solves the leak correctly or not.Basically I'm a bit out of my depth here if this fix is not proper for the long term, but I am 100% confident that this solves the issue in question and we will be deploying this workaround ourselves to production in the meantime while figuring out with you here what the ideal long-term fix is.
Thanks again for your help both on this PR and this project in general! Please let me know if there's any additional information / context I can add to this PR that would be useful.