Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667

cojo · Nov 19, 2020

Hi @markandrus and other maintainers (new to the project, not sure who all might be looking at this!),

Thanks for your work on this project. I can tell from working on this PR that it has been a big undertaking and I really appreciate that.

I'm submitting this PR in reference to one of the issues discussed in #645 - specifically the second issue mentioned where getTransceivers(), getReceivers() and getSenders() on the RTCPeerConnection can sometimes (seemingly randomly) throw a Invalid argument up the stack back to NodeJS / V8.

This issue has been seemingly random at first, but frequent / reproducible enough on our project that I decided to dive in and see if I couldn't figure it out / fix it. I am quite confident at this point that I have worked around / "fixed" the issue for our use case, and we are working on deploying it now for our own use via a manual github linkage in our package.json plus hosting our own built binaries.

However, that is obviously not ideal long-term, and I'm also not 100% confident that the way I fixed the issue is correct, so I'd love to have you look at this and help determine if we should merge this fix to mainline or fix it another way instead.

The issue, I think, is that the Wrap() / BidiMap structure being used for most of the Rtc... objects, in conjunction with the use of Napi::HandleScope in the Create() functions for these objects, is technically allowing Node / V8 to GC these value items despite the BidiMap holding a pointer to it. This GC can even happen within a single callstack (although that has been much rarer), because the HandleScope created right before we instantiate these Value objects in Create() means that they are technically "out of scope" immediately after Create() completes. Under sufficient memory pressure or other conditions, the VM can choose to GC them which appears to set _value to nullptr which in turn means these Value objects now show IsEmpty() == true AKA they are undefined for all intents and purposes beyond that point.

Depending on how they are used from here, this can result in either the Invalid argument referenced by @mlogan (#645 (comment)) and @dxiao2003 (#645 (comment)), or it can in some cases propagate all the way to JS code which will then find e.g. transceiver.receiver === undefined (which shouldn't be possible), or it can hard-segfault the entire VM immediately (e.g. in RTCPeerConnection::OnTrack, when Converting a MediaStream that has experienced this issue).

Where my confidence begins to decline a bit is in my fix - I know it works, but I'm not sure if it's the best way to fix it long-term or if there is a better way to fix it. As you can see in the code in this PR, calling ->Ref() on the value in question ensures that it does not get GC'd even once outside of the HandleScope in which it was created. However, my concern is that while this effectively solves converting the Wrap BidiMap from a weak-reference map to a strong reference map, it may unintentionally cause memory leaks for the Values in question. I'm not familiar enough with NAPI to know who is responsible for freeing objects / calling the destructors in question and so while I'm trying to follow your lead by calling ->Unref() where I see wrap()->Release(this), there are some places I cannot easily do so and I'm also not sure if this actually solves the leak correctly or not.

Basically I'm a bit out of my depth here if this fix is not proper for the long term, but I am 100% confident that this solves the issue in question and we will be deploying this workaround ourselves to production in the meantime while figuring out with you here what the ideal long-term fix is.

Thanks again for your help both on this PR and this project in general! Please let me know if there's any additional information / context I can add to this PR that would be useful.

cojo · Dec 2, 2020

Hi @markandrus - wanted to check in and see if you needed any more info / anything else from us for this PR at this time?

I noticed the tests failed, but from reading the logs it seems like they just ran out of time on Appveyor / I couldn't find any other errors, but let me know if I missed something and I can take a look at fixing if so.

I also noticed you merged the M87 PR - let me know if this needs to be manually tested with those updates (since I believe I found this workaround on the M81 version) and I can find time to do that as well.

Thanks again!

markandrus · Dec 26, 2020

Hi @cojo,

Thanks a lot for your investigation, write-up, and proposed changes! I was able to reproduce the issue and validate the IsEmpty behavior, which is a huge step forward towards solving this. For whatever reason, I am seeing it happen much more frequently with getReceivers on the test/destructor/index.js tests under Node 15. So it's easy to reproduce now.

As for the fix, I believe the BidiMap cannot be used with anything that's not Ref-ed. Because we have tests for ensuring destructors fire, we can't just Ref and never Unref.

However, my concern is that while this effectively solves converting the Wrap BidiMap from a weak-reference map to a strong reference map, it may unintentionally cause memory leaks for the Values in question.

Yes this is a problem.

Here is how I think we need to solve it (at least with respect to RTCRtpTransceivers, RTCRtpSenders, and RTCRtpReceivers), and I've started experimenting…

Under Unified Plan,
- The set of RTCRtpTransceivers on an RTCPeerConnection only grows.
- Every RTCRtpTransceiver maps to exactly one RTCRtpSender and RTCRtpReceiver.
- Therefore, we can Ref in the constructors of RTCRtpSender and RTCRtpReceiver.
- In RTCRtpTransceiver's destructor, we should Unref its RTCRtpSender and RTCRtpReceiver.
- We should also Ref in RTCRtpTransceiver's constructor, and Unref each RTCRtpTransceiver in RTCPeerConnection's destructor. *
Under Plan B,
- There are no RTCRtpTransceivers…
- I'm not sure if the set of RTCRtpSenders and RTCRtpReceivers grows, if rollback affects this, etc. **
- Nevertheless, we can Ref in the constructors of RTCRtpSender and RTCRtpReceiver.
- In RTCPeerConnection's destructor, Unref each RTCRtpSender and RTCRtpReceiver.

* We need to be careful with closing the RTCPeerConnection. At the moment, we may close the PeerConnectionFactory, shut down the signaling thread, null out the _jinglePeerConnection, etc. This is a problem, since we still need to fetch the RTCRtpTransceivers in the destructor, which depends on the _jinglePeerConnection, the signaling thread, etc. We could either move the teardown logic out of RTCPeerConnection::Close and only do it in the destructor (need to double-check why we were doing it early, and what impact that might have on tests), or somehow "cache" the RTCRtpTransceivers (we sort of have this pattern already with _cached_configuration on RTCPeerConnection and the various _cached-prefixed properties on RTCDataChannel).
** If this isn't true, we risk leaks under Plan B.

brianathere · Jun 15, 2021

I have encountered this issue on v0.4.7 running in Node 16.3 on MacOS, 100% repro of crash in RTCPeerConnection.getTransceivers in my test setup (though not a minimal repro). Is there something I can provide to help with a fix?

node-webrtc#667

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache

57df027

cojo mentioned this pull request Nov 19, 2020

Segmentation faults when used with SIP.js: N-API thread safety issue? #645

Open

duvallj mentioned this pull request May 2, 2024

Merge Outstanding PRs From Upstream WonderInventions/node-webrtc#15

Open

3 tasks

duvallj pushed a commit to WonderInventions/node-webrtc that referenced this pull request Sep 10, 2024

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache

6e132f5

node-webrtc#667

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667

Uh oh!

cojo commented Nov 19, 2020 •

edited

Loading

Uh oh!

cojo commented Dec 2, 2020

Uh oh!

markandrus commented Dec 26, 2020

Uh oh!

brianathere commented Jun 15, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Search code, repositories, users, issues, pull requests...

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667

Are you sure you want to change the base?

Fix 'Invalid argument' due to stale / GC'd NAPI::Values in Wrap() cache #667

Uh oh!

Conversation

cojo commented Nov 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cojo commented Dec 2, 2020

Uh oh!

markandrus commented Dec 26, 2020

Uh oh!

brianathere commented Jun 15, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cojo commented Nov 19, 2020 •

edited

Loading