-
-
Notifications
You must be signed in to change notification settings - Fork 321
Fix race conditions in FTP socket closure that cause intermittent errors #17540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ViRb3
wants to merge
7
commits into
iterate-ch:master
Choose a base branch
from
ViRb3:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+137
−1
Open
Changes from 1 commit
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
e571e75
Fix race conditions in FTP close
ViRb3 e84e5ee
Improvements in FTPSocket
ViRb3 c79862d
Fix constant flushing in FTPSocket
ViRb3 ad035ed
Replace buffered with proxy streams in FTPSocket
ViRb3 b227e01
Surface critical errors in FTPSocket
ViRb3 352e11c
Fix read amplification in FTPSocket
ViRb3 47b7033
Refactor FTPSocket to address comments
ViRb3 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Improvements in FTPSocket
- Loading branch information
commit e84e5ee10f888ad9c0d12f0e8e100f8562ae6cb5
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running in a background thread and missing any error looks dangerious to me as we will not report any error when closing the stream fails which may indicate that the file is not properly persisted on the remote server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree. I tried really hard to find a workaround, but couldn't come up with one. Having 1 whole second delay between each file upload is really bad when you upload lots of small files, and it causes Cyberduck to be upwards of 10x slower than competitor FTP clients. If you have any suggestion here, I'm more than happy to take it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think this failing specifically could result in data loss. By the time we've reached here we already flushed our data, closed our stream, and waited for the server to close its stream. So, I don't think failing here could ever result in data loss.
The only case of data loss should occur if "Failed to shutdown output for socket" is hit. Maybe we can change that to throw instead of logging and discarding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably right there is no risk here in the real world. But it should probably not be in the finally block as it does not need to be executed when above code fails with I/O.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that close quirk on macOS specific to server implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't want to include the workaround if it can only be reproduced with a single FTP server implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other FTP client experiences this delay though, so it's clearly on Cyberduck's side. I'm away from home this week but will look more into it next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkocher I did a deep dive on this.
close()
syscall.netstat -anl
to file and confirmed that in both apps, the socket transitions from SYN_SENT -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT. Forklift remains in TIME_WAIT for 5s+, while Cyberduck for 1-5s+ depending on SO_LINGER flag. I'm unsure why SO_LINGER(true, 0) results in 1s of TIME_WAIT, as I thought that it should be skipped entirely? This could be a quirk in netstat on macOS though.If the control socket is truly independent of the data socket as I understand it is, then my conclusion is that Socket.close() blocking is purely due to timing. There should be nothing preventing it from happening on any FTP server given that it hits the same timing.
So, unless you have better ideas, I vote to keep the current workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just one more suggestion to try out to see what difference it makes with different
SO_LINGER
options:connection.socket.linger
set tofalse
indefault.properties
or explicitly inDefaultSocketConfigurator
DefaultSocketConfigurator
to set with a shorter timeout value of1
. With0
I see426
errors for data transfers. 2Footnotes
94718d5 ↩
6c2d19f ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened #17562 to make the SO_LINGER timeout configurable.