Fix race condition when creating a watch #19
Merged
Hi @luite, I'm the maintainer of hfsnotify, which depends on this package. I wanted to draw your attention to a race condition I think I've discovered.
For a while I've been trying to track down some flaky behavior in the `hfsnotify` tests on macOS. The tests all do something like the following:

Fairly often, I see an issue where, no matter how long the tests wait, the expected event(s) for a given change never arrive.
Now I think it's because of how watch creation works in `c_fsevents.m`. This code spins off a thread to run the `watchRunLoop` function, which calls `FSEventStreamScheduleWithRunLoop` and `FSEventStreamStart` to start the watch. However, the `createWatch` function may return before this thread finishes its work. As a result, the caller assumes the watch is ready and picking up events when it isn't, and any filesystem events that occur during that short window are missed.

I've put a simple fix in this PR: `createWatch` now re-locks the mutex after starting the `watchRunLoop` thread. Since `watchRunLoop` releases the lock after it finishes starting the watch, this ensures that `createWatch` doesn't return prematurely. With this change the racy behavior seems to be fixed: I can complete 50 consecutive runs of the full test suite with no failures.

(Note: I'm not 100% sure this is race-free, since I can't tell from the Apple documentation whether `FSEventStreamStart` is synchronous. In my testing it seems reliable, but a small race window may remain. FWIW, there's another function, `FSEventStreamFlushSync`, which we could potentially call to make sure the event stream is up and running before we return.)
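The "don't return until the worker has started the stream" handshake can be sketched in portable C. This is a minimal sketch, not the code in `c_fsevents.m`: the `watch` struct, `create_watch`, `watch_run_loop` and `start_stream` are hypothetical stand-ins, and `start_stream` is a no-op placeholder for the `FSEventStreamScheduleWithRunLoop`/`FSEventStreamStart` calls. It also swaps in a condition-variable handshake instead of re-locking a mutex that another thread unlocks, since POSIX leaves unlocking a default mutex from a thread that doesn't own it undefined.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the watch struct used by c_fsevents.m. */
typedef struct {
    pthread_mutex_t mut;          /* guards `started` */
    pthread_cond_t  started_cond; /* signalled once the stream is running */
    bool            started;      /* set by the worker after setup finishes */
    pthread_t       thread;
} watch;

/* Hypothetical placeholder for FSEventStreamScheduleWithRunLoop +
 * FSEventStreamStart; a no-op in this sketch. */
static void start_stream(watch *w) {
    (void)w;
}

/* Worker thread: finish all setup, then publish "ready" under the lock. */
static void *watch_run_loop(void *arg) {
    watch *w = (watch *)arg;
    start_stream(w);
    pthread_mutex_lock(&w->mut);
    w->started = true;
    pthread_cond_signal(&w->started_cond);
    pthread_mutex_unlock(&w->mut);
    /* A real worker would go on to run its CFRunLoop here and block. */
    return NULL;
}

/* Returns 0 only after the worker thread has finished starting the stream. */
int create_watch(watch *w) {
    w->started = false;
    pthread_mutex_init(&w->mut, NULL);
    pthread_cond_init(&w->started_cond, NULL);
    if (pthread_create(&w->thread, NULL, watch_run_loop, w) != 0) {
        pthread_cond_destroy(&w->started_cond);
        pthread_mutex_destroy(&w->mut);
        return -1;
    }
    pthread_mutex_lock(&w->mut);
    /* Loop guards against spurious wakeups and against the worker having
     * already signalled before we started waiting. */
    while (!w->started)
        pthread_cond_wait(&w->started_cond, &w->mut);
    pthread_mutex_unlock(&w->mut);
    return 0;
}
```

Like the fix in this PR, this only guarantees that the start calls have returned before `create_watch` does; it doesn't settle whether `FSEventStreamStart` itself is synchronous.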
This is the simplest fix I could think of; I saw you were considering a more general restructuring in #18, so maybe that could also solve the problem.