Skip to content

Fix race condition when creating a watch #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 16, 2025
Merged

Conversation

thomasjm
Copy link

@thomasjm thomasjm commented Feb 4, 2025

Hi @luite, I'm the maintainer of hfsnotify, which depends on this package. I wanted to draw your attention to a race condition I think I've discovered.

For a while I've been trying to track down some flaky behavior in the hfsnotify tests on macOS. The tests all do something like the following:

  1. Create a temporary directory to run the test in
  2. Start a watch on that directory
  3. Make some kind of filesystem change, and wait to see that the expected events are received from the watcher.

Fairly often, I see an issue where no matter how long the tests wait, the expected event(s) for a given change never arrive.

Now I think it's because of how the watch creation in c_fsevents.m works. This code spins off a thread to run the watchRunLoop function, which calls FSEventStreamScheduleWithRunLoop and FSEventStreamStart to start the watch. However, the createWatch function may return before this thread finishes its work. As a result, the caller is under the impression that the watch is ready to go and picking up events, but it's not. There is then a short window where any filesystem events that occur will be missed.

I've put a simple fix in this PR: now createWatch re-locks the mutex after starting the watchRunLoop thread. Since watchRunLoop releases the lock after it finishes starting the watch, this ensures that createWatch doesn't return prematurely. With this change, the racy behavior seems to be fixed: I can run 50 consecutive runs of the full test suite with no failures.

(Note: I'm not actually 100% sure that this is race-free, since I can't really tell from the Apple documentation whether FSEventStreamStart is "synchronous" or not. In my testing it seems reliable, but it's possible that there's still a small race window. FWIW, I noticed there's another function called FSEventStreamFlushSync which we could potentially call to ensure the event stream is up and running before we return.)

This is the simplest fix I could think of; I saw you were considering a more general restructuring in #18, so maybe that could also solve the problem.

@luite
Copy link
Owner

luite commented Feb 4, 2025

Thanks! That looks like a reasonable way to fix it. I'd say it slightly changes the semantics from asynchronous to synchronous. But for the user it should be easy to forkIO it back into asynchronous if desired.

I'll give it a closer look soon and release an update on hackage.

@thomasjm
Copy link
Author

thomasjm commented Feb 4, 2025

I'd say it slightly changes the semantics from asynchronous to synchronous. But for the user it should be easy to forkIO it back into asynchronous if desired.

FWIW there is nothing in the type signature or haddocks that indicates that this very synchronous-looking function would work this way, and it escapes me why anyone would want to recover the current behavior :P. I think it's completely proper for a "create watch" function to fully finish its work before returning, as the equivalent Linux and Windows file watching libraries do.

@luite
Copy link
Owner

luite commented Feb 15, 2025

sorry for the delay. I'll make a new release with this change

@thomasjm
Copy link
Author

Thanks!

@luite luite merged commit 07dc894 into luite:master Feb 16, 2025
@thomasjm
Copy link
Author

Just checking in @luite, could you create a Hackage release with this change? Thanks!

@luite
Copy link
Owner

luite commented Feb 18, 2025

I've uploaded it

@thomasjm
Copy link
Author

Thank you!

simonmichael added a commit to simonmichael/hledger that referenced this pull request Apr 2, 2025
This also bumps to fsnotify-0.4.2.0 and hfsevents-0.1.8, fixing some
events being ignored on mac (luite/hfsevents#19),
which could make hledger-ui --watch more reliable (though I haven't
noticed that problem).
simonmichael added a commit to simonmichael/hledger that referenced this pull request Apr 2, 2025
This requires hfsevents-0.1.8 which fixes some events being ignored on
mac (luite/hfsevents#19), possibly making
hledger-ui --watch more reliable in that regard.
simonmichael added a commit to simonmichael/hledger that referenced this pull request May 16, 2025
This requires hfsevents-0.1.8 which fixes some events being ignored on
mac (luite/hfsevents#19), possibly making
hledger-ui --watch more reliable in that regard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants