Skip to content

Very long (10 - 30 seconds) transaction failure timeout while offline #2877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
larssn opened this issue Aug 2, 2021 · 25 comments · Fixed by #3664
Closed

Very long (10 - 30 seconds) transaction failure timeout while offline #2877

larssn opened this issue Aug 2, 2021 · 25 comments · Fixed by #3664

Comments

@larssn
Copy link

larssn commented Aug 2, 2021

[READ] Step 1: Are you in the right place?

I was sent here by firebase/flutterfire#6736 (comment).

I've reproduced this in a Flutter project, but the guys over there thought it to be an underlying problem with the android SDK.

[REQUIRED] Step 2: Describe your environment

  • Android Studio version: Using VSCode
  • Firebase Component: Firestore
  • Component version: 23.0.3

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

  1. Setup a new project that has a button for a transaction that gets a document (doesnt matter which - we're offline).
  2. Kill the internet for the tablet. It shouldn't matter how you do this.
  3. Press the button and wait for the transaction to fail, and note the time it takes. Do this several times. It should fail quick at first (30ms - 50ms), but after 6ish clicks the time will jump to 10+ seconds. The times will get progressively worse until the tablet has "rested" for a few minutes, after which it returns to failing in milliseconds and the cycle starts over.

I hope those directions are sufficient without me having to set up a new environment. 😅

@aguatno
Copy link

aguatno commented Aug 3, 2021

Hi @ larssn thanks for reporting this issue. Though, I wasn't able to reproduce the issue using quickstart, could it be possible to provide a runnable mcve that we can use to replicate the issue? Thanks.

@larssn
Copy link
Author

larssn commented Aug 3, 2021

So you're saying your transactions always failed immediately while offline?

@aguatno
Copy link

aguatno commented Aug 4, 2021

HI @larssn Sorry, I was able to reproduce now. Allow me to consult this with our engineering team and will get back to you with updates.

@wu-hui
Copy link
Contributor

wu-hui commented Aug 13, 2021

Hi @larssn

Firestore transaction requires network to work, and has a re-try logic before it reports failure, hence the long failure time. We are also unable to detect network condition reliably, so fail early is also difficult to do. To the SDK, it might just be the network is not reliable.

I'd suggest you disable all operations that would lead to transactions, if your app can be sure it is offline.

@wu-hui wu-hui closed this as completed Aug 13, 2021
@larssn
Copy link
Author

larssn commented Aug 13, 2021

@wu-hui @aguatno
I don't understand why this was closed, the issue remains.

I know it requires network to work, thats not the point of this issue. In the real world, networks can sporadically be unavailable.

Why does it fail immediately sometimes, while at others it takes 20+ seconds? That is inconsistent behaviour, which makes using the SDK needlessly complicated.

@aguatno aguatno reopened this Aug 16, 2021
@wu-hui
Copy link
Contributor

wu-hui commented Aug 17, 2021

I see, the issue is that it is not consistent..not that it takes a while to fail.

I think this is because multiple transactions are initiated, and all of them are doing retries, on a single thread. This might lead to some of them being blocked. I will verify this.

@larssn
Copy link
Author

larssn commented Aug 17, 2021

That does not appear to be what is happening. My test above requires each transaction to fail, before attempting the next one.

You shouldn't test them in parallel.

@wu-hui
Copy link
Contributor

wu-hui commented Aug 17, 2021

I see, thanks.

@wu-hui
Copy link
Contributor

wu-hui commented Aug 17, 2021

Hmm, I actually cannot reproduce this myself. Can you help out with either:

  • Sharing the debug log (run with FirebaseFirestore.setLoggingEnabled(true);)
  • Sharing a mini-repro that I can try out?

Thank you!

@larssn
Copy link
Author

larssn commented Aug 18, 2021

I don't suppose a Flutter project would be much use to you.

@aguatno Are you able to share the project you used to reproduce this?

@aguatno
Copy link

aguatno commented Aug 18, 2021

I'm using this repo to test the behaviour https://github.com/aguatno/github2877.

I just follow the step mentioned above

It should fail quick at first (30ms - 50ms), but after 6ish clicks the time will jump to 10+ seconds.

Then after clicking the button once, the logs show this immediately: "Unable to resolve host "firestore.googleapis.com". Though, after a few clicks. I noticed that it takes a few seconds again for me to see the unable to resolve host as shown below.
AEyPCXReVp3vSjy

Hi @larssn let me know if we are on the same page. Thanks

@larssn
Copy link
Author

larssn commented Aug 18, 2021

Sounds related yes. I know its a bit difficult when I'm coming from Flutter.

@wu-hui
Copy link
Contributor

wu-hui commented Aug 25, 2021

Sorry for getting back late..I tried reproduction by @aguatno , it is not doing transaction though..it is doing a simple get. I changed it to transaction and still cannot reproduce.

@larssn I have worked with flutter before, if you do not mind, please share your reproduction code.

@larssn
Copy link
Author

larssn commented Aug 25, 2021

@wu-hui Sounds good. I made a full repro here: firebase/flutterfire#6736

We're trying to figure out if it is related to firebase/flutterfire#6749, or vice versa.

@larssn
Copy link
Author

larssn commented Sep 3, 2021

Any progress?

@wu-hui
Copy link
Contributor

wu-hui commented Sep 6, 2021

So I tried the flutter repro you provided, and I still cannot reproduce what you described. Note I do not have any devices you tested, but I think devices do not matter here.

In my test, when offline, transactions take seconds or 10s of seconds to complete, this variance is expected due to retry.

I suspected the reason you see some quirkiness is that in your repro, the _lastTransactionStart and _lastTransactionEnd are updated from each press of the testing button, and when you have multiple flying transactions, they might step on each other.

I could not come up with a good explanation why you see what you saw still though..it seems it is always the 6th transaction that run slower in your case.

I would recommend you to update your testing code to remove the race condition, you can calculate _lastTransactionStart and _lastTransactionEnd from inside the button onPressed, and only update the final result string as state. Let's see if that makes any difference.

FWIW, I also tried reproduce with a native Android app, and it worked as expected (transactions take long time to fail when offline). I can be wrong, but this seems to be an issue unrelated to the SDK itself.

@larssn
Copy link
Author

larssn commented Sep 7, 2021

As I mentioned, you shouldn't have multiple transactions running at the time. Just test one at the time.

Also, what you are describing is exactly the problem: Transactions shouldn't take 10s of seconds to time out. No user is ever going to be that patient, and your app will feel broken if you make anyone wait that long. So sorry, but that is not acceptable.

I don't understand why it would work like that in the first place. The retry layer should have the final say in the matter, and if the current state is offline, the transaction shouldn't even try: It should just fail immediately.

In any case, timeouts that take 20+ seconds in not acceptable.

@wu-hui
Copy link
Contributor

wu-hui commented Sep 8, 2021

Firestore Transactions requires app to stay online all the time, otherwise there is no way to ensure consistency. The problem is there is no reliable way to tell if the current state is offline or just connection being flaky. Our assumption here is device is more likely to have flaky connections than being completed offline, hence we do retry.

If the issue is transactions sometimes retry, sometimes fail right away, then it is an issue I can keep investigate. If the issue is transaction should not take seconds to fail, I think what we can do is to disable retries as an option for app developers. I can bring this up to the stake holders to see if we can get this worked out.

@larssn
Copy link
Author

larssn commented Sep 8, 2021

Well we would appreciate some traction on this. Because put yourself in the shoes of our customers: If they have mission critical data, and are on bad wifi (and there are a surprising amount of those out there), they will not stand around and look at whatever progress indicator we have running to indicate a transaction, for 20+ seconds, only to try again, and having to wait for 20+ seconds again.

Our assumption here is device is more likely to have flaky connections than being completed offline, hence we do retry.

Which is a valid assumption, but wouldn't it be smart to fail fast and retry often, instead of failing after such a long time?

I could not come up with a good explanation why you see what you saw still though..it seems it is always the 6th transaction that run slower in your case.

So you couldn't reproduce this in Android? That seems to hint at a problem in the Flutter library then?

@wu-hui
Copy link
Contributor

wu-hui commented Sep 9, 2021

I could not reproduce myself, it was consistently failing after seconds of retrying.

Regarding fail fast, I don't think current API supports retry after failing, there is no way we can notify user if the transaction succeeded or not, if the returned android Task already failed.

We are considering to add an option to allow developers specify if they want their transactions to retry when seeing a failure from network.

@larssn
Copy link
Author

larssn commented Sep 9, 2021

Well we would really appreciate them failing faster. We basically have to ditch transactions as they are now, and find an alternative way of doing things; because we cannot wait 20+ seconds in worst case scenarios. 🙁

@thebrianchen
Copy link

This is being tracked internally at b/199542118.

@larssn
Copy link
Author

larssn commented Jan 11, 2022

Any status on this?

@dconeybe dconeybe added the type: feature request New feature or request label Jan 11, 2022
@wu-hui
Copy link
Contributor

wu-hui commented Jan 11, 2022

Hey @larssn

We have not been able to address this so far. I will try to find some time to make things better however, I am thinking maybe we can introduce a way for user to specify a retry strategy, should transactions fail. Will update when I have something more concrete.

@dconeybe
Copy link
Contributor

dconeybe commented May 25, 2022

Update: Due to an administrative error, the fix #3664 won't be released until approximately the end of June 2022, v24.2.0.

The release notes at https://firebase.google.com/support/release-notes/android#firestore_v24-1-2 incorrectly state that this feature was included in v24.1.2. Those release notes will be updated shortly to remove the erroneous mention that this bug is fixed.

@firebase firebase locked and limited conversation to collaborators May 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants