Increasing timeout from 0.8 to 2 #195

downloaderfan · 2021-11-02T08:03:19Z

credentials_gce() function was failing since default timeout was too low. Increasing it from 0.8 to 2 fixed the issue.

craigcitro · 2021-11-03T03:48:56Z

To confirm: did you run into the timeout on a GCE instance running the usual metadata server?

In particular, there's a tension here: you want a long timeout when you're on GCE, but you want a short timeout when you're not on GCE and you're stuck waiting for this request to timeout. (Whether the request will immediately fail or timeout depends on your networking config, sometimes including factors outside your control.)

downloaderfan · 2021-11-03T08:42:20Z

@craigcitro Yup, I was simply running credentials_gce() function in RStudio installed in Google Compute Engine to auth into BQ using default compute engine account & found that the below line was failing

response <- try({
  httr::with_config(httr::timeout(timeout), {
    httr::GET('http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token', httr::add_headers("Metadata-Flavor" = "Google"))
  })
}, silent = TRUE)

It is the default metadata server request I believe. On increasing timeout from 0.8 to 2, it started working.

The particular GET request doesn't seem to work outside of GCP, it's specific to the VM.

craigcitro · 2021-11-03T14:38:19Z

We use the same request to detect that we're on GCE (eg

gargle/R/credentials_gce.R

Lines 129 to 132 in 4732f8c

    
           detect_gce <- function() { 
        
             response <- gce_metadata_request("", stop_on_error = FALSE) 
        
             !(inherits(response, "try-error") %||% httr::http_error(response)) 
        
           }

).

I suspect raising the timeout is worthwhile, but it's worth noting that a large timeout may lead to user-visible slowness for some users when checking whether or not they're on GCE.

jennybc · 2021-11-03T15:59:22Z

We could make this timeout configurable via an env var, with the default short, suitable for people who are not on GCE (the vast majority of gargle usage, I suspect). But GCE users could set it to something much longer.

downloaderfan · 2021-11-03T18:01:13Z

@craigcitro I understand your point, as @jennybc has said, it would be great if we can make it configurable with the default value being the same as it is now so that it doesn't affect existing users. Given the current state, it is too low such that it is timing out before the metadata server can return a legitimate request and since I'm running this on Google Compute Engine, it's a server to server connection so I don't think network speed should be the problem. I believe it would be worthwhile to make it configurable.

craigcitro · 2021-11-03T18:14:30Z

👍 on configurability.

Actually, as a related tweak: in detect_gce, if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

jennybc · 2021-11-03T18:28:37Z

I feel like there is some risk of going in circles here.

Early on, we call detect_gce(), which calls gce_metadata_request(), which is where we're talking about a longer timeout. But any call to token_fetch() can potentially hit this code.

I understand that we might increase the timeout once we know we're on GCE. But it feels like the only way to know we're on GCE also involves a metadata request. So for that first call, only a timeout-increase-via-config would really help.

Do I have this right?

jennybc · 2021-11-03T18:33:29Z

Oh maybe this is what I was missing:

gce_metadata_request("", stop_on_error = FALSE) is fast. This is what's in detect_gce().

gce_metadata_request(glue("instance/service-accounts/{service_account}/token")) is slower. This is what's in fetch_gce_access_token().

jennybc · 2021-11-03T18:35:41Z

So I think I'm on board with this suggestion from @craigcitro now:

if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

downloaderfan · 2021-11-03T19:42:13Z

👍 on configurability.

Actually, as a related tweak: in detect_gce, if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

Great idea!!! Yes, that would solve this problem without any additional configuration, such that I can discard any separate code/configuration for auth in all my scripts.

downloaderfan · 2021-11-03T20:01:13Z

We can replace the line

timeout <- getOption("gargle.gce.timeout", default = 2)

with

timeout <- getOption("gargle.gce.timeout", default = ifelse(!detect_gce(), 0.8, 2))

Added it via a commit

Modifying the code such that higher timeout is set if metadata token request is coming from GCP

jennybc · 2021-11-03T21:19:59Z

R/credentials_gce.R

  }
  url <- paste0(root_url, "computeMetadata/v1/", path)
-  timeout <- getOption("gargle.gce.timeout", default = 2)
+  timeout <- getOption("gargle.gce.timeout", default = ifelse(!detect_gce(), 0.8, 2))


This would seem to create a recursive call to gce_metadata_request().

Now that we seem to be converging on the right solution, I plan to make a PR that implements it.

@jennybc Alright, thanks :)

Increasing timeout from 0.8 to 2

fff4eb1

credentials_gce() function was failing since default timeout was too low. Increasing it from 0.8 to 2 fixed the issue.

Conditional timeout for GCP

2573204

Modifying the code such that higher timeout is set if metadata token request is coming from GCP

jennybc reviewed Nov 3, 2021

View reviewed changes

craigcitro mentioned this pull request Nov 9, 2021

Increase GCE metadata timeout for GKE workload identity #186

Closed

jennybc mentioned this pull request Jan 30, 2023

Refactor timeout for requests to GCE metadataserver #235

Merged

jennybc closed this in #235 Jan 30, 2023

Increasing timeout from 0.8 to 2 #195

Increasing timeout from 0.8 to 2 #195

Uh oh!

Conversation

downloaderfan commented Nov 2, 2021

Uh oh!

craigcitro commented Nov 3, 2021

Uh oh!

downloaderfan commented Nov 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

craigcitro commented Nov 3, 2021

Uh oh!

jennybc commented Nov 3, 2021

Uh oh!

downloaderfan commented Nov 3, 2021

Uh oh!

craigcitro commented Nov 3, 2021

Uh oh!

jennybc commented Nov 3, 2021

Uh oh!

jennybc commented Nov 3, 2021

Uh oh!

jennybc commented Nov 3, 2021

Uh oh!

downloaderfan commented Nov 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

downloaderfan commented Nov 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jennybc Nov 3, 2021

Choose a reason for hiding this comment

Uh oh!

downloaderfan Nov 4, 2021

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

downloaderfan commented Nov 3, 2021 •

edited

Loading

downloaderfan commented Nov 3, 2021 •

edited

Loading

downloaderfan commented Nov 3, 2021 •

edited

Loading