Skip to content

Conversation

@downloaderfan
Copy link

credentials_gce() function was failing since default timeout was too low. Increasing it from 0.8 to 2 fixed the issue.

credentials_gce() function was failing since default timeout was too low. Increasing it from 0.8 to 2 fixed the issue.
@craigcitro
Copy link
Collaborator

To confirm: did you run into the timeout on a GCE instance running the usual metadata server?

In particular, there's a tension here: you want a long timeout when you're on GCE, but you want a short timeout when you're not on GCE and you're stuck waiting for this request to timeout. (Whether the request will immediately fail or timeout depends on your networking config, sometimes including factors outside your control.)

@downloaderfan
Copy link
Author

downloaderfan commented Nov 3, 2021

@craigcitro Yup, I was simply running credentials_gce() function in RStudio installed in Google Compute Engine to auth into BQ using default compute engine account & found that the below line was failing

response <- try({
  httr::with_config(httr::timeout(timeout), {
    httr::GET('http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token', httr::add_headers("Metadata-Flavor" = "Google"))
  })
}, silent = TRUE)

It is the default metadata server request I believe. On increasing timeout from 0.8 to 2, it started working.

The particular GET request doesn't seem to work outside of GCP, it's specific to the VM.

@craigcitro
Copy link
Collaborator

We use the same request to detect that we're on GCE (eg

gargle/R/credentials_gce.R

Lines 129 to 132 in 4732f8c

detect_gce <- function() {
response <- gce_metadata_request("", stop_on_error = FALSE)
!(inherits(response, "try-error") %||% httr::http_error(response))
}
).

I suspect raising the timeout is worthwhile, but it's worth noting that a large timeout may lead to user-visible slowness for some users when checking whether or not they're on GCE.

@jennybc
Copy link
Member

jennybc commented Nov 3, 2021

We could make this timeout configurable via an env var, with the default short, suitable for people who are not on GCE (the vast majority of gargle usage, I suspect). But GCE users could set it to something much longer.

@downloaderfan
Copy link
Author

@craigcitro I understand your point, as @jennybc has said, it would be great if we can make it configurable with the default value being the same as it is now so that it doesn't affect existing users. Given the current state, it is too low such that it is timing out before the metadata server can return a legitimate request and since I'm running this on Google Compute Engine, it's a server to server connection so I don't think network speed should be the problem. I believe it would be worthwhile to make it configurable.

@craigcitro
Copy link
Collaborator

👍 on configurability.

Actually, as a related tweak: in detect_gce, if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

@jennybc
Copy link
Member

jennybc commented Nov 3, 2021

I feel like there is some risk of going in circles here.

Early on, we call detect_gce(), which calls gce_metadata_request(), which is where we're talking about a longer timeout. But any call to token_fetch() can potentially hit this code.

I understand that we might increase the timeout once we know we're on GCE. But it feels like the only way to know we're on GCE also involves a metadata request. So for that first call, only a timeout-increase-via-config would really help.

Do I have this right?

@jennybc
Copy link
Member

jennybc commented Nov 3, 2021

Oh maybe this is what I was missing:

gce_metadata_request("", stop_on_error = FALSE) is fast. This is what's in detect_gce().

gce_metadata_request(glue("instance/service-accounts/{service_account}/token")) is slower. This is what's in fetch_gce_access_token().

@jennybc
Copy link
Member

jennybc commented Nov 3, 2021

So I think I'm on board with this suggestion from @craigcitro now:

if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

@downloaderfan
Copy link
Author

downloaderfan commented Nov 3, 2021

👍 on configurability.

Actually, as a related tweak: in detect_gce, if we detect that we are on GCE, maybe we should then increase the timeout, since we know it's worth waiting? This would fix your situation here (credential fetch itself was slow) without any extra work on your part, and be 0 cost for non-GCE users.

Great idea!!! Yes, that would solve this problem without any additional configuration, such that I can discard any separate code/configuration for auth in all my scripts.

@downloaderfan
Copy link
Author

downloaderfan commented Nov 3, 2021

We can replace the line

timeout <- getOption("gargle.gce.timeout", default = 2)

with

timeout <- getOption("gargle.gce.timeout", default = ifelse(!detect_gce(), 0.8, 2))

Added it via a commit

Modifying the code such that higher timeout is set if metadata token request is coming from GCP
}
url <- paste0(root_url, "computeMetadata/v1/", path)
timeout <- getOption("gargle.gce.timeout", default = 2)
timeout <- getOption("gargle.gce.timeout", default = ifelse(!detect_gce(), 0.8, 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would seem to create a recursive call to gce_metadata_request().

Now that we seem to be converging on the right solution, I plan to make a PR that implements it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jennybc Alright, thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants