Refactor `to_dataframe` to deterministically update progress bar. #8303
Merged
tswast merged 5 commits into googleapis:master on Jun 18, 2019
Previously, a background thread collected progress bar updates from worker threads. So that downloads would never block on a progress bar update, workers sent updates with `put_nowait`, and any update that could not be written was silently ignored. As a result, progress bar updates were non-deterministic and the tests were flaky.

Now, worker threads push DataFrames to the queue, and `download_dataframe_bqstorage` and `download_dataframe_tabledata_list` return an iterable of pandas DataFrame objects instead of a single DataFrame. This lets progress bar updates happen independently of which underlying API is used to download the DataFrames.

The logic for working with pandas has also moved to the `_pandas_helpers` module.
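The flow described above can be sketched without any BigQuery or pandas dependency. This is a minimal sketch under stated assumptions, not the PR's code: `download_frames`, `_download_page`, `CountingProgressBar`, and `to_rows` are hypothetical names, and each "frame" is a plain list of rows standing in for a pandas DataFrame. Workers hand whole frames to the main thread through a shared `queue.Queue`; the main thread consumes the resulting iterable and performs every progress-bar update itself, so each frame is counted exactly once no matter how the threads are scheduled.

```python
import concurrent.futures
import queue

# Sentinel a worker puts on the queue when it has finished (even on error).
_DONE = object()


def _download_page(page_id, frame_queue):
    """Hypothetical worker: 'download' one page and hand it to the main thread."""
    try:
        # Stand-in for a pandas DataFrame: a list of (page_id, row_index) rows.
        frame = [(page_id, i) for i in range(page_id + 1)]
        frame_queue.put(frame)  # plain put(): the frame is never dropped
    finally:
        frame_queue.put(_DONE)


def download_frames(page_ids, max_workers=4):
    """Yield each frame on the calling thread as soon as a worker produces it."""
    frame_queue = queue.Queue()  # unbounded (assumption), so put() never blocks
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_download_page, p, frame_queue) for p in page_ids]
        remaining = len(futures)
        while remaining:
            item = frame_queue.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item
        for future in futures:
            future.result()  # re-raise any exception from a worker


class CountingProgressBar:
    """Hypothetical progress bar that only records how many rows it was told about."""

    def __init__(self):
        self.total = 0

    def update(self, n):
        self.total += n


def to_rows(page_ids, progress_bar):
    """Consume the iterable of frames; all progress updates happen here."""
    rows = []
    for frame in download_frames(page_ids):
        progress_bar.update(len(frame))  # one update per frame, never skipped
        rows.extend(frame)
    return rows


if __name__ == "__main__":
    bar = CountingProgressBar()
    rows = to_rows(range(5), bar)
    print(bar.total)  # 15 on every run: the pages hold 1 + 2 + 3 + 4 + 5 rows
```

The order of `rows` can still vary between runs, but the progress total cannot. The unbounded queue is a simplifying choice for the sketch: it keeps workers from ever blocking, at the cost of memory if the consumer falls behind.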
plamut (Contributor) reviewed on Jun 14, 2019:
The changes generally seem fine to me, but I did leave a few comments that might be relevant - please check, just in case.
plamut (Contributor) approved these changes on Jun 17, 2019:
Looks good, and thanks for the additional explanation of the design decisions!
Will wait to see if the other reviewers have something else to add.
plamut reviewed on Jun 19, 2019:

    # prevents the queue from filling up, because the main thread
    # has smaller gaps in time between calls to the queue's get
    # method. For a detailed explanation, see:
    # https://friendliness.dev/2019/06/18/python-nowait/
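The standard-library behavior behind the original flakiness can be shown without threads. `queue.Queue.put_nowait` raises `queue.Full` instead of waiting when a bounded queue has no free slot, so a producer that swallows that exception loses the update. In this sketch nothing drains the queue while updates are pushed, which makes the loss deterministic; with a real consumer thread, how many updates are lost depends on how often that thread calls `get`, which is the timing the code comment above refers to. `push_updates_nowait` is a hypothetical helper for illustration, not code from the PR.

```python
import queue


def push_updates_nowait(n_updates, maxsize):
    """Old pattern (sketch): never block the producer; drop updates when full.

    Nothing drains the queue while updates are pushed, so the loss here is
    deterministic. With a real consumer thread, how many updates are lost
    depends on thread scheduling.
    """
    progress_queue = queue.Queue(maxsize=maxsize)
    dropped = 0
    for _ in range(n_updates):
        try:
            progress_queue.put_nowait(1)  # raises queue.Full instead of waiting
        except queue.Full:
            dropped += 1  # the update is silently lost
    delivered = 0
    while True:
        try:
            delivered += progress_queue.get_nowait()
        except queue.Empty:
            break
    return delivered, dropped


if __name__ == "__main__":
    print(push_updates_nowait(10, maxsize=3))  # (3, 7)
```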
Closes #8175