#### Description
#### Environment details

- OS type and version: macOS Catalina (10.15.5)
- Python version: `python --version`: Python 3.7.6
- pip version: `pip --version`: pip 20.0.2
- `google-cloud-bigquery` version: `pip show google-cloud-bigquery`:

```
Name: google-cloud-bigquery
Version: 1.24.0
Summary: Google BigQuery API client library
Home-page: https://github.com/GoogleCloudPlatform/google-cloud-python
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /Users/swast/miniconda3/envs/ibis-dev/lib/python3.7/site-packages
Requires: google-cloud-core, google-api-core, google-resumable-media, google-auth, protobuf, six
```
#### Code example

```python
from google.cloud import bigquery

client = bigquery.Client()
df = client.query("SELECT TIMESTAMP '4567-01-01 00:00:00' AS `tmp`").to_dataframe()
```
#### Stack trace

```
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-3-6b8b40790c39> in <module>
----> 1 df = client.query("SELECT TIMESTAMP '4567-01-01 00:00:00' AS `tmp`").to_dataframe()
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/google/cloud/bigquery/job.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client)
3372 dtypes=dtypes,
3373 progress_bar_type=progress_bar_type,
-> 3374 create_bqstorage_client=create_bqstorage_client,
3375 )
3376
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/google/cloud/bigquery/table.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client)
1729 create_bqstorage_client=create_bqstorage_client,
1730 )
-> 1731 df = record_batch.to_pandas()
1732 for column in dtypes:
1733 df[column] = pandas.Series(df[column], dtype=dtypes[column])
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
764 _check_data_column_metadata_consistency(all_columns)
765 columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 766 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
767
768 axes = [columns, index]
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
1100 columns = block_table.column_names
1101 result = pa.lib.table_to_blocks(options, block_table, categories,
-> 1102 list(extension_columns.keys()))
1103 return [_reconstruct_block(item, columns, extension_columns)
1104 for item in result]
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
~/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Casting from timestamp[us, tz=UTC] to timestamp[ns] would result in out of bounds timestamp: 81953424000000000
```
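For context: pandas stores timestamps as int64 nanoseconds, so the representable range is roughly 1677-09-21 through 2262-04-11, and the year 4567 falls well outside it. The failure can be reproduced in pyarrow alone (a minimal sketch, independent of BigQuery; the literal value is the one from the error message above):

```python
import pyarrow as pa

# 81953424000000000 microseconds since the epoch is 4567-01-01.
# Converting to nanoseconds multiplies by 1000, which overflows
# int64 (max ~9.22e18), hence the out-of-bounds error.
arr = pa.array([81953424000000000], type=pa.timestamp("us", tz="UTC"))
pa.table({"tmp": arr}).to_pandas()  # raises pyarrow.lib.ArrowInvalid
```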
#### Potential solutions

In order of my preference:

- Catch this exception from Arrow and, only in that case, retry with the Arrow option that converts timestamps to `datetime` objects (no such option is currently exposed in google-cloud-bigquery); see the sketch after this list. See: https://issues.apache.org/jira/browse/ARROW-5359
- Add an option to use Fletcher (https://github.com/xhochy/fletcher) to build a dataframe backed by the Arrow table.
- Add an option to always use `datetime` objects for TIMESTAMP/DATETIME columns.
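A minimal sketch of the first option, assuming the `timestamp_as_object` keyword proposed in ARROW-5359 is available on `to_pandas()` (it ships in newer pyarrow releases; the helper name here is hypothetical, not the library's API):

```python
import pyarrow

def to_dataframe_with_fallback(record_batch):
    """Hypothetical helper: convert an Arrow table/record batch to pandas,
    falling back to datetime.datetime objects (object dtype) only when a
    timestamp column is outside the pandas nanosecond range."""
    try:
        return record_batch.to_pandas()
    except pyarrow.lib.ArrowInvalid:
        # Assumes the timestamp_as_object flag from ARROW-5359 exists
        # in the installed pyarrow version.
        return record_batch.to_pandas(timestamp_as_object=True)
```

This keeps the fast datetime64[ns] path for the common case and only pays the object-dtype cost when the data actually requires it.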