increased memory usage in 2.4.0 #394

Closed
@pietrodn

Description

Version 2.4.0 of the library allocates much more memory than the previous version, 2.3.1, when running multiple queries.
In particular, the QueryJob object seems to retain the query results internally, and that memory is not released for as long as the job object is referenced.

I think that the problem is related to #374.

Environment details

  • macOS 11.0.1 (also observing this on Linux in a production environment)
  • Python version: 3.8.6
  • pip version: 20.1.1
  • google-cloud-bigquery version: 2.4.0

Steps to reproduce

Run the script in the code example below once with google-cloud-bigquery 2.4.0 and once with 2.3.1.
You will also need to install:

google-cloud-bigquery-storage==2.1.0
pandas==1.1.4
psutil==5.7.3

The outputs on my machine are:

With 2.4.0:

Initial memory used: 77 MB
Memory used: 642 MB
Memory used: 875 MB
Memory used: 1117 MB
Memory used: 1342 MB
Memory used: 1568 MB
Memory used: 1792 MB
Memory used: 2039 MB
Memory used: 2265 MB
Memory used: 2505 MB
Memory used: 2725 MB

With 2.3.1:

Initial memory used: 77 MB
Memory used: 97 MB
Memory used: 98 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 100 MB
Memory used: 101 MB
Memory used: 101 MB
Memory used: 101 MB
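
To put a number on the difference, here is a small sketch that computes the average growth per query from the figures above. The readings are copied from the two runs. The variable names and the `mean_growth` helper are mine, for illustration only.

```python
# RSS readings in MB after each of the 10 queries, copied from the outputs above.
rss_2_4_0 = [642, 875, 1117, 1342, 1568, 1792, 2039, 2265, 2505, 2725]
rss_2_3_1 = [97, 98, 99, 99, 99, 99, 100, 101, 101, 101]


def mean_growth(samples):
    """Average increase between consecutive readings (hypothetical helper)."""
    return (samples[-1] - samples[0]) / (len(samples) - 1)


growth_2_4_0 = mean_growth(rss_2_4_0)
growth_2_3_1 = mean_growth(rss_2_3_1)

print(f"2.4.0: about {growth_2_4_0:.0f} MB per query")
print(f"2.3.1: about {growth_2_3_1:.1f} MB per query")
```

The first reading of each run is left out of the baseline on purpose, so the one-off cost of the first query does not skew the per-query average.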

Code example

Please note that we are storing a reference to the QueryJob objects, but not to the resulting DataFrames.

import os

import psutil
from google.cloud import bigquery

if __name__ == '__main__':
    client = bigquery.Client()

    # Resident set size of this process, used as the memory metric.
    process = psutil.Process(os.getpid())
    print(f"Initial memory used: {process.memory_info().rss / 1e6:.0f} MB")

    # Only the QueryJob objects are kept alive across iterations.
    jobs = []

    for i in range(10):
        job = client.query("SELECT x FROM UNNEST(GENERATE_ARRAY(1, 1000000)) AS x")
        # The DataFrame is built and immediately discarded.
        job.result().to_dataframe()
        jobs.append(job)
        print(f"Memory used: {process.memory_info().rss / 1e6:.0f} MB")
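
To illustrate the behaviour I suspect, here is a minimal sketch that does not use BigQuery at all. `Rows`, `CachingJob`, `NonCachingJob` and `result_survives` are hypothetical stand-ins, not the library's classes, and I have not confirmed that this is what 2.4.0 actually does internally. It only shows that a job object which keeps a reference to its result prevents that result from being freed while the job itself is still referenced.

```python
import gc
import weakref


class Rows:
    """Stand-in for a large query result (hypothetical, not a library class)."""

    def __init__(self, n):
        self.data = list(range(n))


class CachingJob:
    """Hypothetical job that keeps a reference to its first result."""

    def __init__(self):
        self._cached = None

    def result(self):
        if self._cached is None:
            self._cached = Rows(1000)
        return self._cached


class NonCachingJob:
    """Hypothetical job that builds a fresh result on every call."""

    def result(self):
        return Rows(1000)


def result_survives(job):
    """Return True if the result is still alive after the caller drops it."""
    rows = job.result()
    ref = weakref.ref(rows)
    del rows          # the caller no longer holds the result
    gc.collect()
    return ref() is not None


# Both jobs stay referenced, as in the script above.
jobs = [CachingJob(), NonCachingJob()]
survives = [result_survives(job) for job in jobs]
print(survives)
```

If the real QueryJob behaves like `CachingJob`, keeping the jobs in a list would keep every result alive too, which would match the steady growth seen with 2.4.0.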

Labels

api: bigquery (Issues related to the googleapis/python-bigquery API.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)
