read_hdf / store.select modifies the passed columns parameters when multi-indexed #7212

Closed
@eldad-a

Description

Code to reproduce:

import pandas as pd
import numpy as np

## generate data
df = pd.DataFrame(np.random.rand(4,5), index=list('abcd'), columns=list('ABCDE'))
df.index.name = 'letters'
df = df.set_index(keys='E' , append=True)

## save to hdf5
h5name = 'tst.h5'
key = 'tst_key'
df.to_hdf(h5name, key,
          mode='a', append=True,
          data_columns=df.index.names + df.columns.tolist(),
          index=False,
          complevel=5, complib='blosc',
          #expectedrows=expectedrows,
          )

## load part of df
cols2load = list('BCD')
print('before loading: \n\t cols2load = {}'.format(cols2load))
df_ = pd.read_hdf(h5name, key, columns=cols2load)
print('after loading: \n\t cols2load = {}'.format(cols2load))

The printed output:

before loading:
cols2load = ['B', 'C', 'D']
after loading:
cols2load = ['E', 'letters', 'B', 'C', 'D']

pd.__version__ = '0.13.1'
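
The reported output is consistent with a reader that inserts each missing index level name at position 0 of the list it was given, working on the caller's list object rather than a copy. The sketch below is a hypothetical stand-in (`select_mutating` is not a pandas function), assuming the level names are ['letters', 'E'] as in the example; it reproduces exactly the list printed above.

```python
def select_mutating(columns, index_levels):
    """Hypothetical stand-in for a reader that needs the index level
    names in its column list. It inserts them in place, so the caller's
    list object is changed as a side effect."""
    if columns is not None:
        for name in index_levels:
            if name not in columns:
                columns.insert(0, name)  # mutates the caller's list
    return columns


cols2load = list('BCD')
result = select_mutating(cols2load, ['letters', 'E'])
print(cols2load)            # ['E', 'letters', 'B', 'C', 'D']
print(result is cols2load)  # True: the same list object was handed back
```

Because Python passes the list object itself, any in-place method (`insert`, `append`, `extend`, slice assignment) inside the callee is visible to the caller.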

Activity

jreback commented on May 22, 2014

@jreback
Contributor

Hmm, not sure there are any guarantees on this, but it makes sense to simply copy this and not modify it.

Want to do a pull-request?
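
A minimal sketch of the library-side fix suggested here, assuming the reader only needs its own working list: copy the argument on entry and build the augmented list from the copy. `select_copying` is a hypothetical name, not the actual pandas patch.

```python
def select_copying(columns, index_levels):
    """Hypothetical fixed reader: works on a private copy, so the
    caller's list is never modified."""
    if columns is None:
        return None
    columns = list(columns)  # shallow copy; also accepts tuples or other iterables
    for name in index_levels:
        if name not in columns:
            columns.insert(0, name)  # only the private copy changes
    return columns


cols2load = list('BCD')
result = select_copying(cols2load, ['letters', 'E'])
print(cols2load)  # ['B', 'C', 'D']
print(result)     # ['E', 'letters', 'B', 'C', 'D']
```

A shallow copy is enough here because the list holds column names (immutable strings); only the list container needs to be private to the callee.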

added this to the 0.14.1 milestone on May 22, 2014
eldad-a commented on May 22, 2014

@eldad-a
Author

My current workaround is to pass a copy:

df_ = pd.read_hdf(h5name, key, columns=list(cols2load))  # pass a copy so cols2load itself is left untouched
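
The caller-side workaround can be checked without HDF5 at all. The sketch below reuses a hypothetical in-place reader (`select_mutating`, not a pandas function) and shows that handing it a fresh copy, via `list(...)` or a full slice, leaves the original list intact.

```python
def select_mutating(columns, index_levels):
    # Hypothetical stand-in that inserts missing index level names in place.
    for name in index_levels:
        if name not in columns:
            columns.insert(0, name)
    return columns


cols2load = list('BCD')

# Pass a fresh copy on each call; the callee mutates the copy, not cols2load.
first = select_mutating(list(cols2load), ['letters', 'E'])
second = select_mutating(cols2load[:], ['letters', 'E'])  # slicing also copies

print(cols2load)        # ['B', 'C', 'D']
print(first == second)  # True
```

The copy has to be made on every call: reusing a single copy across calls would just move the accumulated mutations to that copy.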

BTW, the same problem occurs with the where parameter of read_hdf.
I do not expect to be able to do the pull-request soon (it'll be my first, so it will take time...).
I mainly thought it was worth posting, as I wasted quite some time finding the "bug" in my code.
It turned out to be this (I was using cols2load for several different purposes in the code).
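
Since both `columns` and `where` are reported as affected, one way to guard every list argument at once is a wrapper that shallow-copies lists before the call. This is a sketch under stated assumptions: `protect_list_args` and `select` are hypothetical names, the `where` term strings are made up, and only top-level `list` arguments are copied (dicts, sets, and nested lists are not).

```python
import functools


def protect_list_args(func):
    """Hypothetical decorator: pass shallow copies of any list arguments,
    so the wrapped function cannot mutate the caller's lists."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(list(a) if isinstance(a, list) else a for a in args)
        kwargs = {k: list(v) if isinstance(v, list) else v
                  for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


@protect_list_args
def select(key, columns=None, where=None):
    # Hypothetical reader that mutates both list arguments in place.
    if columns is not None:
        columns.insert(0, 'letters')
    if where is not None:
        where.append('index > 0')
    return columns, where


cols2load = ['B', 'C', 'D']
conditions = ['B > 0.5']
cols, terms = select('tst_key', columns=cols2load, where=conditions)
print(cols2load, conditions)  # ['B', 'C', 'D'] ['B > 0.5']
print(cols, terms)            # ['letters', 'B', 'C', 'D'] ['B > 0.5', 'index > 0']
```

Copying at the API boundary keeps the fix in one place, at the cost of one small list allocation per call.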

jreback commented on May 22, 2014

@jreback
Contributor

no problem

take your time

about to release 0.14.0 in any event

modified the milestones: 0.15.0, 0.14.1 on Jun 26, 2014
jreback commented on Jun 26, 2014

@jreback
Contributor

@eldad-a if you have a pull-request for this, that would be great.

eldad-a commented on Jun 27, 2014

@eldad-a
Author

@jreback unfortunately I do not expect to be able to look into this soon.
If I do, I'll definitely submit a pull-request.

jreback commented on Jun 27, 2014

@jreback
Contributor

ok, thanks

modified the milestones: 0.16.0, Next Major Release on Mar 6, 2015

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

      read_hdf / store.select modifies the passed columns parameters when multi-indexed · Issue #7212 · pandas-dev/pandas