Closed
Description
Code to reproduce:
import pandas as pd
import numpy as np
## generate data: a 4x5 frame, then move column 'E' into a second index level
df = pd.DataFrame(np.random.rand(4, 5), index=list('abcd'), columns=list('ABCDE'))
df.index.name = 'letters'
df = df.set_index(keys='E', append=True)
## save to hdf5 (append=True writes the table format, so columns can be selected on read)
h5name = 'tst.h5'
key = 'tst_key'
df.to_hdf(h5name, key,
          mode='a', append=True,
          data_columns=df.index.names + df.columns.tolist(),
          index=False,
          complevel=5, complib='blosc',
          # expectedrows=expectedrows,
          )
## load part of df
cols2load = list('BCD')
print('before loading: \n\t cols2load = {}'.format(cols2load))
df_ = pd.read_hdf(h5name, key, columns=cols2load)  # cols2load is modified by this call
print('after loading: \n\t cols2load = {}'.format(cols2load))
The printed output:
before loading:
    cols2load = ['B', 'C', 'D']
after loading:
    cols2load = ['E', 'letters', 'B', 'C', 'D']
pd.__version__ = '0.13.1'
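The printed output is exactly what you would get if the reader inserted each index level name at the front of the caller's own list. The sketch below reproduces that behaviour in plain Python with a hypothetical function, read_columns_in_place, standing in for the reader; it is not pandas' actual implementation, only an illustration of how an in-place insert on an argument leaks back to the caller.

```python
def read_columns_in_place(columns, index_names):
    """Hypothetical stand-in for a reader that mutates its `columns` argument.

    Each index level name is inserted at position 0 of the caller's own list,
    so the caller still sees the change after the call returns.
    """
    for name in index_names:
        if name not in columns:
            columns.insert(0, name)  # mutates the list object the caller passed in
    return columns


cols2load = list("BCD")
selected = read_columns_in_place(cols2load, ["letters", "E"])

print(cols2load)              # ['E', 'letters', 'B', 'C', 'D']
print(selected is cols2load)  # True: both names refer to the same list object
```

Inserting 'letters' and then 'E' at position 0 yields the same order as the report, which is why the extra names appear reversed relative to df.index.names.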
Activity
jreback commented on May 22, 2014
hmm, not sure there are any guarantees on this, but it makes sense to simply copy this and not modify it.
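Copying instead of modifying can be sketched as a hypothetical reader that takes a private copy of its argument before adding anything to it. The function name is illustrative only and is not part of pandas; list(columns) is used because it also accepts tuples or other iterables a caller might pass.

```python
def read_columns_copying(columns, index_names):
    """Hypothetical reader that never modifies the caller's `columns`."""
    columns = list(columns)  # private shallow copy; the caller's object is untouched
    for name in index_names:
        if name not in columns:
            columns.insert(0, name)
    return columns


cols2load = list("BCD")
selected = read_columns_copying(cols2load, ["letters", "E"])

print(cols2load)  # ['B', 'C', 'D']
print(selected)   # ['E', 'letters', 'B', 'C', 'D']

# A tuple works too, because list() accepts any iterable.
print(read_columns_copying(("B", "C"), ["letters"]))  # ['letters', 'B', 'C']
```

Copying once at the entry point keeps the rest of the function free to modify its local list without any risk to the caller.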
Want to do a pull-request?
eldad-a commented on May 22, 2014
My current solution is to pass a copy:
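On the caller side, passing a copy can be as simple as pd.read_hdf(h5name, key, columns=list(cols2load)). The sketch below checks, with a hypothetical in-place stand-in for the reader (not pandas code), that the usual copy idioms (slicing, list(), copy.copy) each hand the callee a separate list, so the original survives. These are shallow copies, which is enough for a flat list of column names.

```python
import copy


def read_columns_in_place(columns, index_names):
    """Hypothetical stand-in for a reader that mutates its `columns` argument."""
    for name in index_names:
        if name not in columns:
            columns.insert(0, name)
    return columns


cols2load = list("BCD")

# Three equivalent ways to pass a shallow copy instead of the original list.
for make_copy in (lambda c: c[:], list, copy.copy):
    passed = make_copy(cols2load)
    read_columns_in_place(passed, ["letters", "E"])
    print(passed)     # ['E', 'letters', 'B', 'C', 'D']: only the copy changed
    print(cols2load)  # ['B', 'C', 'D']: the caller's list is intact
```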
BTW, the same problem holds for the where parameter of read_hdf.
I do not expect to be able to do the pull-request soon (it'll be my first, so it will take time...).
I mainly thought it was worth posting, as I wasted quite some time finding the "bug" in my code.
It turned out to be this (I was using cols2load for several different purposes in the code).
jreback commented on May 22, 2014
no problem
take your time
about to release 0.14.0 in any event
jreback commented on Jun 26, 2014
@eldad-a if you have a pull-request for this, that would be great
eldad-a commented on Jun 27, 2014
@jreback unfortunately I do not expect to be able to get into the matter soon.
If I do, I'll definitely submit a pull-request.
jreback commented on Jun 27, 2014
ok, thanks