ERR: disallow non-hashables in Index construction & rename #20527

Closed
@hodossy

Description

@hodossy
from pandas import Index, DataFrame

df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])
df.index.rename(['foo'], inplace=True)
df.reset_index()

TypeError Traceback (most recent call last)
in ()
----> 5 df.reset_index()

c:\program files\python36\lib\site-packages\pandas\core\frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
3377 # to ndarray and maybe infer different dtype
3378 level_values = _maybe_casted_values(lev, lab)
-> 3379 new_obj.insert(0, name, level_values)
3380
3381 new_obj.index = new_index

c:\program files\python36\lib\site-packages\pandas\core\frame.py in insert(self, loc, column, value, allow_duplicates)
2611 value = self._sanitize_column(column, value, broadcast=False)
2612 self._data.insert(loc, column, value,
-> 2613 allow_duplicates=allow_duplicates)
2614
2615 def assign(self, **kwargs):

c:\program files\python36\lib\site-packages\pandas\core\internals.py in insert(self, loc, item, value, allow_duplicates)
4059
4060 """
-> 4061 if not allow_duplicates and item in self.items:
4062 # Should this be a different kind of error??
4063 raise ValueError('cannot insert {}, already exists'.format(item))

c:\program files\python36\lib\site-packages\pandas\core\indexes\base.py in __contains__(self, key)
1692 @Appender(_index_shared_docs['__contains__'] % _index_doc_kwargs)
1693 def __contains__(self, key):
-> 1694 hash(key)
1695 try:
1696 return key in self._engine

TypeError: unhashable type: 'list'

Problem description

I think it is reasonable to expect Index([1, 2, 3, 4]).rename(['foo']) and Index([1, 2, 3, 4]).rename('foo') to give the same result; however, this is not the case.

Probably this line should be modified in Index.rename:

- return self.set_names([name], inplace=inplace)
+ return self.set_names(name, inplace=inplace)

Expected Output

>>> Index([1, 2, 3, 4]).rename(['foo'])
Int64Index([1, 2, 3, 4], dtype='int64', name='foo')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Activity

jorisvandenbossche

jorisvandenbossche commented on Mar 29, 2018

@jorisvandenbossche
Member

The problem with your expected output is that some other 'containers', like a tuple, are actually allowed as an index name:

In [45]: df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])

In [46]: df.index = df.index.rename(('foo',))

In [47]: df.reset_index()
Out[47]: 
   (foo,)    A    B
0       1  NaN  NaN
1       2  NaN  NaN
2       3  NaN  NaN
3       4  NaN  NaN

So since this is already allowed, we would need to keep this behaviour. And then unpacking a list but not a tuple would also be strange I think.

That said, I think if we choose to not change this behaviour, we should raise the error earlier, and df.index.rename(['foo']) could already raise an error.
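
The tuple/list difference discussed above comes down to hashability, which is what the hash(key) call in the original traceback trips over. A minimal sketch, assuming a hypothetical helper is_hashable_name (an illustration, not pandas API):

```python
from collections.abc import Hashable


def is_hashable_name(name):
    """Return True if hash(name) succeeds (hypothetical helper, not pandas API)."""
    try:
        hash(name)
    except TypeError:
        return False
    return True


print(is_hashable_name('foo'))     # True
print(is_hashable_name(('foo',)))  # True: a tuple of hashables is hashable
print(is_hashable_name(['foo']))   # False: lists are never hashable

# An isinstance check alone is not enough: a tuple holding a list passes the
# ABC check but still fails when it is actually hashed.
print(isinstance((['foo'],), Hashable), is_hashable_name((['foo'],)))  # True False
```

Calling hash() directly, rather than relying on isinstance, is the safer test because it also catches hashable containers with unhashable contents.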

jreback

jreback commented on Mar 29, 2018

@jreback
Contributor

Technically we do allow non-hashable things in index names. IIRC we did try to remove this in a previous PR (but it was not merged). Note that we do require hashability in Series names. So we would take a PR to raise (both on construction and renaming).

In [3]: Series([1,2, 3], name=['foo'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-7e95b8d7ccb6> in <module>()
----> 1 Series([1,2, 3], name=['foo'])

~/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    272         generic.NDFrame.__init__(self, data, fastpath=True)
    273 
--> 274         self.name = name
    275         self._set_axis(0, index, fastpath=True)
    276 

~/pandas/pandas/core/generic.py in __setattr__(self, name, value)
   4398             object.__setattr__(self, name, value)
   4399         elif name in self._metadata:
-> 4400             object.__setattr__(self, name, value)
   4401         else:
   4402             try:

~/pandas/pandas/core/series.py in name(self, value)
    391     def name(self, value):
    392         if value is not None and not is_hashable(value):
--> 393             raise TypeError('Series.name must be a hashable type')
    394         object.__setattr__(self, '_name', value)
    395 

TypeError: Series.name must be a hashable type

so will repurpose this issue
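
To illustrate what "raise on construction and renaming" could look like, here is a minimal sketch that uses a toy class, NamedIndex, as a stand-in. The class and its error message are assumptions for illustration only and are not the pandas implementation. The name setter mirrors the style of the Series.name check shown in the traceback above.

```python
class NamedIndex:
    """Toy stand-in for an index; hypothetical, not the pandas implementation."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name  # goes through the validating setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Reject unhashable names up front instead of failing later,
        # e.g. inside a reset_index-style operation.
        try:
            hash(value)
        except TypeError:
            raise TypeError(
                '{}.name must be a hashable type'.format(type(self).__name__)
            ) from None
        self._name = value

    def rename(self, name):
        # Returns a new object; validation happens in the constructor.
        return NamedIndex(self.values, name=name)


idx = NamedIndex([1, 2, 3, 4]).rename(('foo',))  # tuples are accepted
try:
    idx.rename(['foo'])
except TypeError as err:
    print(err)  # NamedIndex.name must be a hashable type
```

Routing both construction and rename through one setter keeps the check in a single place, so the error surfaces at the call that introduced the bad name.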

added this to the Next Major Release milestone on Mar 29, 2018
changed the title [-]Calling rename() on Index object with a list makes reset_index() fail on a DataFrame[/-] [+]ERR: disallow non-hashables in Index construction & rename[/+] on Mar 29, 2018
added Error Reporting (Incorrect or improved errors from pandas) and removed on Mar 29, 2018
arminv

arminv commented on Mar 29, 2018

@arminv
Contributor

I would like to work on this issue.

hodossy

hodossy commented on Apr 3, 2018

@hodossy
Author

What if rename were only modified as

def rename(self, *names, inplace=False):
    """..."""
    return self.set_names(names, inplace=inplace)

and remove the rename override

class MultiIndex(Index):
    
    rename = Index.set_names  # delete this

from MultiIndex. This way the old functionality is kept (except MultiIndex.rename), but one can achieve what I want by passing multiple name arguments. Meaning that

df = DataFrame(index=Index([1, 2, 3, 4]), columns=['A', 'B'])
df.index.rename(['foo'], inplace=True)
df.reset_index()

still raises an error, but

index.rename(*['foo'])
index.rename(*['baz', 'quz'])

can be called on both Index and MultiIndex objects. In cases like mine, where it is not known beforehand whether I will be dealing with a multi- or single-level index, I can then avoid the type check, rename the levels easily, and reset the index on the DataFrame.

Please note that I have not had time to check the corner cases or run a regression on this idea; I am only interested in your opinion.
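
For the use case described above (renaming levels without knowing whether the index is single- or multi-level), one possible workaround that needs no API change is to go through set_names, which accepts a list of names on both Index and MultiIndex. A minimal sketch, assuming a hypothetical wrapper rename_levels (not part of pandas) that only relies on the object having a set_names method:

```python
def rename_levels(index, *names):
    """Rename all levels of ``index`` (hypothetical helper, not pandas API).

    Relies only on ``index.set_names`` accepting a list with one name per
    level, and returns the renamed index instead of modifying it in place.
    """
    return index.set_names(list(names))
```

With this, df.index = rename_levels(df.index, *['foo']) for a flat index and df.index = rename_levels(df.index, *['baz', 'quz']) for a two-level MultiIndex share the same call shape, and every resulting name is a plain string rather than a list.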

modified the milestones: Next Major Release, 0.23.0 on Apr 19, 2018


Labels: Error Reporting (Incorrect or improved errors from pandas), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), good first issue

Participants: @jreback, @jorisvandenbossche, @arminv, @hodossy