
BUG: wrong alignment of column names in the DataFrame repr when using pyarrow-backed string dtype #54797



Closed
jorisvandenbossche opened this issue Aug 28, 2023 · 0 comments · Fixed by #54801
Labels: Bug, Strings (String extension data type and string data)


@jorisvandenbossche
Member

When enabling the future string option, I see wrong alignment of the column names:

In [10]: pd.options.future.infer_string = True

In [11]: df = pd.DataFrame({"long_column_name": [1, 2, 3], "col2": [4, 5, 6]})

In [12]: df
Out[12]: 
   long_column_name  col2            
0                 1                 4
1                 2                 5
2                 3                 6

The column names seem to be left-aligned instead of right-aligned, and every column is padded to the width needed for the widest column.
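To make the two symptoms concrete, here is a hypothetical, simplified model of the repr layout in plain Python. It is not pandas' actual formatter. With `buggy=False`, each column is sized to its own header or widest value and the headers are right-aligned. With `buggy=True`, it mimics the output above: headers are left-aligned and every column is padded to the widest width. The `render` function and its `buggy` flag are illustrative assumptions.

```python
def render(columns: dict[str, list[int]], buggy: bool = False) -> str:
    """Render a tiny DataFrame-like table (hypothetical, simplified model)."""
    names = list(columns)
    cells = {name: [str(v) for v in values] for name, values in columns.items()}
    n_rows = len(next(iter(cells.values()), []))
    # Width needed for the row labels 0..n_rows-1.
    index_width = len(str(max(n_rows - 1, 0)))

    # Expected: each column is as wide as its own header or its widest value.
    widths = {name: max([len(name)] + [len(c) for c in cells[name]]) for name in names}
    if buggy:
        # Reported bug: every column is padded to the widest column's width.
        widest = max(widths.values(), default=0)
        widths = {name: widest for name in names}

    def header(name: str) -> str:
        # Reported bug: headers are left-aligned instead of right-aligned.
        return name.ljust(widths[name]) if buggy else name.rjust(widths[name])

    # Columns are separated by two spaces; values are always right-aligned.
    lines = [" " * index_width + "".join("  " + header(n) for n in names)]
    for i in range(n_rows):
        values = "".join("  " + cells[n][i].rjust(widths[n]) for n in names)
        lines.append(str(i).ljust(index_width) + values)
    return "\n".join(lines)


data = {"long_column_name": [1, 2, 3], "col2": [4, 5, 6]}
print(render(data))              # right-aligned headers, per-column widths
print(render(data, buggy=True))  # layout of the buggy Out[12] above
```

For this example data, `buggy=True` should reproduce the layout shown in Out[12] line for line. The default rendering gives the usual right-aligned repr, where `col2` is only as wide as its header.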

This happens when the Index object backing the columns uses the string dtype.
It seems this also happens for the other String dtype variants (e.g. after doing df.columns = df.columns.astype("string[python]")), so it is not Arrow-specific. In general, however, it's harder to get a DataFrame with such column names when using pd.options.mode.string_storage, since you still need to specify the dtype manually.
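The non-Arrow variant described above can be reproduced by casting the column Index explicitly, without enabling the future option. This is a minimal sketch that assumes pandas 1.4 or later, where an Index can hold a `StringDtype`. Whether the misaligned repr actually appears depends on whether your pandas version already includes the fix from #54801.

```python
import pandas as pd

df = pd.DataFrame({"long_column_name": [1, 2, 3], "col2": [4, 5, 6]})

# Cast the column labels to the Python-backed string dtype (no pyarrow needed).
df.columns = df.columns.astype("string[python]")

print(df.columns.dtype)  # the column Index now uses StringDtype
print(df)  # on affected versions, the headers are misaligned as in Out[12]
```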

cc @phofl
