

@staticdev
Contributor

@staticdev staticdev commented Jan 14, 2021

  • Passes the file contents to tomlkit as bytes instead of text. This is the only way to get .toml files containing special (non-ASCII) characters to parse on Windows.
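A minimal sketch of the bytes-based idea, using only the standard library. Reading with `Path.read_bytes()` involves no decoding step, so the system locale never comes into play. The helper name `read_toml_bytes` and the sample content are hypothetical, and the sketch stops before any tomlkit call.

```python
import tempfile
from pathlib import Path


def read_toml_bytes(path: Path) -> bytes:
    """Hypothetical helper: return the raw file contents without decoding.

    Path.read_bytes() does not consult the locale, so non-ASCII characters
    arrive intact as their UTF-8 byte sequences.
    """
    return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pyproject.toml"
    # Write a document containing a non-ASCII character, encoded as UTF-8.
    path.write_bytes('name = "Ángel"\n'.encode("utf-8"))

    data = read_toml_bytes(path)
    # The bytes can then be decoded as UTF-8, which the TOML spec requires.
    text = data.decode("utf-8")

print(text, end="")
```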

Closes #213

@cjolowicz
Owner

cjolowicz commented Jan 14, 2021

Thank you for the report and for contributing a fix! 🙇‍♂️

Per the TOML spec, input documents must be valid UTF-8. IIUC the problem is that Path.read_text() ends up using the preferred locale encoding instead. Judging by the traceback in your issue description, this happens to be CP-1252 on your system.

As a fix, I would prefer to be explicit about the encoding:

-        text = path.read_text()
+        text = path.read_text(encoding="utf-8")

Could you try this and adapt the PR?
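A minimal sketch of why the explicit encoding matters, assuming a CP-1252 locale like the one in the issue traceback. Instead of changing the system locale, it passes `encoding="cp1252"` to `read_text()` to mimic what the default would do on such a system; the file content is made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pyproject.toml"
    path.write_text('name = "Ángel"\n', encoding="utf-8")

    # Simulate a CP-1252 locale: "Á" is b"\xc3\x81" in UTF-8, and byte 0x81
    # is undefined in CP-1252, so decoding raises UnicodeDecodeError.
    try:
        path.read_text(encoding="cp1252")
    except UnicodeDecodeError as exc:
        error = exc
    else:
        error = None

    # The proposed fix: always decode as UTF-8, regardless of locale.
    text = path.read_text(encoding="utf-8")

print(type(error).__name__)
print(text, end="")
```

Other non-ASCII characters may not raise at all under CP-1252 and instead decode to the wrong characters (for example, "é" becomes "Ã©"), so an explicit encoding also prevents silent corruption.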

@staticdev
Contributor Author

Done @cjolowicz.

@cjolowicz cjolowicz changed the title from "Fix TOML parse" to "Decode pyproject.toml as UTF-8 regardless of system locale" Jan 14, 2021
@cjolowicz cjolowicz added the "bug" (Something isn't working) label Jan 14, 2021
@cjolowicz cjolowicz merged commit 1d13c74 into cjolowicz:master Jan 14, 2021
@cjolowicz
Owner

Released in 0.7.1


Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

UnicodeDecodeError error on Windows after upgrading from 0.5.0 to 0.6.0
