TOC: Anchor link written in Japanese does not work #1118
Comments
First of all, I'm not familiar enough with Japanese to know what is correct. However, even in English a character is occasionally changed when a slug is generated. So, yes, you sometimes need to account for that when manually creating your own links to those generated slugs. For all I know, converting […]

We need to be very careful here about what changes we make. For example, if we make a change, will that cause pre-existing links to slugs in existing documents to be wrong when those documents are re-rendered using the new version of Markdown?
Some historical investigation. I think the initial purpose of #970 added support for Unicode IDs, but it left the first part of that code unchanged, so the normalization happens also when […]

Interesting that that PR adds a test for Japanese! It checks that […]

If we just replace NFKD with NFKC in all cases, it will break the old behavior of ASCII-fying Extended Latin. But I think it should be safe to do it (or disable normalization altogether) when […]
@mitya57 thanks for doing the research on this. I think your proposal makes sense. If I passed […]

We provide two options: (1) normalize to ASCII and (2) preserve Unicode as-is (only normalizing whitespace). If users want other behavior, they can provide their own function or use a third-party function.
It already works like that. To replace […]

What I meant is that I see why normalization is needed with ASCII slugs: it is an essential part of making them ASCII. But I don't see a need for normalization when we use Unicode slugs.
Thanks for reminding me how the normalization works (I forget those details sometimes). In any event, I agree, we don't need normalization for Unicode slugs.
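To make the two options discussed above concrete, here is a minimal sketch of a slugify function that normalizes only on the ASCII path and leaves Unicode input untouched apart from whitespace handling. The name `sketch_slugify` and its `keep_unicode` parameter are hypothetical, chosen for illustration; this is not presented as the library's actual code.

```python
import re
import unicodedata


def sketch_slugify(value, separator, keep_unicode=False):
    """Hypothetical slugify sketch: normalize only when producing ASCII slugs."""
    if not keep_unicode:
        # NFKD splits accented letters into base letter + combining mark,
        # so that encoding to ASCII with "ignore" keeps only the base letter.
        value = unicodedata.normalize("NFKD", value)
        value = value.encode("ascii", "ignore").decode("ascii")
    # Drop anything that is not a word character, whitespace, or hyphen.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and separators into a single separator.
    return re.sub(r"[{}\s]+".format(re.escape(separator)), separator, value)


ascii_slug = sketch_slugify("žlutý kůň", "-")
unicode_slug = sketch_slugify("ばなな ぱん", "-", keep_unicode=True)
print(ascii_slug)
print(unicode_slug)
```

With this shape, Extended Latin input is still ASCII-fied as before, while the Unicode path never decomposes characters, so the generated ID uses the same code points as the heading.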
Update the existing test and add a new one to make sure that the behavior of default slugify function has not changed. Fixes #1118.
I'm using the TOC extension with slugify_unicode for Japanese,
and I'm using anchor links.
In some cases, a Japanese anchor link does not work.
This happens when the Japanese text contains dakuon (for example 'ba')
or handakuon (for example 'pa') characters.
I think this is because the characters in the generated ID
differ from the characters in the header.
Sample Markdown:
Generated html:
The result I expect is:
As far as I can tell, this depends on the arguments passed to the
unicodedata.normalize() function.
In other words, I think we need to change the first argument
from "NFKD" to "NFKC".
Reference: Difference Between NFD, NFC, NFKD, and NFKC Explained with Python Code | by Xu LIANG | Towards Data Science
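The difference between the two normalization forms can be checked with a short script. This is only an illustrative sketch: the sample heading text and the `\w`-based filter are assumptions made for the demonstration, not taken from the library.

```python
import re
import unicodedata

heading = "ばなな"  # hypothetical heading text; "ば" is a dakuon character

nfkd = unicodedata.normalize("NFKD", heading)
nfkc = unicodedata.normalize("NFKC", heading)

# NFKD decomposes "ば" (U+3070) into "は" (U+306F) followed by the
# combining voiced sound mark (U+3099), so the string grows by one code point.
print([hex(ord(c)) for c in nfkd])

# NFKC recomposes the pair, so the result is identical to the heading.
print(nfkc == heading)

# Assumed filter: removing everything that is not a word character,
# whitespace, or hyphen also removes the combining mark, leaving "は".
stripped = re.sub(r"[^\w\s-]", "", nfkd)
print(stripped)
```

If the ID is built from the decomposed and filtered string, it no longer matches a link written with the original characters, which is consistent with the behavior described above.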
I'm not familiar with Unicode,
and this is my first post.
Please investigate.