<hr> tags break markup processing in 3.3.0+? #1053
Comments
I'll have to test the tip. We have some unreleased fixes as there were some inconsistencies with certain tags.
Yeah, it looks like new behavior introduced by the new parser. I would think

On the one hand, as it is a block item, I'm not sure it should render inside a paragraph. On the other hand, the parser seems to treat it as a block (which it is) and just keeps searching for an end tag. It doesn't find the end, but assumes all the trailing text is inside the <hr>. Interestingly, if we add
Seems we need to handle unclosed tags a bit better:
@waylan thoughts?
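To make the "keeps searching for an end tag" behavior concrete, here is a minimal sketch that uses only the standard library's `html.parser.HTMLParser`. The diffs in this thread show `HTMLExtractor` deriving from an `HTMLParser` class, so the sketch assumes the same callback model applies. `EventRecorder` and `record` are hypothetical names for illustration and are not part of Python-Markdown. It only shows which callbacks fire for `<hr>` versus `<hr/>`, not how the library reacts to them.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records which parser callbacks fire, in order."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Only called for XHTML-style tags such as <hr/>.
        self.events.append(("startend", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(source):
    """Feed ``source`` to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(source)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(record("*emphasis1*\n<hr>\n*emphasis2*"))
    print(record("*emphasis1*\n<hr/>\n*emphasis2*"))
```

With a bare `<hr>` the parser reports a start tag and never an end tag, so any code that waits for the matching end tag has nothing to close the block with. The self-closed `<hr/>` goes through `handle_startendtag` instead.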
Here is a potential fix. Interested to hear your opinion on the approach, @waylan, or whether you have a different suggestion:
diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py
index a2137c7..4bfc6c9 100644
--- a/markdown/extensions/md_in_html.py
+++ b/markdown/extensions/md_in_html.py
@@ -36,9 +36,10 @@ class HTMLExtractorExtra(HTMLExtractor):
# Block-level tags which never get their content parsed.
self.raw_tags = ['canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea']
# Block-level tags in which the content gets parsed as blocks
- self.block_tags = [tag for tag in self.block_level_tags if tag not in self.span_tags + self.raw_tags]
super().__init__(md, *args, **kwargs)
+ self.block_tags = [tag for tag in self.block_level_tags if tag not in self.span_tags + self.raw_tags]
+
def reset(self):
"""Reset this instance. Loses all unprocessed data."""
self.mdstack = [] # When markdown=1, stack contains a list of tags
@@ -119,6 +120,9 @@ class HTMLExtractorExtra(HTMLExtractor):
else:
self.handle_data(text)
+ if tag in self.empty_tags:
+ self.handle_endtag(tag)
+
def handle_endtag(self, tag):
if tag in self.block_level_tags:
if self.inraw:
diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
index 6776d34..bd75368 100644
--- a/markdown/htmlparser.py
+++ b/markdown/htmlparser.py
@@ -56,6 +56,10 @@ class HTMLExtractor(htmlparser.HTMLParser):
def __init__(self, md, *args, **kwargs):
if 'convert_charrefs' not in kwargs:
kwargs['convert_charrefs'] = False
+
+ # Block tags that should contain no content (self closing)
+ self.empty_tags = ['hr']
+
# This calls self.reset
super().__init__(*args, **kwargs)
self.md = md
@@ -135,6 +139,9 @@ class HTMLExtractor(htmlparser.HTMLParser):
# This is presumably a standalone tag in a code span (see #1036).
self.clear_cdata_mode()
+ if tag in self.empty_tags:
+ self.handle_endtag(tag)
+
def handle_endtag(self, tag):
text = self.get_endtag_text(tag)

Should be noted that this doesn't universally fix potential data loss in
This will have to be tweaked as well as it isn't perfect and doubles
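The idea behind the first patch (call `handle_endtag` right after the start tag for anything listed in `empty_tags`) can be sketched in isolation. This is a toy built on the standard library's `HTMLParser`, not the Python-Markdown code; `VoidAwareParser`, `open_tags`, and `closed` are hypothetical names, and the stack handling is an assumption made so the example is self-contained.

```python
from html.parser import HTMLParser


class VoidAwareParser(HTMLParser):
    """Toy parser that never leaves an empty (void) tag open."""

    # Mirrors the ``empty_tags = ['hr']`` list from the patch above.
    empty_tags = frozenset({"hr"})

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.open_tags = []  # stack of currently open tags
        self.closed = []     # tags in the order they were closed

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if tag in self.empty_tags:
            # No end tag will ever arrive, so synthesize one immediately.
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        # Ignore end tags with no matching open tag. This also absorbs the
        # second call made by the default handle_startendtag() for <hr/>.
        if tag not in self.open_tags:
            return
        while self.open_tags:
            popped = self.open_tags.pop()
            self.closed.append(popped)
            if popped == tag:
                break


if __name__ == "__main__":
    parser = VoidAwareParser()
    parser.feed("<div>text<hr>more")
    parser.close()
    print(parser.open_tags, parser.closed)
```

Note that the default `handle_startendtag` calls `handle_starttag` and then `handle_endtag`, so a self-closed `<hr/>` reaches `handle_endtag` twice in this sketch. The membership check keeps the second call harmless here; a real fix would need its own guard.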
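This is a bit more sane:
diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py is preceded by the md_in_html.py change below.
diff --git a/markdown/extensions/md_in_html.py b/markdown/extensions/md_in_html.py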
index a2137c7..1fc4ab8 100644
--- a/markdown/extensions/md_in_html.py
+++ b/markdown/extensions/md_in_html.py
@@ -36,9 +36,10 @@ class HTMLExtractorExtra(HTMLExtractor):
# Block-level tags which never get their content parsed.
self.raw_tags = ['canvas', 'math', 'option', 'pre', 'script', 'style', 'textarea']
# Block-level tags in which the content gets parsed as blocks
- self.block_tags = [tag for tag in self.block_level_tags if tag not in self.span_tags + self.raw_tags]
super().__init__(md, *args, **kwargs)
+ self.block_tags = [tag for tag in self.block_level_tags if tag not in self.span_tags + self.raw_tags]
+
def reset(self):
"""Reset this instance. Loses all unprocessed data."""
self.mdstack = [] # When markdown=1, stack contains a list of tags
diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
index 6776d34..2e14038 100644
--- a/markdown/htmlparser.py
+++ b/markdown/htmlparser.py
@@ -56,6 +56,10 @@ class HTMLExtractor(htmlparser.HTMLParser):
def __init__(self, md, *args, **kwargs):
if 'convert_charrefs' not in kwargs:
kwargs['convert_charrefs'] = False
+
+ # Block tags that should contain no content (self closing)
+ self.empty_tags = ['hr']
+
# This calls self.reset
super().__init__(*args, **kwargs)
self.md = md
@@ -264,6 +268,13 @@ class HTMLExtractor(htmlparser.HTMLParser):
if end.endswith('/>'):
# XHTML-style empty tag: <span attr="value" />
self.handle_startendtag(tag, attrs)
+ elif tag in self.empty_tags:
+ if re.match(r'\s*</\s*{}\s*>'.format(tag), self.rawdata[self.line_offset + self.offset + len(self.__starttag_text):]):
+ if tag in self.CDATA_CONTENT_ELEMENTS:
+ self.set_cdata_mode(tag)
+ self.handle_starttag(tag, attrs)
+ else:
+ self.handle_startendtag(tag, attrs)
else:
# *** set cdata_mode first so we can override it in handle_starttag (see #1036) ***
if tag in self.CDATA_CONTENT_ELEMENTS:

Still not happy with how the
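The second patch decides how to treat an empty tag by looking ahead in the raw input for an explicit closing tag. A minimal standalone sketch of that lookahead follows. `has_immediate_close` is a hypothetical helper name, and `re.escape` is an addition of this sketch that the patch itself does not use.

```python
import re


def has_immediate_close(rawdata, pos, tag):
    """Return True if only whitespace separates ``pos`` from ``</tag>``.

    ``pos`` is the index just past the start tag, playing the role of
    ``self.line_offset + self.offset + len(self.__starttag_text)`` in the patch.
    """
    pattern = r'\s*</\s*{}\s*>'.format(re.escape(tag))
    return re.match(pattern, rawdata[pos:]) is not None


if __name__ == "__main__":
    start = len("<hr>")
    print(has_immediate_close("<hr></hr>", start, "hr"))
    print(has_immediate_close("<hr>\n*emphasis2*", start, "hr"))
```

When the helper returns False, the patch routes the tag through `handle_startendtag`, the same path an XHTML-style `<hr/>` takes. When it returns True, the tag is handled as a normal start tag and the explicit end tag closes it later.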
This was an intentional change in behavior. Previously, we strictly enforced the rule which required that an HTML block must start with a blank line. However, in practice, in most cases, even the reference implementation doesn't follow that rule (strangely it does with

I should also note that it is invalid HTML to have an

<p>foo
<hr>
bar
</p>

as

<p>foo
</p><hr>
bar
<p></p>

Note that
would be

<p>foo</p>
<hr>
<p>bar</p>

And that will provide a better rendering in the browser. Which means that we do have a bug in that the line after the
I have some work here: #1054. We can discuss changes there and change direction if I'm going in a bad direction. This case:
Should render as
<hr> elements seem to interrupt markdown text processing:

Input: '*emphasis1*\n<hr>\n*emphasis2*'
Actual output: '<p><em>emphasis1</em></p>\n<hr>\n*emphasis2*'
Expected output: '<p><em>emphasis1</em>\n<hr>\n<em>emphasis2</em></p>'
Changing the input to *emphasis1*\n<hr/>\n*emphasis2* (closing the hr tag) causes *emphasis2* to render correctly.

Naked <hr> tags should be valid HTML: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/hr

Having the md_in_html extension enabled or disabled doesn't seem to make a difference.

Versions
Steps to reproduce
mderror.py:
Output:
897c854 also fails:
3.3.0 starts failing:
3.2.2 passes all tests: