Skip to content

PRE block child nodes not parsed & DOCTYPE not recognized #78

@nonara

Description

@nonara

Hi! First, thanks for the great work on this library! This is truly the fastest and best out there.

I needed a lightning fast html-to-markdown library, and everything out there was extremely slow. With the help of node-html-parser, I was able to write something that blazes through it! (I'll be writing a readme soon, but it is a full release)

In the process, I discovered two issues, which I'll detail below.

PRE blocks don't parse children

In pre-formatted elements, contents are always being treated as text, so child-nodes do not get parsed.

This causes an issue for multi-line code blocks. Per spec, a multi-line code block is <code> wrapped in <pre>. (see: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/code#Notes)

The following treats <code ... > as a text node, where spec behaviour is to allow HTML child nodes within pre

<pre>
  <code class='language-javascript'>
    console.log('hello');
  </code>
</pre>

You can see how we workaround this, here: https://github.com/crosstype/node-html-markdown/blob/master/src/config.ts#L79-L89

DOCTYPE is parsed as a text node

<!DOCTYPE ...> nodes are being parsed as text nodes. I think the desired behaviour here would be to simply ignore the node.

To work around this, we're temporarily replacing them ahead of time. (see: https://github.com/crosstype/node-html-markdown/blob/master/src/main.ts#L40-L42)

Side-note

This and another of my libraries broke after a recent yarn install, due to the fact that the options we passed were no longer valid. We were specifying { pre: true, style: false }

Not a major issue, but if you're following semver, changing public API options is generally considered a breaking change, as it could cause projects which use it as a dependency to fail to compile.

Thanks again for the great work, and I hope you have a good day!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions