-
Notifications
You must be signed in to change notification settings - Fork 116
Description
Hi! First, thanks for the great work on this library! This is truly the fastest and best out there.
I needed a lightning fast html-to-markdown library, and everything out there was extremely slow. With the help of node-html-parser, I was able to write something that blazes through it! (I'll be writing a readme soon, but it is a full release)
In the process, I discovered two issues, which I'll detail below.
PRE blocks don't parse children
In pre-formatted elements, contents are always being treated as text, so child-nodes do not get parsed.
This causes an issue for multi-line code blocks. Per spec, a multi-line code block is <code> wrapped in <pre>. (see: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/code#Notes)
The following treats <code ... > as a text node, where spec behaviour is to allow HTML child nodes within pre
<pre>
<code class='language-javascript'>
console.log('hello');
</code>
</pre>You can see how we workaround this, here: https://github.com/crosstype/node-html-markdown/blob/master/src/config.ts#L79-L89
DOCTYPE is parsed as a text node
<!DOCTYPE ...> nodes are being parsed as text nodes. I think the desired behaviour here would be to simply ignore the node.
To work around this, we're temporarily replacing them ahead of time. (see: https://github.com/crosstype/node-html-markdown/blob/master/src/main.ts#L40-L42)
Side-note
This and another of my libraries broke after a recent yarn install, due to the fact that the options we passed were no longer valid. We were specifying { pre: true, style: false }
Not a major issue, but if you're following semver, changing public API options is generally considered a breaking change, as it could cause projects which use it as a dependency to fail to compile.
Thanks again for the great work, and I hope you have a good day!