SgmlReader can parse HTML/SGML, and it can convert malformed HTML into well-formed XHTML.
// Requires the SgmlReader library (namespace Sgml) plus System.IO and System.Xml.
string SgmlTranslate(string input)
{
    var reader = new SgmlReader();
    reader.DocType = "HTML";
    reader.WhitespaceHandling = WhitespaceHandling.None;
    reader.CaseFolding = Sgml.CaseFolding.ToLower;
    reader.InputStream = new StringReader(input);

    var output = new StringWriter();
    var writer = new XmlTextWriter(output);
    writer.Formatting = Formatting.Indented;

    while (reader.Read())
    {
        // Skip whitespace and comment nodes; copy everything else.
        if (reader.NodeType != XmlNodeType.Whitespace
            && reader.NodeType != XmlNodeType.Comment)
            writer.WriteNode(reader, true);
    }
    writer.Close();
    return output.ToString();
}
This is modeled on the official sample code.
I also changed the following line so that the generated XML comes out indented:
reader.WhitespaceHandling = WhitespaceHandling.None
Finally, be sure to exclude nodes whose NodeType is Comment.
A single comment nearly did me in yesterday.
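The post does not say exactly what went wrong with the comment. One plausible cause, and this is my assumption, is that XmlWriter.WriteNode already advances the reader past the node it copies, so a following reader.Read() skips the next node. If a leading comment is copied with WriteNode, the root element after it can be stepped over. Below is a minimal sketch of a loop that avoids the double advance. To stay self-contained it uses a plain XmlReader instead of SgmlReader (SgmlReader derives from XmlReader, so the same loop should apply), and the names WriteNodeDemo and CopySkippingComments are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriteNodeDemo
{
    // Hypothetical helper, not from the post: copies the nodes the loop visits,
    // skipping comments and whitespace at that level.
    public static string CopySkippingComments(XmlReader reader)
    {
        var output = new StringWriter();
        var writer = new XmlTextWriter(output);
        writer.Formatting = Formatting.Indented;

        reader.Read(); // move to the first node
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Comment ||
                reader.NodeType == XmlNodeType.Whitespace)
            {
                reader.Read();                  // skipped node: advance manually
            }
            else
            {
                writer.WriteNode(reader, true); // copied node: WriteNode advances the reader
            }
        }
        writer.Close();
        return output.ToString();
    }

    static void Main()
    {
        // A comment in front of the root element, the case assumed to cause trouble.
        var xml = "<!-- banner --><html><body><p>hi</p></body></html>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            Console.WriteLine(CopySkippingComments(reader));
        }
    }
}
```

Note that WriteNode copies an element together with its whole subtree, so comments nested inside an element are still written. Only comments that the loop itself visits are dropped.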