encoding/xml: XML CDATA section could be joined together with regular characters #12611
This is also a problem if the XML element contains indented children and a CDATA section. Example:
The ",cdata" field of the parent ends up being:
A workaround is to use ",innerxml" and write custom MarshalXML/UnmarshalXML methods for your data type.
Two comments from me:
You said:
I agree. However, I can't distinguish CDATA sections using Unmarshal, and I can't distinguish them using Token either. This behavior is not to spec: when I simply write back what I read, using Unmarshal and then Marshal, these CDATA sections get a ton of extra newlines (in addition to the regular indentation ones), all wrapped in CDATA. In other words, crazy output, very much against section 2.11.
Example crazy output using ",cdata":
Input of 'Unmarshal':
Output of 'MarshalIndent':
I still believe this is a different issue, related to this one but not the same. I am not the author of the XML package, so I can't give an authoritative answer. Perhaps you should post code in a new bug report that shows the behaviour? That makes it easier to reproduce the problem. Your “crazy output” seems crazy to me, too. I think you have hit a bug, while my issue is just a nuisance.
Fair enough, thanks for your feedback, and have a Merry Christmas.
I was going to file a new bug report about a bug in comment-node parsing, but I found this issue still open. Existing bug:
I wanted to add the following:
After looking further at the XML specification, it appears that the Go XML parser behaves according to the specification.
However, I wonder whether the Go parser could expose the actual XML tree, the way the Mozilla Firefox JS engine does: https://go.dev/play/p/2FnjtqHdKnC
package main

import (
	"encoding/xml"
	"fmt"
)

type Person struct {
	XMLName  xml.Name    `xml:"PERSON"`
	Comment1 xml.Comment `xml:",comment"`
	Name     struct {
		XMLName xml.Name `xml:"NAME"`
		CData1  string   `xml:",cdata"`
		Example struct {
			XMLName xml.Name `xml:"EXAMPLE"`
		}
		CData2 string `xml:",cdata"`
	}
	Comment2 xml.Comment `xml:",comment"`
}

func main() {
	var d string = `<PERSON>
<!-- comment1 -->
<NAME>
<![CDATA[John1]]>
<EXAMPLE />
<![CDATA[Doe]]>
</NAME>
<!-- comment2 -->
</PERSON>`
	fmt.Printf("input XML document: %s\n", d)

	// Unmarshal the document, then marshal the result again.
	var unmarshalledData Person
	var err error
	err = xml.Unmarshal([]byte(d), &unmarshalledData)
	if err != nil {
		fmt.Println("error unmarshalling XML:", err)
		return
	}
	fmt.Printf("got unmarshalled data: %+v\n", unmarshalledData)
	var output []byte
	output, err = xml.Marshal(unmarshalledData)
	if err != nil {
		fmt.Println("error marshalling XML:", err)
		return
	}
	fmt.Printf("then marshalled the unmarshalled data: %s\n", output)

	// Build the value I expected Unmarshal to produce and marshal it for comparison.
	var expectedData Person = Person{
		XMLName: xml.Name{
			Local: "PERSON",
		},
		Comment1: xml.Comment(" comment1 "),
		Name: struct {
			XMLName xml.Name `xml:"NAME"`
			CData1  string   `xml:",cdata"`
			Example struct {
				XMLName xml.Name `xml:"EXAMPLE"`
			}
			CData2 string `xml:",cdata"`
		}{
			XMLName: xml.Name{
				Local: "NAME",
			},
			CData1: "John",
			Example: struct {
				XMLName xml.Name `xml:"EXAMPLE"`
			}{
				XMLName: xml.Name{
					Local: "EXAMPLE",
				},
			},
			CData2: "Doe",
		},
		Comment2: xml.Comment(" comment2 "),
	}
	fmt.Printf("expected unmarshalled data (ignore text nodes because not specified in `Person` struct): %+v\n", expectedData)
	output, err = xml.Marshal(expectedData)
	if err != nil {
		fmt.Println("error marshalling XML:", err)
		return
	}
	fmt.Printf("then marshalled the expected unmarshalled data: %s\n", output)
}

// Output:
// input XML document: <PERSON>
// <!-- comment1 -->
// <NAME>
// <![CDATA[John1]]>
// <EXAMPLE />
// <![CDATA[Doe]]>
// </NAME>
// <!-- comment2 -->
// </PERSON>
// got unmarshalled data: {XMLName:{Space: Local:PERSON} Comment1:[32 99 111 109 109 101 110 116 49 32 32 99 111 109 109 101 110 116 50 32] Name:{XMLName:{Space: Local:NAME} CData1:
// John1
//
// Doe
// Example:{XMLName:{Space: Local:EXAMPLE}} CData2:} Comment2:[]}
// then marshalled the unmarshalled data: <PERSON><!-- comment1 comment2 --><NAME><![CDATA[
// John1
//
// Doe
// ]]><EXAMPLE></EXAMPLE></NAME></PERSON>
// expected unmarshalled data (ignore text nodes because not specified in `Person` struct): {XMLName:{Space: Local:PERSON} Comment1:[32 99 111 109 109 101 110 116 49 32] Name:{XMLName:{Space: Local:NAME} CData1:John Example:{XMLName:{Space: Local:EXAMPLE}} CData2:Doe} Comment2:[32 99 111 109 109 101 110 116 50 32]}
// then marshalled the expected unmarshalled data: <PERSON><!-- comment1 --><NAME><![CDATA[John]]><EXAMPLE></EXAMPLE><![CDATA[Doe]]></NAME><!-- comment2 --></PERSON>

Here's what the JavaScript engine of Mozilla Firefox 137 gave me:

const xmlStr = `<PERSON>
<!-- comment1 -->
<NAME>
<![CDATA[John1]]>
<EXAMPLE />
<![CDATA[Doe]]>
</NAME>
<!-- comment2 -->
</PERSON>`
const parser = new DOMParser();
const doc = parser.parseFromString(xmlStr, "application/xml");
// print the name of the root element or error message
const errorNode = doc.querySelector("parsererror");
if (errorNode) {
    console.error("error while parsing", errorNode);
} else {
    console.log(doc.childNodes[0].childNodes);
    console.log(doc.childNodes[0].childNodes[3].childNodes);
}
// Output:
// NodeList(7) [ #text, <!-- comment1 -->, #text, NAME, #text, <!-- comment2 -->, #text ]
// NodeList(7) [ #text, CDATASection, #text, EXAMPLE, #text, CDATASection, #text ]
go version go1.5 darwin/amd64
One thing I stumbled across yesterday (not a real bug, but a minor nuisance from a user's perspective perhaps):
gives
I would expect one xml.CharData{} token instead:
While I understand the source of the three tokens, I would expect one, as the user (= me) is unable to distinguish between a CDATA node and a regular text node.