Invalid parsing of processing instructions #770

@chw-1

Hello,

In version 1.9.2, processing instructions are no longer parsed correctly.
Here is sample code that reproduces the issue.

package jsoupbug;

import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

public class JsoupBug {

    private static final String XML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<?myProcessingInstruction My Processing instruction.?>";

    public static void main(String[] args) {
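        // Parse with the XML parser so the input is not treated as HTML.
        Document document = Jsoup.parse(XML, "", Parser.xmlParser());
        document.outputSettings().prettyPrint(false);
        List<Node> nodes = document.childNodes();
        // Index 2 is expected to be the processing instruction, after the
        // XML declaration (0) and the newline text node (1).
        Node node = nodes.get(2);
        String outerHtml = node.outerHtml();
        System.out.println(outerHtml);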
    }

}

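If I understand the spec correctly (https://www.w3.org/TR/REC-xml/#sec-pi), spaces are valid characters inside a processing instruction, and everything after the target is plain character data. Jsoup 1.9.2 instead lower-cases the first letter of the target and rewrites the data as a list of attributes.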

With version 1.9.2 this prints:
<?myprocessingInstruction my="" processing="" instruction.=""?>
However, in 1.9.1 the behavior is as I would expect:
<?myProcessingInstruction My Processing instruction.?>
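
To illustrate what I would expect, here is a minimal sketch of how the spec's PI production splits an instruction into a target and opaque data. It does not use Jsoup at all; the class name `PiSketch` and the method `split` are hypothetical names made up for this example, and the target pattern is a simplification of the spec's `Name` rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PiSketch {

    // Simplified form of: PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    // Assumption: the target is approximated as "no whitespace, no '?'"
    // instead of the full XML Name production.
    private static final Pattern PI = Pattern.compile(
            "<\\?([^\\s?]+)(?:\\s+((?:(?!\\?>).)*))?\\?>", Pattern.DOTALL);

    /** Returns {target, data}; data is kept verbatim and is "" when absent. */
    static String[] split(String pi) {
        Matcher m = PI.matcher(pi);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a processing instruction: " + pi);
        }
        String data = m.group(2) == null ? "" : m.group(2);
        return new String[] { m.group(1), data };
    }

    public static void main(String[] args) {
        String[] parts = split("<?myProcessingInstruction My Processing instruction.?>");
        System.out.println("target = " + parts[0]); // target = myProcessingInstruction
        System.out.println("data   = " + parts[1]); // data   = My Processing instruction.
    }
}
```

The point of the sketch is that the data part is returned unchanged: its case and spaces are preserved and it is never interpreted as attribute pairs, which matches the 1.9.1 output.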

Labels: bug, fixed