Python: Port and extend XXE modeling #6112

Merged
merged 91 commits into github:main from jorgectf/python/deserialization
Mar 14, 2022

Conversation

jorgectf
Contributor

@jorgectf jorgectf commented Jun 19, 2021

This PR introduces the modeling of the following XML parsing-related libraries and specific methods:

XML Parsers:
  xml.etree.ElementTree.XMLParser() - no longer expands entities
  lxml.etree.XMLParser() - no_network=True huge_tree=False resolve_entities=True
  lxml.etree.get_default_parser() - takes no options; uses the defaults listed above
  xml.sax.make_parser() - parser.setFeature(xml.sax.handler.feature_external_ges, True)

XML Parsing:
  string:
    xml.etree.ElementTree.fromstring(list)
    xml.etree.ElementTree.XML
    lxml.etree.fromstring(list)
    lxml.etree.XML
    xmltodict.parse - disable_entities=True

  file StringIO(), BytesIO(b):
    xml.etree.ElementTree.parse
    lxml.etree.parse
    xml.dom.(mini|pull)dom.parse(String)
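
For illustration, here is a minimal sketch of the standard-library entry points from the list above, covering both the string and the file-like forms. The helper name `root_tags` and the sample document are assumptions made for this example; the lxml and xmltodict calls are left out so the snippet only needs the standard library.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Sample document, invented for this sketch.
DOC = "<note><to>reader</to></note>"


def root_tags(doc):
    """Parse the same document through each entry point and return the root tag names."""
    # String-based entry points.
    from_string = ET.fromstring(doc)
    from_xml = ET.XML(doc)
    from_list = ET.fromstringlist([doc[:6], doc[6:]])
    dom_string = xml.dom.minidom.parseString(doc).documentElement

    # File-like entry points (StringIO / BytesIO).
    from_file = ET.parse(io.StringIO(doc)).getroot()
    dom_file = xml.dom.minidom.parse(io.BytesIO(doc.encode("utf-8"))).documentElement

    return [
        from_string.tag,
        from_xml.tag,
        from_list.tag,
        dom_string.tagName,
        from_file.tag,
        dom_file.tagName,
    ]
```

Each of these calls is a place where attacker-controlled XML can reach a parser, which is why the modeling lists them individually.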

@jorgectf
Contributor Author

jorgectf commented Jun 19, 2021

I considered changing the existing code directly, but I will write this into experimental instead because of a dilemma that has just come up.

/**
 * A data-flow node that decodes data from a binary or textual format. This
 * is intended to include deserialization, unmarshalling, decoding, unpickling,
 * decompressing, decrypting, parsing etc.
 *
 * A decoding (automatically) preserves taint from input to output. However, it can
 * also be a problem in itself, for example if it allows code execution or could result
 * in denial-of-service.
 *
 * Extend this class to refine existing API models. If you want to model new APIs,
 * extend `Decoding::Range` instead.
 */
class Decoding extends DataFlow::Node {
  Decoding::Range range;

  Decoding() { this = range }

  /** Holds if this call may execute code embedded in its input. */
  predicate mayExecuteInput() { range.mayExecuteInput() }

  /** Gets an input that is decoded by this function. */
  DataFlow::Node getAnInput() { result = range.getAnInput() }

  /** Gets the output that contains the decoded data produced by this function. */
  DataFlow::Node getOutput() { result = range.getOutput() }

  /** Gets an identifier for the format this function decodes from, such as "JSON". */
  string getFormat() { result = range.getFormat() }
}

Should we treat XXE as deserialization? If so, according to Concepts.qll (L114), the only way to look for sinks in taint configs is mayExecuteInput() (L130). However, XXE will not execute code or commands (unless PHP's expect wrapper is loaded), but it can still be dangerous (SSRF, DoS).

Since renaming mayExecuteInput() to mayBeDangerous() would be somewhat ambiguous, I guess I will create an XXE Concept for now and leave the issue for the pros 😎.

@jorgectf jorgectf changed the title Python: Port and extend unsafe deserialization modeling Python: Port and extend XXE modeling Jun 19, 2021
@jorgectf jorgectf marked this pull request as ready for review July 22, 2021 17:35
@jorgectf jorgectf requested a review from a team as a code owner July 22, 2021 17:35
@jorgectf
Contributor Author

This query is ready for code review 😃

@RasmusWL RasmusWL self-assigned this Aug 25, 2021
@jorgectf jorgectf marked this pull request as draft August 25, 2021 14:14
@jorgectf jorgectf marked this pull request as ready for review August 25, 2021 15:18
Member

@RasmusWL RasmusWL left a comment

As we discussed privately, instead of typing out lengthy replies, I would simply make the modeling changes myself. That is ready now in jorgectf#9.

That PR I made IS quite a mouthful. I think there are interesting things for you to learn from it, but I also understand that it could take you some time to process. If you have not had time to review it within 1 week, I think I will just merge this PR and apply those commits on top, so we can get your good work closer to being part of the default query suite (unless you object to the 1-week limit).

Comment on lines 44 to 64
@app.route("/xml_etree_fromstring-lxml_etree_XMLParser")
def xml_parser_2():
    xml_content = request.args['xml_content']

    parser = lxml.etree.XMLParser()
    return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text

@app.route("/xml_etree_fromstring-lxml_get_default_parser")
def xml_parser_3():
    xml_content = request.args['xml_content']

    parser = lxml.etree.get_default_parser()
    return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text

@app.route("/xml_etree_fromstring-lxml_get_default_parser")
def xml_parser_4():
    xml_content = request.args['xml_content']

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, True)
    return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text
Member

Have you seen anyone use xml.etree with a parser from a different package in any real code? If not, I would consider this use case a bit too obscure, and not have any tests for it.

Contributor Author

Not really, I just tried all parsing modules with all parsers and noted which ones worked. Using a parser from a different package than the one doing the parsing doesn't really make sense, so I'm fine with removing this use case :)
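
For contrast, a minimal sketch of the conventional pairing, where the parser comes from the same package as the parsing function. The function name `parse_text` is made up for this example, and the Flask request handling from the tests above is left out so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_text(xml_content: str) -> Optional[str]:
    """Parse with xml.etree's own XMLParser and return the root element's text."""
    # Parser and parsing function both come from xml.etree.ElementTree.
    parser = ET.XMLParser()
    return ET.fromstring(xml_content, parser=parser).text
```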

override DataFlow::Node getAnInput() { none() }

override predicate vulnerable(string kind) {
  kind = "XXE" and not this.getArgByName("resolve_entities").asExpr() = any(False f)
Member

I see that you already rewrote this, but consider:

lxml.etree.XMLParser(resolve_entities=True)

The first predicate below (works) will hold for such a call, meaning we treat it as vulnerable to XXE. But since the resolve_entities keyword argument is present, the second predicate (doesNotWork) will not hold for such a call, meaning we treat it as safe from XXE. So there is a subtle difference. (And having tests for all such cases really helps to get such things right 😉)

predicate works() {
  not this.getArgByName("resolve_entities").getALocalSource().asExpr() = any(False f)
}

predicate doesNotWork() {
  not (
    exists(this.getArgByName("resolve_entities")) or
    this.getArgByName("resolve_entities").asExpr() = any(False f)
  )
}

I ended up writing this as

(
  // resolve_entities has default True
  not exists(this.getArgByName("resolve_entities"))
  or
  this.getArgByName("resolve_entities").getALocalSource().asExpr() = any(True f)
)
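
To make the difference concrete, here is a rough Python analogy of the three conditions above. It is not CodeQL: the sentinels `MISSING` and `UNKNOWN` and the function names are assumptions made up for this sketch, and the argument is modeled as a plain value rather than a data-flow node.

```python
# Hypothetical sentinels for this sketch only.
MISSING = object()  # the resolve_entities keyword was not passed at all
UNKNOWN = object()  # the keyword was passed, but its value is not a literal


def works(arg):
    """Analogy of works(): vulnerable unless the argument is literally False."""
    return arg is not False


def does_not_work(arg):
    """Analogy of doesNotWork(): holds only when the argument is absent."""
    return not (arg is not MISSING or arg is False)


def final(arg):
    """Analogy of the final form: argument absent (default True) or literally True."""
    return arg is MISSING or arg is True


def truth_table():
    """Return (works, does_not_work, final) for each modeled argument state."""
    states = {"missing": MISSING, "True": True, "False": False, "unknown": UNKNOWN}
    return {name: (works(v), does_not_work(v), final(v)) for name, v in states.items()}
```

Under these assumptions, `resolve_entities=True` is flagged by `works` and `final` but missed by `does_not_work`, which is the false negative described above. The analogy also suggests that the final form is stricter than `works` when the value cannot be resolved to a literal.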

Comment on lines 127 to 137
predicate vulnerable(DataFlow::Node n, string kind) {
  exists(API::Node handler, API::Node feature |
    handler = API::moduleImport("xml").getMember("sax").getMember("handler") and
    DataFlow::exprNode(trackSaxFeature(this, feature).asExpr())
        .(DataFlow::LocalSourceNode)
        .flowsTo(n)
  |
    kind = ["XXE", "DTD retrieval"] and
    feature = handler.getMember("feature_external_ges")
  )
}
Member

I've rewritten this

exists(DataFlow::MethodCallNode parse, API::Node handler, API::Node feature |
  handler = API::moduleImport("xml").getMember("sax").getMember("handler") and
  parse.calls(trackSaxFeature(this, feature), "parse") and
  parse.getArg(0) = this.getAnInput() // enough to avoid FPs?
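
As a sketch of the Python code shape this pattern is concerned with: a SAX parser is created, `feature_external_ges` is set on it, and `parse` is then called on that same parser with the input. The names `TextCollector` and `parse_with_sax` are assumptions for this example, and the feature is left disabled by default here so the snippet is safe to run.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects character data so the parse result can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_sax(xml_bytes, allow_external_entities=False):
    """Parse bytes with a SAX parser, optionally enabling external general entities."""
    parser = xml.sax.make_parser()
    # Passing True here is the configuration the query treats as dangerous.
    parser.setFeature(feature_external_ges, allow_external_entities)
    handler = TextCollector()
    parser.setContentHandler(handler)
    # The first argument of parse() is where untrusted input would arrive.
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)
```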
Member

I've rewritten this

Comment on lines 37 to 43
predicate xmlInjectionVulnerable(DataFlow::PathNode source, DataFlow::PathNode sink, string kind) {
  xmlInjection(source, sink) and
  (
    xmlParsingInputAsVulnerableSink(sink.getNode(), kind) or
    xmlParserInputAsVulnerableSink(sink.getNode(), kind)
  )
}
Member

I have written a solution for this

 * * `getAnInput()`'s result would be `foo`.
 * * `vulnerable(kind)`'s `kind` would be `Billion Laughs` and `Quadratic Blowup`.
 */
private class XMLRPCServer extends DataFlow::CallCfgNode, XML::XMLParser::Range {
Member

I ended up writing a separate query for this.
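
For background on the Billion Laughs and Quadratic Blowup kinds mentioned above, here is a deliberately tiny and harmless sketch of the underlying mechanism: an internal entity defined in the DTD is expanded every time it is referenced, so the parsed text can be much larger than the references that produced it. The function name `expanded_length` and the payload size are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def expanded_length(repeats: int) -> int:
    """Return the length of the parsed text after internal entity expansion."""
    payload = "A" * 10      # the entity's replacement text (kept tiny on purpose)
    refs = "&a;" * repeats  # each reference costs 3 characters in the input
    doc = f'<!DOCTYPE d [<!ENTITY a "{payload}">]><d>{refs}</d>'
    return len(ET.fromstring(doc).text)
```

With a large payload and many references, the same shape of document is what a quadratic blowup attack relies on; nesting entity definitions inside each other gives the exponential Billion Laughs variant.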

@jorgectf jorgectf requested a review from RasmusWL March 4, 2022 16:22
@RasmusWL
Member

RasmusWL commented Mar 4, 2022

Cheers 👍 I think this should be good to go now, but the tests will need to be ✔️ first. Will probably merge it by Monday 👍

@RasmusWL RasmusWL force-pushed the jorgectf/python/deserialization branch from 44c9443 to 0e9da4a Compare March 9, 2022 10:06
@RasmusWL RasmusWL merged commit 2f4a22c into github:main Mar 14, 2022
@jorgectf
Contributor Author

This PR has been an amazing ride @RasmusWL, thank you!

@jorgectf jorgectf deleted the jorgectf/python/deserialization branch March 14, 2022 11:41