-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Port and extend XXE modeling #6112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Port and extend XXE modeling #6112
Conversation
I have thought about directly changing the current code, but I will be writing into experimental because of a dilemma that has just come up. codeql/python/ql/src/semmle/python/Concepts.qll Lines 112 to 140 in 26a04d6
Should we treat Taking into account that changing |
This query is ready for code review 😃 |
I had forgotten about this, but better late than never... also added a small representative test
But handling this in a nice way will require some restructuring
and handle parser being passed as positional argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed privately, instead of typing out lengthy replies, I would simple do the changed modeling myself... that is ready now in jorgectf#9.
That PR I made IS quite a mouthful. I think there are interesting things for you to learn from this, but I can also understand that it could take some time for you to process this. If you have not had time to review this within 1 week, I think I will just merge this PR, and apply this commits on top, so we can get your good work closer to being part of the default query suite (unless you object to this 1 week).
@app.route("/xml_etree_fromstring-lxml_etree_XMLParser") | ||
def xml_parser_2(): | ||
xml_content = request.args['xml_content'] | ||
|
||
parser = lxml.etree.XMLParser() | ||
return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text | ||
|
||
@app.route("/xml_etree_fromstring-lxml_get_default_parser") | ||
def xml_parser_3(): | ||
xml_content = request.args['xml_content'] | ||
|
||
parser = lxml.etree.get_default_parser() | ||
return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text | ||
|
||
@app.route("/xml_etree_fromstring-lxml_get_default_parser") | ||
def xml_parser_4(): | ||
xml_content = request.args['xml_content'] | ||
|
||
parser = xml.sax.make_parser() | ||
parser.setFeature(xml.sax.handler.feature_external_ges, True) | ||
return xml.etree.ElementTree.fromstring(xml_content, parser=parser).text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you seen anyone use xml.etree
with a parser from a different package in any real code? If not, I would consider this usecase a bit too obscure, and not have any tests for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I just tried all parsing modules with all parsers and noted which worked. Using a different parser than the one from the package being used doesn't really make sense, I'm fine removing this use case :)
override DataFlow::Node getAnInput() { none() } | ||
|
||
override predicate vulnerable(string kind) { | ||
kind = "XXE" and not this.getArgByName("resolve_entities").asExpr() = any(False f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you already re-wrote this, but consider:
lxml.etree.XMLParser(resolve_entities=True)
The first predicate will hold for such a call (meaning we treat it as vulnerable to XXE), but since the resolve_entities
keyword argument is present, the second predicate will not hold for such a call (meaning we treat it as safe for XXE). So there is a subtle difference. (and having tests for all such cases really helps to get such things right 😉)
predicate works() {
not this.getArgByName("resolve_entities").getALocalSource().asExpr() = any(False f)
}
predicate doesNotWork() {
not (
exists(this.getArgByName("resolve_entities")) or
this.getArgByName("resolve_entities").asExpr() = any(False f)
)
}
I ended up writing this as
(
// resolve_entities has default True
not exists(this.getArgByName("resolve_entities"))
or
this.getArgByName("resolve_entities").getALocalSource().asExpr() = any(True f)
)
predicate vulnerable(DataFlow::Node n, string kind) { | ||
exists(API::Node handler, API::Node feature | | ||
handler = API::moduleImport("xml").getMember("sax").getMember("handler") and | ||
DataFlow::exprNode(trackSaxFeature(this, feature).asExpr()) | ||
.(DataFlow::LocalSourceNode) | ||
.flowsTo(n) | ||
| | ||
kind = ["XXE", "DTD retrieval"] and | ||
feature = handler.getMember("feature_external_ges") | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten this
exists(DataFlow::MethodCallNode parse, API::Node handler, API::Node feature | | ||
handler = API::moduleImport("xml").getMember("sax").getMember("handler") and | ||
parse.calls(trackSaxFeature(this, feature), "parse") and | ||
parse.getArg(0) = this.getAnInput() // enough to avoid FPs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten this
predicate xmlInjectionVulnerable(DataFlow::PathNode source, DataFlow::PathNode sink, string kind) { | ||
xmlInjection(source, sink) and | ||
( | ||
xmlParsingInputAsVulnerableSink(sink.getNode(), kind) or | ||
xmlParserInputAsVulnerableSink(sink.getNode(), kind) | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have written a solution for this
* * `getAnInput()`'s result would be `foo`. | ||
* * `vulnerable(kind)`'s `kind` would be `Billion Laughs` and `Quadratic Blowup`. | ||
*/ | ||
private class XMLRPCServer extends DataFlow::CallCfgNode, XML::XMLParser::Range { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up writing separate query for this.
Co-authored-by: Jorge <[email protected]>
Nice spotted @jorgectf!
Rasmus' rewrite of github#6112 See github#6112 (review)
Cheers 👍 I think this should be good to go now, but will need tests to be ✔️ first. Will probably merge it by monday 👍 |
Not the prettiest solution... but it works ¯\_(ツ)_/¯
44c9443
to
0e9da4a
Compare
This PR has been an amazing ride @RasmusWL, thank you! |
This PR introduces the modeling of the following XML parsing-related libraries and specific methods: