-
Notifications
You must be signed in to change notification settings - Fork 1.7k
C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569
Conversation
QHelp previews: cpp/ql/src/Critical/DoubleFree.qhelpPotential double freeDeallocating memory more than once can lead to a double-free vulnerability. This can be exploited to corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing the program, or security vulnerabilities, by allowing an attacker to overwrite arbitrary memory locations. RecommendationEnsure that all execution paths deallocate the allocated memory at most once. If possible, reassign the pointer to a null value after deallocating it. This will prevent double-free vulnerabilities since most deallocation functions will perform a null-pointer check before attempting to deallocate the memory. Exampleint* f() {
int *buff = malloc(SIZE*sizeof(int));
do_stuff(buff);
free(buff);
int *new_buffer = malloc(SIZE*sizeof(int));
free(buff); // BAD: If new_buffer is assigned the same address as buff,
// the memory allocator will free the new buffer memory region,
// leading to use-after-free problems and memory corruption.
return new_buffer;
}
References
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments. This looks good so far!
There are false positives in the following cases which I think needs to be addressed: char *a = (char*) malloc(10);
void *b = realloc(a, strlen(a) + 1); // Use after free of a on strlen(a) char *a = (char*) malloc(10);
void *b = realloc(a, 11);
if (!b) free(a); // a is untouched if b == NULL. This case happens very often. void f(void ** a) {
free(*a);
*a = malloc(10); // use after free, but 'a' is not freed. |
8338a4d
to
60ee783
Compare
In general, I don't think we should track flow through void* realloc(void* a, n {)
void *ret = malloc(n);
if (ret) {
free(a);
}
return ret;
} maybe we could make it work, but we should also consider that free also special cases void free(void* a) [
if (a) free(a); // call to actual free
} which causes FP's in the code: void* test() {
void* a = NULL;
free(a);
use(a); // GOOD [FALSE POSITIVE]
return a; // GOOD [FALSE POSITIVE]
} Would range analysis help in this case to understand that |
I'm sure we could do something like what you suggest in the query, but for the purposes of getting this PR merged as quickly as possible, I think we should just exclude There are many possible ways of figuring out that In general, since the goal of the trick with dominators and post-dominators was to avoid having to invent complex path-sensitive analysis, I don't think it's worth putting too many clever into this case initially. So: Let's exclude |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're on the right track with this. We should definitely pull the common code out into a suitably named .qll
, and take a look at performance on a DCA run. We'll want a docs review in good time as well.
cpp/ql/test/query-tests/Security/CWE/CWE-416/semmle/tests/UseAfterFree.expected
Show resolved
Hide resolved
10c6b1b
to
ba80fc7
Compare
8c5e55e
to
b94e989
Compare
b94e989
to
17df634
Compare
I've refactored the code to so that the main logic is shared though a parameterized module, which makes the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments (for whoever ends up moving this PR across the finish line).
cpp/ql/src/Critical/UseAfterFree.ql
Outdated
e = dfe.asExpr() and | ||
( | ||
e = any(ReturnStmt rs).getExpr() or | ||
e = any(FunctionCall fc).getAChild() or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we really want here?
-
Is it because we want to catch the following:
void myFunc(void* p) { ... } free(p); myFunc(p);
and we have to stop at the argument because we're doing local-only flow?
If so, I think we should be a bit smarter here and identify "functions that always dereference a parameter". We can do that by checking if there's local flow from a parameter node to a use, and ensuring that the use always post-dominates the parameter (very much like what we're doing in the query already).
-
Or is it because we want to catch this (which I doubt we'd actually see in the wild, and I think this would even be included in the
PointerDereferenceExpr
case):free(myFunc); myFunc();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first case. The intention is flag passing a pointer pointing to de-allocated memory to a function, indeed
We could extend the analysis to be interprocedural, but I would argue that passing an invalid pointer to any function should be considered harmful, and I have not seen any FP's for that case.
The only point it would make sense is if the callee just ignores the argument.
It should be noted that we block flow though CallByReference
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followup after a sync talk. This would avoid an FP in
void *a = ...
free(a);
printf("%p", a);
dominates(e1, e2) or | ||
postDominates(e2, e1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we've discussed offline, this should be replaced with the IR versions. Sadly, the IR doesn't yet include delete
so this may cause us to lose results 😭. If this is unacceptable (which it very well might be) it should at least be changed to use the basic block version of these predicates (since the control-flow node version of these won't scale to large databases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already do use the IR flow so we are already missing these results, so switchig should not cause harm. It would supposedly also remove one FP in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. Good point. I can push a IR-ified version of this to the PR later today then 👍.
@@ -156,7 +156,7 @@ template<class T> T test() | ||
T* x; | ||
use(x); // GOOD | ||
delete x; | ||
use(x); // BAD | ||
use(x); // BAD [NOT DETECTED] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes sense. delete x
isn't in the IR, so there's no dataflow from that x
to use(x)
.
I've created a PR against this branch to implement functionality to detect indirect free's. Its only a draft, and I assume it could be implemented in a much more efficient way that would also track the path better. |
40635ad
to
2c51000
Compare
209814f
to
b957565
Compare
…ap with the new non-experimental one.
DCA is happy now 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few comments, mostly small details. We'll want a docs review ideally before this is merged.
call.getArgumentOperand(i) = n.asOperand() and | ||
init.hasIndex(i) and | ||
init.getEnclosingFunction() = call.getStaticCallTarget() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this case is for arguments that are always dereferenced by the function.
Why does DataFlow
not find these results satisfactorily without a special case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normal interprocedural will find this result, but because we're adding the extra condition that either:
- The
free
dominates theuse
, or - The
use
post-dominates thefree
And domination/post-domination is only defined intraprocedurally. So this query is only finding cases of use-after-free when the use happens in the same function as the free (because neither of the above conditions will hold when the source and sink are in different functions). So we won't catch stuff like:
void use(char* p) { /* ... reading *p somewhere in here ... */ }
void f() {
free(p);
use(p);
}
since *p
free doesn't dominate *p
.
So to get a slightly better TP rate, we use local dataflow to identify the functions that will always dereference one of their parameters (in this case, the first parameter of use
is one such function). And we then include use(p)
as a sink in the configuration. And since use(p)
is dominates by free(p)
this free/use pair will satisfy one of the criteria (in fact, both of them in this case) above.
An alternative idea would be to define a concept of interproceudral domination/post-domination. That would probably amount to the same amount of code as this find-always-dereferenced-arguments strategy, though. It would give us slightly better paths, though 🤔 (since, with the current approach, we show the path up to the call to the always-dereferencing-function, and not actually all the way to the dereference operation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, makes sense I think.
FlowState() { isFree(_, this, _) } | ||
|
||
string toString() { result = super.toString() } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a FlowState
to track the source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we need to implement the extra condition we place on the source/sink pair:
- The source dominates the sink, or
- The sink post-dominates the source
So in the isSink
definition in the FlowFromFree
module, we use the flow state to check the domination/post-domination relationship between the source and the sink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd have done that check in the query where we have access to both the source and sink without need of a FlowState
... but then this approach puts the code in the shared library and keeps the query code cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good thing about doing it inside the configuration is that we only generate the dataflow paths that actually satisfy the criteria. If we move the check to the query we'd generate all paths from a source to a sink, and only then filter out the ones that don't satisfy any of the criteria.
| | ||
isASink(sink, e) and | ||
isFree(source, state, dealloc) and | ||
e != state and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check e != state
? Perhaps an example would help me see this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a bit annoying. I was seeing some FPs where, for some reason, when there was a conversion on the free
argument, we'd wind up with FPs where we claimed that there was a double-free in a case such as:
void free(void*);
void f(char* p) {
free(p);
}
(I think, because dataflow goes (void*)p -> p -> (void*)p
, which is clearly wrong). I'd like to investigate that in a PR after this, though. For now, the check that the source expression isn't the same as the sink expression does the job, though.
I know that's not a super satisfying answer 😂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, it's a practical workaround that we hopefully won't need forever.
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
I should add, the results look great, especially all those SAMATE results on DCA (and the results in 'real' projects as well). |
Changes LGTM. |
Awesome. I've applied the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this PR now, subject to the docs review and new DCA run.
It will be great to have these two new / greatly improved queries!
FWIW, DCA still looks happy 🎉 |
I'll review this on behalf of Docs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andersfugmann 👋🏻 - this LGTM from an editorial point of view.
I made some suggestions and comments. Feel free to ignore those you don't agree with 😅
Please note that I wasn't able to get an HTML or a Markdown preview of the qhelp file so I wasn't able to comment on the rendering, or on the code example you use.
|
||
<overview> | ||
<p> | ||
Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore if deallocate
is a word that's commonly used for this. (Just to let you know that I didn't find the term on Merriam Webster or the Microsoft style guide).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up. It's a commonly used word in this context, so I'll keep the terminology as-is :)
<overview> | ||
<p> | ||
Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to | ||
corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be capitalized as we often refer to DoS?
corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing | |
corrupt the allocator's internal data structures, which can lead to Denial-of-Service attacks by crashing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. A quick search in our repo shows that we're now quite consistent with the capitalization: https://github.com/search?q=repo%3Agithub%2Fcodeql%20%22Denial-of-Service%22&type=code. I think "denial-of-service" wins by a small margin, so I'll leave it as-is for now.
--- | ||
category: newQuery | ||
--- | ||
* The query `cpp/use-after-free` has been modernized and assigned the precision "medium". The query finds cases of where a pointer is dereferenced after its memory has been deallocated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the new query be cpp/double-free
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
Hi @mchammer01, Thanks for the review! I've done most of the requested changes (on behalf of @andersfugmann). I'm not sure why the qhelp preview didn't show up, but I've manually verified that this looks okay locally, so I think we're okay with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at my comments @MathiasVP 😃
LGTM
This PR adds a simple query for detecting CWE 415 (double free) and updated existing use-after-free query to be based on IR use-use dataflow.
Based on my MRVA runs, I've tentatively put the queries at precision medium. Most FPs I've seen have been due to us not realizing that a path is infeasible because of a call to
exit(0)
(or related functions). Based on those findings, I expect that both queries can be put at precision high once #11356 has been merged.Todo:
Future ideas: