C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569

andersfugmann · Mar 17, 2023

This PR adds a simple query for detecting CWE 415 (double free) and updated existing use-after-free query to be based on IR use-use dataflow.

Based on my MRVA runs, I've tentatively put the queries at precision medium. Most FPs I've seen have been due to us not realizing that a path is infeasible because of a call to exit(0) (or related functions). Based on those findings, I expect that both queries can be put at precision high once #11356 has been merged.

Todo:

Refactor to use a parameterized module to increase code sharing
dominator and postdominator should be rewritten to use ir based cfg to handle the ternary operator.
Performance investigation
DCA run
Verify / Set precision of the queries
Use IR based dominance and post dominance predicates

Future ideas:

Mark function where parameters are post dominated by a freeExpr as DeallocationFunctions. This could increase number of results found.
Mark functions where function exit is dominated by a call to a function that never returns, and break flow if any step calls a function that never returns, to reduce FP's.
Ignore use on function which never dereferences the argument. This is useful is e.g. the pointer address is printed after the pointer has been freed.

github-actions · Mar 17, 2023

QHelp previews:

cpp/ql/src/Critical/DoubleFree.qhelp

Potential double free

Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing the program, or security vulnerabilities, by allowing an attacker to overwrite arbitrary memory locations.

Recommendation

Ensure that all execution paths deallocate the allocated memory at most once. If possible, reassign the pointer to a null value after deallocating it. This will prevent double-free vulnerabilities since most deallocation functions will perform a null-pointer check before attempting to deallocate the memory.

Example

int* f() {
	int *buff = malloc(SIZE*sizeof(int));
	do_stuff(buff);
	free(buff);
	int *new_buffer = malloc(SIZE*sizeof(int));
	free(buff); // BAD: If new_buffer is assigned the same address as buff,
              // the memory allocator will free the new buffer memory region,
              // leading to use-after-free problems and memory corruption.
	return new_buffer;
}

References

OWASP: Doubly freeing memory.
Common Weakness Enumeration: CWE-415.

MathiasVP

Some initial comments. This looks good so far!

cpp/ql/src/Critical/DoubleFree.ql

cpp/ql/src/Critical/UseAfterFree.ql

cpp/ql/src/Critical/DoubleFree.qhelp

andersfugmann · Mar 17, 2023

There are false positives in the following cases which I think needs to be addressed:

char *a = (char*) malloc(10);
void *b = realloc(a, strlen(a) + 1); // Use after free of a on strlen(a)

char *a = (char*) malloc(10);
void *b = realloc(a, 11); 
if (!b) free(a); // a is untouched if b == NULL.  This case happens very often.

void f(void ** a) {
   free(*a);
   *a = malloc(10); // use after free, but 'a' is not freed.

andersfugmann · Mar 20, 2023

In general, I don't think we should track flow through realloc for this query, as it requires understanding of the input and return values. If we could model realloc as:

void* realloc(void* a, n {)
  void *ret = malloc(n); 
  if (ret) {
    free(a);
  }
  return ret;
}

maybe we could make it work, but we should also consider that free also special cases NULL:

void free(void* a) [
  if (a) free(a); // call to actual free
}

which causes FP's in the code:

void* test() {
   void* a = NULL;
   free(a); 
   use(a); // GOOD [FALSE POSITIVE]
   return a; // GOOD [FALSE POSITIVE]
}

Would range analysis help in this case to understand that a is NULL?

MathiasVP · Mar 20, 2023

In general, I don't think we should track flow through realloc for this query, as it requires understanding of the input and return values. If we could model realloc as:
...
Would range analysis help in this case to understand that a is NULL?

I'm sure we could do something like what you suggest in the query, but for the purposes of getting this PR merged as quickly as possible, I think we should just exclude realloc initially.

There are many possible ways of figuring out that a is NULL. Range analysis is one way (and it would probably work for this case), the Guards library is another way (which likely wouldn't work here since there are no guards, but just a initialization with null that is probably not very likely in real-world code). Some combination of GVN or dataflow is a third option.

In general, since the goal of the trick with dominators and post-dominators was to avoid having to invent complex path-sensitive analysis, I don't think it's worth putting too many clever into this case initially.

So: Let's exclude realloc for now.

geoffw0

Looks like we're on the right track with this. We should definitely pull the common code out into a suitably named .qll, and take a look at performance on a DCA run. We'll want a docs review in good time as well.

cpp/ql/src/Critical/DoubleFree.qhelp

cpp/ql/test/query-tests/Security/CWE/CWE-416/semmle/tests/UseAfterFree.expected

cpp/ql/src/Critical/DoubleFree.ql

cpp/ql/src/Critical/UseAfterFree.ql

andersfugmann · Mar 23, 2023

I've refactored the code to so that the main logic is shared though a parameterized module, which makes the
queries much simpler, but I have made no attempts at analyzing performance of the queries.

MathiasVP

Some more comments (for whoever ends up moving this PR across the finish line).

cpp/ql/src/Critical/UseAfterFree.ql

MathiasVP · Mar 23, 2023

cpp/ql/src/Critical/UseAfterFree.ql

+  e = dfe.asExpr() and
+  (
+    e = any(ReturnStmt rs).getExpr() or
+    e = any(FunctionCall fc).getAChild() or


What do we really want here?

Is it because we want to catch the following:

void myFunc(void* p) { ... } free(p); myFunc(p);

and we have to stop at the argument because we're doing local-only flow?

If so, I think we should be a bit smarter here and identify "functions that always dereference a parameter". We can do that by checking if there's local flow from a parameter node to a use, and ensuring that the use always post-dominates the parameter (very much like what we're doing in the query already).

Or is it because we want to catch this (which I doubt we'd actually see in the wild, and I think this would even be included in the PointerDereferenceExpr case):

free(myFunc); myFunc();

The first case. The intention is flag passing a pointer pointing to de-allocated memory to a function, indeed

We could extend the analysis to be interprocedural, but I would argue that passing an invalid pointer to any function should be considered harmful, and I have not seen any FP's for that case.
The only point it would make sense is if the callee just ignores the argument.

It should be noted that we block flow though CallByReference.

Followup after a sync talk. This would avoid an FP in

void *a = ... free(a); printf("%p", a);

cpp/ql/src/Critical/FlowAfterFree.qll

MathiasVP · Mar 23, 2023

cpp/ql/src/Critical/FlowAfterFree.qll

+      dominates(e1, e2) or
+      postDominates(e2, e1)


As we've discussed offline, this should be replaced with the IR versions. Sadly, the IR doesn't yet include delete so this may cause us to lose results 😭. If this is unacceptable (which it very well might be) it should at least be changed to use the basic block version of these predicates (since the control-flow node version of these won't scale to large databases).

We already do use the IR flow so we are already missing these results, so switchig should not cause harm. It would supposedly also remove one FP in the tests.

Oh, right. Good point. I can push a IR-ified version of this to the PR later today then 👍.

MathiasVP · Mar 23, 2023

cpp/ql/test/query-tests/Security/CWE/CWE-416/semmle/tests/test.cpp

@@ -156,7 +156,7 @@ template<class T> T test()
 	T* x;
 	use(x); // GOOD
 	delete x;
-	use(x); // BAD
+	use(x); // BAD [NOT DETECTED]


Yeah, this makes sense. delete x isn't in the IR, so there's no dataflow from that x to use(x).

andersfugmann · Mar 30, 2023

I've created a PR against this branch to implement functionality to detect indirect free's. Its only a draft, and I assume it could be implemented in a much more efficient way that would also track the path better.

cpp/ql/src/Critical/DoubleFree.ql

cpp/ql/src/Critical/UseAfterFree.ql

…ap with the new non-experimental one.

MathiasVP · Apr 12, 2023

@geoffw0 DCA pointed me to a couple of bad join-orders that I've fixed in 49cceb2. I'm running another DCA job to confirm that I've fixed them all.

MathiasVP · Apr 12, 2023

DCA is happy now 🎉

geoffw0

Quite a few comments, mostly small details. We'll want a docs review ideally before this is merged.

cpp/ql/src/Critical/DoubleFree.qhelp

cpp/ql/src/Critical/DoubleFree.ql

geoffw0 · Apr 12, 2023

cpp/ql/src/Critical/UseAfterFree.ql

+    call.getArgumentOperand(i) = n.asOperand() and
+    init.hasIndex(i) and
+    init.getEnclosingFunction() = call.getStaticCallTarget()
+  )


So this case is for arguments that are always dereferenced by the function.

Why does DataFlow not find these results satisfactorily without a special case?

Normal interprocedural will find this result, but because we're adding the extra condition that either:

The free dominates the use, or

The use post-dominates the free

And domination/post-domination is only defined intraprocedurally. So this query is only finding cases of use-after-free when the use happens in the same function as the free (because neither of the above conditions will hold when the source and sink are in different functions). So we won't catch stuff like:

void use(char* p) { /* ... reading *p somewhere in here ... */ } void f() { free(p); use(p); }

since *p free doesn't dominate *p.

So to get a slightly better TP rate, we use local dataflow to identify the functions that will always dereference one of their parameters (in this case, the first parameter of use is one such function). And we then include use(p) as a sink in the configuration. And since use(p) is dominates by free(p) this free/use pair will satisfy one of the criteria (in fact, both of them in this case) above.

An alternative idea would be to define a concept of interproceudral domination/post-domination. That would probably amount to the same amount of code as this find-always-dereferenced-arguments strategy, though. It would give us slightly better paths, though 🤔 (since, with the current approach, we show the path up to the call to the always-dereferencing-function, and not actually all the way to the dereference operation)

Thanks for the explanation, makes sense I think.

geoffw0 · Apr 12, 2023

cpp/ql/src/Critical/FlowAfterFree.qll

+      FlowState() { isFree(_, this, _) }
+
+      string toString() { result = super.toString() }
+    }


Why do we need a FlowState to track the source?

Because we need to implement the extra condition we place on the source/sink pair:

The source dominates the sink, or

The sink post-dominates the source

So in the isSink definition in the FlowFromFree module, we use the flow state to check the domination/post-domination relationship between the source and the sink.

I think I'd have done that check in the query where we have access to both the source and sink without need of a FlowState ... but then this approach puts the code in the shared library and keeps the query code cleaner.

The good thing about doing it inside the configuration is that we only generate the dataflow paths that actually satisfy the criteria. If we move the check to the query we'd generate all paths from a source to a sink, and only then filter out the ones that don't satisfy any of the criteria.

geoffw0 · Apr 12, 2023

cpp/ql/src/Critical/FlowAfterFree.qll

+      |
+        isASink(sink, e) and
+        isFree(source, state, dealloc) and
+        e != state and


Why do we check e != state? Perhaps an example would help me see this.

Yeah, that's a bit annoying. I was seeing some FPs where, for some reason, when there was a conversion on the free argument, we'd wind up with FPs where we claimed that there was a double-free in a case such as:

void free(void*); void f(char* p) { free(p); }

(I think, because dataflow goes (void*)p -> p -> (void*)p, which is clearly wrong). I'd like to investigate that in a PR after this, though. For now, the check that the source expression isn't the same as the sink expression does the job, though.

I know that's not a super satisfying answer 😂.

That's fine, it's a practical workaround that we hopefully won't need forever.

cpp/ql/test/query-tests/Critical/MemoryFreed/test_free.cpp

cpp/ql/src/Critical/FlowAfterFree.qll

Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>

geoffw0 · Apr 12, 2023

I should add, the results look great, especially all those SAMATE results on DCA (and the results in 'real' projects as well).

MathiasVP · Apr 13, 2023

Thanks for all the comments, @geoffw0. I think I've addressed them all. One pair of commits (23a7cd9 and 40dde93) introduced a couple of code changes, so I'll rerun DCA.

geoffw0 · Apr 13, 2023

Changes LGTM.

MathiasVP · Apr 13, 2023

Awesome. I've applied the ready-for-docs-review now then.

geoffw0

I'm happy with this PR now, subject to the docs review and new DCA run.

It will be great to have these two new / greatly improved queries!

MathiasVP · Apr 13, 2023

FWIW, DCA still looks happy 🎉

mchammer01 · Apr 14, 2023

I'll review this on behalf of Docs!

mchammer01

@andersfugmann 👋🏻 - this LGTM from an editorial point of view.
I made some suggestions and comments. Feel free to ignore those you don't agree with 😅
Please note that I wasn't able to get an HTML or a Markdown preview of the qhelp file so I wasn't able to comment on the rendering, or on the code example you use.

cpp/ql/src/Critical/DoubleFree.ql

mchammer01 · Apr 14, 2023

cpp/ql/src/Critical/DoubleFree.qhelp

+
+<overview>
+<p>
+Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to


Feel free to ignore if deallocate is a word that's commonly used for this. (Just to let you know that I didn't find the term on Merriam Webster or the Microsoft style guide).

Thanks for the heads up. It's a commonly used word in this context, so I'll keep the terminology as-is :)

cpp/ql/src/Critical/DoubleFree.qhelp

mchammer01 · Apr 14, 2023

cpp/ql/src/Critical/DoubleFree.qhelp

+<overview>
+<p>
+Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to
+corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing


Should this be capitalized as we often refer to DoS?

Suggested change

corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing

corrupt the allocator's internal data structures, which can lead to Denial-of-Service attacks by crashing

Good question. A quick search in our repo shows that we're now quite consistent with the capitalization: https://github.com/search?q=repo%3Agithub%2Fcodeql%20%22Denial-of-Service%22&type=code. I think "denial-of-service" wins by a small margin, so I'll leave it as-is for now.

mchammer01 · Apr 14, 2023

cpp/ql/src/change-notes/2023-04-11-use-after-free.md

+---
+category: newQuery
+---
+* The query `cpp/use-after-free` has been modernized and assigned the precision "medium". The query finds cases of where a pointer is dereferenced after its memory has been deallocated.


Should the new query be cpp/double-free?

This PR contains two new queries: A new query with the id cpp/double-free (whose change note is added here) and another query cpp/use-after-free (whose change note is added here). So I think this is correct.

Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>

MathiasVP · Apr 17, 2023

Hi @mchammer01,

Thanks for the review! I've done most of the requested changes (on behalf of @andersfugmann). I'm not sure why the qhelp preview didn't show up, but I've manually verified that this looks okay locally, so I think we're okay with this.

mchammer01

Thanks for looking at my comments @MathiasVP 😃
LGTM

github-actions bot added C++ documentation labels Mar 17, 2023

andersfugmann changed the title ~~Andersfugmann/use after free~~ Implement use-after-free and double-free queries using the new IR use-use dataflow Mar 17, 2023

MathiasVP reviewed Mar 17, 2023

View reviewed changes

andersfugmann force-pushed the andersfugmann/use_after_free branch from 8338a4d to 60ee783 Compare March 20, 2023 07:59

MathiasVP changed the title ~~Implement use-after-free and double-free queries using the new IR use-use dataflow~~ C++: Implement use-after-free and double-free queries using the new IR use-use dataflow Mar 20, 2023

geoffw0 reviewed Mar 20, 2023

View reviewed changes

cpp/ql/src/Critical/DoubleFree.qhelp Outdated Show resolved Hide resolved

cpp/ql/test/query-tests/Security/CWE/CWE-416/semmle/tests/UseAfterFree.expected Show resolved Hide resolved

andersfugmann force-pushed the andersfugmann/use_after_free branch from 10c6b1b to ba80fc7 Compare March 23, 2023 08:45

github-advanced-security bot found potential problems Mar 23, 2023

View reviewed changes

cpp/ql/src/Critical/DoubleFree.ql Fixed Show resolved Hide resolved

cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed

cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed

MathiasVP reviewed Mar 23, 2023

View reviewed changes

cpp/ql/src/Critical/UseAfterFree.ql Outdated Show resolved Hide resolved

andersfugmann force-pushed the andersfugmann/use_after_free branch from 8c5e55e to b94e989 Compare March 23, 2023 11:35

github-actions bot added the Ruby label Mar 23, 2023

andersfugmann force-pushed the andersfugmann/use_after_free branch from b94e989 to 17df634 Compare March 23, 2023 11:45

github-actions bot removed the Ruby label Mar 23, 2023

MathiasVP reviewed Mar 23, 2023

View reviewed changes

MathiasVP force-pushed the andersfugmann/use_after_free branch from 40635ad to 2c51000 Compare April 5, 2023 10:59

github-advanced-security bot found potential problems Apr 6, 2023

View reviewed changes

cpp/ql/src/Critical/DoubleFree.ql Fixed Show fixed Hide fixed

cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed

MathiasVP added 3 commits April 11, 2023 13:52

C++: Make basic block information available from dataflow nodes.

d65bb3b

C++: Add a flow-after-free library.

dfe00ff

C++: Add double-free query.

cc12e74

MathiasVP force-pushed the andersfugmann/use_after_free branch from 209814f to b957565 Compare April 11, 2023 13:45

MathiasVP added 3 commits April 11, 2023 15:21

C++: Add double-free query documentation.

fb2ec15

C++: Add double-free tests.

a8151b4

C++: Change the id of the experimental double-free query to not overl…

17fe5f2

…ap with the new non-experimental one.

C++: More QLDoc.

ab70f57

geoffw0 reviewed Apr 12, 2023

View reviewed changes

Update cpp/ql/src/Critical/FlowAfterFree.qll

ba4e3ae

Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>

MathiasVP added 9 commits April 13, 2023 10:10

C++: Fix qhelp for double-free.

e0aeea0

C++: Add QLDoc to 'isExcludeFreePair'.

d304022

C++: Ensure that the 'use-after-free' query is run on 'test_free.cpp'.

c76dbeb

C++: Fix test annotations.

416f8d5

C++: Fix missing result and accept test changes.

23a7cd9

C++: Fix FP and accept test changes.

40dde93

C++: Fix annotations.

31b71ea

C++: Fix annotations.

1ac5db3

C++: Fix annotations.

b2d4a82

MathiasVP added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Apr 13, 2023

geoffw0 previously approved these changes Apr 13, 2023

View reviewed changes

mchammer01 self-requested a review April 14, 2023 15:47

mchammer01 requested changes Apr 14, 2023

View reviewed changes

Update cpp/ql/src/Critical/DoubleFree.ql

dba46bd

Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>

MathiasVP dismissed geoffw0’s stale review via dba46bd April 17, 2023 06:38

Update cpp/ql/src/Critical/DoubleFree.qhelp

fa5ed04

Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>

mchammer01 approved these changes Apr 17, 2023

View reviewed changes

MathiasVP merged commit 7eee589 into github:main Apr 17, 2023

	corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing
	corrupt the allocator's internal data structures, which can lead to Denial-of-Service attacks by crashing

Search code, repositories, users, issues, pull requests...

C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569

C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569

Uh oh!

Conversation

andersfugmann commented Mar 17, 2023 • edited by MathiasVP Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Potential double free

Recommendation

Example

References

Uh oh!

MathiasVP left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andersfugmann commented Mar 17, 2023

Uh oh!

andersfugmann commented Mar 20, 2023

Uh oh!

MathiasVP commented Mar 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andersfugmann commented Mar 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MathiasVP left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andersfugmann commented Mar 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MathiasVP commented Apr 12, 2023

Uh oh!

MathiasVP commented Apr 12, 2023

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andersfugmann commented Mar 17, 2023 •

edited by MathiasVP

Loading

github-actions bot commented Mar 17, 2023 •

edited

Loading

MathiasVP commented Mar 20, 2023 •

edited

Loading

andersfugmann commented Mar 23, 2023 •

edited

Loading

andersfugmann commented Mar 30, 2023 •

edited

Loading

MathiasVP Apr 13, 2023 •

edited

Loading