Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

C++: Implement use-after-free and double-free queries using the new IR use-use dataflow #12569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 17, 2023

Conversation

andersfugmann
Copy link
Contributor

@andersfugmann andersfugmann commented Mar 17, 2023

This PR adds a simple query for detecting CWE 415 (double free) and updated existing use-after-free query to be based on IR use-use dataflow.

Based on my MRVA runs, I've tentatively put the queries at precision medium. Most FPs I've seen have been due to us not realizing that a path is infeasible because of a call to exit(0) (or related functions). Based on those findings, I expect that both queries can be put at precision high once #11356 has been merged.


Todo:

  • Refactor to use a parameterized module to increase code sharing
  • dominator and postdominator should be rewritten to use ir based cfg to handle the ternary operator.
  • Performance investigation
  • DCA run
  • Verify / Set precision of the queries
  • Use IR based dominance and post dominance predicates

Future ideas:

  • Mark function where parameters are post dominated by a freeExpr as DeallocationFunctions. This could increase number of results found.
  • Mark functions where function exit is dominated by a call to a function that never returns, and break flow if any step calls a function that never returns, to reduce FP's.
  • Ignore use on function which never dereferences the argument. This is useful is e.g. the pointer address is printed after the pointer has been freed.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 17, 2023

QHelp previews:

cpp/ql/src/Critical/DoubleFree.qhelp

Potential double free

Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing the program, or security vulnerabilities, by allowing an attacker to overwrite arbitrary memory locations.

Recommendation

Ensure that all execution paths deallocate the allocated memory at most once. If possible, reassign the pointer to a null value after deallocating it. This will prevent double-free vulnerabilities since most deallocation functions will perform a null-pointer check before attempting to deallocate the memory.

Example

int* f() {
	int *buff = malloc(SIZE*sizeof(int));
	do_stuff(buff);
	free(buff);
	int *new_buffer = malloc(SIZE*sizeof(int));
	free(buff); // BAD: If new_buffer is assigned the same address as buff,
              // the memory allocator will free the new buffer memory region,
              // leading to use-after-free problems and memory corruption.
	return new_buffer;
}

References

@andersfugmann andersfugmann changed the title Andersfugmann/use after free Implement use-after-free and double-free queries using the new IR use-use dataflow Mar 17, 2023
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments. This looks good so far!

cpp/ql/src/Critical/DoubleFree.ql Outdated Show resolved Hide resolved
cpp/ql/src/Critical/DoubleFree.ql Outdated Show resolved Hide resolved
cpp/ql/src/Critical/DoubleFree.ql Outdated Show resolved Hide resolved
cpp/ql/src/Critical/UseAfterFree.ql Outdated Show resolved Hide resolved
cpp/ql/src/Critical/DoubleFree.qhelp Outdated Show resolved Hide resolved
@andersfugmann
Copy link
Contributor Author

There are false positives in the following cases which I think needs to be addressed:

char *a = (char*) malloc(10);
void *b = realloc(a, strlen(a) + 1); // Use after free of a on strlen(a)
char *a = (char*) malloc(10);
void *b = realloc(a, 11); 
if (!b) free(a); // a is untouched if b == NULL.  This case happens very often.
void f(void ** a) {
   free(*a);
   *a = malloc(10); // use after free, but 'a' is not freed.    

@andersfugmann andersfugmann force-pushed the andersfugmann/use_after_free branch from 8338a4d to 60ee783 Compare March 20, 2023 07:59
@MathiasVP MathiasVP changed the title Implement use-after-free and double-free queries using the new IR use-use dataflow C++: Implement use-after-free and double-free queries using the new IR use-use dataflow Mar 20, 2023
@andersfugmann
Copy link
Contributor Author

In general, I don't think we should track flow through realloc for this query, as it requires understanding of the input and return values. If we could model realloc as:

void* realloc(void* a, n {)
  void *ret = malloc(n); 
  if (ret) {
    free(a);
  }
  return ret;
}

maybe we could make it work, but we should also consider that free also special cases NULL:

void free(void* a) [
  if (a) free(a); // call to actual free
}

which causes FP's in the code:

void* test() {
   void* a = NULL;
   free(a); 
   use(a); // GOOD [FALSE POSITIVE]
   return a; // GOOD [FALSE POSITIVE]
} 

Would range analysis help in this case to understand that a is NULL?

@MathiasVP
Copy link
Contributor

MathiasVP commented Mar 20, 2023

In general, I don't think we should track flow through realloc for this query, as it requires understanding of the input and return values. If we could model realloc as:
...
Would range analysis help in this case to understand that a is NULL?

I'm sure we could do something like what you suggest in the query, but for the purposes of getting this PR merged as quickly as possible, I think we should just exclude realloc initially.

There are many possible ways of figuring out that a is NULL. Range analysis is one way (and it would probably work for this case), the Guards library is another way (which likely wouldn't work here since there are no guards, but just a initialization with null that is probably not very likely in real-world code). Some combination of GVN or dataflow is a third option.

In general, since the goal of the trick with dominators and post-dominators was to avoid having to invent complex path-sensitive analysis, I don't think it's worth putting too many clever into this case initially.

So: Let's exclude realloc for now.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're on the right track with this. We should definitely pull the common code out into a suitably named .qll, and take a look at performance on a DCA run. We'll want a docs review in good time as well.

cpp/ql/src/Critical/DoubleFree.qhelp Outdated Show resolved Hide resolved
@andersfugmann andersfugmann force-pushed the andersfugmann/use_after_free branch from 10c6b1b to ba80fc7 Compare March 23, 2023 08:45
cpp/ql/src/Critical/DoubleFree.ql Fixed Show resolved Hide resolved
cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed
cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed
cpp/ql/src/Critical/UseAfterFree.ql Outdated Show resolved Hide resolved
@andersfugmann andersfugmann force-pushed the andersfugmann/use_after_free branch from 8c5e55e to b94e989 Compare March 23, 2023 11:35
@github-actions github-actions bot added the Ruby label Mar 23, 2023
@andersfugmann andersfugmann force-pushed the andersfugmann/use_after_free branch from b94e989 to 17df634 Compare March 23, 2023 11:45
@github-actions github-actions bot removed the Ruby label Mar 23, 2023
@andersfugmann
Copy link
Contributor Author

andersfugmann commented Mar 23, 2023

I've refactored the code to so that the main logic is shared though a parameterized module, which makes the
queries much simpler, but I have made no attempts at analyzing performance of the queries.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments (for whoever ends up moving this PR across the finish line).

cpp/ql/src/Critical/UseAfterFree.ql Outdated Show resolved Hide resolved
e = dfe.asExpr() and
(
e = any(ReturnStmt rs).getExpr() or
e = any(FunctionCall fc).getAChild() or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we really want here?

  • Is it because we want to catch the following:

    void myFunc(void* p) { ... }
    free(p);
    myFunc(p);

    and we have to stop at the argument because we're doing local-only flow?

    If so, I think we should be a bit smarter here and identify "functions that always dereference a parameter". We can do that by checking if there's local flow from a parameter node to a use, and ensuring that the use always post-dominates the parameter (very much like what we're doing in the query already).

  • Or is it because we want to catch this (which I doubt we'd actually see in the wild, and I think this would even be included in the PointerDereferenceExpr case):

    free(myFunc);
    myFunc();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first case. The intention is flag passing a pointer pointing to de-allocated memory to a function, indeed

We could extend the analysis to be interprocedural, but I would argue that passing an invalid pointer to any function should be considered harmful, and I have not seen any FP's for that case.
The only point it would make sense is if the callee just ignores the argument.

It should be noted that we block flow though CallByReference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup after a sync talk. This would avoid an FP in

void *a = ...
free(a);
printf("%p", a);

cpp/ql/src/Critical/FlowAfterFree.qll Outdated Show resolved Hide resolved
cpp/ql/src/Critical/FlowAfterFree.qll Outdated Show resolved Hide resolved
cpp/ql/src/Critical/FlowAfterFree.qll Outdated Show resolved Hide resolved
cpp/ql/src/Critical/FlowAfterFree.qll Outdated Show resolved Hide resolved
Comment on lines 78 to 79
dominates(e1, e2) or
postDominates(e2, e1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we've discussed offline, this should be replaced with the IR versions. Sadly, the IR doesn't yet include delete so this may cause us to lose results 😭. If this is unacceptable (which it very well might be) it should at least be changed to use the basic block version of these predicates (since the control-flow node version of these won't scale to large databases).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already do use the IR flow so we are already missing these results, so switchig should not cause harm. It would supposedly also remove one FP in the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Good point. I can push a IR-ified version of this to the PR later today then 👍.

@@ -156,7 +156,7 @@ template<class T> T test()
T* x;
use(x); // GOOD
delete x;
use(x); // BAD
use(x); // BAD [NOT DETECTED]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes sense. delete x isn't in the IR, so there's no dataflow from that x to use(x).

@andersfugmann
Copy link
Contributor Author

andersfugmann commented Mar 30, 2023

I've created a PR against this branch to implement functionality to detect indirect free's. Its only a draft, and I assume it could be implemented in a much more efficient way that would also track the path better.

@MathiasVP MathiasVP force-pushed the andersfugmann/use_after_free branch from 40635ad to 2c51000 Compare April 5, 2023 10:59
cpp/ql/src/Critical/DoubleFree.ql Fixed Show fixed Hide fixed
cpp/ql/src/Critical/UseAfterFree.ql Fixed Show fixed Hide fixed
@MathiasVP MathiasVP force-pushed the andersfugmann/use_after_free branch from 209814f to b957565 Compare April 11, 2023 13:45
@MathiasVP
Copy link
Contributor

@geoffw0 DCA pointed me to a couple of bad join-orders that I've fixed in 49cceb2. I'm running another DCA job to confirm that I've fixed them all.

@MathiasVP
Copy link
Contributor

DCA is happy now 🎉

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few comments, mostly small details. We'll want a docs review ideally before this is merged.

cpp/ql/src/Critical/DoubleFree.qhelp Outdated Show resolved Hide resolved
cpp/ql/src/Critical/DoubleFree.ql Show resolved Hide resolved
call.getArgumentOperand(i) = n.asOperand() and
init.hasIndex(i) and
init.getEnclosingFunction() = call.getStaticCallTarget()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this case is for arguments that are always dereferenced by the function.

Why does DataFlow not find these results satisfactorily without a special case?

Copy link
Contributor

@MathiasVP MathiasVP Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal interprocedural will find this result, but because we're adding the extra condition that either:

  • The free dominates the use, or
  • The use post-dominates the free

And domination/post-domination is only defined intraprocedurally. So this query is only finding cases of use-after-free when the use happens in the same function as the free (because neither of the above conditions will hold when the source and sink are in different functions). So we won't catch stuff like:

void use(char* p) { /* ... reading *p somewhere in here ... */ }
void f() {
  free(p);
  use(p);
}

since *p free doesn't dominate *p.

So to get a slightly better TP rate, we use local dataflow to identify the functions that will always dereference one of their parameters (in this case, the first parameter of use is one such function). And we then include use(p) as a sink in the configuration. And since use(p) is dominates by free(p) this free/use pair will satisfy one of the criteria (in fact, both of them in this case) above.

An alternative idea would be to define a concept of interproceudral domination/post-domination. That would probably amount to the same amount of code as this find-always-dereferenced-arguments strategy, though. It would give us slightly better paths, though 🤔 (since, with the current approach, we show the path up to the call to the always-dereferencing-function, and not actually all the way to the dereference operation)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, makes sense I think.

FlowState() { isFree(_, this, _) }

string toString() { result = super.toString() }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a FlowState to track the source?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need to implement the extra condition we place on the source/sink pair:

  • The source dominates the sink, or
  • The sink post-dominates the source

So in the isSink definition in the FlowFromFree module, we use the flow state to check the domination/post-domination relationship between the source and the sink.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd have done that check in the query where we have access to both the source and sink without need of a FlowState ... but then this approach puts the code in the shared library and keeps the query code cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing about doing it inside the configuration is that we only generate the dataflow paths that actually satisfy the criteria. If we move the check to the query we'd generate all paths from a source to a sink, and only then filter out the ones that don't satisfy any of the criteria.

|
isASink(sink, e) and
isFree(source, state, dealloc) and
e != state and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check e != state? Perhaps an example would help me see this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a bit annoying. I was seeing some FPs where, for some reason, when there was a conversion on the free argument, we'd wind up with FPs where we claimed that there was a double-free in a case such as:

void free(void*);

void f(char* p) {
  free(p);
}

(I think, because dataflow goes (void*)p -> p -> (void*)p, which is clearly wrong). I'd like to investigate that in a PR after this, though. For now, the check that the source expression isn't the same as the sink expression does the job, though.

I know that's not a super satisfying answer 😂.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, it's a practical workaround that we hopefully won't need forever.

cpp/ql/test/query-tests/Critical/MemoryFreed/test_free.cpp Outdated Show resolved Hide resolved
cpp/ql/test/query-tests/Critical/MemoryFreed/test_free.cpp Outdated Show resolved Hide resolved
cpp/ql/test/query-tests/Critical/MemoryFreed/test_free.cpp Outdated Show resolved Hide resolved
cpp/ql/src/Critical/FlowAfterFree.qll Outdated Show resolved Hide resolved
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
@geoffw0
Copy link
Contributor

geoffw0 commented Apr 12, 2023

I should add, the results look great, especially all those SAMATE results on DCA (and the results in 'real' projects as well).

@MathiasVP
Copy link
Contributor

Thanks for all the comments, @geoffw0. I think I've addressed them all. One pair of commits (23a7cd9 and 40dde93) introduced a couple of code changes, so I'll rerun DCA.

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 13, 2023

Changes LGTM.

@MathiasVP MathiasVP added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Apr 13, 2023
@MathiasVP
Copy link
Contributor

Awesome. I've applied the ready-for-docs-review now then.

geoffw0
geoffw0 previously approved these changes Apr 13, 2023
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this PR now, subject to the docs review and new DCA run.

It will be great to have these two new / greatly improved queries!

@MathiasVP
Copy link
Contributor

FWIW, DCA still looks happy 🎉

@mchammer01 mchammer01 self-requested a review April 14, 2023 15:47
@mchammer01
Copy link
Contributor

I'll review this on behalf of Docs!

Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andersfugmann 👋🏻 - this LGTM from an editorial point of view.
I made some suggestions and comments. Feel free to ignore those you don't agree with 😅
Please note that I wasn't able to get an HTML or a Markdown preview of the qhelp file so I wasn't able to comment on the rendering, or on the code example you use.

cpp/ql/src/Critical/DoubleFree.ql Outdated Show resolved Hide resolved

<overview>
<p>
Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore if deallocate is a word that's commonly used for this. (Just to let you know that I didn't find the term on Merriam Webster or the Microsoft style guide).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up. It's a commonly used word in this context, so I'll keep the terminology as-is :)

cpp/ql/src/Critical/DoubleFree.qhelp Outdated Show resolved Hide resolved
<overview>
<p>
Deallocating memory more than once can lead to a double-free vulnerability. This can be exploited to
corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be capitalized as we often refer to DoS?

Suggested change
corrupt the allocator's internal data structures, which can lead to denial-of-service attacks by crashing
corrupt the allocator's internal data structures, which can lead to Denial-of-Service attacks by crashing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. A quick search in our repo shows that we're now quite consistent with the capitalization: https://github.com/search?q=repo%3Agithub%2Fcodeql%20%22Denial-of-Service%22&type=code. I think "denial-of-service" wins by a small margin, so I'll leave it as-is for now.

---
category: newQuery
---
* The query `cpp/use-after-free` has been modernized and assigned the precision "medium". The query finds cases of where a pointer is dereferenced after its memory has been deallocated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the new query be cpp/double-free?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR contains two new queries: A new query with the id cpp/double-free (whose change note is added here) and another query cpp/use-after-free (whose change note is added here). So I think this is correct.

Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
@MathiasVP
Copy link
Contributor

Hi @mchammer01,

Thanks for the review! I've done most of the requested changes (on behalf of @andersfugmann). I'm not sure why the qhelp preview didn't show up, but I've manually verified that this looks okay locally, so I think we're okay with this.

Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at my comments @MathiasVP 😃
LGTM :shipit:

@MathiasVP MathiasVP merged commit 7eee589 into github:main Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ documentation ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.