JS: Detect more expensive regexps in js/polynomial-redos #4579

Merged
merged 5 commits into github:main on Nov 5, 2020

Conversation

erik-krogh
Contributor

@erik-krogh commented Oct 29, 2020

Recognizes the sink in CVE-2018-20801.

An evaluation shows good performance, but the two new results are both FPs.
However, they are only FPs because the input cannot contain newlines, so I'm not sure how we should handle that.
(We already had such FPs, see lines 7, 20, 25, and 30 of the existing test case; the only difference is that we now have more.)
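
For context, a minimal sketch of the difference between a genuinely quadratic pattern and the newline-dependent FPs described above; the patterns are hypothetical and not taken from CVE-2018-20801:

// Genuinely quadratic: '$' fails whenever the input does not end in
// whitespace, so the engine retries '\s+' from every whitespace position.
/\s+$/.test(" ".repeat(50000) + "x");   // O(n^2)

// The newly reported shape: '$' can only fail after '.*' if a newline
// follows, because '.' does not match '\n'. If the input cannot contain
// newlines, '$' succeeds immediately and matching stays linear.
/.*$/.test("a".repeat(50000) + "\nb");  // O(n^2), but only because of the newline
/.*$/.test("a".repeat(50000));          // linear: no newline, so no backtracking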

github-actions bot added the JS label on Oct 29, 2020
@erik-krogh marked this pull request as ready for review on October 30, 2020 10:44
@erik-krogh requested a review from a team as a code owner on October 30, 2020 10:44
@asgerf
Contributor

asgerf commented Nov 2, 2020

However, they are only FPs because the input cannot contain newlines, so I'm not sure how we should handle that.

Yeah that's a bit tricky, but I think it might be worth special-casing the handling of .*$ against URL sources.

The way I see it, if $ is preceded by a universal regexp term (a term that matches anything), then the $ is guaranteed to match the empty string, and the not matchesEpsilon(succ) condition should morally have failed but actually passed, leading to the FP. We could generalize the predicate to look for universal regexps, but that still wouldn't handle .* because it's not truly universal, so we need a way to defer the check once we know the source.

We can convert the succ variable in the PolynomialBackTrackingTerm charpred to a field, which is easy because it's a top-level exists in the charpred (pasted below for reference):

class PolynomialBackTrackingTerm extends InfiniteRepetitionQuantifier {
  string reason;
  PolynomialBackTrackingTerm() {
    // the regexp may fail to match ...
    exists(RegExpTerm succ | this.getSuccessor+() = succ | not matchesEpsilon(succ)) and
    ... // remaining conjuncts of the charpred omitted
  }
}

If I understand correctly, we use succ as sort of a witness that the regexp may yet fail to match after matching this. In our example $ is the witness.

If the taint source is found to be a RequestInputAccess with getKind() = "url", we then require that the witness is not a $ immediately preceded by .*. (Or, as there can be multiple witnesses in general, at least one witness must not be such a $).
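
For concreteness, the shape being special-cased could look something like the following hypothetical Express handler (illustrative only; req.path stands in for a RequestInputAccess whose getKind() is "url", and the pattern is contrived). A pattern whose witness can genuinely fail, for example a literal character after the repetition, would still be reported.

import express from "express";

const app = express();

app.get("*", (req, res) => {
  // The witness '$' is immediately preceded by '.*'. With a newline in the
  // input this would be quadratic, but req.path can never contain a raw
  // newline, so '$' always succeeds right after '.*' and the replace stays
  // linear: exactly the FP case discussed above.
  const withoutTail = req.path.replace(/\/.*$/, "");
  res.send(withoutTail);
});

app.listen(3000);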

@esbena
Contributor

esbena commented Nov 3, 2020

If I understand correctly, we use succ as sort of a witness that the regexp may yet fail to match after matching this

Correct. The regexp engine will then try all of the N^2 sequences that lead to that unsatisfiable succ before finally giving up.
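
One way to see those N^2 attempts concretely is to time a classic quadratic pattern such as /\s+$/ on doubling input sizes (a rough sketch; absolute timings depend on the engine):

for (const n of [10000, 20000, 40000]) {
  const input = " ".repeat(n) + "x"; // the trailing "x" makes '$' unsatisfiable
  const start = Date.now();
  /\s+$/.test(input);
  // Each doubling of n roughly quadruples the time, the signature of O(n^2).
  console.log(`n=${n}: ${Date.now() - start} ms`);
}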

@erik-krogh
Contributor Author

Yeah that's a bit tricky, but I think it might be worth special-casing the handling of .*$ against URL sources.

I'll try to do that.

If I understand correctly, we use succ as sort of a witness that the regexp may yet fail to match after matching this

Correct. The regexp engine will then try all of the N^2 sequences that lead to that unsatisfiable succ before finally giving up.

I've found the Regex Debugger on regex101.com quite helpful in understanding regexp performance.
(Link to regex with quadratic performance: https://regex101.com/r/JHUvrh/1/debugger)

@erik-krogh
Contributor Author

Yeah that's a bit tricky, but I think it might be worth special-casing the handling of .*$ against URL sources.

I've done it now, but I'm not proud of the implementation.

I did the pruning of results in the where clause, because the only other way I see is to use flow labels, and I don't want to introduce flow labels to this query just to support that case.

Contributor

@asgerf left a comment


Thanks. Putting the check in the where clause is exactly what I had in mind.

Just a minor typo; otherwise this looks good to merge.

Commit: …linearBackTracking.qll

Co-authored-by: Asger F <asgerf@github.com>
@codeql-ci merged commit c85f817 into github:main on Nov 5, 2020