The Wayback Machine - https://web.archive.org/web/20230223074730/https://github.com/nodejs/node/pull/45736
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-ASCII character support #45736

Merged
merged 1 commit into from Feb 18, 2023
Merged

Conversation

mertcanaltin
Copy link
Contributor

@mertcanaltin mertcanaltin commented Dec 4, 2022

Fixes #45706
Fixes #46508

  • Is the character's ASCII code greater than 127? (In this case, the function returns true)
  • Is the character a number? (In this case, the function returns false)
  • Is the character a "#" symbol? (In this case, the function returns false)
  • If no conditions are met, the function returns true.

@nodejs-github-bot nodejs-github-bot added dont-land-on-v14.x PRs that should not land on the v14.x-staging branch and should not be released in v14.x. needs-ci PRs that need a full CI run. test_runner labels Dec 4, 2022
@mertcanaltin
Copy link
Contributor Author

mertcanaltin commented Dec 4, 2022

test

 const  isLiteralSymbol = (char) => {
    const code = char.charCodeAt(0);

    if (code > 127) {
      return true;
    }

    if (char >= '0' && char <= 9) {
      return false;
    }

    if (code === 35) {
      return false;
    }

    return true;
  }

output

isLiteralSymbol('أهلا')
true

lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
test/parallel/test-runner-tap-lexer.js Outdated Show resolved Hide resolved
@anonrig anonrig added the commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. label Dec 4, 2022
@mertcanaltin
Copy link
Contributor Author

@anonrig I applied the changes, thank you very much for the review 🚀

test/parallel/test-runner-tap-lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
@anonrig anonrig requested a review from MoLow December 4, 2022 23:03
@anonrig
Copy link
Member

anonrig commented Dec 4, 2022

@manekinekko Can you review this?

Copy link
Member

@MoLow MoLow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change LGTM, but this seems to break quite a few tests

lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
assert.strictEqual(isLiteralSymbol('#'), false);
assert.strictEqual(isLiteralSymbol('أ'), true);
assert.strictEqual(isLiteralSymbol('ت'), true);
assert.strictEqual(isLiteralSymbol('ث'), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you:

  1. group these into 2 groups: literals (true) and non-literals (false)?
  2. add more samples of non-Latin characters? you can cherry-pick from this list here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manekinekko thank you so much i will fly here 🚀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manekinekko would that be healthy

{
  const literals = ['A', 'a', '-', '+', 'أ', 'ت', 'ث', '讲', '演', '講'];
  const nonLiterals = ['0', '#', '\\', '+', '-'];
  
  literals.forEach((literal) => {
  assert.strictEqual(isLiteralSymbol(literal), true);
  });
  
  nonLiterals.forEach((nonLiteral) => {
  assert.strictEqual(isLiteralSymbol(nonLiteral), false);
  });
}

@mertcanaltin mertcanaltin requested review from manekinekko, MoLow and anonrig and removed request for manekinekko, MoLow and anonrig December 5, 2022 14:59
@mertcanaltin
Copy link
Contributor Author

I'm so sorry I triggered you all to review, sorry for the extra notifications :/

lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
test/parallel/test-runner-tap-lexer.js Outdated Show resolved Hide resolved
@mertcanaltin
Copy link
Contributor Author

@manekinekko I sent the edits thank you very much

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really accept zero width characters as acceptable input? I would skip all of them. We also already have a function to check for these:

const isZeroWidthCodePoint = (code) => {
return code <= 0x1F || // C0 control codes
(code >= 0x7F && code <= 0x9F) || // C1 control codes
(code >= 0x300 && code <= 0x36F) || // Combining Diacritical Marks
(code >= 0x200B && code <= 0x200F) || // Modifying Invisible Characters
// Combining Diacritical Marks for Symbols
(code >= 0x20D0 && code <= 0x20FF) ||
(code >= 0xFE00 && code <= 0xFE0F) || // Variation Selectors
(code >= 0xFE20 && code <= 0xFE2F) || // Combining Half Marks
(code >= 0xE0100 && code <= 0xE01EF); // Variation Selectors
};

@mertcanaltin
Copy link
Contributor Author

mertcanaltin commented Dec 7, 2022

I didn't accept zero-width characters @BridgeAR i will update the code like this

  if (typeof char !== 'string' || util.inspect.isZeroWidthCodePoint(char)) {
      return false;
    }

can I do it like this?

@mertcanaltin mertcanaltin reopened this Dec 7, 2022
@cjihrig
Copy link
Contributor

cjihrig commented Dec 12, 2022

@mertcanaltin I think that would be ok.

@mertcanaltin
Copy link
Contributor Author

hi i made some edits @anonrig , Thank you very much @MoLow

@MoLow
Copy link
Member

MoLow commented Feb 8, 2023

@mertcanaltin can you squash your branch removing the merge commit?
i.e

git fetch upstream main
git reset --soft upstream/main
git commit -m "<COMMIT MESSAGE>"
git push --force-with-lease

lib/internal/test_runner/tap_lexer.js Outdated Show resolved Hide resolved
test/parallel/test-runner-tap-lexer.js Outdated Show resolved Hide resolved
test/parallel/test-runner-tap-lexer.js Outdated Show resolved Hide resolved
@MoLow
Copy link
Member

MoLow commented Feb 8, 2023

@mertcanaltin the first commit message should start with test_runner: prefix and describe the change.
the first commit is the commit that will be merged into the main branch so its name must be meaningful

MoLow
MoLow approved these changes Feb 8, 2023
@MoLow
Copy link
Member

MoLow commented Feb 8, 2023

CC @nodejs/test_runner @manekinekko additional reviews will be appreciated

@MoLow
Copy link
Member

MoLow commented Feb 8, 2023

@anonrig can you please dismiss your review/approve?

@MoLow MoLow removed the commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. label Feb 8, 2023
anonrig
anonrig approved these changes Feb 8, 2023
@MoLow MoLow added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 9, 2023
@MoLow MoLow added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 18, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 18, 2023
@nodejs-github-bot
Copy link
Contributor

@MoLow MoLow added the commit-queue Add this label to land a pull request using GitHub Actions. label Feb 18, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Feb 18, 2023
@nodejs-github-bot nodejs-github-bot merged commit 3c6547f into nodejs:main Feb 18, 2023
54 checks passed
@nodejs-github-bot
Copy link
Contributor

Landed in 3c6547f

@mertcanaltin mertcanaltin changed the title [WIP] Non-ASCII character support Non-ASCII character support Feb 18, 2023
@mertcanaltin
Copy link
Contributor Author

🎉

MoLow pushed a commit to MoLow/node that referenced this pull request Feb 18, 2023
PR-URL: nodejs#45736
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
MylesBorins pushed a commit that referenced this pull request Feb 18, 2023
PR-URL: #45736
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
@MylesBorins MylesBorins mentioned this pull request Feb 19, 2023
MylesBorins pushed a commit that referenced this pull request Feb 20, 2023
PR-URL: #45736
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Moshe Atlow <moshe@atlow.co.il>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-land-on-v14.x PRs that should not land on the v14.x-staging branch and should not be released in v14.x. needs-ci PRs that need a full CI run. test_runner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test runner regression in 19.2.0 Tap parser fails when test name includes non ASCII characters
9 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.