Compiler lex interpolations 2 #43132

petebacondarwin · Aug 12, 2021

~~Blocked behind #43129~~

This PR is a replacement for the reverted #42062

The significant commit that should fix the i18n message ID problem is 359e043. I have checked that the compliance test passes before this PR.

AndrewKushnir · Aug 18, 2021

Presubmit + Global Presubmit.

AndrewKushnir · Aug 18, 2021

@petebacondarwin FYI there are legit failures (related to i18n) in the presubmit. I will investigate further and try to come up with a repro within the next day or two. Will keep you updated.

petebacondarwin · Aug 18, 2021

😭 - I am sorry @AndrewKushnir - this is becoming an epic. I just realised that I don't think I fixed the i18n problem for attributes, only for text nodes. Let me add tests and fix that.

AndrewKushnir · Aug 19, 2021

Presubmit + Global Presubmit.

AndrewKushnir · Sep 7, 2021

New global presubmit.

AndrewKushnir · Sep 13, 2021

One more global presubmit (after g3 cleanup).

AndrewKushnir · Sep 16, 2021

Quick update: most recent TGP is "green" and extra checks in g3 confirmed that extracted message IDs are the same before/after the fix, so we should be ready for merge.

Adding "risk: medium" label to indicate that this PR may still have a potential for some breakages, so we should merge and sync the change individually (do not bundle it with other changes).

The lexer now splits interpolation tokens out from text tokens. Previously the contents of `<div>Hello, {{ name}}<div>` would be a single text token. Now it will be three tokens: ``` TEXT: "Hello, " INTERPOLATION: "{{", " name", "}}" TEXT: "" ``` - INTERPOLATION tokens have three parts, "start marker", "expression" and "end marker". - INTERPOLATION tokens are always preceded and followed by TEXT tokens, even if they represent an empty string. The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits.

The lexer now splits encoded entity tokens out from text and attribute value tokens. Previously encoded entities would be decoded and the decoded value would be included as part of the text token of the surrounding text. Now the entities have their own tokens. There are two scenarios: text and attribute values. Previously the contents of `<div>Hello & goodbye</div>` would be a single TEXT token. Now it will be three tokens: ``` TEXT: "Hello " ENCODED_ENTITY: "&", "&" TEXT: " goodbye" ``` Previously the attribute value in `<div title="Hello & goodbye">` would be a single text token. Now it will be three tokens: ``` ATTR_VALUE_TEXT: "Hello " ENCODED_ENTITY: "&", "&" ATTR_VALUE_TEXT: " goodbye" ``` - ENCODED_ENTITY tokens have two parts: "decoded" and "encoded". - ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent an empty string. The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits.

When it was tokenized, text content is split into parts that can include interpolations and encoded entities tokens. To make this information available to downstream processing, this commit adds these tokens to the `Text` AST nodes, with suitable processing.

The tests were checking that the source-span of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which are given by the `fullStart` rather than `start` location.

…sages Previously, the way templates were tokenized meant that we lost information about the location of interpolations if the template contained encoded HTML entities. This meant that the mapping back to the source interpolated strings could be offset incorrectly. Also, the source-span assigned to an i18n message did not include leading whitespace. This confused the output source-mappings so that the first text nodes of the message stopped at the first non-whitespace character. This commit makes use of the previous refactorings, where more fine grain information was provided in text tokens, to enable the parser to identify the location of the interpolations in the original source more accurately. Fixes angular#41034

…43132) The lexer now splits encoded entity tokens out from text and attribute value tokens. Previously encoded entities would be decoded and the decoded value would be included as part of the text token of the surrounding text. Now the entities have their own tokens. There are two scenarios: text and attribute values. Previously the contents of `<div>Hello & goodbye</div>` would be a single TEXT token. Now it will be three tokens: ``` TEXT: "Hello " ENCODED_ENTITY: "&", "&" TEXT: " goodbye" ``` Previously the attribute value in `<div title="Hello & goodbye">` would be a single text token. Now it will be three tokens: ``` ATTR_VALUE_TEXT: "Hello " ENCODED_ENTITY: "&", "&" ATTR_VALUE_TEXT: " goodbye" ``` - ENCODED_ENTITY tokens have two parts: "decoded" and "encoded". - ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent an empty string. The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close #43132

When it was tokenized, text content is split into parts that can include interpolations and encoded entities tokens. To make this information available to downstream processing, this commit adds these tokens to the `Text` AST nodes, with suitable processing. PR Close #43132

The tests were checking that the source-span of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which are given by the `fullStart` rather than `start` location. PR Close #43132

…sages (#43132) Previously, the way templates were tokenized meant that we lost information about the location of interpolations if the template contained encoded HTML entities. This meant that the mapping back to the source interpolated strings could be offset incorrectly. Also, the source-span assigned to an i18n message did not include leading whitespace. This confused the output source-mappings so that the first text nodes of the message stopped at the first non-whitespace character. This commit makes use of the previous refactorings, where more fine grain information was provided in text tokens, to enable the parser to identify the location of the interpolations in the original source more accurately. Fixes #41034 PR Close #43132

These token interfaces will make it easier to reason about tokens in the parser and in specs. Previously, it was never clear what items could appear in the `parts` array of a token given a particular `TokenType`. Now, each token interface declares a labelled tuple for the parts, which helps to document the token better. PR Close #43132

…butes (#43132) This tests a scenario that was failing in an internal project. PR Close #43132

…43132) The lexer now splits interpolation tokens out from text tokens. Previously the contents of `<div>Hello, {{ name}}<div>` would be a single text token. Now it will be three tokens: ``` TEXT: "Hello, " INTERPOLATION: "{{", " name", "}}" TEXT: "" ``` - INTERPOLATION tokens have three parts, "start marker", "expression" and "end marker". - INTERPOLATION tokens are always preceded and followed by TEXT tokens, even if they represent an empty string. The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close #43132

…43132) The lexer now splits encoded entity tokens out from text and attribute value tokens. Previously encoded entities would be decoded and the decoded value would be included as part of the text token of the surrounding text. Now the entities have their own tokens. There are two scenarios: text and attribute values. Previously the contents of `<div>Hello & goodbye</div>` would be a single TEXT token. Now it will be three tokens: ``` TEXT: "Hello " ENCODED_ENTITY: "&", "&" TEXT: " goodbye" ``` Previously the attribute value in `<div title="Hello & goodbye">` would be a single text token. Now it will be three tokens: ``` ATTR_VALUE_TEXT: "Hello " ENCODED_ENTITY: "&", "&" ATTR_VALUE_TEXT: " goodbye" ``` - ENCODED_ENTITY tokens have two parts: "decoded" and "encoded". - ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent an empty string. The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close #43132

When it was tokenized, text content is split into parts that can include interpolations and encoded entities tokens. To make this information available to downstream processing, this commit adds these tokens to the `Text` AST nodes, with suitable processing. PR Close #43132

The tests were checking that the source-span of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which are given by the `fullStart` rather than `start` location. PR Close #43132

…sages (#43132) Previously, the way templates were tokenized meant that we lost information about the location of interpolations if the template contained encoded HTML entities. This meant that the mapping back to the source interpolated strings could be offset incorrectly. Also, the source-span assigned to an i18n message did not include leading whitespace. This confused the output source-mappings so that the first text nodes of the message stopped at the first non-whitespace character. This commit makes use of the previous refactorings, where more fine grain information was provided in text tokens, to enable the parser to identify the location of the interpolations in the original source more accurately. Fixes #41034 PR Close #43132

These token interfaces will make it easier to reason about tokens in the parser and in specs. Previously, it was never clear what items could appear in the `parts` array of a token given a particular `TokenType`. Now, each token interface declares a labelled tuple for the parts, which helps to document the token better. PR Close #43132

…butes (#43132) This tests a scenario that was failing in an internal project. PR Close #43132

angular-automatic-lock-bot · Oct 21, 2021

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

_{This action has been performed automatically by a bot.}

petebacondarwin added state: blocked action: review The PR is still awaiting reviews from at least one requested reviewer target: patch This PR is targeted for the next patch release labels Aug 12, 2021

google-cla bot added the cla: yes label Aug 12, 2021

AndrewKushnir added the area: compiler Issues related to `ngc`, Angular's template compiler label Aug 12, 2021

ngbot bot added this to the Backlog milestone Aug 12, 2021

petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 81b5559 to 2594bfe Compare August 17, 2021 09:59

petebacondarwin removed the state: blocked label Aug 17, 2021

petebacondarwin marked this pull request as ready for review August 17, 2021 10:00

pullapprove bot requested a review from AndrewKushnir August 17, 2021 10:00

petebacondarwin added the action: presubmit The PR is in need of a google3 presubmit label Aug 17, 2021

petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 2594bfe to fe9d947 Compare August 18, 2021 20:37

petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from fe9d947 to 7b98356 Compare August 20, 2021 19:17

AndrewKushnir added risk: medium and removed action: presubmit The PR is in need of a google3 presubmit labels Sep 16, 2021

petebacondarwin added action: merge The PR is ready for merge by the caretaker and removed action: review The PR is still awaiting reviews from at least one requested reviewer labels Sep 16, 2021

petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 7b98356 to d09a05a Compare September 16, 2021 08:31

petebacondarwin added 5 commits September 16, 2021 17:02

test(compiler): check fullStart source-span

a15933a

The tests were checking that the source-span of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which are given by the `fullStart` rather than `start` location.

jessicajaniuk pushed a commit that referenced this pull request Sep 16, 2021

test(compiler): add a test for parsing multiline expressions in attri…

e55d052

…butes (#43132) This tests a scenario that was failing in an internal project. PR Close #43132

jessicajaniuk pushed a commit that referenced this pull request Sep 16, 2021

test(compiler): add a test for parsing multiline expressions in attri…

05e1adb

…butes (#43132) This tests a scenario that was failing in an internal project. PR Close #43132

petebacondarwin deleted the compiler-lex-interpolations-2 branch September 20, 2021 10:54

angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Compiler lex interpolations 2 #43132

Compiler lex interpolations 2 #43132

Uh oh!

petebacondarwin commented Aug 12, 2021 •

edited

Loading

Uh oh!

AndrewKushnir commented Aug 18, 2021

Uh oh!

AndrewKushnir commented Aug 18, 2021

Uh oh!

petebacondarwin commented Aug 18, 2021

Uh oh!

AndrewKushnir commented Aug 19, 2021 •

edited

Loading

Uh oh!

AndrewKushnir commented Sep 7, 2021

Uh oh!

AndrewKushnir commented Sep 13, 2021

Uh oh!

AndrewKushnir commented Sep 16, 2021 •

edited

Loading

Uh oh!

angular-automatic-lock-bot bot commented Oct 21, 2021

Uh oh!

Uh oh!

Search code, repositories, users, issues, pull requests...

Compiler lex interpolations 2 #43132

Compiler lex interpolations 2 #43132

Uh oh!

Conversation

petebacondarwin commented Aug 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AndrewKushnir commented Aug 18, 2021

Uh oh!

AndrewKushnir commented Aug 18, 2021

Uh oh!

petebacondarwin commented Aug 18, 2021

Uh oh!

AndrewKushnir commented Aug 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AndrewKushnir commented Sep 7, 2021

Uh oh!

AndrewKushnir commented Sep 13, 2021

Uh oh!

AndrewKushnir commented Sep 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

angular-automatic-lock-bot bot commented Oct 21, 2021

Uh oh!

Uh oh!

petebacondarwin commented Aug 12, 2021 •

edited

Loading

AndrewKushnir commented Aug 19, 2021 •

edited

Loading

AndrewKushnir commented Sep 16, 2021 •

edited

Loading