Compiler lex interpolations 2 #43132


Conversation

@petebacondarwin (Contributor) commented Aug 12, 2021

Blocked behind #43129

This PR is a replacement for the reverted #42062

The significant commit that should fix the i18n message ID problem is 359e043. I have checked that the compliance test passes before this PR.

@petebacondarwin added the `state: blocked`, `action: review` (the PR is still awaiting reviews from at least one requested reviewer), and `target: patch` (targeted for the next patch release) labels Aug 12, 2021
@google-cla bot added the `cla: yes` label Aug 12, 2021
@AndrewKushnir added the `area: compiler` (issues related to `ngc`, Angular's template compiler) label Aug 12, 2021
@ngbot bot added this to the Backlog milestone Aug 12, 2021
@petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 81b5559 to 2594bfe on August 17, 2021 09:59
@petebacondarwin marked this pull request as ready for review on August 17, 2021 10:00
@pullapprove bot requested a review from AndrewKushnir on August 17, 2021 10:00
@petebacondarwin added the `action: presubmit` (the PR is in need of a google3 presubmit) label Aug 17, 2021
@AndrewKushnir (Contributor)

Presubmit + Global Presubmit.

@AndrewKushnir (Contributor)

@petebacondarwin FYI there are legit failures (related to i18n) in the presubmit. I will investigate further and try to come up with a repro within the next day or two. Will keep you updated.

@petebacondarwin (Contributor, Author)

😭 - I am sorry @AndrewKushnir - this is becoming an epic. I just realised that I don't think I fixed the i18n problem for attributes, only for text nodes. Let me add tests and fix that.

@petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 2594bfe to fe9d947 on August 18, 2021 20:37
@AndrewKushnir (Contributor) commented Aug 19, 2021

Presubmit + Global Presubmit.

@petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from fe9d947 to 7b98356 on August 20, 2021 19:17
@AndrewKushnir (Contributor)

New global presubmit.

@AndrewKushnir (Contributor)

One more global presubmit (after g3 cleanup).

@AndrewKushnir (Contributor) commented Sep 16, 2021

Quick update: most recent TGP is "green" and extra checks in g3 confirmed that extracted message IDs are the same before/after the fix, so we should be ready for merge.

Adding the "risk: medium" label to indicate that this PR may still cause some breakages, so we should merge and sync the change individually (not bundled with other changes).

@AndrewKushnir added the `risk: medium` label and removed the `action: presubmit` (the PR is in need of a google3 presubmit) label Sep 16, 2021
@petebacondarwin added the `action: merge` (the PR is ready for merge by the caretaker) label and removed the `action: review` (the PR is still awaiting reviews from at least one requested reviewer) label Sep 16, 2021
@petebacondarwin force-pushed the compiler-lex-interpolations-2 branch from 7b98356 to d09a05a on September 16, 2021 08:31
The lexer now splits interpolation tokens out from text tokens.

Previously the contents of `<div>Hello, {{ name}}</div>` would be a single
TEXT token. Now it will be three tokens:

```
TEXT: "Hello, "
INTERPOLATION: "{{", " name", "}}"
TEXT: ""
```

- INTERPOLATION tokens have three parts: "start marker", "expression",
  and "end marker".
- INTERPOLATION tokens are always preceded and followed by TEXT tokens,
  even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.
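As an illustration of the shape of this token stream, here is a minimal toy splitter (a hypothetical sketch, not the actual Angular lexer; `tokenizeText` and the `Token` type are invented for this example):

```typescript
// Toy sketch of splitting text content into TEXT and INTERPOLATION tokens.
// Not the real Angular lexer: a simplified illustration of the token shapes
// described in the commit message above.
type Token =
  | {type: 'TEXT'; parts: [string]}
  | {type: 'INTERPOLATION'; parts: [string, string, string]};

function tokenizeText(text: string): Token[] {
  const tokens: Token[] = [];
  const re = /\{\{(.*?)\}\}/g;
  let last = 0;
  for (let m = re.exec(text); m !== null; m = re.exec(text)) {
    // Emit the preceding TEXT token, even if it is empty.
    tokens.push({type: 'TEXT', parts: [text.slice(last, m.index)]});
    // INTERPOLATION parts: start marker, expression, end marker.
    tokens.push({type: 'INTERPOLATION', parts: ['{{', m[1], '}}']});
    last = m.index + m[0].length;
  }
  // Emit the trailing TEXT token, even if it is empty.
  tokens.push({type: 'TEXT', parts: [text.slice(last)]});
  return tokens;
}
```

For `'Hello, {{ name}}'` this yields the three tokens shown above, including the trailing empty TEXT token.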
The lexer now splits encoded entity tokens out from text and attribute value tokens.

Previously encoded entities would be decoded and the decoded value would be
included as part of the text token of the surrounding text. Now the entities
have their own tokens. There are two scenarios: text and attribute values.

Previously the contents of `<div>Hello &amp; goodbye</div>` would be a single
TEXT token. Now it will be three tokens:

```
TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
TEXT: " goodbye"
```

Previously the attribute value in `<div title="Hello &amp; goodbye">` would be
a single text token. Now it will be three tokens:

```
ATTR_VALUE_TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
ATTR_VALUE_TEXT: " goodbye"
```

- ENCODED_ENTITY tokens have two parts: "decoded" and "encoded".
- ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens
  or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent
  an empty string.

The HTML parser has been modified to recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.
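The entity splitting can be sketched the same way (again a hypothetical toy, not the real lexer; it only knows a few named entities, and the `tokenizeEntities` name is invented):

```typescript
// Toy sketch of splitting ENCODED_ENTITY tokens out of text or attribute
// values. Not the real Angular lexer; only a handful of named entities
// are supported here for illustration.
type TextKind = 'TEXT' | 'ATTR_VALUE_TEXT';
type EntityToken =
  | {type: TextKind; parts: [string]}
  | {type: 'ENCODED_ENTITY'; parts: [decoded: string, encoded: string]};

const NAMED_ENTITIES: Record<string, string> = {amp: '&', lt: '<', gt: '>'};

function tokenizeEntities(text: string, kind: TextKind): EntityToken[] {
  const tokens: EntityToken[] = [];
  const re = /&([a-z]+);/g;
  let last = 0;
  for (let m = re.exec(text); m !== null; m = re.exec(text)) {
    const decoded = NAMED_ENTITIES[m[1]];
    if (decoded === undefined) continue; // unknown entity: leave it in the text
    // Surrounding text token kind depends on the context (TEXT vs ATTR_VALUE_TEXT).
    tokens.push({type: kind, parts: [text.slice(last, m.index)]});
    // ENCODED_ENTITY parts: decoded value first, then the encoded source.
    tokens.push({type: 'ENCODED_ENTITY', parts: [decoded, m[0]]});
    last = m.index + m[0].length;
  }
  tokens.push({type: kind, parts: [text.slice(last)]});
  return tokens;
}
```

Calling it with `'Hello &amp; goodbye'` and either kind reproduces the two three-token streams shown above.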
When text content is tokenized, it is split into parts that can include
interpolation and encoded entity tokens.

To make this information available to downstream processing, this commit
adds these tokens to the `Text` AST nodes, with suitable processing.
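A minimal sketch of what such a token-carrying `Text` node could look like (the class shape and the `interpolationExpressions` helper are hypothetical, not the actual Angular AST definitions):

```typescript
// Hypothetical sketch of a `Text` AST node that carries the lexer tokens it
// was built from, so downstream phases (e.g. i18n extraction) can locate
// interpolations without re-parsing the text value.
interface TextToken {
  type: 'TEXT' | 'INTERPOLATION' | 'ENCODED_ENTITY';
  parts: string[];
}

class Text {
  constructor(readonly value: string, readonly tokens: TextToken[]) {}

  // Recover the interpolation expressions directly from the stored tokens.
  interpolationExpressions(): string[] {
    return this.tokens
      .filter(t => t.type === 'INTERPOLATION')
      .map(t => t.parts[1]); // parts: [startMarker, expression, endMarker]
  }
}
```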
The tests were checking that the source-spans of parsed HTML nodes were
accurate, but they were not checking the span when it includes the
"leading trivia", which is given by the `fullStart` rather than the
`start` location.
…sages

Previously, the way templates were tokenized meant that we lost information
about the location of interpolations if the template contained encoded HTML
entities. This meant that the mapping back to the source interpolated strings
could be offset incorrectly.

Also, the source-span assigned to an i18n message did not include leading
whitespace. This confused the output source-mappings so that the first text
nodes of the message stopped at the first non-whitespace character.

This commit makes use of the previous refactorings, where finer-grained
information was provided in text tokens, to enable the parser to identify
the location of the interpolations in the original source more accurately.

Fixes angular#41034
jessicajaniuk pushed a commit that referenced this pull request Sep 16, 2021
These token interfaces will make it easier to reason about tokens in the
parser and in specs.

Previously, it was never clear what items could appear in the `parts`
array of a token given a particular `TokenType`. Now, each token interface
declares a labelled tuple for the parts, which helps to document the token
better.

PR Close #43132
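The labelled-tuple idea described above can be sketched in TypeScript (the interface and field names here are illustrative, not the exact compiler definitions):

```typescript
// Hypothetical sketch of per-token-type interfaces whose `parts` arrays are
// labelled tuples, so it is clear what each element means for a given
// TokenType. Names are illustrative, not the actual Angular compiler code.
interface InterpolationToken {
  type: 'INTERPOLATION';
  parts: [startMarker: string, expression: string, endMarker: string];
}

interface EncodedEntityToken {
  type: 'ENCODED_ENTITY';
  parts: [decoded: string, encoded: string];
}

// With labelled tuples, consumers can document which element they read.
function expressionOf(token: InterpolationToken): string {
  return token.parts[1]; // the `expression` element
}
```

The labels are erased at runtime; they only improve editor hints and documentation, which is exactly the stated goal of the commit.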
jessicajaniuk pushed a commit that referenced this pull request Sep 16, 2021
…butes (#43132)

This tests a scenario that was failing in an internal project.

PR Close #43132
@petebacondarwin deleted the compiler-lex-interpolations-2 branch September 20, 2021 10:54
@angular-automatic-lock-bot

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 21, 2021