[HtmlSanitizer] Add ability to sanitize a whole document #58524
base: 7.4
@tgalopin what happens to the doctype when sanitizing a document? Is it preserved?
// Sanitize the given string for a usage in a <body> tag
$sanitizer->sanitizeFor('body', $userInput);

// Sanitize the given string as a whole document (including <head> and <body>)
- // Sanitize the given string as a whole document (including <head> and <body>)
+ // Sanitize the given string as a whole document (including <html>, <head> and <body>)
public function __construct(public ?NodeInterface $node)
{
public function __construct(
    public array $contextsPath,
I suggest documenting the type with @param list<string> $contextsPath
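For illustration, the suggested annotation might look roughly like the sketch below. `ExampleCursor` and the `$node` default are hypothetical stand-ins, and the ordering described in the docblock (outermost context first) is an assumption; only `$contextsPath` and `list<string>` come from this thread.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the class under review; not the actual Symfony code.
final class ExampleCursor
{
    /**
     * @param list<string> $contextsPath Context names, assumed here to run from the outermost to the current one
     */
    public function __construct(
        public array $contextsPath,
        public ?object $node = null,
    ) {
    }
}

// Static analysers can now flag non-list or non-string values passed here.
$cursor = new ExampleCursor(['document', 'body']);
```

With `list<string>`, tools such as PHPStan or Psalm can check that callers pass a sequentially indexed array of strings, which a bare `array` type hint cannot express.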
For the moment, I didn't add support for the doctype, as I'm not sure whether it's needed. It should be quite easy to add it back from userland, and it's not technically an HTML node.
@tgalopin when sanitizing a whole document, the doctype is part of the document. If the input has one, it would be great if the output could preserve it as well.
I investigated a bit and the HTML5 parser actually doesn't parse the doctype. Instead, it always outputs a static doctype as a prefix when saving the document: https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Serializer/OutputRules.php#L192 I think this approach makes sense, as we are by nature always building valid HTML5 documents using the sanitizer. I updated the PR to always add this prefix when sanitizing a document. WDYT?
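A minimal sketch of the static-prefix idea, for readers following along. `withStaticDoctype()` and `HTML5_DOCTYPE` are hypothetical names used only for illustration; this is not the code from the PR or from html5-php.

```php
<?php

declare(strict_types=1);

// Assumed constant: the standard HTML5 doctype string.
const HTML5_DOCTYPE = '<!DOCTYPE html>';

/**
 * Hypothetical helper: prefixes already-sanitized document markup with a
 * static doctype, regardless of which doctype (if any) the input carried.
 */
function withStaticDoctype(string $sanitizedHtml): string
{
    return HTML5_DOCTYPE."\n".ltrim($sanitizedHtml);
}

$output = withStaticDoctype('<html><head></head><body><p>Hello</p></body></html>');
```

The trade-off is that a legacy doctype in the input is not preserved; the output always declares HTML5.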
This PR adds the ability to sanitize a whole document.
To achieve this while keeping the expected behavior (removing head elements from body tags and vice versa), it introduces a context path in the cursor, so that node sanitizers know which sanitization context the DOM visitor is currently running in.
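The context-path idea can be sketched as follows. Everything here is hypothetical (`SketchCursor`, `isElementAllowed()`, the context names and the allow-lists); it only illustrates how a path of contexts could let a node sanitizer decide what to keep, not how the PR implements it.

```php
<?php

declare(strict_types=1);

// Hypothetical cursor: tracks the chain of sanitization contexts being visited.
final class SketchCursor
{
    /** @param list<string> $contextsPath */
    public function __construct(public array $contextsPath = [])
    {
    }

    // The innermost context is the last entry of the path.
    public function currentContext(): ?string
    {
        $count = count($this->contextsPath);

        return $count > 0 ? $this->contextsPath[$count - 1] : null;
    }
}

// Hypothetical node-sanitizer check; the allow-lists are made up for illustration.
function isElementAllowed(SketchCursor $cursor, string $tagName): bool
{
    $allowed = [
        'head' => ['title', 'meta', 'link'],
        'body' => ['p', 'a', 'div', 'span'],
    ];
    $context = $cursor->currentContext();

    return null !== $context && in_array($tagName, $allowed[$context] ?? [], true);
}

// Simulate a visitor walking a whole document: enter <head>, leave, enter <body>.
$cursor = new SketchCursor(['document']);

$cursor->contextsPath[] = 'head';
$titleInHead = isElementAllowed($cursor, 'title');
$pInHead = isElementAllowed($cursor, 'p');
array_pop($cursor->contextsPath);

$cursor->contextsPath[] = 'body';
$pInBody = isElementAllowed($cursor, 'p');
$titleInBody = isElementAllowed($cursor, 'title');
array_pop($cursor->contextsPath);
```

Keeping a path rather than a single context value means the visitor can restore the previous context when it leaves an element, simply by popping the last entry.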