[HtmlSanitizer] Add ability to sanitize a whole document #58524

Open
wants to merge 1 commit into
base: 7.4
from

Conversation

tgalopin
Contributor

| Q             | A           |
| ------------- | ----------- |
| Branch?       | 7.2         |
| Bug fix?      | yes         |
| New feature?  | yes         |
| Deprecations? | no          |
| Issues        | Fix #58426  |
| License       | MIT         |

This PR adds the ability to sanitize a whole document:

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig)->allowSafeElements();

$html = '<html><head><title>Example</title></head><body><p>Example</p></body></html>';

echo (new HtmlSanitizer($config))->sanitizeFor('document', $html);
// Display <html><head><title>Example</title></head><body><p>Example</p></body></html>

To achieve this while keeping the expected behavior (removing head elements from the body and vice versa), it adds a contexts path to the cursor, so that node sanitizers know which sanitization context the DOM visitor is currently running in.
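
As a rough illustration of the contexts-path idea, here is a minimal sketch in plain PHP. It is not the PR's code: `SketchCursor`, `ALLOWED_BY_CONTEXT`, `keptTags()` and the abbreviated element lists are all hypothetical names and data chosen for the example. It only shows how a stack of contexts carried by a cursor lets a visitor drop elements that do not belong to the context it is currently in.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the sanitizer's cursor; not the PR's actual class.
final class SketchCursor
{
    /** @param list<string> $contextsPath */
    public function __construct(public array $contextsPath = ['document'])
    {
    }

    // The innermost context is the last entry of the path.
    public function currentContext(): string
    {
        $path = $this->contextsPath;
        $last = end($path);

        return false === $last ? 'document' : $last;
    }
}

// Assumed, heavily abbreviated mapping of contexts to the elements kept in them.
const ALLOWED_BY_CONTEXT = [
    'document' => ['html', 'head', 'body'],
    'head' => ['title', 'meta', 'link'],
    'body' => ['p', 'div', 'span'],
];

/**
 * Returns the tag names that survive, in document order.
 *
 * @param array{tag: string, children?: list<array>} $node
 *
 * @return list<string>
 */
function keptTags(array $node, SketchCursor $cursor): array
{
    $tag = $node['tag'];
    $allowed = ALLOWED_BY_CONTEXT[$cursor->currentContext()] ?? [];
    if (!in_array($tag, $allowed, true)) {
        return []; // Drop the node and its whole subtree.
    }

    // <head> and <body> open a new context for their children.
    $opensContext = in_array($tag, ['head', 'body'], true);
    if ($opensContext) {
        $cursor->contextsPath[] = $tag;
    }

    $kept = [$tag];
    foreach ($node['children'] ?? [] as $child) {
        $kept = array_merge($kept, keptTags($child, $cursor));
    }

    // Leave the context again once the children have been visited.
    if ($opensContext) {
        array_pop($cursor->contextsPath);
    }

    return $kept;
}
```

With this sketch, a `p` placed inside `head` and a `title` placed inside `body` are both dropped, while the same elements in their proper parent are kept. The real sanitizer works on DOM nodes and its own configuration rather than on arrays and a constant map.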

@stof
Member

stof commented Oct 10, 2024

@tgalopin what happens to the doctype when sanitizing a document? Is it preserved?

// Sanitize the given string for a usage in a <body> tag
$sanitizer->sanitizeFor('body', $userInput);

// Sanitize the given string as a whole document (including <head> and <body>)

Suggested change
- // Sanitize the given string as a whole document (including <head> and <body>)
+ // Sanitize the given string as a whole document (including <html>, <head> and <body>)

- public function __construct(public ?NodeInterface $node)
- {
+ public function __construct(
+     public array $contextsPath,

I suggest documenting the type with @param list<string> $contextsPath

@tgalopin
Contributor Author

For the moment, I didn't add support for the doctype; I'm not sure whether it's needed. It should be quite easy to add it back from userland, and it's not technically an HTML node.

@stof
Member

stof commented Oct 10, 2024

@tgalopin when sanitizing a whole document, it is part of the document. If the input has it, it would be great if the output could preserve it as well.

@tgalopin tgalopin force-pushed the feat/html-sanitizer-document-context branch from 151329f to 04b6207 on October 12, 2024 08:03
@tgalopin
Contributor Author

I investigated a bit and the HTML5 parser actually doesn't parse the doctype. Instead, it always outputs a static doctype as a prefix when saving the document: https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Serializer/OutputRules.php#L192

I think this approach makes sense, as we are by nature always building valid HTML5 documents using the sanitizer. I updated the PR to always add this prefix when sanitizing a document. WDYT?
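
To make the proposal concrete, here is a minimal sketch of the "always prefix a static doctype" behavior. The function name `finalizeOutput()`, its signature, and the newline after the doctype are assumptions for illustration; the PR's actual implementation may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: prefixes the static HTML5 doctype only when a
// whole document was sanitized; other contexts are returned untouched.
function finalizeOutput(string $context, string $sanitizedHtml): string
{
    if ('document' !== $context) {
        return $sanitizedHtml;
    }

    // Assumption: the doctype is followed by a newline.
    return "<!DOCTYPE html>\n".$sanitizedHtml;
}
```

The output then always starts with the HTML5 doctype for the document context, regardless of which doctype (if any) the input carried.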

@fabpot fabpot modified the milestones: 7.2, 7.3 Nov 20, 2024
@fabpot fabpot modified the milestones: 7.3, 7.4 May 26, 2025
Development

Successfully merging this pull request may close these issues.

symfony/html-sanitizer stripping <head> element despite being a safe element by default
4 participants