Content sniffing implementation details #5

@Rob--W

Last month I spent two weeks implementing content sniffing that was behaviorally identical to Firefox's implementation. Unfortunately, I lost the laptop before I pushed the changes, so I will document what's necessary in case anyone (maybe me?) is interested in implementing a content sniffer.

The full implementation (code and comments) consisted of about 3-5k lines of JS code (unit tests were written but are not included in this count).

The implementation details are as follows (this is a brain dump from my recollection):

Other notes relevant for the implementation:

  • Content sniffing relies on up to 512 bytes of data, but the media sniffer may try to use more if available.
  • At least for text and HTML, Firefox will only display the response after 512 bytes of data have been written (or 1024, I don't remember).
  • For images and media, Firefox will switch to a special image/media document upon detecting the type (typically via magic bytes; for media sniffer more than magic bytes).
  • There is a draft for a specification at https://mimesniff.spec.whatwg.org/. This specification is close to Firefox's content sniffing. It does not mention media sniffing for application/octet-stream, nor the special application/x-unknown-content-type (this MIME type is an artefact of Firefox's implementation; internally it represents the default value for a MIME type in an HTTP channel).
  • Character encoding should be respected/supported. For text/plain the UTF-8 and UTF-16 BOM can be used. For text/html, the content can be transcoded via the TextDecoder/TextEncoder APIs (except for UTF-16, which should not be used for HTML anyway).
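
The notes above can be sketched roughly as follows. This is a minimal illustration and not the lost implementation: every function and constant name is hypothetical, the 512-byte window is taken from the notes, and the image signatures are a small subset of the patterns listed in the mimesniff draft (Firefox's own sniffer may check more or behave differently).

```javascript
// All names below are hypothetical; this is not the lost implementation.
const SNIFF_LIMIT = 512; // sniffing window mentioned in the notes

// Collect at most SNIFF_LIMIT bytes from a sequence of ArrayBuffer chunks,
// e.g. the event.data values delivered to StreamFilter.ondata.
function takeSniffBytes(chunks) {
  const out = new Uint8Array(SNIFF_LIMIT);
  let length = 0;
  for (const chunk of chunks) {
    const bytes = new Uint8Array(chunk);
    const n = Math.min(bytes.length, SNIFF_LIMIT - length);
    out.set(bytes.subarray(0, n), length);
    length += n;
    if (length === SNIFF_LIMIT) break;
  }
  return out.subarray(0, length);
}

// True if `bytes` begins with the byte values in `prefix`.
function startsWith(bytes, prefix) {
  return bytes.length >= prefix.length && prefix.every((b, i) => bytes[i] === b);
}

// Return the encoding implied by a byte order mark, or null if there is none.
function detectBom(bytes) {
  if (startsWith(bytes, [0xef, 0xbb, 0xbf])) return "utf-8";
  if (startsWith(bytes, [0xfe, 0xff])) return "utf-16be";
  if (startsWith(bytes, [0xff, 0xfe])) return "utf-16le";
  return null;
}

// A few image signatures (assumed subset of the mimesniff draft's patterns).
const IMAGE_SIGNATURES = [
  ["image/png", [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]],
  ["image/jpeg", [0xff, 0xd8, 0xff]],
  ["image/gif", [0x47, 0x49, 0x46, 0x38, 0x39, 0x61]], // "GIF89a"
  ["image/gif", [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]], // "GIF87a"
];

// Return the MIME type of the first matching signature, or null.
function sniffImageType(bytes) {
  const hit = IMAGE_SIGNATURES.find(([, magic]) => startsWith(bytes, magic));
  return hit ? hit[0] : null;
}

// Decode bytes using the given encoding label and re-encode them as UTF-8.
function transcodeToUtf8(bytes, label) {
  const text = new TextDecoder(label).decode(bytes);
  return new TextEncoder().encode(text);
}
```

Note that TextDecoder drops a leading BOM by default, so the transcoded output starts directly with the document text. Which labels TextDecoder accepts depends on the runtime; in a browser extension the full set of WHATWG encodings should be available.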

Bugs in the webRequest.filterResponseData API that I haven't reported upstream (yet?):

  • If the Content-Type is application/x-unknown-content-type and the response is content-encoded, then the filtered response must also be encoded with the same scheme (e.g. gzip). For other types, e.g. text/html, the encoding is transparent, i.e. the value of the Content-Encoding header does not matter. The easiest way around this is to remove the Accept-Encoding request header or the Content-Encoding response header (or set it to "identity"). The more difficult way around it is to implement gzip encoding (and possibly other, more obscure schemes such as deflate/brotli).
  • If a StreamFilter is closed, Firefox will always commit a navigation to a new document, even if no data was written to that StreamFilter, and even if the tab/frame has navigated to a different page. The only work-around that I could think of is to keep the StreamFilter open forever (yuck).
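
The "easiest way" around the first bug (removing the Accept-Encoding request header) could look roughly like the sketch below. The helper name withoutAcceptEncoding is hypothetical, and the URL and resource-type filter is only an assumption about which requests need it. The listener registration requires the webRequest and webRequestBlocking permissions plus matching host permissions, and is skipped when the code runs outside an extension.

```javascript
// Hypothetical helper: return the request headers without Accept-Encoding,
// so that the server has no reason to send a content-encoded response.
function withoutAcceptEncoding(requestHeaders) {
  return requestHeaders.filter(
    (header) => header.name.toLowerCase() !== "accept-encoding"
  );
}

// Register the listener only when the WebExtension API is present.
if (typeof browser !== "undefined" && browser.webRequest) {
  browser.webRequest.onBeforeSendHeaders.addListener(
    (details) => ({
      requestHeaders: withoutAcceptEncoding(details.requestHeaders),
    }),
    // Assumed filter: only top-level documents and frames are sniffed.
    { urls: ["<all_urls>"], types: ["main_frame", "sub_frame"] },
    ["blocking", "requestHeaders"]
  );
}
```

A server may still choose to send an encoded response even without Accept-Encoding, so this reduces the problem rather than ruling it out; the response-header variant mentioned above is not shown here.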