Auto merge of #28006 - andreubotella:document-charset-bom, r=jdm

Fix `document.characterSet` not reflecting byte order marks.

Decoding the network byte stream to Unicode is backed by an instance of `encoding_rs::Decoder`, which switches the encoding it uses if it finds a BOM in the byte stream. However, that switch is not communicated back to the caller, so `document.characterSet` reports the wrong encoding. This change fixes that.

See whatwg/html#5359 and whatwg/encoding#203 for the spec-level backing for this change.
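
As a rough illustration of the behaviour described above, here is a minimal, std-only sketch of BOM sniffing as the Encoding Standard describes it. It is not Servo's code and does not use `encoding_rs`; the names `Bom`, `sniff_bom` and `reported_encoding` are hypothetical and exist only for this example. The point is that a BOM, when present, should decide the encoding the document reports, and that a UTF-32 BOM is not recognized as such.

```rust
/// Result of BOM sniffing: the encoding name and the BOM length in bytes.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Bom {
    pub encoding: &'static str,
    pub len: usize,
}

/// Checks the start of `bytes` for one of the three BOMs the Encoding
/// Standard recognizes: UTF-8, UTF-16BE and UTF-16LE.
pub fn sniff_bom(bytes: &[u8]) -> Option<Bom> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom { encoding: "UTF-8", len: 3 })
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(Bom { encoding: "UTF-16BE", len: 2 })
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(Bom { encoding: "UTF-16LE", len: 2 })
    } else {
        None
    }
}

/// Hypothetical helper: the encoding a document should report, given the
/// encoding chosen before decoding (`fallback`) and the first bytes of the
/// stream. A BOM, if present, overrides the fallback.
pub fn reported_encoding(fallback: &'static str, bytes: &[u8]) -> &'static str {
    sniff_bom(bytes).map_or(fallback, |bom| bom.encoding)
}
```

Under these assumptions, a stream starting with `FF FE 00 00` (a UTF-32LE BOM) is reported as UTF-16LE because its first two bytes match the UTF-16LE BOM, while one starting with `00 00 FE FF` (a UTF-32BE BOM) matches nothing and keeps the fallback, which lines up with the `utf-32` test expectations touched by this commit.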

---
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #28005 (GitHub issue number if applicable)

- [X] There are tests for these changes OR
- [ ] These changes do not require tests because ___

bors-servo 2021-01-01 13:51:19 -05:00 committed by GitHub
commit 4d1641bf9b
8 changed files with 64 additions and 52 deletions


@@ -1,4 +0,0 @@
[bom-handling.html]
  [document.characterSet should match the BOM]
    expected: FAIL


@@ -1,4 +0,0 @@
[bom-handling.html]
  [document.characterSet should match the BOM]
    expected: FAIL


@@ -1,10 +1,4 @@
[utf-32-from-win1252.html]
  [Expect resources/utf-32-little-endian-bom.xml to parse as UTF-16LE]
    expected: FAIL
  [Expect resources/utf-32-little-endian-bom.html to parse as UTF-16LE]
    expected: FAIL
  [Expect resources/utf-32-big-endian-bom.html to parse as windows-1252]
    expected: FAIL


@@ -1,26 +0,0 @@
[utf-32.html]
  type: testharness
  [Expect resources/utf-32-big-endian-bom.html to parse as windows-1252]
    expected: FAIL
  [Expect resources/utf-32-big-endian-bom.xml to parse as windows-1252]
    expected: FAIL
  [Expect resources/utf-32-big-endian-nobom.html to parse as windows-1252]
    expected: FAIL
  [Expect resources/utf-32-big-endian-nobom.xml to parse as windows-1252]
    expected: FAIL
  [Expect resources/utf-32-little-endian-bom.html to parse as UTF-16LE]
    expected: FAIL
  [Expect resources/utf-32-little-endian-bom.xml to parse as UTF-16LE]
    expected: FAIL
  [Expect resources/utf-32-little-endian-nobom.html to parse as windows-1252]
    expected: FAIL
  [Expect resources/utf-32-little-endian-nobom.xml to parse as windows-1252]
    expected: FAIL