Fix document.characterSet not reflecting byte order marks.

The process of decoding the network byte stream to Unicode is backed by an instance of `encoding_rs::Decoder`, which will switch the encoding it uses if it finds a BOM in the byte stream. However, this change in encoding is not communicated back to the caller, so `document.characterSet` gives the wrong result. This change fixes that.

See whatwg/html#5359 and whatwg/encoding#203 for the spec-level backing for this change.

Signed-off-by: Andreu Botella <abb@randomunok.com>
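
To illustrate the behaviour the message describes, here is a minimal, standard-library-only sketch of BOM sniffing. It is not Servo's code and not the `encoding_rs` API: `sniff_bom` and `effective_encoding` are hypothetical names invented for this example, and the encoding labels are plain strings. The idea is only that a leading BOM overrides whatever encoding the caller started with, so the caller has to be told which encoding ended up in use.

```rust
/// Hypothetical helper (not part of Servo or encoding_rs): returns the label
/// of the encoding implied by a leading BOM, plus the BOM length in bytes.
fn sniff_bom(bytes: &[u8]) -> Option<(&'static str, usize)> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(("UTF-8", 3))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(("UTF-16BE", 2))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        // A UTF-32 little-endian BOM (FF FE 00 00) also starts with FF FE,
        // so it is recognised here as UTF-16LE.
        Some(("UTF-16LE", 2))
    } else {
        None
    }
}

/// Hypothetical helper: the encoding a document should report, given the
/// encoding the caller assumed before looking at the bytes.
fn effective_encoding(bytes: &[u8], fallback: &'static str) -> &'static str {
    // A BOM, when present, takes precedence over the assumed encoding.
    sniff_bom(bytes).map_or(fallback, |(name, _)| name)
}
```

Under these assumptions, `effective_encoding(&[0xFF, 0xFE, 0x00, 0x00], "windows-1252")` yields `"UTF-16LE"`, which lines up with the test names in the diff that expect a little-endian UTF-32 BOM to parse as UTF-16LE. Input without a recognised BOM keeps the fallback.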
2025-10-02 09:39:14 +01:00 · 2020-12-30 09:46:29 +01:00 · 2020-12-30 09:46:29 +01:00 · cd34f156f6
commit cd34f156f6
parent be19c03d96
8 changed files with 64 additions and 52 deletions
--- a/tests/wpt/metadata-layout-2020/encoding/bom-handling.html.ini
+++ b/tests/wpt/metadata-layout-2020/encoding/bom-handling.html.ini
@@ -1,4 +0,0 @@
-[bom-handling.html]
-  [document.characterSet should match the BOM]
-    expected: FAIL
-
--- a/tests/wpt/metadata/encoding/bom-handling.html.ini
+++ b/tests/wpt/metadata/encoding/bom-handling.html.ini
@@ -1,4 +0,0 @@
-[bom-handling.html]
-  [document.characterSet should match the BOM]
-    expected: FAIL
-
--- a/tests/wpt/metadata/encoding/utf-32-from-win1252.html.ini
+++ b/tests/wpt/metadata/encoding/utf-32-from-win1252.html.ini
@@ -1,10 +1,4 @@
 [utf-32-from-win1252.html]
-  [Expect resources/utf-32-little-endian-bom.xml to parse as UTF-16LE]
-    expected: FAIL
-
-  [Expect resources/utf-32-little-endian-bom.html to parse as UTF-16LE]
-    expected: FAIL
-
  [Expect resources/utf-32-big-endian-bom.html to parse as windows-1252]
    expected: FAIL

--- a/tests/wpt/metadata/encoding/utf-32.html.ini
+++ b/tests/wpt/metadata/encoding/utf-32.html.ini
@@ -1,26 +0,0 @@
-[utf-32.html]
-  type: testharness
-  [Expect resources/utf-32-big-endian-bom.html to parse as windows-1252]
-    expected: FAIL
-
-  [Expect resources/utf-32-big-endian-bom.xml to parse as windows-1252]
-    expected: FAIL
-
-  [Expect resources/utf-32-big-endian-nobom.html to parse as windows-1252]
-    expected: FAIL
-
-  [Expect resources/utf-32-big-endian-nobom.xml to parse as windows-1252]
-    expected: FAIL
-
-  [Expect resources/utf-32-little-endian-bom.html to parse as UTF-16LE]
-    expected: FAIL
-
-  [Expect resources/utf-32-little-endian-bom.xml to parse as UTF-16LE]
-    expected: FAIL
-
-  [Expect resources/utf-32-little-endian-nobom.html to parse as windows-1252]
-    expected: FAIL
-
-  [Expect resources/utf-32-little-endian-nobom.xml to parse as windows-1252]
-    expected: FAIL
-