Consume BOM in the text() method of fetch bodies (#36192)

In the Fetch spec, the `text()` method of `Body` (an interface mixin implemented by both `Request` and `Response`) consumes the body with the Encoding spec's "UTF-8 decode" algorithm, which skips a UTF-8 BOM if one is present at the beginning of the body. Servo's implementation did not skip the BOM, so it ended up in the decoded string as U+FEFF. This patch strips it before decoding.

Signed-off-by: Andreu Botella <abotella@igalia.com>
2025-09-30 00:29:14 +01:00 · 2025-03-28 20:02:48 +01:00 · 2025-03-28 20:02:48 +01:00 · 95c3033456
commit 95c3033456
parent 94bcab177e
3 changed files with 8 additions and 61 deletions
--- a/components/script/body.rs
+++ b/components/script/body.rs
@@ -737,8 +737,15 @@ fn run_package_data_algorithm(

 /// <https://fetch.spec.whatwg.org/#ref-for-concept-body-consume-body%E2%91%A4>
 fn run_text_data_algorithm(bytes: Vec<u8>) -> Fallible<FetchedData> {
+    // This implements the Encoding standard's "decode UTF-8", which removes the
+    // BOM if present.
+    let no_bom_bytes = if bytes.starts_with(b"\xEF\xBB\xBF") {
+        &bytes[3..]
+    } else {
+        &bytes
+    };
    Ok(FetchedData::Text(
-        String::from_utf8_lossy(&bytes).into_owned(),
+        String::from_utf8_lossy(no_bom_bytes).into_owned(),
    ))
 }
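
The behavior introduced by this hunk can be tried outside Servo with a small standalone sketch. The helper name `decode_utf8_skipping_bom` and the `UTF8_BOM` constant are hypothetical and not part of the patch; the sketch uses `strip_prefix` from the standard library rather than the `starts_with` plus slice indexing shown in the diff, and returns a `Cow<str>` instead of Servo's `FetchedData`.

```rust
use std::borrow::Cow;

/// U+FEFF encoded as UTF-8. Hypothetical constant, not from the patch.
const UTF8_BOM: &[u8] = b"\xEF\xBB\xBF";

/// Hypothetical helper: drop at most one leading UTF-8 BOM, then decode
/// lossily, replacing invalid sequences with U+FFFD.
fn decode_utf8_skipping_bom(bytes: &[u8]) -> Cow<'_, str> {
    // `strip_prefix` returns `None` when the input does not start with the
    // BOM, in which case the input is decoded unchanged.
    let without_bom = bytes.strip_prefix(UTF8_BOM).unwrap_or(bytes);
    String::from_utf8_lossy(without_bom)
}

fn main() {
    // A leading BOM is removed.
    assert_eq!(decode_utf8_skipping_bom(b"\xEF\xBB\xBFhello"), "hello");
    // Input without a BOM is left as-is.
    assert_eq!(decode_utf8_skipping_bom(b"hello"), "hello");
    // Only the first BOM is stripped; a second one decodes to U+FEFF.
    assert_eq!(
        decode_utf8_skipping_bom(b"\xEF\xBB\xBF\xEF\xBB\xBFx"),
        "\u{FEFF}x"
    );
}
```

Only a single leading BOM is removed, which matches the diff above: the check applies once, at the start of the body.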