Consume BOM in the text() method of fetch bodies (#36192)

In the Fetch spec, the `text()` method of `Body` (an interface mixin
implemented by both `Request` and `Response`) consumes the body with
the Encoding spec "UTF-8 decode" algorithm, which skips the UTF-8 BOM
(bytes EF BB BF) if it is present at the beginning of the body. Servo's
implementation decoded the body without skipping the BOM. This patch
strips a leading BOM before decoding.

Signed-off-by: Andreu Botella <abotella@igalia.com>
Andreu Botella 2025-03-28 20:02:48 +01:00 committed by GitHub
parent 94bcab177e
commit 95c3033456
3 changed files with 8 additions and 61 deletions

@@ -737,8 +737,15 @@ fn run_package_data_algorithm(
 /// <https://fetch.spec.whatwg.org/#ref-for-concept-body-consume-body%E2%91%A4>
 fn run_text_data_algorithm(bytes: Vec<u8>) -> Fallible<FetchedData> {
+    // This implements the Encoding standard's "decode UTF-8", which removes the
+    // BOM if present.
+    let no_bom_bytes = if bytes.starts_with(b"\xEF\xBB\xBF") {
+        &bytes[3..]
+    } else {
+        &bytes
+    };
     Ok(FetchedData::Text(
-        String::from_utf8_lossy(&bytes).into_owned(),
+        String::from_utf8_lossy(no_bom_bytes).into_owned(),
     ))
 }
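
To illustrate the behavior the patch introduces, here is a minimal standalone sketch of the same BOM-skipping decode. The function name `utf8_decode_skipping_bom` is hypothetical and is not part of Servo. It takes a plain byte slice and returns a `String` instead of Servo's `Fallible<FetchedData>`, so it can be compiled without the rest of the codebase.

```rust
/// Hypothetical helper (not Servo code): decodes `bytes` as UTF-8 after
/// dropping at most one leading BOM (EF BB BF), mirroring the patched
/// `run_text_data_algorithm`.
fn utf8_decode_skipping_bom(bytes: &[u8]) -> String {
    // Skip the three BOM bytes only when they open the body.
    let no_bom_bytes = if bytes.starts_with(b"\xEF\xBB\xBF") {
        &bytes[3..]
    } else {
        bytes
    };
    // Invalid sequences become U+FFFD instead of causing an error,
    // as in the original code.
    String::from_utf8_lossy(no_bom_bytes).into_owned()
}
```

Under these assumptions, a body of `EF BB BF` followed by `hi` decodes to `hi`, a body without a BOM is unchanged, and only the first of two consecutive BOMs is removed, so the second survives as U+FEFF in the resulting string.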