Fatal error: Index out of range while trying to parse non valid HTML #344

@JoseDelPino

This is the error described in #189

As an example, this HTML will throw an error:

The error is thrown in this function:
public func matchesAny(_ seq: ParsingStrings) -> Bool {
    var buffer = [UInt8](repeating: 0, count: 4) // Max UTF-8 sequence is 4 bytes
    var length = 1
    buffer[0] = input[pos]

    // Check if the first byte indicates a multi-byte character
    if buffer[0] & 0b10000000 != 0 {
        if buffer[0] & 0b11100000 == 0b11000000, pos + 1 < end {
            buffer[1] = input[pos + 1]
            length = 2
        } else if buffer[0] & 0b11110000 == 0b11100000, pos + 2 < end {
            buffer[1] = input[pos + 1]
            buffer[2] = input[pos + 2]
            length = 3
        } else if buffer[0] & 0b11111000 == 0b11110000, pos + 3 < end {
            buffer[1] = input[pos + 1]
            buffer[2] = input[pos + 2]
            buffer[3] = input[pos + 3]
            length = 4
        } else {
            return false // Invalid UTF-8 sequence
        }
    }
    let bufferSlice = buffer[..<length]
    
    return seq.contains(bufferSlice)
}

Specifically in buffer[0] = input[pos], which indexes input without first checking that pos is within bounds.
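
One possible way to avoid the crash, sketched under assumptions: check `pos` against `end` before the first read and return `false` when the cursor is already past the input. `ByteReader` below is a hypothetical stand-in for whatever type owns `input`, `pos`, and `end` in the library, and `seq` is modeled as a plain `[[UInt8]]` rather than the real `ParsingStrings`, so this is an illustration of the guard and not the library's actual fix.

```swift
// Hypothetical stand-in for the reader that owns `input`, `pos`, and `end`.
// The real type in the library may look different.
struct ByteReader {
    let input: [UInt8]
    var pos: Int
    var end: Int { input.count }

    // Same matching logic as the function quoted above, plus an up-front bounds check.
    // `seq` is modeled as a list of byte sequences (assumption, not ParsingStrings).
    func matchesAny(_ seq: [[UInt8]]) -> Bool {
        // Added guard: nothing can match when the cursor is outside the input.
        guard pos >= 0, pos < end else { return false }

        let first = input[pos]
        var length = 1

        // Determine the UTF-8 sequence length from the lead byte.
        if first & 0b1000_0000 != 0 {
            if first & 0b1110_0000 == 0b1100_0000, pos + 1 < end {
                length = 2
            } else if first & 0b1111_0000 == 0b1110_0000, pos + 2 < end {
                length = 3
            } else if first & 0b1111_1000 == 0b1111_0000, pos + 3 < end {
                length = 4
            } else {
                return false // Invalid or truncated UTF-8 sequence
            }
        }

        // Copy the bytes of the current character and look them up in `seq`.
        let candidate = Array(input[pos..<(pos + length)])
        return seq.contains(candidate)
    }
}

// Usage: the cursor sits one past the last byte, the situation that crashed before.
let reader = ByteReader(input: Array("<p".utf8), pos: 2)
print(reader.matchesAny([Array("<".utf8)])) // prints "false"
```

Returning `false` at end of input treats "no character available" the same as "no match", which seems consistent with how the function already handles an invalid sequence; whether callers rely on a different behavior is not something this issue establishes.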
