
Parsing large files (a lot of tokens) crashes with an out-of-memory error. #423

@mamazu

Description

The problem

Hey, I'm trying to use this parser on quite a large file, but it crashes with an out-of-memory error. Example files can be found here:
https://github.com/nitotm/efficient-language-detector/tree/main/resources/ngrams

Context

I'm trying to use phpactor (which uses this parser) to index a large file, and running this parser crashes the language server with an out-of-memory error (phpactor/phpactor#2978).

I've traced it down to a function in this project:

protected static function tokenGetAll(string $content, $parseContext): array

The doc comment of this function states that caching the result is up to the user of this parser, but I think a streamed approach for tokens would probably be better.
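
Something like this is what I have in mind (just a sketch; tokenStream is a made-up name, not part of this parser's API):

<?php

// Sketch only: expose tokens lazily via a generator instead of one big array,
// so callers only hold the tokens they haven't consumed yet.
// Note: \PhpToken::tokenize() itself still builds the full array internally,
// so a real fix would also need an incremental tokenizer underneath.
function tokenStream(string $content): \Generator
{
    foreach (\PhpToken::tokenize($content) as $token) {
        yield $token;
    }
}

foreach (tokenStream(file_get_contents('big-file.php')) as $token) {
    // handle one token at a time rather than indexing into a huge array
}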

Ideas

Maybe this should be configurable, or depend on the size of the file being parsed. For small files, just returning an array is probably faster, but for big files, streaming the tokens would make more sense.

What I would suggest is some kind of save-and-restore mechanism in the tokenizer. That way you can save a point in the tokenizer, try tokenizing one way, and if that doesn't work, restore and try a different way. Only the tokens since the last save point would have to be kept in memory.
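
A rough sketch of that save/restore idea (all names here are hypothetical, not this parser's actual API):

<?php

// Hypothetical sketch: buffer only the tokens produced since the last committed
// save point, so backtracking works without keeping the whole file's tokens in memory.
final class CheckpointTokenizer
{
    /** @var \PhpToken[] tokens buffered since the last commit */
    private array $buffer = [];
    private int $position = 0;

    public function __construct(private \Iterator $tokens)
    {
    }

    /** Remember the current position so the parser can rewind to it later. */
    public function save(): int
    {
        return $this->position;
    }

    /** Rewind to a saved position to try parsing a different way. */
    public function restore(int $savePoint): void
    {
        $this->position = $savePoint;
    }

    /** Drop buffered tokens that are no longer needed (invalidates earlier save points). */
    public function commit(): void
    {
        $this->buffer = array_slice($this->buffer, $this->position);
        $this->position = 0;
    }

    /** Return the next token, pulling from the underlying stream as needed. */
    public function next(): ?\PhpToken
    {
        if ($this->position < count($this->buffer)) {
            return $this->buffer[$this->position++];
        }
        if (!$this->tokens->valid()) {
            return null;
        }
        $token = $this->tokens->current();
        $this->tokens->next();
        $this->buffer[] = $token;
        $this->position++;
        return $token;
    }
}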
