
Optimize inference speed ⚡️ #8

@maxbbraun


Experimenting with compiler options in the `fast-opts` branch.

Switching from `-Os` to `-O3` seems to have a significant impact on tokens per second. (`-Ofast` doesn't noticeably add anything on top.)

```diff
->>> Averaged 2.60 tokens/s
+>>> Averaged 3.79 tokens/s
```
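
For context, the flag change itself is small. A minimal sketch, assuming a CMake-based build; the variables below are standard CMake, but they're illustrative and not the actual build files from the `fast-opts` branch:

```cmake
# Illustrative only: raise the optimization level from size (-Os) to speed (-O3).
set(CMAKE_C_FLAGS_RELEASE   "-O3")  # was "-Os"
set(CMAKE_CXX_FLAGS_RELEASE "-O3")  # was "-Os"

# -Ofast would additionally enable -ffast-math, which relaxes IEEE 754
# semantics; per the numbers above, it added no measurable speedup here.
```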

Unfortunately, something about this change seems to break the camera input or TPU inference, and I haven't debugged it yet.
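
One way to narrow down the breakage could be per-target optimization levels, so that only the token-generation code gets `-O3` while the camera and TPU paths stay at `-Os`. A sketch, again assuming CMake; the target names are hypothetical placeholders, not the project's real ones:

```cmake
# Bisect the regression with per-target flags (target names are hypothetical).
target_compile_options(llm_inference   PRIVATE -O3)  # hot path: keep the speedup
target_compile_options(camera_pipeline PRIVATE -Os)  # suspect path: revert to -Os
target_compile_options(tpu_runtime     PRIVATE -Os)  # suspect path: revert to -Os
```

If the camera and TPU code work again under this split, the problem (a miscompilation, or latent undefined behavior exposed by `-O3`) is confined to those translation units and can then be toggled file by file.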
