
Proprietary inference stack at 10,000 tokens/sec for Warp #8858

@NicoConstant

Description


Pre-submit Checks

Describe the solution you'd like?

Hi,

At Kog we built a proprietary stack that combines a bare-metal inference engine written in Assembly/C++ with a parallel model architecture that generates entire sequences simultaneously rather than token by token, reaching 10,000 tokens per second per request, where standard providers sit around 100 t/s.
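As a rough illustration of what that throughput gap would mean for an agent workflow, here is a back-of-envelope sketch in Python. The figures are taken directly from the numbers above (≈100 t/s for standard streaming providers vs. the ≈10,000 t/s claimed here); the function and the 5,000-token refactor are hypothetical examples, not Kog's or Warp's actual API.

```python
# Back-of-envelope latency comparison using the throughput figures quoted in this
# request. Illustrative only; not an actual API of Kog or Warp.

def generation_time(tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock seconds to produce `tokens` output tokens at a given throughput."""
    return tokens / tokens_per_sec

# Hypothetical example: a multi-file refactor emitting ~5,000 tokens of diffs per agent.
tokens = 5_000
print(f"standard streaming (~100 t/s):    {generation_time(tokens, 100):,.0f} s")  # ~50 s
print(f"claimed parallel stack (10k t/s): {generation_time(tokens, 10_000):.1f} s")  # ~0.5 s
```

Under these assumptions, each agent's generation step drops from tens of seconds to sub-second, which is where the benefit for parallel multi-repo agents would show up.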

We are looking for teams running generation-heavy workflows (for example, complex multi-repo code changes executed by parallel agents) to validate this together.

What would that concretely change for the Warp team?

Is your feature request related to a problem? Please describe.

No response

Additional context

No response

Operating system (OS)

No response

How important is this feature to you?

4

