Make Cache statically configurable at model construction time #32500

@guangy10

Description

Feature request

Be able to construct and load a model with a statically configured cache, like this:

from transformers import AutoModelForCausalLM, GenerationConfig

# Placeholder values for illustration; substitute your own model repo and cache sizing.
hf_model_repo = "meta-llama/Llama-2-7b-hf"
cache_implementation = "static"
batch_size = 1
max_cache_len = 1024

model = AutoModelForCausalLM.from_pretrained(
    hf_model_repo,
    attn_implementation="sdpa",
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation=cache_implementation,
        max_length=max_cache_len,
        cache_config={
            "batch_size": batch_size,
            "max_cache_len": max_cache_len,
        },
    ),
)

See additional context in #32253

Motivation

This feature request is to support torch.export(): it ensures the model is exportable in a way that can be further lowered and run in ExecuTorch with good performance out of the box.
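
For reference, a minimal sketch of the export flow this would enable, assuming the construction-time cache configuration above lands. The wrapper module, its forward signature, and the example shapes are illustrative assumptions, not an existing transformers API:

import torch

class ExportableModule(torch.nn.Module):
    # Hypothetical wrapper: with the cache statically configured at model
    # construction time, the cache buffers have fixed shapes, so
    # torch.export can trace the decode step without dynamic allocation.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, cache_position):
        outputs = self.model(
            input_ids=input_ids,
            cache_position=cache_position,
            use_cache=True,
        )
        return outputs.logits

# Export a single decode step with fixed batch size and sequence length.
example_input_ids = torch.zeros((1, 1), dtype=torch.long)
example_cache_position = torch.tensor([0], dtype=torch.long)

exported_program = torch.export.export(
    ExportableModule(model),
    args=(example_input_ids, example_cache_position),
)

The resulting exported_program could then be lowered and run through the ExecuTorch export stack.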

Your contribution

TBD
