Modification for Model Generation in `quantized_cache.py`

To improve the functionality and usability of the model-generation code in `performance_optimization/quantized_cache.py`, I propose the following changes:

1. **Handle Multiple Prompts**: Add a function that accepts a list of prompts and generates outputs for all of them in a single run, instead of requiring one execution per prompt.

2. **Control Over Output Length and Sampling**: Add parameters that let users set the maximum output length and choose between sampling and greedy decoding.

3. **Batch Processing**: Group inputs into batches so the model is called once per batch rather than once per prompt, reducing per-call overhead and improving throughput.
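The current contents of `quantized_cache.py` are not shown here, so the following is only a minimal, framework-agnostic sketch of how the three changes could fit together. It assumes a hypothetical `score_fn` that maps a batch of token-ID sequences to one row of vocabulary scores per sequence; `generate_batch`, `score_fn`, `eos_token_id`, and `toy_score_fn` are illustrative names, not part of the existing file. Each decoding step makes a single `score_fn` call for all unfinished prompts (batching), generation stops after `max_new_tokens` steps (output length), and the next token is chosen either by argmax (greedy) or by softmax-weighted sampling (`do_sample=True`).

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical model interface: takes a batch of token-ID sequences and
# returns one list of scores (one score per vocabulary entry) per sequence.
ScoreFn = Callable[[List[List[int]]], List[List[float]]]


def generate_batch(
    prompts: Sequence[List[int]],
    score_fn: ScoreFn,
    max_new_tokens: int = 20,
    do_sample: bool = False,
    eos_token_id: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Extend every (non-empty) prompt by up to max_new_tokens tokens."""
    rng = random.Random(seed)
    sequences = [list(p) for p in prompts]
    finished = [False] * len(sequences)

    for _ in range(max_new_tokens):
        active = [i for i, done in enumerate(finished) if not done]
        if not active:
            break  # every sequence has emitted the end-of-sequence token
        # A single call scores all unfinished sequences at once.
        scores = score_fn([sequences[i] for i in active])
        for i, row in zip(active, scores):
            if do_sample:
                # Softmax sampling: weight each token by exp(score);
                # subtracting the max keeps exp() numerically stable.
                top = max(row)
                weights = [math.exp(s - top) for s in row]
                token = rng.choices(range(len(row)), weights=weights, k=1)[0]
            else:
                # Greedy decoding: highest score, lowest index on ties.
                token = max(range(len(row)), key=row.__getitem__)
            sequences[i].append(token)
            if eos_token_id is not None and token == eos_token_id:
                finished[i] = True
    return sequences


# Toy stand-in for a model: strongly prefers (last token + 1) mod VOCAB.
VOCAB = 5


def toy_score_fn(batch: List[List[int]]) -> List[List[float]]:
    return [
        [1.0 if t == (seq[-1] + 1) % VOCAB else 0.0 for t in range(VOCAB)]
        for seq in batch
    ]


if __name__ == "__main__":
    print(generate_batch([[0], [3]], toy_score_fn, max_new_tokens=3))
    print(generate_batch([[0], [3]], toy_score_fn, max_new_tokens=3, do_sample=True, seed=0))
```

If the script builds on Hugging Face `transformers` (an assumption), the same two controls correspond to the `max_new_tokens` and `do_sample` arguments of `generate()`; batching tensor inputs additionally requires padding the prompts to a common length and passing an attention mask.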

Please assign this issue to me so that I can contribute to it.

Modification for Model Generation in `quantized_cache.py` #77

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Modification for Model Generation in quantized_cache.py #77

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Modification for Model Generation in `quantized_cache.py` #77