Re: Incorporate Marlin for GPTQ checkpoints into tgis_native #66

cyang49 · 2024-03-22T16:09:11Z

Resubmitting Marlin PR due to accidental removal

Motivation

This PR enables the use of the Marlin kernel for GPTQ checkpoints. Marlin has been shown to outperform Exllamav2 on Nvidia GPUs, especially at larger batch sizes.

Modifications

The code changes are mostly similar to those for exllamav2, except that they use the Marlin kernel code and binding from the AutoGPTQ package instead of sourcing a separate marlin package. I adapted the QuantLinear implementation from AutoGPTQ, removing code that we don't need. Note that my changes also enable Marlin support for checkpoints that use activation reordering (desc_act=True).

Marlin can be turned on by setting the environment variable GPTQ_CUDA_TYPE=marlin.

Note that the Marlin kernel only works on Nvidia GPUs with compute capability >= 8.0.
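To illustrate how the two conditions above interact, here is a minimal sketch of kernel selection. Only GPTQ_CUDA_TYPE, the value marlin, and the exllama default (per the later comment in this thread) come from this PR; the helper names and the fall-back-instead-of-fail behavior are assumptions for illustration, not the code in this PR.

```python
import os

# Minimum compute capability stated in this PR for the Marlin kernel.
MARLIN_MIN_CAPABILITY = (8, 0)


def marlin_supported(capability):
    """Return True if a (major, minor) compute capability can run Marlin.

    Hypothetical helper. The tuple could come from
    torch.cuda.get_device_capability(), which returns (major, minor).
    """
    return tuple(capability) >= MARLIN_MIN_CAPABILITY


def select_gptq_kernel(capability, env=None):
    """Pick a GPTQ kernel name from GPTQ_CUDA_TYPE and the GPU capability.

    Hypothetical helper. `env` defaults to os.environ and is injectable
    so the logic can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    requested = env.get("GPTQ_CUDA_TYPE", "exllama").lower()
    if requested == "marlin" and not marlin_supported(capability):
        # Assumption: fall back to exllama on older GPUs.
        # The real implementation may raise an error instead.
        return "exllama"
    return requested
```

For example, select_gptq_kernel((8, 0), {"GPTQ_CUDA_TYPE": "marlin"}) selects marlin, while the same request on a (7, 5) device falls back to exllama in this sketch.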

Result

[Llama-70B-4bit-128g]
Single A100x80GB, 1k context, output 512 tokens, batch size=16

Marlin
Prefill: 12.2s, Inference time: 38.57s
Exllamav2
Prefill: 9.68s, Inference time: 79.7s

Further investigation is needed, as Marlin prefill appears slower.
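For a rough sense of scale, the ratios implied by the numbers above can be computed as follows. This is only arithmetic on the reported timings; the variable names are illustrative, and whether "Inference time" includes prefill is not stated here.

```python
# Timings reported above for Llama-70B-4bit-128g (seconds).
marlin = {"prefill_s": 12.2, "inference_s": 38.57}
exllamav2 = {"prefill_s": 9.68, "inference_s": 79.7}

# How much faster Marlin is on inference time (higher is better for Marlin).
inference_speedup = exllamav2["inference_s"] / marlin["inference_s"]

# How much slower Marlin is on prefill (higher is worse for Marlin).
prefill_slowdown = marlin["prefill_s"] / exllamav2["prefill_s"]

print(f"Marlin inference speedup: {inference_speedup:.2f}x")  # 2.07x
print(f"Marlin prefill slowdown:  {prefill_slowdown:.2f}x")   # 1.26x
```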

The code needs to be tested more thoroughly for both performance and correctness in the following scenarios:

Should not break fp16 logic
Should work correctly for desc_act=False GPTQ checkpoints, with optimal performance
Should work correctly for desc_act=True GPTQ checkpoints, with performance only slightly worse than in the previous scenario
Should not break TP use, although TP performance still needs further optimization
Memory management needs extensive review

Related Issues

#51

Signed-off-by: Chih-Chieh-Yang <[email protected]>

Signed-off-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

Signed-off-by: cyang49 <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

Signed-off-by: Chih-Chieh-Yang <[email protected]>

cyang49 · 2024-03-25T13:53:21Z

@njhill I really need these changes for #67, for a more thorough performance test. I decided to make exllama the default so these can be merged more quickly. Please let me know if you need anything else before merging.

njhill

Thanks @cyang49 for this great work! Running some internal tests on this rn and will merge as soon as those pass.

cyang49 mentioned this pull request Mar 22, 2024

Performance Optimizations for TP-Aware GPTQ #67

Draft

cyang49 and others added 4 commits March 25, 2024 13:38

Incoporate marlin into tgis_native

0a240b3

Signed-off-by: Chih-Chieh-Yang <[email protected]>

Enable marlin as default GPTQ kernel

a1a3809

Signed-off-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

Update server/text_generation_server/utils/gptq/marlin.py

66af240

Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Chih-Chieh Yang <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

Apply suggestion on GPTQ buffer setup

d418095

Signed-off-by: cyang49 <[email protected]> Signed-off-by: Chih-Chieh-Yang <[email protected]>

cyang49 force-pushed the pr_marlin branch from 025b944 to d418095 on March 25, 2024 13:48

changing default to exllama

57cba2d

Signed-off-by: Chih-Chieh-Yang <[email protected]>

njhill approved these changes Mar 25, 2024

njhill merged commit 316ca8d into IBM:main Mar 25, 2024

cyang49 deleted the pr_marlin branch March 25, 2024 19:36
