AutoGPTQ

Vivek Khandelwal e4b2493733 Modify qlinear_cuda for tracing the GPTQ model (#367)		2023-10-21 01:06:01 +09:00
Changes:
-- The torch.bitwise_and call is changed because, when this model is traced, the current usage results in the in-place variant of the op, which causes an issue in the downstream lowering pipeline of the traced model via Torch-MLIR and IREE-SHARK. The op is now used in a way that does not produce the in-place variant.
-- The torch.matmul call in the forward function is changed because it currently assumes the weights are always of fp16 type, so running the model with float32 weights results in an error. The change casts the LHS of the matmul to the same type as the RHS.
Neither of the above changes affects the model in any way.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
qlinear	Modify qlinear_cuda for tracing the GPTQ model (#367)	2023-10-21 01:06:01 +09:00
triton_utils	Revert "fix bug(breaking change) remove (zeors -= 1)"	2023-09-27 10:37:31 +08:00
__init__.py	refactor file structure	2023-04-25 18:58:20 +08:00
_fused_base.py	add trainable mode	2023-05-26 13:11:30 +08:00
fused_gptj_attn.py	Revert "fix bug(breaking change) remove (zeors -= 1)"	2023-09-27 10:37:31 +08:00
fused_llama_attn.py	Revert "fix bug(breaking change) remove (zeors -= 1)"	2023-09-27 10:37:31 +08:00
fused_llama_mlp.py	update FusedLlamaMLPForQuantizedModel for general usage purpose	2023-05-27 07:47:20 +08:00
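
The two changes described in the commit message for #367 can be illustrated with a minimal sketch. This is not the code from qlinear_cuda; the helper names `unpack_4bit` and `matmul_matching_dtype`, the 4-bit width, and the tensor shapes are assumptions chosen for illustration. It shows an out-of-place `torch.bitwise_and` (the result is assigned rather than written back through `out=`) and a matmul whose left-hand side is cast to the dtype of the weights instead of being assumed to be fp16.

```python
import torch


def unpack_4bit(packed: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: unpack eight 4-bit values from each int32 entry."""
    # Shift amounts 0, 4, ..., 28 select each 4-bit field in turn.
    shifts = torch.arange(0, 32, 4, dtype=torch.int32)
    vals = torch.bitwise_right_shift(packed.unsqueeze(-1), shifts)
    # Out-of-place masking: a new tensor is returned and rebound to `vals`,
    # instead of writing into an existing tensor via `out=`.
    vals = torch.bitwise_and(vals, 0xF)
    return vals


def matmul_matching_dtype(x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast the LHS to the dtype of the RHS before matmul."""
    # Works for fp16 and float32 weights alike, since nothing assumes fp16.
    return torch.matmul(x.to(weights.dtype), weights)


if __name__ == "__main__":
    packed = torch.tensor([0x76543210], dtype=torch.int32)
    print(unpack_4bit(packed))

    x = torch.ones(2, 3, dtype=torch.float16)
    w = torch.ones(3, 4, dtype=torch.float32)
    print(matmul_matching_dtype(x, w).dtype)
```

Casting the input to `weights.dtype` keeps the call independent of the precision the model was loaded in, which is the property the commit message asks for.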