AutoGPTQ/auto_gptq/nn_modules
Vivek Khandelwal e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The torch.bitwise_and call is changed because, when this model is
   traced, the current usage results in the in-place variant of the
   op, which causes an issue in the downstream lowering pipeline of
   the traced model via Torch-MLIR and IREE-SHARK. The op is now
   called so that it does not result in an in-place variant.

-- The torch.matmul call in the forward function is changed because
   it currently assumes that the weights are always of fp16 type, so
   executing the model with float32 weights results in an error. The
   change casts the LHS of the matmul to the same type as the RHS.

Neither of the above changes affects the model in any way.
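
The two changes described above can be illustrated with a minimal sketch. This is not the code from qlinear_cuda; the helper names (mask_low_bits, matmul_matching_dtype) and the sample tensors are hypothetical. It only assumes that the in-place form refers to passing the input tensor as out=, and that the previous matmul forced the input to fp16.

```python
import torch


def mask_low_bits(packed: torch.Tensor, bits: int) -> torch.Tensor:
    """Keep only the lowest `bits` bits of each element (hypothetical helper).

    The in-place style would be:
        torch.bitwise_and(packed, (2 ** bits) - 1, out=packed)
    Returning a fresh tensor instead avoids writing into the input.
    """
    return torch.bitwise_and(packed, (2 ** bits) - 1)


def matmul_matching_dtype(x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Cast the LHS to the dtype of the RHS before multiplying (hypothetical helper).

    A hard-coded x.half() only works when the weights are fp16.
    """
    return torch.matmul(x.to(weights.dtype), weights)


# Out-of-place masking: the input tensor is left untouched.
packed = torch.tensor([0x1F, 0x23, 0x7A], dtype=torch.int16)
masked = mask_low_bits(packed, bits=4)

# fp16 activations multiplied with float32 weights.
x = torch.ones(2, 3, dtype=torch.float16)
weights = torch.ones(3, 4, dtype=torch.float32)
out = matmul_matching_dtype(x, weights)
```

With these inputs, masked holds the low 4 bits of each element while packed keeps its original values, and out takes the dtype of the weights (float32) rather than that of the input.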

Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
qlinear Modify qlinear_cuda for tracing the GPTQ model (#367) 2023-10-21 01:06:01 +09:00
triton_utils Revert "fix bug(breaking change) remove (zeors -= 1)" 2023-09-27 10:37:31 +08:00
__init__.py refactor file structure 2023-04-25 18:58:20 +08:00
_fused_base.py add trainable mode 2023-05-26 13:11:30 +08:00
fused_gptj_attn.py Revert "fix bug(breaking change) remove (zeors -= 1)" 2023-09-27 10:37:31 +08:00
fused_llama_attn.py Revert "fix bug(breaking change) remove (zeors -= 1)" 2023-09-27 10:37:31 +08:00
fused_llama_mlp.py update FusedLlamaMLPForQuantizedModel for general usage purpose 2023-05-27 07:47:20 +08:00