Commit graph

13 commits

Author SHA1 Message Date
Vivek Khandelwal
e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The torch.bitwise_and usage is changed because, when tracing this
   model, the current usage resolves to an in-place variant of the op,
   which causes an issue in the downstream lowering pipeline of the
   traced model via Torch-MLIR and IREE-SHARK. The op usage is
   therefore changed so that it does not produce an in-place variant.

-- The torch.matmul call in the forward function is changed because it
   currently assumes that the weights are always of fp16 type, so
   executing the model with float32 weights results in an error. The
   change casts the LHS of the matmul to the same type as the RHS.

Neither of the above changes affects the model's behavior in any way.

Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
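The two changes described in the commit message above can be sketched as follows. This is an illustrative PyTorch snippet, not the actual AutoGPTQ diff; the function names `unpack_bits` and `qlinear_matmul` are hypothetical.

```python
import torch

# 1) Out-of-place bitwise_and: an in-place variant (e.g.
#    `packed.bitwise_and_(mask)`) can trip up the tracing/lowering
#    pipeline, so the functional form, which returns a new tensor,
#    is used instead.
def unpack_bits(packed: torch.Tensor, mask: int) -> torch.Tensor:
    return torch.bitwise_and(packed, mask)  # input tensor is left unchanged

# 2) Cast the matmul LHS to the RHS dtype so that float32 weights
#    work too, instead of assuming both operands are fp16.
def qlinear_matmul(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    return torch.matmul(x.to(weight.dtype), weight)
```

Casting the activation to the weight dtype (rather than the reverse) keeps the output in the weights' precision, which is what the downstream layers expect.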
潘其威(William)
3de7fbb0d5
Revert "fix bug(breaking change) remove (zeors -= 1)" 2023-09-27 10:37:31 +08:00
qwopqwop200
ad5b0d72ee
fix bug 2023-09-06 16:41:41 +09:00
Felix Marty
38447262c0
fix fused attn 2023-07-31 13:46:32 +00:00
Felix Marty
179776bd1d
exllama kernel 2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73
fix error raised when cuda kernels are not installed 2023-07-26 13:59:28 +08:00
lunar
618a5f50ee
Add transpose operator when replace Conv1d with qlinear_cuda_old 2023-06-05 23:11:18 +08:00
qwopqwop200
f4820f2988
change qlinear cuda support 64dim 2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d
support 64 cuda dim 2023-06-02 19:54:37 +09:00
qwopqwop200
33809a8e59
remove log 2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b
change if trainable backend pytorch 2023-05-30 23:43:55 +09:00
PanQiWei
2b532f9453
add trainable mode 2023-05-26 13:11:30 +08:00
PanQiWei
cfd27e8caa
refactor file structure of qlinears 2023-05-26 07:18:16 +08:00
Renamed from auto_gptq/nn_modules/qlinear_old.py