Commit graph

96 commits

Author SHA1 Message Date
Vivek Khandelwal
e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The change to torch.bitwise_and is made because, during
   tracing, the current usage of torch.bitwise_and results in an
   in-place variant of this op, which causes an issue in the
   downstream lowering pipeline of the traced model via Torch-MLIR
   and IREE-SHARK. That's why the op usage is changed so that it
   does not result in an in-place variant.

-- The change to the torch.matmul call in the forward function is
   made because it currently assumes that the weights will always
   be of fp16 type, but when the model is executed with float32
   weights this results in an error. That's why the change casts
   the LHS of the matmul to the same type as the RHS.

Neither of the above changes affects the model's behavior in any way.

Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
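The two fixes described in the commit message above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual qlinear_cuda code; `unpack_low_bits` and `forward_matmul` are illustrative names, and the 4-bit mask is an assumption about the packing layout.

```python
import torch

# Fix 1: use the functional (out-of-place) torch.bitwise_and so tracing does
# not record an in-place variant of the op. packed.bitwise_and_(mask) would
# mutate its input, which is what broke the downstream lowering pipeline.
def unpack_low_bits(packed: torch.Tensor, mask: int = 0xF) -> torch.Tensor:
    # Returns a new tensor; the input is left untouched.
    return torch.bitwise_and(packed, mask)

# Fix 2: cast the matmul LHS to the RHS dtype instead of assuming fp16
# weights, so the layer also works when the model runs with float32 weights.
def forward_matmul(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    return torch.matmul(x.to(weight.dtype), weight)
```

Casting the LHS to the RHS dtype (rather than hard-coding `.half()`) keeps the op valid for both fp16 and fp32 weight configurations.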
潘其威(William)
3de7fbb0d5
Revert "fix bug (breaking change): remove (zeros -= 1)" 2023-09-27 10:37:31 +08:00
潘其威(William)
62fd0371ac
Merge branch 'main' into main 2023-09-26 14:09:04 +08:00
Marc Sun
c912bf361a exllamav2 integration 2023-09-25 16:51:18 +00:00
qwopqwop200
6b1ceb1897
if exllama, auto disable fused attention 2023-09-06 18:14:04 +09:00
qwopqwop200
ad5b0d72ee
fix bug 2023-09-06 16:41:41 +09:00
qwopqwop200
45a1ee4d84
add install check for qigen 2023-08-31 14:37:39 +09:00
qwopqwop200
f23a06f911
Merge branch 'PanQiWei:main' into main 2023-08-17 15:22:43 +09:00
qwopqwop200
5d5b687ca8
format qigen qlinear 2023-08-17 15:19:01 +09:00
PanQiWei
34b4ba451c fix typo 2023-08-13 16:26:02 +08:00
qwopqwop200
a807e038bb
remove many contiguous and change arguments name 2023-08-11 16:09:42 +09:00
qwopqwop200
870be83bea
Merge branch 'PanQiWei:main' into main 2023-08-10 22:48:30 +09:00
qwopqwop200
7ba78af3ae support cpu 2023-08-10 22:48:04 +09:00
Felix Marty
4af7ea619d patch for transformers compatibility 2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184 make exllama_kernels compilation as optional 2023-08-09 17:42:22 +08:00
qwopqwop200
2f48780165
fix bug when disabling exllama 2023-08-07 16:28:30 +09:00
fxmarty
71f23268eb
Merge pull request #1 from qwopqwop200/exllama-q4-kernel
Exllama q4 kernel
2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db rocm support 2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e Merge branch 'main' into exllama-q4-kernel 2023-08-04 15:13:34 +02:00
qwopqwop200
79ab5076c7
revert fused_llama_attn.py 2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7
exllama support flash attention 2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a
support group query attention 2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51 register buffer of general quant linear 2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8
change pack func to support only 4 bit 2023-08-01 20:01:45 +09:00
qwopqwop200
a60c9a8552
add pack func 2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902 fix 2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e act-order now works fine 2023-07-31 15:36:58 +00:00
Felix Marty
38447262c0 fix fused attn 2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc cleaning 2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d exllama kernel 2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73 fix error raised when cuda kernels are not installed 2023-07-26 13:59:28 +08:00
qwopqwop200
9578c59d31
fix cuda bug 2023-07-25 16:50:05 +09:00
潘其威(William)
046c031139
Merge pull request #141 from AngainorDev/patch-1
Fix error message
2023-06-19 10:11:10 +08:00
Angainor Development
e75611e1b7
Fix error message 2023-06-05 22:19:09 +02:00
lunar
618a5f50ee
Add transpose operator when replace Conv1d with qlinear_cuda_old 2023-06-05 23:11:18 +08:00
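The commit above concerns weight layout: the `Conv1D` module in Hugging Face transformers (used e.g. in GPT-2) stores its weight as `(in_features, out_features)`, the transpose of `nn.Linear`'s `(out_features, in_features)`, so replacing `Conv1D` with a quantized linear layer requires transposing the weight. A hypothetical sketch of that conversion; `conv1d_weight_to_linear` is an illustrative name, not a function from the repo:

```python
import torch

def conv1d_weight_to_linear(conv1d_weight: torch.Tensor) -> torch.Tensor:
    # transformers' Conv1D weight: (in_features, out_features)
    # nn.Linear / qlinear weight:  (out_features, in_features)
    # .contiguous() so downstream kernels get a densely laid-out tensor.
    return conv1d_weight.t().contiguous()
```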
qwopqwop200
f4820f2988
change qlinear cuda support 64dim 2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d
support 64 cuda dim 2023-06-02 19:54:37 +09:00
qwopqwop200
b03f53294f
support 64dim cuda 2023-06-02 19:53:50 +09:00
qwopqwop200
0891ea4036
support 32dim triton 2023-06-02 19:05:55 +09:00
qwopqwop200
b3654a68c3
support 32dim triton kernel 2023-06-02 19:04:12 +09:00
qwopqwop200
0f2841cb13
remove log 2023-05-30 23:51:55 +09:00
qwopqwop200
33809a8e59
remove log 2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b
if trainable, change backend to pytorch 2023-05-30 23:43:55 +09:00
qwopqwop200
5274313067
if trainable, change backend to pytorch 2023-05-30 23:40:58 +09:00
PanQiWei
eb9c0b140f update FusedLlamaMLPForQuantizedModel for general usage purpose 2023-05-27 07:47:20 +08:00
PanQiWei
2b532f9453 add trainable mode 2023-05-26 13:11:30 +08:00
PanQiWei
fe5f5d12ed Merge branch 'main' into peft_integration 2023-05-26 09:48:06 +08:00
PanQiWei
69609c4bc7 support faster vecquant4matmul cuda kernel 2023-05-26 08:55:05 +08:00
PanQiWei
cfd27e8caa refactor file structure of qlinears 2023-05-26 07:18:16 +08:00
qwopqwop200
503f85255d
Update kernels.py 2023-05-25 23:15:33 +09:00