Vivek Khandelwal
e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The torch.bitwise_and usage is changed because, during tracing, the current form produces an in-place variant of the op, which causes an issue in the downstream lowering pipeline of the traced model via Torch-MLIR and IREE-SHARK. The usage is therefore changed so that it no longer results in an in-place variant.
-- The torch.matmul call in the forward function is changed because it currently assumes the weights are always fp16, so executing the model with float32 weights raises an error. The change casts the LHS of the matmul to the same dtype as the RHS (see the sketch after this entry).
Neither change affects the model's behavior in any way.
Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
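For reference, a minimal sketch of the two patterns this commit describes; the tensor names (`zeros`, `x`, `weight`), shapes, and values are illustrative placeholders, not the exact code from qlinear_cuda.py:

```python
import torch

bits = 4

# Placeholder for unpacked quantization zero-points.
zeros = torch.randint(0, 16, (8, 8), dtype=torch.int32)

# In-place style traces to an in-place variant of the op, which the
# Torch-MLIR / IREE-SHARK lowering pipeline rejects:
#   torch.bitwise_and(zeros, (2 ** bits) - 1, out=zeros)
# The out-of-place form traces cleanly:
zeros = torch.bitwise_and(zeros, (2 ** bits) - 1)

# Hard-coding fp16 fails when the model runs with float32 weights:
#   out = torch.matmul(x.half(), weight)
# Casting the LHS to the RHS dtype works for both fp16 and fp32:
x = torch.randn(2, 8, dtype=torch.float32)
weight = torch.randn(8, 8, dtype=torch.float32)
out = torch.matmul(x.to(weight.dtype), weight)
```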
潘其威(William)
3de7fbb0d5
Revert "fix bug (breaking change) remove (zeros -= 1)"
2023-09-27 10:37:31 +08:00
潘其威(William)
62fd0371ac
Merge branch 'main' into main
2023-09-26 14:09:04 +08:00
Marc Sun
c912bf361a
exllamav2 integration
2023-09-25 16:51:18 +00:00
qwopqwop200
6b1ceb1897
auto-disable fused attention if exllama is used
2023-09-06 18:14:04 +09:00
qwopqwop200
ad5b0d72ee
fix bug
2023-09-06 16:41:41 +09:00
qwopqwop200
45a1ee4d84
add install check for qigen
2023-08-31 14:37:39 +09:00
qwopqwop200
f23a06f911
Merge branch 'PanQiWei:main' into main
2023-08-17 15:22:43 +09:00
qwopqwop200
5d5b687ca8
qigen: format qlinear
2023-08-17 15:19:01 +09:00
PanQiWei
34b4ba451c
fix typo
2023-08-13 16:26:02 +08:00
qwopqwop200
a807e038bb
remove many contiguous calls and rename arguments
2023-08-11 16:09:42 +09:00
qwopqwop200
870be83bea
Merge branch 'PanQiWei:main' into main
2023-08-10 22:48:30 +09:00
qwopqwop200
7ba78af3ae
support cpu
2023-08-10 22:48:04 +09:00
Felix Marty
4af7ea619d
patch for transformers compatibility
2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184
make exllama_kernels compilation optional
2023-08-09 17:42:22 +08:00
qwopqwop200
2f48780165
fix bug when exllama is disabled
2023-08-07 16:28:30 +09:00
fxmarty
71f23268eb
Merge pull request #1 from qwopqwop200/exllama-q4-kernel
Exllama q4 kernel
2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db
rocm support
2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e
Merge branch 'main' into exllama-q4-kernel
2023-08-04 15:13:34 +02:00
qwopqwop200
79ab5076c7
revert fused_llama_attn.py
2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7
exllama: support flash attention
2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a
support group query attention
2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51
register buffer of general quant linear
2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8
change pack func to support only 4 bit
2023-08-01 20:01:45 +09:00
qwopqwop200
a60c9a8552
add pack func
2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902
fix
2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e
act-order now works fine
2023-07-31 15:36:58 +00:00
Felix Marty
38447262c0
fix fused attn
2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc
cleaning
2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d
exllama kernel
2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73
fix error raised when cuda kernels are not installed
2023-07-26 13:59:28 +08:00
qwopqwop200
9578c59d31
fix cuda bug
2023-07-25 16:50:05 +09:00
潘其威(William)
046c031139
Merge pull request #141 from AngainorDev/patch-1
Fix error message
2023-06-19 10:11:10 +08:00
Angainor Development
e75611e1b7
Fix error message
2023-06-05 22:19:09 +02:00
lunar
618a5f50ee
Add transpose operator when replacing Conv1d with qlinear_cuda_old
2023-06-05 23:11:18 +08:00
qwopqwop200
f4820f2988
change qlinear cuda to support 64dim
2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d
support 64dim cuda
2023-06-02 19:54:37 +09:00
qwopqwop200
b03f53294f
support 64dim cuda
2023-06-02 19:53:50 +09:00
qwopqwop200
0891ea4036
support 32dim triton
2023-06-02 19:05:55 +09:00
qwopqwop200
b3654a68c3
support 32dim triton kernel
2023-06-02 19:04:12 +09:00
qwopqwop200
0f2841cb13
remove log
2023-05-30 23:51:55 +09:00
qwopqwop200
33809a8e59
remove log
2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b
change backend to pytorch if trainable
2023-05-30 23:43:55 +09:00
qwopqwop200
5274313067
change backend to pytorch if trainable
2023-05-30 23:40:58 +09:00
PanQiWei
eb9c0b140f
update FusedLlamaMLPForQuantizedModel for general-purpose usage
2023-05-27 07:47:20 +08:00
PanQiWei
2b532f9453
add trainable mode
2023-05-26 13:11:30 +08:00
PanQiWei
fe5f5d12ed
Merge branch 'main' into peft_integration
2023-05-26 09:48:06 +08:00
PanQiWei
69609c4bc7
support faster vecquant4matmul cuda kernel
2023-05-26 08:55:05 +08:00
PanQiWei
cfd27e8caa
refactor file structure of qlinears
2023-05-26 07:18:16 +08:00
qwopqwop200
503f85255d
Update kernels.py
2023-05-25 23:15:33 +09:00