Commit graph

100 commits

Author SHA1 Message Date
PanQiWei
fdb8c4500a extend to support qlinear_exllama's fusion 2023-08-11 14:52:26 +08:00
PanQiWei
efe47aafe5 prevent potential import error 2023-08-10 15:36:54 +08:00
潘其威(William)
beab695c5b Merge branch 'main' into xformers_integration 2023-08-10 15:27:11 +08:00
Felix Marty
4af7ea619d patch for transformers compatibility 2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184 make exllama_kernels compilation as optional 2023-08-09 17:42:22 +08:00
PanQiWei
26dc6852fe support inherit one of the three fused attention class and customize attn_bias building logic 2023-08-07 18:59:04 +08:00
qwopqwop200
2f48780165 fix bug disable exllama 2023-08-07 16:28:30 +09:00
PanQiWei
e5f874e5af add fused attention injection logic to llama 2023-08-07 13:45:37 +08:00
PanQiWei
1f9717af7f change classes default values 2023-08-06 18:24:23 +08:00
PanQiWei
7a70bcf6d8 doing 'memory_efficient_fusion' in __init__ 2023-08-06 17:23:57 +08:00
PanQiWei
01ce32553e remove unnecessary lines 2023-08-06 16:24:44 +08:00
PanQiWei
677409e2fe fix using wrong attribute 2023-08-06 16:23:19 +08:00
PanQiWei
9155ef3038 fix using wrong attribute 2023-08-06 15:37:11 +08:00
PanQiWei
f67b512cee add 'training' argument 2023-08-06 14:54:34 +08:00
PanQiWei
0fcfddda90 rename 'inject_to_model' to 'convert_to_torch_linear' 2023-08-06 12:09:16 +08:00
PanQiWei
2826729e73 use pytorch normal forward logic when trainable is True 2023-08-06 11:44:29 +08:00
PanQiWei
801610367d Merge branch 'main' into xformers_integration 2023-08-05 18:02:00 +08:00
fxmarty
71f23268eb Merge pull request #1 from qwopqwop200/exllama-q4-kernel (Exllama q4 kernel) 2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db rocm support 2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e Merge branch 'main' into exllama-q4-kernel 2023-08-04 15:13:34 +02:00
PanQiWei
7d0909160c add fused MLPs 2023-08-04 20:03:16 +08:00
PanQiWei
8b19122775 add fused attentions 2023-08-04 19:11:43 +08:00
PanQiWei
cd8a674002 add FusedGeneralQuantLinear 2023-08-04 19:10:32 +08:00
qwopqwop200
79ab5076c7 revert fused_llama_attn.py 2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7 exllama support flash attention 2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a support group query attention 2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51 register buffer of general quant linear 2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8 change pack func to support only 4 bit 2023-08-01 20:01:45 +09:00
qwopqwop200
a60c9a8552 add pack func 2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902 fix 2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e act-order now works fine 2023-07-31 15:36:58 +00:00
Felix Marty
38447262c0 fix fused attn 2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc cleaning 2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d exllama kernel 2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73 fix error raised when cuda kernels are not installed 2023-07-26 13:59:28 +08:00
qwopqwop200
9578c59d31 fix cuda bug 2023-07-25 16:50:05 +09:00
潘其威(William)
046c031139 Merge pull request #141 from AngainorDev/patch-1 (Fix error message) 2023-06-19 10:11:10 +08:00
Angainor Development
e75611e1b7 Fix error message 2023-06-05 22:19:09 +02:00
lunar
618a5f50ee Add transpose operator when replacing Conv1d with qlinear_cuda_old 2023-06-05 23:11:18 +08:00
qwopqwop200
f4820f2988 change qlinear cuda support 64dim 2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d support 64 cuda dim 2023-06-02 19:54:37 +09:00
qwopqwop200
b03f53294f support 64dim cuda 2023-06-02 19:53:50 +09:00
qwopqwop200
0891ea4036 support 32dim triton 2023-06-02 19:05:55 +09:00
qwopqwop200
b3654a68c3 support 32dim triton kernel 2023-06-02 19:04:12 +09:00
qwopqwop200
0f2841cb13 remove log 2023-05-30 23:51:55 +09:00
qwopqwop200
33809a8e59 remove log 2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b change if trainable backend pytorch 2023-05-30 23:43:55 +09:00
qwopqwop200
5274313067 change if trainable backend pytorch 2023-05-30 23:40:58 +09:00
PanQiWei
eb9c0b140f update FusedLlamaMLPForQuantizedModel for general usage purpose 2023-05-27 07:47:20 +08:00
PanQiWei
2b532f9453 add trainable mode 2023-05-26 13:11:30 +08:00