PanQiWei | 7c2ec905a6 | extract rope logic into a single method for easier overriding in child classes | 2023-08-13 16:13:44 +08:00
PanQiWei | fdb8c4500a | extend to support qlinear_exllama's fusion | 2023-08-11 14:52:26 +08:00
PanQiWei | efe47aafe5 | prevent potential import error | 2023-08-10 15:36:54 +08:00
潘其威(William) | beab695c5b | Merge branch 'main' into xformers_integration | 2023-08-10 15:27:11 +08:00
Felix Marty | 4af7ea619d | patch for transformers compatibility | 2023-08-09 14:23:59 +00:00
PanQiWei | 44c7a1a184 | make exllama_kernels compilation optional | 2023-08-09 17:42:22 +08:00
PanQiWei | 26dc6852fe | support inheriting one of the three fused attention classes and customizing attn_bias building logic | 2023-08-07 18:59:04 +08:00
qwopqwop200 | 2f48780165 | fix bug when disabling exllama | 2023-08-07 16:28:30 +09:00
PanQiWei | e5f874e5af | add fused attention injection logic to llama | 2023-08-07 13:45:37 +08:00
PanQiWei | 1f9717af7f | change classes' default values | 2023-08-06 18:24:23 +08:00
PanQiWei | 7a70bcf6d8 | perform 'memory_efficient_fusion' in __init__ | 2023-08-06 17:23:57 +08:00
PanQiWei | 01ce32553e | remove unnecessary lines | 2023-08-06 16:24:44 +08:00
PanQiWei | 677409e2fe | fix use of wrong attribute | 2023-08-06 16:23:19 +08:00
PanQiWei | 9155ef3038 | fix use of wrong attribute | 2023-08-06 15:37:11 +08:00
PanQiWei | f67b512cee | add 'training' argument | 2023-08-06 14:54:34 +08:00
PanQiWei | 0fcfddda90 | rename 'inject_to_model' to 'convert_to_torch_linear' | 2023-08-06 12:09:16 +08:00
PanQiWei | 2826729e73 | use PyTorch's normal forward logic when trainable is True | 2023-08-06 11:44:29 +08:00
PanQiWei | 801610367d | Merge branch 'main' into xformers_integration | 2023-08-05 18:02:00 +08:00
fxmarty | 71f23268eb | Merge pull request #1 from qwopqwop200/exllama-q4-kernel: Exllama q4 kernel | 2023-08-05 00:15:22 +09:00
Felix Marty | d0608b09db | ROCm support | 2023-08-04 13:38:02 +00:00
Félix Marty | 4fb3e20c5e | Merge branch 'main' into exllama-q4-kernel | 2023-08-04 15:13:34 +02:00
PanQiWei | 7d0909160c | add fused MLPs | 2023-08-04 20:03:16 +08:00
PanQiWei | 8b19122775 | add fused attentions | 2023-08-04 19:11:43 +08:00
PanQiWei | cd8a674002 | add FusedGeneralQuantLinear | 2023-08-04 19:10:32 +08:00
qwopqwop200 | 79ab5076c7 | revert fused_llama_attn.py | 2023-08-04 18:19:54 +09:00
qwopqwop200 | 068210d0b7 | support flash attention in exllama | 2023-08-03 16:30:16 +09:00
qwopqwop200 | 7a7df5655a | support group query attention | 2023-08-03 16:08:49 +09:00
leiwang1999 | a0de5c2c51 | register buffer of general quant linear | 2023-08-03 05:15:09 +00:00
qwopqwop200 | 3fc097dcd8 | change pack func to support only 4-bit | 2023-08-01 20:01:45 +09:00
qwopqwop200 | a60c9a8552 | add pack func | 2023-08-01 12:22:41 +09:00
Felix Marty | 339c57a902 | fix | 2023-07-31 15:57:44 +00:00
Felix Marty | 129fa4b67e | act-order now works fine | 2023-07-31 15:36:58 +00:00
Felix Marty | 38447262c0 | fix fused attn | 2023-07-31 13:46:32 +00:00
Felix Marty | 760667dccc | cleaning | 2023-07-31 11:58:10 +00:00
Felix Marty | 179776bd1d | exllama kernel | 2023-07-31 11:50:45 +00:00
PanQiWei | 5883b45d73 | fix error raised when cuda kernels are not installed | 2023-07-26 13:59:28 +08:00
qwopqwop200 | 9578c59d31 | fix cuda bug | 2023-07-25 16:50:05 +09:00
潘其威(William) | 046c031139 | Merge pull request #141 from AngainorDev/patch-1: Fix error message | 2023-06-19 10:11:10 +08:00
Angainor Development | e75611e1b7 | Fix error message | 2023-06-05 22:19:09 +02:00
lunar | 618a5f50ee | Add transpose operator when replacing Conv1d with qlinear_cuda_old | 2023-06-05 23:11:18 +08:00
qwopqwop200 | f4820f2988 | change qlinear cuda to support 64-dim | 2023-06-03 07:30:34 +09:00
qwopqwop200 | 2df7d7105d | support 64-dim cuda | 2023-06-02 19:54:37 +09:00
qwopqwop200 | b03f53294f | support 64-dim cuda | 2023-06-02 19:53:50 +09:00
qwopqwop200 | 0891ea4036 | support 32-dim triton | 2023-06-02 19:05:55 +09:00
qwopqwop200 | b3654a68c3 | support 32-dim triton kernel | 2023-06-02 19:04:12 +09:00
qwopqwop200 | 0f2841cb13 | remove log | 2023-05-30 23:51:55 +09:00
qwopqwop200 | 33809a8e59 | remove log | 2023-05-30 23:51:39 +09:00
qwopqwop200 | dfd9dc0e6b | use PyTorch backend if trainable | 2023-05-30 23:43:55 +09:00
qwopqwop200 | 5274313067 | use PyTorch backend if trainable | 2023-05-30 23:40:58 +09:00
PanQiWei | eb9c0b140f | update FusedLlamaMLPForQuantizedModel for general-purpose usage | 2023-05-27 07:47:20 +08:00