Commit graph

13 commits

Author    SHA1        Date                        Message
PanQiWei  7c2ec905a6  2023-08-13 16:13:44 +08:00  extrac rope logic into a single method for better override in child class
PanQiWei  fdb8c4500a  2023-08-11 14:52:26 +08:00  extend to support qlinear_exllama's fusion
PanQiWei  efe47aafe5  2023-08-10 15:36:54 +08:00  prevent potential import error
PanQiWei  26dc6852fe  2023-08-07 18:59:04 +08:00  support inherit one of the three fused attention class and customize attn_bias building logic
PanQiWei  e5f874e5af  2023-08-07 13:45:37 +08:00  add fused attention injection logic to llama
PanQiWei  1f9717af7f  2023-08-06 18:24:23 +08:00  change classes default values
PanQiWei  7a70bcf6d8  2023-08-06 17:23:57 +08:00  doing 'memory_efficient_fusion' in __init__
PanQiWei  677409e2fe  2023-08-06 16:23:19 +08:00  fix using wrong attribute
PanQiWei  9155ef3038  2023-08-06 15:37:11 +08:00  fix using wrong attribute
PanQiWei  f67b512cee  2023-08-06 14:54:34 +08:00  add 'training' argument
PanQiWei  7d0909160c  2023-08-04 20:03:16 +08:00  add fused MLPs
PanQiWei  8b19122775  2023-08-04 19:11:43 +08:00  add fused attentions
PanQiWei  cd8a674002  2023-08-04 19:10:32 +08:00  add FusedGeneralQuantLinear