Commit graph

100 commits

Author SHA1 Message Date
PanQiWei
fdb8c4500a extend to support qlinear_exllama's fusion 2023-08-11 14:52:26 +08:00
PanQiWei
efe47aafe5 prevent potential import error 2023-08-10 15:36:54 +08:00
潘其威(William)
beab695c5b Merge branch 'main' into xformers_integration 2023-08-10 15:27:11 +08:00
Felix Marty
4af7ea619d patch for transformers compatibility 2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184 make exllama_kernels compilation as optional 2023-08-09 17:42:22 +08:00
PanQiWei
26dc6852fe support inherit one of the three fused attention class and customize attn_bias building logic 2023-08-07 18:59:04 +08:00
qwopqwop200
2f48780165 fix bug disable exllama 2023-08-07 16:28:30 +09:00
PanQiWei
e5f874e5af add fused attention injection logic to llama 2023-08-07 13:45:37 +08:00
PanQiWei
1f9717af7f change classes default values 2023-08-06 18:24:23 +08:00
PanQiWei
7a70bcf6d8 doing 'memory_efficient_fusion' in __init__ 2023-08-06 17:23:57 +08:00
PanQiWei
01ce32553e remove unnecessary lines 2023-08-06 16:24:44 +08:00
PanQiWei
677409e2fe fix using wrong attribute 2023-08-06 16:23:19 +08:00
PanQiWei
9155ef3038 fix using wrong attribute 2023-08-06 15:37:11 +08:00
PanQiWei
f67b512cee add 'training' argument 2023-08-06 14:54:34 +08:00
PanQiWei
0fcfddda90 rename 'inject_to_model' to 'convert_to_torch_linear' 2023-08-06 12:09:16 +08:00
PanQiWei
2826729e73 use pytorch normal forward logic when trainable is True 2023-08-06 11:44:29 +08:00
PanQiWei
801610367d Merge branch 'main' into xformers_integration 2023-08-05 18:02:00 +08:00
fxmarty
71f23268eb Merge pull request #1 from qwopqwop200/exllama-q4-kernel (Exllama q4 kernel) 2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db rocm support 2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e Merge branch 'main' into exllama-q4-kernel 2023-08-04 15:13:34 +02:00
PanQiWei
7d0909160c add fused MLPs 2023-08-04 20:03:16 +08:00
PanQiWei
8b19122775 add fused attentions 2023-08-04 19:11:43 +08:00
PanQiWei
cd8a674002 add FusedGeneralQuantLinear 2023-08-04 19:10:32 +08:00
qwopqwop200
79ab5076c7 revert fused_llama_attn.py 2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7 exllama support flash attention 2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a support group query attention 2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51 register buffer of general quant linear 2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8 change pack func to support only 4 bit 2023-08-01 20:01:45 +09:00
qwopqwop200
a60c9a8552 add pack func 2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902 fix 2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e act-order now works fine 2023-07-31 15:36:58 +00:00
Felix Marty
38447262c0 fix fused attn 2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc cleaning 2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d exllama kernel 2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73 fix error raised when cuda kernels are not installed 2023-07-26 13:59:28 +08:00
qwopqwop200
9578c59d31 fix cuda bug 2023-07-25 16:50:05 +09:00
潘其威(William)
046c031139 Merge pull request #141 from AngainorDev/patch-1 (Fix error message) 2023-06-19 10:11:10 +08:00
Angainor Development
e75611e1b7 Fix error message 2023-06-05 22:19:09 +02:00
lunar
618a5f50ee Add transpose operator when replacing Conv1d with qlinear_cuda_old 2023-06-05 23:11:18 +08:00
qwopqwop200
f4820f2988 change qlinear cuda support 64dim 2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d support 64 cuda dim 2023-06-02 19:54:37 +09:00
qwopqwop200
b03f53294f support 64dim cuda 2023-06-02 19:53:50 +09:00
qwopqwop200
0891ea4036 support 32dim triton 2023-06-02 19:05:55 +09:00
qwopqwop200
b3654a68c3 support 32dim triton kernel 2023-06-02 19:04:12 +09:00
qwopqwop200
0f2841cb13 remove log 2023-05-30 23:51:55 +09:00
qwopqwop200
33809a8e59 remove log 2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b change if trainable backend pytorch 2023-05-30 23:43:55 +09:00
qwopqwop200
5274313067 change if trainable backend pytorch 2023-05-30 23:40:58 +09:00
PanQiWei
eb9c0b140f update FusedLlamaMLPForQuantizedModel for general usage purpose 2023-05-27 07:47:20 +08:00
PanQiWei
2b532f9453 add trainable mode 2023-05-26 13:11:30 +08:00