AutoGPTQ

Author	SHA1	Message	Date
qwopqwop200	f23a06f911	Merge branch 'PanQiWei:main' into main	2023-08-17 15:22:43 +09:00
qwopqwop200	5d5b687ca8	qigen formatting qlinear	2023-08-17 15:19:01 +09:00
PanQiWei	34b4ba451c	fix typo	2023-08-13 16:26:02 +08:00
qwopqwop200	a807e038bb	remove many contiguous and change arguments name	2023-08-11 16:09:42 +09:00
qwopqwop200	870be83bea	Merge branch 'PanQiWei:main' into main	2023-08-10 22:48:30 +09:00
qwopqwop200	7ba78af3ae	support cpu	2023-08-10 22:48:04 +09:00
Felix Marty	4af7ea619d	patch for transformers compatiblity	2023-08-09 14:23:59 +00:00
PanQiWei	44c7a1a184	make exllama_kernels compilation as optional	2023-08-09 17:42:22 +08:00
qwopqwop200	2f48780165	fix bug disable exlllama	2023-08-07 16:28:30 +09:00
fxmarty	71f23268eb	Merge pull request #1 from qwopqwop200/exllama-q4-kernel Exllama q4 kernel	2023-08-05 00:15:22 +09:00
Felix Marty	d0608b09db	rocm support	2023-08-04 13:38:02 +00:00
Félix Marty	4fb3e20c5e	Merge branch 'main' into exllama-q4-kernel	2023-08-04 15:13:34 +02:00
qwopqwop200	79ab5076c7	revert fused_llama_attn.py	2023-08-04 18:19:54 +09:00
qwopqwop200	068210d0b7	exllama support flash attention	2023-08-03 16:30:16 +09:00
qwopqwop200	7a7df5655a	support group query attention	2023-08-03 16:08:49 +09:00
leiwang1999	a0de5c2c51	regist buffer of general quant linear	2023-08-03 05:15:09 +00:00
qwopqwop200	3fc097dcd8	change pcak func support only 4 bit	2023-08-01 20:01:45 +09:00
qwopqwop200	a60c9a8552	add pack fun	2023-08-01 12:22:41 +09:00
Felix Marty	339c57a902	fix	2023-07-31 15:57:44 +00:00
Felix Marty	129fa4b67e	act-order now works fine	2023-07-31 15:36:58 +00:00
Felix Marty	38447262c0	fix fused attn	2023-07-31 13:46:32 +00:00
Felix Marty	760667dccc	cleaning	2023-07-31 11:58:10 +00:00
Felix Marty	179776bd1d	exllama kernel	2023-07-31 11:50:45 +00:00
PanQiWei	5883b45d73	fix error raised when cuda kernels are not installed	2023-07-26 13:59:28 +08:00
qwopqwop200	9578c59d31	fix cuda bug	2023-07-25 16:50:05 +09:00
潘其威(William)	046c031139	Merge pull request #141 from AngainorDev/patch-1 Fix error message	2023-06-19 10:11:10 +08:00
Angainor Development	e75611e1b7	Fix error message	2023-06-05 22:19:09 +02:00
lunar	618a5f50ee	Add transpose operator when replace Conv1d with qlinear_cuda_old	2023-06-05 23:11:18 +08:00
qwopqwop200	f4820f2988	change qlinear cuda support 64dim	2023-06-03 07:30:34 +09:00
qwopqwop200	2df7d7105d	support 64 cuda dim	2023-06-02 19:54:37 +09:00
qwopqwop200	b03f53294f	support 64dim cuda	2023-06-02 19:53:50 +09:00
qwopqwop200	0891ea4036	support 32dim triton]	2023-06-02 19:05:55 +09:00
qwopqwop200	b3654a68c3	support 32dim triton kernel	2023-06-02 19:04:12 +09:00
qwopqwop200	0f2841cb13	remove log	2023-05-30 23:51:55 +09:00
qwopqwop200	33809a8e59	remove log	2023-05-30 23:51:39 +09:00
qwopqwop200	dfd9dc0e6b	change if trainable backend pytorch	2023-05-30 23:43:55 +09:00
qwopqwop200	5274313067	change if trainable backend pytorch	2023-05-30 23:40:58 +09:00
PanQiWei	eb9c0b140f	update FusedLlamaMLPForQuantizedModel for general usage purpose	2023-05-27 07:47:20 +08:00
PanQiWei	2b532f9453	add trainable mode	2023-05-26 13:11:30 +08:00
PanQiWei	fe5f5d12ed	Merge branch 'main' into peft_integration	2023-05-26 09:48:06 +08:00
PanQiWei	69609c4bc7	support faster vecquant4matmul cuda kernel	2023-05-26 08:55:05 +08:00
PanQiWei	cfd27e8caa	refactor file structure of qlinears	2023-05-26 07:18:16 +08:00
qwopqwop200	503f85255d	Update kernels.py	2023-05-25 23:15:33 +09:00
PanQiWei	8e034b28bc	remove duplicate code	2023-05-23 23:48:15 +08:00
PanQiWei	4373d6b29c	Merge branch 'main' into improve_cpu_offload	2023-05-23 23:47:33 +08:00
PanQiWei	db63c0876a	half out	2023-05-23 16:08:28 +08:00
Lex Song	f2ab4fab46	Fix CUDA out of memory error in qlinear_old.py. Add a missing line from qlinear.py to qlinear_old.py to convert the output tensor. This resolves a CUDA out of memory error that occurred without this line.	2023-05-20 21:10:11 +08:00
潘其威(William)	d4011d29c6	Merge pull request #92 from PanQiWei/fix_triton_integration_bugs fix ImportError when triton is not installed	2023-05-20 17:01:14 +08:00
TheBloke	898f1ef62d	Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations.	2023-05-20 09:33:51 +01:00
PanQiWei	73b5952f5e	fix not return directly when triton is not installed	2023-05-20 16:21:52 +08:00

89 commits