Vivek Khandelwal
e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The torch.bitwise_and usage is changed because, during tracing, the current form produces an in-place variant of the op, which causes an issue in the downstream lowering pipeline of the traced model via Torch-MLIR and IREE-SHARK. The usage is therefore changed so that it no longer results in an in-place variant.
-- The torch.matmul call in the forward function is changed because it currently assumes the weights are always fp16, so executing the model with float32 weights raises an error. The change casts the LHS of the matmul to the same dtype as the RHS (see the sketch after this entry).
Neither change affects the model's behavior in any way.
Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
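For reference, a minimal sketch of the two patterns this commit describes; the tensor names (`zeros`, `x`, `weight`), shapes, and values are illustrative placeholders, not the exact code from qlinear_cuda.py:

```python
import torch

bits = 4

# Placeholder for unpacked quantization zero-points.
zeros = torch.randint(0, 16, (8, 8), dtype=torch.int32)

# In-place style traces to an in-place variant of the op, which the
# Torch-MLIR / IREE-SHARK lowering pipeline rejects:
#   torch.bitwise_and(zeros, (2 ** bits) - 1, out=zeros)
# The out-of-place form traces cleanly:
zeros = torch.bitwise_and(zeros, (2 ** bits) - 1)

# Hard-coding fp16 fails when the model runs with float32 weights:
#   out = torch.matmul(x.half(), weight)
# Casting the LHS to the RHS dtype works for both fp16 and fp32:
x = torch.randn(2, 8, dtype=torch.float32)
weight = torch.randn(8, 8, dtype=torch.float32)
out = torch.matmul(x.to(weight.dtype), weight)
```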
潘其威(William)
3de7fbb0d5
Revert "fix bug (breaking change) remove (zeros -= 1)"
2023-09-27 10:37:31 +08:00
潘其威(William)
62fd0371ac
Merge branch 'main' into main
2023-09-26 14:09:04 +08:00
Marc Sun
c912bf361a
exllamav2 integration
2023-09-25 16:51:18 +00:00
qwopqwop200
6b1ceb1897
auto-disable fused attention if exllama is used
2023-09-06 18:14:04 +09:00
qwopqwop200
ad5b0d72ee
fix bug
2023-09-06 16:41:41 +09:00
qwopqwop200
45a1ee4d84
add install check for qigen
2023-08-31 14:37:39 +09:00
qwopqwop200
f23a06f911
Merge branch 'PanQiWei:main' into main
2023-08-17 15:22:43 +09:00
qwopqwop200
5d5b687ca8
qigen: format qlinear
2023-08-17 15:19:01 +09:00
PanQiWei
34b4ba451c
fix typo
2023-08-13 16:26:02 +08:00
qwopqwop200
a807e038bb
remove many contiguous calls and rename arguments
2023-08-11 16:09:42 +09:00
qwopqwop200
870be83bea
Merge branch 'PanQiWei:main' into main
2023-08-10 22:48:30 +09:00
qwopqwop200
7ba78af3ae
support cpu
2023-08-10 22:48:04 +09:00
Felix Marty
4af7ea619d
patch for transformers compatibility
2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184
make exllama_kernels compilation optional
2023-08-09 17:42:22 +08:00
qwopqwop200
2f48780165
fix bug when exllama is disabled
2023-08-07 16:28:30 +09:00
fxmarty
71f23268eb
Merge pull request #1 from qwopqwop200/exllama-q4-kernel
Exllama q4 kernel
2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db
rocm support
2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e
Merge branch 'main' into exllama-q4-kernel
2023-08-04 15:13:34 +02:00
qwopqwop200
79ab5076c7
revert fused_llama_attn.py
2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7
exllama: support flash attention
2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a
support group query attention
2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51
register buffer of general quant linear
2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8
change pack func to support only 4 bit
2023-08-01 20:01:45 +09:00
qwopqwop200
a60c9a8552
add pack func
2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902
fix
2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e
act-order now works fine
2023-07-31 15:36:58 +00:00
Felix Marty
38447262c0
fix fused attn
2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc
cleaning
2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d
exllama kernel
2023-07-31 11:50:45 +00:00
PanQiWei
5883b45d73
fix error raised when cuda kernels are not installed
2023-07-26 13:59:28 +08:00
qwopqwop200
9578c59d31
fix cuda bug
2023-07-25 16:50:05 +09:00
潘其威(William)
046c031139
Merge pull request #141 from AngainorDev/patch-1
Fix error message
2023-06-19 10:11:10 +08:00
Angainor Development
e75611e1b7
Fix error message
2023-06-05 22:19:09 +02:00
lunar
618a5f50ee
Add transpose operator when replacing Conv1d with qlinear_cuda_old
2023-06-05 23:11:18 +08:00
qwopqwop200
f4820f2988
change qlinear cuda to support 64dim
2023-06-03 07:30:34 +09:00
qwopqwop200
2df7d7105d
support 64dim cuda
2023-06-02 19:54:37 +09:00
qwopqwop200
b03f53294f
support 64dim cuda
2023-06-02 19:53:50 +09:00
qwopqwop200
0891ea4036
support 32dim triton
2023-06-02 19:05:55 +09:00
qwopqwop200
b3654a68c3
support 32dim triton kernel
2023-06-02 19:04:12 +09:00
qwopqwop200
0f2841cb13
remove log
2023-05-30 23:51:55 +09:00
qwopqwop200
33809a8e59
remove log
2023-05-30 23:51:39 +09:00
qwopqwop200
dfd9dc0e6b
change backend to pytorch if trainable
2023-05-30 23:43:55 +09:00
qwopqwop200
5274313067
change backend to pytorch if trainable
2023-05-30 23:40:58 +09:00
PanQiWei
eb9c0b140f
update FusedLlamaMLPForQuantizedModel for general-purpose usage
2023-05-27 07:47:20 +08:00
PanQiWei
2b532f9453
add trainable mode
2023-05-26 13:11:30 +08:00
PanQiWei
fe5f5d12ed
Merge branch 'main' into peft_integration
2023-05-26 09:48:06 +08:00
PanQiWei
69609c4bc7
support faster vecquant4matmul cuda kernel
2023-05-26 08:55:05 +08:00
PanQiWei
cfd27e8caa
refactor file structure of qlinears
2023-05-26 07:18:16 +08:00
qwopqwop200
503f85255d
Update kernels.py
2023-05-25 23:15:33 +09:00