潘其威(William) | 1e353a8dc5 | Merge pull request #24 from PanQiWei/speedup_quantization: Offloading and Multiple devices quantization/inference | 2023-04-28 18:50:12 +08:00
PanQiWei | bdb713b5a3 | add batch_size to model.quant() api | 2023-04-28 18:26:07 +08:00
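The batch_size commit above implies that calibration examples are fed through the model in batches during quantization. A minimal plain-Python sketch of such batching (the `chunk_examples` helper and the shapes are illustrative assumptions, not AutoGPTQ's actual internals):

```python
def chunk_examples(examples, batch_size):
    """Split a list of calibration examples into batches of at most batch_size."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

# Hypothetical usage mirroring a quant(examples, batch_size=...) style API:
calibration = [f"example {i}" for i in range(10)]
batches = chunk_examples(calibration, batch_size=4)
print([len(b) for b in batches])  # -> [4, 4, 2]
```

Larger batches trade memory for fewer forward passes over the calibration set.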
PanQiWei | 3dfc87bec3 | return module in .to function | 2023-04-28 17:20:46 +08:00
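Returning the module from `.to` matches the PyTorch convention that `nn.Module.to` returns `self`, which is what makes call chaining like `model.to(device).generate(...)` possible. A toy sketch of the pattern (the wrapper class is hypothetical, not the repository's actual code):

```python
class QuantizedModelWrapper:
    """Toy stand-in for a model wrapper whose .to() forwards to a wrapped module."""
    def __init__(self, module):
        self.module = module
        self.device = "cpu"

    def to(self, device):
        self.device = device  # pretend to move the wrapped module
        return self           # returning self enables chaining, as in nn.Module.to

m = QuantizedModelWrapper(module=object())
assert m.to("cuda:0") is m    # chainable: m.to(...).<next call>
```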
PanQiWei | a69a73a22c | fix device mismatch when using the model directly for inference after quantization | 2023-04-28 16:41:46 +08:00
qwopqwop200 | 3f90a22632 | fix bug | 2023-04-28 08:26:58 +09:00
PanQiWei | d0cd5af5d3 | make code more robust | 2023-04-28 01:29:12 +08:00
PanQiWei | 51d2e53130 | add support for CPU offloading and multi-GPU inference on quantized models | 2023-04-28 00:53:57 +08:00
PanQiWei | b14dca9207 | disk offload assertion | 2023-04-27 21:31:53 +08:00
PanQiWei | 7a3397e7ba | add CPU offload when doing quantization | 2023-04-27 21:25:24 +08:00
PanQiWei | 498de923f2 | support multi-GPU quantization | 2023-04-27 18:48:43 +08:00
qwopqwop200 | 8b6ee04aee | add option | 2023-04-27 17:29:36 +09:00
PanQiWei | a2abff983e | support dispatching layers to different devices when loading a pretrained model before quantization | 2023-04-27 02:24:08 +08:00
PanQiWei | 950f203260 | add 'n_positions' to sequence length search list | 2023-04-27 01:09:10 +08:00
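Different model families store the maximum sequence length under different config attribute names (e.g. GPT-2-style configs use `n_positions` rather than `max_position_embeddings`), hence a search list. A minimal sketch of such a lookup, with a fake config class standing in for a transformers config (the key list and default are assumptions for illustration):

```python
class FakeConfig:
    """Minimal stand-in for a transformers model config."""
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

# Candidate attribute names tried in order; the commit above adds
# 'n_positions' to this kind of search list.
SEQLEN_KEYS = ["max_position_embeddings", "seq_length", "n_positions"]

def find_seqlen(config, default=2048):
    for key in SEQLEN_KEYS:
        value = getattr(config, key, None)
        if isinstance(value, int):
            return value
    return default  # fall back when no known attribute is present

print(find_seqlen(FakeConfig(n_positions=1024)))  # -> 1024
```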
PanQiWei | 893c3264cb | make ignored-layer handling more robust | 2023-04-26 19:35:19 +08:00
PanQiWei | f2359f56cb | add support for using push_to_hub to upload and share quantized models | 2023-04-26 16:55:01 +08:00
PanQiWei | 975f100d0f | init Quantizer() at GPTQ() init stage | 2023-04-25 23:13:09 +08:00
PanQiWei | 062b34f31a | add inference_mode and autocast context managers to the generate function | 2023-04-25 20:47:33 +08:00
PanQiWei | 31d683f85b | add option to choose whether to run autotune warmup after quantization | 2023-04-25 20:29:05 +08:00
PanQiWei | 9c405b1628 | add triton support | 2023-04-25 20:05:22 +08:00
PanQiWei | 832dc4a7a1 | refactor file structure | 2023-04-25 18:58:20 +08:00
PanQiWei | 419160b733 | always trust remote code | 2023-04-25 12:52:49 +08:00
PanQiWei | f748dad2e1 | always trust remote code | 2023-04-25 12:13:46 +08:00
PanQiWei | 7d3a625cee | fix mismatch in GPTNeoxForCausalLM's lm_head | 2023-04-24 20:51:56 +08:00
PanQiWei | 1a8c460262 | fix problem where some models require additional positional arguments in the transformer layer's forward function | 2023-04-24 14:46:21 +08:00
PanQiWei | 37c0a80092 | fix problem where some models cannot get seqlen from model.config.max_position_embeddings | 2023-04-24 14:24:00 +08:00
PanQiWei | 4763c0b9a1 | fix bugs | 2023-04-23 19:27:16 +08:00
PanQiWei | a830a62bc3 | fix bugs in attention_mask and position_ids handling | 2023-04-20 18:32:21 +08:00
PanQiWei | bcc7e0a051 | make BaseGPTQForCausalLM an nn.Module, add more shortcut APIs, and fix some bugs | 2023-04-17 01:15:30 +08:00
PanQiWei | 969ec250ad | add shortcut to model.to method | 2023-04-17 00:34:14 +08:00
PanQiWei | 12ae4d024c | fix gptj forward and add torch.no_grad context manager | 2023-04-17 00:15:41 +08:00
PanQiWei | 87402e01c9 | fix some errors in quantized model loading | 2023-04-16 22:32:13 +08:00
PanQiWei | 036a151f10 | fix error when save_dir does not exist | 2023-04-16 22:20:54 +08:00
PanQiWei | 42b28f032f | add shortcut to model.config | 2023-04-16 22:12:42 +08:00
PanQiWei | 0c4b039925 | add setup.py | 2023-04-16 21:06:15 +08:00
PanQiWei | 229b61e20e | first init | 2023-04-14 01:09:40 +08:00