AutoGPTQ

Author	SHA1	Message	Date
潘其威(William)	17db71491f	Update README.md: merge the example code of downloading from and uploading to HF Hub into the simplest usage code above to keep the README compact.	2023-05-30 05:49:29 +08:00
TheBloke	b7bb50b4d5	Fix bug added after merge	2023-05-25 07:05:51 +01:00
Tom Jobbins	492255b400	Merge branch 'main' into TheBloke_support-HF-download	2023-05-25 07:02:13 +01:00
潘其威(William)	18c7ce5875	Merge pull request #100 from PanQiWei/improve_cpu_offload: Improve CPU offload	2023-05-24 18:48:37 +08:00
PanQiWei	c341a6df2f	update tutorial	2023-05-24 18:48:19 +08:00
PanQiWei	ac14180946	update tutorial	2023-05-24 18:31:59 +08:00
PanQiWei	065fd1de35	update README	2023-05-24 18:26:47 +08:00
PanQiWei	e6ba062c08	update basic usage example code	2023-05-24 17:58:01 +08:00
PanQiWei	94ef4d5ada	update basic usage example code	2023-05-24 17:56:46 +08:00
PanQiWei	c89bb6450c	correct typo of function name	2023-05-24 17:43:38 +08:00
PanQiWei	10347fdd7b	remove full_cpu_offload argument and unify model dispatch strategy	2023-05-24 17:41:04 +08:00
PanQiWei	379f24c2a5	remove add_align_logits_hook_to_model	2023-05-24 17:01:57 +08:00
PanQiWei	749dba1a7e	disable add_align_logits_hook_to_model for now	2023-05-24 13:42:06 +08:00
PanQiWei	58c1b509f0	support add_align_logits_hook_to_model	2023-05-24 12:50:30 +08:00
PanQiWei	21ab7c435a	make comments more readable	2023-05-24 11:38:29 +08:00
PanQiWei	c31b370228	make_sure_not_tensor_in_meta_device before loading checkpoint	2023-05-24 11:32:45 +08:00
PanQiWei	63f1b4e073	remove comment	2023-05-24 11:23:07 +08:00
PanQiWei	057c39e3f2	fix meta device bug when using low_cpu_mem_usage	2023-05-24 11:19:59 +08:00
PanQiWei	e2e7809a1f	always enable QuantLinear bias for compatibility with models quantized by other frameworks	2023-05-24 10:56:31 +08:00
PanQiWei	8e034b28bc	remove duplicate code	2023-05-23 23:48:15 +08:00
PanQiWei	4373d6b29c	Merge branch 'main' into improve_cpu_offload	2023-05-23 23:47:33 +08:00
PanQiWei	191da8141e	fix device mismatch	2023-05-23 23:22:52 +08:00
PanQiWei	e4e90e8b0a	add warmup_triton method	2023-05-23 23:18:46 +08:00
PanQiWei	ed14d3a786	fix failure to save quantized model when the pretrained model is loaded using CPU offload	2023-05-23 23:17:11 +08:00
潘其威(William)	7820322089	Merge pull request #66 from LexSong/main: Fix CUDA out of memory error in qlinear_old.py	2023-05-23 23:04:45 +08:00
PanQiWei	6476ee4235	add options: 'low_cpu_mem_usage' and 'full_cpu_offload'	2023-05-23 22:51:00 +08:00
PanQiWei	c63959365a	update setup.py	2023-05-23 19:30:47 +08:00
PanQiWei	1b2159bd4c	add more help functions	2023-05-23 19:30:28 +08:00
PanQiWei	db63c0876a	half out	2023-05-23 16:08:28 +08:00
潘其威(William)	1bb7be3dd3	Update issue templates	2023-05-23 15:55:48 +08:00
潘其威(William)	a85d65e915	Update issue templates	2023-05-23 15:53:07 +08:00
Lex Song	f2ab4fab46	Fix CUDA out of memory error in qlinear_old.py: add a missing line from qlinear.py to qlinear_old.py to convert the output tensor. This resolves a CUDA out of memory error that occurred without this line.	2023-05-20 21:10:11 +08:00
TheBloke	bf633c298e	Clean up some unused params	2023-05-20 10:32:27 +01:00
潘其威(William)	d4011d29c6	Merge pull request #92 from PanQiWei/fix_triton_integration_bugs: fix ImportError when triton is not installed	2023-05-20 17:01:14 +08:00
潘其威(William)	809efa6fcb	Update README_zh.md	2023-05-20 16:53:27 +08:00
潘其威(William)	082e76713e	Update README.md	2023-05-20 16:52:43 +08:00
潘其威(William)	0ca1752a9b	Merge pull request #93 from TheBloke/TheBloke_rename-quant_cuda2: Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations.	2023-05-20 16:44:02 +08:00
PanQiWei	b803369719	update quant_with_alpaca.py	2023-05-20 16:43:21 +08:00
PanQiWei	f78f074409	update quant_with_alpaca.py	2023-05-20 16:42:34 +08:00
TheBloke	898f1ef62d	Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations.	2023-05-20 09:33:51 +01:00
PanQiWei	73b5952f5e	fix not returning directly when triton is not installed	2023-05-20 16:21:52 +08:00
PanQiWei	86b3b52c63	fix ImportError when triton is not installed	2023-05-20 16:15:20 +08:00
潘其威(William)	13defe253a	Merge pull request #84 from TheBloke/TheBloke_forward-positional-args: Forward positional args to allow `model(tokens)` syntax	2023-05-20 15:04:27 +08:00
潘其威(William)	d0b7908a2c	Merge pull request #82 from Ph0rk0z/patch-1: Update example script to include desc_act	2023-05-20 15:03:18 +08:00
潘其威(William)	1ef0af824a	Merge pull request #80 from PanQiWei/user_customized_device_map: Support user-customized `device_map`	2023-05-20 15:00:05 +08:00
TheBloke	277a007ebc	Minor clarification and clean up of example script	2023-05-19 18:33:19 +01:00
TheBloke	e5c8479100	Remove debugging print line	2023-05-19 17:50:48 +01:00
TheBloke	c234bf11f9	Update README with examples for HF (Chinese text is from Google Translate - please check! :) )	2023-05-19 17:39:49 +01:00
TheBloke	735f7df4cc	Add push_to_hub for HF hub uploading	2023-05-19 17:10:57 +01:00
TheBloke	908b338436	Initial support for model loading from HF hub	2023-05-19 15:57:05 +01:00


325 commits