Commit graph

325 commits

Author SHA1 Message Date
潘其威(William)
17db71491f Update README.md
merge the example code of downloading from and uploading to HF Hub into simplest usage code above to keep README compact.
2023-05-30 05:49:29 +08:00
TheBloke
b7bb50b4d5 Fix bug added after merge 2023-05-25 07:05:51 +01:00
Tom Jobbins
492255b400 Merge branch 'main' into TheBloke_support-HF-download 2023-05-25 07:02:13 +01:00
潘其威(William)
18c7ce5875 Merge pull request #100 from PanQiWei/improve_cpu_offload
Improve CPU offload
2023-05-24 18:48:37 +08:00
PanQiWei
c341a6df2f update tutorial 2023-05-24 18:48:19 +08:00
PanQiWei
ac14180946 update tutorial 2023-05-24 18:31:59 +08:00
PanQiWei
065fd1de35 update README 2023-05-24 18:26:47 +08:00
PanQiWei
e6ba062c08 update basic usage example code 2023-05-24 17:58:01 +08:00
PanQiWei
94ef4d5ada update basic usage example code 2023-05-24 17:56:46 +08:00
PanQiWei
c89bb6450c correct typo of function name 2023-05-24 17:43:38 +08:00
PanQiWei
10347fdd7b remove full_cpu_offload argument and unify model dispatch strategy 2023-05-24 17:41:04 +08:00
PanQiWei
379f24c2a5 remove add_align_logits_hook_to_model 2023-05-24 17:01:57 +08:00
PanQiWei
749dba1a7e disable add_align_logits_hook_to_model for now 2023-05-24 13:42:06 +08:00
PanQiWei
58c1b509f0 support add_align_logits_hook_to_model 2023-05-24 12:50:30 +08:00
PanQiWei
21ab7c435a make comments more readable 2023-05-24 11:38:29 +08:00
PanQiWei
c31b370228 make_sure_not_tensor_in_meta_device before load checkpoint 2023-05-24 11:32:45 +08:00
PanQiWei
63f1b4e073 remove comment 2023-05-24 11:23:07 +08:00
PanQiWei
057c39e3f2 fix meta device bug when use low_cpu_mem_usage 2023-05-24 11:19:59 +08:00
PanQiWei
e2e7809a1f always to enable QuantLinear bias to make compatible with model quantized from other frameworks 2023-05-24 10:56:31 +08:00
PanQiWei
8e034b28bc remove duplicate code 2023-05-23 23:48:15 +08:00
PanQiWei
4373d6b29c Merge branch 'main' into improve_cpu_offload 2023-05-23 23:47:33 +08:00
PanQiWei
191da8141e fix device mismatch 2023-05-23 23:22:52 +08:00
PanQiWei
e4e90e8b0a add warmup_triton method 2023-05-23 23:18:46 +08:00
PanQiWei
ed14d3a786 fix save quantized model failed when load pretrained model using CPU offload 2023-05-23 23:17:11 +08:00
潘其威(William)
7820322089 Merge pull request #66 from LexSong/main
Fix CUDA out of memory error in qlinear_old.py
2023-05-23 23:04:45 +08:00
PanQiWei
6476ee4235 add options: 'low_cpu_mem_usage' and 'full_cpu_offload' 2023-05-23 22:51:00 +08:00
PanQiWei
c63959365a update setup.py 2023-05-23 19:30:47 +08:00
PanQiWei
1b2159bd4c add more help functions 2023-05-23 19:30:28 +08:00
PanQiWei
db63c0876a half out 2023-05-23 16:08:28 +08:00
潘其威(William)
1bb7be3dd3 Update issue templates 2023-05-23 15:55:48 +08:00
潘其威(William)
a85d65e915 Update issue templates 2023-05-23 15:53:07 +08:00
Lex Song
f2ab4fab46 Fix CUDA out of memory error in qlinear_old.py
Add a missing line from qlinear.py to qlinear_old.py to convert the output tensor.
This resolves a CUDA out of memory error that occurred without this line.
2023-05-20 21:10:11 +08:00
TheBloke
bf633c298e Clean up some unused params 2023-05-20 10:32:27 +01:00
潘其威(William)
d4011d29c6 Merge pull request #92 from PanQiWei/fix_triton_integration_bugs
fix ImportError when triton is not installed
2023-05-20 17:01:14 +08:00
潘其威(William)
809efa6fcb Update README_zh.md 2023-05-20 16:53:27 +08:00
潘其威(William)
082e76713e Update README.md 2023-05-20 16:52:43 +08:00
潘其威(William)
0ca1752a9b Merge pull request #93 from TheBloke/TheBloke_rename-quant_cuda2
Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations.
2023-05-20 16:44:02 +08:00
PanQiWei
b803369719 update quant_with_alpaca.py 2023-05-20 16:43:21 +08:00
PanQiWei
f78f074409 update quant_with_alpaca.py 2023-05-20 16:42:34 +08:00
TheBloke
898f1ef62d Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations. 2023-05-20 09:33:51 +01:00
PanQiWei
73b5952f5e fix not return directly when triton is not installed 2023-05-20 16:21:52 +08:00
PanQiWei
86b3b52c63 fix ImportError when triton is not installed 2023-05-20 16:15:20 +08:00
潘其威(William)
13defe253a Merge pull request #84 from TheBloke/TheBloke_forward-positional-args
Forward position args to allow `model(tokens)` syntax
2023-05-20 15:04:27 +08:00
潘其威(William)
d0b7908a2c Merge pull request #82 from Ph0rk0z/patch-1
Update example script to include desc_act
2023-05-20 15:03:18 +08:00
潘其威(William)
1ef0af824a Merge pull request #80 from PanQiWei/user_customized_device_map
Support users customize `device_map`
2023-05-20 15:00:05 +08:00
TheBloke
277a007ebc Minor clarification and clean up of example script 2023-05-19 18:33:19 +01:00
TheBloke
e5c8479100 Remove debugging print line 2023-05-19 17:50:48 +01:00
TheBloke
c234bf11f9 Update README with examples for HF (Chinese text is from Google Translate - please check! :) ) 2023-05-19 17:39:49 +01:00
TheBloke
735f7df4cc Add push_to_hub for HF hub uploading 2023-05-19 17:10:57 +01:00
TheBloke
908b338436 Initial support for model loading from HF hub 2023-05-19 15:57:05 +01:00