Update 02-Advanced-Model-Loading-and-Best-Practice.md

潘其威(William) 2023-05-12 19:47:05 +08:00 committed by GitHub
parent 393a2fbac2
commit 6f887f666a


@@ -59,7 +59,7 @@ In the simplest way, you can set `device_map='auto'` and let 🤗 Accelerate han
### At Quantization
It's always recommended to first consider loading the whole model into GPU(s), for this saves the time spent transferring module weights between CPU and GPU.
-However, not everyone has large GPU memory. Roughly speaking: always specify the maximum CPU memory that will be used to load the model; then, on each GPU, reserve enough memory to hold 1~2 model layers (2~3 for the first GPU in case CPU offload is used) for the example tensors and calculations during quantization, and use all the remaining memory to load model weights. With this, all you need is some simple math based on the number of GPUs you have, the size of the model weight file(s), and the number of model layers.
+However, not everyone has large GPU memory. Roughly speaking: always specify the maximum CPU memory that will be used to load the model; then, on each GPU, reserve enough memory to hold 1\~2 model layers (2\~3 for the first GPU in case CPU offload is used) for the example tensors and calculations during quantization, and use all the remaining memory to load model weights. With this, all you need is some simple math based on the number of GPUs you have, the size of the model weight file(s), and the number of model layers.
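To make the budgeting concrete, here is a minimal sketch of loading a model for quantization with a hand-built `max_memory` map. The model name, GPU sizes, and per-layer memory figures are illustrative assumptions, not values from this tutorial:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Assume two 24GiB GPUs and that one model layer fits in roughly 1GiB:
# reserve ~3GiB on GPU 0 (in case CPU offload is used) and ~2GiB on GPU 1,
# leaving the rest for model weights. All figures here are illustrative.
max_memory = {
    0: "21GiB",      # 24GiB minus ~3GiB reserved on the first GPU
    1: "22GiB",      # 24GiB minus ~2GiB reserved
    "cpu": "48GiB",  # maximum CPU memory used when loading the model
}

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(
    "facebook/opt-13b",  # hypothetical example model
    quantize_config,
    max_memory=max_memory,
)
```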
### At Inference
For inference, follow this principle: always use a single GPU if you can; otherwise use multiple GPUs; consider CPU offload only as a last resort.
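A hedged sketch of that order of preference; the checkpoint path and memory limits are placeholders:

```python
from auto_gptq import AutoGPTQForCausalLM

# Preferred: load the whole quantized model onto a single GPU.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model", device="cuda:0"
)

# Otherwise: spread the model across multiple GPUs by capping per-device
# memory; include "cpu" in the map only when offload is truly unavoidable.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",
    max_memory={0: "12GiB", 1: "12GiB", "cpu": "30GiB"},
)
```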