This post covers: controlling which GPUs PyTorch code can see and use, monitoring GPU usage, and freeing GPU memory.
The environment variable `CUDA_VISIBLE_DEVICES` can be set inside Python (before CUDA is initialized):

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
```

or on the command line:

```
CUDA_VISIBLE_DEVICES=1 python run.py
```

For multiple GPUs, list the indices separated by commas:

```
CUDA_VISIBLE_DEVICES=0,3,7 python run.py
```

Once it is set, the running code can only see the GPUs listed there, renumbered from 0[9]. For example, if only the GPU whose physical index is 1 is exposed, then `cuda:0` in the code maps directly to physical GPU 1, and you can simply write:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
Alternatively, pick a GPU directly when constructing the device: `device = "cuda:1"` (the number is the GPU index).
For inputs, the usual approach is simply to move every tensor with `.to(device)`.
Tensors already registered in the model (its parameters and buffers) are converted automatically when you call `.to(device)` on the model instance; tensors that are not registered (e.g., helper tensors created inside `forward()` to support extra operations) are not moved by this call and must be placed on the correct device yourself.
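A minimal sketch of the distinction, using a toy module of my own (`Net` and its layers are illustrative, not from the original post) — note how the unregistered helper tensor is created on the input's device instead of relying on `model.to(device)`:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)                     # parameter: moved by model.to(device)
        self.register_buffer("scale", torch.ones(2))  # registered buffer: also moved

    def forward(self, x):
        # This tensor is NOT registered, so model.to(device) never sees it.
        # Create it on the input's device to keep everything consistent.
        offset = torch.zeros(2, device=x.device)
        return self.fc(x) * self.scale + offset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net().to(device)          # moves fc.weight, fc.bias, and the buffer
x = torch.randn(3, 4).to(device)  # inputs are moved one by one
y = model(x)
```

Using `device=x.device` (or `torch.zeros_like`, `x.new_zeros`, etc.) inside `forward()` keeps the module working no matter which device the model was moved to.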
`nvidia-smi` shows the current GPU status. Sample output:
```
Mon Jul 24 12:17:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 64%   76C    P2   332W / 350W |   5349MiB / 24576MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 76%   66C    P2   309W / 350W |   4775MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
(omitted)
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3323021      C   python                           5346MiB |
|    1   N/A  N/A   3360508      C   python                           4772MiB |
(omitted)
+-----------------------------------------------------------------------------+
```

nvidia-htop[1] outputs more information than `nvidia-smi`, such as each process's user and command line. Install it with:

```
pip install nvidia-htop
```
Run it with `nvidia-htop.py`. Adding `-l` lifts the limit on the displayed command length (meaning it prints as much as fits on one line — it does not guarantee the complete command), and `-c` marks the current GPU load in red/yellow/green.
Sample output:
```
Mon Jul 24 12:22:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 62%   77C    P2   338W / 350W |   5349MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:25:00.0 Off |                  N/A |
| 70%   62C    P2   312W / 350W |   4775MiB / 24576MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
(omitted)
+-------------------------------------------------------------------------------+
|  GPU      PID       USER   GPU MEM  %CPU  %MEM      TIME  COMMAND             |
|    0  3323021  (omitted)   5346MiB   111   0.5  04:16:09  python -u (omitted) |
|    1  3360508  (omitted)   4772MiB   115   0.1  03:30:58  python -u (omitted) |
(omitted)
+-------------------------------------------------------------------------------+
```

The output is actually colored, but the server has too many GPUs to fit in a screenshot, so I'm only pasting the text.
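If you want to consume GPU stats from a script rather than read the tables above, `nvidia-smi` also has a machine-readable query mode; a small sketch (the function name is mine, and this assumes a working driver when `nvidia-smi` is on `PATH`):

```python
import shutil
import subprocess

def gpu_memory_used():
    """Return a list of (gpu_index, MiB_used) pairs via nvidia-smi's
    query mode, or None when nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each line looks like "0, 5349" with nounits stripping "MiB".
    return [tuple(int(v) for v in line.split(", "))
            for line in out.strip().splitlines()]

print(gpu_memory_used())
```

`--query-gpu` accepts many more fields (e.g. `utilization.gpu`, `temperature.gpu`); `nvidia-smi --help-query-gpu` lists them.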
nvitop[2] is an interactive GPU process viewer (install it with `pip install nvitop`).

1. Check the status of all devices: `nvitop -1`
```
Thu Aug 03 15:07:47 2023
╒═════════════════════════════════════════════════════════════════════════════╕
│ NVITOP 1.2.0       Driver Version: 520.61.05      CUDA Driver Version: 11.8 │
├───────────────────────────────┬──────────────────────┬──────────────────────┤
│ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
│ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
╞═══════════════════════════════╪══════════════════════╪══════════════════════╡
│   0  GeForce RTX 3090    On   │ 00000000:01:00.0 Off │                  N/A │ MEM: ██▏ 17.4%    UTL: ▏ 0%
│ 37%  50C    P2   117W / 350W  │  4278MiB / 24.00GiB  │      0%      Default │
├───────────────────────────────┼──────────────────────┼──────────────────────┤
│   1  GeForce RTX 3090    On   │ 00000000:25:00.0 Off │                  N/A │ MEM: █████▋ 47.9%  UTL: ▏ 0%
│ 41%  45C    P2   106W / 350W  │ 11760MiB / 24.00GiB  │      0%      Default │
╘═══════════════════════════════╧══════════════════════╧══════════════════════╛
[ CPU: █▌ 13.7%   UPTIME: 9.0 days ]  ( Load Average: 34.97 59.87 81.07 )
[ MEM: ██▎ 25.2%  USED: 185.4GiB ]    [ SWP: ▋ 7.3% ]
╒═════════════════════════════════════════════════════════════════════════════╕
│ Processes:                                                   wanghuijuan@zju │
│ GPU     PID      USER   GPU-MEM  %SM  %CPU  %MEM      TIME  COMMAND          │
╞═════════════════════════════════════════════════════════════════════════════╡
│   0  4083851 C  user1   4274MiB    0 102.2   0.0  20:47:34  Zombie Process   │
│   1   900475 C  user2  11756MiB    1 103.7   2.4  46:16:48  python run.py    │
╘═════════════════════════════════════════════════════════════════════════════╛
```

To free GPU memory:

Delete references to objects you no longer need and run the garbage collector (where `obj` is the tensor or model to release):

```python
import gc

del obj
gc.collect()
```

Wrapping computation in `with torch.no_grad():` (the code inside the block is written as usual) effectively cuts the memory occupied by gradients, which can be a substantial share of the total. If you only need to stop gradients for specific tensors, set those tensors' `requires_grad` attribute to `False` (this is the default for unregistered tensors). Note that `model.eval()` alone does not achieve this.

`torch.cuda.empty_cache()` releases cached memory that the caching allocator currently holds but is not using (official docs: torch.cuda.empty_cache — PyTorch 1.11.0 documentation[4]; see also the CSDN post on `torch.cuda.empty_cache()`[5]).

Further reading:

1. Official notes: CUDA semantics - Memory management[6]
2. Explainer: analyzing GPUs and GPU memory in deep learning - Zhihu[7]
3. Transformer performance optimization: compute and memory - Zhihu[8]
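The memory-saving techniques above can be combined in one sketch (the `evaluate` helper and toy model are mine, for illustration):

```python
import gc

import torch

def evaluate(model, x):
    # Inside torch.no_grad(), autograd records no graph, so the
    # intermediate activations needed for backward are never kept.
    with torch.no_grad():
        return model(x)

model = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

y = evaluate(model, x)
assert not y.requires_grad  # no gradient bookkeeping for this output

# Alternatively, freeze only a specific tensor:
model.weight.requires_grad_(False)

# Drop Python references, collect garbage, then ask PyTorch to hand
# cached (held but unused) blocks back to the CUDA driver.
del y
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```

The `del` + `gc.collect()` step matters because `empty_cache()` can only return blocks that no live tensor still occupies.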
[1] peci1/nvidia-htop: A tool for enriching the output of nvidia-smi: https://github.com/peci1/nvidia-htop
[2] XuehaiPan/nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management: https://github.com/XuehaiPan/nvitop
[3] Checking the Nvidia GPU model on Linux - CSDN blog: https://blog.csdn.net/qq_28790663/article/details/123741068
[4] torch.cuda.empty_cache — PyTorch 1.11.0 documentation: https://pytorch.org/docs/stable/generated/torch.cuda.empty_cache.html
[5] torch.cuda.empty_cache(): releasing cached GPU memory held but unoccupied by the caching allocator - CSDN blog: https://blog.csdn.net/weixin_43135178/article/details/117906219
[6] CUDA semantics - Memory management: https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
[7] Explainer: analyzing GPUs and GPU memory in deep learning - Zhihu: https://zhuanlan.zhihu.com/p/31558973
[8] Transformer performance optimization: compute and memory - Zhihu: https://zhuanlan.zhihu.com/p/474554018
[9] Setting GPUs with CUDA_VISIBLE_DEVICES - CSDN blog: https://blog.csdn.net/qq_43307074/article/details/127659967
