Survey: GPUメモリ管理の実行時最適化による大規模深層学習の高速化 (Accelerating large-scale deep learning through runtime optimization of GPU memory management) (2018)

@article{伊藤祐貴2018gpu,
  title={GPU メモリ管理の実行時最適化による大規模深層学習の高速化},
  author={伊藤祐貴 and 今井晴基 and 根岸康 and 河内谷清久仁 and 松宮遼 and 遠藤敏夫 and others},
  journal={研究報告ハイパフォーマンスコンピューティング (HPC)},
  volume={2018},
  number={30},
  pages={1--9},
  year={2018}
}

https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=190671&item_no=1&attribute_id=1&file_no=1

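Overview

Background

The backward pass of a model being trained needs information produced during the forward pass (for example, the mask information of a feature map).
There are three policies for handling that forward-pass information: save it in CPU memory (data-swapping), reconstruct it by running the computation again (recompute), or keep it in GPU memory (keep).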
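The three policies can be illustrated with a toy sketch in plain Python. Nothing here runs on a GPU: the `gpu` and `cpu` dictionaries are hypothetical stand-ins for device and host memory, and the recompute branch assumes the layer input `x` is still available (in practice it would come from an earlier saved point). This is not the paper's implementation.

```python
from enum import Enum


class Policy(Enum):
    KEEP = "keep"
    SWAP = "data-swapping"
    RECOMPUTE = "recompute"


def relu_forward(x):
    """Return the ReLU output and the mask that the backward pass needs."""
    mask = [v > 0 for v in x]
    y = [v if m else 0.0 for v, m in zip(x, mask)]
    return y, mask


def relu_backward(grad_out, mask):
    """Pass the gradient through only where the forward input was positive."""
    return [g if m else 0.0 for g, m in zip(grad_out, mask)]


def train_step(x, grad_out, policy):
    gpu, cpu = {}, {}  # toy stand-ins for device memory and host memory

    # Forward pass: decide where the mask lives until backward.
    _, mask = relu_forward(x)
    if policy is Policy.KEEP:
        gpu["mask"] = mask      # stays in GPU memory
    elif policy is Policy.SWAP:
        cpu["mask"] = mask      # pretend GPU -> CPU copy, GPU copy freed
    # RECOMPUTE: store nothing
    del mask
    resident = sorted(gpu)      # what occupies GPU memory between the passes

    # Backward pass: get the mask back according to the policy.
    if policy is Policy.KEEP:
        mask = gpu.pop("mask")
    elif policy is Policy.SWAP:
        mask = cpu.pop("mask")  # pretend CPU -> GPU copy
    else:
        _, mask = relu_forward(x)  # run the forward computation again
    return relu_backward(grad_out, mask), resident


x = [-1.0, 2.0, -3.0, 4.0]
grad_out = [1.0, 2.0, 3.0, 4.0]
results = {p: train_step(x, grad_out, p) for p in Policy}
```

All three policies give the same gradient. They differ in what stays resident in GPU memory and in the extra work (copy or recomputation) paid during the backward pass.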

(Figure from 今井, 根岸, and 河内谷清久仁. "TensorFlow で大規模ニューラルネットワークを学習する手法の考察." 日本ソフトウェア科学会大会論文集 36 (2019): 83-90.)

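What is it?

Using a profile obtained by running a few training iterations, the method searches greedily for which layers should data-swap, which should recompute, and which should keep.
By assigning the right policy to each layer, it succeeded in
- keeping memory usage down, while
- keeping the overhead of CPU-GPU communication and recompute low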
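As a rough idea of what "a profile from a few iterations" could contain, the sketch below times each layer over a few runs and records its output size. The layer list, the `profile_layers` helper, and the recorded fields are assumptions made for illustration; the paper's actual profile format is not described in these notes.

```python
import time


def profile_layers(layers, x, iterations=3):
    """Run a few iterations; record mean forward time and output size per layer."""
    stats = [{"name": name, "time_s": 0.0, "out_elems": 0} for name, _ in layers]
    for _ in range(iterations):
        h = x
        for stat, (_, fn) in zip(stats, layers):
            start = time.perf_counter()
            h = fn(h)
            stat["time_s"] += (time.perf_counter() - start) / iterations
            stat["out_elems"] = len(h)  # size of the activation this layer produces
    return stats


# Toy "layers" operating on plain lists (hypothetical, for illustration only).
layers = [
    ("scale", lambda v: [2.0 * a for a in v]),
    ("relu", lambda v: [max(a, 0.0) for a in v]),
    ("pool", lambda v: [max(v[i], v[i + 1]) for i in range(0, len(v) - 1, 2)]),
]
profile = profile_layers(layers, [float(i - 4) for i in range(8)])
```

A per-layer table like `profile` is the kind of input a search over keep, data-swapping, and recompute could work from.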

What is better than prior work?

It can decide "dynamically", at run time, which policy to apply.
It can take data-swapping and recompute into account at the same time.

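What is the key to the technique or method?

A greedy method based on profiling and observations finds a reasonably good solution even in a huge search space.
The greedy method identifies which layers to change from keep to data-swapping, and which layers to replace data-swapping with recompute.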
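The two greedy moves described above can be sketched as follows. The cost model is an assumption made for this illustration and is not taken from the paper: each layer has a hypothetical activation size `mem` (assumed positive), a hypothetical un-hidden transfer cost `swap_cost`, and a hypothetical `recompute_cost`, and `budget` is the GPU memory available for kept activations.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    name: str
    mem: int               # activation size if kept on the GPU (arbitrary units, > 0)
    swap_cost: float       # hypothetical transfer time not hidden behind compute
    recompute_cost: float  # hypothetical time to run the forward computation again


def greedy_plan(profiles, budget):
    """Assign keep / data-swapping / recompute per layer with two greedy passes."""
    plan = {p.name: "keep" for p in profiles}
    kept = sum(p.mem for p in profiles)

    # Pass 1: keep -> data-swapping, cheapest swap cost per unit of memory first,
    # until the kept activations fit in the budget.
    for p in sorted(profiles, key=lambda p: p.swap_cost / p.mem):
        if kept <= budget:
            break
        plan[p.name] = "data-swapping"
        kept -= p.mem

    # Pass 2: data-swapping -> recompute wherever recomputing is cheaper.
    for p in profiles:
        if plan[p.name] == "data-swapping" and p.recompute_cost < p.swap_cost:
            plan[p.name] = "recompute"
    return plan


# Made-up numbers, for illustration only.
profiles = [
    LayerProfile("conv1", mem=40, swap_cost=8.0, recompute_cost=12.0),
    LayerProfile("relu1", mem=40, swap_cost=2.0, recompute_cost=0.5),
    LayerProfile("conv2", mem=20, swap_cost=2.0, recompute_cost=5.0),
    LayerProfile("fc", mem=10, swap_cost=3.0, recompute_cost=4.0),
]
plan = greedy_plan(profiles, budget=50)
```

With these made-up numbers, `relu1` ends up as recompute (cheap to redo), `conv2` as data-swapping, and `conv1` and `fc` stay as keep. A greedy pass like this looks at each layer only once, which is why it scales to large search spaces but gives no optimality guarantee.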

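How was it validated?

The proposed method was compared against data-swapping at every layer and against an existing method (SuperNeurons):
- improved throughput, measured as batch size / execution time per training iteration, was confirmed
- models: ResNet50, GoogLeNet (869), AlexNet (2688)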
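The throughput metric above is simple arithmetic. The sketch below uses made-up numbers that are not results from the paper, and the function names are hypothetical.

```python
def throughput(batch_size, seconds_per_iteration):
    """Samples processed per second: batch size / execution time of one iteration."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    return batch_size / seconds_per_iteration


def speedup(candidate, baseline):
    """Ratio of two throughputs; values above 1 mean the candidate is faster."""
    return candidate / baseline


# Made-up numbers, for illustration only (not measurements from the paper).
baseline = throughput(batch_size=256, seconds_per_iteration=4.0)
proposed = throughput(batch_size=256, seconds_per_iteration=2.0)
ratio = speedup(proposed, baseline)
```

Comparing throughput rather than raw iteration time lets runs with different batch sizes be compared on the same scale.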

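Any discussion?

Because the search is greedy, it is unclear how good the solution is compared with the optimum, and the paper does not report this.
The search space grows as models get larger, so it is an open question up to what model size this method can still find good solutions.
For model architectures that repeat the same pattern, a more efficient search method probably exists.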

References

今井, 根岸, and 河内谷清久仁. "TensorFlow で大規模ニューラルネットワークを学習する手法の考察." 日本ソフトウェア科学会大会論文集 36 (2019): 83-90.

Which papers to read next?

data-swapping
- (2016, 301 cited) Rhu, Minsoo, et al. "vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design." 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016.
- (2017, 62 cited) Meng, Chen, et al. "Training deeper models by GPU memory optimization on TensorFlow." Proc. of ML Systems Workshop in NIPS. Vol. 7. 2017.
recompute
- (2016, 411 cited) Chen, Tianqi, et al. "Training deep nets with sublinear memory cost." arXiv preprint arXiv:1604.06174 (2016).
optimizing the memory-saving strategy for forward activations
- (2018, 161 cited) Wang, Linnan, et al. "Superneurons: Dynamic GPU memory management for training deep neural networks." Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2018.