dann's Bookmarks / November 12, 2020

dann id:dann

Bookmarks for November 12, 2020 (9 items)

https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf
dann 2020/11/12
slurm, container, docker, mpi, nvidia, jobscheduler
https://indico.cern.ch/event/757415/contributions/3421576/attachments/1856070/3048604/Solving_Problems_in_HPC_with_Singularity.pdf
dann 2020/11/12
slurm, singularity, k8s
Running PyTorch distributed training with Slurm - Qiita
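Introduction. Environment: slurm 18.08, pytorch 1.3. What is Slurm? Slurm is a job scheduler used mainly for scientific computing on supercomputers and compute clusters. If you have used SGE, Torque, or LSF, you can think of it as the same kind of tool. I have used SGE and LSF in the past; briefly, Slurm's strengths are: srun is convenient (you can run commands interactively without writing a submission script); it manages GPU resources (a program that uses GPUs can reserve devices exclusively); and it has solid support for parallel execution across multiple nodes and multiple processes. This post is about the third point. What is PyTorch? A deep learning framework developed by Facebook. Why S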
dann 2020/11/12
horovod, slurm, openmpi, mpi, jobscheduler, pytorch
Install horovod for PyTorch to work with slurm on ABCI · NLPer
dann 2020/11/12
horovod, slurm, jobscheduler, pytorch
Research Center for Advanced Computing Infrastructure: Job script examples
dann 2020/11/12
pbspro
How to Schedule Machine Learning Workloads Nicely In Kubernetes #CNDT2020 / Cloud Native Days Tokyo 2020
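Efforts to schedule machine learning (batch) jobs on Kubernetes are under way around the world, and several OSS projects have been released. Within Kubernetes itself, sig-scheduling is working to make kube-scheduler (the default scheduler) more flexible and extensible. This talk introduces those efforts and OSS projects, and explains the points to consider when scheduling machine learning jobs well on a Kubernetes cluster and how each of them is realized.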
dann 2020/11/12
k8s
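Taking on the Challenge of a Machine Learning Platform with Kubernetes
December 4, 2018, Japan Container Days talk slides, Daisuke Taniwaki. Preferred Networks has built its own on-premises cluster of more than 1,000 GPUs connected by InfiniBand, on which researchers run machine learning jobs of varying purposes, resource requirements, and run times on Kubernetes to produce research results. While Kubernetes is attracting a great deal of attention as a machine learning platform, with the arrival of Kubeflow for example, in practice it is still maturing. The talk covered why Kubernetes was adopted as a machine learning platform, its practicality and future prospects, and the challenges Preferred Networks is taking on.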
dann 2020/11/12
k8s, pfn
Set up Message Passing Interface (MPI) for HPC - Azure Virtual Machines
Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets The Message Passing Interface (MPI) is an open library and de facto standard for distributed memory parallelization. It's commonly used across many HPC workloads. HPC workloads on the RDMA-capable HB-series and N-series VMs can use MPI to communicate over the low-latency, high-bandwidth InfiniBand network. The S
dann 2020/11/12
openmpi, intelmpi, hpc-x, mellanox
GitHub - stern/stern: ⎈ Multi pod and container log tailing for Kubernetes -- Friendly fork of https://github.com/wercker/stern
dann 2020/11/12
k8s
- November 13, 2020
- November 12, 2020
- November 11, 2020