[B! cuda] serihiro's Bookmarks

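Speeding Up Programs II: GPGPU Edition

A summary of techniques for speeding up programs with GPGPU, the use of GPUs for general-purpose computation. Concrete speed-up examples are used where appropriate. The CPU edition is here: https://www.slideshare.net/KMC_JP/ss-45855264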

serihiro 2020/06/01

cuda

GPU Computing Study Group: Seminars. Seminar currently open for registration: September 13, 2010 (Mon), 8th GPU Computing (CUDA) Seminar. Past seminars and lectures: August 2, 2010 (Mon), 7th GPU Computing (CUDA) Seminar; June 28, 2010 (Mon), 6th GPU Computing (CUDA) Seminar; April 28, 2010 (Wed), 5th GPU Computing (CUDA) Seminar; March 19, 2010 (Fri), 4th GPU Computing (CUDA) Seminar; November 25, 2009 (Wed), 3rd GPU Computing (CUDA) Seminar; October 28, 2009 (Wed), 2nd GPU Computing (CUDA) Seminar; September 28, 2009 (Mon), 1st GPU Computing (CUDA) Seminar.

serihiro 2018/12/12

cuda

CUDA C++ Best Practices Guide

CUDA C++ Best Practices Guide The programming guide to using the CUDA Toolkit to obtain the best performance from NVIDIA GPUs. 1. Preface 1.1. What Is This Document? This Best Practices Guide is a manual to help developers obtain the best performance from NVIDIA® CUDA® GPUs. It presents established parallelization and optimization techniques and explains coding metaphors and idioms that can grea

serihiro 2018/11/08

cuda

[CUDA] Concepts such as SM, Warp, and Occupancy - 緑茶思考ブログ

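When you start learning CUDA, the first stumbling block is that software concepts such as Thread, Block, and Grid get mixed up with hardware concepts such as Streaming Multiprocessor (SM), CUDA Core, Warp, and Occupancy. I think I have finally understood them, so I am writing this down before I forget. Software-side (I believe) concepts. Thread: the smallest unit in which a program runs on the device; threads run asynchronously. Block: a group of Threads, with a three-dimensional representation. Grid: a group of Blocks. Hardware-side (I believe) concepts. CUDA Core: the part where a Thread actually runs. The benefit of creating more Threads than there are CUDA Cores is that it hides the memory access latency between the GPU and the GPU-side DRAM. SM: the Block described above corresponds to an SM in hardware. On a single SM, the Bl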
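The Thread, Block, and Grid relationship described above can be illustrated with a small CPU-side emulation of the usual one-dimensional indexing arithmetic. This is only a sketch, not code from the bookmarked article: the function names are hypothetical, the loop runs sequentially whereas real GPU threads run in parallel, and the warp width of 32 is the value used on NVIDIA GPUs.

```python
# CPU-side emulation of CUDA's 1-D indexing arithmetic (no GPU required).
# All function names here are hypothetical helpers for illustration.

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def global_thread_id(block_idx, block_dim, thread_idx):
    """Mirror of the CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def blocks_needed(n_elements, block_dim):
    """Ceiling division: number of Blocks the Grid needs to cover n_elements."""
    return (n_elements + block_dim - 1) // block_dim


def warps_per_block(block_dim):
    """Number of warps a Block of block_dim Threads is split into."""
    return (block_dim + WARP_SIZE - 1) // WARP_SIZE


def launch(n_elements, block_dim):
    """Enumerate (block, thread) pairs like a kernel launch would.

    Returns the element indices touched, in enumeration order.
    """
    covered = []
    for b in range(blocks_needed(n_elements, block_dim)):
        for t in range(block_dim):
            i = global_thread_id(b, block_dim, t)
            if i < n_elements:  # bounds guard, as in a typical kernel
                covered.append(i)
    return covered


if __name__ == "__main__":
    print(blocks_needed(1000, 256), "blocks of", warps_per_block(256), "warps each")
```

The bounds guard matters because the Grid is sized in whole Blocks, so the last Block usually contains Threads with no element to process.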

serihiro 2018/11/08

cuda

http://nkl.cc.u-tokyo.ac.jp/seminars/multicore/oacc-05.pdf

serihiro 2018/11/08

cuda

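GPGPU (GPU Programming Overview)

The host and the GPU board exchange data over the PCI bus. The GPU contains "device memory", which has very high transfer bandwidth but large memory access latency. In current implementations its capacity is on the order of several GB. Between the device memory and the GPU's compute units (called "Streaming Multiprocessors (SM)", or sometimes simply Multiprocessors) sit a software-managed cache (Shared Memory) and hardware-controlled L1/L2 caches (Fermi and later). A Streaming Multiprocessor (SM) is a cluster of arithmetic units, and the smallest such unit is called a Streaming Processor (SP) or CUDA core. Within a single SM, 8 SPs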

serihiro 2018/11/08

cuda

1. Introduction — CUDA C Programming Guide

CUDA C++ Programming Guide The programming guide to the CUDA model and interface. Changes from Version 12.3 Added section Asynchronous Data Copies using Tensor Memory Access (TMA). Added Unified Memory Programming guide supporting Grace Hopper with Address Translation Service (ATS) and Heterogeneous Memory Management (HMM) on x86. 1. Introduction 1.1. The Benefits of Using GPUs The Graphics Pro

serihiro 2018/10/09

cuda

An Introduction to CUDA-Aware MPI | NVIDIA Technical Blog

MPI, the Message Passing Interface, is a standard API for communicating data via messages between distributed processes that is commonly used in HPC to build applications that can scale to multi-node computer clusters. As such, MPI is fully compatible with CUDA, which is designed for parallel computing on a single computer or node. There are many reasons for wanting to combine the two parallel pro

serihiro 2018/08/02

cuda

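Measuring the Computational Cost of Neural Networks

I'm Uchida, an engineer at Idein. I tried measuring the computational cost of neural networks using performance counters, so I'd like to share the method. Recently GPU clusters have started to be used for training large networks, and for inference there is ongoing development of GPUs and processors dedicated to deep neural networks, so the need to process neural network workloads efficiently keeps growing. But how large is the computational load of a neural network, really? Given a processor, which neural network can it run, and how fast? Or, given a trained neural network, which processor can run it, and in how much time? If one can roughly estimate the relationship between processor performance and model scale, the computational load of neural networks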
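The relationship between model scale and processor performance raised above can be roughed out analytically. The sketch below is not the article's method (the article measures with performance counters); it assumes the common convention of counting one multiply-accumulate (MAC) as two floating-point operations, ignores biases and activations, and uses hypothetical function names and a hypothetical peak-throughput figure.

```python
# Back-of-the-envelope cost model for neural network layers.
# Assumption: 1 multiply-accumulate (MAC) = 2 floating-point operations.
# Function names and the example peak FLOPS value are hypothetical.


def dense_macs(in_features, out_features):
    """MACs for one fully connected layer applied to one input vector."""
    return in_features * out_features


def conv2d_macs(in_ch, out_ch, kernel_h, kernel_w, out_h, out_w):
    """MACs for one 2-D convolution layer applied to one input image."""
    return in_ch * out_ch * kernel_h * kernel_w * out_h * out_w


def estimated_seconds(macs, peak_flops, efficiency=1.0):
    """Lower-bound style runtime estimate.

    peak_flops: processor peak in floating-point operations per second.
    efficiency: fraction of peak assumed achievable (1.0 = ideal).
    """
    flops = 2 * macs
    return flops / (peak_flops * efficiency)


if __name__ == "__main__":
    # A 3x3 convolution from 3 to 64 channels with a 224x224 output map.
    macs = conv2d_macs(3, 64, 3, 3, 224, 224)
    print(macs, "MACs")
    # Hypothetical processor with 1 TFLOPS peak running at 50% efficiency.
    print(estimated_seconds(macs, 1e12, efficiency=0.5), "seconds")
```

Such an estimate only bounds the arithmetic; memory traffic often dominates in practice, which is one reason measured counters can differ from the analytical count.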

serihiro 2018/07/02

NVIDIA Collective Communications Library (NCCL)

NVIDIA NCCL The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and Networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, reduce-scatter as well as point-to-point send and receive that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconn
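To make the collective routines named above concrete, here is a minimal pure-Python sketch of what each one computes, with every rank's buffer modeled as a plain list. It shows the semantics only: it is not NCCL's API, performs no real communication, and the function names and the sum reduction are choices made for this illustration.

```python
# Semantics of common collectives, emulated on plain Python lists.
# buffers[r] is the input buffer held by rank r; each function returns the
# list of output buffers, one per rank. These helpers are hypothetical and
# are NOT the NCCL API; they only show what each collective computes.


def all_reduce(buffers):
    """Every rank ends up with the elementwise sum over all ranks."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def broadcast(buffers, root=0):
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def reduce(buffers, root=0):
    """Only the root rank receives the elementwise sum; others keep their input."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) if r == root else list(buf)
            for r, buf in enumerate(buffers)]


def all_gather(buffers):
    """Every rank ends up with all buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


def reduce_scatter(buffers):
    """Rank r receives the r-th chunk of the elementwise sum.

    Assumes the buffer length is divisible by the number of ranks.
    """
    n = len(buffers)
    total = [sum(vals) for vals in zip(*buffers)]
    chunk = len(total) // n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


if __name__ == "__main__":
    ranks = [[1, 2], [3, 4]]
    print(all_reduce(ranks))
    print(reduce_scatter(ranks))
```

Note that a reduce-scatter followed by an all-gather yields the same result as an all-reduce, a decomposition that bandwidth-efficient all-reduce implementations commonly rely on.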

serihiro 2018/04/19

GitHub - NVIDIA/nccl: Optimized primitives for collective multi-GPU communication

serihiro 2018/04/19

Bytedeco - Home

Bytedeco makes native libraries available to the Java platform by offering ready-to-use bindings generated with the codeveloped JavaCPP technology. This, we hope, is the missing bridge between Java and C/C++, bringing compute-intensive science, multimedia, computer vision, deep learning, etc to the Java platform. Core Technologies JavaCPP [API] – A tool that can not only generate JNI code but also

serihiro 2018/03/25

java
cuda

CUDA 9 Unsupported Visual Studio Version Error

Today I installed CUDA 9 with the Visual Studio 2017 integration. When creating a new CUDA 9 project and building, I got the error: Error C1189 #error: – unsupported Microsoft Visual Studio version! Only the versions 2012, 2013, 2015 and 2017 are supported! With some debugging, I found that on line 131 of file host_config.h in directory “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\incl

serihiro 2018/03/25

cuda

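Installing TensorFlow on a p2 Instance - Qiita

Introduction: This article is a memo on setting up a TensorFlow environment on p2 machines, the highest-performance GPU machines on AWS. The eventual goal is to combine AWS Step Function and AWS Lambda so that the machine built here is launched as a spot instance and runs training automatically. Note: p2 instances are not currently available in the Tokyo region. Use a region such as Oregon or Virginia. Choosing an image: the official TensorFlow documentation appears to be written with Ubuntu in mind for Linux, so I chose an Ubuntu instance. Ubuntu Server 16.04 LTS (HVM), SSD Volume Type. Amazon Linux AMIs and the like that come with a CUDA environment preinstalled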

serihiro 2018/03/10

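CUDA or OpenCL: Which Is Better? - Qiita

TL;DR: Don't compare them so simplistically in the first place. You can't say in general whether a knife or a saw is better, can you? I posted this on Twitter and it got more reaction than I expected, so I'm writing it up here with a bit more explanation. As the tweet says, this is the public version of a talk I gave somewhere. If you have seen it before, please keep that to yourself. It may also contain some personal bias, so please do your best to filter that out. The programming environment that people who want to do GPGPU on NVIDIA GPUs such as GeForce and Tesla reach for first. It is undeniably the de facto top of the GPGPU world and the strongest option. However, NVIDIA keeps it vendor-locked and it is not standardized (although the CUDA model may be used royalty-free). With CUDA C, a C language with proprietary extensions, the device and the host in the same .cu file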

serihiro 2018/02/08

cuda
opencl

CuPy

NumPy/SciPy-compatible Array Library for GPU-accelerated Computing with Python High performance with GPU CuPy is an open-source array library for GPU-accelerated computing with Python. CuPy utilizes CUDA Toolkit libraries including cuBLAS, cuRAND, cuSOLVER, cuSPARSE, cuFFT, cuDNN and NCCL to make full use of the GPU architecture. The figure shows CuPy speedup over NumPy. Most operations perform we

serihiro 2017/06/02

serihiro's bookmarks on cuda (16)