[B! inference] dann's bookmarks

dann id:dann

dann's bookmarks about inference (16)

GitHub - huggingface/optimum-nvidia
dann 2024/03/11
llm

inference
Link
Accelerating Generative AI with PyTorch II: GPT, Fast
by Team PyTorch This post is the second part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples to see how far we can push PyTorch native performance. In part one, we showed how to accelerate Segment Anything over 8x using only pure, native
dann 2024/01/24
llm

inference

performance
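Link
Comparing the generation speed of rinna 3.6b with DeepSpeed, vLLM, and CTranslate2
Introduction: The transformers library is widely used for generating text with language models, but while it supports a broad range of models, it is not fully optimized for generation speed or memory efficiency. This article therefore introduces tools for making text generation more efficient. It compares DeepSpeed, vLLM, and CTranslate2, all of which can be installed easily from PyPI. The model used is rinna/japanese-gpt-neox-3.6b-instruction-ppo. See the model card for the prompt format and for how to use the tokenizer and related components. Generation speed in this article is measured on a Colab T4 GPU. Notebooks for trying each tool, along with links for opening them in Colab, are included for reference.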
dann 2024/01/24
llm

inference
Link
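The entry above compares generation speed across backends. A minimal sketch of how such a measurement might be set up is shown here; it is not taken from the bookmarked article. `measure_tokens_per_second` and `dummy_generate` are hypothetical names, and the dummy backend only stands in for a real DeepSpeed, vLLM, or CTranslate2 call.

```python
import time
from typing import Callable, List


def measure_tokens_per_second(
    generate: Callable[[str, int], List[int]],
    prompt: str,
    max_new_tokens: int,
    runs: int = 3,
) -> float:
    """Return the best tokens-per-second rate observed over several timed runs."""
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(tokens) / elapsed)
    return best


def dummy_generate(prompt: str, max_new_tokens: int) -> List[int]:
    """Hypothetical stand-in for a real backend; returns fake token ids."""
    time.sleep(0.01)  # pretend the model takes some time to decode
    return list(range(max_new_tokens))


if __name__ == "__main__":
    rate = measure_tokens_per_second(dummy_generate, "Hello", 32)
    print(f"{rate:.1f} tokens/s")
```

Wrapping each backend in a function with the same `(prompt, max_new_tokens)` signature keeps the timing loop identical across tools, so only the backend changes between measurements.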
Accelerating Generative AI Part III: Diffusion, Fast
by Sayak Paul and Patrick von Platen (Hugging Face 🤗) This post is the third part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples to see how far we can push PyTorch native performance. In part one, we showed how to accelerate Segment Any
dann 2024/01/05
performance

inference

pytorch
Link
Distributed Inference with 🤗 Accelerate
dann 2024/01/04
accelerate

inference
Link
Deploy Your Local GPT Server With Triton
dann 2023/05/17
triton

inference
Link
GitHub - NVIDIA/FasterTransformer: Transformer related optimization, including BERT, GPT
dann 2023/04/23
transformer

inference

deeplearning

triton

fastertransformer
Link
NVIDIA Triton Inference Server on AWS: Customer success stories and AWS deployment methods to optimize inference throughput, reduce latency, and lower GPU or CPU inference costs. | GTC Digital November 2021 | NVIDIA On-Demand
dann 2023/03/30
aws

eks

inference
Link
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
dann 2023/03/09
lm

inference

deeplearning
Link
TensorFlow Model Optimization
import tensorflow as tf
import tensorflow_model_optimization as tfmot
model = tf.keras.Sequential([...])
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=2000, end_step=4000)
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)
...
model_for_pruning.fit(...)
TensorFlow Model Optimization
dann 2023/03/08
tensorflow

inference
Link
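The pruning excerpt in the entry above configures a polynomial decay schedule from 0.0 to 0.5 sparsity between steps 2000 and 4000. The sketch below shows how such a schedule could map a training step to a target sparsity in plain Python. The function name is hypothetical, and the exponent `power=3` is an assumption about the default rather than something stated in the excerpt.

```python
def polynomial_decay_sparsity(
    step: int,
    initial_sparsity: float = 0.0,
    final_sparsity: float = 0.5,
    begin_step: int = 2000,
    end_step: int = 4000,
    power: float = 3.0,  # assumed exponent, not given in the excerpt
) -> float:
    """Target sparsity at a training step under a polynomial decay schedule."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    # Fraction of the pruning window that has elapsed, in (0, 1).
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


if __name__ == "__main__":
    for s in (0, 2000, 3000, 4000, 5000):
        print(s, polynomial_decay_sparsity(s))
```

With these assumed settings the sparsity rises quickly just after `begin_step` and flattens as it approaches `final_sparsity`, so most weights are pruned early in the window.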
[English ver.] [Tensorflow Lite] Various Neural Network Model quantization methods for Tensorflow Lite (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization, EdgeTPU). As of May 05, 2020. - Qiita
[English ver.] [Tensorflow Lite] Various Neural Network Model quantization methods for Tensorflow Lite (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization, EdgeTPU). As of May 05, 2020. Python DeepLearning TensorFlow PyTorch OpenVINO Japanese English - English - 1. Introduction In this article, I'd like to share with you the quantization workflow I've been workin
dann 2020/08/04
tensorflow

inference
Link
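The entry above lists several quantization methods, including integer quantization. As a rough illustration of the underlying idea only, the sketch below applies affine int8 quantization to a list of floats in plain Python. It is not the TensorFlow Lite converter's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 using a scale and zero point (affine quantization)."""
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 when all values are zero
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize_int8(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from int8 values."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    q, scale, zp = quantize_int8(weights)
    print(q, scale, zp)
    print(dequantize_int8(q, scale, zp))
```

The round trip is lossy: each recovered value can differ from the original by up to roughly one quantization step (`scale`).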
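Summary of ONNX optimizations - ぱたへね
I tried out all of the ONNX optimizations, so here is a summary. Getting the list of supported optimizations: the supported optimizations can be obtained with get_available_passes. from onnx import optimizer all_passes = optimizer.get_available_passes() Broadly, they fall into the following categories: removal of meaningless ops (eliminate_deadend, etc.); fusion of two ops (fuse_matmul_add_bias_into_gemm, etc.); fusion into Conv (fuse_add_bias_into_conv, etc.); others. Fusion into conv did not work at all, so I am waiting for a version upgrade. Optimization results: I wrote up each one on Qiita. eliminate_deadend optimization in ONNX, eliminate_i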
dann 2020/07/17
onnx

inference
Link
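The entry above mentions passes that remove meaningless ops, such as eliminate_deadend. The toy sketch below shows the general idea on a hand-rolled graph representation: nodes whose outputs never reach a graph output are dropped. It does not use or reproduce the ONNX optimizer; the dict-based node format and the function name are assumptions made for illustration, and the nodes are assumed to be in topological order.

```python
from typing import Dict, List, Set


def eliminate_dead_ends(nodes: List[Dict], graph_outputs: List[str]) -> List[Dict]:
    """Keep only nodes that contribute to at least one graph output.

    Each node is a dict with "name", "inputs", and "outputs" keys, and the
    list is assumed to be topologically sorted (producers before consumers).
    """
    needed: Set[str] = set(graph_outputs)
    kept: List[Dict] = []
    # Walk backwards so consumers are seen before their producers.
    for node in reversed(nodes):
        if any(out in needed for out in node["outputs"]):
            kept.append(node)
            needed.update(node["inputs"])
    kept.reverse()
    return kept


if __name__ == "__main__":
    graph = [
        {"name": "conv", "inputs": ["x"], "outputs": ["a"]},
        {"name": "relu", "inputs": ["a"], "outputs": ["y"]},
        {"name": "unused", "inputs": ["a"], "outputs": ["z"]},  # dead end
    ]
    print([n["name"] for n in eliminate_dead_ends(graph, ["y"])])
```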
[Tensorflow Lite] Various Neural Network Model quantization methods for Tensorflow Lite (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization, EdgeTPU). As of May 05, 2020. - Qiita
[Tensorflow Lite] Various Neural Network Model quantization methods for Tensorflow Lite (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization, EdgeTPU). As of May 05, 2020. Python DeepLearning TensorFlow PyTorch OpenVINO Japanese English - Japanese - 1. Introduction In this article, partly as a memo for myself, I'd like to share the quantization workflow for trained models that I have built up over the past six months. Tensorflow checkpoint (.ckpt/.meta), FreezeGraph (.
dann 2020/05/06
tensorflow

pytorch

inference
Link
GitHub - PINTO0309/PINTO_model_zoo: A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
Made with contrib.rocks. A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML. TensorFlow Lite, OpenVINO, CoreML, TensorFlow.js, TF-TRT, MediaPipe, ONNX [.tflite, .h5, .pb, saved_model, tfjs, tftrt, mlmodel, .xml/.bin, .onnx] I have been
dann 2020/04/19
deeplearning

inference

tensorflow
Link
GitHub - pfnet-research/chainer-trt: Chainer x TensorRT
dann 2018/12/14
chainer

tensorrt

inference
Link
benchmarking-hardware-for-cnn-inference-in-2018-1d58268de12a
dann 2018/09/07
inference

hardware

deeplearning
Link