# GPT-2B-001

## Model Description

GPT-2B-001 is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 2B refers to the total trainable parameter count (2 billion) [1, 2].

This model was trained on 1.1T tokens with NeMo.

## Model Architecture Improvements

- The model uses the SwiGLU activation function [4] (sketched below)
- Rotary positional embeddings (RoPE) (see the second sketch below)
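To make the first item concrete, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch. The class name, the hidden width `d_ff`, and the bias-free linear layers are illustrative choices for this sketch, not values read from the released checkpoint or the NeMo implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block: out = W_out( SiLU(W_gate x) * (W_up x) )."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w_up = nn.Linear(d_model, d_ff, bias=False)    # value projection
        self.w_out = nn.Linear(d_ff, d_model, bias=False)   # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (Swish) on one projection gates the other, element-wise
        return self.w_out(F.silu(self.w_gate(x)) * self.w_up(x))
```

Compared with a plain GELU or ReLU MLP, the gating path adds one extra projection per block but typically improves quality at the same parameter budget, which is why it appears in this family of architecture changes.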
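For the second item, the following is a minimal, self-contained sketch of rotary positional embeddings. The function name `apply_rope`, the `base` constant, and the channel-pairing scheme are assumptions made for illustration; the model's actual NeMo implementation may organize the computation differently.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate vectors by position; x has shape (..., seq_len, head_dim)."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    # One rotation frequency per pair of channels, decreasing geometrically
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = positions[:, None] * inv_freq[None, :]   # (seq_len, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    # Split channels into (even, odd) pairs and apply a 2-D rotation to each pair
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack(
        (x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1
    ).flatten(-2)                                     # interleave pairs back
    return rotated
```

In attention, this rotation is applied to the query and key vectors before the dot product, so the relative offset between two positions is encoded directly in their inner product rather than added to the token embeddings.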