Title search for "TRANSFORMERS" - Hatena Bookmark

Search results for TRANSFORMERS: 1 - 20 of 20

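Let's build our own large language model! (Transformers+DeepSpeed+torch.compile+flash_attn2)
- 24 users
- zenn.dev/selllous
- Technology
- 2023/12/14
This article is the Day 13 entry of LLM Advent Calendar 2023. Introduction: 🤗 Transformers is a Python library that makes it easy to download and train pretrained models for natural language processing, multimodal, audio, and computer vision tasks. The library is widely used to download pretrained large language models (LLMs) to a local PC and use them for text generation, for fine-tuning on individual tasks such as summarization, translation, and question answering, and for integration into chat AI. Information on LLM pretraining methods increasingly relies on other Python libraries such as GPT-NeoX, Megatron-LM, TinyLlama, and lit-llama. On the other hand, information on pretraining LLMs with the Transformers library
- AI
- python
- Development
- Read later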
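Asynchronous high-speed text generation using AsyncLLMEngine from vLLM, a library for accelerating Transformers - 端の知識の備忘録
- 24 users
- hashicco.hatenablog.com
- Technology
- 2024/07/06
Overview: Until recently I was competing in Kaggle's AIMO competition (where LLMs solve math problems and entrants compete on accuracy). It was my first time competing as part of a team, and with help from my teammates and some luck, I managed to win a silver medal! That puts me one step away from Master, but I still don't feel anywhere near winning a gold medal, so there is a long road ahead. www.kaggle.com Because a similar competition is scheduled to be held soon, few top solutions have been published, so it is not yet clear which methods worked well. Based on the information that is public so far: almost everyone used Deepseek-Math-7B, an LLM specialized for math problems; a pipeline that implements an interpreter to execute the Python code output by the LLM, preventing calculation mistakes caused by LLM hallucination, was a strong approach; LLM output at a relatively high
- Read later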
GitHub - kyegomez/BitNet: Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
- 21 users
- github.com/kyegomez
- Technology
- 2024/02/28
- LLM
- Language
- github
- Read later
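How does "Vision Transformers," the machine learning model developed by Google for image classification tasks, work?
- 19 users
- gigazine.net
- Technology
- 2024/04/20
Google's machine learning model "Transformer" can translate and summarize data such as natural language without processing it sequentially, and it is the basis for chat AIs capable of natural conversation, such as ChatGPT. The "Vision Transformer" is a model that applies the Transformer approach to images. Software engineer Dennis Turp visualizes and explains how the components of the Vision Transformer work and how data flows through them. A Visual Guide to Vision Transformers | MDTURP https://blog.mdturp.ch/posts/2024-04-05-visual_guide_to_vision_transformer.html 0: Introduction. As a premise, T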
GitHub - frodo821/BitNet-Transformers: 0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture
- 12 users
- github.com/frodo821
- Technology
- 2024/03/01
Transformers as Support Vector Machines
- 11 users
- arxiv.org
- Learning
- 2023/09/04
Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence
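The softmax expression quoted in the abstract above can be sketched in a few lines of plain Python. This is only an illustrative toy, not code from the paper: the helper names (`matmul`, `transpose`, `attention_weights`) and the example matrices are assumptions, with $X$ taken as one row per token and $Q$, $K$ as square matrices of matching width.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product (rows of a times columns of b).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(X, Q, K):
    # Pairwise similarities X Q K^T X^T, then a row-wise softmax.
    scores = matmul(matmul(matmul(X, Q), transpose(K)), transpose(X))
    return [softmax(row) for row in scores]

# Hypothetical toy input: 3 tokens with 2 features each, identity Q and K.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
W = attention_weights(X, Q, K)
```

Each row of `W` is a probability distribution over the input tokens, so row `i` describes how strongly token `i` attends to every token in the sequence.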
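What is "Diffusion Transformers," the technology behind the remarkable progress of image and video generation AI such as Open AI's "Sora" and Stability AI's "SD3"? | AMP[アンプ] - Business Inspiration Media
- 6 users
- ampmedia.jp
- Technology
- 2024/04/27
Stability AI announced Stable Diffusion 3 (SD3) to compete with rivals such as Open AI and Google. Billed as the latest and most powerful image generation AI model, SD3 adopts a new architecture based on "Diffusion Transformers" and runs on a variety of hardware. What kind of approach is Diffusion Transformers? Diffusion Transformers already existed in 2022. Diffusion Transformers, now a hot topic, is an AI model architecture that emerged on the AI research scene in the summer of 2022. New York University computer science professor Xie, together with Peebles, who was then an intern at Meta's AI research lab (mentored by Xie), two machine-learning
- Artificial intelligence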
GitHub - Beomi/BitNet-Transformers: 0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture
- 6 users
- github.com/Beomi
- Technology
- 2024/02/28
But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
- 6 users
- www.youtube.com
- Technology
- 2024/04/08
Breaking down how Large Language Models work Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support --- Here are a few other relevant resources Build a GPT from scratch, by Andrej Karpathy https://youtu.be/kCc8FmEb1nY If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the
- Transformer
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
- 4 users
- nihalsid.github.io
- Society
- 2023/11/29
MeshGPT creates triangle meshes by autoregressively sampling from a transformer model that has been trained to produce tokens from a learned geometric vocabulary. These tokens can then be decoded into the faces of a triangle mesh. Our method generates clean, coherent, and compact meshes, characterized by sharp edges and high fidelity. We introduce MeshGPT, a new approach for generating triangle me
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
- 4 users
- arxiv.org
- Technology
- 2024/04/13
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te
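Making LLMs lighter with AutoGPTQ and transformers | npaka
- 4 users
- note.com/npaka
- Technology
- 2023/08/24
I found the following article interesting, so here is a brief summary. Making LLMs lighter with AutoGPTQ and transformers 1. Introduction: "AutoGPTQ" has been integrated into "transformers." This makes it possible to quantize and run models at 8-, 4-, 3-, and 2-bit precision using "GPTQ." The accuracy loss from 4-bit quantization is negligible, and inference speed at small batch sizes is comparable to the fp16 baseline. The integration is available on both Nvidia GPUs and ROCm-powered AMD GPUs. 2. AutoGPTQ: "AutoGPTQ" lets you quantize transformers models. Community efforts such as "GPTQ-for-LLaMa," "Exllama," and "llama.cpp" implement quantization methods specifically for the Llama architecture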
Retentive Networks (RetNet) Explained: The much-awaited Transformers-killer is here
- 3 users
- medium.com
- Technology
- 2023/08/21
Transformers have become the de-facto architecture for LLMs, as they efficiently overcome the sequential training issues of the recurrent neural networks (RNNs). However, transformers are not perfect either, as they solve for just two arms of the so-called “impossible triangle”. Well, the RetNet from Microsoft claims to sit right at the dead center of this impossible triangle trumping all the meth
GitHub - facebookresearch/searchformer: Official codebase for the paper "Beyond A* Better Planning with Transformers via Search Dynamics Bootstrapping".
- 3 users
- github.com/facebookresearch
- Technology
- 2024/04/27
- ai
Sparse Transformers: An innovative approach to the problem of computational cost growing with input sequence length
- 3 users
- ai-scholar.tech
- Technology
- 2023/09/08
Three key points ✔️ Reduces computation by reproducing the per-layer characteristics of attention ✔️ Reduces the computational cost of the Transformer using three kinds of attention: Sliding Window Attention, Dilated Sliding Window Attention, and Global Attention ✔️ Not only reduces computation but also achieves the SOTA of its time. Generating Long Sequences with Sparse Transformers written by Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever (Submitted on 23 Apr 2019) Comments: Published on arxiv. Subjects: Machine Learning (c
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
- 3 users
- arxiv.org
- Technology
- 2024/02/24
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time
- Algorithm
- ai
Attention in transformers, visually explained | Chapter 6, Deep Learning
- 3 users
- www.youtube.com
- Technology
- 2024/04/10
Demystifying attention, the key mechanism inside transformers and LLMs. Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support Special thanks to these supporters: https://www.3blue1brown.com/lessons/attention#thanks An equally valuable form of support is to simply share the videos. Demystifying self-attention, multiple heads, and cross-attention. Inst
transformers.js/examples/webgpu-whisper at v3 · xenova/transformers.js
- 3 users
- github.com/xenova
- Technology
- 2024/06/08
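Running the sample code for the sentence-transformers library to compute embeddings and their similarity from English and Japanese text - nikkie-ftnextの日記
- 3 users
- nikkie-ftnext.hatenablog.com
- Technology
- 2023/09/24
Introduction: Happy birthday, Aya-san!! This is nikkie. I'm so glad everyone is watching アイうた ♪ There is a Python library called sentence-transformers. Apparently it can compute embeddings (embedding representations of text), and I was curious, so I tried it out. Note: this is a beginner-level, "I tried it out" kind of post. Table of contents: Introduction, Table of contents, Environment, Computing embeddings with the documentation example (English text), Computing embeddings from Japanese text, Conclusion. Environment: macOS 12.6.6 (CPU only), Python 3.10.9, sentence-transformers 2.2.2. Versions of the main libraries installed by pip install sentence-transformers: torch 2.0.1 transform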
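The similarity between two embeddings mentioned in the entry above is commonly measured with cosine similarity. The sketch below shows only that calculation in plain Python; it does not call sentence-transformers, and the function name `cosine_similarity` and the short vectors are hypothetical stand-ins for real model output.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the two vector lengths.
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "embeddings"; real sentence embeddings
# typically have hundreds of dimensions.
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions, regardless of vector length.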
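Easily speed up Transformers inference by 1.6x to about 2x with CTranslate2 - A Day in the Life
- 3 users
- secon.dev
- Technology
- 2023/11/24
CTranslate2 is a fast inference library written in Python and C++. I had been meaning to try it someday, but because it requires converting models, I kept putting it off. Then I learned about hf_hub_ctranslate2, a library that transparently converts HuggingFace models into a format CTranslate2 can run, with no extra work. When I tried it, inference very easily became 1.6x faster on GPU and 1.9x faster on CPU, with almost no change in accuracy. I should have started using it sooner, hence this memo. What is CTranslate2? CTranslate2 (hereafter CT2) is, as written in the overview on its GitHub project page, "CTranslate2 is a C++ and Python library for efficient infere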