本文「block sparse attention github」を検索

1 - 17 件 / 17件

新着順人気順

絞り込み

検索対象
ブックマーク数
期間
セーフサーチ

block sparse attention githubの検索結果1 - 17 件 / 17件

The Big LLM Architecture Comparison
- 97 users
- magazine.sebastianraschka.com
- テクノロジー
- 2025/07/19
Last updated: Apr 2, 2026 (added Gemma 4 in section 23) It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are. Sure, positional embeddings have evolved from absolute to rotational (RoPE), Multi-Head Attentio
推薦システムにおけるニューラルネットワークの活用について読んだ論文をゆるくまとめる - Re:ゼロから始めるML生活
- 25 users
- www.nogawanogawa.com
- テクノロジー
- 2022/05/10
ここ数ヶ月くらい、推薦システムにおけるNNの活用というテーマで論文をちょこちょこ読んでいました。推薦システムにNNを適用・応用するという守備範囲も広いテーマではありますが、せっかく良い機会なので自分用にまとめてみたいと思います。理解が曖昧なところもあり、マサカリが飛んできそうな気配がプンプンしますが、がんばって書いてみたいと思います。マサカリコワイ... 前提知識協調フィルタリング Matrix Factorization Factorization Machine ニューラルネットワークの推薦システムへの応用の傾向 Feature EngineeringとしてのNN Wide & deep DeepFM DCN AutoInt DCN V2 系列データとして取り扱うNN prod2vec AttRec BERT4Rec Transformers4Rec 参考文献読んだ論文をまとめ
- 機械学習
- 論文
Patterns for Building LLM-based Systems & Products
- 16 users
- eugeneyan.com
- テクノロジー
- 2023/08/02
Patterns for Building LLM-based Systems & Products [ llm engineering production 🔥 ] · 66 min read Discussions on HackerNews, Twitter, and LinkedIn “There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.”
- LLM
- LLMOps
- text
- あとで読む
GitHub - kyegomez/OpenMythos: A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
- 8 users
- github.com/kyegomez
- テクノロジー
- 2026/04/20
Disclaimer: OpenMythos is an independent, community-driven theoretical reconstruction based solely on publicly available research and speculation. It is not affiliated with, endorsed by, or connected to Anthropic or any of their proprietary systems. OpenMythos is an open-source, theoretical implementation of the Claude Mythos model. It implements a Recurrent-Depth Transformer (RDT) with three stag
Blog
- 8 users
- eagledot.xyz
- テクノロジー
- 2025/11/30
Hachi: An (Image) Search engine Only the dead have seen the end of war .. George Santayana For quite some time now, i have been working on and off on a fully self-hosted search engine, in hope to make it easier to search across Personal data in an end to end manner. Even as individuals, we are hoarding and generating more and more data with no end in sight. Such "personal" data is being stored fro
- Software
Mixture of Experts Explained
- 8 users
- huggingface.co
- テクノロジー
- 2023/12/12
There is a second iteration (Feb 2026) of the blog post where we cover how the transformers library has built around MoEs to make them "first class citizens" of the library and the Hub. Here is the link to the post: Mixture of Experts (MoEs) in Transformers With the release of Mixtral 8x7B (announcement, model card), a class of transformer has become the hottest topic in the open AI community: Mix
Why We Use Julia, 10 Years Later
- 5 users
- julialang.org
- テクノロジー
- 2022/02/15
Exactly ten years ago today, we published "Why We Created Julia", introducing the Julia project to the world. At this point, we have moved well past the ambitious goals set out in the original blog post. Julia is now used by hundreds of thousands of people. It is taught at hundreds of universities and entire companies are being formed that build their software stacks on Julia. From personalized me
- Julia
Aman's AI Journal • Primers • Ilya Sutskever's Top 30
- 5 users
- aman.ai
- テクノロジー
- 2024/11/04
Ilya Sutskever’s Top 30 Reading List The First Law of Complexodynamics The Unreasonable Effectiveness of Recurrent Neural Networks Understanding LSTM Networks Recurrent Neural Network Regularization Keeping Neural Networks Simple by Minimizing the Description Length of the Weights Pointer Networks ImageNet Classification with Deep Convolutional Neural Networks Order Matters: Sequence to Sequence f
- ai
Daily Papers - Hugging Face
- 5 users
- huggingface.co
- テクノロジー
- 2023/05/05
Get trending papers in your email inbox once a day! Get trending papers in your email inbox! Subscribe user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. W
- AI
Large Text Compression Benchmark
- 4 users
- www.mattmahoney.net
- テクノロジー
- 2024/09/22
Large Text Compression Benchmark Matt Mahoney Last update: Mar. 25, 2026. history This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 109 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006. About the test data. The goal of this benchmark is not to find the best overall compress
Accelerating Generative AI with PyTorch: Segment Anything, Fast – PyTorch
- 4 users
- pytorch.org
- テクノロジー
- 2023/12/02
Blog Accelerating Generative AI with PyTorch: Segment Anything, Fast This post is the first part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples of how these features can be combined to see how far we can push PyTorch native performance.
- PyTorch
- performance
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
- 3 users
- arxiv.org
- テクノロジー
- 2026/04/20
Accepted at ICLR 2026 (Oral). GEPA: REFLECTIVE PROMPT EVOLUTION CAN OUTPER- FORM REINFORCEMENT LEARNING Lakshya A Agrawal1 , Shangyin Tan1 , Dilara Soylu2 , Noah Ziems4 , Rishi Khare1 , Krista Opsahl-Ong5 , Arnav Singhvi2,5 , Herumb Shandilya2 , Michael J Ryan2 , Meng Jiang4 , Christopher Potts2 , Koushik Sen1 , Alexandros G. Dimakis1,3 , Ion Stoica1 , Dan Klein1 , Matei Zaharia1,5 , Omar Khattab6
Why We Think
- 3 users
- lilianweng.github.io
- テクノロジー
- 2025/05/19
Date: May 1, 2025 | Estimated Reading Time: 40 min | Author: Lilian Weng Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test time compute (Graves et al. 2016, Ling, et al. 2017, Cobbe et al. 2021) and Chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021), have led to significant improvements in model performance, while raising many research
- 機械学習
- programming
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
- 3 users
- arxiv.org
- テクノロジー
- 2023/03/25
111 A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT YIHAN CAO∗, Lehigh University & Carnegie Mellon University, USA SIYU LI, Lehigh University, USA YIXIN LIU, Lehigh University, USA ZHILING YAN, Lehigh University, USA YUTONG DAI, Lehigh University, USA PHILIP S. YU, University of Illinois at Chicago, USA LICHAO SUN, Lehigh University, USA Recen
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention – PyTorch
- 3 users
- pytorch.org
- テクノロジー
- 2024/08/08
Blog FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention In theory, Attention is All You Need. In practice, however, we also need optimized attention implementations like FlashAttention. Although these fused attention implementations have substantially improved performance and enabled long contexts, this efficiency has come with a loss of flexibility. You can no longer
A Short Chronology Of Deep Learning For Tabular Data
- 3 users
- sebastianraschka.com
- テクノロジー
- 2022/07/25
[Last updated: Jan 23, 2023] In my lectures, I emphasize that deep learning is really good for unstructured data (essentially, that’s the opposite of tabular data). Deep learning is sometimes referred to as “representation learning” because its strength is the ability to learn the feature extraction pipeline. Most tabular datasets already represent (typically manually) extracted features, so there
- あとで読む
ICLR 2022 Spotlight: Demystifying local attention and dynamic depth-wise convolution - Microsoft Research
- 3 users
- www.microsoft.com
- テクノロジー
- 2022/07/14
In the past two years, there have been numerous papers written on Transformer, and researchers are designing Transformer models for all kinds of tasks. However, is attention, the core module of Transformer, really stronger than convolution? This paper may bring to you a new perspective. Researchers from Microsoft Research Asia have looked into local attention and dynamic depth-wise convolution and
- 機械学習
- あとで読む