Search results for "block sparse attention github": 1 - 17 of 17

  • The Big LLM Architecture Comparison

    Last updated: Apr 2, 2026 (added Gemma 4 in section 23). It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are. Sure, positional embeddings have evolved from absolute to rotational (RoPE), Multi-Head Attention…
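
    The shift the excerpt mentions, from absolute positional embeddings to rotary ones (RoPE), comes down to rotating each query/key channel pair by a position-dependent angle instead of adding a position vector. A minimal PyTorch sketch of the idea; the function name and shapes are our illustration, not code from the article:

      import torch

      def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
          """Rotate channel pairs of x by position-dependent angles.
          x: (seq_len, head_dim), head_dim even."""
          seq_len, dim = x.shape
          # One frequency per channel pair, geometrically spaced as in RoPE.
          inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
          angles = torch.arange(seq_len).float()[:, None] * inv_freq  # (seq, dim/2)
          cos, sin = angles.cos(), angles.sin()
          x1, x2 = x[:, 0::2], x[:, 1::2]
          out = torch.empty_like(x)
          out[:, 0::2] = x1 * cos - x2 * sin
          out[:, 1::2] = x1 * sin + x2 * cos
          return out

      q = rope(torch.randn(16, 64))  # applied to queries (and keys) before attention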

  • A Loose Summary of Papers I Read on Using Neural Networks in Recommender Systems - Re:ゼロから始めるML生活

    For the past few months I have been reading papers on the use of neural networks in recommender systems. Applying NNs to recommendation is a broad area, but this seemed like a good opportunity to summarize the papers for my own reference. My understanding is still shaky in places and I can already feel the criticism coming, but I will do my best to write it up. Covered: prerequisites (collaborative filtering, Matrix Factorization, Factorization Machines); trends in applying NNs to recommender systems; NNs as feature engineering (Wide & Deep, DeepFM, DCN, AutoInt, DCN V2); NNs that treat interactions as sequence data (prod2vec, AttRec, BERT4Rec, Transformers4Rec); references…
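
    For context on the "prerequisites" above: Matrix Factorization scores a user-item pair as the dot product of two learned embedding vectors, and most of the neural models listed generalize exactly this interaction function. A minimal PyTorch sketch with made-up sizes, not code from the post:

      import torch
      import torch.nn as nn

      class MF(nn.Module):
          """Matrix Factorization: score(u, i) = <user_emb[u], item_emb[i]>."""
          def __init__(self, n_users: int, n_items: int, dim: int = 32):
              super().__init__()
              self.user = nn.Embedding(n_users, dim)
              self.item = nn.Embedding(n_items, dim)

          def forward(self, u, i):
              return (self.user(u) * self.item(i)).sum(-1)  # predicted score

      model = MF(n_users=1000, n_items=5000)
      u = torch.randint(0, 1000, (64,))
      i = torch.randint(0, 5000, (64,))
      r = torch.rand(64) * 5  # observed ratings
      loss = nn.functional.mse_loss(model(u, i), r)
      loss.backward()  # an optimizer step would follow in training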

  • Patterns for Building LLM-based Systems & Products

    66 min read. "There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It's easy to demo a car self-driving around a block, but making it into a product takes a decade."

  • GitHub - kyegomez/OpenMythos: A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

    Disclaimer: OpenMythos is an independent, community-driven theoretical reconstruction based solely on publicly available research and speculation. It is not affiliated with, endorsed by, or connected to Anthropic or any of their proprietary systems. OpenMythos is an open-source, theoretical implementation of the Claude Mythos model. It implements a Recurrent-Depth Transformer (RDT) with three stages…

  • Blog

    Hachi: An (Image) Search Engine. "Only the dead have seen the end of war." (George Santayana) For quite some time now, I have been working on and off on a fully self-hosted search engine, in the hope of making it easier to search across personal data in an end-to-end manner. Even as individuals, we are hoarding and generating more and more data with no end in sight. Such "personal" data is being stored fro…

  • Mixture of Experts Explained

    There is a second iteration (Feb 2026) of this blog post covering how the transformers library has built around MoEs to make them "first class citizens" of the library and the Hub: Mixture of Experts (MoEs) in Transformers. With the release of Mixtral 8x7B (announcement, model card), a class of transformer models has become the hottest topic in the open AI community: Mixture of Experts…
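
    For context on why Mixtral-style models are interesting: an MoE layer replaces the dense feed-forward block with several expert FFNs, and a learned router sends each token through only the top-k of them (8 experts, 2 active per token in Mixtral 8x7B). A minimal routing sketch; ours, not code from the post or the transformers library:

      import torch
      import torch.nn as nn

      class TinyMoE(nn.Module):
          def __init__(self, dim: int = 64, n_experts: int = 8, top_k: int = 2):
              super().__init__()
              self.router = nn.Linear(dim, n_experts)
              self.experts = nn.ModuleList(
                  nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                  for _ in range(n_experts)
              )
              self.top_k = top_k

          def forward(self, x):  # x: (tokens, dim)
              logits = self.router(x)                        # (tokens, n_experts)
              weights, idx = logits.topk(self.top_k, dim=-1)
              weights = weights.softmax(dim=-1)              # renormalize over chosen experts
              out = torch.zeros_like(x)
              for slot in range(self.top_k):                 # dispatch each token's k picks
                  for e, expert in enumerate(self.experts):
                      mask = idx[:, slot] == e
                      if mask.any():
                          out[mask] += weights[mask, slot, None] * expert(x[mask])
              return out

      y = TinyMoE()(torch.randn(10, 64))  # only 2 of 8 experts run per token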

  • Why We Use Julia, 10 Years Later

    Exactly ten years ago today, we published "Why We Created Julia", introducing the Julia project to the world. At this point, we have moved well past the ambitious goals set out in the original blog post. Julia is now used by hundreds of thousands of people. It is taught at hundreds of universities, and entire companies are being formed that build their software stacks on Julia. From personalized me…

  • Aman’s AI Journal • Primers • Ilya Sutskever’s Top 30

    Ilya Sutskever’s Top 30 Reading List: The First Law of Complexodynamics · The Unreasonable Effectiveness of Recurrent Neural Networks · Understanding LSTM Networks · Recurrent Neural Network Regularization · Keeping Neural Networks Simple by Minimizing the Description Length of the Weights · Pointer Networks · ImageNet Classification with Deep Convolutional Neural Networks · Order Matters: Sequence to Sequence for Sets…

  • Daily Papers - Hugging Face

    …user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels…

  • Large Text Compression Benchmark

    Matt Mahoney. Last update: Mar. 25, 2026. This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10⁹ bytes of the XML text dump of the English version of Wikipedia from Mar. 3, 2006. The goal of this benchmark is not to find the best overall compress…

  • Accelerating Generative AI with PyTorch: Segment Anything, Fast – PyTorch

    This post is the first part of a multi-part blog series focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features, alongside practical examples of how these features can be combined to see how far we can push PyTorch native performance.

  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Accepted at ICLR 2026 (Oral). GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab…

  • Why We Think

    Date: May 1, 2025 | Estimated Reading Time: 40 min | Author: Lilian Weng. Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016; Ling et al. 2017; Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022; Nye et al. 2021) have led to significant improvements in model performance, while raising many research…

  • A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

    Yihan Cao (Lehigh University & Carnegie Mellon University), Siyu Li (Lehigh University), Yixin Liu (Lehigh University), Zhiling Yan (Lehigh University), Yutong Dai (Lehigh University), Philip S. Yu (University of Illinois at Chicago), Lichao Sun (Lehigh University). Recently…

  • FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention – PyTorch

    In theory, Attention is All You Need. In practice, however, we also need optimized attention implementations like FlashAttention. Although these fused attention implementations have substantially improved performance and enabled long contexts, this efficiency has come with a loss of flexibility. You can no longer…
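
    The flexibility FlexAttention restores is directly relevant to the search query: you write an arbitrary mask function, and query/key blocks whose mask is entirely empty are skipped, giving block-sparse attention. A sketch assuming PyTorch >= 2.5 and a CUDA device; the sliding-window pattern is our choice of example, not the post's:

      import torch
      from torch.nn.attention.flex_attention import flex_attention, create_block_mask

      B, H, S, D = 1, 4, 1024, 64
      WINDOW = 256
      device = "cuda"  # assumption: flex_attention targets GPU execution

      def sliding_window_causal(b, h, q_idx, kv_idx):
          # Attend only to keys in the causal past within WINDOW positions.
          return (q_idx >= kv_idx) & (q_idx - kv_idx < WINDOW)

      # Blocks that the mask zeroes out entirely are never computed.
      block_mask = create_block_mask(sliding_window_causal, B=None, H=None,
                                     Q_LEN=S, KV_LEN=S, device=device)

      q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))
      out = flex_attention(q, k, v, block_mask=block_mask)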

  • A Short Chronology Of Deep Learning For Tabular Data

    [Last updated: Jan 23, 2023] In my lectures, I emphasize that deep learning is really good for unstructured data (essentially, the opposite of tabular data). Deep learning is sometimes referred to as "representation learning" because its strength is the ability to learn the feature extraction pipeline. Most tabular datasets already represent (typically manually) extracted features, so there…

  • ICLR 2022 Spotlight: Demystifying local attention and dynamic depth-wise convolution - Microsoft Research

    Over the past two years there have been numerous papers on Transformers, and researchers are designing Transformer models for all kinds of tasks. But is attention, the core module of the Transformer, really stronger than convolution? This paper may offer a new perspective. Researchers from Microsoft Research Asia have looked into local attention and dynamic depth-wise convolution and…
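
    The comparison in the excerpt hinges on one observation: local (window) attention aggregates a neighborhood with input-dependent weights, while a depth-wise convolution uses one static kernel per channel. A small sketch of both, with a plain linear map standing in for the dynamic weight predictor; names and sizes are ours:

      import torch
      import torch.nn.functional as F

      B, C, L, K = 2, 8, 32, 3  # batch, channels, length, kernel size
      x = torch.randn(B, C, L)

      # Static depth-wise convolution: one fixed kernel per channel.
      static_kernel = torch.randn(C, 1, K)
      y_static = F.conv1d(x, static_kernel, padding=K // 2, groups=C)

      # Dynamic variant: predict per-position kernel weights from the input,
      # normalize them like attention scores, then aggregate the same window.
      proj = torch.randn(C, K)
      w = torch.softmax(torch.einsum('bcl,ck->blk', x, proj), dim=-1)   # (B, L, K)
      windows = F.unfold(x.unsqueeze(-1), (K, 1), padding=(K // 2, 0))  # (B, C*K, L)
      windows = windows.view(B, C, K, L)
      y_dynamic = torch.einsum('bckl,blk->bcl', windows, w)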
