
Search results for "probability distribution function continuous": 1 - 22 of 22

  • What We Learned from a Year of Building with LLMs (Part I)

    It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. The pace of improvements in LLMs, coupled with a parade of demos on social media, will fuel an estimated $200B investment in AI by 2025. LLMs are also broadly accessible, allowing everyone, not just ML engineers and scientists, to build intelligence into

  • The Roadmap of Mathematics for Machine Learning

    Understanding math will make you a better engineer. So, I am writing the best and most comprehensive book about it. Knowing the mathematics behind machine learning algorithms is a superpower. If you have ever built a model for a real-life problem, you probably experienced that familiarity with the details goes a long way if you want to move beyond baseline performance. This is especi

  • How has DeepSeek improved the Transformer architecture?

    DeepSeek has recently released DeepSeek v3, which is currently state-of-the-art in benchmark performance among open-weight models, alongside a technical report describing in some detail the training of the model. Impressively, they’ve achieved this SOTA performance by only using 2.8 million H800 hours of training hardware time—equivalent to about 4e24 FLOP if we assume 40% MFU. This is about ten t

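
The excerpt's compute figure can be sanity-checked with one line of arithmetic: GPU-hours × 3600 × peak FLOP/s × MFU. The H800 peak throughput used below (~989 TFLOP/s, BF16 dense) is my assumption, not stated in the excerpt:

```python
# Back-of-the-envelope check of the excerpt's training-compute figure.
# Assumption (not from the excerpt): H800 peak throughput ~989 TFLOP/s (BF16, dense).
H800_PEAK_FLOPS = 989e12   # assumed peak FLOP/s per GPU
MFU = 0.40                 # model FLOP utilization, from the excerpt
GPU_HOURS = 2.8e6          # H800 hours, from the excerpt

total_flop = GPU_HOURS * 3600 * H800_PEAK_FLOPS * MFU
print(f"total = {total_flop:.2e} FLOP")  # ≈ 3.99e+24, i.e. the excerpt's ~4e24
```
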
  • Patterns for Building LLM-based Systems & Products

    “There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.”

  • Deep Learning for AI – Communications of the ACM

    How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language? Yoshua Bengio, Yann LeCun, and Geoffrey Hinton are recipients of the 2018 ACM A.M. Turing Award for breakthroughs that have made deep neural networks a critical component of computing. Research on artificial neural networks was motivated by the observa

  • GitHub - diff-usion/Awesome-Diffusion-Models: A collection of resources and papers on Diffusion Models

    DiffEnc: Variational Diffusion with a Learned Encoder. Beatrix M. G. Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther. arXiv 2023. [Paper] 30 Oct 2023. Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models. Tim Z. Xiao, Johannes Zenn, Robert Bamler. arXiv 2023. [Paper] 30 Oct 2023. Successfully Applying Lottery Ticket Hypothesis to Diffusion Model. Chao Jiang, Bo Hui, Boha

  • Andrej Karpathy — AGI is still a decade away

    The Andrej Karpathy episode. Andrej explains why reinforcement learning is terrible (but everything else is much worse), why model collapse prevents LLMs from learning the way humans do, why AGI will just blend into the previous ~2.5 centuries of 2% GDP growth, why self driving took so long to crack, and what he sees as the future of education. Watch on YouTube; listen on Apple Podcasts or Spotify

  • Blog

    Hachi: An (Image) Search Engine. “Only the dead have seen the end of war.” (George Santayana) For quite some time now, I have been working on and off on a fully self-hosted search engine, in the hope of making it easier to search across personal data in an end-to-end manner. Even as individuals, we are hoarding and generating more and more data with no end in sight. Such "personal" data is being stored fro

  • What We’ve Learned From A Year of Building with LLMs – Applied LLMs

    A practical guide to building successful LLM products, covering the tactical, operational, and strategic. It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. And they’re getting better and cheaper every year. Coupled with a parade of demos on social media, there will be an estimated $200B investment in AI

  • Migrating Critical Traffic At Scale with No Downtime — Part 2

    Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah. Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. Behind these perfect moments of entertainment is a complex mechanism, with numerous gears and cogs working in harmony. But what happens when this machinery needs a transformation?

  • Aman's AI Journal • Primers • Ilya Sutskever's Top 30

    Ilya Sutskever’s Top 30 Reading List: The First Law of Complexodynamics; The Unreasonable Effectiveness of Recurrent Neural Networks; Understanding LSTM Networks; Recurrent Neural Network Regularization; Keeping Neural Networks Simple by Minimizing the Description Length of the Weights; Pointer Networks; ImageNet Classification with Deep Convolutional Neural Networks; Order Matters: Sequence to Sequence f

  • https://deeplearningtheory.com/PDLT.pdf

    The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Daniel A. Roberts and Sho Yaida, based on research in collaboration with Boris Hanin (drob@mit.edu, shoyaida@fb.com). Contents: Preface; 0 Initialization; 0.1 An Effective Theory Approach; 0.2 The Theoretical Minimum

  • Attention Is Off By One | Hacker News

    1. Summary: The author is suggesting that we add 1 to the denominator of the softmax that is used within attention mechanisms (not the final output softmax). The softmax inside an attention unit allows it to see key/query matches as probabilities; those probabilities support a continuous-valued version of a key-value lookup (instead of 1/0 output of a lookup, we get weights where a high weight = the
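
The proposed change can be sketched in a few lines of pure Python. Standard softmax forces the attention weights to sum to exactly 1; adding 1 to the denominator (equivalent to an extra, implicit zero logit) lets every weight shrink toward 0 so a head can abstain. Function names here are mine, not the post's:

```python
import math

def softmax(xs):
    """Standard softmax: weights always sum to exactly 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_one(xs):
    """The post's proposal: add 1 to the denominator (an implicit zero
    logit), so all weights can go to ~0 when no key matches the query."""
    m = max(max(xs), 0.0)  # include the implicit 0 logit in the max
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps) + math.exp(0.0 - m)
    return [e / s for e in exps]

scores = [-4.0, -5.0, -3.5]      # no strong key/query match anywhere
print(sum(softmax(scores)))      # 1.0: the head is forced to attend
print(sum(softmax_one(scores)))  # ~0.05: the head can effectively abstain
```
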

  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Accepted at ICLR 2026 (Oral). GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab

  • BERT is just a Single Text Diffusion Step

    This article appeared on Hacker News. Link to the discussion here. Additionally, Andrej Karpathy wrote his thoughts about the post, linked here. A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text using diffusion. Unlike traditional GPT-style models that generate one word at a time, Gemini Diffusion creates whole blocks of text by refining ra

  • Why We Think

    Date: May 1, 2025 | Estimated Reading Time: 40 min | Author: Lilian Weng. Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant improvements in model performance, while raising many research

  • A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

    A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. Yihan Cao, Lehigh University & Carnegie Mellon University, USA; Siyu Li, Lehigh University, USA; Yixin Liu, Lehigh University, USA; Zhiling Yan, Lehigh University, USA; Yutong Dai, Lehigh University, USA; Philip S. Yu, University of Illinois at Chicago, USA; Lichao Sun, Lehigh University, USA. Recen

  • Choosing a Sequential Testing Framework — Comparisons and Discussions | Spotify Engineering

    TL;DR: Sequential tests are the bread and butter for any company conducting online experiments. The literature on sequential testing has developed quickly over the last 10 years, and it’s not always easy to determine which test is most suitable for the setup of your company — many of these tests are “optimal” in some sense, and m

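
The excerpt doesn't name a specific test; as one concrete member of the family, here is a minimal sketch of Wald's classic SPRT for a Bernoulli metric (an illustration of sequential testing in general, not necessarily the framework Spotify chose):

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1 outcomes.
    Returns ('accept_h1' | 'accept_h0' | 'continue', samples_seen)."""
    upper = math.log((1 - beta) / alpha)   # cross above -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross below -> accept H0
    llr = 0.0                              # cumulative log-likelihood ratio
    for n, x in enumerate(observations, 1):
        p_h1 = p1 if x else 1 - p1
        p_h0 = p0 if x else 1 - p0
        llr += math.log(p_h1 / p_h0)
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(observations)

# Hypothetical stream of conversions from a variant converting well above p0:
decision, n = sprt_bernoulli([1, 0, 1, 1, 0, 1, 1, 1, 0, 1], p0=0.1, p1=0.2)
print(decision, n)  # stops early and accepts H1 before exhausting the stream
```

Unlike a fixed-horizon test, the decision boundaries are valid at every interim look, which is exactly the property the post's comparisons revolve around.
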
  • The Little Book of Deep Learning

    The Little Book of Deep Learning. François Fleuret is a professor of computer science at the University of Geneva, Switzerland. The cover illustration is a schematic of the Neocognitron by Fukushima [1980], a key ancestor of deep neural networks. This ebook is formatted to fit on a phone screen. Contents: List of figures; Foreword; I Foundations; 1 Machine Learnin

  • Why model calibration matters and how to achieve it

    by Lee Richardson & Taylor Pospisil. Calibrated models make probabilistic predictions that match real-world probabilities. This post explains why calibration matters, and how to achieve it. It discusses practical issues that calibrated predictions solve and presents a flexible framework to calibrate any classifier. Calibration applies in many applications, and hence the practicing data scientist mu

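
The definition in the excerpt suggests a simple check: bucket predictions by predicted probability and compare the mean prediction in each bucket with the observed positive rate (a textbook reliability table; the function name and toy data below are mine, not from the post):

```python
def reliability_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins and compare the
    mean predicted probability with the observed positive rate per bin.
    For a well-calibrated model the two columns should roughly agree."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[i].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        rows.append((mean_p, frac_pos, len(members)))
    return rows

# Toy example (hypothetical predictions, not data from the post):
probs  = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85, 0.7, 0.3]
labels = [0,   0,   1,    1,   1,   1,    0,   0]
for mean_p, frac_pos, n in reliability_bins(probs, labels):
    print(f"predicted {mean_p:.2f}  observed {frac_pos:.2f}  (n={n})")
```
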
  • Building a Simple Artificial Neural Network in JavaScript

    This article will discuss building a simple neural network using JavaScript. However, let’s first check what deep neural networks and artificial neural networks are. Artificial Neural Networks (ANNs) and Deep Neural Networks (DNNs) are related concepts, but they are different. The inspiration behind these artificial neural networks for machine learn

  • notes.dvi

    NOTES FOR MATH 635: TOPOLOGICAL QUANTUM FIELD THEORY. Ko Honda. The goal of this course is to define invariants of 3-manifolds and knots and representations of the mapping class group, using quantum field theory. We will follow Kohno, Conformal Field Theory and Topology, supplementing it with additional material to make it more accessible. The amount of mathematics that goes into defining these inva