
Search results for "probability distribution function example": 1–40 of 63

  • GPT in 60 Lines of NumPy | Jay Mody

    January 30, 2023 In this post, we'll implement a GPT from scratch in just 60 lines of numpy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text. Note: This post assumes familiarity with Python, NumPy, and some basic experience with neural networks. This implementation is for educational purposes, so it's missing lots of features/improv
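
    For a taste of what those 60 lines contain, here is the attention block sketched independently in NumPy (a minimal sketch, not Jay Mody's actual code):

      import numpy as np

      def softmax(x):
          e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
          return e / e.sum(axis=-1, keepdims=True)

      def attention(q, k, v):
          """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
          return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

      q = k = v = np.random.randn(5, 16)   # 5 tokens, 16-dim
      print(attention(q, k, v).shape)      # (5, 16)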

  • Why I no longer recommend Julia

    For many years I used the Julia programming language for transforming, cleaning, analyzing, and visualizing data, doing statistics, and performing simulations. I published a handful of open-source packages for things like signed distance fields, nearest-neighbor search, and Turing patterns (among others), made visual explanations of Julia concepts like broadcasting and arrays, and used Julia to ma

  • SARS-CoV-2 is associated with changes in brain structure in UK Biobank - Nature

    The global pandemic of SARS-CoV-2 has now claimed millions of lives across the world. There has been an increased focus by the scientific and medical community on the effects of mild-to-moderate COVID-19 in the longer term. There is strong evidence for brain-related pathologies, some of which could be a consequence of viral neurotropism [1,2,14] or virus-induced neuroinflammation [3,4,5,15], including t

  • What We Learned from a Year of Building with LLMs (Part I)

    It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. The pace of improvements in LLMs, coupled with a parade of demos on social media, will fuel an estimated $200B investment in AI by 2025. LLMs are also broadly accessible, allowing everyone, not just ML engineers and scientists, to build intelligence into

  • Optimizing your LLM in production

    Note: This blog post is also available as a documentation page on Transformers. Large Language Models (LLMs) such as GPT3/4, Falcon, and Llama are rapidly advancing in their ability to tackle human-centric tasks, establishing themselves as essential tools in modern knowledge-based industries. Deploying these models in real-world tasks remains challenging, however: To exhibit near-human text unders

  • The Roadmap of Mathematics for Machine Learning

    Understanding math will make you a better engineer. So, I am writing the best and most comprehensive book about it. Knowing the mathematics behind machine learning algorithms is a superpower. If you have ever built a model for a real-life problem, you probably experienced that familiarity with the details goes a long way if you want to move beyond baseline performance. This is especi

  • Prompt Engineering

    Date: March 15, 2023 | Estimated Reading Time: 21 min | Author: Lilian Weng Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward desired outcomes without updating the model weights. It is an empirical science, and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation a
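
    As a concrete illustration of in-context prompting: a few-shot prompt steers the model purely through examples placed in the context, with no weight updates. A minimal sketch, where complete() is a hypothetical stand-in for any text-completion API:

      # The task (sentiment labeling) is demonstrated, never described.
      prompt = (
          "great -> positive\n"
          "awful -> negative\n"
          "fine -> positive\n"
          "dreadful -> "
      )
      answer = complete(prompt)  # complete() is hypothetical; expected output: "negative"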

  • microgpt

    This is a brief guide to my new art project microgpt, a single file of 200 lines of pure Python with no dependencies that trains and inferences a GPT. This file contains the full algorithmic content of what is needed: dataset of documents, tokenizer, autograd engine, a GPT-2-like neural network architecture, the Adam optimizer, training loop, and inference loop. Everything else is just efficiency.
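
    Of those ingredients, the autograd engine is the one that surprises people by how small it can be. A minimal sketch of the standard scalar-autograd pattern (in the spirit of Karpathy's earlier micrograd, not microgpt's actual code):

      class Value:
          """A scalar that remembers how it was computed, for backprop."""
          def __init__(self, data, children=(), grad_fn=None):
              self.data, self.grad = data, 0.0
              self._children, self._grad_fn = children, grad_fn

          def __mul__(self, other):
              out = Value(self.data * other.data, (self, other))
              def grad_fn():
                  self.grad  += other.data * out.grad   # d(ab)/da = b
                  other.grad += self.data  * out.grad   # d(ab)/db = a
              out._grad_fn = grad_fn
              return out

          def backward(self):
              # Build reverse topological order, then apply the chain rule.
              topo, seen = [], set()
              def build(v):
                  if v not in seen:
                      seen.add(v)
                      for c in v._children:
                          build(c)
                      topo.append(v)
              build(self)
              self.grad = 1.0
              for v in reversed(topo):
                  if v._grad_fn:
                      v._grad_fn()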

  • Patterns for Building LLM-based Systems & Products

    Patterns for Building LLM-based Systems & Products [ llm engineering production 🔥 ] · 66 min read Discussions on HackerNews, Twitter, and LinkedIn “There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.”

  • Deep Learning for AI – Communications of the ACM

    How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language? Yoshua Bengio, Yann LeCun, and Geoffrey Hinton are recipients of the 2018 ACM A.M. Turing Award for breakthroughs that have made deep neural networks a critical component of computing. Research on artificial neural networks was motivated by the observa

  • Illustrating Reinforcement Learning from Human Feedback (RLHF)

    This article has been translated to Chinese 简体中文 and Vietnamese đọc tiếng việt. Language models have shown impressive capabilities in the past few years by generating diverse and compelling text from human input prompts. However, what makes a "good" text is inherently hard to define as it is subjective and context dependent. There are many applications such as writing stories where you want creati
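
    The optimization at the heart of RLHF, stated here in the standard formulation (the article's own notation may differ): tune the policy to maximize a learned reward while a KL penalty keeps it close to the original model,

    \[ \max_{\pi_\theta} \; \mathbb{E}_{x \sim D,\; y \sim \pi_\theta(\cdot\mid x)}\big[r_\phi(x,y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big) \]

    where r_phi is the reward model fit to human preference comparisons, pi_ref is the pre-RLHF model, and beta trades reward against drift.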

  • How a simple Linux kernel memory corruption bug can lead to complete system compromise

    In this case, reallocating the object as one of those three types didn't seem to me like a nice way forward (although it should be possible to exploit this somehow with some effort, e.g. by using count.counter to corrupt the buf field of seq_file). Also, some systems might be using the slab_nomerge kernel command line flag, which disables this merging behavior. Another approach that I didn't look

  • LLM Powered Autonomous Agents

    Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng Building agents with an LLM (large language model) as the core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potential of an LLM extends beyond generating well-written copy, stories, essays and programs; it can be framed as a powerfu

  • GitHub - diff-usion/Awesome-Diffusion-Models: A collection of resources and papers on Diffusion Models

    DiffEnc: Variational Diffusion with a Learned Encoder Beatrix M. G. Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther arXiv 2023. [Paper] 30 Oct 2023 Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models Tim Z. Xiao, Johannes Zenn, Robert Bamler arXiv 2023. [Paper] 30 Oct 2023 Successfully Applying Lottery Ticket Hypothesis to Diffusion Model Chao Jiang, Bo Hui, Boha

  • Solving Quantitative Reasoning Problems With Language Models

    Solving Quantitative Reasoning Problems with Language Models Aitor Lewkowycz∗, Anders Andreassen†, David Dohan†, Ethan Dyer†, Henryk Michalewski†, Vinay Ramasesh†, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur∗, Guy Gur-Ari∗, and Vedant Misra∗ Google Research Abstract Language models have achieved remarkable performance on a wide range of tasks that require

  • Attention Is Off By One

    By Evan Miller July 24, 2023 About which one cannot speak, one must pass over in silence. –Wittgenstein Do you see the off-by-one error in this formula? \[ \textrm{Attention}(Q, K, V) = \textrm{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V \] The attention formula is the central equation of modern AI, but there’s a bug in it that has been driving me nuts the last week. I tried writing a serious-look
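
    For reference, the fix Miller proposes in the post adds 1 to the softmax denominator, so an attention head can assign near-zero weight everywhere and effectively abstain:

    \[ \big(\textrm{softmax}_1(x)\big)_i = \frac{e^{x_i}}{1 + \sum_j e^{x_j}} \]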

  • Andrej Karpathy — AGI is still a decade away

    The Andrej Karpathy episode. Andrej explains why reinforcement learning is terrible (but everything else is much worse), why model collapse prevents LLMs from learning the way humans do, why AGI will just blend into the previous ~2.5 centuries of 2% GDP growth, why self driving took so long to crack, and what he sees as the future of education. Watch on YouTube; listen on Apple Podcasts or Spotify

  • Blog

    Hachi: An (Image) Search engine "Only the dead have seen the end of war." – George Santayana For quite some time now, I have been working on and off on a fully self-hosted search engine, in the hope of making it easier to search across personal data in an end-to-end manner. Even as individuals, we are hoarding and generating more and more data with no end in sight. Such "personal" data is being stored fro

  • Thinking Fast and Slow - Replicability-Index

    2011 was an important year in the history of psychology, especially social psychology. First, it became apparent that one social psychologist had faked results for dozens of publications (https://en.wikipedia.org/wiki/Diederik_Stapel). Second, a highly respected journal published an article with the incredible claim that humans can foresee random events in the future, if they are presented without

  • Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song

    Introduction Existing generative modeling techniques can largely be grouped into two categories based on how they represent probability distributions: likelihood-based models, which directly learn the distribution’s probability density (or mass) function via (approximate) maximum likelihood. Typical likelihood-based models include autoregressive models, normalizing flow models, energy-based mode
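
    The other category the post builds toward models the score, the gradient of the log-density, rather than the density itself; samples are then drawn with Langevin dynamics. In the standard formulation:

    \[ s_\theta(x) \approx \nabla_x \log p(x), \qquad x_{t+1} = x_t + \frac{\epsilon}{2}\, s_\theta(x_t) + \sqrt{\epsilon}\, z_t, \quad z_t \sim \mathcal{N}(0, I) \]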

  • “Death of a Salesforce”: Why AI Will Transform the Next Generation of Sales Tech | Andreessen Horowitz

    “Death of a Salesforce”: Why AI Will Transform the Next Generation of Sales Tech The battle between every startup and incumbent comes down to whether the startup gets distribution before the incumbent gets innovation. In sales tech, it’s easy to assume incumbents like Salesforce and Hubspot have the edge. First, they are embedded as “systems of record,” so sales leaders are loath to rip them out a

  • RAPIDS Forest Inference Library: Prediction at 100 million rows per second

    Introduction Random forests (RF) and gradient-boosted decision trees (GBDTs) have become workhorse models of applied machine learning. XGBoost and LightGBM, popular packages implementing GBDT models, consistently rank among the most commonly used tools by data scientists on the Kaggle platform. We see similar interest in forest-based models in industry, where they are applied to problems ranging fr
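
    For context, FIL exposes this as a load-then-predict API in cuML. A hedged sketch of typical usage; the exact function and parameter names vary across cuML versions, so treat every name here as an assumption to check against the docs:

      # Assumed API, modeled on cuML's ForestInference; verify names per version.
      from cuml import ForestInference

      fil = ForestInference.load("xgb_model.bst",      # pre-trained XGBoost model file
                                 output_class=True,    # classification, not regression
                                 model_type="xgboost")
      preds = fil.predict(X)   # X: feature rows on the GPU, e.g. a cuDF DataFrame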

  • What We’ve Learned From A Year of Building with LLMs – Applied LLMs

    A practical guide to building successful LLM products, covering the tactical, operational, and strategic. It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. And they’re getting better and cheaper every year. Coupled with a parade of demos on social media, there will be an estimated $200B investment in AI

  • A decade of major cache incidents at Twitter

    This was co-authored with Yao Yue This is a collection of information on severe (SEV-0 or SEV-1, the most severe incident classifications) incidents at Twitter that were at least partially attributed to cache from the time Twitter started using its current incident tracking JIRA (2012) to date (2022), with one bonus incident from before 2012. Not including the bonus incident, there were 6 SEV-0s a

  • AI Timelines via Cumulative Optimization Power: Less Long, More Short — LessWrong

    The general trend is clear: larger lifetime compute enables systems of greater generality and capability. Generality and performance are both independently expensive, as an efficient general system often ends up requiring combinations of many specialist subnetworks. BNNs and ANNs both implement effective approximations of Bayesian learning [29]. Net training compute then measures the total intra-li

  • 17 types of similarity and dissimilarity measures used in data science. | Towards Data Science

    The following article explains various methods for computing distances and showing their instances in our daily lives. Additionally, it… Various ML metrics. Inspired by Maarten Grootendorst. "There is no Royal Road to Geometry." – Euclid Quick note: Everything written and visualized has been created by the author unless otherwise specified. Illustrations and equations were generated using tools like
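
    Two of the most common measures from that family, sketched in plain Python with NumPy (names here are ours, not the article's):

      import numpy as np

      def euclidean(a, b):
          # Straight-line distance between two vectors.
          return np.sqrt(np.sum((a - b) ** 2))

      def cosine_similarity(a, b):
          # Angle-based similarity: 1 = same direction, 0 = orthogonal.
          return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

      a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
      print(euclidean(a, b))            # 3.7416...
      print(cosine_similarity(a, b))    # 1.0 (parallel vectors)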

  • Migrating Critical Traffic At Scale with No Downtime — Part 2

    Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. Behind these perfect moments of entertainment is a complex mechanism, with numerous gears and cogs working in harmony. But what happens when this machinery needs a transformation?

  • research!rsc: Transparent Telemetry for Open-Source Projects (Transparent Telemetry, Part 1)

    Russ Cox February 8, 2023 research.swtch.com/telemetry-intro How do software developers understand which parts of their software are being used and whether they are performing as expected? The modern answer is telemetry, which means software sending data to answer those questions back to a collection server. This post is about why I believe telemetry is important for open-source projects, and what

  • Understanding Convolutions on Graphs

    Many systems and interactions - social networks, molecules, organizations, citations, physical models, transactions - can be represented quite naturally as graphs. How can we reason about and make predictions within these systems? One idea is to look at tools that have worked well in other domains: neural networks have shown immense predictive power in a variety of learning tasks. However, neural
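
    The standard answer the post develops is the graph convolution; in its simplest (GCN) form, each layer mixes every node's features with its neighbors' through a normalized adjacency matrix. A minimal NumPy sketch of that textbook formulation (our names, not the post's code):

      import numpy as np

      def gcn_layer(A, H, W):
          """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
          A_hat = A + np.eye(A.shape[0])            # add self-loops
          d = A_hat.sum(axis=1)                     # node degrees
          D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
          A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
          return np.maximum(0, A_norm @ H @ W)      # aggregate, transform, ReLU

      # Tiny 3-node path graph, 2-d features, 2 hidden units.
      A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
      H = np.random.randn(3, 2)
      W = np.random.randn(2, 2)
      print(gcn_layer(A, H, W).shape)  # (3, 2)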

  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision – PyTorch

    Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerat
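
    The enabling trick behind the FlashAttention line of work is the online softmax: a softmax-weighted sum can be accumulated tile by tile with just a running max and running sum, so the full attention matrix is never materialized. A minimal NumPy sketch of the accumulation (illustrative only, nothing like the actual GPU kernel):

      import numpy as np

      def online_softmax_weighted_sum(scores, values, tile=4):
          """Compute softmax(scores) @ values one tile at a time."""
          m = -np.inf                      # running max of scores seen so far
          s = 0.0                          # running sum of exp(score - m)
          acc = np.zeros(values.shape[1])  # running weighted sum of values
          for i in range(0, len(scores), tile):
              sc, v = scores[i:i+tile], values[i:i+tile]
              m_new = max(m, sc.max())
              scale = np.exp(m - m_new)    # rescale old accumulators
              p = np.exp(sc - m_new)
              s = s * scale + p.sum()
              acc = acc * scale + p @ v
              m = m_new
          return acc / s

      scores, values = np.random.randn(10), np.random.randn(10, 3)
      w = np.exp(scores - scores.max()); exact = (w / w.sum()) @ values
      print(np.allclose(online_softmax_weighted_sum(scores, values), exact))  # True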

  • The Annotated Diffusion Model

    In this blog post, we'll take a deeper look into Denoising Diffusion Probabilistic Models (also known as DDPMs, diffusion models, score-based generative models or simply autoencoders) as researchers have been able to achieve remarkable results with them for (un)conditional image/audio/video generation. Popular examples (at the time of writing) include GLIDE and DALL-E 2 by OpenAI, Latent Diffusion
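
    For orientation, the forward (noising) process that a DDPM learns to invert has a closed form, so a clean sample x_0 can be jumped to any noise level t in one step (standard DDPM notation, with \bar{\alpha}_t the cumulative product of the noise schedule):

    \[ q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\big), \qquad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) \]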

  • Keenadu the tablet conqueror and the links between major Android botnets

    In April 2025, we reported on a then-new iteration of the Triada backdoor that had compromised the firmware of counterfeit Android devices sold across major marketplaces. The malware was deployed to the system partitions and hooked into Zygote – the parent process for all Android apps – to infect any app on the device. This allowed the Trojan to exfiltrate credentials from messaging apps and socia

  • Aman's AI Journal • Primers • Ilya Sutskever's Top 30

    Ilya Sutskever’s Top 30 Reading List The First Law of Complexodynamics The Unreasonable Effectiveness of Recurrent Neural Networks Understanding LSTM Networks Recurrent Neural Network Regularization Keeping Neural Networks Simple by Minimizing the Description Length of the Weights Pointer Networks ImageNet Classification with Deep Convolutional Neural Networks Order Matters: Sequence to Sequence f

  • Llama from scratch (or how to implement a paper without crying)

    Llama from scratch I want to provide some tips from my experience implementing a paper. I'm going to cover my tips so far from implementing a dramatically scaled-down version of Llama for training TinyShakespeare. This post is heavily inspired by Karpathy's Makemore series, which I highly recommend. I'm only going to loosely follow the layout of their paper; while the formatting and order of sectio
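
    One concrete Llama ingredient worth getting right when implementing the paper is RMSNorm, which replaces LayerNorm. A minimal NumPy sketch of the standard formulation (our names, not the post's code):

      import numpy as np

      def rmsnorm(x, g, eps=1e-6):
          """RMSNorm: scale by the root-mean-square instead of mean/variance."""
          rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
          return x / rms * g   # g: learned per-feature gain

      x = np.random.randn(4, 8)     # (tokens, features)
      g = np.ones(8)                # gain initialized to 1
      print(rmsnorm(x, g).shape)    # (4, 8)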

  • How the RWKV language model works

    In this post, I will explain the details of how RWKV generates text. For a high level overview of what RWKV is and what is so special about it, check out the other post about RWKV. To explain exactly how RWKV works, I think it is easiest to look at a simple implementation of it. The following ~100 line code (based on RWKV in 150 lines) is a minimal implementation of a relatively small (430m parame
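
    The recurrence at the heart of RWKV is the WKV operator, an exponentially decaying weighted average over past values, stated here as in the RWKV paper (w is a per-channel decay, u a bonus for the current token):

    \[ wkv_t = \frac{\sum_{i=1}^{t-1} e^{-(t-1-i)w + k_i}\, v_i \;+\; e^{u + k_t}\, v_t}{\sum_{i=1}^{t-1} e^{-(t-1-i)w + k_i} \;+\; e^{u + k_t}} \]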

  • https://deeplearningtheory.com/PDLT.pdf

    The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Daniel A. Roberts and Sho Yaida, based on research in collaboration with Boris Hanin (drob@mit.edu, shoyaida@fb.com). Contents: Preface; 0 Initialization; 0.1 An Effective Theory Approach; 0.2 The Theoretical Minimum

  • Large Text Compression Benchmark

    Large Text Compression Benchmark Matt Mahoney Last update: Mar. 25, 2026. This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006. About the test data. The goal of this benchmark is not to find the best overall compress

  • What's the difference between Transfer Learning (TL) and Fine Tuning? - ts0818's blog

    xtech.nikkei.com The domains where artificial intelligence (AI) outperforms humans keep expanding in ever more advanced and complex directions. In late October 2019, AI from the UK's DeepMind achieved a major result in competitive play of Blizzard Entertainment's (US) online strategy game StarCraft II, which drew attention in the West. That is because beating humans at an online strategy game is considered more important for real-world AI applications than winning at Go. "Why Google's AI beating humans at a 'competitive game' is more groundbreaking than its Go victory" | Nikkei xTECH ⇧ So even if extraterrestrial life were to invade Earth, maybe AI would defend us? The possibilities are exciting. Hello, it's me. Anyway, this post gets into things like "multilayer neural networks"
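
    The distinction the post chases can be shown in a few lines. A minimal PyTorch-style sketch (the ResNet backbone, 10-class head, and choice of layer4 are illustrative assumptions, not the post's code): transfer learning freezes the pretrained backbone and trains only a new head; fine-tuning also unfreezes some or all of the backbone.

      import torch.nn as nn
      from torchvision import models

      model = models.resnet18(weights="DEFAULT")   # pretrained backbone

      # Transfer learning: freeze everything, train only a new head.
      for p in model.parameters():
          p.requires_grad = False
      model.fc = nn.Linear(model.fc.in_features, 10)   # new head (trainable by default)

      # Fine-tuning: additionally unfreeze (some of) the pretrained weights.
      for p in model.layer4.parameters():
          p.requires_grad = True   # let the last block adapt to the new task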

  • What We Learned from a Year of Building with LLMs (Part II)

    A possibly apocryphal quote attributed to many leaders reads: “Amateurs talk strategy and tactics. Professionals talk operations.” Where the tactical perspective sees a thicket of sui generis problems, the operational perspective sees a pattern of organizational dysfunction to repair. Where the strategic perspective sees an opportunity, the operational perspective sees a challenge worth rising to.

  • US10452978B2 - Attention-based sequence transduction neural networks - Google Patents

    Attention-based sequence transduction neural networks. Publication number: US10452978B2. Application: US16/021,971. Authority: US.