
Search results for "regularization techniques": 1–27 of 27

  • GPT in 60 Lines of NumPy | Jay Mody

    January 30, 2023. In this post, we'll implement a GPT from scratch in just 60 lines of NumPy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text. Note: This post assumes familiarity with Python, NumPy, and some basic experience with neural networks. This implementation is for educational purposes, so it's missing lots of features…
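
The post builds the whole model from a handful of primitives. As a minimal sketch (my own illustration, not the post's actual code), three pieces any NumPy GPT implementation needs:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian error linear unit used in GPT-2.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, g, b, eps=1e-5):
    # Normalize over the feature axis, then scale by g and shift by b.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mu) / np.sqrt(var + eps) + b
```

Attention and the feed-forward blocks compose these same few operations, which is why the whole forward pass fits in so few lines.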

  • Vibe physics: The AI grad student

    Can AI do theoretical physics? In this guest post, professor of physics Matthew Schwartz decided to find out by supervising Claude through a real research calculation, start to finish, without ever touching a file himself. His account of what happened is below. Summary: I guided Claude Opus 4.5 through a real theoretical physics calculation, encapsulating the complexity of code and computations…

  • Patterns for Building LLM-based Systems & Products

    Patterns for Building LLM-based Systems & Products [ llm engineering production 🔥 ] · 66 min read. Discussions on HackerNews, Twitter, and LinkedIn. “There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.”

  • Andrej Karpathy — AGI is still a decade away

    The Andrej Karpathy episode. Andrej explains why reinforcement learning is terrible (but everything else is much worse), why model collapse prevents LLMs from learning the way humans do, why AGI will just blend into the previous ~2.5 centuries of 2% GDP growth, why self-driving took so long to crack, and what he sees as the future of education. Watch on YouTube; listen on Apple Podcasts or Spotify.

  • Mixture of Experts Explained

    There is a second iteration (Feb 2026) of the blog post where we cover how the transformers library has built around MoEs to make them "first class citizens" of the library and the Hub. Here is the link to the post: Mixture of Experts (MoEs) in Transformers. With the release of Mixtral 8x7B (announcement, model card), a class of transformer models has become the hottest topic in the open AI community…

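
The core idea can be sketched in a few lines: a learned router scores each expert, only the top-k run, and their outputs are mixed by the renormalized gate weights. A toy sketch (shapes and names are illustrative, not the post's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

# One tiny linear "expert" per slot; weights here are random placeholders.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))  # gating projection

def moe_layer(x):
    # Score all experts, keep only the top-k, and mix their outputs
    # by the renormalized gate probabilities.
    logits = x @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-top_k:]        # indices of the top-k experts
    gates = probs[top] / probs[top].sum()   # renormalize over selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.normal(size=d))
```

The sparsity is the point: only top_k of n_experts matrices are multiplied per token, which is how MoE models grow parameter count without growing per-token compute proportionally.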
  • Annotated history of modern AI and deep neural networks

    For a while, DanNet enjoyed a monopoly. From 2011 to 2012 it won every contest it entered, winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] In particular, at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest. DanNet was also the first deep CNN to win…

  • Machine Learning for Solving Economic Models - himaginary’s diary

    A new NBER working paper has been posted (ungated version available). Its original title is “Deep Learning for Solving Economic Models,” by Jesús Fernández-Villaverde (University of Pennsylvania). The abstract follows. The ongoing revolution in artificial intelligence, especially deep learning, is transforming research across many fields, including economics. Its impact is particularly strong in solving equilibrium economic models. These models often lack closed-form solutions, so…

  • AI Timelines via Cumulative Optimization Power: Less Long, More Short — LessWrong

    The general trend is clear: larger lifetime compute enables systems of greater generality and capability. Generality and performance are both independently expensive, as an efficient general system often ends up requiring combinations of many specialist subnetworks. BNNs and ANNs both implement effective approximations of Bayesian learning[29]. Net training compute then measures the total…

  • NeRF at CVPR 2022

    There are more than 50 papers related to Neural Radiance Fields (NeRFs) at the CVPR 2022 conference. With my former student and now colleague at Google Research, Andrew Marmon, we rounded up all papers we could find and organized them here for our edification, and your reading pleasure. Below are all the papers at CVPR’22 that we could find by scanning titles and reading the associated papers…

  • Aman's AI Journal • Primers • Ilya Sutskever's Top 30

    Ilya Sutskever’s Top 30 Reading List: The First Law of Complexodynamics; The Unreasonable Effectiveness of Recurrent Neural Networks; Understanding LSTM Networks; Recurrent Neural Network Regularization; Keeping Neural Networks Simple by Minimizing the Description Length of the Weights; Pointer Networks; ImageNet Classification with Deep Convolutional Neural Networks; Order Matters: Sequence to Sequence…

  • A Brief History of Time Series Models

    [Updated on August 20, 2024] TL;DR: For folks who are interested in learning more about time series models, below is an incomplete roadmap that attempts to summarize the development of this complex, fast-evolving field. The M Competition is to time series models what ImageNet is to computer vision, and deep learning beat traditional statistical models for the first time in M4, which took place…

  • Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law

    Increased efficiency can sometimes, counterintuitively, lead to worse outcomes. This is true almost everywhere. We will name this phenomenon the strong version of [Goodhart's law](https://en.wikipedia.org/wiki/Goodhart%27s_law). As one example, more efficient centralized tracking of student progress by standardized testing seems like such a good idea that well-intentioned laws mandate it…

  • https://deeplearningtheory.com/PDLT.pdf

    The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Daniel A. Roberts and Sho Yaida, based on research in collaboration with Boris Hanin (drob@mit.edu, shoyaida@fb.com). Contents: Preface; 0 Initialization; 0.1 An Effective Theory Approach; 0.2 The Theoretical Minimum…

  • Well-tuned Simple Nets Excel on Tabular Datasets

    Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern…
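
The "joint application" idea is simply stacking several regularizers in one training step. A minimal sketch for a linear model (my own illustration, not the paper's code), combining two common ingredients, inverted dropout and L2 weight decay:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(W, x, y, lr=0.1, weight_decay=1e-4, drop_p=0.1):
    # Inverted dropout on the inputs: zero a random subset of features
    # and rescale the survivors so the expected activation is unchanged.
    mask = (rng.random(x.shape) > drop_p) / (1 - drop_p)
    xd = x * mask
    # Squared-error gradient for a linear model, plus the L2 penalty
    # gradient (weight decay) added on top.
    pred = xd @ W
    grad = xd.T @ (pred - y) / len(x) + weight_decay * W
    return W - lr * grad
```

Real "cocktails" add more ingredients (data augmentation, ensembling, learning-rate schedules), but each enters the loop in the same additive, composable way.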

  • Large Transformer Model Inference Optimization

    Date: January 10, 2023 | Estimated Reading Time: 9 min | Author: Lilian Weng. [Updated on 2023-01-24: add a small section on Distillation.] Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a big bottleneck for adopting a powerful…

  • Tongyi DeepResearch: A New Era of Open-Source AI Researchers

    September 16, 2025 · 12 min · 2515 words · DeepResearch Team, Tongyi Lab | Translations: 中文. From Chatbot to Autonomous Agent: We are proud to present Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI’s DeepResearch across a comprehensive suite of benchmarks. Tongyi DeepResearch demonstrates state-of-the-art…

  • Optimal Transport for Machine Learners

    Optimal Transport is a foundational mathematical theory that connects optimization, partial differential equations, and probability. It offers a powerful framework for comparing probability distributions and has recently become an important tool in machine learning, especially for designing and evaluating generative models. These course notes cover the fundamental mathematical aspects of OT…
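
To get a concrete feel for "comparing probability distributions", the one-dimensional case has a closed form: the optimal plan matches sorted samples. A small sketch (my own illustration, not from the notes):

```python
import numpy as np

def wasserstein_1d(a, b):
    # For two equal-size 1-D empirical distributions, optimal transport
    # matches order statistics, so the Wasserstein-1 distance is just the
    # mean absolute difference of the sorted samples.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# A distribution compared with a shifted copy of itself:
# every unit of mass moves exactly by the shift, so W1 equals the shift.
x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x + 3.0))  # → 3.0
```

In higher dimensions no such sorting trick exists, which is where the linear-programming and entropic-regularization machinery the notes develop comes in.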

  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Accepted at ICLR 2026 (Oral). Authors: Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab.

  • Linear and Logistic Regression in Machine Learning

    Linear and logistic regression are two fundamental statistical methods for predictive modeling within the supervised machine learning framework. Regression analysis and classification are two of the most common approaches in machine learning. Linear regression is one of the primary tools for regression analysis; logistic regression, in contrast, is a fundamental tool for…

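
The contrast can be made concrete in a few lines of NumPy: least squares for a continuous target, log-loss gradient descent for a binary one. A toy sketch with illustrative data (not from the article):

```python
import numpy as np

# Linear regression: closed-form least squares for a continuous target.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # column of 1s = intercept
y = np.array([1.0, 3.0, 5.0])                        # exactly y = 1 + 2x
w = np.linalg.lstsq(X, y, rcond=None)[0]             # recovers [1.0, 2.0]

# Logistic regression: gradient descent on the mean log loss
# for a binary target on the same inputs.
t = np.array([0.0, 0.0, 1.0])
b = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ b)))     # predicted class probabilities
    b -= 0.5 * X.T @ (p - t) / len(t)  # gradient step on the log loss
```

Same design matrix, different loss: squared error yields a real-valued prediction, while the sigmoid plus log loss yields a probability suited to classification.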
  • Version 1.0

    Version 1.0. For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.0. Legend for changelogs: Major Feature: something big that you couldn’t do before. Feature: something that you couldn’t do before. Efficiency: an existing feature now may not require as much computation or memory. Enhancement: a miscellaneous minor improvement…

  • Simpler than the Transformer? The arrival of “MLP-Mixer” (Day 1) ~Abstract / Introduction~

    This is Nitsuo. I post about AI and ML topics on Twitter and follow overseas researchers, so feel free to follow me if you want to broaden your sources. In May 2021, a model called MLP-Mixer arrived on the scene, and this is Day 1 of my commentary series on it. Day 1: Abstract / Introduction; Day 2: Mixer Architecture; Day 3: Experiments; Day 4: Related Work; Day 5: Conclusion; Day 6: Appendix; Day 7: Source Code. The original paper, “MLP-Mixer: An all-MLP Architecture for Vision,” is here; on May 4, 2021…

  • How to Avoid Overfitting in Machine Learning Model?

    Overfitting is a common mistake among machine learning engineers, especially beginners. Unfortunately, it can completely ruin your machine learning model, producing incorrect outputs and leading to wrong decisions. What is Overfitting in Machine Learning? Overfitting occurs when a statistical model fits too precisely against its training data…

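
A held-out split makes the failure visible: a model flexible enough to fit the training noise exactly scores far worse on points it never saw. A small sketch (my own illustration, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy sine
x_tr, y_tr = x[::2], y[::2]      # training half
x_va, y_va = x[1::2], y[1::2]    # held-out half

def mse(coeffs, xs, ys):
    # Mean squared error of a fitted polynomial on a data split.
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# A degree-9 polynomial interpolates all 10 training points exactly,
# so its training error collapses while its held-out error does not.
c9 = np.polyfit(x_tr, y_tr, 9)
```

Comparing mse on the two halves is the basic diagnostic behind every anti-overfitting remedy the article goes on to list (more data, simpler models, regularization, cross-validation).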
  • The Little Book of Deep Learning

    The Little Book of Deep Learning, by François Fleuret. François Fleuret is a professor of computer science at the University of Geneva, Switzerland. The cover illustration is a schematic of the Neocognitron by Fukushima [1980], a key ancestor of deep neural networks. This ebook is formatted to fit on a phone screen. Contents: List of figures; Foreword; I Foundations; 1 Machine Learning…

  • Introduction to Machine Learning

    Machine learning is creating a buzz in the industry, and it’s the right time to get familiar with it. Let’s get the basics right and get started. What is machine learning? If I had to define it in a single sentence, I would say: machine learning is a way to find patterns in data in order to predict the future. That is not the only definition of machine learning…

  • A Short Chronology Of Deep Learning For Tabular Data

    [Last updated: Jan 23, 2023] In my lectures, I emphasize that deep learning is really good for unstructured data (essentially, that’s the opposite of tabular data). Deep learning is sometimes referred to as “representation learning” because its strength is the ability to learn the feature extraction pipeline. Most tabular datasets already represent (typically manually) extracted features, so there…

  • Distributionally Robust Optimization Based on Optimal Transport Distances, and Related Topics - 冷めたコーヒー

    This entry is the Day 24 article of the “Mathematical Optimization Advent Calendar 2022.” The articles before and after mine are: Day 23, by @YamagenSakam, “Solving an IIR Filter Design Problem with Simulated Annealing,” and Day 25, by @snowberryfield, “On Preprocessing Techniques for Metaheuristics in Integer Programming.” This is my second Advent Calendar appearance, the first being the “Mathematical Optimization Advent Calendar 2020” two years ago. In the previous Advent Calendar I covered an optimization method called the conjugate gradient method; this time, I would like to introduce distributionally robust optimization, a modeling approach that has been actively studied in recent years. This entry draws on a paper that particularly impressed me among those I read this year, [Shafieezadeh-Abadeh,…

  • How I Used Stable Diffusion and Dreambooth to Create A Painted Portrait of My Dog

    Introduction: When I first started playing with Stable Diffusion text-to-image generation, in August 2022, my immediate reaction was, "ZOMG! I need to make art prints for my art wall!", only to then immediately face-plant because vanilla Stable Diffusion is quite challenging to tame. If you are trying to reproduce a specific subject, you need to utilize additional strategies and techniques…