
Search results for "regularization techniques": 1–27 of 27

  • GPT in 60 Lines of NumPy | Jay Mody

    January 30, 2023. In this post, we'll implement a GPT from scratch in just 60 lines of NumPy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text. Note: This post assumes familiarity with Python, NumPy, and some basic experience with neural networks. This implementation is for educational purposes, so it's missing lots of features…
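
The post builds the whole model from a handful of primitives. As a minimal sketch (my own illustration, not the post's actual code), three pieces any NumPy GPT implementation needs:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian error linear unit used in GPT-2.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, g, b, eps=1e-5):
    # Normalize over the feature axis, then scale by g and shift by b.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mu) / np.sqrt(var + eps) + b
```

Attention and the feed-forward blocks compose these same few operations, which is why the whole forward pass fits in so few lines.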

  • Vibe physics: The AI grad student

    Can AI do theoretical physics? In this guest post, professor of physics Matthew Schwartz decided to find out by supervising Claude through a real research calculation, start to finish, without ever touching a file himself. His account of what happened is below. Summary: I guided Claude Opus 4.5 through a real theoretical physics calculation, encapsulating the complexity of code and computations…

  • Patterns for Building LLM-based Systems & Products

    Patterns for Building LLM-based Systems & Products [ llm engineering production 🔥 ] · 66 min read. Discussions on HackerNews, Twitter, and LinkedIn. “There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.”

  • Andrej Karpathy — AGI is still a decade away

    The Andrej Karpathy episode. Andrej explains why reinforcement learning is terrible (but everything else is much worse), why model collapse prevents LLMs from learning the way humans do, why AGI will just blend into the previous ~2.5 centuries of 2% GDP growth, why self-driving took so long to crack, and what he sees as the future of education. Watch on YouTube; listen on Apple Podcasts or Spotify.

  • Mixture of Experts Explained

    There is a second iteration (Feb 2026) of the blog post where we cover how the transformers library has built around MoEs to make them "first class citizens" of the library and the Hub. Here is the link to the post: Mixture of Experts (MoEs) in Transformers. With the release of Mixtral 8x7B (announcement, model card), a class of transformer models has become the hottest topic in the open AI community…

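
The core idea can be sketched in a few lines: a learned router scores each expert, only the top-k run, and their outputs are mixed by the renormalized gate weights. A toy sketch (shapes and names are illustrative, not the post's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

# One tiny linear "expert" per slot; weights here are random placeholders.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))  # gating projection

def moe_layer(x):
    # Score all experts, keep only the top-k, and mix their outputs
    # by the renormalized gate probabilities.
    logits = x @ router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-top_k:]        # indices of the top-k experts
    gates = probs[top] / probs[top].sum()   # renormalize over selected experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.normal(size=d))
```

The sparsity is the point: only top_k of n_experts matrices are multiplied per token, which is how MoE models grow parameter count without growing per-token compute proportionally.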
  • Annotated history of modern AI and deep neural networks

    For a while, DanNet enjoyed a monopoly. From 2011 to 2012 it won every contest it entered, winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] In particular, at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest. DanNet was also the first deep CNN to win…

  • Machine Learning for Solving Economic Models - himaginary’s diary

    A new NBER working paper has been posted (ungated version available). Its original title is “Deep Learning for Solving Economic Models,” by Jesús Fernández-Villaverde (University of Pennsylvania). The abstract follows. The ongoing revolution in artificial intelligence, especially deep learning, is transforming research across many fields, including economics. Its impact is particularly strong in solving equilibrium economic models. These models often lack closed-form solutions, so…

  • AI Timelines via Cumulative Optimization Power: Less Long, More Short — LessWrong

    The general trend is clear: larger lifetime compute enables systems of greater generality and capability. Generality and performance are both independently expensive, as an efficient general system often ends up requiring combinations of many specialist subnetworks. BNNs and ANNs both implement effective approximations of Bayesian learning[29]. Net training compute then measures the total…

  • NeRF at CVPR 2022

    There are more than 50 papers related to Neural Radiance Fields (NeRFs) at the CVPR 2022 conference. With my former student and now colleague at Google Research, Andrew Marmon, we rounded up all papers we could find and organized them here for our edification, and your reading pleasure. Below are all the papers at CVPR’22 that we could find by scanning titles and reading the associated papers…

  • Aman's AI Journal • Primers • Ilya Sutskever's Top 30

    Ilya Sutskever’s Top 30 Reading List: The First Law of Complexodynamics; The Unreasonable Effectiveness of Recurrent Neural Networks; Understanding LSTM Networks; Recurrent Neural Network Regularization; Keeping Neural Networks Simple by Minimizing the Description Length of the Weights; Pointer Networks; ImageNet Classification with Deep Convolutional Neural Networks; Order Matters: Sequence to Sequence…

  • A Brief History of Time Series Models

    [Updated on August 20, 2024] TL;DR: For folks who are interested in learning more about time series models, below is an incomplete roadmap that attempts to summarize the development of this complex, fast-evolving field. The M Competition is to time series models what ImageNet is to computer vision, and deep learning beat traditional statistical models for the first time in M4, which took place…

  • Too much efficiency makes everything worse: overfitting and the strong version of Goodhart’s law

    Increased efficiency can sometimes, counterintuitively, lead to worse outcomes. This is true almost everywhere. We will name this phenomenon the strong version of [Goodhart's law](https://en.wikipedia.org/wiki/Goodhart%27s_law). As one example, more efficient centralized tracking of student progress by standardized testing seems like such a good idea that well-intentioned laws mandate it…

  • https://deeplearningtheory.com/PDLT.pdf

    The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Daniel A. Roberts and Sho Yaida, based on research in collaboration with Boris Hanin (drob@mit.edu, shoyaida@fb.com). Contents: Preface; 0 Initialization; 0.1 An Effective Theory Approach; 0.2 The Theoretical Minimum…

  • Well-tuned Simple Nets Excel on Tabular Datasets

    Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern…
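
The "joint application" idea is simply stacking several regularizers in one training step. A minimal sketch for a linear model (my own illustration, not the paper's code), combining two common ingredients, inverted dropout and L2 weight decay:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(W, x, y, lr=0.1, weight_decay=1e-4, drop_p=0.1):
    # Inverted dropout on the inputs: zero a random subset of features
    # and rescale the survivors so the expected activation is unchanged.
    mask = (rng.random(x.shape) > drop_p) / (1 - drop_p)
    xd = x * mask
    # Squared-error gradient for a linear model, plus the L2 penalty
    # gradient (weight decay) added on top.
    pred = xd @ W
    grad = xd.T @ (pred - y) / len(x) + weight_decay * W
    return W - lr * grad
```

Real "cocktails" add more ingredients (data augmentation, ensembling, learning-rate schedules), but each enters the loop in the same additive, composable way.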

  • Large Transformer Model Inference Optimization

    Date: January 10, 2023 | Estimated Reading Time: 9 min | Author: Lilian Weng. [Updated on 2023-01-24: add a small section on Distillation.] Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a big bottleneck for adopting a powerful…

  • Tongyi DeepResearch: A New Era of Open-Source AI Researchers

    September 16, 2025 · 12 min · 2515 words · DeepResearch Team, Tongyi Lab | Translations: 中文. From Chatbot to Autonomous Agent: We are proud to present Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI’s DeepResearch across a comprehensive suite of benchmarks. Tongyi DeepResearch demonstrates state-of-the-art…

  • Optimal Transport for Machine Learners

    Optimal Transport is a foundational mathematical theory that connects optimization, partial differential equations, and probability. It offers a powerful framework for comparing probability distributions and has recently become an important tool in machine learning, especially for designing and evaluating generative models. These course notes cover the fundamental mathematical aspects of OT…
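
To get a concrete feel for "comparing probability distributions", the one-dimensional case has a closed form: the optimal plan matches sorted samples. A small sketch (my own illustration, not from the notes):

```python
import numpy as np

def wasserstein_1d(a, b):
    # For two equal-size 1-D empirical distributions, optimal transport
    # matches order statistics, so the Wasserstein-1 distance is just the
    # mean absolute difference of the sorted samples.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# A distribution compared with a shifted copy of itself:
# every unit of mass moves exactly by the shift, so W1 equals the shift.
x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x + 3.0))  # → 3.0
```

In higher dimensions no such sorting trick exists, which is where the linear-programming and entropic-regularization machinery the notes develop comes in.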

  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Accepted at ICLR 2026 (Oral). Authors: Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab.

  • Linear and Logistic Regression in Machine Learning

    Linear and logistic regression are two fundamental statistical methods for predictive modeling within the supervised machine learning framework. Regression analysis and classification are two of the most common approaches in machine learning. Linear regression is one of the primary tools for regression analysis; logistic regression, in contrast, is a fundamental tool for…

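
The contrast can be made concrete in a few lines of NumPy: least squares for a continuous target, log-loss gradient descent for a binary one. A toy sketch with illustrative data (not from the article):

```python
import numpy as np

# Linear regression: closed-form least squares for a continuous target.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # column of 1s = intercept
y = np.array([1.0, 3.0, 5.0])                        # exactly y = 1 + 2x
w = np.linalg.lstsq(X, y, rcond=None)[0]             # recovers [1.0, 2.0]

# Logistic regression: gradient descent on the mean log loss
# for a binary target on the same inputs.
t = np.array([0.0, 0.0, 1.0])
b = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ b)))     # predicted class probabilities
    b -= 0.5 * X.T @ (p - t) / len(t)  # gradient step on the log loss
```

Same design matrix, different loss: squared error yields a real-valued prediction, while the sigmoid plus log loss yields a probability suited to classification.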
  • Version 1.0

    Version 1.0. For a short description of the main highlights of the release, please refer to Release Highlights for scikit-learn 1.0. Legend for changelogs: Major Feature: something big that you couldn’t do before. Feature: something that you couldn’t do before. Efficiency: an existing feature now may not require as much computation or memory. Enhancement: a miscellaneous minor improvement…

  • Simpler than the Transformer? The arrival of “MLP-Mixer” (Day 1) ~Abstract / Introduction~

    This is Nitsuo. I post about AI and ML topics on Twitter and follow overseas researchers, so feel free to follow me if you want to broaden your sources. In May 2021, a model called MLP-Mixer arrived on the scene, and this is Day 1 of my commentary series on it. Day 1: Abstract / Introduction; Day 2: Mixer Architecture; Day 3: Experiments; Day 4: Related Work; Day 5: Conclusion; Day 6: Appendix; Day 7: Source Code. The original paper, “MLP-Mixer: An all-MLP Architecture for Vision,” is here; on May 4, 2021…

  • How to Avoid Overfitting in Machine Learning Model?

    Overfitting is a common mistake among machine learning engineers, especially beginners. Unfortunately, it can completely ruin your machine learning model, producing incorrect outputs and leading to wrong decisions. What is Overfitting in Machine Learning? Overfitting occurs when a statistical model fits too precisely against its training data…

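
A held-out split makes the failure visible: a model flexible enough to fit the training noise exactly scores far worse on points it never saw. A small sketch (my own illustration, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy sine
x_tr, y_tr = x[::2], y[::2]      # training half
x_va, y_va = x[1::2], y[1::2]    # held-out half

def mse(coeffs, xs, ys):
    # Mean squared error of a fitted polynomial on a data split.
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

# A degree-9 polynomial interpolates all 10 training points exactly,
# so its training error collapses while its held-out error does not.
c9 = np.polyfit(x_tr, y_tr, 9)
```

Comparing mse on the two halves is the basic diagnostic behind every anti-overfitting remedy the article goes on to list (more data, simpler models, regularization, cross-validation).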
  • The Little Book of Deep Learning

    The Little Book of Deep Learning, by François Fleuret. François Fleuret is a professor of computer science at the University of Geneva, Switzerland. The cover illustration is a schematic of the Neocognitron by Fukushima [1980], a key ancestor of deep neural networks. This ebook is formatted to fit on a phone screen. Contents: List of figures; Foreword; I Foundations; 1 Machine Learning…

  • Introduction to Machine Learning

    Machine learning is creating a buzz in the industry, and it’s the right time to get familiar with it. Let’s get the basics right and get started. What is machine learning? If I had to define it in a single sentence, I would say: machine learning is a way to find patterns in data in order to predict the future. That is not the only definition of machine learning…

  • A Short Chronology Of Deep Learning For Tabular Data

    [Last updated: Jan 23, 2023] In my lectures, I emphasize that deep learning is really good for unstructured data (essentially, that’s the opposite of tabular data). Deep learning is sometimes referred to as “representation learning” because its strength is the ability to learn the feature extraction pipeline. Most tabular datasets already represent (typically manually) extracted features, so there…

  • Distributionally Robust Optimization Based on Optimal Transport Distances, and Related Topics - 冷めたコーヒー

    This entry is the Day 24 article of the “Mathematical Optimization Advent Calendar 2022.” The articles before and after mine are: Day 23, by @YamagenSakam, “Solving an IIR Filter Design Problem with Simulated Annealing,” and Day 25, by @snowberryfield, “On Preprocessing Techniques for Metaheuristics in Integer Programming.” This is my second Advent Calendar appearance, the first being the “Mathematical Optimization Advent Calendar 2020” two years ago. In the previous Advent Calendar I covered an optimization method called the conjugate gradient method; this time, I would like to introduce distributionally robust optimization, a modeling approach that has been actively studied in recent years. This entry draws on a paper that particularly impressed me among those I read this year, [Shafieezadeh-Abadeh,…

  • How I Used Stable Diffusion and Dreambooth to Create A Painted Portrait of My Dog

    Introduction: When I first started playing with Stable Diffusion text-to-image generation, in August 2022, my immediate reaction was, "ZOMG! I need to make art prints for my art wall!", only to then immediately face-plant because vanilla Stable Diffusion is quite challenging to tame. If you are trying to reproduce a specific subject, you need to utilize additional strategies and techniques…