arXiv.org e-Print archive: New Entries and Ratings - Hatena Bookmark

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
3 users
arxiv.org

Chain-of-thought (CoT) prompting has become a widely used strategy for working with large language and multimodal models. While CoT has been shown to improve performance across many tasks, determining the settings in which it is effective remains an ongoing effort. In particular, it is still an open question in what settings CoT systematically reduces model performance. In this paper, we seek to i
- Technology
- 2024/10/31 10:24

Differential Transformer
4 users
arxiv.org

Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attentio
- Learning
- 2024/10/21 05:31
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
3 users
arxiv.org

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning cap
- Technology
- 2024/10/13 10:51
- Machine Learning
- Mathematics
- AI
What is Entropy?
19 users
arxiv.org

This short book is an elementary course on entropy, leading up to a calculation of the entropy of hydrogen gas at standard temperature and pressure. Topics covered include information, Shannon entropy and Gibbs entropy, the principle of maximum entropy, the Boltzmann distribution, temperature and coolness, the relation between entropy, expected energy and temperature, the equipartition theorem, th
- Learning
- 2024/09/22 07:01
- Physics
- Math
- science
- Read Later
LLMs Will Always Hallucinate, and We Need to Live With This
24 users
arxiv.org

As Large Language Models become more ubiquitous across domains, it becomes important to examine their inherent limitations critically. This work argues that hallucinations in language models are not just occasional errors but an inevitable feature of these systems. We demonstrate that hallucinations stem from the fundamental mathematical and logical structure of LLMs. It is, therefore, impossible
- Learning
- 2024/09/11 12:53
QUIC is not Quick Enough over Fast Internet
4 users
arxiv.org

QUIC is expected to be a game-changer in improving web application performance. In this paper, we conduct a systematic examination of QUIC's performance over high-speed networks. We find that over fast Internet, the UDP+QUIC+HTTP/3 stack suffers a data rate reduction of up to 45.2% compared to the TCP+TLS+HTTP/2 counterpart. Moreover, the performance gap between QUIC and HTTP/2 grows as the underl
- Technology
- 2024/09/10 11:14
Introduction to Machine Learning
4 users
arxiv.org

This book introduces the mathematical foundations and techniques that lead to the development and analysis of many of the algorithms that are used in machine learning. It starts with an introductory chapter that describes notation used throughout the book and serves as a reminder of basic concepts in calculus, linear algebra and probability and also introduces some measure theoretic terminology, wh
- Technology
- 2024/09/05 15:53
- Machine Learning
Meta Knowledge for Retrieval Augmented Large Language Models
6 users
arxiv.org

Retrieval Augmented Generation (RAG) is a technique used to augment Large Language Models (LLMs) with contextually relevant, time-critical, or domain-specific information without altering the underlying model parameters. However, constructing RAG systems that can effectively synthesize information from a large and diverse set of documents remains a significant challenge. We introduce a novel data-ce
- Technology
- 2024/08/26 17:45
- RAG
- AI
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
3 users
arxiv.org

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehen
- Technology
- 2024/08/13 13:02
- Papers
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
5 users
arxiv.org

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and L
- Technology
- 2024/07/29 09:50
Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction
4 users
arxiv.org

The fields of Origin of Life and Artificial Life both question what life is and how it emerges from a distinct set of "pre-life" dynamics. One common feature of most substrates where life emerges is a marked shift in dynamics when self-replication appears. While there are some hypotheses regarding how self-replicators arose in nature, we know very little about the general dynamics, computational p
- Learning
- 2024/07/17 18:19
- Read Later
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
3 users
arxiv.org

The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-tim
- Technology
- 2024/07/16 04:48
- math
Searching for Best Practices in Retrieval-Augmented Generation
3 users
arxiv.org

Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong
- Learning
- 2024/07/03 08:52
Mixture-of-Agents Enhances Large Language Model Capabilities
4 users
arxiv.org

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met
- Technology
- 2024/06/18 08:44
- research
- Read Later
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
4 users
arxiv.org

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic s
- Technology
- 2024/06/13 12:36
Scalable MatMul-free Language Modeling
4 users
arxiv.org

Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-fr
- Technology
- 2024/06/10 18:25
- AI
- Research
- Tech
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
4 users
arxiv.org

Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across
- Technology
- 2024/06/07 18:49
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
4 users
arxiv.org

Large Language Models (LLMs) are often described as being instances of foundation models - that is, models that transfer strongly across various tasks and conditions in a few-shot or zero-shot manner, while exhibiting scaling laws that predict function improvement when increasing the pre-training scale. These claims of excelling in different functions and tasks rely on measurements taken across vari
- Technology
- 2024/06/06 19:13
- Machine Learning
Your Transformer is Secretly Linear
4 users
arxiv.org

This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o
- Technology
- 2024/05/26 01:32
Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
7 users
arxiv.org

Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking
- Technology
- 2024/05/23 08:58
Sakuga-42M Dataset: Scaling Up Cartoon Research
4 users
arxiv.org

Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a not
- Learning
- 2024/05/17 23:26
Seven Failure Points When Engineering a Retrieval Augmented Generation System
6 users
arxiv.org

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer using an LLM. RAG systems aim to: a) reduce the problem of hallucinat
- Technology
- 2024/05/17 11:50
- Read Later
A Primer on the Inner Workings of Transformer-based Language Models
3 users
arxiv.org

The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architect
- Technology
- 2024/05/06 22:09
KAN: Kolmogorov-Arnold Networks
13 users
arxiv.org

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz
- Technology
- 2024/05/01 16:37
- Machine Learning
- Papers
Building a Large Japanese Web Corpus for Large Language Models
3 users
arxiv.org

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c
- Learning
- 2024/04/30 17:29
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
9 users
arxiv.org

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset
- Technology
- 2024/04/23 11:46
- Read Later
Many-Shot In-Context Learning
4 users
arxiv.org

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative
- Technology
- 2024/04/22 02:20
A Survey on Retrieval-Augmented Text Generation for Large Language Models
4 users
arxiv.org

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enha
- Learning
- 2024/04/18 20:02
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
6 users
arxiv.org

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te
- Technology
- 2024/04/13 01:46
