arXiv.org e-Print archive: new entries and ratings - Hatena Bookmark

Seven Failure Points When Engineering a Retrieval Augmented Generation System
3 users
arxiv.org

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer using an LLM. RAG systems aim to: a) reduce the problem of hallucinat
- Technology
- 2024/05/17 11:50
- Read later

KAN: Kolmogorov-Arnold Networks
11 users
arxiv.org

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz
- Technology
- 2024/05/01 16:37
- Machine learning

Building a Large Japanese Web Corpus for Large Language Models
3 users
arxiv.org

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c
- Learning
- 2024/04/30 17:29

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
8 users
arxiv.org

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset
- Technology
- 2024/04/23 11:46
- Read later

A Survey on Retrieval-Augmented Text Generation for Large Language Models
4 users
arxiv.org

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enha
- Learning
- 2024/04/18 20:02

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
7 users
arxiv.org

Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the number of knowledge bits a model stores. We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page. Through multiple controlled datasets, we establi
- Learning
- 2024/04/10 22:16

ReALM: Reference Resolution As Language Modeling
10 users
arxiv.org

Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in ref
- Technology
- 2024/04/03 15:14

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
3 users
arxiv.org

In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with
- Technology
- 2024/04/02 15:06

Jamba: A Hybrid Transformer-Mamba Language Model
3 users
arxiv.org

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso
- Technology
- 2024/04/01 22:38

The Elements of Differentiable Programming
18 users
arxiv.org

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization o
- Technology
- 2024/03/23 16:06
- Read later

Evolutionary Optimization of Model Merging Recipes
9 users
arxiv.org

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically disc
- Learning
- 2024/03/21 09:47

RAFT: Adapting Language Model to Domain Specific RAG
6 users
arxiv.org

Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su
- Technology
- 2024/03/19 00:15

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
5 users
arxiv.org

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la
- Technology
- 2024/03/17 18:13

Stealing Part of a Production Language Model
5 users
arxiv.org

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Ba
- Technology
- 2024/03/12 13:42
- Security

Applied Causal Inference Powered by ML and AI
5 users
arxiv.org

An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.
- Technology
- 2024/03/06 20:07

https://arxiv.org/pdf/2402.17764.pdf
3 users
arxiv.org
- Learning
- 2024/02/29 05:29

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
26 users
arxiv.org

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-t
- Learning
- 2024/02/28 20:02
- LLM
- Read later

Hallucination is Inevitable: An Innate Limitation of Large Language Models
7 users
arxiv.org

Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminat
- Technology
- 2024/02/26 22:32
- LLM
- Artificial intelligence
- techfeed
- AI

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
3 users
arxiv.org

While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time
- Technology
- 2024/02/24 11:47
- Algorithm
- ai

More Agents Is All You Need
4 users
arxiv.org

We find that, simply via a sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated. Also, this method is orthogonal to existing complicated methods to further enhance LLMs, while the degree of enhancement is correlated to the task difficulty. We conduct comprehensive experiments on a wide range of LLM benchmarks to verify the presen
- Learning
- 2024/02/22 07:23
- Read later

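The sampling-and-voting idea named in the abstract above can be sketched as follows. This is only an illustration, not the paper's implementation: `sample_and_vote` and `toy_model` are hypothetical names, the toy model replays canned answers instead of calling an LLM, and voting by exact string match is a simplifying assumption.

```python
from collections import Counter
from itertools import cycle


def sample_and_vote(generate, query, n_agents):
    """Ask `generate` the same query n_agents times and return the
    most frequent answer together with its vote count."""
    answers = [generate(query) for _ in range(n_agents)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes


# Hypothetical stand-in for an LLM call: replays a fixed answer sequence
# so the example is deterministic.
_canned = cycle(["4", "5", "4", "3", "4"])


def toy_model(query):
    return next(_canned)


winner, votes = sample_and_vote(toy_model, "2 + 2 = ?", n_agents=5)
print(winner, votes)
```

With a real model, `generate` would wrap a sampled (non-greedy) completion call, and answers would usually need normalising before they are counted.
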
Automated Unit Test Improvement using Large Language Models at Meta
15 users
arxiv.org

This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Ins
- Technology
- 2024/02/17 15:42

Mixtral of Experts
3 users
arxiv.org

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e
- Technology
- 2024/02/16 16:30

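The routing step described in the abstract above (a router picks two of eight experts per token and combines their outputs) can be sketched in plain Python. This is a toy illustration under stated assumptions, not Mixtral's code: `top2_moe` is a hypothetical name, the experts are simple scaling functions rather than feedforward blocks, and weighting the two outputs by a softmax over the selected router scores is an assumption of this sketch.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(token, router_logits, experts):
    """Send one token to the two highest-scoring experts and mix their outputs."""
    # Indices of the two largest router scores (ties keep the lower index first).
    top2 = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:2]
    # Assumption: mixing weights are a softmax over the two selected scores only.
    weights = softmax([router_logits[i] for i in top2])
    outputs = [experts[i](token) for i in top2]
    mixed = [sum(w * out[d] for w, out in zip(weights, outputs))
             for d in range(len(token))]
    return mixed, top2


# Eight toy "experts": expert k multiplies every component by k + 1.
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]
router_logits = [0.0, 2.0, -1.0, 2.0, 0.5, -3.0, 0.1, 0.0]

mixed, chosen = top2_moe([1.0, -2.0], router_logits, experts)
print(chosen, mixed)
```

Only the two selected experts are evaluated for the token, which is why per-token compute stays well below that of running all eight.
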
Large Language Models: A Survey
3 users
arxiv.org

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws
- Technology
- 2024/02/13 00:54

Grandmaster-Level Chess Without Search
4 users
arxiv.org

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with sup
- Technology
- 2024/02/08 18:36
- Software

Bluesky and the AT Protocol: Usable Decentralized Social Media
4 users
arxiv.org

Bluesky is a new social network built upon the AT Protocol, a decentralized foundation for public social media. It was launched in private beta in February 2023, and has grown to over 3 million registered users in the following year. In this paper we introduce the architecture of Bluesky and the AT Protocol, which is inspired by the web itself, but modernized to include streams of real-time update
- Technology
- 2024/02/06 21:53
- Bluesky
- SNS

MM-LLMs: Recent Advances in MultiModal Large Language Models
3 users
arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive surve
- Technology
- 2024/01/27 22:28
- Read later

Self-Rewarding Language Models
17 users
arxiv.org

We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi
- Technology
- 2024/01/21 09:59
- llm
- Read later

Improving Text Embeddings with Large Language Models
3 users
arxiv.org

In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. Unlike existing methods that often depend on multi-stage intermediate pre-training with billions of weakly-supervised text pairs, followed by fine-tuning with a few labeled datasets, our method does not require building complex training pipelin
- Learning
- 2024/01/03 17:18
- Read later

Large Language Models for Generative Information Extraction: A Survey
3 users
arxiv.org

Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness abilitie
- Technology
- 2024/01/02 18:33
