Search results for "language-model": 1 - 40 of 47

  • An Overview of LMQL (Language Model Query Language) | mah_lab / 西見 公宏

    Trying queries in the LMQL Playground: LMQL ships with a Playground where you can easily check its behavior, and the Playground can also be run locally. Start with the following query from Getting Started: argmax "Hello[WHO]" from "openai/text-ada-001" where len(WHO) < 10. Clicking the "Run" button prompts for an OpenAI API key; enter it, and the result appears in the Model Response pane. The basic structure of LMQL: notationally, LMQL resembles SQL and is built from parts such as the decoder clause, which specifies the decoding algorithm used for text generation; LMQL lets you choose among a variety of decoding algorithms

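    Below is a minimal sketch of running the same query from Python. Only the query itself comes from the article; the wrapper assumes the lmql package, its @lmql.query decorator, and an OpenAI API key in the environment, so treat it as illustrative.

        import lmql

        # Hedged sketch: wraps the Getting Started query quoted above in the
        # lmql Python package's @lmql.query decorator (an assumption; the
        # article only shows the query run in the Playground).
        @lmql.query
        def hello():
            '''lmql
            argmax
                "Hello[WHO]"
            from
                "openai/text-ada-001"
            where
                len(WHO) < 10
            '''

        # Prints the text generated for the [WHO] placeholder.
        print(hello())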
  • Introducing Code Llama, a state-of-the-art large language model for coding

    Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity an

  • What Is a Large Language Model (LLM)?

    What is a large language model (LLM)? AI & Machine Learning Glossary, series index. Definition: a large language model (LLM) is a natural language processing model trained on a massive amount of text data. In general, by fine-tuning a large language model it can be adapted to a wide range of natural language processing (NLP) tasks such as text classification, sentiment analysis, information extraction, summarization, text generation, and question answering (Figure 1). Representative examples of large language models include BERT, announced by Google in 2018, and GPT-3, announced by OpenAI in 2020. ChatGPT, announced in December 2022, is based on the "GPT-3.5 series" trained in early 2022

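    The glossary entry above notes that a fine-tuned LLM can be adapted to tasks such as sentiment analysis or text classification. As a rough illustration (not from the article), the Hugging Face transformers pipeline API wraps such a fine-tuned model in a single call; the default checkpoint is chosen by the library.

        from transformers import pipeline

        # Loads a default sentiment-analysis checkpoint chosen by the library.
        classifier = pipeline("sentiment-analysis")
        print(classifier("Large language models adapt well to many NLP tasks."))
        # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]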
  • GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters

    YaLM 100B is a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world. The model leverages 100 billion parameters. It took 65 days to train the model on a cluster of 800 A100 graphics cards and 1.7 TB of online texts, books, and countless other sources in both English and Russian. Training details and best practices o

  • GitHub - jart/emacs-copilot: Large language model code completion for Emacs

  • GitHub - BlinkDL/ChatRWKV: ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

  • OWASP Top 10 for Large Language Model Applications | OWASP Foundation

    The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most c

  • Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrou

  • How to get meaning from text with language model BERT | AI Explained

    In this video, we give a step-by-step walkthrough of self-attention, the mechanism powering the deep learning model BERT, and other state-of-the-art transformer models for natural language processing (NLP). More on attention and BERT: https://bit.ly/38vpOyW How to solve a text classification problem with BERT with this tutorial: https://bit.ly/2Ij6tGa 0:00 Introduction of NLP 0:39 Text tokenizati

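    The video centers on self-attention; as a generic illustration (not the video's own code), here is scaled dot-product attention in a few lines of NumPy.

        import numpy as np

        def scaled_dot_product_attention(Q, K, V):
            """softmax(Q K^T / sqrt(d_k)) V, the core of transformer attention."""
            d_k = K.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
            return weights @ V                                 # weighted sum of values

        # Toy example: 3 tokens with 4-dimensional embeddings (random stand-ins).
        rng = np.random.default_rng(0)
        Q = K = V = rng.normal(size=(3, 4))
        print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)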
  • How to train a new language model from scratch using Transformers and Tokenizers

    How to train a new language model from scratch using Transformers and Tokenizers Over the past few months, we made several improvements to our transformers and tokenizers libraries, with the goal of making it easier than ever to train a new language model from scratch. In this post we’ll demo how to train a “small” model (84 M parameters = 6 layers, 768 hidden size, 12 attention heads) – that’s th

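    A hedged sketch of configuring a model of roughly the size quoted above with the transformers library, using a RoBERTa-style config (the vocabulary size of 52,000 is an assumption, not stated in the excerpt):

        from transformers import RobertaConfig, RobertaForMaskedLM

        # 6 layers, hidden size 768, 12 attention heads, as quoted above.
        config = RobertaConfig(
            vocab_size=52_000,            # assumed; depends on the tokenizer you train
            num_hidden_layers=6,
            hidden_size=768,
            num_attention_heads=12,
            max_position_embeddings=514,
        )
        model = RobertaForMaskedLM(config)
        print(f"{model.num_parameters():,} parameters")  # on the order of 84M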
  • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset

  • How accuracy jumped when we tried LayoutLM (Layout Language Model) | Cinnamon Inc. (Cinnamon AI)

    Tech: How accuracy jumped when we tried LayoutLM (Layout Language Model). 2021.01.18. Hello from the Cinnamon AI PR team. Cinnamon AI provides Aurora Clipper, a product built on natural language processing that is used for purposes such as extracting dates that carry a specific context (event dates, contract dates, and so on) and person names (parties to a contract), pulling key points out of long documents, and classifying text. In this post, 藤井, who leads development of Aurora Clipper, presents the results of experiments with an algorithm called LayoutLM as a base model for Aurora Clipper. What is LayoutLM, which uses the position of text as a feature? LayoutLM (Layout Language Model) is a model from Microsoft Re

  • GitHub - Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model

    If you're interested in the field of LLM, you may find the above list of milestone papers helpful to explore its history and state-of-the-art. However, each direction of LLM offers a unique set of insights and contributions, which are essential to understanding the field as a whole. For a detailed list of papers in various subfields, please refer to the following link: Awesome-LLM-hallucination -

  • BloombergGPT: A Large Language Model for Finance

    The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion pa

  • GitHub - tanreinama/GPTSAN: General-purpose Swich transformer based Japanese language model

  • GitHub - SJTU-IPADS/PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

  • Slimming down large language models with TextPruner / Large language model pruning using TextPruner

    Slides from a lightning talk given at NLP Hacks on 2022/05/13.

  • Nikkei announces development of the "NIKKEI Language Model", a large language model specialized for economic information and honed on 40 years of article data | Ledge.ai

  • The Rise and Potential of Large Language Model Based Agents: A Survey

    For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training stra

  • Cramming: Training a Language Model on a Single GPU in One Day

    Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate t

  • RAFT: Adapting Language Model to Domain Specific RAG

    Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su

  • PaLM-E: An Embodied Multimodal Language Model

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence. Abstract: Large language models have been demonstrated to pe

  • GitHub - pfnet-research/japanese-lm-fin-harness: Japanese Language Model Financial Evaluation Harness

  • OpenAI's GPT-3 Language Model: A Technical Overview

    Notice: GPT-2 1.5B is trained with 40GB of Internet text, which is roughly 10 billion tokens (conservatively assuming the average token size is 4 characters). So GPT-3 175B has a lower data-compression ratio, 300 / 175 = 1.71, compared with GPT-2 1.5B's 10 / 1.5 = 6.66. This raises the question of whether, with this many parameters, the model functions by memorizing the data in the training and p
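    The ratios quoted above are simply training tokens divided by parameter count; a quick check of the arithmetic (token counts are the figures given in the excerpt):

        # Training tokens divided by parameters, per the figures quoted above.
        gpt2_ratio = 10e9 / 1.5e9     # GPT-2: ~10B tokens, 1.5B parameters
        gpt3_ratio = 300e9 / 175e9    # GPT-3: ~300B tokens, 175B parameters
        print(f"GPT-2: {gpt2_ratio:.2f} tokens/parameter")   # ~6.67
        print(f"GPT-3: {gpt3_ratio:.2f} tokens/parameter")   # ~1.71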

  • GitHub - WooooDyy/LLM-Agent-Paper-List: The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

    For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing human level, with AI agents considered as a promising vehicle of this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Due to the versatile and remarkable capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks

  • GitHub - XiongjieDai/GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

  • I want to get more out of Large Language Models (LLMs)! Trying "LangChain" - CCCMK Holdings TECH Lab Tech Blog

    Hello, this is Miura from the CCCMK Holdings TECH LAB. I have recently been trying an English study method called "shadowing": you listen to English audio and pronounce it right behind the speaker, which is said to improve listening and speaking. When I try to pronounce English, my mouth does not move as smoothly as I would like; apparently one factor is that the muscles used for speaking English are not yet trained. I have started practicing along with videos and hope to see improvement. Lately, new information about Large Language Models (LLMs) appears on the internet almost every day; it really is a hot topic. On this blog, too, we have recently looked at Prompt Engineering techniques for giving LLMs better instructions, drawing on recently published papers and

  • GitHub - databrickslabs/dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

    Databricks’ Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine tuning records databricks-dolly-15k generated by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA,

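    A hedged sketch of loading an instruction-following Dolly checkpoint with the transformers pipeline API. The databricks/dolly-v2-12b checkpoint name, dtype, and device settings are assumptions drawn from the model's Hugging Face card, not from the excerpt above, and a large GPU is required.

        import torch
        from transformers import pipeline

        # Dolly ships custom generation code, hence trust_remote_code=True.
        generate_text = pipeline(
            model="databricks/dolly-v2-12b",   # assumed checkpoint name
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
            device_map="auto",
        )
        print(generate_text("Explain what an instruction-following language model is."))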
  • Stealing Part of a Production Language Model

    We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Ba

  • How the RWKV language model works

    In this post, I will explain the details of how RWKV generates text. For a high level overview of what RWKV is and what is so special about it, check out the other post about RWKV. To explain exactly how RWKV works, I think it is easiest to look at a simple implementation of it. The following ~100 line code (based on RWKV in 150 lines) is a minimal implementation of a relatively small (430m parame

  • The RWKV language model: An RNN with the advantages of a transformer

    For a while, I’ve been following and contributing to the RWKV language model, an open source large language model with great potential. As ChatGPT and large language models in general have gotten a lot of attention recently, I think it’s a good time to write about RWKV. In this post, I will try to explain what is so special about RWKV compared to most language models (transformers). The other RWKV

  • Mapping the Mind of a Large Language Model

    Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer. We mostly trea

  • GitHub - salesforce/ctrl: Conditional Transformer Language Model for Controllable Generation

  • Introducing LLaMA: A foundational, 65-billion-parameter language model

    Introducing LLaMA: A foundational, 65-billion-parameter large language model UPDATE: We just launched Llama 2 - for more information on the latest see our blog post on Llama 2. As part of Meta’s commitment to open science, today we are publicly releasing LLaMA (Large Language Model Meta AI), a state-of-the-art foundational large language model designed to help researchers advance their work in thi

  • Turing-NLG: A 17-billion-parameter language model by Microsoft - Microsoft Research

    Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics for feedback a

  • GitHub - Lightning-AI/lit-llama: Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

  • ScreenAI: A visual language model for UI and visually-situated language understanding

  • The New Language Model Stack

    ChatGPT unleashed a tidal wave of innovation with large language models (LLMs). More companies than ever before are bringing the power of natural language interaction to their products. The adoption of language model APIs is creating a new stack in its wake. To better understand the applications people are building and the stacks they are using to do so, we spoke with 33 companies across the Sequo

  • GitHub - neuml/txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    All-in-one embeddings database: txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, retrieval augmented generation and more. Embeddings databases can stand on their own and/or

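    A hedged sketch of the semantic-search workflow described above, using txtai's Embeddings API; the model path and sample texts are assumptions based on txtai's documented examples, not on the README excerpt itself.

        from txtai.embeddings import Embeddings

        # Build an in-memory embeddings index (model path is assumed).
        embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

        data = [
            "txtai builds an embeddings index over text",
            "large language models generate text from prompts",
            "vector search finds semantically similar documents",
        ]

        # Index (id, text, tags) tuples, then run a semantic query.
        embeddings.index([(uid, text, None) for uid, text in enumerate(data)])
        print(embeddings.search("semantic similarity lookup", 1))  # [(id, score)]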
  • DarkBERT: A Language Model for the Dark Side of the Internet

    Recent research has suggested that there are clear differences in the language used in the Dark Web compared to that of the Surface Web. As studies on the Dark Web commonly require textual analysis of the domain, language models specific to the Dark Web may provide valuable insights to researchers. In this work, we introduce DarkBERT, a language model pretrained on Dark Web data. We describe the s