Search results for "language-model": 1 - 40 of 121

  • LMQL (Language Model Query Language) overview | mah_lab / 西見 公宏

    Trying queries in the LMQL Playground: LMQL comes with a Playground that makes it easy to check how queries behave, and the Playground can also be launched locally. First, run the following query from the Getting Started guide: argmax "Hello[WHO]" from "openai/text-ada-001" where len(WHO) < 10. Clicking the "Run" button prompts for an OpenAI API key; enter it, run the query, and the result appears in the Model Response pane. The basic structure of LMQL: notationally, LMQL resembles SQL and is built from the following parts. Decoder clause: specifies the decoding algorithm used for text generation; LMQL lets you choose among a variety of decoding algorithms
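
    One common way to lay out the Getting Started query quoted in the excerpt is clause by clause, as LMQL examples typically format it: the decoder keyword, the prompt template with its [WHO] hole, the model, and the constraint.

        argmax
            "Hello[WHO]"
        from
            "openai/text-ada-001"
        where
            len(WHO) < 10
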
  • Introducing Code Llama, a state-of-the-art large language model for coding

    Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity an

  • "Visual Studio Code" version 1.91 released; the "Chat API" and "Language Model API" for streamlining extension development are now available

    On July 4, 2024 (US time), Microsoft released version 1.91 (June 2024) of "Visual Studio Code" (VS Code), its editor for Windows, Linux, and macOS. Version 1.91 strengthens features related to source control, the workbench, languages, and extensions. The main updates are as follows. Source control: visualize changes as a graph (preview). An experimental feature that visualizes changes as a graph has been introduced; the graph includes the current branch, its upstream branch, and an optional base branch, and the root of the graph is the common ancestor of these branches.

  • What is a large language model (LLM: Large Language Model)?

    A large language model (LLM) is a natural language processing model trained on huge amounts of text data. In general, by fine-tuning a large language model it can be adapted to a variety of natural language processing (NLP: Natural Language Processing) tasks such as text classification, sentiment analysis, information extraction, summarization, text generation, and question answering (Figure 1). Representative examples of large language models include "BERT," announced by Google in 2018, and "GPT-3," announced by OpenAI in 2020. "ChatGPT," announced in December 2022, is based on the "GPT-3.5 series" trained in early 2022

  • GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters

    YaLM 100B is a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world. The model leverages 100 billion parameters. It took 65 days to train the model on a cluster of 800 A100 graphics cards and 1.7 TB of online texts, books, and countless other sources in both English and Russian. Training details and best practices o

  • GitHub - jart/emacs-copilot: Large language model code completion for Emacs

  • GitHub - BlinkDL/ChatRWKV: ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

  • Meta releases "Meta Large Language Model Compiler," a commercially usable large language model that can compile and optimize code

    Meta has released "Meta Large Language Model Compiler," a large language model that compiles and optimizes code. The models can be used commercially and are hosted on Hugging Face. Meta Large Language Model Compiler: Foundation Models of Compiler Optimization | Research - AI at Meta https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization/ Today we’re announcing Meta LLM Compiler, a family of models

  • OWASP Top 10 for Large Language Model Applications | OWASP Foundation

    The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most c

  • Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrou

  • How to get meaning from text with language model BERT | AI Explained

    In this video, we give a step-by-step walkthrough of self-attention, the mechanism powering the deep learning model BERT, and other state-of-the-art transformer models for natural language processing (NLP). More on attention and BERT: https://bit.ly/38vpOyW How to solve a text classification problem with BERT with this tutorial: https://bit.ly/2Ij6tGa 0:00 Introduction of NLP 0:39 Text tokenizati

  • GitHub - Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model

    If you're interested in the field of LLM, you may find the above list of milestone papers helpful to explore its history and state-of-the-art. However, each direction of LLM offers a unique set of insights and contributions, which are essential to understanding the field as a whole. For a detailed list of papers in various subfields, please refer to the following link: Awesome-LLM-hallucination -

  • How to train a new language model from scratch using Transformers and Tokenizers

    Over the past few months, we made several improvements to our transformers and tokenizers libraries, with the goal of making it easier than ever to train a new language model from scratch. In this post we’ll demo how to train a “small” model (84 M parameters = 6 layers, 768 hidden size, 12 attention heads) – that’s th
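
    As a rough sketch of the "small" model described in the excerpt (6 layers, 768 hidden size, 12 attention heads, about 84M parameters), the snippet below builds an untrained RoBERTa-style configuration with the transformers library. The vocabulary size of 52,000 is an assumption; the excerpt does not state it, and the parameter count only comes out near 84M under that assumption.

        # Hypothetical sketch: a "small" masked-language-model config roughly
        # matching the excerpt (6 layers, 768 hidden size, 12 attention heads).
        from transformers import RobertaConfig, RobertaForMaskedLM

        config = RobertaConfig(
            vocab_size=52_000,            # assumed tokenizer size, not given in the excerpt
            max_position_embeddings=514,
            num_hidden_layers=6,
            hidden_size=768,
            num_attention_heads=12,
        )
        model = RobertaForMaskedLM(config)
        print(f"{model.num_parameters():,} parameters")  # roughly 84M with the assumed vocabulary
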
  • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset

  • How trying LayoutLM (Layout Language Model) massively improved accuracy | Cinnamon Inc. (Cinnamon AI)

    Hello, this is the Cinnamon AI PR team. Cinnamon AI provides Aurora Clipper, a product built on natural language processing that is used for a wide range of purposes, such as extracting dates that carry a specific context (event dates, contract dates, and so on) and person names (the parties to a contract), pulling key points out of long documents, and classifying text. This time, Fujii, who leads the development of Aurora Clipper, introduces the results of experimenting with an algorithm called LayoutLM as a base model for Aurora Clipper. What is LayoutLM, which uses the position of text as a feature? LayoutLM (Layout Language Model) is a model from Microsoft Re

  • BloombergGPT: A Large Language Model for Finance

    The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion pa

  • GitHub - hiroshi-matsuda-rit/NLP2024-tutorial-3: NLP2024 チュートリアル3 作って学ぶ日本語大規模言語モデル - 環境構築手順とソースコード / NLP2024 Tutorial 3: Practicing how to build a Japanese large-scale language model - Environment construction and experimental source codes

  • GitHub - XiongjieDai/GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

  • GitHub - tanreinama/GPTSAN: General-purpose Swich transformer based Japanese language model

  • GitHub - SJTU-IPADS/PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

  • TextPruner による大規模言語モデルの軽量化 / Large language model pruning using TextPruner

    Slides from a lightning talk (LT) given at NLP Hacks on 2022/05/13.

  • The Nikkei announces the development of the "NIKKEI Language Model," a large language model specialized for economic information and refined on 40 years of newspaper articles | Ledge.ai

  • The Rise and Potential of Large Language Model Based Agents: A Survey

    For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training stra

  • Cramming: Training a Language Model on a Single GPU in One Day

    Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate t

  • RAFT: Adapting Language Model to Domain Specific RAG

    Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su

  • PaLM-E: An Embodied Multimodal Language Model

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence. Abstract: Large language models have been demonstrated to pe

  • ScreenAI: A visual language model for UI and visually-situated language understanding

  • GitHub - pfnet-research/japanese-lm-fin-harness: Japanese Language Model Financial Evaluation Harness

  • OpenAI's GPT-3 Language Model: A Technical Overview

    Notice: GPT-2 1.5B is trained with 40GB of Internet text, which is roughly 10 Billion tokens (conversely assuming the average token size is 4 characters). So GPT-3 175B has a lower data compression ratio 300 / 175 = 1.71 in comparison to GPT-2 1.5B 10 / 1.5 = 6.66. This raises the question of whether, with this many parameters, the model functions by memorizing the data in the training and p
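
    Spelling out the ratios quoted in the excerpt: they are training tokens divided by parameters, both in billions. The 300-billion-token figure for GPT-3 is the commonly cited size of its training set and is an assumption here, since the excerpt only shows the result of the division.

        # Tokens-per-parameter ratios from the excerpt (all figures in billions).
        gpt3_ratio = 300 / 175   # ~1.71 tokens per parameter (300B tokens / 175B parameters, assumed)
        gpt2_ratio = 10 / 1.5    # ~6.7 tokens per parameter (10B tokens / 1.5B parameters)
        print(f"GPT-3: {gpt3_ratio:.2f}, GPT-2: {gpt2_ratio:.2f}")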

  • GitHub - WooooDyy/LLM-Agent-Paper-List: The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

  • We want to get more out of Large Language Models (LLMs)! We tried "LangChain" - CCCMK Holdings TECH Lab Tech Blog

    Hello, this is Miura from CCCMK Holdings TECH LAB. There is an English study method called "shadowing" that I have been trying recently: you listen to English audio and repeat it as you follow along, which is said to improve listening and speaking. When I try to pronounce English my mouth does not move the way I want, apparently in part because the muscles around the mouth needed for speaking English are not yet trained; I have started pronunciation practice while watching videos and hope it improves. Lately, new information about Large Language Models (LLMs) turns up online almost every day, and it really feels like a hot topic. On this blog, too, we recently covered Prompt Engineering techniques for giving better instructions to LLMs, drawing on recently published papers and

  • GitHub - databrickslabs/dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

  • Stealing Part of a Production Language Model

    We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Ba

  • How the RWKV language model works

    In this post, I will explain the details of how RWKV generates text. For a high level overview of what RWKV is and what is so special about it, check out the other post about RWKV. To explain exactly how RWKV works, I think it is easiest to look at a simple implementation of it. The following ~100 line code (based on RWKV in 150 lines) is a minimal implementation of a relatively small (430m parame

  • The RWKV language model: An RNN with the advantages of a transformer

    For a while, I’ve been following and contributing to the RWKV language model, an open source large language model with great potential. As ChatGPT and large language models in general have gotten a lot of attention recently, I think it’s a good time to write about RWKV. In this post, I will try to explain what is so special about RWKV compared to most language models (transformers). The other RWKV

  • Mixture-of-Agents Enhances Large Language Model Capabilities

    Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met

  • Mapping the Mind of a Large Language Model

    Today we report a significant advance in understanding the inner workings of AI models. We have identified how millions of concepts are represented inside Claude Sonnet, one of our deployed large language models. This is the first ever detailed look inside a modern, production-grade large language model. This interpretability discovery could, in future, help us make AI models safer. We mostly trea

  • GitHub - salesforce/ctrl: Conditional Transformer Language Model for Controllable Generation

  • Jamba: A Hybrid Transformer-Mamba Language Model

    We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso

  • Turing-NLG: A 17-billion-parameter language model by Microsoft - Microsoft Research

    This figure was adapted from a similar image published in DistilBERT. Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics for feedback a