[B! multiModal] manboubird's Bookmarks

manboubird id:manboubird

manboubird's bookmarks on multiModal (44)

llama-cookbook/end-to-end-use-cases/Multi-Modal-RAG/README.md at main · meta-llama/llama-cookbook
manboubird 2025/07/30
llama, rag, multimodal, fashion, clothing, meta, llama3

llama-cookbook/getting-started/build_with_llama_4.ipynb at main · meta-llama/llama-cookbook
manboubird 2025/07/30
vllm, llm, llama, llama4, tutorial, multimodal, sceneUnderstanding

Conversational image segmentation with Gemini 2.5 - Google Developers Blog
The way AI visually understands images has evolved tremendously. Initially, AI could tell us "where" an object was using bounding boxes. Then, segmentation models arrived, precisely outlining an object's shape. More recently, open-vocabulary models emerged, allowing us to segment objects using less common labels like "blue ski boot" or "xylophone" without needing a predefined list of categories. P
manboubird 2025/07/22
gemini, llm, multimodal, imageSegmentation

Multimodal APIs
manboubird 2025/07/21
multimodal, llm, model, links, nvidia

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
Although these are multimodal models, they can be used as text-only models (as LLMs) without loading the vision encoder into memory. We will talk about this in more detail later in the inference section. Technical Enhancements in Gemma 3: The three core enhancements in Gemma 3 over Gemma 2 are longer context length, multimodality, and multilinguality. In this section, we will cover the technical details that
manboubird 2025/07/21
multimodal, llm, localLlm, gemma

GitHub - InternLM/xtuner: An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
manboubird 2025/07/21
xtuner, llm, localLlm, multimodal, llama

Multimodal AI: A Guide to Open-Source Vision Language Models
manboubird 2025/07/21
multimodal, llm, oss, comparison, llama

Llama 4 | Model Cards and Prompt formats
manboubird 2025/07/01
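llama4, llama, multimodal, meta

Llama 4: The New Era of Multimodal AI Revolution Brought by Meta
Introduction: On April 5, 2025, Meta AI officially announced its long-awaited new family of AI models, Llama 4. Llama 4 is Meta's first natively multimodal model and also its first model to adopt a Mixture of Experts (MoE) architecture. In modern AI development, open-source models are becoming increasingly important. In particular, as more people use AI in their daily lives, making advanced models and systems widely available lets anyone build a future of personalized AI experiences. This article explains in detail Llama 4's innovative features, its technical background, how it compares with competitors' models, and its future outlook. As an AI engineer, the possibilities and impact this new model brings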
manboubird 2025/06/30
Image grounding capability: it excels at aligning user prompts with relevant visual concepts and anchoring model responses to specific regions of an image.
llama, llama4, multimodal, llm, MoE

Building with Llama 4
manboubird 2025/06/29
meta, video, course, llama, multimodal, generativeAi, deeplearningAi, llamaScout, generator

Hacker Spirit: Llama4 Runs on a MacBook! The Era of Local LLMs Has Arrived
manboubird 2025/06/29
llama, multimodal, llm, video, MacBook, hardware

Meta's Llama 4 models are now available serverless in Amazon Bedrock | Amazon Web Services
Amazon Web Services Blog. Meta's newest AI models, Llama 4 Scout 17B and Llama 4 Maverick 17B, are now available as a fully managed, serverless option in Amazon Bedrock. These new foundation models (FMs) provide native multimodal capabilities using Early Fusion technology, which can be used for precise image grounding and extended context processing in applications. Llama 4 uses an innovative Mixture-of-Experts (MoE) architecture which, while optimizing both cost and speed, across reasoning and image understanding tasks delivers enhanced
manboubird 2025/06/29
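llama, multimodal, llm

Llama 4 Overview | npaka
I found the following article interesting, so here is a brief summary. ・The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation 1. Llama 4: Today, "Llama 4 Scout" and "Llama 4 Maverick" were released. They are the first open-weight, natively multimodal models with unprecedented context length support, and they are built using an MoE architecture. There is also a preview of "Llama 4 Behemoth", the most powerful model, which serves as a teacher for the new models. ・Llama 4 Maverick ・17B active parameters ・128 experts ・400B total parameters ・1 million token context length ・Llama 4 Scout ・17B act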
manboubird 2025/06/29
llama, multimodal, llm

[Llama 4 Scout] A Small Model with the Knowledge Capacity of a Giant Model!? A Thorough Explanation of Meta's LLM | WEEL
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick, our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model… pic.twitter.com/Z8P3h0MA1P (AI at Meta, @AIatMeta, April 5, 2025) What is Llama 4 Scout? Overview and features. Llama 4 Scout is a Llama released by Meta
manboubird 2025/06/29
llama, multimodal, llm, googleColab

Train LLMs Faster, Better, and Smaller with DatologyAI’s Data Curation
manboubird 2025/06/22
llm, datologyAi, multimodal, training

CLIP Gets a Data Upgrade: Outperforming SoTA with Improved Data Curation Only
manboubird 2025/06/22
datologyAi, multimodal, training

Home
manboubird 2025/06/22
workshop on Multimodal Representation Learning (MRL)

iclr, workshop, multimodal

About Me - Ari Morcos
manboubird 2025/06/22
datologyAi, researcher, llm, multimodal

https://dl.acm.org/doi/pdf/10.1145/3394486.3403280
manboubird 2025/04/29
paper, pinterest, visualAnalysis, multimodal, topicDetection, sns

Notes on Google’s Gemma 3
manboubird 2025/03/13
gemma, google, multimodal, llm