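Mainstream image captioning models are usually two-stage captioners: a pre-trained detector first computes object features, and those features are then fed into a language model that generates the textual description.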
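The two-stage data flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the OSIC authors' code or any real captioner. `detect_objects`, `generate_caption`, and `two_stage_caption` are hypothetical names. The "image" is assumed to be a dict mapping object labels to detector confidences, and a fixed template stands in for the learned language model. The structural point it shows is that stage 2 only ever sees what stage 1 extracted, never the raw image.

```python
from typing import Dict, List, Tuple

# A region is (label, feature vector); the feature here is a toy 1-D vector.
Region = Tuple[str, List[float]]


def detect_objects(image: Dict[str, float], threshold: float = 0.5) -> List[Region]:
    """Stage 1 (hypothetical stand-in for a frozen, pre-trained detector).

    A real pipeline would run an object detector and pool one feature vector
    per detected region. Here the input is assumed to already be a mapping of
    label -> confidence, and regions below `threshold` are discarded.
    """
    regions: List[Region] = []
    # Keep the most confident detections first.
    for label, score in sorted(image.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            regions.append((label, [score]))  # toy feature: the confidence itself
    return regions


def generate_caption(regions: List[Region]) -> str:
    """Stage 2 (hypothetical stand-in for the language model).

    Consumes only the precomputed region features from stage 1. A fixed
    template replaces the learned decoder to keep the sketch self-contained.
    """
    if not regions:
        return "an image"
    words = [label for label, _ in regions]
    if len(words) == 1:
        return f"a photo of a {words[0]}"
    return "a photo of a " + ", a ".join(words[:-1]) + f" and a {words[-1]}"


def two_stage_caption(image: Dict[str, float]) -> str:
    regions = detect_objects(image)   # stage 1: object features from the detector
    return generate_caption(regions)  # stage 2: text generated from those features


if __name__ == "__main__":
    print(two_stage_caption({"dog": 0.9, "frisbee": 0.7, "tree": 0.3}))
```

Running it prints `a photo of a dog and a frisbee`: the low-confidence "tree" is dropped in stage 1, so the stage-2 generator has no way to mention it.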

arxiv_reader's bookmark, 2022/11/07 12:25

<blockquote class="hatena-bookmark-comment"><a class="comment-info" href="https://b.hatena.ne.jp/entry/4727749528315802403/comment/arxiv_reader" data-user-id="arxiv_reader" data-entry-url="https://b.hatena.ne.jp/entry/s/arxiv-check-250201.firebaseapp.com/each/2211.02321v1" data-original-href="https://arxiv-check-250201.firebaseapp.com/each/2211.02321v1" data-entry-favicon="https://cdn-ak2.favicon.st-hatena.com/64?url=https%3A%2F%2Farxiv-check-250201.firebaseapp.com%2Feach%2F2211.02321v1" data-user-icon="/users/arxiv_reader/profile.png">OSIC: A New One-Stage Image Captioner Coined</a><ul class="comment-tag" style="list-style: none; margin: 0px;"><li style="float: left">[<a href="https://b.hatena.ne.jp/q/arXiv">arXiv</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/learning">learning</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/benchmark">benchmark</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/transformer">transformer</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/dataset">dataset</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/detection">detection</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/representation">representation</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/%22arXiv%20reaDer%22">arXiv reaDer</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/embedding">embedding</a>]</li><li style="float: left">[<a href="https://b.hatena.ne.jp/q/pre-training">pre-training</a>]</li></ul><br><p style="clear: left">Mainstream image captioning models are usually two-stage captioners: a pre-trained detector first computes object features, and those features are then fed into a language model that generates the textual description.</p><a class="datetime" href="https://b.hatena.ne.jp/arxiv_reader/20221107#bookmark-4727749528315802403"><span class="datetime-body">2022/11/07 12:25</span></a></blockquote><script src="https://b.st-hatena.com/js/comment-widget.js" charset="utf-8" async></script>
