Title: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Authors: Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi (Salesforce Research)
Paper URL: https://arxiv.org/abs/2301.12597
Code: https://github.com/salesforce/LAVIS/tree/main/projects/blip2
HuggingFace: https://huggingface.co/spaces/taesiri/BLIP-2

TL;DR

- Research on vision-and-language (V&L) models aimed at achieving strong accuracy while reducing pre-training cost
- Pre-trained image …