PaliGemma 2 Mix – New Instruction Vision Language Models by Google
Last December, Google released PaliGemma 2, a new family of pre-trained (pt) vision language models (VLMs) based on SigLIP and Gemma 2. The models come in three sizes (3B, 10B, 28B) and three resolutions (224×224, 448×448, 896×896). Today, Google is releasing PaliGemma 2 mix, a set of models fine-tuned on a mix of vision language tasks, including OCR, long and short captioning, and more. PaliGemma 2 pre-trained (pt) variants are great vision language models to transfer to a given task at […]