PaliGemma – Google’s Cutting-Edge Open Vision Language Model
Updated on 23-05-2024: We have introduced a few changes to the transformers PaliGemma implementation around fine-tuning, which you can find in this notebook. PaliGemma is a new family of vision language models from Google. PaliGemma can take in an image and a text and output text. The team at Google has released three types of models: the pretrained (pt) models, the mix models, and the fine-tuned (ft) models, each with different resolutions and available in multiple precisions for convenience. All […]
Read more