Microsoft's Florence-2: Ultimate AI Vision Model

Florence-2 is a vision foundation model that takes text prompts and performs a variety of tasks, including image captioning, object detection, and segmentation. It was trained on FLD-5B, a dataset of 126 million images with 5.4 billion annotations, which enables the model’s multi-task learning.

Florence-2 boasts exceptional OCR capabilities, particularly in recognizing handwritten text.

Florence-2 Usage Scenario

The Florence-2 vision model supports multiple tasks such as image captioning, object detection, image segmentation, OCR, and more. Example tasks include:

OCR

OCR with Region

Object Detection

Detailed Caption
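Each of these tasks is selected by a task token placed at the start of the prompt. The sketch below collects a few of the tokens listed on the Florence-2 model card and builds the prompt string the same way the article's later code does (task token, optionally followed by extra text). The dictionary keys and the `build_prompt` helper are illustrative names, not part of the Florence-2 API.

```python
# Task tokens as listed on the Florence-2 model card.
# The dictionary keys are illustrative names chosen for this sketch.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}


def build_prompt(task, text_input=None):
    """Return the prompt for a task: the task token, plus optional extra text."""
    token = TASK_PROMPTS[task]
    return token if text_input is None else token + text_input


print(build_prompt("ocr"))
print(build_prompt("phrase_grounding", "A green car"))
```

Most tasks need only the token. A few, such as phrase grounding, also take text describing what to look for, which is why the helper accepts an optional second argument.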

Online demo: https://huggingface.co/spaces/gokaygokay/Florence-2

Florence-2 Model Information:

Florence-2-base
Florence-2-large
Florence-2-base-ft
Florence-2-large-ft

Getting Started with Florence-2

The model can perform different tasks by modifying the prompt. First, let’s define a function to run prompts.

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Load the model and its processor. trust_remote_code=True is needed because
# the checkpoint ships its own modeling code.
model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)

# Download a sample image to run the tasks on.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

def run_example(task_prompt, text_input=None):
    # Some tasks take extra text after the task token.
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: the post-processing step parses them.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Convert the raw text into a task-specific structure (caption, boxes, ...).
    parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))

    print(parsed_answer)

Then set the prompt to perform the corresponding task:

prompt = "<CAPTION>"
run_example(prompt)
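For the object detection task (`<OD>`), the model card shows the parsed answer as a dictionary keyed by the task token, holding parallel `bboxes` and `labels` lists. The sketch below pairs each label with its box, under that assumption. The `pair_detections` helper is a hypothetical name, and the sample values are made up for illustration rather than taken from a real model run.

```python
def pair_detections(parsed_answer, task_prompt="<OD>"):
    """Pair each label with its (x1, y1, x2, y2) box, rounded to whole pixels."""
    result = parsed_answer[task_prompt]
    return [
        (label, tuple(round(v) for v in box))
        for label, box in zip(result["labels"], result["bboxes"])
    ]


# Illustrative sample in the assumed <OD> output shape (not real model output).
sample = {
    "<OD>": {
        "bboxes": [[34.2, 160.1, 597.4, 371.8], [454.0, 96.7, 580.3, 261.6]],
        "labels": ["car", "door"],
    }
}

for label, box in pair_detections(sample):
    print(label, box)
```

To use this with the function above, `run_example` would need to return `parsed_answer` instead of only printing it.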

Paper: https://arxiv.org/abs/2311.06242

kevin

I'm Kevin, founder of NobleFilt.com, where I curate cutting-edge AI tools and prompts. With a background in AI and web development, I leverage my expertise in machine learning, NLP, and data analysis to make artificial intelligence more accessible. Through NobleFilt, I showcase the most promising AI advancements, from lifelike digital humans to intelligent web scraping, enabling wider applications of this transformative technology.
