Multimodal Foundation Models

Chunyuan Li Zhe Gan Zhengyuan Yang Jianwei Yang Linjie Li

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. It covers five core topics, categorized into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, covering two topics: methods of learning vision backbones for visual understanding, and text-to-image generation; and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, covering three topics: unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs. The target audience is researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.
  • Authors: Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li
  • Format: Paperback
  • ISBN: 9781638283362
  • Language: English
  • Number of pages: 230
  • Publication date: 2024-05-06
  • Publisher: now publishers Inc