Salesforce Introduces New Family of Multimodal Action Models Named TACO

TACO leverages chains-of-thought-and-action to enhance AI's ability to handle multimodal reasoning and real-world challenges.

Most readers read for free. A small group from the TelecomTalk community keeps this going. Support only if our work adds value for you.

Highlights

  • The model utilises OCR, depth estimation, and calculators to handle diverse data types.
  • Salesforce trained TACO with over 1 million synthetic CoTA traces to optimise its capabilities.
  • Potential applications include web navigation and medical question answering.

Salesforce AI Research has introduced TACO, a family of multimodal large action models designed to improve performance on complex, multi-step problems that call for several capabilities at once, such as understanding images, reading text, and performing calculations. "We present TACO, a family of multi-modal large action models designed to improve performance on complex questions that require multiple capabilities and demand multi-step solutions," Salesforce said in a blog post on January 16, 2025.

Overcoming Limitations of Current AI Systems

According to the company, TACO addresses a significant limitation of current open-source multimodal models, which struggle to solve realistic, complex problems step by step. For instance, when asked "How much gas can I buy with $50?" alongside a photo of a gas station sign, TACO can locate the price on the sign, extract it as text using OCR, and perform the necessary calculation. This capability is powered by chains-of-thought-and-action (CoTA), in which the model generates both reasoning steps and tool-invoking actions to arrive at the answer.

"To answer such questions, TACO produces chains-of-thought-and-action (CoTA), executes intermediate steps by invoking external tools such as OCR, depth estimation and calculator, then integrates both the thoughts and action outputs to produce coherent responses," the company explained.
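
The gas station example can be sketched as a small thought-and-action loop. This is only an illustrative sketch, not Salesforce's implementation: the Step record, the scripted_policy function (a stand-in for the model), the stub OCR tool, and the sign price of 3.50 per gallon are all hypothetical and chosen just to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One link in a chain: a thought, a tool call, and the tool's output."""
    thought: str
    action: str
    argument: str
    observation: str = ""


def stub_ocr(image_id: str) -> str:
    # Hypothetical stand-in for a real OCR tool; the price is made up.
    signs = {"gas_station.jpg": "REGULAR 3.50"}
    return signs[image_id]


def calculator(expression: str) -> str:
    # Deliberately tiny calculator: only handles "a / b".
    left, right = expression.split("/")
    return f"{float(left) / float(right):.2f}"


TOOLS: Dict[str, Callable[[str], str]] = {"OCR": stub_ocr, "Calculate": calculator}


def scripted_policy(image_id: str, budget: float, history: List[Step]) -> Optional[Step]:
    # Stand-in for the model: picks the next thought and action
    # based on the observations gathered so far.
    if len(history) == 0:
        return Step("I need the price shown on the sign.", "OCR", image_id)
    if len(history) == 1:
        price = history[0].observation.split()[-1]
        return Step("Divide the budget by the price per gallon.",
                    "Calculate", f"{budget} / {price}")
    return None  # enough information to answer


def answer_question(image_id: str, budget: float = 50.0) -> Tuple[str, List[Step]]:
    history: List[Step] = []
    while (step := scripted_policy(image_id, budget, history)) is not None:
        # Execute the chosen tool and feed its output back into the chain.
        step.observation = TOOLS[step.action](step.argument)
        history.append(step)
    return f"About {history[-1].observation} gallons", history


if __name__ == "__main__":
    answer, chain = answer_question("gas_station.jpg")
    for s in chain:
        print(f"{s.thought} -> {s.action}({s.argument}) = {s.observation}")
    print(answer)
```

The point of the sketch is the interleaving: each action's output is fed back before the next thought is chosen, which is what distinguishes this approach from producing a direct answer in one pass.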

Training TACO

To train TACO, Salesforce said it created over 1 million synthetic CoTA traces through model-based and programmatic generation methods. These traces help the model learn to perform complex reasoning and to invoke external actions such as text recognition and mathematical operations.

Salesforce claims that TACO achieved 30-50 percent higher performance than models trained to give direct answers. It also outperformed baseline models by up to 20 percent on the MMVet benchmark.

Future Applications

With this framework, Salesforce AI hopes to pave the way for new multimodal models that can be applied across various domains, such as medical question answering and web navigation.

"With our framework, future works can train new models with different actions for other applications such as web navigation or for other domains such as medical question answering," Salesforce said.

Reported By

Kirpa B is passionate about the latest advancements in Artificial Intelligence technologies and has a keen interest in telecom. In her free time, she enjoys gardening or diving into insightful articles on AI.
