Salesforce Introduces New Family of Multimodal Action Models Named TACO

TACO leverages chains-of-thought-and-action to enhance AI's ability to handle multimodal reasoning and real-world challenges.

Highlights

  • The model utilises OCR, depth estimation, and calculators to handle diverse data types.
  • Salesforce trained TACO with over 1 million synthetic CoTA traces to optimise its capabilities.
  • Potential applications include web navigation and medical question answering.

Salesforce AI Research has introduced TACO, a family of multimodal large action models designed to improve performance on complex problems that require multi-step reasoning across several kinds of input, such as images, text, and calculations. "We present TACO, a family of multi-modal large action models designed to improve performance on complex questions that require multiple capabilities and demand multi-step solutions," Salesforce said in a blog post on January 16, 2025.

Overcoming Limitations of Current AI Systems

According to the company, TACO tackles a significant limitation of current open-source multimodal models, which struggle to solve realistic, complex problems step by step. For instance, when asked "How much gas can I buy with $50?" about a photo of a gas station sign, TACO can locate the price information, extract the text using OCR, and perform the necessary calculation. This capability is powered by chains-of-thought-and-action (CoTA), in which the model generates both reasoning steps and actions to arrive at the correct answer.

"To answer such questions, TACO produces chains-of-thought-and-action (CoTA), executes intermediate steps by invoking external tools such as OCR, depth estimation and calculator, then integrates both the thoughts and action outputs to produce coherent responses," the company explained.
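
The thought, action, and integration cycle described above can be illustrated with a minimal Python sketch. It is not Salesforce's implementation: the `Step` record, the `ocr_tool` stub, the tool names, and the $3.50 price are all hypothetical, and the thoughts are hard-coded here, whereas a real model would generate them itself.

```python
import operator
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One link in a chain: a thought, a tool call, and the tool's output."""
    thought: str
    action: str          # name of the tool to invoke
    argument: str        # input passed to the tool
    observation: str = ""


def ocr_tool(image_id: str) -> str:
    """Hypothetical OCR stub: returns canned text instead of reading an image."""
    return {"gas_sign.jpg": "REGULAR $3.50 / GAL"}[image_id]


def calculator_tool(expression: str) -> str:
    """Evaluate a simple 'a <op> b' expression and format it to two decimals."""
    left, op, right = expression.split()
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    return f"{ops[op](float(left), float(right)):.2f}"


# Assumed tool registry; the names are illustrative only.
TOOLS = {"OCR": ocr_tool, "Calculate": calculator_tool}


def run_chain(image_id: str, budget: float) -> Tuple[List[Step], str]:
    """Run a two-step chain: read the price, then divide the budget by it."""
    steps: List[Step] = []

    # Step 1: call the OCR tool to read the text on the sign.
    read = Step("I need the price per gallon shown on the sign.", "OCR", image_id)
    read.observation = TOOLS[read.action](read.argument)
    steps.append(read)

    # Step 2: pull the first decimal number out of the OCR text and divide.
    price = re.search(r"\d+\.\d+", read.observation).group()
    divide = Step(f"Divide the budget by the price ({price}).",
                  "Calculate", f"{budget} / {price}")
    divide.observation = TOOLS[divide.action](divide.argument)
    steps.append(divide)

    # Integrate the tool outputs into the final response.
    return steps, f"About {divide.observation} gallons"


steps, answer = run_chain("gas_sign.jpg", 50.0)
for step in steps:
    print(f"{step.action}({step.argument}) -> {step.observation}")
print(answer)
```

With the stubbed sign text, the chain calls the OCR tool first and the calculator second, then folds both observations into the final answer.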

Training TACO

To train TACO, Salesforce said it created over 1 million synthetic CoTA traces using both model-based and programmatic generation methods. These traces help the model learn to perform complex reasoning and to call external actions such as text recognition and mathematical operations.
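
As a rough idea of what programmatic generation could look like, the sketch below fills a fixed template with random values to produce synthetic traces. The trace layout, field names, and value ranges are assumptions made for illustration; the article does not describe Salesforce's actual data format or generation pipeline.

```python
import random
from typing import Dict, List


def make_trace(rng: random.Random) -> Dict:
    """Build one synthetic trace from a template (hypothetical format)."""
    price = round(rng.uniform(2.0, 5.0), 2)   # made-up price per gallon
    budget = rng.choice([20, 30, 50, 100])    # made-up budget in dollars
    gallons = budget / price
    return {
        "question": f"How much gas can I buy with ${budget}?",
        "steps": [
            {
                "thought": "Read the price per gallon from the sign.",
                "action": "OCR",
                "observation": f"REGULAR ${price:.2f} / GAL",
            },
            {
                "thought": "Divide the budget by the price.",
                "action": "Calculate",
                "argument": f"{budget} / {price:.2f}",
                "observation": f"{gallons:.2f}",
            },
        ],
        "answer": f"About {gallons:.2f} gallons",
    }


def generate_traces(n: int, seed: int = 0) -> List[Dict]:
    """Generate n traces; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_trace(rng) for _ in range(n)]


traces = generate_traces(3)
for trace in traces:
    print(trace["question"], "->", trace["answer"])
```

Because the answer is computed by the same program that writes the question, every generated trace is correct by construction, which is the main appeal of a template-based approach.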

Salesforce claims that TACO achieved 30-50 percent higher performance than models that rely on traditional direct answers. It also outperformed baseline models by up to 20 percent on the MMVet benchmark.

Future Applications

With this framework, Salesforce AI hopes to pave the way for new multimodal models that can be applied across various domains, such as medical question answering and web navigation.

"With our framework, future works can train new models with different actions for other applications such as web navigation or for other domains such as medical question answering," Salesforce said.

Reported By

Kirpa B is passionate about the latest advancements in Artificial Intelligence technologies and has a keen interest in telecom. In her free time, she enjoys gardening or diving into insightful articles on AI.
