AI: OpenAI Audio Models, Image Generation, Meta AI Creator Tools, Google Gemini 2.5, Microsoft AI Agents

From advanced speech-to-text models to AI-powered creator tools, tech companies push the boundaries of artificial intelligence in voice, image generation, research, and reasoning.

Highlights

  • OpenAI's new speech models offer improved accuracy in noisy environments and better voice customisation.
  • Meta's AI-driven tools help brands discover influencers, optimise content, and drive higher engagement.
  • Google Gemini 2.5 Pro excels in advanced reasoning, coding, and problem-solving across multiple domains.

Tech companies OpenAI, Meta, Google, and Microsoft have unveiled AI advancements, pushing the boundaries of speech, image, reasoning, and research capabilities. OpenAI introduced state-of-the-art speech-to-text and text-to-speech models, along with an advanced image generator in ChatGPT. Meta launched AI-powered tools to enhance brand-creator partnerships, while Google released the Gemini 2.5 model with enhanced reasoning. Microsoft integrated deep research agents into M365 Copilot, revolutionising workplace AI applications.

Also Read: AI: Oracle AI Agent Studio, Deloitte Zora AI, Accenture AI Refinery Platform, NTT DATA Agentic AI Services

Here's a closer look at these five major AI advancements, detailing their features:

1. OpenAI Unveils New Advanced Audio Models

OpenAI has announced the launch of its latest speech-to-text and text-to-speech models, enhancing the capabilities of AI-powered voice agents through the API. OpenAI stated that these new models "set a new state-of-the-art benchmark, outperforming existing solutions in accuracy and reliability—especially in challenging scenarios involving accents, noisy environments, and varying speech speeds."

These enhancements improve transcription accuracy, making the models particularly well-suited for applications such as customer service call centres and meeting note-taking, while also promising improved customisation and a more natural conversational experience.

OpenAI says that, for the first time, developers can instruct the text-to-speech model to speak in a specific way. To illustrate, OpenAI gave the example of a developer instructing the voice agent to "talk like a sympathetic customer service agent." In its blog post on March 20, the company claimed that such instructions unlock a new level of customisation for voice agents.

New Speech-to-Text Audio Models

The newly introduced gpt-4o-transcribe and gpt-4o-mini-transcribe models outperform the original Whisper models, achieving a lower Word Error Rate (WER) and better language recognition. OpenAI attributes these advancements to reinforcement learning and extensive training on high-quality audio datasets. The company claims that the improved models excel in challenging environments, such as noisy settings or conversations with diverse accents and speech speeds, making them ideal for applications like call centre transcriptions and meeting notes. These models are now available through the speech-to-text API.
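As an illustration, a minimal transcription call with OpenAI's Python SDK might look like the following sketch (the audio filename is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Transcribe a recording; "meeting.wav" is a placeholder file.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
        file=audio_file,
    )

print(transcript.text)
```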

New Text-to-Speech Model with Customisation

The gpt-4o-mini-tts model brings a new level of steerability, allowing developers to control not only what the AI says but how it says it. For instance, users can instruct the model to adopt a sympathetic customer service tone or an engaging storytelling style. The model is available in the text-to-speech API.

The blog post read, "Note that these text-to-speech models are limited to artificial, preset voices, which we monitor to ensure they consistently match synthetic presets."
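Based on OpenAI's description, a steerable text-to-speech request through the Python SDK could look roughly like this sketch (the voice name and output path are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Per the announcement, an instructions field steers *how* the model speaks.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # one of the preset synthetic voices
    input="Your order is on its way and should arrive by Friday.",
    instructions="Talk like a sympathetic customer service agent.",
) as response:
    response.stream_to_file("reply.mp3")
```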

API Availability and Future Developments

The new models are now accessible to all developers via OpenAI's API, with an integration in the Agents SDK to simplify development. For developers looking to build low-latency speech-to-speech functionality, OpenAI recommends building with its speech-to-speech models in the Realtime API.
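The Agents SDK integration itself is not shown in the announcement; as a rough sketch of the chained (non-Realtime) approach, a transcribe-reason-speak loop with the plain Python client might look like this:

```python
from openai import OpenAI

client = OpenAI()

def answer_voice_query(audio_path: str) -> str:
    """Sketch of a chained voice agent: transcribe, reason, then speak."""
    # 1. Speech to text with the new transcription model.
    with open(audio_path, "rb") as f:
        text = client.audio.transcriptions.create(
            model="gpt-4o-transcribe", file=f
        ).text

    # 2. Generate a reply with a chat model.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
    ).choices[0].message.content

    # 3. Text back to speech with the steerable TTS model.
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts", voice="alloy", input=reply
    ) as speech:
        speech.stream_to_file("reply.mp3")
    return reply
```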

Looking ahead, OpenAI plans to expand its audio models further, explore custom voice capabilities, and invest in multimodal AI experiences, including video.

2. OpenAI Introduces New Image Generator for ChatGPT

OpenAI has unveiled its latest image generation model, which the company calls its "most advanced image generator yet", integrated directly into GPT-4o. This new capability enhances the practical use of AI-generated visuals, making it easier for users to create detailed, accurate, and context-aware images.

"GPT-4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o's inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration," OpenAI said in a blog post on Tuesday, March 25.

"These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power," the company added.

OpenAI explained that it trained the models on the joint distribution of online images and text, allowing them to learn not just how images relate to language but also how they relate to each other.

OpenAI said GPT-4o's image generation follows detailed prompts closely. While other systems struggle with around 5-8 objects, GPT-4o can handle up to 10-20 different objects. GPT-4o can also analyse and learn from user-uploaded images, integrating their details into its context to inform image generation.

OpenAI emphasised that it has reinforced its safety protocols to ensure responsible use of the technology. All generated images include C2PA metadata to indicate their AI origin. Furthermore, safeguards prevent misuse, such as generating inappropriate or harmful content, particularly when real people are involved.

A specialised reasoning LLM has also been trained to align image generation with OpenAI's safety policies, providing an extra layer of content moderation, the company explained.

Also Read: Wipro Sovereign AI, Capgemini Agentic AI, TCS Air New Zealand Partnership, Tech Mahindra PV Solution

Access and Availability

Starting March 25, GPT-4o's image generation is rolling out for Plus, Pro, Team, and Free users of ChatGPT as the default image generator, with availability for Enterprise and Edu users coming soon. 4o image generation is also available for use in Sora. Developers will gain access via the API in the coming weeks.

Users can generate images by describing what they need in the chat. "Creating and customising images is as simple as chatting using GPT-4o - just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute," OpenAI said.

For those who prefer DALL·E, OpenAI confirmed that it remains accessible through a dedicated DALL·E GPT.

March 27 update:

The excitement surrounding ChatGPT's enhanced and more accessible image generation has led OpenAI to "temporarily" impose a rate limit on requests, according to CEO Sam Altman.

"It's super fun seeing people love images in ChatGPT, but our GPUs are melting," Altman posted on X on March 27.

"We are going to temporarily introduce some rate limits while we work on making it more efficient. hopefully won't be long! ChatGPT free tier will get 3 generations per day soon. Also, we are refusing some generations that should be allowed; we are fixing these as fast we can," he added.

Altman's statement came as millions used ChatGPT to create Ghibli-inspired images with the newly launched image creator inside the application.

3. Meta Launches New AI-Powered Tools to Enhance Creator Partnerships

On March 25, Meta introduced new artificial intelligence (AI)-enabled marketing tools designed to help brands discover and partner with creators who can boost their sales. The company announced new AI-powered creator discovery and content recommendation tools, along with enhanced creator insights for businesses in Instagram's creator marketplace. These updates aim to help businesses find the right influencers, optimise ad performance, and drive sales more effectively.

AI-Powered Creator Discovery and Content Recommendations

Meta is introducing new AI-enabled personalised creator content recommendations, integrated within the Partnership Ads Hub in Ads Manager. These will help brands identify high-performing organic branded content for paid promotions. By analysing platform presence, audience similarity, and past ad performance, AI will predict the most effective creators for upcoming campaigns.

Additionally, Meta is introducing keyword search capabilities in Instagram's creator marketplace, allowing businesses to find creators with greater precision using terms like "soccer moms with dogs" or "gluten-free desserts." The platform also now offers filtering options across 20 verticals, including Fashion, Beauty, and Home and Garden.

Enhanced Creator Insights for Smarter Collaborations

Meta is introducing new features to provide deeper insights into creator profiles, including:

  • Creator Cards with Playable Reels – Brands can preview a creator's recent reels directly on their profile.
  • Easier Creator Engagement – Businesses can now contact opted-in creators via direct email.
  • Experienced Creator Badges – Identifies creators with prior branded content and partnership ad experience.
  • Active Partnership Ads Display – Shows a creator's ongoing brand collaborations for better campaign alignment.

Marketing API Expansion for Partnership Ads

To further simplify influencer marketing, Meta has expanded its Marketing API to support partnership ads. Businesses can now integrate existing Instagram posts into their ad campaigns, customise placements, and utilise click-to-message destinations for more engagement.
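Meta's announcement does not include code; as a loose sketch of what promoting an existing Instagram post through the Marketing API can involve, with the endpoint version and the field names inside object_story_spec being assumptions to verify against Meta's partnership ads documentation:

```python
import json
import requests

# Placeholder credentials and IDs from your Meta app and ad account.
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
AD_ACCOUNT_ID = "act_1234567890"

# Sketch: turn an existing Instagram post into an ad creative. The field
# names below are illustrative assumptions, not confirmed by the article.
resp = requests.post(
    f"https://graph.facebook.com/v19.0/{AD_ACCOUNT_ID}/adcreatives",
    data={
        "object_story_spec": json.dumps({
            "instagram_user_id": "IG_USER_ID",
            "source_instagram_media_id": "IG_MEDIA_ID",
        }),
        "access_token": ACCESS_TOKEN,
    },
)
print(resp.json())
```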

Meta cited an eMarketer report stating that US marketers are expected to spend over USD 10 billion on influencer-sponsored content in 2025, with 86 percent of marketers using influencer marketing.

"The world's largest community of Instagram creators is right here in India, and it's no surprise that we're seeing strong momentum around brands partnering with them to drive sales and ROAS (Return on Ad Spend)," said Arun Srinivas, Director and Head of Ads Business for Meta in India, according to multiple media reports.

"The new tools we're launching today harness the power of AI to make creator discovery even more seamless for brands, in turn boosting growth potential for both brands and creators" Srinivas reportedly said.

Aniket Singh, Chief Business Officer at Snitch, said the brand has consistently been using Reels and witnessing impact across the funnel. "The combination of Reels and creators is where the magic spot lies for driving the desired business results. Using creator content amplified by Partnership Ads on Meta platforms has helped us drive a 53 percent increase in ROAS," he said, as per the reports.

Also Read: AI: Google Health AI Updates, xAI Acquires GenAI Video Startup, Mistral Releases Small AI Model

4. Google Releases Gemini 2.5 Model with a Focus on Reasoning

Google has announced the release of its new Gemini 2.5 AI models, starting with the experimental Gemini 2.5 Pro version. In an update to the Google DeepMind blog, the company described the new Gemini 2.5 models as "thinking models" with advanced reasoning capabilities. According to Google, these models can analyse information, draw logical conclusions, incorporate context and nuance, and make informed decisions.

Compared to the Gemini 2.0 Flash Thinking model, released in December last year, the Gemini 2.5 models feature an enhanced base model with improved post-training. Google also stated that it is building these "thinking" capabilities into all its models going forward, enabling them to handle more complex problems and support context-aware AI agents.

Gemini 2.5 Pro Model

The Gemini 2.5 Pro model is the first in the 2.5 series to be released. "Gemini 2.5 Pro is available now in Google AI Studio and in the Gemini app for Gemini Advanced users, and will be coming to Vertex AI soon. We'll also introduce pricing in the coming weeks, enabling people to use 2.5 Pro with higher rate limits for scaled production use," Koray Kavukcuoglu, CTO of Google DeepMind, said in a blog post on March 25.

Google describes the Gemini 2.5 Pro Experimental as its most advanced model for handling complex tasks, with strong reasoning and coding capabilities. According to Google, the model leads in maths and science benchmarks such as GPQA and AIME 2025. It also scores 18.8 percent, without tool use, on Humanity's Last Exam, a dataset designed by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.
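For developers, a minimal call to the experimental model through the google-generativeai Python SDK might look like this sketch (the model identifier is assumed from the release naming and should be checked in AI Studio):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Model ID assumed from the experimental release; verify in AI Studio.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

response = model.generate_content(
    "A train leaves at 3:40 pm travelling at 80 km/h. When does it "
    "cover 200 km? Show your reasoning."  # any multi-step reasoning prompt
)
print(response.text)
```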

Advanced Coding

For coding tasks, the Gemini 2.5 Pro model excels at creating visually compelling web apps and developing agentic code applications, along with code transformation and editing. In a preview video published by Google, the model was able to generate a fully functional video game by producing executable code from a single-line prompt.

Gemini 2.5 can comprehend vast datasets and handle complex problems from different information sources, including text, audio, images, video and even entire code repositories, the company said in a blog post.

Also Read: Oracle UK Investment, ServiceNow AI Agents, Google AI Chip, Tech Mahindra–Google Cloud Partnership

5. Microsoft Introduces AI-Powered Deep Research Tools in M365 Copilot

Microsoft, on March 25, introduced two reasoning agents powered by OpenAI's "deep research" technology in Microsoft 365 Copilot, its AI chatbot app. AI companies have recently been rolling out deep research agents across their chatbots. Microsoft named its two agents Researcher and Analyst; they can analyse vast amounts of information with access to work data—including emails, meetings, files, chats, and more—as well as the web, to deliver what the company calls "highly skilled expertise on demand."

Researcher

Researcher combines OpenAI's deep research model with Microsoft 365 Copilot's "advanced orchestration" and "deep search capabilities." Microsoft claims that Researcher can perform analyses, including developing a go-to-market strategy based on the context of all your work data and broader competitive data from the web, as well as creating a quarterly report for a client.

"Researcher can leverage third-party data via connectors to enhance its capabilities and provide more comprehensive insights—allowing it to integrate data from external sources, such as Salesforce, ServiceNow, Confluence, and more, directly into Microsoft 365 Copilot. It can even pull in data through other agents such as Sales Chat," said Jared Spataro, Chief Marketing Officer, AI at Work, in a blog post on March 25.

Analyst

Analyst is built on OpenAI's o3-mini reasoning model and is "optimised to do advanced data analysis at work," Microsoft said. Analyst progresses through problems iteratively, taking as many steps as necessary to refine its "thinking" and provide a detailed answer to queries. Analyst can also run the programming language Python to tackle complex data queries, Microsoft added, and expose its "work" for inspection.

"For example, you can use Analyst to turn raw data scattered across multiple spreadsheets into a demand forecast for a new product, a visualisation of customer purchasing patterns, or a revenue projection," Microsoft said.

Microsoft also announced deep reasoning and agent flows in Microsoft Copilot Studio, a platform to create, manage, and deploy agents for business needs.

New Frontier Program

Microsoft said Researcher and Analyst will start rolling out to customers with a Microsoft 365 Copilot license in April as part of a new "Frontier" program that gives customers early access to new Copilot innovations while they're still in development.

Reported By

Kirpa B is passionate about the latest advancements in Artificial Intelligence technologies and has a keen interest in telecom. In her free time, she enjoys gardening or diving into insightful articles on AI.
