New Gemini 2.5, self-improving AI, 3D faces, ChatGPT updates, Sora is free, OpenAudio S1, ElevenLabs v3
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
Google unveiled an upgraded preview of Gemini 2.5 Pro, their most advanced AI model so far. It boasts improved abilities in coding, reasoning, and creative writing, making it smarter and more versatile than previous versions. This upgrade aims to push the boundaries in both technical and creative AI tasks. Read more
Fish Audio launched OpenAudio S1, a TTS model that excels at capturing human emotion and vocal nuances. It ranked #1 in human evaluations on HuggingFace TTS-Arena-V2 and allows precise control over emotions and tone, such as angry, happy, sad, and even nuanced effects like whispering or emphasizing. This makes it ideal for generating expressive and realistic speech. Read more
OpenAI is adding new features to ChatGPT, including integrations with cloud services, meeting recordings, and support for connecting to research tools. These updates make ChatGPT more useful for deep research and productivity tasks. Read more
Do you prefer to watch instead of read? Check out this video covering all the highlights in AI this week:
SakanaAI introduced the Darwin Gödel Machine (DGM), a coding agent that rewrites its own code to improve at programming tasks. This self-improving AI can optimize itself over time, making it more efficient and capable with each iteration. Read more
Microsoft now offers free video generation in the Bing mobile app, powered by OpenAI’s Sora model. Users can create AI-generated videos directly from their phones, making video creation more accessible. Read more
Kling 2.1 is a new AI video generator that creates hyper-realistic videos with sharper 1080p visuals and smoother motion. It offers both cost-effective and professional modes, lets you easily swap objects or extend videos, and brings faster rendering for creators who want cinematic-quality results. Read more
Exa launched /research, an agentic search tool that automates web research and returns structured insights. It handles multiple searches and organizes the results for easier analysis. Read more
ElevenLabs introduced Eleven v3 (alpha), their most expressive text-to-speech model yet. It supports over 70 languages, uses advanced audio tags for emotional control, and can handle multi-speaker conversations with natural pacing and interruptions. This model brings lifelike, emotionally rich speech generation for creators, but requires more detailed prompts than previous versions. Read more
SkyReels-Audio can generate realistic talking portrait videos. The system combines audio, images, and text to produce videos that are temporally coherent and offer fine-grained multimodal control. It can also edit existing videos to align lip movements with an audio clip. Read more
Pixel3DMM is a new AI that can reconstruct 3D faces from a single RGB image. The system uses a vision transformer to predict per-pixel geometric cues, which are then used to constrain the optimization of a 3D morphable face model. This approach achieves state-of-the-art results in 3D face reconstruction, outperforming existing methods by over 15%. Read more
With Monica, you can use the top AI models, image generators, and video generators, all in one integrated platform. Use code AISEARCH10 to get 25% OFF the 'Unlimited Annual Plan' if you register within 24 hours, or 10% OFF otherwise. Try it for free today!
A new AI system called Native-Resolution Image Synthesis (NiT) can generate high-quality images at arbitrary resolutions and aspect ratios. The system uses a novel architecture for diffusion transformers that directly models native-resolution image data, allowing it to generalize across diverse resolutions and aspect ratios. Read more
Ctrl-Crash is a new AI that generates realistic car crash videos. The system uses a controllable diffusion model that conditions on signals such as bounding boxes, crash types, and an initial image frame, allowing for fine-grained control over the generated crashes. Ctrl-Crash achieves state-of-the-art performance in generating realistic crash videos and can be used for counterfactual scenario generation and crash reconstruction. Read more
Higgsfield launched Higgsfield Speak, a tool for creating motion-driven talking videos. You can generate a talking avatar video just by providing a script and an avatar image, making video content creation much easier and more dynamic. Read more