AI environmental inspection, welding robots, Hailuo 02, MiniMax-M1, POLARIS, AI VR videos, new AI deblur
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
MiniMax-M1 is an open-weights, large-scale hybrid-attention reasoning model designed to efficiently process long inputs and perform complex tasks. It combines a Mixture-of-Experts (MoE) architecture with a lightning attention mechanism, making it well suited to tasks like mathematical reasoning, coding, and software engineering. MiniMax-M1 outperforms other strong open-weight models on a range of benchmarks. Read more
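As a rough illustration of what "hybrid attention" means here (the names, shapes, and mixing ratio below are illustrative, not MiniMax's actual code): a model can interleave cheap linear-attention blocks, which scale linearly with sequence length, with occasional full softmax-attention blocks.

```python
import numpy as np

def softmax_attention(q, k, v):
    # Standard attention: cost grows as O(n^2) in sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v):
    # "Lightning"-style linear attention: a kernel feature map lets us
    # summarize keys/values in a (d x d) matrix, so cost is O(n).
    phi = lambda x: np.maximum(x, 0) + 1e-6      # simple positive feature map
    kv = phi(k).T @ v                            # (d, d) summary, independent of n
    z = phi(q) @ phi(k).sum(axis=0)              # per-query normalizer
    return (phi(q) @ kv) / z[:, None]

n, d = 128, 16
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, n, d))

# Hybrid stack (hypothetical ratio): mostly linear blocks,
# with an occasional full softmax-attention block.
out = v
for layer in range(8):
    attn = softmax_attention if layer % 8 == 7 else linear_attention
    out = out + attn(q, out, out)                # residual connection
print(out.shape)  # (128, 16)
```

The point of the hybrid is that the expensive quadratic blocks appear only rarely, so long inputs stay affordable while the model keeps some full-attention capacity.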
Text-Aware Image Restoration (TAIR) is a new task that aims to restore both visual quality and text fidelity in degraded images. To achieve this, researchers introduced a large-scale dataset called SA-Text and a multi-task diffusion framework called TeReDiff, which leverages text-spotting features to enhance restoration. This approach outperforms state-of-the-art methods in text recognition accuracy. Read more
Hunyuan3D-2.1 is a scalable 3D asset creation system that generates high-fidelity 3D assets from images, using a fully open-source framework and physically-based rendering (PBR) texture synthesis. It outperforms other 3D generation methods in quality and condition-following ability. Read more
Do you prefer to watch instead of read? Check out this video covering all the highlights in AI this week:
NVIDIA's PartPacker is a neural-network approach to 3D scene understanding and object manipulation, aimed at helping robots and other machines understand and interact with complex 3D environments. Using deep learning, PartPacker learns to identify and manipulate the individual parts of objects in a scene. Read more
InterActHuman is a new AI framework that can generate realistic videos of humans interacting with each other and objects, using audio cues to control the actions and lip movements. It uses a novel approach called diffusion transformer (DiT) to align audio conditions with specific individuals in the video, allowing for precise control over the animation. Read more
Hailuo 02 is a new SOTA video model that excels at prompt understanding, physics, camera control, and coherence. It's based on a new architecture called Noise-aware Compute Redistribution (NCR), which boosts training and inference efficiency by 2.5x, allowing 1080p video generation at a competitive cost. With unmatched efficiency and precision, Hailuo 02 is rated among the top video models in the world. Try it for free today!
POLARIS is a post-training recipe that uses reinforcement learning to improve the performance of advanced reasoning models, such as Qwen3-4B and DeepSeek-R1-Distill-Qwen-7B. By fine-tuning these models with POLARIS, researchers have achieved state-of-the-art results on various reasoning tasks, including math and science problems. This approach has the potential to improve the performance of AI models in complex reasoning tasks. Read more
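Post-training recipes like this broadly follow the pattern of reinforcement learning with verifiable rewards: sample several answers per prompt, score each with an automatic checker, and reinforce the ones that are correct. A toy sketch of that reward step (all names and the group-relative advantage formula are illustrative, not the actual POLARIS recipe):

```python
def reward(completion: str, gold: str) -> float:
    # Verifiable reward: 1.0 iff the model's final answer matches the reference.
    answer = completion.split("=")[-1].strip()
    return 1.0 if answer == gold else 0.0

def group_advantages(rewards):
    # Group-relative advantage: score each sample against the group mean,
    # so correct answers are pushed up and incorrect ones pushed down.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Four sampled completions for the prompt "2 + 2 ="
samples = ["2 + 2 = 4", "2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
rs = [reward(s, "4") for s in samples]
print(rs)                    # [1.0, 0.0, 1.0, 0.0]
print(group_advantages(rs))  # [0.5, -0.5, 0.5, -0.5]
```

Because math and code answers can be checked automatically, the reward signal is cheap and reliable, which is what makes this style of RL fine-tuning effective on reasoning benchmarks.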
ImmerseGen is a new AI system that can generate immersive 3D worlds from text prompts, using a combination of agent-guided asset design and arrangement. It creates compact alpha-textured proxies that can be used to build realistic and diverse environments, tailored for virtual reality (VR) experiences. This technology has the potential to revolutionize the way we create and interact with virtual worlds. Read more
Researchers have developed a new vision-language model that can create plans for automated inspection of environments. This model uses a combination of natural language and images to generate 3D inspection plans for robots, achieving over 90% accuracy in spatial reasoning and trajectory planning. The model can be used to inspect environments that are hazardous or difficult for humans to access, such as tunnels, dams, and power plants. Read more
Researchers have developed a new prosthetic hand control system that uses machine learning and sensory data to control movements without relying on biological signals. The system uses a camera and touch sensors to autonomously plan and execute grasp-and-release tasks with over 95% success, reducing user effort and enabling more intuitive operation. This technology has the potential to improve the lives of people with prosthetic limbs. Read more
A good photo on your LinkedIn or business profile makes a huge difference. You could do a physical photoshoot, which costs over $200 and hours of posing awkwardly in front of a camera. Or, with AI Portrait, just upload one photo and get a portfolio of 50 professional photos in minutes. Save time and money - try it today!
Midjourney has released its V1 Video Model, an AI system that generates short videos from images, letting users create animated scenes with ease. The model is currently image-to-video only: users press an "Animate" button to create 480p clips that are 5 seconds long, extendable up to 20 seconds. Read more
A new robot navigation system called LENS uses brain-inspired computing to navigate efficiently. LENS uses a special camera that only reacts to movement and a low-power chip to recognize locations, reducing energy consumption by up to 99%. This allows robots to operate longer and cover greater distances on limited power supplies. Read more
Researchers have developed a robotic welding system that learns from skilled human welders to address the UK's welder shortage. The system captures and digitizes expert techniques, enabling robots to perform complex tasks with quality comparable to experienced welders, potentially increasing productivity and supporting multiple industries. Read more