ASI-Arch, Hierarchical Reasoning Models, Unitree R1, Qwen3 beats closed-source, Higgs Audio, 3D upscaler
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
The Hierarchical Reasoning Model (HRM) is a new AI model that solves complex problems by breaking them down into smaller, more manageable parts, similar to how the human brain works. It stands out because it can learn from a small amount of data and still perform well on difficult tasks that top AI models fail at, such as solving Sudoku puzzles and navigating mazes. Read more
SeC is a new computer vision model that can accurately segment objects in videos, even when they change appearance or move around. It uses a concept-driven approach, which means it understands objects in a more human-like way, rather than just looking at their appearance. This makes it better at handling complex scenes and objects that change over time. Read more
Yume is an interactive world generation model that can create realistic and dynamic worlds from images, text, or videos, allowing users to explore and control them using keyboard actions or other devices. It uses a combination of camera motion quantization, video generation architecture, and advanced sampling mechanisms to achieve high-fidelity and interactive video world generation. Read more
Do you prefer watching to reading? Check out this video covering all of this week's AI highlights.
Qwen3-235B-A22B-Instruct-2507 is a powerful language model that has been updated to improve its ability to follow instructions, reason logically, and understand text. It has also been enhanced to better align with user preferences and generate higher-quality text. The model has been trained on a large dataset and has achieved state-of-the-art results in various tasks. Read more
Qwen3-Coder is a powerful AI model designed for agentic coding, allowing it to perform complex coding tasks and interact with the environment in a more human-like way. It has been trained on a large dataset and has achieved state-of-the-art results in various coding tasks, including Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use. Read more
Boson AI has released Higgs Audio V2, a powerful audio generation model that can create realistic, emotionally expressive voices, including multi-speaker conversations and long-form audio. The model is trained on over 10 million hours of audio data and generates high-fidelity audio at a low token rate, making it suitable for applications like conversational agents, audiobooks, and podcasts. Read more
Hailuo 02 is a new SOTA video model that excels at prompt understanding, physics, camera control, and coherence. It's based on a new architecture called Noise-aware Compute Redistribution (NCR), which boosts training and inference efficiency by 2.5x, allowing 1080p video generation at a competitive cost. With unmatched efficiency and precision, Hailuo 02 is rated among the top video models in the world. Try it for free today!
Diffuman4D is a new AI model that can generate high-quality 3D videos of humans from sparse-view videos, allowing for free-viewpoint rendering of human performances. This model uses a spatio-temporal diffusion model to generate 4D-consistent multi-view videos, which are then used to reconstruct a high-fidelity 3D model of the human performance. Read more
Elevate3D is a new AI framework that can transform low-quality 3D models into high-quality assets by refining both texture and geometry. This is achieved through a novel texture enhancement method called HFS-SDEdit, which preserves the input's identity while improving texture quality, and a view-by-view refinement approach that alternates between texture and geometry refinement. Read more
ASI-Arch is a highly autonomous framework that uses artificial intelligence to discover new model architectures, specifically linear attention mechanisms. It consists of three main components: the Autonomous Architecture Discovery Pipeline, the Architecture Database, and the Cognition Base, which work together to hypothesize, implement, and validate new architectures. The framework has successfully discovered 106 novel linear attention architectures that achieve state-of-the-art performance. Read more
Researchers have found that invisible watermarks, intended to distinguish real images from AI-generated ones, can be easily removed. The UnMarker tool, developed at the University of Waterloo, disrupts watermark signals in the image's spectral domain, making the watermark undetectable while preserving visual quality. This undermines watermarking as a reliable defense against deepfakes. Read more
With Monica, you can use the top AI models, image generators, and video generators, all in one integrated platform. Use code AISEARCH10 to get 25% off the 'Unlimited Annual Plan' within 24 hours of registration, or 10% off after that. Try it for free today!
The Unitree R1 is a new humanoid robot starting at $5,900, designed to walk, talk, recognize voices and images, and even perform athletic moves like cartwheels and punches. It weighs about 25 kg (55 lbs), stands about 121 cm tall, and comes with 26 joints and advanced AI, at a price much lower than other robots in its class. Read more
DAViD is a computer vision model that uses synthetic data to achieve high accuracy and efficiency in tasks like depth estimation, surface normal estimation, and soft foreground segmentation. It uses a single model architecture and a dataset of 300,000 synthetic images to deliver high-quality results while running orders of magnitude faster than competing methods. This makes it a powerful tool for human-centric computer vision tasks. Read more