Claude Opus 4.8, LocateAnything, DeepSWE, Step 3.5, PiD, Auto Scientists: AI NEWS
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
LocateAnything is an AI framework by NVIDIA designed to pinpoint and track any object in a video using natural language descriptions. Unlike older vision tools that only recognize pre-defined items, it can understand complex, highly descriptive phrases to find specific things in real time. This makes it a powerful engine for smart search systems and autonomous robots navigating real-world spaces. Read more
PhysX-Omni generates 3D objects that are not just pretty, but physically useful for simulation. It tries to understand things like size, material, movement, function, and whether an object is rigid, soft, or articulated, which is important for robotics and embodied AI. Read more
DeepSWE is a benchmark for testing whether AI coding agents can solve long, realistic software engineering tasks. The tasks are written from scratch, span many real repos and languages, and use behavior-based tests so models cannot just memorize old GitHub fixes. Read more
Do you prefer to watch instead of read? Check out this video covering the top AI news this week:
Claude Opus 4.8 is Anthropic’s upgraded flagship model for coding, reasoning, and agent-style work. It improves on Opus 4.7, adds better collaboration features in Claude Code, and includes a faster mode that is cheaper than before. Read more
Step 3.7 Flash is a fast AI model built for real-world agents that need to see, search, code, and use tools. It focuses on efficiency, multimodal understanding, reliable tool calls, and compatibility with Claude Code-style agent frameworks. Read more
CubePart is a 3D generator that creates objects as separate controllable parts instead of one solid lump. That means a generated robot, drone, car, or character can have parts that move independently, making it much easier to animate or simulate. Read more
GLM is Z.AI’s flagship open-source model built for long, difficult engineering tasks like coding, debugging, and autonomous agent work. It delivers top-tier performance while being dramatically cheaper and faster than many leading closed models. Try it today
Bidirectional Evolutionary Search is a method that helps AI models improve their answers by searching both forward and backward. It evolves possible solutions while also breaking the goal into smaller checkable steps, which gives the model better feedback instead of relying on one final yes-or-no answer. Read more
AutoScientists is a system where multiple AI agents organize themselves into research teams. Instead of one central planner controlling everything, the agents explore different ideas, critique each other, share results, and run long scientific experiments more like a real research group. Read more
Gamma-World is an AI world model by NVIDIA for generating interactive scenes with multiple controllable agents. It can simulate future frames for games or robotics where several characters act independently while still sharing the same coherent world. Read more
Typeless is an intelligent AI voice dictation tool designed to turn your speech into polished, well-structured text in real time. It automatically cleans up your speech by removing filler words like "um," fixing mid-sentence corrections, and formatting lists across various apps and devices. Try it for free today!
Bonsai Image 4B is a tiny image-generation model family designed to run locally on laptops and even phones. It uses extremely compressed 1-bit and ternary weights, making image generation possible on devices that normally could not handle big diffusion models. Read more
MiniCPM5-1B is a small 1-billion-parameter language model built for local assistants, coding agents, and tool-use tasks. It supports long context and offers both a fast “no thinking” mode and a deeper reasoning mode, making it useful when you need a compact model on-device. Read more
SEGA is a training-free trick that helps diffusion transformer image models generate much higher-resolution images. It adjusts attention based on the image’s frequency details, helping preserve both the overall structure and tiny details without retraining the model. Read more



