AI operating system, self healing robots, ChatGPT Agent, Pusa, OpenMed, PhysX, NVIDIA's image editor
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
ChatGPT Agent is a new feature that lets ChatGPT actually do things for you online, like planning trips, buying stuff, analyzing data, or creating documents. You give it a task, and it chooses the right tools, browses websites, fills out forms, and finishes multi-step jobs, all autonomously. This is only available for ChatGPT Pro, Plus, and Team subscribers, and it handles tasks much more complex than just chatting or writing text.
Pusa V1.0 is a new AI model that can generate high-quality videos from images, surpassing the performance of previous models with much less training data and cost. It achieves this by using a technique called vectorized timestep adaptation, which allows it to learn temporal dynamics more efficiently. This model can also perform multiple tasks, including text-to-video generation and video extension.
CoPart is a new 3D generation framework that represents 3D objects with multiple contextual part latents, allowing for more detailed and controllable generation. This approach decomposes complex objects into simpler parts, facilitating part learning and relationship modeling, and naturally supports part-level control. CoPart also introduces a novel mutual guidance strategy to fine-tune pre-trained diffusion models for joint part latent denoising.
NeuralOS is a simulated operating system that uses neural generative models to mimic the behavior of a real operating system. You can interact with it by moving your mouse, clicking, and typing, and even adjust the simulation settings to control the quality and speed of the simulation. This demo showcases the potential of neural networks in simulating complex systems.
DreamPoster is a text-to-image generation framework that can create high-quality posters from user-provided images and text prompts. It uses a multi-modal architecture that combines text and image information, and is trained on a large dataset of high-quality posters. DreamPoster outperforms existing models in terms of usability and design sense, achieving a high usability rate of 88.55%.
PhysX-3D is a new AI model that can generate 3D assets with realistic physical properties, such as weight, material, and movement. This model uses a combination of human-annotated data and machine learning algorithms to create 3D objects that can be used in simulations and other applications. By incorporating physical properties, PhysX-3D can create more realistic and useful 3D assets.
Hailuo 02 is a new SOTA video model that excels at prompt understanding, physics, camera control, and coherence. It's based on a new architecture called Noise-aware Compute Redistribution (NCR), which boosts training and inference efficiency by 2.5x, allowing 1080p video generation at a competitive cost. With unmatched efficiency and precision, Hailuo 02 is rated among the top video models in the world. Try it for free today!
Scientists have created robots that can grow, heal, and adapt by consuming parts from other machines or their environment. These robots, which use modular magnetic components, can self-assemble, repair, and enhance their capabilities, making them more autonomous and adaptable. This process, called "Robot Metabolism," allows robots to develop and improve themselves in a way similar to biological systems.
Epona is a new AI model that can generate realistic and long-term predictions of driving scenes, including future trajectories and traffic patterns. It uses a unique approach called autoregressive diffusion to model the world and make predictions, and has been shown to outperform other models in terms of accuracy and duration. Epona can also be used as a real-time motion planner for autonomous vehicles.
Researchers have developed a new AI model called Be.FM that can predict and understand human behavior. Be.FM is trained on behavioral science data and can forecast human behavior, infer psychological traits, and analyze contextual influences. It outperforms general-purpose AI models like GPT-4o and Llama in these tasks.
SpatialTrackerV2 is a new AI model that can track 3D points in videos with high accuracy and speed. It's the first model to estimate camera motion, scene geometry, and 3D trajectories all at once, making it a significant improvement over previous methods. This model can be used for various applications, including 2D tracking and dynamic 3D reconstruction.
With Monica, you can use the top AI models, image generators, and video generators, all in one integrated platform. Use code AISEARCH10 to get 25% off the 'Unlimited Annual Plan' within 24 hours of registration, or 10% off otherwise. Try it for free today!
NVIDIA has developed a new AI tool called DiffusionRenderer that allows for precise editing of 3D scenes and photorealistic images. This tool uses a combination of diffusion models and traditional graphics pipelines to convert 2D videos into editable scene representations, giving users control over lighting and materials. It has the potential to improve asset creation, relighting, and material editing in various fields such as content creation and robotics.
OpenMed on Hugging Face is a hub for open-source AI models and datasets made to help advance healthcare and medical research. It provides tools, resources, and community support for anyone interested in using AI for tasks like medical imaging, diagnostics, and data analysis.