New open-source AI video, AI fragrances, AI runs on CPUs, Hunyuan 3D-2.5, Dia text-to-speech, Trillion 7B
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
Kimi-Audio is an open-source model that can understand, generate, and converse using audio. It handles tasks like speech recognition, audio captioning, and emotion detection, and can even create new audio, all in one framework. The model is trained on over 13 million hours of audio and is designed for efficient, low-latency generation. Read more
AnimPortrait3D lets you create 3D animated portraits from a single photo. It uses AI to turn a normal 2D picture into a lifelike 3D model that can move and show different facial expressions. This makes it easy to animate selfies or portraits for fun or creative projects. Read more
MAGI-1 is an AI model that generates videos by predicting sequences of video chunks, making the process more efficient and scalable. It uses an autoregressive approach, meaning it creates each part of the video step by step, which helps it handle longer and more complex videos. Read more
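To make the "step by step" idea concrete, here is a toy sketch of chunk-by-chunk autoregressive generation. It is not MAGI-1's actual code: `predict_next_chunk` is a hypothetical stand-in for the model, and the "frames" are plain integers so the loop structure is easy to follow.

```python
def predict_next_chunk(history, chunk_len):
    """Hypothetical stand-in for a video model.

    A real model would produce image frames conditioned on `history`;
    this toy version simply continues counting from the last frame.
    """
    last_frame = history[-1][-1] if history else -1
    return [last_frame + 1 + i for i in range(chunk_len)]


def generate_video(num_chunks, chunk_len):
    """Build a video one chunk at a time, feeding earlier chunks back in."""
    chunks = []
    for _ in range(num_chunks):
        # Each new chunk depends on everything generated so far.
        chunks.append(predict_next_chunk(chunks, chunk_len))
    return chunks


video = generate_video(num_chunks=3, chunk_len=4)
```

Because each chunk only needs the chunks before it, the same loop can keep running to extend a video for as long as needed.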
Do you prefer to watch instead of read? Check out this video covering all the highlights in AI this week:
Dia 1.6B is an open-source text-to-speech model by Nari Labs that creates super realistic, emotional conversations from text. It can run on a regular GPU, handles multiple speakers in one go, and even mimics nonverbal sounds like laughter or coughing, making it a strong rival to big names like ElevenLabs and OpenAI. Read more
ReflectionFlow is a new method that helps image-generating AI models improve their results by reflecting and refining at each step. It uses a huge dataset of flawed and improved images, plus feedback, so the model can learn to fix its own mistakes and create better pictures. Read more
We’re partnering with Dell to give away a Dell Precision 5690 Workstation with an RTX 5000 Ada. This is a powerful yet portable laptop, with a built-in NPU that’s optimized for AI. Open only to residents of the USA or Canada. Enter for FREE here
Trillion-7B-preview is a large open-source language model designed to handle advanced text tasks. It aims to compete with other top models by providing strong performance in understanding and generating human-like text, and is available for anyone to try on Hugging Face. Read more
TopoLM is a language model that not only mimics how neurons in the brain process language, but also how they're physically arranged into clusters for different language functions like verbs and nouns. Unlike previous models, TopoLM uses a spatial rule so its internal components self-organize in a way that matches real brain activity, offering new insights into how both AI and the human brain handle language. Read more
Researchers have developed a new AI system that can add realistic tactile textures to 3D-printed objects. The system, called TactStyle, uses a combination of computer vision and machine learning to generate accurate heightfields from images of textures. This allows users to customize 3D models with realistic surface properties. Read more
Hunyuan 3D-2.5 is Tencent’s upgraded tool for making high-quality 3D models faster and with better detail, supporting both text and image prompts. It improves geometry precision, texture quality, and speed, and is designed for uses like game development, VR, and e-commerce, with open-source access and easier workflows for creators. Read more
Researchers have developed a generative AI model that creates new fragrances from user-defined scent descriptors. The model, called OGDiffusion, uses mass spectrometry profiles of essential oils and their corresponding odor descriptors to generate essential oil blends for new scents. This could make fragrance development faster and more scalable. Read more
With Monica, you can use the top AI models, image generators, and video generators, all in one integrated platform. Use code AISEARCH10 to get 25% OFF the 'Unlimited Annual Plan' within 24 hours of registration, or 10% OFF otherwise. Try it for free today!
Microsoft has developed an AI model that can run on regular CPUs instead of powerful GPUs. It uses a 1-bit architecture that stores weights in very low precision, which replaces most floating-point multiplications with simpler operations and reduces memory and energy requirements. Microsoft reports that the model matches or surpasses the performance of comparable full-precision models while using less energy. Read more
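Here is a rough sketch of why very low-precision weights are cheap to run on a CPU. It assumes ternary weights (each weight is -1, 0, or +1) and a simple mean-absolute-value scale; the function names are hypothetical and this is not Microsoft's implementation.

```python
def quantize_ternary(weights):
    """Map each float weight to -1, 0, or +1 (assumed scheme: mean-|w| scale)."""
    magnitudes = [abs(w) for row in weights for w in row]
    scale = (sum(magnitudes) / len(magnitudes)) or 1.0  # avoid dividing by zero
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return ternary, scale


def ternary_matvec(ternary, x):
    """Matrix-vector product that needs only additions and subtractions."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi   # weight +1: add the input
            elif t == -1:
                acc -= xi   # weight -1: subtract the input
            # weight 0: skip the input entirely
        out.append(acc)
    return out


weights = [[0.9, -0.1, -1.2], [0.05, 0.8, 0.0]]
ternary, scale = quantize_ternary(weights)
result = ternary_matvec(ternary, [2.0, 3.0, 5.0])
```

Multiplying `result` by `scale` gives an approximation of the original full-precision product. The inner loop never multiplies a weight by an input, which is where the memory and energy savings come from.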
SkyReels-V2 is an open-source AI model that can generate super long, cinematic-quality videos from text or images, using new tech called Diffusion Forcing. It supports both text-to-video and image-to-video, produces visuals close to commercial models, and is designed for flexible creative uses like story generation and camera direction, but requires powerful hardware to run. Read more
A new AI technique called Lp-Convolution mimics how the human brain processes visual information. It lets AI systems focus on the most relevant parts of an image, much like the human brain does, which improves image recognition accuracy and efficiency. It could benefit fields such as autonomous driving, medical imaging, and robotics. Read more