Dancing robots learn better • Apple/MSFT quit OpenAI board • AI character acting is here • AI news this week
Welcome to the AI Search newsletter. Here are the top highlights in AI this week.
DeepMind’s Robot Can Give Context-Based Guided Tours of an Office Building
Researchers at DeepMind have developed a robot, built on Gemini 1.5 Pro, that can give context-based guided tours of an office building. The robot can listen to a person's request, parse it, and translate it into behavior, such as guiding them to a specific location in the office. It understands the office layout thanks to the model's long context window, which lets it process different parts of the office scenery simultaneously and form associations between them.
This New AI Can Animate Any Photo Or Video With Full Expression Control
LivePortrait is a cutting-edge AI for portrait animation that achieves efficient and controllable video synthesis from a single still image. The framework consists of a base model, a stitching module, and retargeting modules, which work together to produce high-quality portrait animations with fine control over facial expressions and head poses.
Robotics Breakthrough: Dancing Robots to Improve Human-Robot Interactions
Engineers have developed a humanoid robot that can learn and perform various expressive movements, including dance routines and gestures like waving, high-fiving, and hugging, while maintaining a steady gait on diverse terrains. This enhanced expressiveness and agility could improve human-robot interactions in settings such as factory assembly lines, hospitals, and homes, where robots could safely operate alongside humans or even replace them in hazardous environments.
Microsoft and Apple Drop OpenAI Board Seats Amid Antitrust Scrutiny
Microsoft is giving up its observer seat on OpenAI's board, and Apple has dropped its plan to take one, amid growing antitrust scrutiny of Big Tech's influence over artificial intelligence. Microsoft, which has invested $13 billion in OpenAI, will withdraw from the observer role it already holds, while Apple, which had been expected to take a similar seat, will no longer do so. The decision comes as global watchdogs increasingly scrutinize the tech giants' clout in AI, and marks a significant setback for OpenAI's efforts to bring in major industry players to guide its development.
TurboType - Type Faster & Save Time
TurboType is a Chrome extension that helps you type faster with customizable shortcuts, boosting productivity and saving time. Easily create, save, and insert keyboard shortcuts for frequently used phrases. Use it for free, forever!
This New AI Can Generate Full Drawing Videos
PaintsUndo is a model that generates videos showing the drawing process behind a still image given as input. Its showcases demonstrate that it can extract coarse sketches, interpolate from external sketches, and even accept sketches as input.
New Model to Control the Movements of Humanoids in 3D Environments
Researchers introduced PlaMo, a new computational approach to plan and control the movements of humanoids in complex, 3D, physically simulated worlds. PlaMo consists of a scene-aware path planner and a robust control policy, which work together to generate realistic motions for virtual humanoid characters in response to textual instructions, taking into account the scene's physical constraints and dynamic obstacles.
ChatLLM by Abacus
ChatLLM by Abacus AI is an integrated platform that allows enterprises to use multiple LLMs, deploy custom agents, and collaborate with team members. Choose from state-of-the-art LLMs such as GPT-4o, Claude 3 Opus, and Abacus AI's new open-source Smaug.
Framework proposed for 'child-safe AI' following incidents where kids saw chatbots as trustworthy
A recent study highlights the need for "child-safe AI" due to the "empathy gap" in AI chatbots, which can lead to potentially dangerous situations for young users. The study proposes a 28-item framework to help developers and policymakers ensure that AI is designed with children's needs in mind, following incidents where children treated chatbots as lifelike and trustworthy, and were given harmful advice.
New tool uses vision language models to safeguard against offensive image content
Researchers have developed a tool called LlavaGuard, which utilizes vision language models to filter, evaluate, and suppress specific image content in large datasets or from image generators. LlavaGuard can adapt to different legal regulations and user requirements, and provides detailed explanations of its safety ratings by categorizing content and explaining why it is classified as safe or unsafe.
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models
Researchers from MIT have found that the reasoning skills of large language models (LLMs) are often overestimated. By testing LLMs on variations of different tasks, the researchers discovered that the models' high performance on familiar tasks is often due to memorization rather than true reasoning abilities. The study revealed that LLMs struggle with unfamiliar scenarios and counterfactual tasks, indicating a lack of generalizable skills and limited ability to reason and adapt to new situations.