Voice Commands & Chaining Tasks on Humanoid EVE

We have previously developed an autonomous model that can merge many tasks into a single goal-conditioned neural network. However, when multi-task models are small (<100M parameters), adding data to fix one task's behavior often adversely affects behaviors on other tasks. Increasing the model's parameter count can mitigate this forgetting problem, but larger models also take longer to train, which slows down our ability to figure out which demonstrations we should collect to improve robot behavior.

How do we iterate quickly on data while building a generalist robot that can do many tasks with a single neural network? We want to decouple our ability to quickly improve task performance from our ability to merge multiple capabilities into a single neural network. To do this, we've built a voice-controlled natural language interface that chains short-horizon capabilities across multiple small models into longer ones. With humans directing the skill chaining, this allows us to achieve the long-horizon behaviors shown in this video:
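The chaining setup described above can be sketched as a simple router: a human-issued language command selects which small single-task policy is active, and that policy then produces low-level actions at control rate. This is a minimal illustration only; the class and policy names are hypothetical, not 1X's actual interface.

```python
# Minimal sketch of human-directed skill chaining. SkillRouter and the
# policy callables are illustrative stand-ins, not 1X's real API.

class SkillRouter:
    """Routes natural language commands to small single-task policies."""

    def __init__(self, policies):
        # policies: dict mapping a command string to a policy callable
        self.policies = policies
        self.active = None

    def command(self, text):
        # A human operator issues a high-level command; we switch skills.
        if text not in self.policies:
            raise KeyError(f"no skill registered for command: {text!r}")
        self.active = self.policies[text]

    def step(self, observation):
        # The currently active low-level policy produces the next action.
        if self.active is None:
            raise RuntimeError("no skill selected yet")
        return self.active(observation)


# Usage: chain two short-horizon skills under human direction.
router = SkillRouter({
    "pick up the cup": lambda obs: "grasp_action",
    "place it on the shelf": lambda obs: "place_action",
})
router.command("pick up the cup")
print(router.step({}))  # grasp_action
router.command("place it on the shelf")
print(router.step({}))  # place_action
```

The key property is that switching skills is a cheap dictionary lookup, so adding or retraining one skill never touches the others.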

Even though humans can do long-horizon chores trivially, chaining multiple autonomous robot skills in a sequence is hard because the second skill has to generalize to all of the somewhat random starting positions the robot finds itself in when the first skill finishes. This compounds with every successive skill: the third skill has to handle the variation in outcomes of the second skill, and so on.
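A back-of-envelope calculation (mine, not from the post) shows why this compounding bites: if each skill succeeds with probability p from the start states it actually encounters, and failures compound independently, an n-skill chain succeeds with only p**n.

```python
# Illustrative compounding of per-skill success rates over a chain.
# Assumes independence between skill failures, which is a simplification.

def chain_success(p: float, n: int) -> float:
    """Probability an n-skill chain succeeds if each skill succeeds with p."""
    return p ** n

print(round(chain_success(0.95, 1), 3))   # 0.95
print(round(chain_success(0.95, 5), 3))   # 0.774
print(round(chain_success(0.95, 10), 3))  # 0.599
```

Even a 95%-reliable skill drops below 60% success once ten are chained, which is why each skill must be robust to the distribution of states its predecessors leave behind.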

From the user's perspective, the robot is capable of doing many natural language tasks, and the exact number of models controlling the robot is abstracted away. This allows us to merge the single-task models into goal-conditioned models over time. Single-task models also provide a strong baseline for shadow mode evaluations: comparing how a new model's predictions differ from an existing baseline at test time. Once the goal-conditioned model matches the single-task models' predictions well, we can switch over to the more powerful, unified model without any change to the user workflow.
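A shadow mode evaluation like the one described could look like the following sketch, assuming both models expose a predict-style callable from observations to action vectors (an assumption for illustration; the actual interfaces are not specified in the post). Only the baseline's actions drive the robot; the candidate runs "in the shadow" and we log the disagreement.

```python
# Hedged sketch of a shadow-mode evaluation. The policy interfaces and
# the L2 disagreement metric are illustrative choices, not 1X's method.
import math

def shadow_eval(baseline, candidate, observations):
    """Mean Euclidean distance between candidate and baseline actions."""
    total = 0.0
    for obs in observations:
        a = baseline(obs)   # action actually sent to the robot
        b = candidate(obs)  # shadow prediction, logged but never executed
        total += math.dist(a, b)
    return total / len(observations)


# Toy usage with stand-in policies producing 2-D action vectors.
baseline = lambda obs: [obs, 0.0]
candidate = lambda obs: [obs, 0.1]
print(round(shadow_eval(baseline, candidate, [0.0, 1.0, 2.0]), 3))  # 0.1
```

When this disagreement stays low across the task distribution, the unified model can safely replace the per-task baselines.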

Directing robots with this high-level language interface offers a new user experience for data collection. Instead of using VR to control a single robot, an operator can direct multiple robots with high-level language and let the low-level policies execute the low-level actions that achieve those high-level goals. Because high-level actions are sent infrequently, operators can also control robots remotely, as shown below:


Note that the above video is not fully autonomous; humans are dictating when the robots should switch tasks. Naturally, the next step after building a dataset of vision-to-natural-language command pairs is to automate the prediction of high-level actions using vision-language models like GPT-4o, VILA, and Gemini Vision.

Stay tuned!
Eric Jang
