Meta V-JEPA 2 world model uses raw video to train robots

Meta today presented V-JEPA 2, a 1.2-billion-parameter world model trained primarily on video to support understanding, prediction, and planning in robotic systems. Built on the Joint Embedding Predictive Architecture (JEPA), the model is designed to help robots and other “AI agents” navigate unfamiliar environments and tasks with minimal domain-specific training.

V-JEPA 2 follows a two-stage training process, all without additional human annotation. In the first, self-supervised stage, the model learns from over 1 million hours of video and 1 million images, capturing patterns of physical interaction. The second stage introduces action-conditioned learning using a small set of robot control data (about 62 hours), enabling the model to account for agent actions when predicting outcomes. This makes the model usable for planning and closed-loop control tasks.
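
In code terms, the second stage amounts to learning a predictor that maps a (current-state embedding, action) pair to a next-state embedding. The PyTorch sketch below illustrates that idea only; the `ActionConditionedPredictor` module, its dimensions, and the MSE objective are simplifying assumptions for illustration, not V-JEPA 2's actual transformer-based predictor.

```python
import torch
import torch.nn as nn

class ActionConditionedPredictor(nn.Module):
    """Toy stand-in (hypothetical) for the action-conditioned second stage:
    given the embedding of the current observation and a robot action,
    predict the embedding of the next observation."""

    def __init__(self, feat_dim=1024, action_dim=7, hidden_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + action_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, state_emb, action):
        # Condition the prediction on the agent's action by concatenation.
        return self.net(torch.cat([state_emb, action], dim=-1))

# One training step: regress the predicted next-state embedding onto the
# encoder's embedding of the frame that actually followed (dummy tensors here).
predictor = ActionConditionedPredictor()
state = torch.randn(8, 1024)       # batch of current-frame embeddings
action = torch.randn(8, 7)         # batch of 7-DoF robot actions
next_state = torch.randn(8, 1024)  # embeddings of the observed next frames
loss = nn.functional.mse_loss(predictor(state, action), next_state)
loss.backward()
```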

Meta said it has already tested the new model on robots in its labs. Meta reports that V-JEPA 2 performs well on common robotic tasks like pick-and-place, using vision-based goal representations. For simpler tasks such as pick and place, the system generates candidate actions and evaluates them based on predicted outcomes. For harder tasks, such as picking up an object and placing it in the right spot, V-JEPA 2 uses a sequence of visual subgoals to guide behavior.
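
That candidate-and-evaluate loop is essentially model-predictive control in embedding space: sample actions, imagine each outcome with the predictor, and execute whichever action brings the predicted state closest to the goal. Below is a minimal random-shooting sketch reusing the hypothetical predictor from above; Meta's actual planner and action-sampling strategy may differ.

```python
import torch

def plan_action(predictor, state_emb, goal_emb, num_candidates=256, action_dim=7):
    """Random-shooting planner (hypothetical helper): pick the candidate action
    whose predicted outcome lands nearest the vision-derived goal embedding."""
    candidates = torch.randn(num_candidates, action_dim)  # sample candidate actions
    states = state_emb.expand(num_candidates, -1)         # repeat current state per candidate
    with torch.no_grad():
        imagined = predictor(states, candidates)          # predicted next-state embeddings
        scores = (imagined - goal_emb).norm(dim=-1)       # distance to the goal
    return candidates[scores.argmin()]                    # best-scoring action

# Closed-loop use: re-plan from the new observation after each executed action.
# For longer-horizon tasks, step through a sequence of subgoal embeddings
# instead of a single goal, as the article describes.
```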

In internal evaluations, Meta said the model showed promising ability to generalize to new objects and settings, with success rates ranging from 65% to 80% on pick-and-place tasks in previously unseen environments.

“We believe world models will usher in a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” said Meta’s chief AI scientist Yann LeCun.

Although V-JEPA 2 improves on previous models, Meta AI said there remains a notable gap between model and human performance on these benchmarks. Meta suggests this points to the need for models that can operate across multiple timescales and modalities, for example by incorporating audio or tactile information.

To measure progress in physical understanding from video, Meta is also releasing the following three benchmarks:

  • IntPhys 2: evaluates the model’s ability to distinguish between physically plausible and implausible scenarios.
  • MVPBench: tests whether models rely on genuine understanding rather than dataset shortcuts in video question-answering.
  • CausalVQA: probes reasoning about cause and effect, anticipation, and counterfactuals.

The V-JEPA 2 code and model checkpoints are available for commercial and research use, with Meta aiming to encourage broader exploration of world models in robotics and embodied AI.

Meta joins other tech leaders in developing world models of their own. Google DeepMind has been working on its own version, Genie, which can simulate entire 3D environments. And World Labs, a startup founded by Fei-Fei Li, raised $230 million to build large world models.
