Learning English with TED Talks: The next grand challenge for AI by Jim Fan

The next grand challenge for AI

Link: https://www.ted.com/talks/jim_fan_the_next_grand_challenge_for_ai?

Speaker: Jim Fan

Date: October 2023

Contents

  • The next grand challenge for AI
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • Postscript

Introduction

Researcher Jim Fan presents the next grand challenge in the quest for AI: the “foundation agent,” which would seamlessly operate across both the virtual and physical worlds. He explains how this technology could fundamentally change our lives, permeating everything from video games and metaverses to drones and humanoid robots, and explores how a single model could master skills across these different realities.

Vocabulary

grand challenge: a major, ambitious challenge

permeate: US [ˈpɜːrmieɪt] to spread through; to pervade

in the quest for: in pursuit of; in search of

It is still early days in the quest for global power of the Chinese carmaker.

www.24en.com

In the quest for self-expression, composers were imbued with a concern for detail.

www.tdict.com

Perhaps his greatest joy is in the chance encounters during the quest for the perfect image.

dict.youdao.com

She does gymnastic exercises four times a week in the quest for achieving the perfect figure.

wk.baidu.com

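humanoid: UK [ˈhjuːmənɔɪd] human-like; having a human form

humanoid robot: a robot with a human-like body

metaverse: a shared, immersive virtual world

adrenaline: US [əˈdrenəlɪn] a hormone released under stress or excitement; figuratively, a rush of excitement

I still remember the adrenaline of seeing history unfold that day.

embodiment: US [ɪmˈbɑːdimənt] a concrete form; an incarnation; a tangible expression of an idea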

It was in Germany alone that his hope seemed capable of embodiment.

Oxford Dictionary

Kofi is the embodiment of possibility.

www.kekenet.com

But it is the perfect embodiment of modern styles.

bbs.chinadaily.com.cn

A circle was the embodiment of his concept of life.

dict.engbus.cn

terrain: US [təˈreɪn] the physical features of a stretch of land; ground

tree of skills: skill tree (a branching set of abilities that unlock progressively)

It can explore the terrains, mine all kinds of materials, fight monsters, craft hundreds of recipes, and unlock an ever-expanding tree of skills.

indefinitely: US [ɪnˈdefɪnətli] for an unlimited or unspecified period

How does Voyager keep exploring indefinitely?

kinematic: US [ˌkɪnəˈmætɪk] relating to kinematics, the study of motion

MetaMorph is able to handle extremely varied kinematic characteristics from different robot bodies.

envision: US [ɪnˈvɪʒn] to imagine; to picture as a future possibility

take a big stride: to take a big step forward; to make major progress

The speaker envisions that MetaMorph 2.0 will be able to generalize to robot hands, humanoids, dogs, drones, and even beyond. Compared to Voyager, MetaMorph takes a big stride towards multi-body control.

uncanny: strange; mysterious; eerie

And this car racing scene is where simulation has crossed the uncanny valley.

hardware accelerated ray tracing: ray tracing sped up by dedicated graphics hardware

render extremely complex scenes: to draw highly complex scenes on screen

photorealism: US [ˌfoʊdoʊˈri(ə)lɪzəm] photographic realism; a style that reproduces scenes as faithfully as a photograph

Thanks to hardware accelerated ray tracing, we’re able to render extremely complex scenes with breathtaking levels of details. And this photorealism you see here will help us train computer vision models that will become the eyes of every AI agent.

be it xxx, or xxx: whether it is xxx or xxx

All language tasks can be expressed as text in and text out. Be it writing poetry, translating English to Spanish, or coding Python, it’s all the same.

Transcript

In spring of 2016,

I was sitting in a classroom
at Columbia University

but wasn’t paying attention
to the lecture.

Instead, I was watching a board game
tournament on my laptop.

And it wasn’t just any tournament,
but a very, very special one.

The match was between AlphaGo
and Lee Sedol.

The AI had just won three
out of five games

and became the first ever to beat
a human champion at a game of Go.

I still remember the adrenaline
of seeing history unfold that day.

The [glorious] moment when AI agents
finally entered the mainstream.

But when the excitement fades,

I realized that as mighty as AlphaGo was,

it could only do one thing
and one thing alone.

It isn’t able to play any other games,
like Super Mario or Minecraft,

and it certainly cannot do dirty laundry

or cook a nice dinner for you tonight.

But what we truly want
are AI agents as versatile as Wall-E,

as diverse as all the robot body forms

or embodiments in Star Wars

and works across infinite realities,

virtual or physical,
as in Ready Player One.

So how can we achieve
these science fictions

in possibly the near future?

This is a practitioner’s guide
towards generally capable AI agents.

Most of the ongoing research efforts
can be laid out nicely across three axes:

the number of skills an agent can do;

the body forms or embodiments
it can control;

and the realities it can master.

AlphaGo is somewhere here,

but the upper right corner
is where we need to go.

So let’s take it one axis at a time.

Earlier this year,
I led the Voyager project,

which is an agent that scales up massively
on a number of skills.

And there’s no game better than Minecraft

for the infinite creative
things it supports.

And here’s a fun fact for all of you.

Minecraft has 140 million active players.

And just to put that number
in perspective,

it’s more than twice
the population of the UK.

And Minecraft is so insanely popular
because it’s open-ended:

it does not have a fixed storyline
for you to follow,

and you can do whatever
your heart desires in the game.

And when we set Voyager free in Minecraft,

we see that it’s able to play
the game for hours on end

without any human intervention.

The video here shows snippets

from a single episode of Voyager
where it just keeps going.

It can explore the terrains,

mine all kinds of materials,
fight monsters,

craft hundreds of recipes

and unlock an ever-expanding
tree of skills.

So what’s the magic?

The core insight is coding as action.

First, we convert the 3D world
into a textual representation

using a Minecraft JavaScript API
made by the enthusiastic community.

Voyager invokes GPT4 to write
code snippets in JavaScript

that become executable skills in the game.

Yet, just like human engineers,
Voyager makes mistakes.

It isn’t always able to get a program
correct on the first try.

So we add a self-reflection
mechanism for it to improve.

There are three sources of feedback
for the self-reflection:

the JavaScript code execution error;

the agent state, like health and hunger;

and a world state, like terrains
and enemies nearby.

So Voyager takes an action,

observes the consequences of its action
on the world and on itself,

reflects on how it can possibly do better,

[tries] out some new action plans
and rinse and repeat.

And once the skill becomes mature,

Voyager saves it to a skill library
as a persistent memory.

You can think of the skill library
as a code repository

written entirely by a language model.

And in this way,

Voyager is able to bootstrap
its own capabilities recursively

as it explores
and experiments in Minecraft.

So let’s work through an example together.

Voyager finds itself hungry

and needs to get food as soon as possible.

It senses four entities nearby:

a cat, a villager, a pig
and some wheat seeds.

Voyager starts an inner monologue.

"Do I kill the cat or villager for food?

Horrible idea.

How about a wheat seed?

I can grow a farm out of the seeds,

but that’s going to take a long time.

So sorry, piggy, you are the chosen one."

(Laughter)

And Voyager finds a piece
of iron in its inventory.

So it recalls an old skill
from the library to craft an iron sword

and starts to learn
a new skill called “hunt pig.”

And now we also know that, unfortunately,
Voyager isn’t vegetarian.

(Laughter)

One question still remains:

how does Voyager keep
exploring indefinitely?

We only give it a high-level directive,

that is, to obtain as many
unique items as possible.

And Voyager implements a curriculum
to find progressively harder

and more novel challenges
to solve all by itself.

And putting all of these together,

Voyager is able to not only master

but also discover new skills
along the way.

And we did not pre-program any of this.

It’s all Voyager’s idea.

And this, what you see here,
is what we call lifelong learning.

When an agent is forever curious
and forever pursuing new adventures.

Compared to AlphaGo,

Voyager scales up massively
on a number of things he can do,

but still controls only one
body in Minecraft.

So the question is:
can we have an algorithm

that works across many different bodies?

Enter MetaMorph.

It is an initiative
I co-developed at Stanford.

We created a foundation model
that can control not just one

but thousands of robots

with very different
arm and leg configurations.

MetaMorph is able to handle extremely
varied kinematic characteristics

from different robot bodies.

And this is the intuition
on how we create a MetaMorph.

First, we design a special vocabulary

to describe the body parts

so that every robot body
is basically a sentence

written in the language
of this vocabulary.

And then we just apply
a transformer to it,

much like ChatGPT,

but instead of writing out text,
MetaMorph writes out motor controls.

We show that MetaMorph is able to control
thousands of robots to go upstairs,

cross difficult terrains
and avoid obstacles.

Extrapolating into the future,

if we can greatly expand
this robot vocabulary,

I envision MetaMorph 2.0 will be able
to generalize to robot hands, humanoids,

dogs, drones and even beyond.

Compared to Voyager,

MetaMorph takes a big stride
towards multi-body control.

And now, let’s take everything
one level further

and transfer the skills
and embodiments across realities.

Enter IsaacSim,
Nvidia’s simulation effort.

The biggest strength of IsaacSim
is to accelerate physics simulation

to 1,000x faster than real time.

For example,

this character here learns
some impressive martial arts

by going through ten years
of intense training

in only three days of simulation time.

So it’s very much like the virtual
sparring dojo in the movie “Matrix.”

And this car racing scene

is where simulation has crossed
the uncanny valley.

Thanks to hardware
accelerated ray tracing,

we’re able to render
extremely complex scenes

with breathtaking levels of details.

And this photorealism you see here
will help us train computer vision models

that will become the eyes
of every AI agent.

And what’s more, IsaacSim
can procedurally generate worlds

with infinite variations
so that no two look the same.

So here’s an interesting idea.

If an agent is able to master
10,000 simulations,

then it may very well just generalize
to our real physical world,

which is simply the 10,001st reality.

And let that sink in.

As we progress through this map,

we will eventually get
to the upper right corner,

which is a single agent that generalizes
across all three axes,

and that is the “Foundation Agent.”

I believe training Foundation Agent
will be very similar to ChatGPT.

All language tasks can be expressed
as text in and text out.

Be it writing poetry,

translating English to Spanish
or coding Python,

it’s all the same.

And ChatGPT simply scales this up
massively across lots and lots of data.

It’s the same principle.

The Foundation Agent takes as input
an embodiment prompt and a task prompt

and output actions,

and we train it by simply
scaling it up massively

across lots and lots of realities.

I believe in a future where everything
that moves will eventually be autonomous.

And one day we will realize
that all the AI agents,

across Wall-E, Star Wars,
Ready Player One,

no matter if they are
in the physical or virtual spaces,

will all just be different prompts
to the same Foundation Agent.

And that, my friends,

will be the next grand challenge
in our quest for AI.

(Applause)

Summary

In spring of 2016, the speaker was sitting in a classroom at Columbia University but wasn’t paying attention to the lecture. Instead, he was watching a board game tournament on his laptop: the match between AlphaGo and Lee Sedol. AlphaGo had just won three out of five games, becoming the first AI ever to beat a human champion at Go. The speaker still remembers the adrenaline of watching history unfold, the moment AI agents finally entered the mainstream.

After the excitement faded, the speaker realized that as mighty as AlphaGo was, it could only play Go and nothing else. What we truly want are AI agents as versatile as Wall-E, as diverse as the robot embodiments in Star Wars, and able to work across infinite realities, virtual or physical, as in Ready Player One. Most ongoing research can be laid out along three axes: the number of skills an agent can perform, the body forms or embodiments it can control, and the realities it can master.

Taking one axis at a time, progress is being made. The Voyager project, which he led, demonstrated an agent’s ability to scale up massively in the number of skills it can perform, using Minecraft as a platform for its diverse actions. Voyager’s core insight is “coding as action,” where it converts the 3D world into a textual representation, uses GPT-4 to write executable skills in JavaScript, and employs self-reflection mechanisms for improvement.
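The act, observe, reflect, and save cycle described above can be sketched in a few lines. This is only a toy illustration in Python, not Voyager's actual code: the talk says Voyager uses GPT-4 to write JavaScript against a community-made Minecraft API, whereas here `stub_writer` is a hypothetical stand-in for the language model, the "world" is a plain dictionary, and the names `run_skill` and `learn_skill` are invented for this example.

```python
import copy
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, list]


def run_skill(source: str, state: State) -> Tuple[bool, str]:
    """Run candidate code on a scratch copy of the state; commit only on success."""
    scratch = copy.deepcopy(state)
    try:
        exec(source, {}, {"state": scratch})  # the generated code is the action
    except Exception as exc:  # an execution error becomes feedback
        return False, f"{type(exc).__name__}: {exc}"
    state.clear()
    state.update(scratch)
    return True, "ok"


def stub_writer(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the language model.

    The first draft is deliberately buggy; the revision written after
    seeing feedback is correct.
    """
    if feedback is None:
        return "state['inventory'].append(sword)"  # bug: `sword` is undefined
    return "state['inventory'].append('iron_sword')"


def learn_skill(
    task: str,
    state: State,
    library: Dict[str, str],
    writer: Callable[[str, Optional[str]], str],
    max_tries: int = 3,
) -> Optional[int]:
    """Return the number of drafts needed (0 if recalled), or None on failure."""
    if task in library:  # recall a mature skill instead of rewriting it
        ok, _ = run_skill(library[task], state)
        return 0 if ok else None
    feedback = None
    for attempt in range(1, max_tries + 1):
        source = writer(task, feedback)
        ok, message = run_skill(source, state)
        if ok:
            library[task] = source  # persist the working code as a skill
            return attempt
        feedback = message  # reflect on the error in the next draft
    return None


state = {"inventory": ["iron"]}
library: Dict[str, str] = {}
attempts = learn_skill("craft iron sword", state, library, stub_writer)
```

In this run the first draft fails with a `NameError`, the error text is fed back to the writer, and the second draft succeeds, so `attempts` is 2 and the working code is stored in `library` for later reuse.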

MetaMorph, an initiative the speaker co-developed at Stanford, is a foundation model that controls thousands of robots with very different arm and leg configurations. It describes each robot body as a sentence in a special vocabulary of body parts and applies a transformer that writes out motor controls instead of text, a big stride towards multi-body control. IsaacSim, Nvidia’s simulation effort, accelerates physics simulation to 1,000x faster than real time, enabling rapid skill acquisition in virtual environments and helping bridge the gap between virtual and physical realities.
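The talk's simulation figures can be cross-checked with simple arithmetic. The talk gives a speed-up of 1,000x over real time and an example of ten years of training completed in three days of simulation time. The sketch below assumes 365.25 days per year and continuous, uninterrupted simulation; the function names are made up for illustration.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def implied_speedup(simulated_years: float, wall_clock_days: float) -> float:
    """Speed-up factor implied by squeezing simulated time into wall-clock time."""
    return simulated_years * DAYS_PER_YEAR / wall_clock_days


def wall_clock_days(simulated_years: float, speedup: float) -> float:
    """Wall-clock days needed to simulate the given number of years."""
    return simulated_years * DAYS_PER_YEAR / speedup


ratio = implied_speedup(10, 3)        # ten years in three days
days_at_1000x = wall_clock_days(10, 1000)  # ten years at exactly 1,000x
```

Under these assumptions, ten years in three days works out to a factor of 1217.5, and ten years at exactly 1,000x would take about 3.65 days, so the two figures in the talk agree to within rounding.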

The ultimate goal is to develop a Foundation Agent that can generalize across all three axes, mastering diverse skills, controlling various bodies, and understanding multiple realities. This agent, trained on massive amounts of data and across numerous realities, represents the next grand challenge in the quest for AI.

Postscript

Written in Shanghai on April 11, 2024, at 20:19.
