Learning English with TED Talks: A new way to build AI, openly, by Percy Liang

A new way to build AI, openly

Link: https://www.ted.com/talks/percy_liang_a_new_way_to_build_ai_openly?

Speaker: Percy Liang

Date: October 2023

Contents

  • A new way to build AI, openly
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • Postscript

Introduction

Today’s AI is trained on the work of artists and writers without attribution, its core values decided by a privileged few. What if the future of AI was more open and democratic? Researcher Percy Liang offers a vision of a transparent, participatory future for emerging technology, one that credits contributors and gives everyone a voice.

Vocabulary

participatory: US [pɑːrˈtɪsəpətɔːri] open to participation by all; involving participation

core value: a fundamental guiding principle

intrigue: US [ɪnˈtriːɡ] to arouse someone's curiosity; (also) to plot or scheme

I was intrigued, I wanted to understand it, I wanted to see how far we could go with this.

enter the mainstream: to become widely adopted; to go mainstream

Language models and more generally, foundation models, have taken off and entered the mainstream.

ensemble: US [ɑːnˈsɑːmbl] a group of musicians or performers, e.g. a jazz ensemble; note the pronunciation

It was like a jazz ensemble where everyone was riffing off of each other, developing the technology that we have today.

not released openly: not open-sourced

recipe: US [ˈresəpi] a set of cooking instructions; a method; note the pronunciation

And then today, the most advanced foundation models in the world are not released openly. They are instead guarded closely behind black box APIs with little to no information about how they’re built. So it’s like we have these castles which house the world’s most advanced AIs and the secret recipes for creating them.

asymmetry: lack of symmetry; imbalance

stark: sharply evident; severe

but the resource and information asymmetry is stark.

opacity: US [oʊˈpæsəti] the quality of being opaque; obscurity

This opacity and centralization of power is concerning.

tenet: US [ˈtenɪt] a principle or doctrine

The most basic tenet of machine learning is that the training data and the test data have to be independent for evaluation to be meaningful. So if we don’t know what’s in the training data, then that 95 percent number is meaningless.
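To make this concrete, here is a minimal Python sketch (using toy, hypothetical question data) of why that tenet matters: a model that merely memorizes its training set looks perfect when the test set leaks out of the training data, and scores zero on genuinely independent examples, so an accuracy figure means little unless we know the test items were not in the training set.

```python
# Toy illustration: a "model" that just memorizes (question, label) pairs.
def train(examples):
    return dict(examples)  # memorize every training pair

def accuracy(model, test_set):
    correct = sum(1 for q, label in test_set if model.get(q) == label)
    return correct / len(test_set)

# Hypothetical data: 100 training questions with binary labels.
train_set = [(f"q{i}", i % 2) for i in range(100)]
leaked_test = train_set[:20]                          # contaminated: copied from training data
fresh_test = [(f"new{i}", i % 2) for i in range(20)]  # independent questions

model = train(train_set)
print(accuracy(model, leaked_test))  # 1.0: inflated and meaningless
print(accuracy(model, fresh_test))   # 0.0: the memorizer generalizes nothing
```

The same logic is why a closed model's 95 percent on a benchmark tells us little: without seeing the training data, we cannot rule out the leaked-test case.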

fly blind: to proceed without the information needed to act safely, as in “we are flying blind.”

accountability: responsibility; the obligation to answer for one’s actions

And with all the enthusiasm for deploying these models in the real world without meaningful evaluation, we are flying blind. And transparency isn’t just about the training data or evaluation. It’s also about environmental impact, labor practices, release processes, risk mitigation strategies. Without transparency, we lose accountability.

affirmative action

Affirmative action (also sometimes called reservations, alternative access, positive discrimination or positive action in various countries’ laws and policies) refers to a set of policies and practices within a government or organization seeking to benefit marginalized groups. Historically and internationally, support for affirmative action has been justified by the idea that it may help with bridging inequalities in employment and pay, increasing access to education, and promoting diversity, social equity and redressing alleged wrongs, harms, or hindrances, also called substantive equality.

subjective, controversial, contested questions

These are highly subjective, controversial, contested questions, and any decision on how to answer them is necessarily value-laden.

without attribution or consent: without credit or permission

The data here is a result of human labor, and currently this data is being scraped, often without attribution or consent.

status quo: US [ˌsteɪtəs ˈkwoʊ] the current state of affairs

So how can we change the status quo?

bleak: US [bliːk] grim; dreary

the situation seems pretty bleak: things look rather grim

With these castles, the situation might seem pretty bleak. But let me try to give you some hope.

encyclopedia: US [ɪnˌsaɪkləˈpiːdiə] a comprehensive reference work; note the pronunciation

against all odds: despite great difficulty

But against all odds, Wikipedia prevailed.

hobbyist: US [ˈhɑbiɪst] a person who pursues an activity for pleasure in their spare time

peer production: production by self-organizing communities; mass collaboration

Peer production (also known as mass collaboration) is a way of producing goods and services that relies on self-organizing communities of individuals. In such communities, the labor of many people is coordinated towards a shared outcome.

embark on: to begin or set about (an undertaking)

I feel the same excitement about this vision as I did 19 years ago as that master’s student, embarking on his first NLP research project.

Transcript

I was a young master’s student

about to start my first
NLP research project,

and my task was to train a language model.

Now that language model was a little bit
smaller than the ones we have today.

It was trained on millions
rather than trillions of words.

I used a hidden Markov model
as opposed to a transformer,

but that little language model I trained

did something I thought was amazing.

It took all this raw text

and somehow it organized it into concepts.

A concept for months,

male first names,

words related to the law,

countries and continents and so on.

But no one taught
these concepts to this model.

It discovered them all by itself,
just by analyzing the raw text.

But how?

I was intrigued,
I wanted to understand it,

I wanted to see how far
we could go with this.

So I became an AI researcher.

In the last 19 years,

we have come a long way
as a research community.

Language models and more generally,
foundation models, have taken off

and entered the mainstream.

But, it is important to realize
that all of these achievements

are based on decades of research.

Research on model architectures,

research on optimization algorithms,
training objectives, data sets.

For a while,

we had an incredible free culture,

a culture of open innovation,

a culture where researchers published,

researchers released data sets, code,

so that others can go further.

It was like a jazz ensemble where everyone
was riffing off of each other,

developing the technology
that we have today.

But then in 2020,

things started changing.

Innovation became less open.

And then today, the most advanced
foundation models in the world

are not released openly.

They are instead guarded closely
behind black box APIs

with little to no information
about how they’re built.

So it’s like we have these castles

which house the world’s most advanced AIs

and the secret recipes for creating them.

Meanwhile, the open community
still continues to innovate,

but the resource and information
asymmetry is stark.

This opacity and centralization
of power is concerning.

Let me give you three reasons why.

First, transparency.

With closed foundation models,
we lose the ability to see,

to evaluate, to audit these models

which are going to impact
billions of people.

Say we evaluate a model through an API
on medical question answering

and it gets 95 percent accuracy.

What does that 95 percent mean?

The most basic tenet of machine learning

is that the training data
and the test data

have to be independent
for evaluation to be meaningful.

So if we don’t know
what’s in the training data,

then that 95 percent
number is meaningless.

And with all the enthusiasm
for deploying these models

in the real world
without meaningful evaluation,

we are flying blind.

And transparency isn’t just
about the training data or evaluation.

It’s also about environmental impact,

labor practices, release processes,

risk mitigation strategies.

Without transparency,
we lose accountability.

It’s like not having nutrition labels
on the food you eat,

or not having safety ratings
on the cars you drive.

Fortunately, the food and auto industries
have matured over time,

but AI still has a long way to go.

Second, values.

So model developers like to talk
about aligning foundation models

to human values,
which sounds wonderful.

But whose values
are we talking about here?

If we were just building a model
to answer math questions,

maybe we wouldn’t care,

because as long as the model
produces the right answer,

we would be happy,
just as we’re happy with calculators.

But these models are not calculators.

These models will attempt to answer
any question you throw at them.

Who is the best basketball
player of all time?

Should we build nuclear reactors?

What do you think of affirmative action?

These are highly subjective,
controversial, contested questions,

and any decision on how to answer them
is necessarily value-laden.

And currently, these values
are unilaterally decided

by the rulers of the castles.

So can we imagine
a more democratic process

for determining these values
based on the input from everybody?

So foundation models will be the primary
way that we interact with information.

And so determining these values
and how we set them

will have a sweeping impact

on how we see the world and how we think.

Third, attribution.

So why are these foundation
models so powerful?

It’s because they’re trained
on massive amounts of data.

See what machine-learning
researchers call data

is what artists call art

or writers call books

or programmers call software.

The data here is a result of human labor,

and currently this data is being scraped,

often without attribution or consent.

So understandably, some people are upset,

filing lawsuits, going on strike.

But this is just an indication
that the incentive system is broken.

And in order to fix it,
we need to center the creators.

We need to figure out
how to compensate them

for the value of the content
they produced,

and how to incentivize them
to continue innovating.

Figuring this out
will be critical to sustaining

the long term development of AI.

So here we are.

We don’t have transparency
about how the models are being built.

We have to live with fixed values
set by the rulers of the castles,

and we have no means of attributing

the creators who make
foundation models possible.

So how can we change the status quo?

With these castles,

the situation might seem pretty bleak.

But let me try to give you some hope.

In 2001,

Encyclopedia Britannica was a castle.

Wikipedia was an open experiment.

It was a website
where anyone could edit it,

and all the resulting knowledge
would be made freely available

to everyone on the planet.

It was a radical idea.

In fact, it was a ridiculous idea.

But against all odds, Wikipedia prevailed.

In the '90s, Microsoft
Windows was a castle.

Linux was an open experiment.

Anyone could read its source code,
anyone could contribute.

And over the last two decades,

Linux went from being a hobbyist toy

to the dominant operating system
on mobile and in the data center.

So let us not underestimate
the power of open source

and peer production.

These examples show us a different way
that the world could work.

A world in which everyone can participate

and development is transparent.

So how can we do the same for AI?

Let me end with a picture.

The world is filled
with incredible people:

artists, musicians, writers, scientists.

Each person has unique skills,
knowledge and values.

Collectively, this defines
the culture of our civilization.

And the purpose of AI, as I see it,

should be to organize
and augment this culture.

So we need to enable people to create,
to invent, to discover.

And we want everyone to have a voice.

The research community has focused
so much on the technical progress

that is necessary to build these models,

because for so long,
that was the bottleneck.

But now we need to consider
the social context

in which these models are built.

Instead of castles,

let us imagine a more transparent
and participatory process for building AI.

I feel the same excitement
about this vision

as I did 19 years ago
as that master’s student,

embarking on his first
NLP research project.

But realizing this vision will be hard.

It will require innovation.

It will require participation
of researchers, companies, policymakers,

and all of you

to not accept the status quo as inevitable

and demand a more participatory
and transparent future for AI.

Thank you.

(Applause)

Summary

The speaker recounts his journey from a young master’s student working on his first NLP research project in 2004 to becoming an AI researcher. He highlights the significant advancements made by the research community over the last 19 years, particularly in language and foundation models. However, he expresses concern about the recent trend toward less open innovation, with advanced models now hidden behind closed APIs. This shift raises issues of transparency, values, and attribution in AI development.

The speaker emphasizes the importance of transparency in evaluating and auditing models, as well as the need to consider whose values are embedded in these models. He also discusses the lack of attribution and consent in the data used to train these models, calling attention to the broken incentive system in AI development.

To address these challenges, the speaker advocates for a more open and participatory approach to AI development, citing the success of projects like Wikipedia and Linux. He believes that by embracing open source and peer production principles, the AI community can create a more transparent and inclusive future for AI development.


Postscript

Written in Shanghai at 19:17 on April 10, 2024.
