Large Language Models: A Paper Roundup

Paper 1: ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

Overview

In November 2022, OpenAI released ChatGPT, an event that caused a sensation in the AI community and beyond. For the first time, an application-facing AI chatbot could give helpful, safe, and useful answers, follow human instructions, and even admit and correct its earlier mistakes. As the first application of its kind, ChatGPT reached 100 million users within just two months of launch, far faster than other popular apps such as TikTok or YouTube. It has accordingly attracted enormous commercial investment, since it promises to lower labor costs, automate workflows, and even deliver new customer experiences.

ChatGPT's closed-source nature, however, raises several concerns. First, without insight into internals such as the pretraining and fine-tuning procedures, it is hard to properly assess its potential risks, especially since large models can generate harmful, unethical, or false content. Second, ChatGPT's performance has reportedly shifted over time, which hinders reproducible results. Third, ChatGPT has suffered repeated outages, including two major ones in November 2023 alone, during which the ChatGPT website and its API were inaccessible. Finally, enterprises adopting ChatGPT may worry about the high cost of API calls, service interruptions, data ownership and privacy, and other unpredictable events, such as the recent boardroom drama over CEO Sam Altman's firing and eventual return.

Open-source large models step into this gap, and the community has been pushing hard to keep high-performing models open. As of late 2023, however, the general view was still that open-source models such as Llama 2 or Falcon lag behind their closed-source counterparts, such as OpenAI's GPT-3.5 (ChatGPT) and GPT-4, Anthropic's Claude 2, or Google's Bard, with GPT-4 usually considered the strongest. Encouragingly, though, the gap keeps shrinking, and open-source models are catching up fast.

Link: https://arxiv.org/abs/2311.16989

More Interesting AI Agents

  • Generative Agents: Interactive Simulacra of Human Behavior https://arxiv.org/abs/2304.03442

  • RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models https://arxiv.org/abs/2310.00746

  • Role play with large language models https://www.nature.com/articles/s41586-023-06647-8

  • Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf https://arxiv.org/abs/2309.04658

  • MemGPT: Towards LLMs as Operating Systems https://arxiv.org/abs/2310.08560

  • Augmenting Language Models with Long-Term Memory https://arxiv.org/abs/2306.07174

  • Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models https://arxiv.org/pdf/2307.16180.pdf
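
As a toy illustration of the memory mechanisms explored in Generative Agents and MemGPT above, the sketch below scores stored memories by recency, importance, and relevance. The equal weighting, the per-second decay constant, and the keyword-overlap stand-in for embedding similarity are all simplifying assumptions, not the papers' actual formulations.

```python
import math
import time

class MemoryStream:
    """Toy memory stream in the spirit of Generative Agents:
    retrieve memories by recency + importance + relevance."""

    def __init__(self, decay=0.995):
        self.decay = decay        # per-second recency decay (assumed value)
        self.memories = []        # (timestamp, importance in [0, 1], text)

    def add(self, text, importance):
        self.memories.append((time.time(), importance, text))

    def retrieve(self, query, k=2):
        now = time.time()
        query_words = set(query.lower().split())

        def score(mem):
            ts, importance, text = mem
            recency = math.exp(-(now - ts) * (1 - self.decay))
            words = set(text.lower().split())
            # keyword overlap as a cheap stand-in for embedding similarity
            relevance = len(query_words & words) / max(len(query_words), 1)
            return recency + importance + relevance   # equal weights, assumed

        return sorted(self.memories, key=score, reverse=True)[:k]

stream = MemoryStream()
stream.add("Alice met Bob at the cafe", importance=0.6)
stream.add("The cafe closes at 9pm", importance=0.3)
stream.add("Alice is planning a birthday party", importance=0.9)
for _, _, text in stream.retrieve("what is alice planning"):
    print(text)
```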

More Useful AI Agents

  • The Rise and Potential of Large Language Model Based Agents: A Survey https://arxiv.org/abs/2309.07864

  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework https://arxiv.org/abs/2308.00352

  • Communicative Agents for Software Development https://arxiv.org/pdf/2307.07924.pdf

  • Large Language Models Can Self-Improve https://arxiv.org/abs/2210.11610

  • Evaluating Human-Language Model Interaction https://arxiv.org/abs/2212.09746

  • Large Language Models can Learn Rules https://arxiv.org/abs/2310.07064

  • AgentBench: Evaluating LLMs as Agents https://arxiv.org/abs/2308.03688

  • WebArena: A Realistic Web Environment for Building Autonomous Agents https://arxiv.org/abs/2307.13854

  • TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT https://arxiv.org/abs/2307.08674
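
MetaGPT and ChatDev (the "Communicative Agents" paper above) organize LLM workers into role pipelines where each role refines the previous artifact. Purely as a shape sketch, with a stub `call_llm` standing in for a real model call and the role prompts invented for illustration:

```python
# Toy role pipeline in the spirit of MetaGPT/ChatDev: each role turns the
# previous artifact into the next one. call_llm is a stand-in stub; a real
# system would call an actual model and add review/feedback loops.

def call_llm(system_prompt: str, user_input: str) -> str:
    return f"[{system_prompt}] output for: {user_input}"   # stub

ROLES = [
    ("Product Manager", "Write a one-paragraph PRD for the request."),
    ("Architect", "Design the module layout for the PRD."),
    ("Engineer", "Write code for the design."),
    ("QA", "Write test cases for the code."),
]

def run_pipeline(request: str) -> str:
    artifact = request
    for name, instruction in ROLES:
        artifact = call_llm(f"You are the {name}. {instruction}", artifact)
        print(f"--- {name} ---\n{artifact}\n")
    return artifact

run_pipeline("a CLI todo app")
```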

Task Planning and Decomposition

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/abs/2201.11903

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/abs/2305.10601

  • Implicit Chain of Thought Reasoning via Knowledge Distillation https://arxiv.org/abs/2311.01460

  • ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629

  • ART: Automatic multi-step reasoning and tool-use for large language models https://arxiv.org/abs/2303.09014

  • Branch-Solve-Merge Improves Large Language Model Evaluation and Generation https://arxiv.org/abs/2310.15123

  • WizardLM: Empowering Large Language Models to Follow Complex Instructions https://arxiv.org/pdf/2304.12244.pdf
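
The unifying idea of this section is making intermediate reasoning explicit. Chain-of-thought prompting, the first paper above, does this simply by including worked reasoning in the few-shot exemplars. A minimal sketch of such a prompt (the exemplar paraphrases the paper's classic tennis-ball example; the question is made up):

```python
# Few-shot chain-of-thought prompt in the spirit of Wei et al.: the exemplar
# shows intermediate reasoning, nudging the model to reason step by step
# before giving its final answer.

COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

question = "A baker makes 4 trays of 12 muffins and sells 30. How many are left?"
prompt = COT_PROMPT.format(question=question)
print(prompt)   # send this to any chat/completions endpoint
```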

Hallucination

  • Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models https://arxiv.org/pdf/2309.01219.pdf

  • Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback https://arxiv.org/abs/2302.12813

  • SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models https://arxiv.org/abs/2303.08896

  • WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus https://arxiv.org/abs/2304.04358
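
SelfCheckGPT (above) rests on a simple intuition: if a fact was hallucinated, independently sampled answers will tend to disagree about it. A toy version of that consistency check, using Jaccard word overlap in place of the paper's NLI/BERTScore/QA variants:

```python
# Toy hallucination check in the spirit of SelfCheckGPT: sample several
# answers to the same question and score how much they agree. Real
# implementations score per sentence with NLI or BERTScore; plain word
# overlap here is a simplifying assumption.

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)   # Jaccard similarity

def consistency(answer: str, samples: list[str]) -> float:
    return sum(overlap(answer, s) for s in samples) / len(samples)

answer = "Marie Curie won two Nobel Prizes, in Physics and Chemistry."
samples = [
    "Marie Curie won two Nobel Prizes: Physics (1903) and Chemistry (1911).",
    "She received Nobel Prizes in both Physics and Chemistry.",
    "Curie won the Nobel Prize in Physics and later in Chemistry.",
]
score = consistency(answer, samples)
print(f"consistency = {score:.2f}  (low values suggest hallucination)")
```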

Multimodal

  • Learning Transferable Visual Models From Natural Language Supervision (CLIP) https://arxiv.org/abs/2103.00020

  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT): https://arxiv.org/abs/2010.11929

  • MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning https://arxiv.org/abs/2310.09478

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models https://arxiv.org/abs/2304.10592

  • NExT-GPT: Any-to-Any Multimodal LLM https://arxiv.org/pdf/2309.05519.pdf

  • Visual Instruction Tuning (LLaVA) https://arxiv.org/pdf/2304.08485.pdf

  • Improved Baselines with Visual Instruction Tuning (LLaVA-1.5) https://arxiv.org/abs/2310.03744

  • Sequential Modeling Enables Scalable Learning for Large Vision Models (LVM) https://arxiv.org/pdf/2312.00785.pdf

  • CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation https://arxiv.org/pdf/2311.18775.pdf

  • Neural Discrete Representation Learning (VQ-VAE) https://browse.arxiv.org/pdf/1711.00937.pdf

  • Taming Transformers for High-Resolution Image Synthesis (VQ-GAN) https://arxiv.org/abs/2012.09841

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows https://arxiv.org/abs/2103.14030

  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models https://browse.arxiv.org/pdf/2301.12597.pdf

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning https://browse.arxiv.org/pdf/2305.06500.pdf

  • ImageBind: One Embedding Space To Bind Them All https://arxiv.org/abs/2305.05665

  • Meta-Transformer: A Unified Framework for Multimodal Learning https://arxiv.org/abs/2307.10802
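
Many of the models above (MiniGPT-4, LLaVA, ImageBind) build on CLIP's recipe: encode images and texts into one embedding space and rank pairs by cosine similarity. A numpy sketch, with random vectors standing in for real encoder outputs:

```python
import numpy as np

# CLIP-style matching: encode image and texts into a shared space, then rank
# texts by cosine similarity. Random vectors stand in for real encoder
# outputs here, so the ranking is meaningless except as a shape demo.

rng = np.random.default_rng(0)
dim = 512
image_emb = rng.normal(size=dim)
text_embs = rng.normal(size=(3, dim))     # 3 candidate captions
captions = ["a dog on grass", "a city at night", "a bowl of fruit"]

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

sims = normalize(text_embs) @ normalize(image_emb)     # cosine similarities
logits = sims * 100           # ~100 is roughly CLIP's learned logit scale
probs = np.exp(logits) / np.exp(logits).sum()          # softmax over captions
for cap, p in zip(captions, probs):
    print(f"{p:.3f}  {cap}")
```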

Image/Video Generation

  • High-Resolution Image Synthesis with Latent Diffusion Models https://arxiv.org/pdf/2112.10752.pdf

  • Structure and Content-Guided Video Synthesis with Diffusion Models (RunwayML Gen1) https://browse.arxiv.org/pdf/2302.03011.pdf

  • Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2) https://arxiv.org/pdf/2204.06125.pdf

  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning https://arxiv.org/abs/2307.04725

  • Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet) https://arxiv.org/abs/2302.05543

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis https://arxiv.org/abs/2307.01952

  • Zero-1-to-3: Zero-shot One Image to 3D Object https://arxiv.org/abs/2303.11328

  • Scaling Vision Transformers to 22 Billion Parameters https://arxiv.org/abs/2302.05442

  • Glow: Generative Flow with Invertible 1×1 Convolutions https://browse.arxiv.org/pdf/1807.03039.pdf

  • Language Model Beats Diffusion – Tokenizer is Key to Visual Generation https://arxiv.org/pdf/2310.05737.pdf

  • InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation https://arxiv.org/pdf/2309.06380.pdf

  • Perceptual Losses for Real-Time Style Transfer and Super-Resolution https://arxiv.org/pdf/1603.08155.pdf

  • CogView: Mastering Text-to-Image Generation via Transformers https://arxiv.org/abs/2105.13290

  • Diffusion Models for Video Prediction and Infilling https://arxiv.org/abs/2206.07696
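
The diffusion papers above all share the same forward process: data is progressively noised, and a network is trained to reverse it. The closed-form jump to step t, x_t = √(ᾱ_t)·x_0 + √(1 − ᾱ_t)·ε, in numpy (the linear beta schedule is one common choice, not the only one):

```python
import numpy as np

# Forward (noising) process shared by DDPM-style diffusion models:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

def q_sample(x0, t, rng):
    eps = rng.normal(size=x0.shape)       # fresh Gaussian noise
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))              # stand-in "image"
for t in (0, 250, 999):
    xt = q_sample(x0, t, rng)
    print(f"t={t:4d}  signal weight={np.sqrt(alphas_bar[t]):.3f}  "
          f"sample std={xt.std():.3f}")
```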

Speech Synthesis

  • Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (VITS) https://browse.arxiv.org/pdf/2106.06103.pdf

  • Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) https://arxiv.org/abs/2301.02111

  • Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling (VALL-E X) https://arxiv.org/pdf/2303.03926.pdf

  • MusicLM: Generating Music From Text https://arxiv.org/abs/2301.11325

LLM Fundamentals

  • Attention Is All You Need https://arxiv.org/abs/1706.03762

  • Sequence to Sequence Learning with Neural Networks https://arxiv.org/abs/1409.3215

  • Neural Machine Translation by Jointly Learning to Align and Translate https://arxiv.org/abs/1409.0473

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805

  • Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

  • Emergent Abilities of Large Language Models https://openreview.net/pdf?id=yzkSU5zdwD

  • Training Compute-Optimal Large Language Models (Chinchilla scaling law) https://arxiv.org/abs/2203.15556

  • Scaling Instruction-Finetuned Language Models https://arxiv.org/pdf/2210.11416.pdf

  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/pdf/2305.18290.pdf

  • Progress measures for grokking via mechanistic interpretability https://arxiv.org/abs/2301.05217

  • Language Models Represent Space and Time https://arxiv.org/abs/2310.02207

  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905

  • Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980

  • Efficient Estimation of Word Representations in Vector Space (Word2Vec) https://arxiv.org/abs/1301.3781

  • Distributed Representations of Words and Phrases and their Compositionality https://arxiv.org/abs/1310.4546
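
The common thread here is the Transformer, whose core operation is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A single-head numpy sketch:

```python
import numpy as np

# Scaled dot-product attention from "Attention Is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                         # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))   # 4 query positions, head dim 64
K = rng.normal(size=(6, 64))   # 6 key/value positions
V = rng.normal(size=(6, 64))
print(attention(Q, K, V).shape)   # (4, 64)
```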

GPT

  • Language Models are Few-Shot Learners (GPT-3) https://arxiv.org/abs/2005.14165

  • Language Models are Unsupervised Multitask Learners (GPT-2) https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

  • Improving Language Understanding by Generative Pre-Training (GPT-1) https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

  • Training language models to follow instructions with human feedback (InstructGPT) https://arxiv.org/pdf/2203.02155.pdf

  • Evaluating Large Language Models Trained on Code https://arxiv.org/pdf/2107.03374.pdf

  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond https://arxiv.org/abs/2304.13712

  • Instruction Tuning with GPT-4 https://arxiv.org/pdf/2304.03277.pdf

  • The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) https://arxiv.org/abs/2309.17421

  • Sparks of Artificial General Intelligence: Early experiments with GPT-4 https://arxiv.org/abs/2303.12712

  • Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision https://arxiv.org/abs/2312.09390

Open-Source Large Language Models

  • LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/abs/2302.13971

  • Llama 2: Open Foundation and Fine-Tuned Chat Models https://arxiv.org/pdf/2307.09288.pdf

  • Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality https://lmsys.org/blog/2023-03-30-vicuna/

  • LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset https://arxiv.org/abs/2309.11998

  • Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena https://arxiv.org/abs/2306.05685

  • How Long Can Open-Source LLMs Truly Promise on Context Length? https://lmsys.org/blog/2023-06-29-longchat/

  • Mixtral of experts https://mistral.ai/news/mixtral-of-experts/

  • OpenChat: Advancing Open-source Language Models with Mixed-Quality Data https://arxiv.org/abs/2309.11235

  • RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048

  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf

  • Retentive Network: A Successor to Transformer for Large Language Models https://arxiv.org/abs/2307.08621

  • Baichuan 2: Open Large-scale Language Models https://arxiv.org/abs/2309.10305

  • GLM-130B: An Open Bilingual Pre-trained Model https://arxiv.org/abs/2210.02414

  • Qwen Technical Report https://arxiv.org/abs/2309.16609

  • Skywork: A More Open Bilingual Foundation Model https://arxiv.org/abs/2310.19341
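
Mixtral (above) popularized sparse mixture-of-experts among open models: a router selects the top 2 of 8 expert MLPs per token and mixes their outputs. A toy routing step, with plain linear maps standing in for the expert MLPs and all sizes chosen arbitrarily:

```python
import numpy as np

# Top-2 expert routing in the spirit of Mixtral: softmax restricted to the
# two best-scoring experts per token, then a weighted mix of their outputs.
# Only 2 of the 8 experts are evaluated, which is where the savings come from.

rng = np.random.default_rng(0)
d, n_experts, top_k = 32, 8, 2
W_router = rng.normal(size=(d, n_experts)) * 0.1
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]

def moe_layer(x):                        # x: (d,) one token's hidden state
    logits = x @ W_router                # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]    # indices of the two best experts
    w = np.exp(logits[top])
    w /= w.sum()                         # softmax over the top-2 only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

x = rng.normal(size=d)
print(moe_layer(x).shape)   # (32,)
```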

Fine-Tuning

  • Learning to summarize from human feedback https://arxiv.org/abs/2009.01325

  • Self-Instruct: Aligning Language Models with Self-Generated Instructions https://arxiv.org/abs/2212.10560

  • Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning https://arxiv.org/abs/2303.15647

  • LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685

  • VeRA: Vector-based Random Matrix Adaptation https://arxiv.org/pdf/2310.11454.pdf

  • QLoRA: Efficient Finetuning of Quantized LLMs https://arxiv.org/abs/2305.14314

  • Chain of Hindsight Aligns Language Models with Feedback https://arxiv.org/abs/2302.02676

  • Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models https://arxiv.org/pdf/2312.06585.pdf
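
LoRA (above) freezes the pretrained weight W and learns a low-rank update, h = Wx + (α/r)·BAx, with B initialized to zero so training starts exactly at the pretrained model. A numpy sketch of the forward pass:

```python
import numpy as np

# LoRA forward pass: the frozen weight W is augmented with a low-rank
# update B @ A scaled by alpha / r. B starts at zero, so at initialization
# the layer behaves exactly like the pretrained one.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in)) * 0.1   # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01      # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))   # True until B is trained
```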

Performance Optimization

  • Efficient Memory Management for Large Language Model Serving with PagedAttention (vLLM) https://arxiv.org/abs/2309.06180

  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135

  • S-LoRA: Serving Thousands of Concurrent LoRA Adapters https://arxiv.org/abs/2311.03285

  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://proceedings.neurips.cc/paper/2019/file/093f65e080a295f8076b1c5722a46aa2-Paper.pdf

  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism https://arxiv.org/pdf/1909.08053.pdf

  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models https://arxiv.org/pdf/1910.02054.pdf
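
vLLM's PagedAttention (first item above) borrows the page table from operating systems: each sequence's KV cache lives in fixed-size blocks that need not be contiguous, which all but eliminates fragmentation. A conceptual sketch of just the allocation bookkeeping (block and pool sizes are arbitrary; the attention kernel that reads these blocks is omitted):

```python
# Conceptual sketch of PagedAttention-style KV-cache paging: tokens map to
# fixed-size physical blocks via a per-sequence block table, like OS pages.
# Freed blocks return to the pool and can be reused by any sequence.

BLOCK_SIZE = 16   # tokens per physical block (arbitrary choice)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}        # seq_id -> number of cached tokens

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block full: allocate a new one
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):      # sequence finished: reclaim its blocks
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=64)
for _ in range(40):
    cache.append_token("req-1")            # 40 tokens -> ceil(40/16) = 3 blocks
print(cache.block_tables["req-1"], len(cache.free_blocks))   # 3 blocks, 61 free
cache.free("req-1")
print(len(cache.free_blocks))              # 64 again
```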
