[Paper Reading] [Skim] Three Papers on Generative Driving-Scene Synthesis: Bevstreet, DisCoScene, BerfScene

Table of contents

  • 1. Street-View Image Generation from a Bird’s-Eye View Layout
    • 1.1 Problem introduction
    • 1.2 Why
    • 1.3 How
    • 1.4 My takeaway
  • 2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
    • 2.1 What
    • 2.2 Why
    • 2.3 How
    • 2.4 My takeaway
  • 3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follow-up to DisCoScene)
    • 3.1 What
    • 3.2 Why
    • 3.3 How
    • 3.4 My takeaway

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

As the title suggests, this paper learns a mapping from a BEV (Bird’s-Eye View) layout to street-view images.

[Figure: example BEV layout]

Concretely, the input (BEV) is a two-dimensional, top-down representation of a three-dimensional environment. In the BEV diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, and lane lines. The green square is the ego vehicle, which has three front-facing cameras.
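To make the input concrete, here is a minimal sketch of a BEV layout stored as a grid of class ids. The class ids, grid size, and box coordinates are illustrative assumptions, not values from the paper.

```python
# Toy BEV layout: a top-down grid where each cell stores a class id.
# All class ids, sizes, and coordinates below are illustrative assumptions.
BACKGROUND, ROAD, VEHICLE, EGO = 0, 1, 2, 3


def make_bev(height, width):
    """Create an empty BEV grid filled with BACKGROUND."""
    return [[BACKGROUND] * width for _ in range(height)]


def fill_box(bev, row0, col0, row1, col1, class_id):
    """Paint an axis-aligned box [row0, row1) x [col0, col1), clipped to the grid."""
    height, width = len(bev), len(bev[0])
    for r in range(max(0, row0), min(height, row1)):
        for c in range(max(0, col0), min(width, col1)):
            bev[r][c] = class_id


bev = make_bev(8, 8)
fill_box(bev, 0, 2, 8, 6, ROAD)     # a vertical road strip
fill_box(bev, 1, 3, 3, 4, VEHICLE)  # another vehicle ahead of the ego car
fill_box(bev, 5, 4, 7, 5, EGO)      # the ego vehicle

# Print the grid, one character per cell.
for row in bev:
    print("".join(str(cell) for cell in row))
```

Later boxes overwrite earlier ones, so the road is painted first and the vehicles on top of it.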

The task is to generate three street-view images, one per camera, that are consistent with the BEV layout, i.e. with the relative positions of the objects drawn in it.

Generated views that respect the “layout” have to satisfy the following constraints:

  • Cameras with an overlapping field of view (FoV) must show consistent content in the overlapping region.
  • The visual style of the scene must be consistent, so that all virtual views appear to be captured in the same geographical area (e.g., urban vs. rural), at the same time of day, under the same weather conditions, and so on.
  • In addition to this consistency, the images must correspond to the HD map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

According to the authors, it is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

  1. Methods
    [Figure: method pipeline]
    As the pipeline shows, the BEV layout and the source images are encoded and fed to an autoregressive transformer, together with direction and camera information that helps the model reason about space. The transformer outputs new multi-view (mv) images.

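  2. Experiments
    Three metrics are used.
    [Figure: quantitative results]
    FID (Fréchet Inception Distance) measures the quality and diversity of the generated images; lower is better. Road mIoU and vehicle mIoU measure how well the road and vehicle regions in the generated images overlap with those specified in the input BEV layout, which checks that relative positions are preserved.
    Scene editing is achieved by changing the BEV layout:
    [Figure: scene-editing examples]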
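The overlap metrics can be illustrated with a minimal per-class IoU sketch on two small grids. Both grids are hand-written for illustration; as far as I understand, in the paper the predicted layout is recovered from the generated images by a BEV segmentation network, which this sketch does not model.

```python
def iou(pred, target, class_id):
    """Intersection-over-union of one class between two equally sized grids."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == class_id, t == class_id
            intersection += in_pred and in_target
            union += in_pred or in_target
    # IoU is undefined when the class is absent from both grids.
    return intersection / union if union else None


ROAD, VEHICLE = 1, 2  # illustrative class ids

target = [[1, 1, 0],
          [1, 2, 0],
          [1, 2, 0]]
pred = [[1, 1, 0],
        [1, 2, 2],
        [1, 1, 0]]

road_iou = iou(pred, target, ROAD)
vehicle_iou = iou(pred, target, VEHICLE)
print("road IoU:", road_iou)
print("vehicle IoU:", vehicle_iou)
```

Averaging such per-class scores over a test set gives the mean IoU reported per class.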

1.4 My takeaway

  1. How can we exploit the capabilities of an autoregressive transformer? Why use it here rather than another kind of generative model?
  2. I now understand what BEV is.
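On the first question, a toy sketch of autoregressive generation may help fix the idea: each new token is predicted from the conditioning tokens plus everything generated so far. It assumes that both the layout and the images are represented as discrete tokens; `toy_next_token_logits` is a hypothetical stand-in for the transformer, and the vocabulary size and token values are made up.

```python
import random


def toy_next_token_logits(context, vocab_size):
    """Stand-in for a transformer: returns one score per vocabulary entry.

    Hypothetical helper; a real model would attend over `context`.
    """
    rng = random.Random(sum(context) + len(context))  # deterministic toy scores
    return [rng.random() for _ in range(vocab_size)]


def generate(condition_tokens, num_new_tokens, vocab_size=16):
    """Greedy autoregressive decoding: each new token sees all earlier ones."""
    generated = []
    for _ in range(num_new_tokens):
        context = list(condition_tokens) + generated
        logits = toy_next_token_logits(context, vocab_size)
        # Pick the highest-scoring token and append it to the sequence.
        generated.append(max(range(vocab_size), key=logits.__getitem__))
    return generated


bev_tokens = [3, 1, 4, 1, 5]  # pretend these encode the BEV layout
image_tokens = generate(bev_tokens, num_new_tokens=8)
print(image_tokens)
```

Because every step conditions on the full prefix, content generated for one camera can influence the next, which is one plausible reason this family of models suits the cross-view consistency constraints.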

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model that uses object bounding boxes, without semantic annotation, as a layout prior. It supports high-quality scene synthesis and flexible user control of both the camera and the scene objects.

2.2 Why

  • Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.

  • Some works, such as GSN, can generate whole scenes but do not support object-level editing, because the underlying NeRF has no explicit notion of objects.

  • GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet, it works poorly on mixed scenes due to the absence of proper spatial priors.

  • Interesting references:

    17: LayoutTransformer: Layout generation and completion with self-attention.

    26: LayoutGAN: Generating graphic layouts with wireframe discriminators.

    58: BlockPlanner: City block generation with vectorized graph representation.

2.3 How


Bounding boxes serve as layout priors for generating the objects; the generated objects are combined with a generated background and passed to neural rendering. In addition, an extra object discriminator performs local discrimination, which gives better object-level supervision.
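One common way to use a bounding box as a spatial prior is to map each 3D sample point into the box's own normalized frame before querying that object's field. The sketch below shows this transform under assumed conventions (rotation only about the vertical axis, coordinates normalized to [-1, 1]); it is not necessarily the paper's exact parameterization.

```python
import math


def world_to_box(point, center, size, yaw):
    """Map a world-space point into a box's canonical frame.

    Inside the box, every coordinate of the result lies in [-1, 1].
    `yaw` is the box rotation about the vertical (z) axis, in radians.
    """
    dx, dy, dz = (p - c for p, c in zip(point, center))
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Rotate by -yaw to undo the box orientation.
    local_x = cos_y * dx + sin_y * dy
    local_y = -sin_y * dx + cos_y * dy
    # Divide by the half extents so the box faces sit at +/-1.
    return (2 * local_x / size[0], 2 * local_y / size[1], 2 * dz / size[2])


def inside_box(local_point):
    """True if a canonical-frame point lies within the unit box."""
    return all(abs(v) <= 1.0 for v in local_point)


# Illustrative box: length 4, width 2, height 2, rotated by 90 degrees.
center, size, yaw = (10.0, 0.0, 1.0), (4.0, 2.0, 2.0), math.pi / 2

near = world_to_box((10.0, 1.0, 1.0), center, size, yaw)
far = world_to_box((13.0, 0.0, 1.0), center, size, yaw)
print(near, inside_box(near))
print(far, inside_box(far))
```

Points that fall outside every box would be handled by the background in a compositional setup like the one described above.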

2.4 My takeaway

  1. Is it possible to drop the manually annotated bounding boxes and instead automatically identify and regenerate the corresponding regions in a Gaussian-based representation?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follow-up to DisCoScene)

3.1 What

By conditioning an equivariant radiance field on a BEV map, the method can produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and stitching them together with smooth consistency.

Intuitively, the large scene is assembled from overlapping local patches of the BEV map.
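The patch view can be sketched as a sliding window over a large BEV grid. The window size, stride, and grid contents below are illustrative assumptions; how the paper actually blends neighboring local scenes is not covered here.

```python
def window_starts(total, window, stride):
    """Start offsets of windows that cover [0, total) with the given stride."""
    if window > total:
        raise ValueError("window larger than map")
    starts = list(range(0, total - window + 1, stride))
    if starts[-1] != total - window:
        starts.append(total - window)  # final window flush with the border
    return starts


def crop_patches(bev, window, stride):
    """Yield (row, col, patch) for overlapping square crops of a 2D grid."""
    for r in window_starts(len(bev), window, stride):
        for c in window_starts(len(bev[0]), window, stride):
            yield r, c, [row[c:c + window] for row in bev[r:r + window]]


# A 6 x 10 grid whose cell values encode their own position (row * 10 + col).
big_bev = [[r * 10 + c for c in range(10)] for r in range(6)]
patches = list(crop_patches(big_bev, window=4, stride=3))

for r, c, patch in patches:
    print("patch at", (r, c), "top-left value", patch[0][0])
```

With a stride smaller than the window, neighboring patches share cells, and it is in those shared regions that the generated local scenes have to agree.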

3.2 Why

  1. Large-scale 3D scenes cannot be generated by simply applying existing 3D object synthesis techniques, since 3D scenes usually have complex spatial configurations and consist of many objects at varying scales.
  2. Previous approaches often relied on scene graphs, which are hard to process because of their unstructured topology.
  3. DisCoScene makes it complicated to interpret the entire scene and faces scalability challenges when using bounding boxes.
  4. BEV maps specify the composition and scales of objects clearly but say little about their detailed visual appearance. Recent attempts such as InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficient.

3.3 How


To integrate the prior information provided by the BEV map into the radiance field, the researchers introduce a generator U, which produces a 2D feature map conditioned on the BEV map. The generator U adopts a network structure that combines a U-Net architecture with StyleGAN blocks.
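The word "equivariant" in the title refers to the property that shifting the input shifts the output in the same way. The toy check below demonstrates this for a 1D convolution with wrap-around padding; it is only an illustration of the property, not the paper's generator, and the signal and kernel values are made up.

```python
def circular_conv1d(signal, kernel):
    """1D cross-correlation with wrap-around padding (odd-length kernel)."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(kernel[k] * signal[(i + k - half) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, offset):
    """Cyclically shift a list to the right by `offset` positions."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset] if offset else list(signal)


bev_row = [0, 0, 1, 2, 0, 0, 0, 0]  # one row of a toy BEV map
kernel = [1, 2, 1]

features = circular_conv1d(bev_row, kernel)
shifted_then_conv = circular_conv1d(shift(bev_row, 3), kernel)
conv_then_shifted = shift(features, 3)

print(features)
print(shifted_then_conv == conv_then_shifted)
```

If the feature map moves together with the BEV map in this way, a local scene generated from a shifted crop agrees with its neighbor in the shared region, which is what makes stitching plausible.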

3.4 My takeaway

  1. I am still confused about how this U-Net is used; I need to set aside time to fill in the background knowledge. 🤡
