Latest papers: Difference between revisions

Revision as of 22:10, 11 August 2021

This page requires expansion!
This page needs papers! Papers for creating robowaifus!

This page collects notable research papers from the past two years related to robotics and artificial intelligence. Feel free to add new papers to the list and to discuss any paper on the talk page.

Recent papers

PROTIP: You can use sshleifer/distilbart-cnn-12-6 and SciTLDR to help with summarizing papers. Check the paper template for usage instructions.
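Once a summary is in hand, the entry itself can be assembled mechanically. The sketch below is a hypothetical helper, not part of the wiki: it assumes the `{{paper}}` template accepts the fields seen in this revision's wikitext (`title`, `url`, `tldr`, `publication`, `authors`, `year`) and that a literal pipe inside a value should be written as `{{!}}`. Check the paper template page before relying on it.

```python
def paper_entry(title, tldr, authors, year, publication, url=None):
    """Build a {{paper}} template call as wikitext (hypothetical helper)."""
    fields = {
        "title": title,
        "url": url,  # optional; skipped when None
        "tldr": tldr,
        "publication": publication,
        "authors": ", ".join(authors),
        "year": str(year),
    }
    parts = ["paper"]
    for key, value in fields.items():
        if value is None:
            continue
        # A raw "|" would be read as a field separator, so escape it.
        safe = value.replace("|", "{{!}}")
        parts.append(key + "=" + safe)
    return "{{" + "|".join(parts) + "}}"


entry = paper_entry(
    title="SoundStream: An End-to-End Neural Audio Codec",
    tldr="A novel neural audio codec.",
    authors=["Neil Zeghidour", "Marco Tagliasacchi"],
    year=2021,
    publication="arXiv:2107.03312",
    url="https://arxiv.org/abs/2107.03312",
)
print(entry)
```

The summary text passed as `tldr` would come from whichever summarizer you use; that step is left out here so the helper stays independent of any model.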

August 2021

Computer vision

NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis (arXiv:2108.03880)

tl;dr Multi-view stereo is a core task in 3D computer vision. NeRF methods do not generalize to novel scenes and are slow to train and test. The authors propose to bridge the gap between these two methodologies with a novel network that recovers 3D scene geometry as a distance function.^[1]

Simulation

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks (arXiv:2108.03272)

tl;dr iGibson 2.0 is a novel simulation environment using Bullet that supports the simulation of a more diverse set of household tasks through three key innovations. First, it supports object states, including temperature, wetness level, cleanliness level, and toggled and sliced states, which are necessary to cover a wider range of tasks. Second, it implements a set of predicate logic functions that map simulator states to logic states such as Cooked or Soaked. Third, the simulator can sample valid physical states that satisfy a logic state. This sampling can generate potentially infinite instances of tasks with minimal effort from users.^[2]
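The second and third ideas can be illustrated with a toy sketch. This is not the iGibson API: the `ObjectState` class, the thresholds, and the rejection sampler are all assumptions made for illustration, and the real simulator samples physically valid scenes rather than random numbers.

```python
import random
from dataclasses import dataclass

COOK_TEMP_C = 70.0  # assumed threshold, not taken from the paper
SOAK_LEVEL = 0.5    # assumed threshold, not taken from the paper


@dataclass
class ObjectState:
    temperature: float      # current temperature in degrees Celsius
    max_temperature: float  # highest temperature reached so far
    wetness: float          # 0.0 (dry) to 1.0 (saturated)


def is_cooked(state):
    """Logic state Cooked: the object has been hot enough at some point."""
    return state.max_temperature >= COOK_TEMP_C


def is_soaked(state):
    """Logic state Soaked: the object holds enough water."""
    return state.wetness >= SOAK_LEVEL


def sample_state(predicate, rng, max_tries=1000):
    """Rejection-sample a physical state that satisfies a logic predicate."""
    for _ in range(max_tries):
        temperature = rng.uniform(0.0, 200.0)
        candidate = ObjectState(
            temperature=temperature,
            max_temperature=temperature,
            wetness=rng.uniform(0.0, 1.0),
        )
        if predicate(candidate):
            return candidate
    raise RuntimeError("no satisfying state found")


rng = random.Random(0)
cooked = sample_state(is_cooked, rng)
soaked = sample_state(is_soaked, rng)
```

Each call with a fresh random draw yields a different state that still satisfies the requested logic state, which is the sense in which one task description can expand into many task instances.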

July 2021

Audio processing

SoundStream: An End-to-End Neural Audio Codec (arXiv:2107.03312)

tl;dr A novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches EVS at 9.6 kbps.^[3]
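To put the quoted bitrates in perspective, the small calculation below converts each one into the storage needed for a minute of audio. It assumes 1 kbps means 1000 bits per second and ignores container overhead; the function name is hypothetical.

```python
def bytes_for(bitrate_kbps, seconds):
    """Payload size in bytes for audio at a constant bitrate."""
    bits = bitrate_kbps * 1000 * seconds  # assumes 1 kbps = 1000 bit/s
    return int(round(bits / 8))


# Bitrates quoted in the summary above.
codecs = {"SoundStream": 3, "EVS": 9.6, "Opus": 12}
for name, kbps in codecs.items():
    print(name, bytes_for(kbps, 60), "bytes per minute")
```

At these settings a minute of Opus audio takes four times the space of a minute of SoundStream audio.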

June 2021

Multimodal learning

Multimodal Few-Shot Learning with Frozen Language Models (arXiv:2106.13884)

tl;dr When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, the authors present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language).^[4]

Optimizers

A Generalizable Approach to Learning Optimizers (arXiv:2106.00958)

tl;dr Learns to update optimizer hyperparameters, rather than model parameters directly, using novel features, actions, and a reward function.^[5]

October 2020

Computer vision

GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering (arXiv:2010.04595)

tl;dr General Radiance Fields construct an internal representation for each 3D point of a scene from 2D inputs and render the corresponding appearance and geometry of any 3D scene viewed from an arbitrary angle.^[6]

September 2020

Summarization

Learning to Summarize with Human Feedback (arXiv:2009.01325)

tl;dr Models trained with human feedback outperform much larger supervised models and the human-written reference summaries on the TL;DR dataset.^[7]
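One ingredient of this approach is a reward model trained on human comparisons between pairs of summaries. A minimal sketch of the pairwise loss follows, assuming the reward model has already produced a scalar score for each summary; the function name is hypothetical and the policy optimization step is not shown.

```python
import math


def preference_loss(reward_preferred, reward_other):
    """Pairwise loss -log(sigmoid(r_preferred - r_other))."""
    diff = reward_preferred - reward_other
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign
    # of diff so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss is small when the human-preferred summary scores higher,
# and large when the reward model ranks the pair the wrong way round.
agree = preference_loss(2.0, 0.0)
disagree = preference_loss(0.0, 2.0)
```

Minimizing this loss over many labeled pairs pushes the reward model to score preferred summaries above rejected ones.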

Older papers

List of 2018 papers

References

[1] Radu Alexandru Rosu, Sven Behnke. NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis. arXiv:2108.03880, 2021.
[2] Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese. iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks. arXiv:2108.03272, 2021.
[3] Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi. SoundStream: An End-to-End Neural Audio Codec. arXiv:2107.03312, 2021.
[4] Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill. Multimodal Few-Shot Learning with Frozen Language Models. arXiv:2106.13884, 2021.
[5] Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba. A Generalizable Approach to Learning Optimizers. arXiv:2106.00958, 2021.
[6] Alex Trevithick, Bo Yang. GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering. arXiv:2010.04595, 2020.
[7] Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano. Learning to Summarize with Human Feedback. arXiv:2009.01325, 2020.

@@ Line 12: / Line 12: @@
 {{paper|title=iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks|url=http://svl.stanford.edu/igibson/|tldr=iGibson 2.0 is a novel simulation environment using [[Bullet]] that supports the simulation of a more diverse set of household tasks through three key innovations. Firstly, it supports object states, including temperature, wetness level, cleanliness level, and toggled and sliced states, necessary to cover a wider range of tasks. Second, it implements a set of predicate logic functions that map the simulator states to logic states like Cooked or Soaked. Third, the simulator can sample valid physical states that satisfy a logic state. This functionality can generate potentially infinite instances of tasks with minimal effort from the users.|publication=arXiv:2108.03272|authors=Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese|year=2021}}
+=== July 2021 ===
+==== [[Audio processing]] ====
+{{paper|title=SoundStream: An End-to-End Neural Audio Codec|tldr=A novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs. SoundStream at 3kbps outperforms Opus at 12kbps and approaches EVS at 9.6kbps.|authors=Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi|year=2021|publication=arXiv:2107.03312|url=https://arxiv.org/abs/2107.03312}}
 === June 2021 ===

