Latest papers: Difference between revisions

From Robowaifu Institute of Technology
(Created page with "{{Protip|You can use [https://huggingface.co/sshleifer/distilbart-cnn-12-6 sshleifer/distilbart-cnn-12-6] and [https://scitldr.apps.allenai.org/ SciTLDR] to help with summariz...")
 
=== August 2021 ===


==== Computer vision ====
==== [[Computer vision]] ====
'''[https://arxiv.org/abs/2108.03880 NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis]'''
'''NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis''' ([https://arxiv.org/abs/2108.03880 arXiv:2108.03880])


{{tldr|[[Multi-view stereo]] is a core task in 3D computer vision. [[Neural Radiance Fields|NeRF]] methods do not generalize to novel scenes and are slow to train and test. The authors propose to bridge the gap between these two methodologies with a novel network that can recover 3D scene geometry as a distance function.}}<ref>{{cite|authors=Radu Alexandru Rosu, Sven Behnke|title=NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis|publication=arXiv:2108.03880|year=2021|month=August}}</ref>
=== June 2021 ===
==== [[Multimodal learning]] ====
'''Multimodal Few-Shot Learning with Frozen Language Models''' ([https://arxiv.org/abs/2106.13884 arXiv:2106.13884])
{{tldr|When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, the authors present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language).}}<ref>{{cite|authors=Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill|title=Multimodal Few-Shot Learning with Frozen Language Models|publication=arXiv:2106.13884|year=2021}}</ref>


=== October 2020 ===


==== Computer vision ====
'''[https://arxiv.org/abs/2010.04595 GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering]'''
'''GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering''' ([https://arxiv.org/abs/2010.04595 arXiv:2010.04595])


{{tldr|General Radiance Fields construct an internal representation for each 3D point of a scene from 2D inputs and render the corresponding appearance and geometry of any 3D scene viewed from an arbitrary angle.}}<ref>{{cite|authors=Alex Trevithick, Bo Yang|title=GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering|publication=arXiv:2010.04595|year=2020|month=October}}</ref>

Revision as of 01:44, 10 August 2021

PROTIP: You can use sshleifer/distilbart-cnn-12-6 and SciTLDR to help with summarizing papers.
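
As a rough illustration of the tip above, the following sketch shows one way to draft a summary with sshleifer/distilbart-cnn-12-6 through the Hugging Face transformers summarization pipeline. The helper names (split_into_chunks, summarize), the 400-word chunk size, and the length limits are assumptions made for this example, not settings recommended by this page. SciTLDR is a separate web demo and is not covered here.

```python
def split_into_chunks(text, max_words=400):
    """Split text into word-bounded chunks.

    The 400-word default is an arbitrary illustrative value chosen to keep
    each chunk comfortably short; it is not a documented model limit.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize(text, model_name="sshleifer/distilbart-cnn-12-6", max_words=400):
    """Summarize each chunk separately and join the partial summaries."""
    # Imported lazily so split_into_chunks works without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)
    parts = []
    for chunk in split_into_chunks(text, max_words):
        # truncation=True guards against chunks that still exceed the
        # model's input limit after tokenization.
        result = summarizer(chunk, max_length=60, min_length=10,
                            truncation=True)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Calling summarize(abstract_text) on a paper abstract downloads the model on first use and returns a draft summary as a string. The draft still needs checking against the paper before it goes into a tl;dr entry.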

This page collects notable research papers from the past two years related to robotics and artificial intelligence. Feel free to add new papers to the list and to discuss them on the talk page.

Recent papers

August 2021

Computer vision

NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis (arXiv:2108.03880)

tl;dr Multi-view stereo is a core task in 3D computer vision. NeRF methods do not generalize to novel scenes and are slow to train and test. The authors propose to bridge the gap between these two methodologies with a novel network that can recover 3D scene geometry as a distance function.[1]

June 2021

Multimodal learning

Multimodal Few-Shot Learning with Frozen Language Models (arXiv:2106.13884)

tl;dr When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, the authors present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language).[2]

October 2020

Computer vision

GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering (arXiv:2010.04595)

tl;dr General Radiance Fields construct an internal representation for each 3D point of a scene from 2D inputs and render the corresponding appearance and geometry of any 3D scene viewed from an arbitrary angle.[3]

Older papers

References

  1. Radu Alexandru Rosu, Sven Behnke. NeuralMVS: Bridging Multi-View Stereo and Novel View Synthesis. arXiv:2108.03880, 2021.
  2. Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill. Multimodal Few-Shot Learning with Frozen Language Models. arXiv:2106.13884, 2021.
  3. Alex Trevithick, Bo Yang. GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering. arXiv:2010.04595, 2020.