ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence
- Resource Type
- Conference
- Authors
- Rivkin, Dmitriy; Dudek, Gregory; Kakodkar, Nikhil; Meger, David; Limoyo, Oliver; Jenkin, Michael; Liu, Xue; Hogan, Francois
- Source
- 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 8262-8268, May 2023
- Subject
- Robotics and Control Systems
Photography
Measurement
Adaptation models
Visualization
Semantics
Streaming media
Planning
- Language
- Abstract
Our work examines how large language models can be used for robotic planning and sampling in the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general-purpose language models (LMs) and vision-language models (VLMs). Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
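The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each photo description and each video frame has already been mapped to an embedding vector in a shared space by some vision-language model, and that cosine similarity is the matching score; the record does not say which model or score the authors use, so both are assumptions. The function names `cosine` and `select_photos`, the example descriptions, and the toy three-dimensional vectors are hypothetical placeholders, not the authors' implementation or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_photos(
    description_embeddings: Dict[str, Sequence[float]],
    frame_embeddings: List[Sequence[float]],
) -> Dict[str, Tuple[int, float]]:
    """For each description, return (index of best-matching frame, its score).

    Assumption: the embeddings come from a shared text-image space; this
    sketch only performs the ranking, not the embedding itself.
    """
    if not frame_embeddings:
        raise ValueError("at least one frame embedding is required")
    selected = {}
    for text, desc_vec in description_embeddings.items():
        scores = [cosine(desc_vec, frame_vec) for frame_vec in frame_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        selected[text] = (best, scores[best])
    return selected


# Hypothetical descriptions, as an LM might list them for a party,
# paired with made-up toy embeddings.
descriptions = {
    "guests cutting the cake": [1.0, 0.1, 0.0],
    "people dancing": [0.0, 0.2, 1.0],
}
# Made-up toy embeddings for three frames of the video stream.
frames = [
    [0.9, 0.2, 0.1],
    [0.1, 0.1, 0.9],
    [0.5, 0.5, 0.5],
]

portfolio = select_photos(descriptions, frames)
for text, (frame_index, score) in portfolio.items():
    print(f"{text!r} -> frame {frame_index} (score {score:.2f})")
```

In this sketch every description receives exactly one frame. A real system would likely also need a minimum score threshold so that descriptions with no good match in the stream are left unfilled.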