Research Papers
2024
-
MPVO: Motion-Prior Based Visual Odometry for PointGoal Navigation
Sayan Paul, Ruddra dev Roychoudhury, and Brojeshwar Bhowmick
In 18th European Conference on Computer Vision (ECCV), 50SFM Workshop, 2024
Visual odometry (VO) is essential for enabling accurate point-goal navigation of embodied agents in indoor environments, where GPS and compass sensors are unreliable and inaccurate. However, traditional VO methods face challenges in wide-baseline scenarios, where fast robot motions and low frames per second (FPS) during inference hinder their performance, leading to drift and catastrophic failures in point-goal navigation. Recent deep-learned VO methods show robust performance but suffer from sample inefficiency during training; hence, they require huge datasets and compute resources. We therefore propose a robust and sample-efficient VO pipeline based on motion priors available while an agent is navigating an environment. It consists of a training-free, action-prior-based geometric VO module that estimates a coarse relative pose, which is then consumed as a motion prior by a deep-learned VO model that finally produces a fine relative pose to be used by the navigation policy. This strategy helps our pipeline achieve up to 2x sample efficiency during training and demonstrates superior accuracy and robustness in point-goal navigation tasks compared to state-of-the-art VO method(s). Realistic indoor environments from the Gibson dataset are used in the AI-Habitat simulator to evaluate the proposed approach using navigation metrics (like success/SPL) and pose metrics (like RPE/ATE). We hope this method further opens a direction of work where motion priors from various sources can be utilized to improve VO estimates and achieve better results in embodied navigation tasks.
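The pose metrics named in the abstract (ATE and RPE) have standard translational forms that can be sketched as follows; the function names and the position-only simplification are illustrative assumptions, not the paper's evaluation code:

```python
import math

def ate_rmse(gt, est):
    """Absolute Trajectory Error: RMSE of the per-frame position error
    between ground-truth and estimated trajectories (no alignment step)."""
    errs = [sum((g - e) ** 2 for g, e in zip(gp, ep))
            for gp, ep in zip(gt, est)]
    return math.sqrt(sum(errs) / len(errs))

def rpe_rmse(gt, est):
    """Relative Pose Error: RMSE of the drift between consecutive
    relative motions of the two trajectories (translation only)."""
    def deltas(traj):
        # per-step displacement vectors
        return [tuple(b - a for a, b in zip(p, q))
                for p, q in zip(traj, traj[1:])]
    errs = [sum((dg - de) ** 2 for dg, de in zip(g, e))
            for g, e in zip(deltas(gt), deltas(est))]
    return math.sqrt(sum(errs) / len(errs))
```

Note how the two metrics disagree on purpose: a trajectory with a constant positional offset has nonzero ATE but zero RPE, since every relative motion is still correct.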
-
Teledrive: An Embodied AI Based Telepresence System
Snehasis Banerjee, Sayan Paul, Ruddra dev Roychoudhury, Abhijan Bhattacharya, Chayan Sarkar, Ashis Sau, Pradip Pramanick, and Brojeshwar Bhowmick
Journal of Intelligent & Robotic Systems, 2024
This article presents ‘Teledrive’, a telepresence robotic system with embodied AI features that empowers an operator to navigate the telerobot in any unknown remote place with minimal human intervention. We conceive Teledrive in the context of democratizing remote ‘care-giving’ for elderly citizens as well as for isolated patients affected by contagious diseases. In particular, this paper focuses on the problem of navigating to a rough target area (like ‘bedroom’ or ‘kitchen’) rather than pre-specified point destinations. This ushers in a unique ‘AreaGoal’ based navigation feature, which has not been explored in depth in contemporary solutions. Further, we describe an edge computing-based software system built on a WebRTC-based communication framework to realize the aforementioned scheme through easy-to-use speech-based human-robot interaction. Moreover, to enhance the ease of operation for the remote caregiver, we incorporate a ‘person following’ feature, whereby the robot follows a person on the move within its premises as directed by the operator. Finally, the system presented is loosely coupled with specific robot hardware, unlike existing solutions. We have evaluated the efficacy of the proposed system through baseline experiments, a user study, and real-life deployment.
2022
-
DoRO: Disambiguation of Referred Object for Embodied Agents
Pradip Pramanick, Chayan Sarkar, Sayan Paul, Ruddra dev Roychoudhury, and Brojeshwar Bhowmick
IEEE Robotics and Automation Letters, 2022
Robotic task instructions often involve a referred object that the robot must locate (ground) within the environment. While task intent understanding is an essential part of natural language understanding, less effort is made to resolve ambiguity that may arise while grounding the task. Existing works use vision-based task grounding and ambiguity detection, suitable for a fixed view and a static robot. However, the problem magnifies for a mobile robot, where the ideal view is not known beforehand. Moreover, a single view may not be sufficient to locate all the object instances in the given area, which leads to inaccurate ambiguity detection. Human intervention is helpful only if the robot can convey the kind of ambiguity it is facing. In this article, we present DoRO (Disambiguation of Referred Object), a system that can help an embodied agent disambiguate the referred object by raising a suitable query whenever required. Given an area where the intended object is, DoRO finds all the instances of the object by aggregating observations from multiple views while exploring and scanning the area. It then raises a suitable query using the information from the grounded object instances. Experiments conducted with the AI2Thor simulator show that DoRO not only detects ambiguity more accurately but also raises verbose queries with more accurate information from the visual-language grounding.
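The multi-view aggregation step described above can be illustrated with a minimal sketch that greedily merges detections whose estimated world positions fall within a distance threshold; the threshold, data shapes, and function names here are assumptions for illustration, not DoRO's actual method:

```python
import math

def _centroid(pts):
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def aggregate_instances(detections, merge_dist=0.5):
    """Greedily merge per-view detections (label, world_position) whose
    positions lie within merge_dist of an existing instance's centroid,
    yielding one entry per distinct object instance."""
    instances = []  # each: {"label": str, "points": [positions]}
    for label, pos in detections:
        for inst in instances:
            if (inst["label"] == label
                    and _dist(_centroid(inst["points"]), pos) <= merge_dist):
                inst["points"].append(pos)  # same instance seen again
                break
        else:
            instances.append({"label": label, "points": [pos]})
    return instances
```

With aggregated instances like these, ambiguity for a referred label can be flagged whenever more than one distinct instance carries that label, and the query can cite per-instance information (e.g. centroids).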
2020
-
Jampacker: An Efficient and Reliable Robotic Bin Packing System for Cuboid Objects
Marichi Agarwal, Swagata Biswas, Chayan Sarkar, Sayan Paul, and Himadri Sekhar Paul
IEEE Robotics and Automation Letters, 2020
Bin packing using a robotic arm is an important problem in the context of Industry 4.0. In this work, we present a reliable and efficient bin packing system, called Jampacker. We propose a new offline 3D bin packing algorithm, called Jampack, that achieves higher packing efficiency in comparison to state-of-the-art algorithms. Jampack computes placement points, called internal corner points, that try to maximize the utilization of free space in between objects, which is generally ignored by existing algorithms. Additionally, we introduce a fault recovery module (FRM) for the robotic manipulator that helps achieve more reliable and efficient packing in a physical container. The FRM monitors the object placement by the robotic arm, calculates a fault score for the placement, adjusts the placement if required, and also learns and adjusts the offset for the placement procedure on the go. We show that this system achieves faster completion of the overall packing process.
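The idea of placement points derived from the corners of already-placed objects can be illustrated with a simplified 2D sketch; this is a generic corner-point heuristic for illustration only, not the Jampack algorithm or its internal corner points:

```python
def candidate_points(placed, bin_w, bin_h):
    """Candidate placement points for the next rectangle: the bin origin
    plus, for each placed rect (x, y, w, h), the point to its right and
    the point on top of it -- corners where free space begins."""
    pts = {(0, 0)}
    for (x, y, w, h) in placed:
        pts.add((x + w, y))  # to the right of the rect
        pts.add((x, y + h))  # on top of the rect

    def inside_rect(p, r):
        px, py = p
        x, y, w, h = r
        return x <= px < x + w and y <= py < y + h

    # keep only points inside the bin and not covered by a placed rect
    return sorted(p for p in pts
                  if p[0] < bin_w and p[1] < bin_h
                  and not any(inside_rect(p, r) for r in placed))
```

A real packer would score each candidate (e.g. by how tightly the next box nests against its neighbours) and re-generate candidates after every placement; the 3D case adds a third coordinate per corner.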
-
Demo: Edge-centric Telepresence Avatar Robot for Geographically Distributed Environment
Ashis Sau, Ruddra dev Roychoudhury, Hrishav Bakul Barua, Chayan Sarkar, Sayan Paul, Brojeshwar Bhowmick, Arpan Pal, and Balamuralidhar P
In 21st International Conference on Distributed Computing and Networking (ICDCN), 2020
Using a robotic platform for telepresence applications has gained paramount importance in this decade. Scenarios such as remote meetings, group discussions, and presentations/talks in seminars and conferences get much attention in this regard. Though some robotic platforms exist for such telepresence applications, they lack efficacy in communication and interaction between the remote person and the avatar robot deployed in another geographic location. Also, such existing systems are often cloud-centric, which adds to their network overhead. In this demo, we develop and test a framework that brings the best of both cloud- and edge-centric systems together, along with a newly designed communication protocol. Our solution improves on existing systems in terms of robustness and efficacy of communication in a geographically distributed environment.