Member-only story
Introduction
Object Pose Estimation (OPE) is a typical task in computer vision that consists of identifying specific object instances in an image and determining each object’s position and orientation relative to some coordinate system. The pose of an object in a 3D scene can be described by means of rotation and translation transformation.

However, using this type of fixed 3D orientation representation can cause pose ambiguities due to symmetries. An overview of the different causes of pose ambiguities is shown in the figure below.

State-of-the-Art Overview
In literature, 6D object pose estimation algorithms from a single RGB image can be divided into three main categories using traditional image processing techniques or deep learning:

Direct Approaches
- These techniques predict the 6D object pose directly from the color images. However, this can be difficult for a Convolutional Neural Network (CNN) to achieve as it has to learn all the geometrical knowledge from the training dataset. In contrast to the other deep learning-based approaches which require information about the 3D model of the object during inference.

- Example:
→ Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes.
PnP Based Approaches
- In these methods, a feature extraction block is used to detect 2D key points of the object in the image first. Then from the CAD model, the corresponding 3D coordinates can be…