Abstract:
Consider a set of n images of a dynamic scene captured using multiple hand-held devices. The order in which these images are captured is unknown. For n images, there can be n! possible arrangements, which makes this problem extremely challenging. In this work, we address the problem of sequencing such a set of unordered images in its temporal order. We propose an LSTM-based deep neural network which addresses this problem in an end-to-end manner. The network takes the set of images as input and outputs their order of capture. We formulate the problem as a sequence-to-sequence mapping task, in which each image is mapped to its position in the ordered sequence. We do not provide any other information to the network apart from the input images. We show that the proposed approach obtains the state-of-the-art results on the standard dataset. Further, we show through experimental results that the network learns better when the target sequence is reversed.