Image Captioning Using Multimodal Space Embedding – Implementation

I would like an explanation of the implementation described in the provided paper, and of how it can be implemented in TensorFlow.
