Image Description Model using Multimodal space embedding

I want a code implementation explanation of the paper provided. No report or code needed. No submission. 

Tags: No tags