Pipeline for chemical image-to-text competition

BMS-Molecular-Translation

Pipeline for chemical image-to-text competition.

This is a pipeline for Bristol-Myers Squibb – Molecular Translation by Vadim Timakin and Maksim Zhdanov. We got bronze medals in this competition. Significant part of code was originated from Y.Nakama’s notebook

This competition was about image-to-text translation of images with molecular skeletal strucutures to InChI chemical formula identifiers.

InChI=1S/C16H13Cl2NO3/c1-10-2-4-11(5-3-10)16(21)22-9-15(20)19-14-8-12(17)6-7-13(14)18/h2-8H,9H2,1H3,(H,19,20)

Solution

General Encoder-Decoder concept

Most participants used CNN encoder to acquire features with decoder (LSTM/GRU/Transformer) to get text sequences. That’s a casual approach to image captioning problem.

decoder_att

Pseudo-labelling with InChI validation using RDKit

RDKit is an open source toolkit for cheminformatics and it was quite useful while solving the problem. When we trained our first model, it scored around 7-8

 

 

 

To finish reading, please visit source site