GSoC week 2: developing a model for semantic segmentation

This week I was developing models for medical image segmentation in VR.


Data

I was working with two datasets: one for binary semantic segmentation and one for multi-class semantic segmentation.

Binary segmentation was done on the Polyp Segmentation in Colonoscopy dataset, which contains 300 images with binary masks.

Multi-class segmentation was done on the Cholec8k dataset, which contains 8080 laparoscopic cholecystectomy image frames taken from 17 videos of the Cholec80 dataset.


Model and training

For both segmentation tasks I used the DeepLabV3Plus model with a ResNet50 encoder pretrained on the ImageNet dataset.
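A minimal sketch of how such a model can be instantiated with the segmentation_models_pytorch (SMP) library; the exact hyperparameters here are assumptions, not the ones from my experiment:

```python
import segmentation_models_pytorch as smp

# DeepLabV3+ with a ResNet50 encoder pretrained on ImageNet.
# For the binary (polyp) task one output channel is enough; for Cholec8k,
# classes would instead be set to the number of label ids.
model = smp.DeepLabV3Plus(
    encoder_name="resnet50",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,        # 1 for binary segmentation
    activation=None,  # raw logits; sigmoid/softmax is applied in the loss
)
```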

For polyp segmentation I used 80% of the data for training and 20% for validation and testing. In the case of Cholec8k I decided to make the train-test split based on videos: 12 videos were taken for training and 5 for testing.
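The video-based split can be sketched as follows; the frame-naming scheme (`videoXX/frame.png`) is hypothetical, chosen just to illustrate grouping frames by their source video:

```python
import random

def split_by_video(frame_paths, n_train_videos=12, seed=0):
    """Split frames so that no video contributes to both train and test.

    Assumes each path encodes its video id as 'videoXX/frame.png'
    (a hypothetical naming scheme).
    """
    videos = sorted({p.split("/")[0] for p in frame_paths})
    rng = random.Random(seed)
    rng.shuffle(videos)
    train_videos = set(videos[:n_train_videos])
    train = [p for p in frame_paths if p.split("/")[0] in train_videos]
    test = [p for p in frame_paths if p.split("/")[0] not in train_videos]
    return train, test
```

Splitting by video rather than by frame matters here: neighbouring frames of one video are nearly identical, so a random frame-level split would leak training data into the test set.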

For training I used the open-source Catalyst framework and SMP, which reduced the time required to build the training pipeline.

Polyp segmentation

Results obtained by the original model


The original model performed very well. After 25 epochs of training with Dice, IoU, and BCE losses, it achieved an IoU of 0.97. A fantastic result!
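For reference, the IoU and Dice metrics reported here can be computed as follows (a plain NumPy sketch, not the Catalyst implementation used in the experiment):

```python
import numpy as np

def iou_score(pred, target, eps=1e-7):
    """Intersection-over-Union for binary 0/1 masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient; related to IoU by dice = 2*iou / (1 + iou)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```

The corresponding losses are simply `1 - score` computed on soft (sigmoid) predictions, so they stay differentiable.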

Then I added distortion to the image and mask pre-processing.



Images after distortion



Model performance degraded slightly after applying distortion: IoU dropped from 0.97 down to 0.93. Such a small change could be related to the simplicity of the task. I assumed that for the multi-class segmentation task the relative change in performance would be greater.
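The exact distortion used in my experiment is in the linked notebook; a minimal sketch of the idea (a random shift applied identically to image and mask, plus noise on the image only) could look like this:

```python
import numpy as np

def distort(image, mask, max_shift=5, noise_std=0.05, seed=None):
    """Apply the same random shift to image and mask, noise to image only.

    The geometric part must hit both so they stay aligned;
    photometric noise should never touch the mask.
    """
    rng = np.random.default_rng(seed)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    image = np.roll(image, (dy, dx), axis=(0, 1))
    mask = np.roll(mask, (dy, dx), axis=(0, 1))
    image = np.clip(image + rng.normal(0, noise_std, image.shape), 0, 1)
    return image, mask
```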

P.S. My experiment can be found here 

Cholec8k segmentation


Multi-class segmentation is a more complicated problem. First of all, I had to convert a 3-channel mask containing class ids into separate per-class masks. An easy solution is to simply use the to_categorical function; however, the class ids in the docs did not match the ids in the images, so I had to reindex the classes first.
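The reindexing plus one-hot conversion can be sketched in NumPy; the id mapping below is made up for illustration and is not the actual Cholec8k mapping:

```python
import numpy as np

# Hypothetical mapping from the ids actually found in the images
# to contiguous class indices 0..num_classes-1.
ID_REMAP = {0: 0, 50: 1, 11: 2, 21: 3}

def to_class_masks(id_mask, remap=ID_REMAP):
    """Turn an HxW mask of raw class ids into a one-hot HxWxC tensor,
    the same result to_categorical would give after reindexing."""
    num_classes = len(remap)
    indexed = np.zeros(id_mask.shape, dtype=np.int64)
    for raw_id, cls in remap.items():
        indexed[id_mask == raw_id] = cls
    one_hot = np.eye(num_classes, dtype=np.float32)[indexed]
    return one_hot  # shape (H, W, num_classes)
```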

The experiment has shown that after a few epochs the original model can achieve an F-score of 0.8 and an IoU of 0.7. I believe that by tuning the hyperparameters it is possible to achieve even better results, but since the aim of this experiment is to check whether domain adaptation for VR is possible, I think the result shown is acceptable.

Then I had to add distortion to the images. However, it was important to apply it AFTER splitting the image into class masks, because adding distortion could slightly change the pixel values.
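Distorting each class channel separately and then re-binarizing avoids that pixel-value drift. A sketch of the idea; the 0.5 threshold is my assumption, not necessarily what my pipeline uses:

```python
import numpy as np

def distort_one_hot(one_hot_mask, distort_fn):
    """Distort each binary class channel separately and re-binarize.

    `distort_fn` is any function mapping a 2-D float array to a
    distorted 2-D float array (e.g. the geometric augmentation above).
    """
    channels = []
    for c in range(one_hot_mask.shape[-1]):
        distorted = distort_fn(one_hot_mask[..., c].astype(np.float32))
        channels.append((distorted > 0.5).astype(np.float32))
    return np.stack(channels, axis=-1)
```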

The resulting model has shown a 7-10% degradation in IoU and F-score; however, it simply learned more slowly. I believe that given more training time, the model could make the difference smaller.


Here is my experiment


Further work


My further work is to validate the models again and try tuning them to improve performance.
