Posts

Showing posts from July, 2022

Surgical tools detection

This week I was training the Surgical Tools Detection model. The dataset was taken from here. To train the model and prepare it for inference I was using Roboflow. Roboflow is a robust web platform for collecting datasets, training models and running them in production.

Training

I split the data 69/21/10 into train, val and test sets. Graphs with the training metrics are shown below.

Train / val metrics

To be more precise, on the test set the model achieves 93.8% precision, 90% recall and 96.4% mAP. This is considered high performance and means that the model can be used in production.

Inference

Roboflow itself provides multiple ways to run inference. I could use the web-API method (requests are sent to the server and the model sends back predictions), but to make it more flexible, I prefer the python-package method. It can be easily imported into the main LibreHealth project and run.
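As a small illustration of the 69/21/10 split above, here is a minimal sketch (a hypothetical reproduction of the idea, not the exact split Roboflow performs):

```python
import random

def split_dataset(items, fractions=(0.69, 0.21, 0.10), seed=42):
    """Shuffle items and split them into train/val/test by the given fractions."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 69 21 10
```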

More experiments with bounding boxes and refactoring

This week I was mostly refactoring my code and running experiments.

Refactoring

First of all, I made the segmentation inference pipeline more flexible by using inheritance for the dataset classes and making sure that my ONNX runtime works identically for both binary and multi-class segmentation models. Then I committed these changes.

Experiments

Secondly, this week I tried to redistort bboxes using a method from the OpenCV forum. The idea was simple: I applied the distortion to a chessboard image, calculated the camera matrix and distortion coefficients, and then plugged them into the formula.

Chessboard before distortion
Chessboard after distortion

Then I applied a technique for camera calibration and got the following representation:

Distorted chessboard with corners

After that I had the distortion coefficients and camera matrix for my images. I simply hardcoded them and applied them to the surgical tools dataset. The resulting distortion was not that strong, but the coordinates of the bounding boxes have changed.
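As a sketch of the "formula" step above: given a camera matrix and distortion coefficients, OpenCV's radial/tangential distortion model can be applied to points directly in NumPy. The matrix and coefficient values below are hypothetical placeholders, not the ones I actually calibrated:

```python
import numpy as np

def distort_points(pts, K, dist):
    """Apply OpenCV's (k1, k2, p1, p2, k3) distortion model to Nx2 pixel points."""
    k1, k2, p1, p2, k3 = dist
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pixel coordinates -> normalized camera coordinates
    x = (pts[:, 0] - cx) / fx
    y = (pts[:, 1] - cy) / fy
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Normalized coordinates -> back to pixels
    return np.stack([x_d * fx + cx, y_d * fy + cy], axis=1)

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # hypothetical camera matrix
dist = (-0.2, 0.05, 0.0, 0.0, 0.0)                            # hypothetical coefficients
pts = np.array([[320.0, 240.0], [600.0, 100.0]])
print(distort_points(pts, K, dist))
```

With a negative k1 (barrel distortion) the principal point stays fixed while off-center points are pulled toward the center.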

Bounding boxes and distortion

This week I was mainly focused on ways to distort bounding boxes. Unfortunately, albumentations does not provide functionality for distorting bounding boxes, so I had to experiment with different approaches.

Changing nothing

The very first approach I used was simple: I just distorted the image without changing the bounding boxes. And it actually worked (but only for objects near the center).

Distorting coordinates

I used an approach from this post on the OpenCV blog. It suggests a way to redistort points. It is actually better than changing nothing.

Ellipse

Another approach was to simply change the coordinates to the form of an ellipse (x and y of the center + radii). This approach was proposed in this paper. In the paper, this approach beat the original box method, but I believe that manual annotations would be required to use it in the project.
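The "distorting coordinates" idea can be sketched as follows: sample points along the box edges, push them through a point-distortion routine, and take the axis-aligned extent of the result. Here `distort_points` is a placeholder for whatever point-redistortion function is used (e.g. the one from the OpenCV post):

```python
import numpy as np

def distort_bbox(bbox, distort_points, samples=10):
    """Distort an (x_min, y_min, x_max, y_max) box by sampling its edges,
    distorting the samples, and taking the new axis-aligned extent."""
    x0, y0, x1, y1 = bbox
    xs = np.linspace(x0, x1, samples)
    ys = np.linspace(y0, y1, samples)
    # Sample points along all four edges of the box
    edges = np.concatenate([
        np.stack([xs, np.full_like(xs, y0)], axis=1),
        np.stack([xs, np.full_like(xs, y1)], axis=1),
        np.stack([np.full_like(ys, x0), ys], axis=1),
        np.stack([np.full_like(ys, x1), ys], axis=1),
    ])
    d = distort_points(edges)
    return (d[:, 0].min(), d[:, 1].min(), d[:, 0].max(), d[:, 1].max())

# With an identity "distortion" the box is unchanged:
print(distort_bbox((10, 20, 110, 220), lambda p: p))  # (10.0, 20.0, 110.0, 220.0)
```

Sampling the edges (rather than only the four corners) matters because under lens distortion a straight edge bends, so the extreme coordinate may lie mid-edge rather than at a corner.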

Inference

This week my main goal was to develop an inference pipeline for the segmentation model. To make inference faster, I transferred the model to ONNX (my plan is to transfer it to TensorRT next). Here is the notebook with an algorithm to transfer a model from PyTorch to ONNX. The resulting model works as well as the original model does (there is no significant difference in their predictions):

Results comparison

Function-based inference

In case the model is to be run inside another script / application, I put the model into a Python module. Now this module can be used inside any other module and/or function. Also, the ONNX pipeline from the module can be used separately in any part of a service or program, which makes the recognition more flexible.

Script-based inference

I have also developed a script which accepts a directory with input images and constructs a folder with segmentation masks for it. I believe that such a script could be useful, if it is needed to mark out
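A minimal sketch of the function-based pipeline, assuming the ONNX segmentation model takes a normalized NCHW float tensor and returns per-class logits (input/output names and normalization constants are assumptions; the `onnxruntime` call is commented out so the sketch stays self-contained):

```python
import numpy as np

def preprocess(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """HWC uint8 image -> normalized NCHW float32 tensor for the ONNX session."""
    x = image.astype(np.float32) / 255.0
    x = (x - mean) / std
    x = x.transpose(2, 0, 1)[None]  # HWC -> NCHW, add batch dim
    return x.astype(np.float32)

def postprocess(logits):
    """NCHW per-class logits -> HxW mask of class ids."""
    return logits[0].argmax(axis=0).astype(np.uint8)

# Usage with onnxruntime (model path and tensor name are assumptions):
# import onnxruntime as ort
# session = ort.InferenceSession("segmentation.onnx")
# logits = session.run(None, {"input": preprocess(image)})[0]
# mask = postprocess(logits)

image = np.zeros((4, 4, 3), dtype=np.uint8)
fake_logits = np.random.randn(1, 3, 4, 4)
print(preprocess(image).shape, postprocess(fake_logits).shape)  # (1, 3, 4, 4) (4, 4)
```

Keeping preprocess/postprocess as plain functions is what lets the same module back both the function-based and the script-based (directory-of-images) entry points.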

Multi class semantic segmentation in VR

This week I was mostly improving my model for multi-class semantic segmentation.

Dataset

For this task I was using the Cholec8k Segmentation dataset. This dataset is interesting because it contains not only the segmentation of organs, but also the data needed for segmentation of surgical tools. The data in this dataset is split by videos. In total there are 17 videos, so I decided to make the train-test split based on video ids: 13 videos for training and 4 videos for validation and testing.

Model

Last week I experimented with different models, and at the beginning of this week I was sure that DeepLabV3+ should be used. DeepLabV3+ uses a decoder to better segment the boundaries of objects. This is important for medical image segmentation, since we want to know where the border between organs and surgical tools is located. That is one more reason for me to use DeepLabV3+. For training I used 2 loss functions: CE and Dice Loss. CE loss is w
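As a sketch of the combined CE + Dice objective (NumPy for illustration only; in training these would be the framework's own implementations, and the 0.5/0.5 weighting is an assumption):

```python
import numpy as np

def dice_loss(probs, target_onehot, eps=1e-6):
    """Soft Dice loss averaged over classes; inputs are (C, H, W)."""
    inter = (probs * target_onehot).sum(axis=(1, 2))
    union = probs.sum(axis=(1, 2)) + target_onehot.sum(axis=(1, 2))
    dice = (2 * inter + eps) / (union + eps)
    return 1.0 - dice.mean()

def ce_loss(probs, target_onehot, eps=1e-9):
    """Pixel-wise cross-entropy between predicted probabilities and one-hot target."""
    return -(target_onehot * np.log(probs + eps)).sum(axis=0).mean()

def combined_loss(probs, target_onehot, w_dice=0.5):
    return w_dice * dice_loss(probs, target_onehot) + (1 - w_dice) * ce_loss(probs, target_onehot)

# A perfect prediction drives both terms to (near) zero:
target = np.eye(3)[np.random.randint(0, 3, size=(8, 8))].transpose(2, 0, 1)
print(combined_loss(target, target))
```

Combining the two is a common choice for segmentation: CE gives well-behaved per-pixel gradients, while Dice directly targets region overlap and is less sensitive to class imbalance.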