3 main challenges for training algorithms in the medical field

The challenges of training algorithms, and their solutions

Alamin Musa Magaga
Analytics Vidhya


The advancement of artificial intelligence has not been limited to e-commerce, chatbots, security, and weather forecasting; it has also reached countless healthcare applications and medical concepts.

Artificial intelligence has been widely used in numerous medical applications, such as disease prognosis, diagnosis, surgical robots, drug recommendations, genetic codes, expert systems, and many more.

The use of AI, or artificial intelligence, has redefined and modified the present standard of medical practice by simplifying medical concepts and techniques, thereby reducing time-consuming processes and stress for both patients and healthcare practitioners.

Artificial intelligence in healthcare is the use of complex algorithms and software to emulate human cognition in the comprehension, analysis, and interpretation of complicated medical and healthcare data. (Wikipedia)

This turning-point application of AI does not come about without challenges, especially during the training and inception of the algorithms.

Challenges for training algorithms in AI

  1. Class imbalance
  2. Multi-task
  3. Data size

1. Class imbalance

Note: a mass is damaged tissue or a lesion, seen on an X-ray, that is greater than 3 cm in diameter.

Class imbalance is an unequal number of disease and non-disease examples in a medical dataset, for example unequal numbers of mass and normal examples.

In class imbalance, there may be more examples of people without the disease (normal) than people with the disease, because non-disease examples occur far more frequently than disease examples in the real world. With only a few disease examples in the dataset, the model will predict poorly, misclassifying the disease or assigning it a low probability, which can be fatal in the medical field.

This challenge of class imbalance can be solved by:

  • Resampling method
  • Weighted loss

Resampling the dataset

The idea behind resampling is to reorganize and resample the dataset so that there is an equal number of disease and non-disease examples. We can achieve this by resampling parts of the disease and non-disease (mass and normal) examples. In resampling, the mass and normal examples may be oversampled or undersampled.

In the table above, there are more examples of normal, so we oversample the mass examples to balance the dataset.
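As a minimal sketch, oversampling the minority class can be done in plain Python; the class counts here mirror the worked example that follows (6 normal, 2 mass), and this is only one of several possible resampling schemes:

```python
import random

random.seed(0)

# Toy dataset mirroring the article's example: 6 "normal" and 2 "mass" labels.
dataset = ["normal"] * 6 + ["mass"] * 2

normal = [x for x in dataset if x == "normal"]
mass = [x for x in dataset if x == "mass"]

# Oversample the minority class (mass) with replacement until it matches
# the majority class size.
oversampled_mass = random.choices(mass, k=len(normal))

balanced = normal + oversampled_mass
print(len(normal), len(oversampled_mass))  # prints "6 6"
```

In practice you would resample (image, label) pairs rather than labels alone, but the balancing logic is the same.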

Weighted loss

The problem of class imbalance can be traced to the loss function, here binary cross-entropy loss. It measures the error between the output probability and the desired label, i.e. the performance of a classification model whose output is between 0 and 1.

Finding the total loss (the model is assumed to output a probability of 0.5 for every example, and the logs are base 10, so -log(0.5) is approximately 0.3):

Normal:

Loss=-log(1 - 0.5)=0.3

Loss=0.3

Total loss from Normal examples=loss x no of normal examples

=0.3 x 6

=1.8

Mass(disease):

Loss=-log(0.5)=0.3 (for a positive example, the loss term is -log(p) rather than -log(1 - p))

Loss=0.3

Total loss from Mass examples=loss x no of mass examples

=0.3 x 2

=0.6

From this calculation, the total loss from the normal examples contributes more to the overall loss than the total loss from the mass examples, so we solve this problem by modifying the loss function.
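The totals above can be reproduced in a few lines of Python, under the same assumptions (an output probability of 0.5 for every example, base-10 logarithms):

```python
import math

p = 0.5  # assumed output probability for every example
loss_per_example = -math.log10(1 - p)  # equals -log10(p) here, ~0.301

n_normal, n_mass = 6, 2
total_normal = loss_per_example * n_normal  # ~1.8
total_mass = loss_per_example * n_mass      # ~0.6
print(round(total_normal, 1), round(total_mass, 1))  # prints "1.8 0.6"
```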

Modified loss function:

L = -wp x y x log(p) - wn x (1 - y) x log(1 - p)

where wp is the weight applied to positive (mass) examples and wn is the weight applied to negative (normal) examples.

Calculating the weighted loss:

Wp=number of negative examples/total number of examples

Wn=number of positive examples/total number of examples

Wp=6/8

Wn=2/8

Normal:

L=0.3 x wn=0.3 x 2/8

L=0.075

Total loss from normal=0.075 x 6

=0.45

Mass(disease):

L=0.3 x wp=0.3 x 6/8

L=0.225

Total loss from Mass=0.225 x 2

=0.45

From these calculations, the total losses from the Mass and Normal examples are equal, thus addressing the class imbalance problem.
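The weighted version of the same calculation can be sketched as follows; note the weights exactly mirror the article's convention (the weight for positive examples is the fraction of negatives, and vice versa):

```python
import math

n_normal, n_mass = 6, 2
total = n_normal + n_mass

wp = n_normal / total  # 6/8, applied to mass (positive) examples
wn = n_mass / total    # 2/8, applied to normal (negative) examples

loss = -math.log10(0.5)              # ~0.3 per example at p = 0.5
total_normal = wn * loss * n_normal  # ~0.45
total_mass = wp * loss * n_mass      # ~0.45
print(round(total_normal, 2), round(total_mass, 2))  # prints "0.45 0.45"
```

The two class totals come out equal by construction: each per-example loss is scaled by the frequency of the opposite class.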

2. Multi-task challenge

We can classify the presence or absence of multiple diseases with a single model or algorithm; this is called multi-task learning. In multi-task learning, we use a single model to predict multiple diseases.

We use the same resampling method to address class imbalance in the multi-task setting, and we can also modify the loss function of the classes as we did earlier, but with a per-task formula.

Modified loss function
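One common form of the multi-task loss sums a per-task weighted binary cross-entropy, with each task's weights computed from its own class counts. A minimal sketch follows; the task names and class counts are made up for illustration:

```python
import math

def weighted_bce(p, y, wp, wn):
    # Weighted binary cross-entropy for one example of one task
    # (base-10 logs, matching the earlier calculations).
    return -(wp * y * math.log10(p) + wn * (1 - y) * math.log10(1 - p))

# Hypothetical three-task setup: one model outputs a probability per disease.
predictions = {"mass": 0.5, "pneumonia": 0.5, "edema": 0.5}
labels = {"mass": 1, "pneumonia": 0, "edema": 0}
weights = {  # (wp, wn) computed per task from that task's own class counts
    "mass": (6 / 8, 2 / 8),
    "pneumonia": (5 / 8, 3 / 8),
    "edema": (7 / 8, 1 / 8),
}

# The multi-task loss sums the weighted loss over all tasks.
total_loss = sum(
    weighted_bce(predictions[t], labels[t], wp, wn)
    for t, (wp, wn) in weights.items()
)
print(round(total_loss, 4))  # prints "0.3763"
```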

3. Data size challenge

In the case of a small dataset, say we don't have hundreds of thousands or millions of images, we can apply some techniques to generate more training images for the model.

We can apply some transformations to the images, for example:

  • rotate the images
  • translate the images sideways
  • change the brightness of the images
  • resize the images
  • zoom in on the images

Applying these changes is called data augmentation.

The modified images are then passed into the model so that it trains on more images, increasing the diversity of the training data.

With the application of augmentation, we will have more training images, which helps reduce overfitting.
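A few of the transformations listed above can be sketched with plain NumPy on a toy stand-in for an X-ray image; real pipelines typically use a library such as Keras's ImageDataGenerator or torchvision transforms instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a grayscale X-ray: an 8x8 image with pixel values in [0, 1].
image = rng.random((8, 8))

rotated = np.rot90(image)                     # rotate by 90 degrees
translated = np.roll(image, shift=2, axis=1)  # shift sideways by 2 pixels
brighter = np.clip(image * 1.2, 0.0, 1.0)     # increase brightness
flipped = np.fliplr(image)                    # mirror horizontally

augmented = [rotated, translated, brighter, flipped]
print(len(augmented), all(a.shape == image.shape for a in augmented))
```

Each transformed copy is a new training example for the model, at no extra labeling cost.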



Data Scientist | Developer | Embedded System Engineer | Zindi Ambassador | Omdena Kano Lead | Youth Opportunities Ambassador | CTO YandyTech