Accurate and Reliable Classification of Disc Degeneration Ba
Although magnetic resonance imaging-based formalized grading schemes for intervertebral disc degeneration offer improved reproducibility compared with purely subjective ratings, their intrarater and interrater reliability is not nearly good enough to detect small to medium effects in clinical longitudinal studies. The study therefore aimed to develop a method that enables automatic, and therefore reproducible and reliable, evaluation of disc degeneration based on conventional clinical image data and Pfirrmann's grading scheme.

The researchers proposed a classifier based on a deep convolutional neural network, trained on a large, manually evaluated data set of 1599 patients (7948 intervertebral discs). To improve upon the status quo, they focused on the quality of the training data and performed extensive hyperparameter optimization. They also assessed the potential benefits of loss functions beyond the common cross-entropy loss, such as soft kappa loss, ordinal cross-entropy loss, and regression losses. During model development and hyperparameter optimization, a fixed 90%/10% training/validation split was used. To estimate real-world prediction performance, 10-fold cross-validation was performed.
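
The two evaluation protocols mentioned above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names `fixed_split` and `k_fold`, the seed, and the shuffling strategy are assumptions, and the summary does not say whether the split was made per patient or per disc, so the sketch simply splits a generic list of item IDs.

```python
import random


def fixed_split(items, val_fraction=0.1, seed=0):
    """Shuffle once with a fixed seed and hold out val_fraction for validation.

    Hypothetical helper: mirrors the fixed 90%/10% split used during
    model development, so every experiment sees the same validation set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def k_fold(items, k=10, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold.

    Hypothetical helper: mirrors 10-fold cross-validation, where the model
    is retrained k times and each fold serves once as unseen test data.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


if __name__ == "__main__":
    ids = list(range(1599))  # assumption: one ID per patient
    train, val = fixed_split(ids)
    print(len(train), len(val))
    for fold_no, (tr, te) in enumerate(k_fold(ids), start=1):
        print(fold_no, len(tr), len(te))
```

The fixed split keeps hyperparameter comparisons consistent, while the cross-validation loop gives a performance estimate that does not depend on one particular held-out set.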

The evaluated image data result in a Gaussian distribution of degeneration grades, so grades 1 and 5 are slightly underrepresented in the training set. The default cross-entropy–based classifier achieves a reliability of κ = 0.92 (Cohen), an average sensitivity of 90.2%, and an average precision of 92.5%. In 99.2% of validation cases, the network's prediction deviates by at most 1 Pfirrmann grade from the ground truth. Framed as an ordinal regression problem, the mean absolute error between the ground truth and the prediction is 0.08 Pfirrmann grade, with a correlation of r = 0.96. The results of the 10-fold cross-validation confirm these performance estimates, indicating no substantial overfitting. More sophisticated loss functions, class-based loss weighting, and class pooling did not improve overall classification performance.
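
For readers unfamiliar with the reported metrics, the following sketch shows how they could be computed from paired lists of ground-truth and predicted Pfirrmann grades. It rests on assumptions: the summary does not say whether Cohen's kappa was weighted, so the unweighted form is used here, "average" sensitivity and precision are taken to be unweighted means over the five grades, and all function names and the toy data are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_absolute_error(y_true, y_pred):
    """Average distance, in grades, between prediction and ground truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def within_one_grade(y_true, y_pred):
    """Fraction of cases where the prediction is off by at most one grade."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_sensitivity_precision(y_true, y_pred, grades=(1, 2, 3, 4, 5)):
    """Per-grade sensitivity and precision, averaged without class weights."""
    sens, prec = [], []
    for g in grades:
        tp = sum(t == g and p == g for t, p in zip(y_true, y_pred))
        actual = sum(t == g for t in y_true)
        predicted = sum(p == g for p in y_pred)
        if actual:
            sens.append(tp / actual)
        if predicted:
            prec.append(tp / predicted)
    return sum(sens) / len(sens), sum(prec) / len(prec)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    truth = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
    preds = [1, 2, 3, 3, 3, 3, 4, 5, 5, 5]
    print(cohen_kappa(truth, preds))
    print(mean_absolute_error(truth, preds))
    print(within_one_grade(truth, preds))
    print(macro_sensitivity_precision(truth, preds))
```

Kappa and sensitivity treat every misclassification alike, whereas the mean absolute error and the within-one-grade rate reflect the ordinal nature of the Pfirrmann scale, which is why the summary reports both kinds of figure.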

In conclusion, with a reliability of κ > 0.9, the system clearly outperforms average human interrater as well as intrarater reliability. With an average sensitivity of more than 90%, the classifier also surpasses state-of-the-art machine learning solutions for automatically grading disc degeneration.

Journal: Investigative Radiology