Computed tomography (CT) of the head is used worldwide to diagnose neurologic emergencies. However, expertise is required to interpret these scans, and even highly trained experts may miss subtle life-threatening findings. For head CT, the unique challenge is to identify, with perfect or near-perfect sensitivity and very high specificity, often small and subtle abnormalities on a multislice cross-sectional (3D) imaging modality characterized by poor soft tissue contrast, a low signal-to-noise ratio under current low-radiation-dose protocols, and a high incidence of artifacts.

We view the task as a semantic segmentation problem and tackle it with a patch-based fully convolutional network (PatchFCN). To develop the model, we collected a dataset of 4396 head CT scans performed at the University of California, San Francisco and affiliated hospitals, and compared the algorithm's performance to that of 4 American Board of Radiology (ABR) certified radiologists on an independent test set of 200 randomly selected head CT scans. Our algorithm demonstrates the highest accuracy to date for this clinical application, with a receiver operating characteristic (ROC) area under the curve (AUC) of 0.991 ± 0.006 for identification of exams positive for acute intracranial hemorrhage, and exceeds the performance of 2 of the 4 radiologists. We demonstrate an end-to-end network that performs joint classification and segmentation, with exam-level classification comparable to experts and robust localization of abnormalities, including some missed by radiologists; both are critically important elements for this application. Furthermore, in an exploratory study we demonstrate promising multiclass segmentation and detection results competitive with the state of the art.
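The patch-based inference idea can be illustrated with a minimal sketch: slide a window over a CT slice, run a per-patch segmentation model, average overlapping predictions into a pixel-level map, and pool that map into an exam-level score. The patch size, stride, max-pooled exam score, and stub model below are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def patch_inference(ct_slice, model, patch=64, stride=32):
    """Run a patch-wise segmentation model over a 2D CT slice and
    stitch per-pixel scores together by averaging overlapping patches.

    ct_slice : (H, W) array, one CT slice
    model    : callable mapping a (patch, patch) array to per-pixel
               hemorrhage probabilities of the same shape
    """
    H, W = ct_slice.shape
    scores = np.zeros((H, W))
    counts = np.zeros((H, W))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            pred = model(ct_slice[y:y + patch, x:x + patch])
            scores[y:y + patch, x:x + patch] += pred
            counts[y:y + patch, x:x + patch] += 1
    seg = scores / np.maximum(counts, 1)  # averaged segmentation map
    exam_score = seg.max()  # exam-level score: max pixel probability (an assumption)
    return seg, exam_score
```

Averaging the overlaps smooths boundary artifacts at patch edges, and deriving the exam-level score from the segmentation map is one simple way to couple the classification and segmentation outputs.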

Finally, we study how to scale up the data without naive labeling by building a cost-sensitive active learning system. Our method compares favorably with the state of the art while running faster and using less memory. The approach is inspired by the observation that labeling time can vary greatly across examples; we therefore model the labeling time and optimize the return on investment. We validate this idea through core-set selection and by collecting new data from the wild. Our method accurately estimates human annotation time and shows a clear performance gain under a fixed annotation budget.
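The return-on-investment idea above can be sketched as a greedy selector: rank unlabeled examples by informativeness per unit of predicted labeling time, then fill a fixed time budget. The informativeness scores and time estimates here are stand-ins for model outputs, not the system described in the dissertation.

```python
def select_batch(examples, budget):
    """Greedy cost-sensitive selection under a fixed annotation-time budget.

    examples : list of (id, value, est_time) tuples, where `value` is an
               informativeness score (e.g. model uncertainty) and
               `est_time` a predicted labeling time in seconds
    budget   : total annotation time available, in seconds
    Returns the chosen ids, ordered by return on investment.
    """
    # Rank by value gained per second of annotator effort.
    ranked = sorted(examples, key=lambda e: e[1] / e[2], reverse=True)
    chosen, spent = [], 0.0
    for ex_id, value, est_time in ranked:
        if spent + est_time <= budget:
            chosen.append(ex_id)
            spent += est_time
    return chosen
```

Under this criterion a cheap, moderately informative example can outrank an expensive, highly informative one, which is the point of modeling labeling time rather than treating every annotation as equally costly.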
