Jacob Braun, Michael Hsieh, Samir Fouissi, Anthony Bilic, Liqian Wang, Chen Chen


Necropsy, Whole Slide Image, Digital Pathology


Pathologists diagnose biopsy samples by examining stained specimens on glass slides under a microscope. The entire specimen can be scanned and stored as a Whole Slide Image (WSI) for further analysis. However, managing and manually diagnosing hundreds of images is time-consuming and requires specialized expertise. As a result, there is extensive ongoing research into computer-aided diagnosis of digitally acquired pathology images. Deep learning has gained significant attention for its effectiveness in disease classification and in segmenting cancer cells in pathology images. Building a robust and accurate deep learning model requires a large number of annotated images, yet large public collections of annotated pathology images for building and validating new models are scarce. To address this limitation, we introduce a public dataset of histopathology WSIs obtained from cadavers, containing tissues from multiple organs such as the lung, kidney, liver, and pancreas. We extract patches from each WSI while discarding the white (background) regions of the slide. We then train an ImageNet-pretrained model on our processed dataset to classify patches from the WSIs. This paper provides access to the full set of ~1700 WSIs, with accurate labels assigned by trained pathologists. Because it contains a large number of WSIs with millions of extracted patches representing 15-20 organ classes, our dataset can serve as a benchmark for training and validating deep learning models.
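The patch-extraction step (tiling a slide and discarding white space) can be illustrated with a minimal sketch. It assumes the slide region is already loaded as an RGB NumPy array; real WSIs are usually read region by region with a library such as OpenSlide. The function name `extract_tissue_patches` and the parameters `white_threshold` and `max_white_fraction` are hypothetical, and their default values are illustrative rather than the values used for this dataset.

```python
import numpy as np


def extract_tissue_patches(image, patch_size=256, white_threshold=220, max_white_fraction=0.5):
    """Tile an RGB image into non-overlapping patches and drop mostly-white ones.

    Hypothetical helper (not the authors' code).
    image: H x W x 3 uint8 array, e.g. a region read from a WSI.
    A pixel counts as background ("white") when all three channels exceed
    white_threshold. Patches whose background fraction exceeds
    max_white_fraction are discarded. Partial patches at the right and
    bottom edges are skipped.
    Returns a list of (row, col, patch) tuples, where row/col are the
    top-left pixel coordinates of each kept patch.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    height, width, _ = image.shape
    # Boolean mask of near-white background pixels.
    background = np.all(image > white_threshold, axis=2)
    kept = []
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            # Fraction of background pixels inside this tile.
            white_fraction = background[row:row + patch_size, col:col + patch_size].mean()
            if white_fraction <= max_white_fraction:
                kept.append((row, col, image[row:row + patch_size, col:col + patch_size]))
    return kept


if __name__ == "__main__":
    # Synthetic 512 x 512 "slide": white background with one tissue-colored quadrant.
    slide = np.full((512, 512, 3), 255, dtype=np.uint8)
    slide[:256, :256] = (180, 90, 150)  # pinkish, H&E-like stain color
    patches = extract_tissue_patches(slide, patch_size=256)
    print([(r, c) for r, c, _ in patches])  # [(0, 0)]
```

A fixed brightness threshold is the simplest background filter; other common choices, such as Otsu thresholding on a grayscale or saturation channel, adapt better to slides with uneven illumination.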
