Ⅰ. Introduction
Vessel analysis on retinal images can help to identify health problems at an early stage.1) More specifically, diseases such as diabetic retinopathy or age-related macular degeneration (AMD) can be inferred from an analysis of the macular vessels, which may show occlusion or hemorrhage.2, 3) Even though this approach is generally reliable, detecting occlusions of blood vessels in fundus photographs of patients with diabetes or hypertension remains difficult; likewise, confirming the morphology of micro-vessels such as neovascularization is harder than for larger vessels. Accurate analysis of the retinal vessel structure, with the aim of early diagnosis of retinal disease, is therefore an important area of research with a wide range of practical applications.4)
One of the first automatic processing algorithms was developed in 2002 by Walter et al., who extracted exudates in retinal images based on morphological characteristics that are common in patients with diabetic retinopathy.5) Nearest-neighbor classifiers were subsequently developed to distinguish the blood vessels in retinal images, which further improved accuracy by subdividing and analyzing the features of the vessel image.6-8)
For the purpose of vessel segmentation, Wang et al. applied computer vision techniques in 2000.9) These showed reasonably high accuracy on test retinal images; however, reliable extraction of branch vessels such as microvascular or neovascular vessels required more advanced algorithm architectures that could also distinguish normal from abnormal vessels.10, 11) Recently, Artificial Neural Networks (ANNs) have experienced a renaissance in image processing and analysis. ANNs are a class of algorithms that mimic the way the human brain processes patterns and can robustly learn categories from large amounts of data. Convolutional Neural Networks (CNNs), in particular, are a special class of ANNs that have received much attention for their performance in image interpretation tasks; Fu et al. developed such a model for vessel segmentation in 2016, showing impressive segmentation accuracy.12)
Current CNN architectures usually operate in a patch-based fashion: small patches are extracted from the image, and each pixel in each patch is classified as vessel or non-vessel. Since patches need to be small for robust learning, such an architecture is slow both during training and, more importantly, during testing. In the present paper, we address this problem and develop a novel deep-learning-based architecture for vessel segmentation called DirectNet.
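To make this test-time cost concrete, the following minimal Python sketch illustrates the patch-based paradigm; the 27×27 patch size and the classifier are placeholders, not the baseline's actual settings. One forward pass is needed per pixel, so a single 584×565 DRIVE image already yields about 330,000 patches.

```python
# Illustrative sketch of patch-based segmentation (grayscale image assumed):
# every pixel is classified from the small patch centered on it.
import numpy as np

def segment_patchwise(image, classify_patch, patch=27):
    h, w = image.shape
    pad = patch // 2
    padded = np.pad(image, pad, mode="reflect")
    prob = np.empty((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            # one classifier forward pass per pixel: h * w (~330,000 for a
            # 584 x 565 image) passes in total, hence the long test times
            prob[y, x] = classify_patch(padded[y:y + patch, x:x + patch])
    return prob
```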
Ⅱ. Methods
DirectNet is a fully convolutional neural network that approaches retinal image segmentation as an image-to-image translation task. In a traditional feedforward CNN, data flows through the network continuously in one direction, from the input layer to the output layer. In contrast, we propose the use of recurrent structures to build a compact, yet sufficiently complex model, yielding an architecture that allows very fast analysis while maintaining accuracy.
Our network consists of a pyramid-shaped stack of recurring blocks of convolutional layers, as depicted in Figure 1b. Data flows through the network in four stages and is processed repeatedly by four distinct blocks, each consisting of a set of convolutional layers. At stage one, the input image is processed by block 1 (the red block in Figure 1b). Its outputs are passed on to block 2 (green) but also fed back into block 1. At the next stage, outputs of block 1 are again given directly to block 1, and outputs of block 2 are given to block 3 (blue); outputs of blocks 1 and 2 are merged and passed to block 2. This process continues in the same fashion through stages 3 and 4. Finally, the outputs of all blocks at stage 4 are combined into a joint prediction: a vessel probability map of the same size as the input image.
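For illustration, a minimal Keras sketch of this recurrent pyramid is given below. It reflects one possible reading of Figure 1b; the filter count, the kernel-size assignment, and the exact merge rule are our assumptions, not the published configuration. The key point is that each block is a single layer object reused across stages, so its weights are shared.

```python
# Illustrative sketch of the recurrent block pyramid (one possible reading
# of Figure 1b); widths, kernel assignment and merge rule are assumptions.
from tensorflow.keras import Input, Model, layers

F = 32                                        # assumed block width
inp = Input(shape=(None, None, 1))            # whole image in, no patches
x0 = layers.Conv2D(F, 1, activation="relu")(inp)  # project to F channels

# Four distinct blocks; calling the same layer object at several stages
# shares its weights, which is what makes the structure recurrent.
blocks = [layers.SeparableConv2D(F, k, padding="same", activation="relu",
                                 name=f"block{i + 1}")
          for i, k in enumerate((5, 7, 7, 15))]

state = [x0] * 4                              # latest output of each block
for stage in range(4):                        # stages 1-4
    prev = list(state)
    for i in range(min(stage + 1, 4)):        # pyramid: one more block per stage
        neighbour = prev[i - 1] if i > 0 else x0
        merged = layers.concatenate([neighbour, prev[i]])  # always 2F channels
        state[i] = blocks[i](merged)          # same weights at every stage

out = layers.Conv2D(1, 1, activation="sigmoid")(layers.concatenate(state))
model = Model(inp, out)                       # vessel probability map, same size
```

Because every layer is convolutional, the same model accepts the full 584×565 image at test time, which is what removes the patch loop of conventional approaches.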
Most current CNN models combine small 3×3 convolutional kernels with pooling layers to reduce the number of model parameters.14) Since the DirectNet architecture presented here relies on a rather shallow network design with fewer layers, it does not contain any pooling layers. Instead, the required receptive field size is achieved by larger kernel sizes (5×5, 7×7, 15×15) combined with the aforementioned recurrent structure of the network, which allows information to be stored and propagated across the image. The increase in computational complexity introduced by larger kernels is mitigated by depthwise separable convolutions, as proposed by Chollet.15) By separating the spatial convolution from the convolution across image channels, this method can reduce computation time by more than 30% compared to standard convolutions.
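As a quick check of this saving, the following sketch compares the parameter counts of a standard and a depthwise separable 15×15 convolution in Keras; the 32 input and output channels are illustrative, not DirectNet's actual widths.

```python
# Parameter counts of a standard vs. a depthwise separable 15x15 convolution.
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(None, None, 32))
std = layers.Conv2D(32, 15, padding="same")(inp)           # joint spatial + channel
sep = layers.SeparableConv2D(32, 15, padding="same")(inp)  # depthwise, then 1x1

print(Model(inp, std).count_params())  # 15*15*32*32 + 32       = 230,432
print(Model(inp, sep).count_params())  # 15*15*32 + 32*32 + 32  =   8,256
```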
Ⅲ. Results
The proposed DirectNet model for retinal vessel segmentation was implemented using the Keras library with a TensorFlow backend and evaluated on the DRIVE dataset, the most commonly used dataset for vessel segmentation.6) The testing procedure followed the common methodology of using the annotations of the first human observer as ground truth (annotations of the second observer are usually only used to study human performance). The DRIVE dataset contains 40 fundus images, split into 20 images for training and 20 images for testing. All images were cropped to an input size of 584 × 565 pixels.
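For reference, a minimal sketch of loading one DRIVE training pair is shown below; the file names follow the public DRIVE distribution's layout, but the paths are placeholders to be adjusted to the local copy.

```python
# Load one DRIVE image and its first-observer annotation (placeholder paths).
import numpy as np
from PIL import Image

img = np.asarray(Image.open("DRIVE/training/images/21_training.tif"))
gt = np.asarray(Image.open("DRIVE/training/1st_manual/21_manual1.gif")) > 0

img = img[:584, :565]           # crop to the 584 x 565 input size
gt = gt[:584, :565]
print(img.shape, gt.shape)      # (584, 565, 3) (584, 565)
```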
To compare the DirectNet architecture to a standard, state-of-the-art method, a patch-based CNN was trained on DRIVE to serve as a baseline, using a publicly available implementation* of the U-Net architecture.13) DirectNet has a total of 273,668 parameters (see Table 1), whereas the U-Net implementation has 517,666. All experiments were run on an Intel Core i7 processor with 16 GB RAM and a GeForce GTX 1080Ti graphics card.
Performance of the vessel segmentation can be described with several metrics, including the F1-score, accuracy, sensitivity, specificity, precision, and the area under the ROC curve. These metrics are defined in (1):

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, &
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
F_1 &= \frac{2\,TP}{2\,TP + FP + FN},
\end{aligned}
\tag{1}
$$
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative classifications, respectively. The area under the ROC curve was calculated using a standard Python library implementation. Segmentation results in the form of probability maps were converted to binary images with a fixed threshold of 0.5. Since the threshold value can affect the results, the automatic Otsu threshold selection method was also tried,16) but did not yield better results (Figure 2).
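A minimal sketch of this evaluation, assuming scikit-learn for the ROC AUC and scikit-image for the Otsu threshold, could look as follows:

```python
# Compute the metrics in (1) from a probability map and a binary ground truth.
import numpy as np
from sklearn.metrics import roc_auc_score
from skimage.filters import threshold_otsu

def evaluate(prob_map, ground_truth, threshold=0.5):
    pred = prob_map >= threshold                   # fixed 0.5 threshold
    # pred = prob_map >= threshold_otsu(prob_map)  # automatic alternative
    gt = ground_truth.astype(bool)
    tp = np.sum(pred & gt);  tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "f1":          2 * tp / (2 * tp + fp + fn),
        "auc":         roc_auc_score(gt.ravel(), prob_map.ravel()),
    }
```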
As shown in Table 2, DirectNet scored higher than the patch-based CNN on all metrics. In particular, the F1-score of 0.8124, compared to 0.7653, shows a strong increase in performance (note that the training and testing paradigm used in the field so far prohibits multiple dataset splits, so statistical tests or Bland-Altman plots cannot be run). This improvement was mostly driven by an increase in sensitivity, i.e., the algorithm's ability to correctly detect vessel pixels. Additionally, DirectNet showed a substantial speed-up over the U-Net architecture: training the patch-based CNN took 8 hours, whereas DirectNet took only 1 hour. Similarly, during testing, U-Net took 1 hour to process all patches of one retinal image, whereas DirectNet finished the same task in 6 seconds, a roughly 600-fold speed-up.
Figure 3 shows a qualitative evaluation of segmentation results on the first four images in DRIVE. The original retinal photographs are shown in the first column; the second and third columns show the human annotations and the vessel segmentation maps produced by DirectNet, respectively.
Finally, DirectNet was also compared to another recent deep-learning method proposed by Liskowski et al.17) This method required 8 hours of training on 400,000 sample patches extracted from the 20 DRIVE training images. Both methods achieved comparable accuracy (0.9535 vs. 0.9538) and virtually identical ROC performance (AUC of 0.9790 for DirectNet vs. 0.9733 for Liskowski et al.). Importantly, at test time DirectNet was still more than 15 times faster, as the competing approach took 92 seconds per image (see Table 2 for all results).
Ⅳ. Conclusion
This study presents a novel method for retinal blood vessel segmentation that is time- and memory-efficient while providing high segmentation accuracy. The proposed recurrent DirectNet architecture is compact (low parameter count) and requires neither patch-based scanning nor any post-processing steps. It predicts a segmentation map by operating directly on the image, without the prior up- or downsampling steps necessary in other approaches. DirectNet was benchmarked against two other state-of-the-art methods on the DRIVE dataset, matching or surpassing state-of-the-art performance in terms of accuracy, sensitivity, and specificity. Importantly for practical implementations, the proposed DirectNet architecture is at least one order of magnitude faster than traditional patch-based CNNs.
Vessel segmentation is only the first step in an automatic analysis pipeline that can be deployed in clinical practice. In the future, our goal is to derive features from the segmented vessels that can help diagnose certain types of retinopathy, such as edema, or early signs of age-related macular degeneration (AMD). However, specialized diagnostic tasks based on retinal images will require far more training data than currently available datasets such as DRIVE provide. Especially in these cases, efficient architectures like the proposed DirectNet will be necessary, both for training on large datasets and in clinical application use cases.