Ⅰ. Introduction
Vessel analysis on retinal images can help to identify health problems at an early stage.1) More specifically, diseases such as diabetic retinopathy or age-related macular degeneration (AMD) can be inferred from an analysis of the macular vessels, which may show occlusion or hemorrhage.2, 3) Even though this approach is generally reliable, detecting occlusions of blood vessels in fundus photographs of patients with diabetes or hypertension remains difficult; likewise, confirming the morphology of micro-vessels such as neovascularization is harder than for larger vessels. Accurate analysis of the retinal vessel structure, with the aim of early diagnosis of retinal disease, is therefore an important area of research with a wide range of practical applications.4)
One of the first automatic processing algorithms was developed in 2002 by Walter et al., who extracted exudates in retinal images based on morphological characteristics that are common in patients with diabetic retinopathy.5) Nearest-neighbor classifiers were subsequently developed to distinguish the blood vessels in retinal images, which further improved accuracy by subdividing and analyzing the features of the vessel image.6-8)
For the purpose of vessel segmentation, Wang et al. applied computer vision techniques in 2000.9) These showed reasonably high accuracy on test retinal images; however, reliable extraction of branch vessels such as microvascular or neovascular vessels required more advanced algorithm architectures that could also distinguish normal from abnormal vessels.10, 11) Recently, Artificial Neural Networks (ANNs) have experienced a renaissance in image processing and analysis. ANNs are a class of algorithms that mimic the way the human brain processes patterns and can robustly learn categories from large amounts of data. Convolutional Neural Networks (CNNs), in particular, are a special class of ANNs that have received much attention for their performance in image interpretation tasks; Fu et al. developed such a model for vessel segmentation in 2016, showing impressive segmentation accuracy.12)
Current CNN architectures usually operate in a patch-based fashion: small patches are extracted from the image, and each pixel in each patch is classified as vessel or non-vessel. Since patches need to be small for robust learning, such an architecture is slow both during training and, more importantly, during testing. In the present paper, we address this problem and develop a novel deep-learning-based architecture for vessel segmentation called DirectNet.
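To make this test-time cost concrete, the following minimal Python sketch illustrates the patch-based paradigm; the 27×27 patch size and the classifier are placeholders, not the baseline's actual settings. One forward pass is needed per pixel, so a single 584×565 DRIVE image already yields about 330,000 patches.

```python
# Illustrative sketch of patch-based segmentation (grayscale image assumed):
# every pixel is classified from the small patch centered on it.
import numpy as np

def segment_patchwise(image, classify_patch, patch=27):
    h, w = image.shape
    pad = patch // 2
    padded = np.pad(image, pad, mode="reflect")
    prob = np.empty((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            # one classifier forward pass per pixel: h * w (~330,000 for a
            # 584 x 565 image) passes in total, hence the long test times
            prob[y, x] = classify_patch(padded[y:y + patch, x:x + patch])
    return prob
```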
Ⅱ. Methods
DirectNet is a fully convolutional neural network that approaches retinal image segmentation as an image-to-image translation task. In a traditional feedforward CNN, data flows through the network continuously in one direction, from the input layer to the output layer. In contrast, we propose the use of recurrent structures to build a compact, yet sufficiently complex model, yielding an architecture that allows very fast analysis while maintaining accuracy.
Our network consists of a pyramid-shaped stack of recurring blocks of convolutional layers, as depicted in Figure 1b. Data flows through the network in four stages and is processed repeatedly by four distinct blocks, each consisting of a set of convolutional layers. At stage one, the input image is processed by block 1 (the red block in Figure 1b). Its outputs are passed on to block 2 (green) but also fed back into block 1. At the next stage, outputs of block 1 are again given directly to block 1, and outputs of block 2 are given to block 3 (blue); outputs of blocks 1 and 2 are merged and passed to block 2. This process continues in the same fashion through stages 3 and 4. Finally, the outputs of all blocks at stage 4 are combined into a joint prediction: a vessel probability map of the same size as the input image.
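For illustration, a minimal Keras sketch of this recurrent pyramid is given below. It reflects one possible reading of Figure 1b; the filter count, the kernel-size assignment, and the exact merge rule are our assumptions, not the published configuration. The key point is that each block is a single layer object reused across stages, so its weights are shared.

```python
# Illustrative sketch of the recurrent block pyramid (one possible reading
# of Figure 1b); widths, kernel assignment and merge rule are assumptions.
from tensorflow.keras import Input, Model, layers

F = 32                                        # assumed block width
inp = Input(shape=(None, None, 1))            # whole image in, no patches
x0 = layers.Conv2D(F, 1, activation="relu")(inp)  # project to F channels

# Four distinct blocks; calling the same layer object at several stages
# shares its weights, which is what makes the structure recurrent.
blocks = [layers.SeparableConv2D(F, k, padding="same", activation="relu",
                                 name=f"block{i + 1}")
          for i, k in enumerate((5, 7, 7, 15))]

state = [x0] * 4                              # latest output of each block
for stage in range(4):                        # stages 1-4
    prev = list(state)
    for i in range(min(stage + 1, 4)):        # pyramid: one more block per stage
        neighbour = prev[i - 1] if i > 0 else x0
        merged = layers.concatenate([neighbour, prev[i]])  # always 2F channels
        state[i] = blocks[i](merged)          # same weights at every stage

out = layers.Conv2D(1, 1, activation="sigmoid")(layers.concatenate(state))
model = Model(inp, out)                       # vessel probability map, same size
```

Because every layer is convolutional, the same model accepts the full 584×565 image at test time, which is what removes the patch loop of conventional approaches.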
Most current CNN models combine small 3×3 convolutional kernels with pooling layers to reduce the number of model parameters.14) Since the DirectNet architecture presented here relies on a rather shallow network design with fewer layers, it does not contain any pooling layers. Instead, the required receptive field size is achieved by larger kernel sizes (5×5, 7×7, 15×15) combined with the aforementioned recurrent structure of the network, which allows information to be stored and propagated across the image. The increase in computational complexity introduced by larger kernels is mitigated by depthwise separable convolutions, as proposed by Chollet.15) By separating the spatial convolution from the convolution across image channels, this method can reduce computation time by more than 30% compared to standard convolutions.
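As a quick check of this saving, the following sketch compares the parameter counts of a standard and a depthwise separable 15×15 convolution in Keras; the 32 input and output channels are illustrative, not DirectNet's actual widths.

```python
# Parameter counts of a standard vs. a depthwise separable 15x15 convolution.
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(None, None, 32))
std = layers.Conv2D(32, 15, padding="same")(inp)           # joint spatial + channel
sep = layers.SeparableConv2D(32, 15, padding="same")(inp)  # depthwise, then 1x1

print(Model(inp, std).count_params())  # 15*15*32*32 + 32       = 230,432
print(Model(inp, sep).count_params())  # 15*15*32 + 32*32 + 32  =   8,256
```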
Ⅲ. Results
The proposed DirectNet model for retinal vessel segmentation was implemented using the Keras library with a TensorFlow backend and evaluated on the DRIVE dataset, the most commonly used dataset for vessel segmentation.6) The testing procedure followed the common methodology of using the annotations of the first human observer as ground truth (annotations of the second observer are usually only used to study human performance). The DRIVE dataset contains 40 fundus images, split into 20 images for training and 20 images for testing. All images were cropped to an input size of 584 × 565 pixels.
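For reference, a minimal sketch of loading one DRIVE training pair is shown below; the file names follow the public DRIVE distribution's layout, but the paths are placeholders to be adjusted to the local copy.

```python
# Load one DRIVE image and its first-observer annotation (placeholder paths).
import numpy as np
from PIL import Image

img = np.asarray(Image.open("DRIVE/training/images/21_training.tif"))
gt = np.asarray(Image.open("DRIVE/training/1st_manual/21_manual1.gif")) > 0

img = img[:584, :565]           # crop to the 584 x 565 input size
gt = gt[:584, :565]
print(img.shape, gt.shape)      # (584, 565, 3) (584, 565)
```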
To compare the DirectNet architecture to a standard, state-of-the-art method, a patch-based CNN was trained on DRIVE to serve as a baseline, using a publicly available implementation* of the U-Net architecture.13) DirectNet has a total of 273,668 parameters (see Table 1), whereas the U-Net implementation has 517,666. All experiments were run on an Intel Core i7 processor with 16 GB RAM and a GeForce GTX 1080Ti graphics card.
Performance of the vessel segmentation can be described with several metrics, including the F1-score, accuracy, sensitivity, specificity, precision, and the area under the ROC curve. These metrics are defined in (1):

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, &
\text{Precision} &= \frac{TP}{TP + FP},\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
F_1 &= \frac{2\,TP}{2\,TP + FP + FN},
\end{aligned}
\tag{1}
$$
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative classifications, respectively. The area under the ROC curve was calculated using a standard Python library implementation. Segmentation results in the form of probability maps were converted to binary images with a fixed threshold of 0.5. Since the threshold value can affect the results, the automatic Otsu threshold selection method was also tried,16) but did not yield better results (Figure 2).
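A minimal sketch of this evaluation, assuming scikit-learn for the ROC AUC and scikit-image for the Otsu threshold, could look as follows:

```python
# Compute the metrics in (1) from a probability map and a binary ground truth.
import numpy as np
from sklearn.metrics import roc_auc_score
from skimage.filters import threshold_otsu

def evaluate(prob_map, ground_truth, threshold=0.5):
    pred = prob_map >= threshold                   # fixed 0.5 threshold
    # pred = prob_map >= threshold_otsu(prob_map)  # automatic alternative
    gt = ground_truth.astype(bool)
    tp = np.sum(pred & gt);  tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "f1":          2 * tp / (2 * tp + fp + fn),
        "auc":         roc_auc_score(gt.ravel(), prob_map.ravel()),
    }
```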
As shown in Table 2, DirectNet scored higher than the patch-based CNN on all metrics. In particular, the F1-score of 0.8124, compared to 0.7653, shows a strong increase in performance (note that the training and testing paradigm used in the field so far prohibits multiple dataset splits, so statistical tests or Bland-Altman plots cannot be run). This improvement was mostly driven by an increase in sensitivity, i.e., the algorithm's ability to correctly detect vessel pixels. Additionally, DirectNet showed a substantial speed-up over the U-Net architecture: training the patch-based CNN took 8 hours, whereas DirectNet took only 1 hour. Similarly, during testing, U-Net took 1 hour to process all patches of one retinal image, whereas DirectNet finished the same task in 6 seconds, a roughly 600-fold speed-up.
Figure 3 shows a qualitative evaluation of segmentation results on the first four images in DRIVE. The original retinal photographs are shown in the first column; the second and third columns show the human annotations and the vessel segmentation maps produced by DirectNet, respectively.
Finally, DirectNet was also compared to another recent deep-learning method proposed by Liskowski et al.17) This method required 8 hours of training on 400,000 sample patches extracted from the 20 DRIVE training images. Both methods achieved comparable accuracy (0.9535 vs. 0.9538) and virtually identical ROC performance (AUC of 0.9790 for DirectNet vs. 0.9733 for Liskowski et al.). Importantly, at test time DirectNet was still more than 15 times faster, as the competing approach took 92 seconds per image (see Table 2 for all results).
Ⅳ. Conclusion
This study presents a novel method for retinal blood vessel segmentation that is time- and memory-efficient while providing high segmentation accuracy. The proposed recurrent DirectNet architecture is compact (low parameter count) and requires neither patch-based scanning nor any post-processing steps. It predicts a segmentation map by operating directly on the image, without the prior up- or downsampling steps necessary in other approaches. DirectNet was benchmarked against two other state-of-the-art methods on the DRIVE dataset, matching or surpassing state-of-the-art performance in terms of accuracy, sensitivity, and specificity. Importantly for practical implementations, the proposed DirectNet architecture is at least one order of magnitude faster than traditional patch-based CNNs.
Vessel segmentation is only the first step in an automatic analysis pipeline that can be deployed in clinical practice. In the future, our goal is to derive features from the segmented vessels that can help diagnose certain types of retinopathy, such as edema, or early signs of age-related macular degeneration (AMD). However, specialized diagnostic tasks based on retinal images will require far more training data than currently available datasets such as DRIVE provide. Especially in these cases, efficient architectures like the proposed DirectNet will be necessary, both for training on large datasets and in clinical application use cases.