
Original research article

European Journal of Ophthalmology 1–6
© The Author(s) 2024
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/11206721241249773
journals.sagepub.com/home/ejo

Artificial intelligence to automate assessment of ocular and periocular measurements

Khizar Rana1, Mark Beecher1, Carmelo Caltabiano1, Carmelo Macri1, Yang Zhao2, Johan Verjans2 and Dinesh Selva1

Abstract
Purpose: To develop and validate a deep learning facial landmark detection network to automate the assessment of
periocular anthropometric measurements.
Methods: Patients presenting to the ophthalmology clinic were prospectively enrolled and had their images taken using a
standardised protocol. Facial landmarks were segmented on the images to enable calculation of marginal reflex distance
(MRD) 1 and 2, palpebral fissure height (PFH), inner intercanthal distance (IICD), outer intercanthal distance (OICD),
interpupillary distance (IPD) and horizontal palpebral aperture (HPA). These manual segmentations were used to train
a machine learning algorithm to automatically detect facial landmarks and calculate these measurements. The main outcomes were the mean absolute error and intraclass correlation coefficient.
Results: A total of 958 eyes from 479 participants were included. The testing set consisted of 290 eyes from 145
patients. The AI algorithm demonstrated close agreement with human measurements, with mean absolute errors
ranging from 0.22 mm for IPD to 0.88 mm for IICD. The intraclass correlation coefficients indicated excellent reliability (ICC >
0.90) for MRD1, MRD2, PFH, OICD, IICD, and IPD, while HPA showed good reliability (ICC 0.84). The landmark detection
model was highly accurate and achieved a mean error rate of 0.51% and failure rate at 0.1 of 0%.
Conclusion: The automated facial landmark detection network provided accurate and reliable periocular measurements. This
may help increase the objectivity of periocular measurements in the clinic and may facilitate remote assessment of patients with
tele-health.

Keywords
Periocular, machine learning, marginal reflex distance

Date received: 16 January 2024; accepted: 4 April 2024

Introduction

Accurate and reliable measurement of periocular structures is important in diagnosing and monitoring various ophthalmological conditions. Eyelid position can be altered in orbital diseases like thyroid eye disease, tumours, and trauma. Presently, manual measurement methods, such as marginal reflex distance (MRD) 1 and 2, are employed to determine the vertical distance from the corneal light reflex to the upper and lower eyelid margins, respectively. However, these manual approaches are subjective, dependent on the operator's skill, and often demand a level of patient cooperation, posing challenges in certain patient groups like children and cognitively impaired adults. Additionally, with the increasing adoption of telehealth, physical execution of such measurements may become impractical.

1 Department of Ophthalmology & Visual Sciences, South Australian Institute of Ophthalmology, University of Adelaide, North Terrace, SA 5000, Australia
2 Australian Institute for Machine Learning, The University of Adelaide, SA 5000, Adelaide, Australia

Corresponding author:
Khizar Rana, Royal Adelaide Hospital, Port Road, Adelaide, South Australia 5000, Australia.
Email: [email protected]

We sought to develop a deep learning model for facial landmark detection to automatically detect periocular landmarks and conduct accurate periorbital and eyelid measurements.

Methods

We prospectively enrolled participants presenting to the Royal Adelaide Hospital ophthalmology clinic who were 18 years of age or older and gave written informed consent. Patients with ocular misalignment, pupil abnormalities, or corneal pathology affecting the light reflex were excluded from the study. The institutional human research ethics committee approved the study. Study procedures adhered to the principles of the Declaration of Helsinki.

Image collection

In a well-lit room, seated participants were placed 1 metre from a Nikon D90 camera equipped with a 60 mm lens and positioned on a stand at eye level. Participants were asked to look straight, and the photographs were taken head-on. To enable accurate calibration, a circular green adhesive dot sticker with a diameter of 24 mm was placed on the subject's forehead, allowing for the conversion of pixels to millimetres.

Image analysis

The images were uploaded onto Labelbox, a popular web-based annotation tool for segmentation and classification systems.1 Ten periocular landmarks, including the pupillary centre, the midline of the upper eyelid margin, the midline of the lower eyelid margin, the medial canthi, and the lateral canthi for each eye, were manually annotated (Figure 1). The distances between the periocular landmarks were computed using the open-source OpenCV library. The calculated dimensions included MRD1, the vertical distance from the pupillary centre to the centre of the upper eyelid margin; MRD2, the vertical distance from the pupillary centre to the centre of the lower eyelid margin; palpebral fissure height (PFH), the vertical height between the upper and lower eyelids, derived by summing MRD1 and MRD2; inner intercanthal distance (IICD), the horizontal distance between the medial canthi; outer intercanthal distance (OICD), the horizontal distance between the lateral canthi; interpupillary distance (IPD), the horizontal distance between the centres of the two pupils; and horizontal palpebral aperture (HPA), the horizontal distance between the medial and lateral canthi within one eye (Figure 2).

Deep learning model development

To make our framework reproducible, we adopted the widely used backbone network HRNet-v2 as our landmark detection model to predict the designed facial landmarks.2 HRNet-v2 combines the representations from all the high-to-low resolution parallel streams. Specifically, the input of the landmark detection model is a facial image of size w × h, and its output is a set of likelihood heatmaps H = {H_l}, l = 1, ..., L, one for each of the L pre-defined facial landmarks. In the design of HRNet-v2, the size of the output heatmaps is reduced by a factor of four. As described in the Image analysis section, L is equal to ten for our landmark detection model.

Figure 1. The periocular landmarks were manually segmented.
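The distance calculations described in the Image analysis section reduce to Euclidean distances between annotated landmarks, scaled by a calibration factor derived from the 24 mm forehead sticker. A minimal sketch of that arithmetic is below; the landmark coordinates and the sticker's pixel diameter are invented for illustration, and plain NumPy stands in for the OpenCV calls used in the study.

```python
import numpy as np

# Illustrative landmark coordinates in pixels (not actual study data)
pupil_centre = np.array([512.0, 300.0])
upper_lid_mid = np.array([512.0, 286.0])
lower_lid_mid = np.array([512.0, 321.0])

# Calibration: the forehead sticker has a known 24 mm diameter, so its
# measured diameter in pixels yields a pixels-to-millimetres scale factor.
STICKER_DIAMETER_MM = 24.0
sticker_diameter_px = 96.0          # e.g. measured from the image
mm_per_px = STICKER_DIAMETER_MM / sticker_diameter_px

def distance_mm(p, q):
    """Euclidean distance between two landmarks, converted to millimetres."""
    return float(np.linalg.norm(p - q)) * mm_per_px

mrd1 = distance_mm(pupil_centre, upper_lid_mid)   # pupil centre to upper lid margin
mrd2 = distance_mm(pupil_centre, lower_lid_mid)   # pupil centre to lower lid margin
pfh = mrd1 + mrd2                                 # palpebral fissure height
```

With these illustrative numbers the scale is 0.25 mm per pixel, giving an MRD1 of 3.5 mm and a PFH of 8.75 mm.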


Figure 2. The periocular landmarks were used to calculate the periocular dimensions.

To optimise the landmark detection model, we employ the Mean Squared Error (MSE) loss function to compare the predicted heatmaps H and the ground-truth heatmaps H*, which are generated from the annotated 2D facial landmarks. The values on each ground-truth heatmap H*_l are computed from a 2D Gaussian distribution centred at landmark l. The loss function between the predicted heatmaps H and the ground-truth heatmaps H* is therefore defined as Loss(H, H*) = ||H − H*||^2. Once training is finished, the landmark coordinates are recovered from the predicted heatmaps H by finding the coordinates of the highest heatmap value and upsampling back to the original image size.

The complete dataset was randomly split into training (70%) and evaluation (30%) sets. Each image in the proposed dataset had 10 manually annotated landmarks, as shown in Figure 1. All the face images, including both training and testing images, were scaled to 512 × 256. Our landmark detection model was trained using PyTorch version 1.7.0 on a single NVIDIA RTX 3090 GPU with 24 GB of video memory. Data augmentation was performed to improve the robustness of the network to data variations: in-plane rotation (±30 degrees), scaling (0.75–1.25), and random horizontal flips (probability 50%). The Adam optimizer was used with a mini-batch size of 16 for 60 epochs. The base learning rate was 10^-3 and decayed to 10^-4 at the 30th epoch and 10^-5 at the 50th epoch.

Table 1. Comparison of mean periorbital measurements between human and AI predictions.

Metric                 Measured, mean (SD)    Predicted, mean (SD)
N = 290 eyes
  MRD1                 3.66 (0.89)            3.64 (0.90)
  MRD2                 5.24 (1.22)            5.23 (1.23)
  PFH                  8.90 (1.49)            8.86 (1.52)
  HPA                  25.30 (2.08)           25.30 (2.11)
N = 145 participants
  IPD                  61.2 (3.7)             61.2 (3.8)
  OICD                 83.7 (4.6)             83.5 (4.8)
  IICD                 33.3 (3.4)             33.0 (3.0)

MRD, marginal reflex distance; PFH, palpebral fissure height; HPA, horizontal palpebral aperture; IPD, interpupillary distance; OICD, outer intercanthal distance; IICD, inner intercanthal distance.

Table 2. Mean absolute error between paired measurements taken by human graders and AI predictions.

Metric    Mean absolute error (SD), mm
MRD1      0.29 (0.25)
MRD2      0.30 (0.32)
PFH       0.31 (0.31)
HPA       0.74 (0.91)
OICD      0.73 (0.97)
IICD      0.88 (0.97)
IPD       0.22 (0.18)

MRD, marginal reflex distance; PFH, palpebral fissure height; HPA, horizontal palpebral aperture; IPD, interpupillary distance; OICD, outer intercanthal distance; IICD, inner intercanthal distance.
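To make the heatmap formulation above concrete, the sketch below generates a Gaussian ground-truth heatmap for a single landmark, evaluates an MSE loss, and decodes a landmark coordinate by taking the heatmap argmax and upsampling by the factor of four. This is an illustrative NumPy re-implementation of the ideas described, not the authors' PyTorch code; the heatmap size (a 512 × 256 input at stride 4), sigma, and landmark position are assumptions.

```python
import numpy as np

def gaussian_heatmap(shape, centre, sigma=1.5):
    """Ground-truth heatmap: a 2D Gaussian centred at the landmark (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = centre
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def mse_loss(pred, target):
    """Loss(H, H*) = ||H - H*||^2, here averaged over pixels."""
    return float(np.mean((pred - target) ** 2))

def decode(heatmap, stride=4):
    """Landmark = argmax of the heatmap, upsampled back to image scale."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return x * stride, y * stride

# Heatmap of shape (h, w) = (64, 128) for a 512-wide, 256-high input image
target = gaussian_heatmap((64, 128), centre=(40, 20))
pred = target            # a perfect prediction, for illustration
loss = mse_loss(pred, target)   # 0.0 for a perfect prediction
x_img, y_img = decode(pred)     # landmark back at image resolution: (160, 80)
```

In practice the predicted heatmaps come from the network rather than being copies of the targets, but the decoding step is the same.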

Figure 3. Bland-Altman plots demonstrating the bias and limits of agreement for periocular measurements. Bias (mean of differences)
is the dashed dark grey line. Upper and lower confidence intervals of bias are depicted by the dotted grey lines and grey shading. Upper
and lower limits of agreement are depicted by the dashed black lines. Their associated confidence intervals are depicted by the dotted
black lines and red shading.
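The quantities plotted in Figure 3 (the bias and 95% limits of agreement) and the mean absolute errors reported in Table 2 are straightforward to compute from paired measurements. A sketch, using invented values rather than study data:

```python
import numpy as np

# Hypothetical paired measurements in mm: human graders vs AI predictions
human = np.array([3.6, 5.2, 8.9, 25.3, 61.2])
ai = np.array([3.5, 5.3, 8.8, 25.4, 61.0])

diff = human - ai
bias = diff.mean()                 # mean of differences (dashed line in Fig. 3)
sd = diff.std(ddof=1)              # sample SD of the differences
loa_upper = bias + 1.96 * sd       # upper 95% limit of agreement
loa_lower = bias - 1.96 * sd       # lower 95% limit of agreement
mae = np.abs(diff).mean()          # mean absolute error, as in Table 2
```

With these toy values the bias is 0.04 mm and the mean absolute error 0.12 mm; confidence intervals around the bias and limits of agreement (the shaded bands in Figure 3) additionally require the standard errors of those estimates.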

Statistical analysis

Human and AI predicted measurements were summarised by mean and standard deviation. Agreement between human measurements and AI predicted measurements was assessed using Bland-Altman plots with 95% confidence intervals for the average difference between measurements. The left and right measurements were pooled for the bilateral measures. The mean absolute error between paired observations was calculated as the mean of the absolute value of paired differences between human and AI predicted measurements for each metric. The interrater reliability of the measurements was calculated using the intraclass correlation coefficient (ICC). The ICC estimates and 95% confidence intervals were calculated using the R package irr v0.84.1 based on a single-measures, absolute-agreement, two-way mixed-effects model. ICC estimates were interpreted as poor reliability (ICC < 0.5), moderate reliability (0.5 < ICC < 0.75), good reliability (0.75 < ICC < 0.9), and excellent reliability (ICC > 0.90). Statistical analysis was performed using R v4.1.2. A p-value < 0.05 was considered statistically significant.

Results

A total of 958 eyes were included from 479 participants. The mean age of participants was 59 ± 17.9 years and 257 (54%) were female. Most participants were Caucasian (407, 85%), with other groups being East Asian (34, 7.1%), South Asian (28, 6%), and African (10, 2.1%). The testing set consisted of 290 eyes from 145 patients. A summary and comparison of the human and AI predicted periorbital measurements are detailed in Table 1. On average, the AI predicted measurements were < 1 mm away from human measurements for all metrics (Table 2). The Bland-Altman plots are shown in Figure 3 with the bias and limits of agreement. The magnitude of difference between human and AI measurements was smaller for MRD1, MRD2, IPD and PFH (Figure 3). IICD showed a greater difference between measurements and less agreement for larger measurements (Figure 3). The intraclass correlation coefficients demonstrated excellent reliability for all measurements except HPA, which showed good reliability (Table 3). The landmark detection model was highly accurate, achieving a mean error rate of 0.51% and a failure rate at 0.1 of 0%.

Discussion

We present the successful development of an automated facial landmark detection algorithm for periocular measurements. Our study demonstrates the potential to create a highly accurate landmark detection model which can automatically detect key periocular landmarks and use these to calculate periocular measurements. The algorithm produced results less than 1 mm from human measurements with excellent reliability. This automation has the potential to overcome the subjectivity and operator-dependent variability associated with manual measurements. This AI-based approach can significantly improve the efficiency of periocular measurements and facilitate remote patient assessment, making it particularly relevant in the context of increasing telehealth use.

Previous studies have developed methods to calculate MRD1 and MRD2 with less human input. Bodnar et al.3 utilised edge detection techniques, including the Canny edge detection method, to identify facial features and estimate MRD1 and MRD2, and Lou et al.4 employed a facial landmark detection program in combination with edge detection to recognise the pupillary centre and estimate MRD1 and MRD2. Thomas et al.5 used OpenFace, an open-source AI-driven facial analysis software, to measure the vertical palpebral aperture; however, that study did not calculate the MRD1 and MRD2 measurements specifically. Our methodology uses deep learning algorithms to detect periocular landmarks and calculate periocular dimensions, including but not limited to MRD1 and MRD2. Machine learning models provide increased robustness to variations found in real-world images such as lighting conditions, angles, facial size, and expressions. Additionally, deep learning techniques automatically learn features from the raw imaging data, enabling accurate localisation of key periocular landmarks. In our study, we adopted the widely used HRNet-v2 as our backbone network to learn high-resolution representations throughout the whole process of facial landmark detection. Traditional computer vision techniques can have difficulty detecting facial landmarks under large head pose variation and heavy occlusion. By training a convolutional neural network (CNN) on a dataset of images containing labelled facial landmarks, the algorithm can identify facial features in new images and achieve high detection performance in a variety of conditions.

The accurate conversion of pixels to millimetres in images is required, and different studies have adopted different techniques for this purpose. In the Van Brummen study, the AI algorithm's pixel-to-mm conversion relied on a corneal width of 11.71 mm, which differed from the corneal width of 11.77 mm measured by human graders.6 Moreover, measuring the corneal width can pose challenges, particularly in cases of ptosis where the eyelid's position may cover part of the cornea.7,8

Table 3. Intraclass correlation coefficients for measurement reliability for each periorbital metric.

Metric    ICC      95% CI         p
MRD1      0.906    0.883–0.925    <0.0001
MRD2      0.933    0.917–0.947    <0.0001
PFH       0.957    0.946–0.966    <0.0001
HPA       0.843    0.806–0.873    <0.0001
OICD      0.966    0.953–0.976    <0.0001
IICD      0.918    0.887–0.941    <0.0001
IPD       0.997    0.996–0.998    <0.0001

MRD, marginal reflex distance; PFH, palpebral fissure height; HPA, horizontal palpebral aperture; IPD, interpupillary distance; OICD, outer intercanthal distance; IICD, inner intercanthal distance.
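The ICCs in Table 3 were computed with the R package irr; for readers working in Python, an equivalent single-measure, absolute-agreement, two-way ICC (the McGraw-Wong ICC(A,1)) can be sketched from the two-way ANOVA mean squares as follows. This is a from-scratch illustration, not the authors' code:

```python
import numpy as np

def icc_a1(ratings):
    """Single-measure, absolute-agreement, two-way ICC (McGraw-Wong ICC(A,1)).

    ratings: array of shape (n_subjects, k_raters).
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    # Two-way ANOVA mean squares
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # rows (subjects)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # columns (raters)
    resid = ratings - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two raters in perfect agreement yield an ICC of 1.0
perfect = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
assert abs(icc_a1(perfect) - 1.0) < 1e-9
```

Packaged implementations (e.g. the irr package used here, or pingouin in Python) additionally provide the confidence intervals and p-values reported in Table 3.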

In our study, we affixed a sticker dot with a known diameter on the forehead of each participant to allow a reliable and standardised conversion of pixels to mm. All participants had this performed, as our study consisted solely of prospectively enrolled patients, in contrast to the Van Brummen study, which primarily relied on retrospective recruitment.6

Limitations of this study include the data being from a single centre, although the patients studied were diverse in terms of clinical presentation, age, sex, and ethnicity. This real-world sample reflects the typical patient population seen in ophthalmology clinics, enhancing the generalisability of our findings to other Australian cohorts; a multicentre study would, however, have greater generalisability. Our AI system was compared against measurements made by human graders on images taken in clinic. Comparison with manual measurements taken in clinic would allow us to determine whether the AI system performs comparably, as factors such as frontalis muscle action and subtle blinking may not be completely controlled for with a single image. In addition, the photos were taken in standardised conditions and need to be externally validated on images from different cameras and settings prior to being used in a telehealth setting.

In conclusion, our study introduces an automated facial landmark detection network for periocular measurements, providing accurate and reliable results. The AI algorithm's successful development opens avenues for objective and efficient assessment of periocular structures. As telehealth becomes more prevalent, the implementation of such automated measurement techniques can streamline remote patient assessment and improve ophthalmic care delivery.

Financial disclosures/declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Khizar Rana https://orcid.org/0000-0002-5986-6621
Carmelo Macri https://orcid.org/0000-0002-1110-3780

References

1. Labelbox. Labelbox: the leading training data platform for data labeling. https://labelbox.com/ (2020, accessed January 2023).
2. Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 2021; 43: 3349–3364. DOI: 10.1109/tpami.2020.2983686.
3. Bodnar ZM, Neimkin M and Holds JB. Automated ptosis measurements from facial photographs. JAMA Ophthalmol 2016; 134: 146–150. DOI: 10.1001/jamaophthalmol.2015.4614.
4. Lou L, Yang L, Ye X, et al. A novel approach for automated eyelid measurements in blepharoptosis using digital image analysis. Curr Eye Res 2019; 44: 1075–1079. DOI: 10.1080/02713683.2019.1619779.
5. Thomas PBM, Gunasekera CD, Kang S, et al. An artificial intelligence approach to the assessment of abnormal lid position. Plast Reconstr Surg Glob Open 2020; 8: e3089. DOI: 10.1097/gox.0000000000003089.
6. Van Brummen A, Owen JP, Spaide T, et al. PeriorbitAI: artificial intelligence automation of eyelid and periorbital measurements. Am J Ophthalmol 2021. DOI: 10.1016/j.ajo.2021.05.007.
7. Schulz CB, Clarke H, Makuloluwe S, et al. Automated extraction of clinical measures from videos of oculofacial disorders using machine learning: feasibility, validity and reliability. Eye (Lond) 2023; 37: 2810–2816. DOI: 10.1038/s41433-023-02424-z.
8. Aleem A, Nallabothula MP, Setabutr P, et al. AutoPtosis. 2021.
