Automated Phenotyping of Herbaceous Biomass Using U-Net Architecture for µ-CT Image Segmentation
2024 IEEE 2nd International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings) | 979-8-3315-2952-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/AIBThings63359.2024.10863791
Authorized licensed use limited to: Idaho State University. Downloaded on March 11,2025 at 04:14:04 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORK

Before the advent of deep learning techniques, the identification and segmentation of vascular bundles in plant cross-sections were primarily conducted manually. This process involved labor-intensive and time-consuming methods that relied heavily on expert knowledge and meticulous attention to detail. For instance, Oduntan et al. [10] demonstrated a high-throughput phenotyping methodology for assessing stalk lodging resistance by quantifying the cross-sectional morphology of plant stalks. These traditional methods employed in the segmentation of vascular bundles typically involved manually tracing boundaries on high-resolution images, which, despite yielding high accuracy, were not feasible for large-scale analysis due to the excessive time and effort required.

In recent years, the integration of micro-computed tomography (micro-CT) with advanced deep learning techniques has significantly advanced the field of plant phenotyping. Van Doorselaer et al. [11] demonstrated the efficacy of using deep learning models specifically designed for micro-CT images to improve segmentation accuracy of fruit parenchyma tissues. Du et al. [12] developed a CNN-integrated phenotyping pipeline for vascular bundle phenotypes in maize stems, which utilized advanced semantic segmentation models to accurately detect and quantify vascular bundles from CT images, achieving a counting accuracy of 0.997 for all stem internode types and over 0.98 accuracy for size traits. Lu et al. [13] extended the application of deep learning to the phenotyping of passion fruit, demonstrating how nondestructive imaging and segmentation methods can be effectively utilized to enhance the efficiency and accuracy of morphological trait extraction in agriculture. This research illustrated the potential of automated deep learning techniques in reducing labor-intensive phenotyping processes and improving breeding programs by using a U-Net model, renowned for its effectiveness in biomedical image segmentation.

Similarly, Zhong et al. [14] adapted these advanced imaging techniques for the textile industry, applying a U-Net-based method for segmenting filamentous objects in weft micro-CT images, enhancing quality control processes within textile manufacturing and showcasing the versatility of micro-CT in industrial applications. In dental research, Lin et al. [15] developed a novel pipeline employing the U-Net architecture and demonstrated the use of micro-CT for segmenting the pulp cavity and teeth, highlighting the potential for deep learning to significantly improve the accuracy of dental diagnostics and treatment planning. Our research aims to build upon these foundational studies, focusing specifically on enhancing the segmentation accuracy and computational efficiency of imaging vascular bundles in corn stalks.

The key contributions and objectives of this paper can be summarized as follows.

• Enhanced Image Preprocessing: Integrating advanced thresholding and morphological operations to enhance image preprocessing before applying the deep learning model to improve the clarity and quality of segmentation outputs.

• Utilization of Deep Learning Model: Achieving higher precision in segmenting vascular bundles by utilizing U-Net to enhance the accuracy of our analytical methods.

• Manual Mask Overlap Analysis: Overlapping the manual mask with the model-predicted mask to evaluate the model's performance, ensuring robustness and reliability, and confirming its applicability in real-world scenarios.

III. PRELIMINARIES

A. Thresholding Techniques

In the realm of image processing, thresholding plays a key role in distinguishing between foreground and background pixels using a specified intensity level. Nevertheless, a single universal threshold value may not be ideal for images with uneven lighting or varying object intensities.

Otsu's thresholding method scrutinizes the histogram of an image, which illustrates the distribution of intensities. Otsu's method systematically evaluates all threshold values and computes the between-class variance for each. The optimal threshold is then determined as the value that maximizes this variance between classes. Put simply, Otsu's method aims to find a threshold that creates the most noticeable differentiation between the intensity distributions of foreground and background within the histogram. This strategy proves effective for images with a bimodal (two-peaked) histogram, where foreground and background intensities are distinctly separated. This method achieves a more precise separation of objects from their backgrounds compared to using a fixed global threshold.

Let us denote T as the threshold value that separates the foreground and the background. Otsu's method aims to find this threshold by maximizing the between-class variance σB², which is calculated as:

σB²(T) = ω0(T) · ω1(T) · [µ0(T) − µ1(T)]²    (1)

where ω0(T) and ω1(T) are the probabilities of the background and foreground classes, respectively, at threshold T, and µ0(T) and µ1(T) are the mean intensities of the background and foreground pixels, respectively, at threshold T.

Otsu's method involves iterating through all possible intensity values to compute σB²(T) and determining the threshold T that maximizes this value. This can be mathematically expressed as:

T = arg max_T σB²(T)    (2)

In practical terms, Otsu's method calculates a histogram of pixel intensities and computes the probabilities ω0 and ω1 along with their respective mean intensities µ0 and µ1 for each possible threshold value. The threshold T that maximizes σB² effectively partitions the image into foreground and background regions, allowing accurate segmentation even in cases of varying lighting conditions or uneven illumination.

Otsu's method is highly beneficial for the thresholding technique as it can autonomously adapt to local changes in image intensity, making it appropriate for tasks like edge
detection, object recognition, and image segmentation across various scenarios. Otsu's method automates threshold selection for image processing tasks by utilizing statistical properties of pixel distributions, improving robustness and reliability.

B. Morphological Operations

Morphological image processing is a set of nonlinear operations that deal with the shape, or morphology, of features in an image. These techniques are very effective for analyzing geometrical structures within pictures when mathematical morphology is used. Such procedures function by probing a picture with a basic, preset shape known as a structuring element, which is placed at all potential positions in the image and then compared to the corresponding neighborhood of pixels. Here, we used two types of morphological operations: opening and closing.

1) Opening Operations: The opening of an image f by a structuring element s (denoted by f ◦ s) is an erosion followed by a dilation:

f ◦ s = (f ⊖ s) ⊕ s

The opening of an image can create a gap between objects connected by a thin bridge of pixels. Opening is an idempotent operation, meaning that once an image is opened, subsequent openings with the same structuring element have no further effect on it:

(f ◦ s) ◦ s = f ◦ s

2) Closing Operations: The closing of an image f by a structuring element s (denoted by f • s) is a dilation followed by an erosion:

f • s = (f ⊕ s_rot) ⊖ s_rot

In this case, dilation and erosion are conducted using the structuring element rotated by 180°. Typically, the structuring element is symmetrical, so its rotated and original forms are identical. Closing is named for its ability to close gaps in regions while maintaining their original proportions. Closing, like opening, is idempotent:

(f • s) • s = f • s

and it is the dual operation of opening (just as opening is the dual of closing):

f • s = (f^c ◦ s)^c ;  f ◦ s = (f^c • s)^c

In other words, one may execute the closing or opening of a binary image by taking the image's complement, opening or closing with the structuring element, and taking the complement of the result [16].

C. U-Net: A Convolutional Network for Image Segmentation

U-Net is a CNN architecture created for semantic segmentation applications, specifically in the area of biomedical image examination. U-Net, created by Ronneberger et al. in 2015, has become widely popular for its ability to generate precise segmentation masks, particularly when training data is limited. In the context of µ-CT images of plant cells, U-Net serves as a powerful tool for accurate and precise segmentation of cellular structures.

The key components of the U-Net architecture are the encoder, the decoder, and skip connections. The encoder (contracting path) consists of convolutional layers with increasing numbers of filters, ReLU activation functions, and max-pooling layers for downsampling and feature extraction. The decoder (expansive path) consists of up-convolutional layers (transposed convolutions) for upsampling the feature maps, concatenation with feature maps from the corresponding contracting path via skip connections, and convolutional layers with ReLU activation functions. Finally, the skip connections link the encoder and decoder layers at corresponding spatial resolutions, preserving fine-grained details and spatial context for accurate segmentation.

In the domain of µ-CT images of plant cells, U-Net has been successfully applied to various tasks, such as segmenting cellular structures including cell walls, nuclei, and organelles, analyzing the morphology and architecture of plant tissue, and quantifying cellular features for phenotypic analysis and plant breeding. Extensions of the basic architecture include integrating attention mechanisms for feature enhancement, adopting multi-scale architectures for capturing hierarchical information, and exploring conditional generative models for image synthesis and segmentation. In our work, the U-Net architecture was chosen for its superior performance in biomedical image segmentation tasks, which are analogous to identifying the internal structures from µ-CT images. U-Net's encoder-decoder structure and skip connections enable accurate segmentation with minimal training data, making it ideal for high-resolution µ-CT images of vascular bundles in corn stalks.

IV. DEVELOPMENT OF DESIGN FRAMEWORK

In this section, we dive deeper into the methodologies used to process and enhance high-resolution micro-CT images of corn stalks, starting from the initial handling of the raw TIFF files to advanced image modifications before creating their masks to train our U-Net model. We will evaluate our U-Net architecture using cropped and centered corn stalk images at the highest resolution available in the dataset, along with their corresponding masks. The U-Net architecture processes these inputs to generate predicted segmentation masks. The model's performance is validated through test accuracy by pixel-by-pixel matching, the percentage of overlapping color between the predicted and reference masks, and comparisons between model-predicted images versus manually created and Photoshop-generated masks. The procedure for the performance of the demonstrated model is shown in Fig. 2.

A. Data Preparation: Advanced Image Processing

The processing begins by converting the raw TIFF images to grayscale. This step reduces the complexity of the image by eliminating color data, simplifying the subsequent processing stages. Grayscale conversion is crucial as it focuses analysis on the intensity of pixels rather than color variations, which is particularly useful for structural analysis in agricultural
Fig. 1: Step-by-step process of enhancing and annotating corn stalk images. (a) Original micro-CT image of the corn stalk, (b) enhanced image, converted to grayscale to improve visibility of features, (c) binary image generated with thresholding techniques for segmentation, (d) cropped and centered corn stalk image after morphological operations, (e) the corn stalk image with vascular bundles annotated using Photoshop (indicated with red marks), and (f) generated mask of the vascular bundles (the red marks separated from the previous image).
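The thresholding and morphological steps behind Fig. 1 can be sketched in NumPy. The snippet below is an illustration only, not the paper's actual pipeline (which also uses adaptive thresholding, Gaussian blurring, and RGBA-based cropping): `otsu_threshold` exhaustively maximizes the between-class variance of Eq. (1) as in Eq. (2), and the opening/closing operators use a square structuring element with wrap-around borders for brevity.

```python
import numpy as np

def otsu_threshold(img):
    """Exhaustive search for the threshold T maximizing the
    between-class variance of Eq. (1), as stated in Eq. (2)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # intensity probabilities
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class probabilities ω0, ω1
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # background mean µ0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1   # foreground mean µ1
        var_b = w0 * w1 * (mu0 - mu1) ** 2     # Eq. (1)
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t

def dilate(b, k=1):
    """Binary dilation with a (2k+1)x(2k+1) square structuring element
    (wrap-around borders via np.roll, for brevity)."""
    out = np.zeros_like(b)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(np.roll(b, dy, 0), dx, 1)
    return out

def erode(b, k=1):
    # Erosion as the complement of dilating the complement.
    return ~dilate(~b, k)

def opening(b, k=1):
    # f ∘ s: erosion followed by dilation (removes small objects).
    return dilate(erode(b, k), k)

def closing(b, k=1):
    # f • s: dilation followed by erosion (fills small holes).
    return erode(dilate(b, k), k)
```

Opening and closing are idempotent here, exactly as stated in Section III-B, and closing is the complement-dual of opening for this symmetric structuring element.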
studies. Once converted, the images undergo inversion, an enhancement technique that increases contrast by inverting the pixel values. In this way, lighter areas become darker and vice versa, which is especially beneficial for emphasizing specific structures such as vascular bundles in the corn stalks, making them more distinguishable for accurate segmentation.

Following the initial preprocessing, we applied two combined thresholding techniques, adaptive thresholding and Otsu's thresholding, which are particularly adept at determining an optimal threshold value automatically. Thresholding is used to convert the grayscale images into binary images, where pixels are either black or white, based on whether they meet the threshold criterion. Later on, Gaussian blurring is applied to the images. This technique employs a Gaussian function to smooth out the image, effectively reducing noise and minor imperfections while preserving essential details within specified regions. To refine the image quality further, advanced morphological operations are introduced, specifically opening and closing. These are performed after converting the images to RGBA format, which supports transparency manipulation. The RGBA format allows for sophisticated handling of transparency, where non-essential parts of the image can be made transparent, focusing visual attention on critical areas. Morphological opening is used first to remove small objects from the background, and morphological closing is then applied to close small holes within the foreground objects by dilating first and then eroding. Later, we cropped the images to focus on the relevant parts by removing areas that are fully transparent, thus reducing file size and focusing attention on the areas of interest. The whole process of data preparation before incorporating it into the U-Net architecture is shown in Fig. 1.

B. Mask Generation in Photoshop

The process of generating masks in Photoshop involves a series of precise steps to segment vascular bundles accurately. Initially, each vascular bundle is manually selected within the binary image. The background is removed after the selection to achieve a transparent area around the selected regions. To enhance the visibility of the segmentation, a red border is created around the selected areas. This is accomplished by navigating to "Select" ⇒ "Modify" ⇒ "Expand" and expanding the selection by 1 pixel. Subsequently, the expanded
Fig. 2: Proposed workflow of the U-Net architecture for vascular bundle identification in corn stalks.
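The workflow of Fig. 2 passes each prepared image through the encoder-decoder U-Net described in Section III-C. The sketch below mirrors only the shape bookkeeping of that architecture (two pooling stages, a bottleneck, upsampling, and skip concatenations); `conv_block` is a random-weight placeholder standing in for trained 3×3 convolutions, so this is a structural illustration under our own simplifications, not the paper's implementation.

```python
import numpy as np

def conv_block(x, out_ch, rng):
    """Placeholder for a conv + ReLU pair: a random channel mix that
    keeps the spatial size and changes the channel count."""
    w = rng.standard_normal((x.shape[0], out_ch))
    y = np.tensordot(w, x, axes=(0, 0))   # (out_ch, H, W)
    return np.maximum(y, 0.0)             # ReLU

def max_pool(x):
    """2x2 max pooling (encoder downsampling)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    """Nearest-neighbor 2x upsampling (decoder expansion)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_forward(img, rng):
    """Shape-level U-Net pass: contracting path, bottleneck, and
    expansive path with skip concatenations at matching resolutions."""
    x0 = conv_block(img, 8, rng)                  # (8, H, W)
    x1 = conv_block(max_pool(x0), 16, rng)        # (16, H/2, W/2)
    x2 = conv_block(max_pool(x1), 32, rng)        # bottleneck (32, H/4, W/4)
    d1 = np.concatenate([upsample(x2), x1], 0)    # skip connection from x1
    d1 = conv_block(d1, 16, rng)
    d0 = np.concatenate([upsample(d1), x0], 0)    # skip connection from x0
    d0 = conv_block(d0, 8, rng)
    return conv_block(d0, 1, rng)                 # (1, H, W) mask scores
```

The skip concatenations only type-check because encoder and decoder feature maps share spatial resolution at each level, which is exactly the property the text attributes to U-Net's skip connections.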
selection is recolored to red, and a new layer is created to apply this modification. Finally, the new layer is positioned to eliminate the background image, resulting in a mask that clearly delineates the vascular bundles with a red border on a transparent background.

C. Manual Mask Generation

In order to validate the accuracy of our predictive model, we performed a meticulous process of creating a manual mask identifying each of the vascular bundles in an image. Using the Paint application on Windows, we identified and annotated the vascular bundles on a binary image that had been centered and cropped from the raw image to ensure precision. This painstaking process took approximately 55 minutes for a single image.

Two metrics are used to evaluate the accuracy and effectiveness of the model. First, the test accuracy (%) by pixel-by-pixel overlap measures the percentage of correctly predicted pixels by comparing the predicted mask to the ground truth mask. This is calculated using formula (3):

Accuracy = (Number of Correctly Predicted Pixels / Total Number of Pixels) × 100    (3)

Second, the percentage of overlapping color between two pictures assesses the degree of overlap between the predicted mask and the reference masks (manual and Photoshop-generated) by measuring the overlapping regions of the same color. This is calculated using formula (4).
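The two validation metrics can be computed directly from the mask arrays. Below is a minimal NumPy sketch: `pixel_accuracy` implements formula (3) verbatim, while `color_overlap` is our assumption about the color-overlap measure (intersection over union of same-colored pixels), since formula (4) is not reproduced in the text.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Test accuracy (%) by pixel-by-pixel overlap, formula (3)."""
    assert pred.shape == truth.shape
    return 100.0 * np.count_nonzero(pred == truth) / pred.size

def color_overlap(pred, ref, color):
    """Percentage of overlapping color between two masks: pixels of the
    given color present in both masks, relative to those of that color
    in either mask. The exact normalization in formula (4) is not shown
    in the text, so this definition is an assumption."""
    p = np.all(pred == color, axis=-1)   # where pred has the target color
    r = np.all(ref == color, axis=-1)    # where ref has the target color
    union = np.count_nonzero(p | r)
    return 100.0 * np.count_nonzero(p & r) / union if union else 100.0
```

With red-annotated masks like those of Section IV-B, `color_overlap(pred_rgb, ref_rgb, np.array([255, 0, 0]))` would score how well the predicted red regions coincide with the reference red regions.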
Fig. 3: Comparison of the true mask (generated with Photoshop) and the model-predicted mask of a corn stalk.
To validate the model, we compared the manual mask with the Photoshop mask, the predicted model mask with the manual mask, and the predicted model mask with the Photoshop mask. The comparison between the manual mask and the Photoshop mask yielded a 95.92% overlap, demonstrating the high similarity between the two, despite the manual mask requiring significantly more time to generate. The comparison between the Photoshop-generated mask and the model-predicted mask showed a 96.21% overlap, further validating the model's robustness and consistency with high-quality references. Lastly, the comparison between the model-predicted mask and the manual mask revealed a 96.49% overlap, indicating a high degree of alignment between the model's predictions and expert annotations. These high percentages of overlap suggest that the U-Net model is highly effective in accurately identifying and segmenting corn stalk regions, demonstrating its precision and reliability in real-world applications.

TABLE I: Comparison of mask overlap percentages.

Comparison                                 Percentage of Overlapping Pixels
Manual Mask vs. Photoshop Mask             95.92%
Photoshop Mask vs. Model Predicted Mask    96.21%
Model Predicted Mask vs. Manual Mask       96.49%

VI. CONCLUSION

The U-Net architecture has proven to be highly effective in segmenting vascular bundles in corn stalk images, as evidenced by the high accuracy and significant overlap percentages with manual and Photoshop-generated masks. The model achieved a test accuracy of 97.98%, demonstrating robust performance and precise segmentation capabilities. The advanced preprocessing techniques employed, such as adaptive thresholding and morphological operations, significantly enhanced the quality of the segmentation outputs. The consistency between the predicted mask output of the model and the manual annotations confirms the reliability and applicability of the U-Net model in practical agricultural research and precision farming.

REFERENCES

[1] P. McKendry, "Energy production from biomass (part 1): Overview of biomass," Bioresource Technology, vol. 83, no. 1, pp. 37–46, 2002.
[2] V. Thaore, D. Chadwick, and N. Shah, "Sustainable production of chemical intermediates for nylon manufacture: A techno-economic analysis for renewable production of caprolactone," Chemical Engineering Research and Design, vol. 135, pp. 140–152, 2018.
[3] A. Hussain, S. M. Arif, and M. Aslam, "Emerging renewable and sustainable energy technologies: State of the art," Renewable and Sustainable Energy Reviews, vol. 71, pp. 12–28, 2017.
[4] S. Panthapulakkal and M. Sain, "The use of wheat straw fibres as reinforcements in composites," in Biofiber Reinforcements in Composite Materials. Elsevier, 2015, pp. 423–453.
[5] D. Pennington, "Bioenergy crops," in Bioenergy. Elsevier, 2020, pp. 133–155.
[6] S. Viamajala, M. J. Selig, T. B. Vinzant, M. P. Tucker, M. E. Himmel, J. D. McMillan, and S. R. Decker, "Catalyst transport in corn stover internodes: Elucidating transport mechanisms using Direct Blue-I," Applied Biochemistry and Biotechnology, vol. 130, pp. 509–527, 2006.
[7] M. Mazaheri, M. Heckwolf, B. Vaillancourt, J. L. Gage et al., "Genome-wide association analysis of stalk biomass and anatomical traits in maize," BMC Plant Biology, vol. 19, no. 1, 2019.
[8] P. Song, J. Wang, X. Guo, W. Yang, and C. Zhao, "High-throughput phenotyping: Breaking through the bottleneck in future crop breeding," The Crop Journal, vol. 9, no. 3, pp. 633–645, 2021.
[9] A. Y. Yuan, Y. Gao, L. Peng, L. Zhou, J. Liu, S. Zhu, and W. Song, "Hybrid deep learning network for vascular segmentation in photoacoustic imaging," Biomedical Optics Express, vol. 11, no. 11, p. 6445, Oct. 2020.
[10] Y. A. Oduntan, C. J. Stubbs, and D. J. Robertson, "High throughput phenotyping of cross-sectional morphology to assess stalk lodging resistance," Plant Methods, vol. 18, article no. 1, 2022.
[11] L. V. Doorselaer, P. Verboven, and B. Nicolai, "Automatic 3D cell segmentation of fruit parenchyma tissue from X-ray micro CT images using deep learning," Plant Methods, vol. 20, article no. 12, 2024.
[12] J. Du, Y. Zhang, X. Lu, M. Zhang, J. Wang, S. Liao, X. Guo, and C. Zhao, "A deep learning-integrated phenotyping pipeline for vascular bundle phenotypes and its application in evaluating sap flow in the maize stem," The Crop Journal, vol. 10, no. 5, pp. 1424–1434, 2022.
[13] Y. Lu, R. Wang et al., "Nondestructive 3D phenotyping method of passion fruit based on X-ray micro-computed tomography and deep learning," Frontiers in Plant Science, vol. 13, article no. 1087904, 2023.
[14] Q. Zhong, J. Zhang, Y. Xu, M. Li, B. Shen, W. Tao, and Q. Li, "Filamentous target segmentation of weft micro-CT image based on U-Net," Micron, vol. 146, article no. 102923, 2021.
[15] X. Lin, Y. Fu, G. Ren, X. Yang, W. Duan, Y. Chen, and Q. Zhang, "Micro-computed tomography-guided artificial intelligence for pulp cavity and tooth segmentation on cone-beam computed tomography," Journal of Endodontics, vol. 47, no. 12, pp. 1933–1941, 2021.
[16] "Morphological image processing." [Online]. Available: https://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures.