41,000 labeled instances of apples. The object instances are small compared to the image size, and a single image may contain between 1 and 120 objects. We collected data from multiple fruit varieties over two years to create the largest and most diverse dataset of its kind. We hope that this dataset will provide an important stepping stone in advancing the field of precision agriculture.

The rest of the paper is organized as follows: In Section II, we introduce current datasets and testing methods, as well as some of the algorithms used as baselines. We then introduce the dataset and annotation procedure in Sections III and IV. Section V contains the dataset statistics, and we evaluate benchmark algorithms in Section VI.

II. RELATED WORK

Many computer vision techniques rely on large datasets for training, testing, and comparing different approaches to a given problem. They not only provide the means to train and evaluate new algorithms but encourage direct comparison of results. Ultimately, they provide the means for researchers to tackle new and more challenging research problems. The ImageNet [5], Pascal VOC [10], and COCO [20] datasets have made millions of labeled images available to the public and enabled breakthroughs in image classification and object segmentation. Similarly, researchers released specialized datasets for autonomous driving [4], [31], [11] or pedestrian detection [9], [6]. While precision automation and automated yield mapping have seen much research effort [1], [30], [26], [3], [14], [13], each of these papers used its own dataset of varying completeness and level of detail.

A. Fruit detection

The first step in a yield estimation or fruit picking pipeline is the detection of the fruit. Early methods mostly relied on static color thresholds for detection. The limitations of these methods were often compensated by adding additional sensors, such as thermal or Near Infrared (NIR) cameras. Gongal et al. [12] offer a comprehensive overview of these early detection methods. More recent papers used object detection networks to detect fruits [29], [1], [13]. Sa et al. [29] used a combination of NIR and RGB images of fruits in indoor environments. Their dataset contains only 122 images from which training and test data are extracted. Bargoti and Underwood [1] used a similar network for apple detection. They released their dataset of roughly 1000 image crops that they used for training and testing. The images are of size 308 × 202 pixels with circular annotations of the fruits. Stein et al. [30] used a Faster RCNN network to detect mango fruits. Chen et al. [3] used a Fully Convolutional Network (FCN) to compute feature maps; integrating these feature maps gives them a yield estimate. They split their dataset of 71 images 50/50 between training and testing. In our previous work [26], [27], [28], [13] we presented results on HD-sized images showing parts of an orchard row. We presented multiple methods, including a semi-supervised Gaussian Mixture Model (GMM), a Faster R-CNN object detector, and a semantic segmentation network. The training dataset contained 100 images, and the test set contained 207 images. In contrast, we have increased the size of the dataset by a factor of 3.5 for this work.

B. Fruit counting

After detection, the fruits need to be counted. Rahnemoonfar and Sheppard [22] used synthetic data to train a network to classify images according to fruit counts. They tested their approach on 100 annotated images. Chen et al. [3] used a fully convolutional network together with a regression head for counting. They used a total of 71 orange and 21 apple images from which they extracted image patches for training. Roy and Isler [27] proposed an unsupervised counting method based on Gaussian Mixture Models. They used a manually annotated dataset of 440 images for testing. In our previous work [14], we used a neural network to count clustered fruits. We trained a network on 13000 patches and tested our approach on 4 different datasets with a total of 2800 images.

C. Comparison of Datasets

Table I summarizes the problems with current fruit detection and counting datasets. Because the labeling effort is time-consuming and costly, researchers have focused on small datasets with little if any in-dataset variety.
TABLE I: Comparison of datasets used in recent research papers on apple detection and counting.

Fruit Detection:
Method | type | # train images | # test images | # annotations | # scenes | resolution | ground truth | public
Bargoti and Underwood [1] | outdoor | 729 | 112 | 5765 | 1 | 308 × 202 | circles | yes
Stein et al. [30] | outdoor | 1154 | 250 | 7065 | 1 | 500 × 500 | circles | yes
Sa et al. [29] | indoor | 100 | 22 | 359 | 1 | 1296 × 964 | boxes | yes
Liu et al. [21] | outdoor | 100 | - | - | 1 | 1920 × 1200 | boxes | no
MinneApple (ours) | outdoor | 670 | 331 | 41,325 | 17 | 1280 × 720 | polygons | yes

Fruit Counting:
Method | type | # train images | # test images | # scenes | resolution | public
Chen et al. [3] | outdoor | 47 | 45 | 1 | 1280 × 960 | no
Rahnemoonfar and Sheppard [22] | synthetic | 24,000 | 2,400 | 1 | 128 × 128 | no
MinneApple (ours) | outdoor | 64,597 | 5,764 | 6 | varying | yes
Acquired images are chopped into smaller chunks to increase the dataset size artificially. These crops show only a small portion of the original image, which is reflected in the small number of labeled fruits. While this technique increases dataset size, it does not increase dataset variation, and the developed methods are prone to overfitting. Another issue is that a single dataset is split into training and test sets. Such splits lead to in-dataset testing, which makes it impossible to analyze an algorithm's generalization capabilities.

The MinneApple dataset addresses these problems. The dataset contains only full-resolution images. Data for the train/test splits are taken from different tree rows and different years. We included a variety of apple species and illumination conditions to avoid overfitting. The MinneApple dataset gives researchers a tool to test their algorithms in an unbiased way and to compare against other approaches.

III. IMAGE COLLECTION

The data for this paper were collected at the University of Minnesota's Horticultural Research Center (HRC) between June 2015 and September 2016. Since this is a university orchard used for phenotyping research, it is home to a large variety of apple tree species. We collected video footage from different sections of the orchard using a standard Samsung Galaxy S4 cell phone. During data collection, we acquired video footage by facing the camera horizontally at a single side of a tree row and moving (on foot) along the tree row at approximately 1 m/s. Moving the camera at slow speeds mitigates motion blur effects. We then extracted every fifth image from these video sequences. For the test datasets, we extracted every 30th image.
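As a rough illustration of this frame subsampling step, the sketch below extracts every n-th frame from a video with OpenCV. The file names and output layout are placeholders, not part of the released dataset tooling.

```python
import os
import cv2

def extract_frames(video_path, out_dir, stride):
    """Save every `stride`-th frame of a video sequence as a PNG image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{frame_idx:06d}.png"), frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

# Training sequences keep every 5th frame, test sequences every 30th.
extract_frames("orchard_row.mp4", "train_frames", stride=5)
extract_frames("orchard_row_test.mp4", "test_frames", stride=30)
```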
A. Detection and Segmentation Datasets

For detection and localization of the fruits, we collected 17 different datasets over two years, ten for training and seven for evaluation. We included fruits of different colors and at different stages of the ripening cycle. The datasets were taken either from the sunny or shady side of the tree row, and we spread out data capture over multiple days to get more varied illumination conditions. See Figure 2 for samples of the annotated images in our dataset.

Training Sets: We sampled ten datasets from six different tree rows for training purposes. Each dataset shows either the front (sunny) or back (shady) side of a tree row. From these ten datasets, we randomly selected and annotated 670 images of resolution 1280 × 720 pixels. All of these datasets were acquired in 2015 at the HRC, and they contain different apple varieties, fruits across different growing stages, and a variety of tree shapes.

Test Sets: To evaluate detection/segmentation and yield estimation performance, we arbitrarily chose four different sections of the orchard. We collected seven videos from these four segments in 2016. Acquiring datasets during different years guarantees the independence of the test set. Additionally, we collected yield estimation ground truth for three tree rows by hand collecting and counting per-tree yield and by measuring fruit diameters after harvest. Yield estimation includes additional steps, such as fruit tracking and tree row merging. Since we do not include a baseline for tracking, we provide only anecdotal results for yield estimation in this paper.

B. Counting Datasets

Training Sets: We provide two annotated datasets to train patch-based counting approaches. One of these datasets contains green apples and one contains red apples; both were acquired in 2015 from the sunny side of the tree row. In total, we obtained 13000 image patches, which we annotated manually with a ground truth count. Additionally, we extracted 4500 patches at random that do not contain apples as negative examples. See Figure 2 for samples of the annotated images in our dataset.
Fig. 2: Samples of annotated images of the detection, segmentation and counting datasets. The detection/segmentation datasets
are annotated with object instance masks, while the counting dataset contains image patches and a corresponding ground truth
count.
Test Sets: The test dataset consists of a total of 2874 image patches taken from four image sequences. Two of the test datasets contain red apples, one contains green apples, and one contains a mixture of colors. Additionally, we acquired the fourth dataset from a greater distance to test the algorithms' generalization capability for counting low-resolution fruits.

IV. IMAGE ANNOTATION

We next describe how we labeled images for training and evaluation. We follow the annotation method in [13]. Following established evaluation protocols, annotations for the train and validation data will be released, but not for the test data. To test your algorithm, please submit your results online.

Detection and segmentation: Fruits for the detection and segmentation datasets were annotated using the VGG Image Annotator (VIA) tool [8]. We used polygons to label fruits on trees in the foreground, while fruits on the ground and on trees in the background were not tagged. Additionally, we labeled the tree trunks where visible. Please note that we only provide annotations for fully or partially visible fruits. Each of the objects in the scene was then categorized as fruit or tree trunk. We used an internally recruited workforce for the instance labeling task. Due to the large number of instances per image, the small object size, and the many occlusions of fruit instances, labeling is an arduous task. Labeling a single image takes up to 30 minutes, which translates to roughly 18 work-hours per 1000 instances. As such, we chose to assign each image to only a single worker for labeling. Each worker is instructed in proper labeling techniques before they can begin to annotate. After the worker annotated the first ten image frames, we conducted an in-person review to give feedback and issue the first round of corrections. While the instruction and initial feedback improved annotation quality considerably, we performed an additional verification step to correct each object instance if necessary.
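As one possible way to turn such polygon annotations into per-instance label masks, the sketch below rasterizes a list of polygons with Pillow. The vertex-list input is a simplification and does not reproduce the exact JSON layout exported by the VIA tool.

```python
import numpy as np
from PIL import Image, ImageDraw

def polygons_to_instance_mask(polygons, height, width):
    """Rasterize polygons into an instance-label image: background is 0,
    the i-th polygon is written with label i (later polygons overwrite earlier ones)."""
    mask = Image.new("I", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for label, (xs, ys) in enumerate(polygons, start=1):
        draw.polygon(list(zip(xs, ys)), fill=label)
    return np.array(mask, dtype=np.int32)

# Two hypothetical apple annotations given as (all_points_x, all_points_y) pairs.
apples = [([100, 140, 140, 100], [200, 200, 240, 240]),
          ([300, 340, 320], [500, 500, 540])]
instance_mask = polygons_to_instance_mask(apples, height=720, width=1280)
print(instance_mask.max())  # -> 2 labeled instances
```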
Patch-based counting: For the patch-based counting method, we used a semi-supervised GMM detector [27] to detect image patches that are likely to contain fruits. The patches were cropped and annotated by hand with a single ground truth integer representing the count. Due to the small resolution of these patches and the large volume, the annotation task proved to be error-prone. We had two different workers annotate each image, and disparities were resolved by a third worker through a validation process.

V. DATASET STATISTICS

Here, we give an overview of the properties of MinneApple in comparison to other object detection datasets. These include COCO [20], ImageNet Detection [5], and PASCAL VOC [10]. Each of these datasets varies considerably in the number of annotated images, image types, number of categories, number of instances per image, and the size of the annotated objects. The MS COCO dataset was created to show common objects in their natural context. The goal of the ImageNet Detection dataset was to detect a large number of object categories. PASCAL VOC contains fewer categories but focuses on objects in natural images. Our MinneApple dataset, on the other hand, focuses on detecting many small objects in highly cluttered environments.

A summary of the datasets showing the number of instances per category is given in Figure 3a. MinneApple contains fewer categories but far more instances per category than most datasets. In this respect, it is comparable to other specialized datasets such as the Caltech Pedestrian Detection [7] dataset or the KITTI [11] dataset. Figure 3b shows the number of instances per image in comparison to other datasets. MinneApple contains 1.5 categories and 41.2 instances on average per image. In contrast, the COCO dataset has 3.5 categories and 7.7 instances, and the ImageNet and PASCAL VOC datasets both have fewer than two categories and three instances per image on average. The spread of the number of instances per image is also wider than for COCO, ImageNet, and PASCAL VOC: the MinneApple dataset can contain between 1 and 120 object instances per image, while the other datasets contain at most 15.

Finally, we analyze the average size of objects in the dataset. In general, smaller objects are harder to detect and require specialized network structures [17].
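A small sketch of how such statistics could be gathered, assuming the instance annotations are available as label images in which background is 0 and every apple carries a unique positive id (an assumption about the release format, not a documented interface):

```python
import numpy as np
from pathlib import Path
from PIL import Image

def mask_statistics(mask_dir):
    """Per-image instance counts and per-instance pixel areas from label masks."""
    counts, areas = [], []
    for path in sorted(Path(mask_dir).glob("*.png")):
        mask = np.array(Image.open(path))
        ids, sizes = np.unique(mask[mask > 0], return_counts=True)
        counts.append(len(ids))
        areas.extend(sizes.tolist())
    areas = np.asarray(areas, dtype=np.float64)
    return {
        "mean_instances_per_image": float(np.mean(counts)),
        "mean_object_area_px": float(areas.mean()),
        "fraction_small": float(np.mean(areas < 32 ** 2)),  # COCO-style "small" objects
    }

print(mask_statistics("train/masks"))  # hypothetical directory layout
```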
Fig. 3: a) Number of annotated instances per category for some common datasets in comparison. b) Distribution of number of
annotated object instances per image for COCO, ImageNet Detection, PASCAL VOC compared with MinneApple. While the
other datasets contain mainly 1-5 objects, ours contains up to 120 instances per image.
For COCO, PASCAL VOC, and ImageNet Detection, roughly 50% of all objects occupy no more than 10% of the image itself. The other 50% contains objects that occupy between 10 and 100% of the image (evenly distributed). Our MinneApple dataset contains almost exclusively small instances. The average object size is only 40 × 40 pixels in an image of 1280 × 720 pixels, making up only 0.17% of the original image size.

Fig. 5: Area distribution of the objects in our dataset. The dataset contains mainly small object instances with area < 50² pixels.

VI. ALGORITHMIC ANALYSIS

We run a set of state-of-the-art algorithms on each of the tasks of object detection, segmentation, and counting to establish a common baseline for future work.

A. Detection and Segmentation Baselines

For the following experiments, we take a subset of 600 images from our dataset for training. The leftover 30 images are used for validation during training. We test each algorithm individually on each of the 331 test images and report average performance over the test dataset.

Detection evaluation metrics: For bounding box detection, we follow established evaluation protocols used by other object detection datasets [10], [20]. We report Average Precision (AP) as our main evaluation metric. Namely, we use AP starting at an Intersection over Union (IoU) threshold of 0.5 and increase it in intervals of 0.05 up to 0.95 (shorthand notation is [email protected]:0.05:0.95). Additionally, we provide [email protected] and [email protected]. Since our dataset contains many small objects, we report AP scores for small (object area < 32² pixels), middle (32² ≤ object area ≤ 96²), and large objects (area > 96²).
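To make the size bins and the IoU-based matching concrete, here is a simplified single-image sketch (a greedy matcher, not the official COCO evaluation code, which additionally accumulates precision/recall curves over all images and thresholds):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def size_bin(box):
    """COCO-style size bins based on box area in pixels."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    return "middle" if area <= 96 ** 2 else "large"

def match_detections(gt_boxes, det_boxes, iou_thr):
    """Greedily match detections (sorted by descending score) to ground truth.
    Returns (true positives, false positives, false negatives) at this threshold."""
    unmatched = list(range(len(gt_boxes)))
    tp = 0
    for det in det_boxes:
        best, best_iou = None, iou_thr
        for gi in unmatched:
            iou = box_iou(gt_boxes[gi], det)
            if iou >= best_iou:
                best, best_iou = gi, iou
        if best is not None:
            unmatched.remove(best)
            tp += 1
    return tp, len(det_boxes) - tp, len(unmatched)

# Evaluate one image at every threshold in 0.5:0.05:0.95.
gt = np.array([[10, 10, 45, 45], [100, 100, 180, 190]], dtype=float)
det = np.array([[12, 11, 44, 46], [105, 98, 178, 188]], dtype=float)
for thr in np.arange(0.5, 1.0, 0.05):
    print(round(float(thr), 2), match_detections(gt, det, thr))
```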
We evaluate three different models.

Faster RCNN: The latest implementation of Faster RCNN [23] with a ResNet-50 backbone. The network uses pretrained COCO weights for initialization. Faster RCNN consists of a region proposal head and two branches for bounding box regression and classification. We used the parameters from the paper for optimization.

Tiled Faster RCNN: A reimplementation of Bargoti and Underwood's [1] proposed model. Due to memory constraints, they split the training images into 500 × 500 pixel chunks with an overlap of 50 pixels. The detections of the individual chunks are aggregated and filtered using non-maximum suppression. We added a ResNet-50 backbone and a Feature Pyramid Network (FPN) [18] head for region proposal to the network and trained it with focal loss [19]. The network was initialized with weights pretrained on COCO [20]. We follow [1], [13] in our choice of parameters for optimization.
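A rough sketch of this tile-and-merge inference scheme, using torchvision's off-the-shelf Faster R-CNN as a stand-in for the trained detector (tile size and overlap follow the description above; class filtering and batching are omitted):

```python
import torch
import torchvision
from torchvision.ops import nms

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

def detect_tiled(image, tile=500, overlap=50, iou_thr=0.5):
    """Run the detector on overlapping tiles and merge the boxes with NMS.
    `image` is a CHW float tensor with values in [0, 1]."""
    _, h, w = image.shape
    boxes, scores = [], []
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            crop = image[:, y:y + tile, x:x + tile]
            with torch.no_grad():
                out = model([crop])[0]
            b = out["boxes"].clone()
            b[:, [0, 2]] += x  # shift tile coordinates back into
            b[:, [1, 3]] += y  # full-image coordinates
            boxes.append(b)
            scores.append(out["scores"])
    boxes, scores = torch.cat(boxes), torch.cat(scores)
    keep = nms(boxes, scores, iou_thr)  # suppress duplicates in the overlap regions
    return boxes[keep], scores[keep]
```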
Mask RCNN: An implementation of Mask RCNN [15] with a ResNet-50 backbone, pretrained on the COCO dataset. In addition to the bounding box branches of Faster RCNN, Mask RCNN has an additional branch predicting the object instance mask.

If we compare the bounding box detection results in Table II, we find that processing the image in a tiled fashion performs worse than the detectors operating on the whole image. We hypothesize that this is due to the additional filtering step at the end, where non-maximum suppression is used to filter out overlapping bounding boxes. If we compare the two state-of-the-art object detectors, we find that Faster RCNN slightly outperforms Mask RCNN. This is somewhat unexpected, since Mask RCNN has access to additional information (the instance masks). Further, we find that all the detectors struggle on smaller object instances. Future research should focus on improving the object detection performance on small and medium-sized objects to achieve significant gains in overall performance. Our findings confirm those of Hoiem et al. [17], who found that object size is one of the main error factors in object detection.

Semantic segmentation evaluation metrics: While bounding box prediction is the method predominantly used for object detection, we recognize that there exist other methods which only achieve bounding box prediction after an additional post-processing step. These methods mainly include detection through semantic segmentation. To avoid explicit bias towards bounding box prediction methods, we introduce separate benchmark algorithms for semantic segmentation.
TABLE II: Fruit detection benchmark results for object detection approaches. Higher numbers are better and the bold marked numbers indicate the highest performing approach.

Method | AP [@0.5:0.05:0.95] | AP [@0.5] | AP [@0.75] | AP [small] | AP [middle] | AP [large]
Tiled FRCNN [1] | 0.341 | 0.639 | 0.339 | 0.197 | 0.519 | 0.208
For evaluation, we follow the established metrics used by the COCO dataset [20]. We report Intersection over Union (IoU) as the primary challenge metric. Additionally, we report the class IoU for apples, pixel accuracy, and class accuracy for apple pixels.
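As a reference for how these pixel-level metrics relate to one another, here is a minimal sketch for the binary apple-vs-background case (the exact definitions used for the benchmark may differ in detail):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Binary (apple vs. background) segmentation metrics from two label maps."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    apple_iou = tp / (tp + fp + fn + 1e-9)
    background_iou = tn / (tn + fp + fn + 1e-9)
    return {
        "iou": 0.5 * (apple_iou + background_iou),   # mean over both classes
        "class_iou": apple_iou,                      # IoU of the apple class only
        "pixel_acc": (tp + tn) / (tp + tn + fp + fn),
        "class_acc": tp / (tp + fn + 1e-9),          # recall of apple pixels
    }
```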
We evaluate four different models.

Semi-supervised GMM: A semi-supervised clustering method based on Gaussian Mixture Models (GMM), developed by Roy and Isler [26]. The model is pretrained on an unlabeled dataset, different from the ones contained in the train and test sets.

User-supervised GMM: The same model as in the semi-supervised case. The method uses human supervision to create a single model per tree row in the test set.

UNet (not pretrained): A semantic segmentation network based on a fully convolutional network architecture [24], [13]. The images in the train and test sets are split into 224 × 224 sized chunks, and the weights of the network are initialized randomly.

UNet (pretrained): The same model as the one before, but the weights are initialized from a network pretrained on ImageNet.

TABLE III: Fruit detection benchmark results for semantic segmentation approaches. Higher numbers are better and the bold marked numbers indicate the highest performing approach.

Method | IoU | Class IoU | Pixel Acc. | Class Acc.
Semi-supervised GMM [28] | 0.635 | 0.341 | 0.968 | 0.455
User-supervised GMM [28] | 0.649 | 0.455 | 0.959 | 0.634

B. Counting Baselines

We report baseline results for counting approaches which were previously published in [14] for completeness. We evaluate two approaches. GMM: An unsupervised method based on Gaussian Mixture Models. This method fits a mixture-of-Gaussians probability distribution to a previously segmented image. CNN: This method uses a network to classify the fruits into k distinct classes. The network is based on a ResNet50 [16] backbone, and we choose to classify six classes.
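A minimal sketch of such a count classifier, assuming the six classes correspond to discrete per-patch count bins (the exact class definition follows [14] and is not restated here):

```python
import torch
import torch.nn as nn
import torchvision

# ResNet-50 backbone with the final layer replaced by a 6-way count classifier.
model = torchvision.models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 6)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(patches, counts):
    """One optimization step on a batch of fruit patches and count labels.
    `patches` is an (N, 3, H, W) float tensor, `counts` an (N,) long tensor of class indices."""
    optimizer.zero_grad()
    logits = model(patches)
    loss = criterion(logits, counts)
    loss.backward()
    optimizer.step()
    return loss.item()
```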
Table IV shows the counting accuracy. The ResNet50 network outperforms the GMM model on all of the test sets. However, the network exhibits considerable variation in counting performance. Datasets 1 and 2 contain red and mixed apples. Dataset 4 contains red apples, but the images were acquired from further away. The best performance is achieved on test dataset 3, which contains green apples. We believe that this is the case because the green fruits show considerably less color variation than the red and mixed fruits. The GMM method performs best on test dataset 1, as this dataset is closest to the GMM training data. For an in-depth analysis and qualitative comparison, we refer the reader to [14].

TABLE IV: Fruit cluster counting benchmark results.

Method | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4
GMM [27] | 88.0 % | 81.8 % | 77.2 % | 76.1 %
CNN [14] | 88.8 % | 92.68 % | 95.1 % | 88.5 %

C. Yield estimation

Fruit detection and counting are integral to solving the problem of yield estimation. However, yield estimation contains additional steps to map detections and counts to tree row yield. For one, we need to track fruits across the image sequence.
VII. CONCLUSION

We introduced a new dataset for detecting and segmenting apples in orchards and a second dataset for counting clustered fruits. With this collection of annotated object instances, we hope to help the advancement of object detection, segmentation, and counting of small objects in cluttered environments. In creating this dataset, we wanted to emphasize the need for a diverse and unbiased dataset, containing a large number of object instances and apple varieties between the individual tree rows. Dataset statistics and results from the baseline algorithms indicate that the images contain challenging scenarios for current state-of-the-art object detection algorithms.

There are several promising directions for future work to improve the performance of detection and counting algorithms using this dataset. Our analysis of state-of-the-art object detectors indicates that networks can gain in accuracy by putting a broader focus on small object instances (area < 32² pixels). Similarly, semantic segmentation networks may employ weighting schemes to address the class imbalance between foreground (object instances) and background pixels. We hope that our dataset will help computer vision researchers working on fruit detection.

To download and learn more about MinneApple, please see the project website: http://rsn.cs.umn.edu/index.php/MinneApple.

VIII. ACKNOWLEDGEMENTS

This work was supported by the USDA NIFA MIN-98-G02 and UMN MnDrive. The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported within this paper (http://www.msi.umn.edu).

REFERENCES

[1] S. Bargoti and J. Underwood. Deep fruit detection in orchards. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 3626–3633. IEEE, 2017.
[2] L. Calvin and P. Martin. The U.S. produce industry and labor: Facing the future in a global economy. (1477-2017-4011):57, 2010.
[3] S. W. Chen, S. S. Shivakumar, S. Dcunha, J. Das, E. Okon, C. Qu, C. J. Taylor, and V. Kumar. Counting Apples and Oranges With Deep Learning: A Data-Driven Approach. IEEE Robotics and Automation Letters, 2(2):781–788, 2017.
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3213–3223, Las Vegas, NV, USA, June 2016. IEEE.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, Miami, FL, June 2009. IEEE.
[6] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 304–311, Miami, FL, June 2009. IEEE.
[7] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):743–761, Apr. 2012.
[8] A. Dutta, A. Gupta, and A. Zisserman. VGG Image Annotator (VIA). 2016.
[9] A. Ess, B. Leibe, K. Schindler, and L. van Gool. Robust Multiperson Tracking from a Mobile Platform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(10):1831–1846, Oct. 2009.
[10] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2):303–338, June 2010.
[11] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, Providence, RI, June 2012. IEEE.
[12] A. Gongal, S. Amatya, M. Karkee, Q. Zhang, and K. Lewis. Sensors and systems for fruit detection and localization: A review. Computers and Electronics in Agriculture, 116:8–19, Aug. 2015.
[13] N. Häni, P. Roy, and V. Isler. A comparative study of fruit detection and counting methods for yield mapping in apple orchards. Journal of Field Robotics, Aug. 2019.
[14] N. Häni, P. Roy, and V. Isler. Apple Counting using Convolutional Neural Networks. In Intelligent Robots and Systems (IROS), 2018 IEEE International Conference on. IEEE, 2018.
[15] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv:1703.06870 [cs], Mar. 2017.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, Las Vegas, NV, USA, June 2016. IEEE.
[17] D. Hoiem, Y. Chodpathumwan, and Q. Dai. Diagnosing Error in Object Detectors. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Computer Vision – ECCV 2012, volume 7574, pages 340–353. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[18] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 936–944, Honolulu, HI, July 2017. IEEE.
[19] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal Loss for Dense Object Detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007, Venice, Oct. 2017. IEEE.
[20] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár. Microsoft COCO: Common Objects in Context. arXiv:1405.0312 [cs], May 2014.
[21] X. Liu, S. W. Chen, S. Aditya, N. Sivakumar, S. Dcunha, C. Qu, C. J. Taylor, J. Das, and V. Kumar. Robust Fruit Counting: Combining Deep Learning, Tracking, and Structure from Motion. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1045–1052, Madrid, Oct. 2018. IEEE.
[22] M. Rahnemoonfar and C. Sheppard. Deep Count: Fruit Counting Based on Deep Simulated Learning. Sensors, 17(12):905, Apr. 2017.
[23] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
[24] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[25] P. Roy, W. Dong, and V. Isler. Registering Reconstructions of the Two Sides of Fruit Tree Rows. In Intelligent Robots and Systems (IROS), 2018 IEEE International Conference on. IEEE, 2018.
[26] P. Roy and V. Isler. Surveying apple orchards with a monocular vision system. In International Conference on Automation Science and Engineering (CASE), pages 916–921. IEEE, Aug. 2016.
[27] P. Roy and V. Isler. Vision-Based Apple Counting and Yield Estimation. In D. Kulić, Y. Nakamura, O. Khatib, and G. Venture, editors, 2016 International Symposium on Experimental Robotics, volume 1, pages 478–487. Springer International Publishing, Cham, 2017.
[28] P. Roy, A. Kislay, P. A. Plonski, J. Luby, and V. Isler. Vision-based preharvest yield mapping for apple orchards. Computers and Electronics in Agriculture, 164:104897, Sept. 2019.
[29] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, and C. McCool. DeepFruits: A Fruit Detection System Using Deep Neural Networks. Sensors, 16(8):1222, Aug. 2016.
[30] M. Stein, S. Bargoti, and J. Underwood. Image Based Mango Fruit Detection, Localisation and Yield Estimation Using Multiple View Geometry. Sensors, 16(11):1915, Nov. 2016.
[31] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling. arXiv:1805.04687 [cs], May 2018.