PREDICTIVE HYBRIDIZATION OF
ORPHAN CROPS USING
MACHINE LEARNING MODELS
22UCS407/IDEA & DESIGN SPRINT REPORT
Submitted by
JUSTIN VARGHESE - 710723104037
KANISHKA V - 710723104040
KEVIN K R - 710723104047
MAHASMRITI S S - 710723104057
in partial fulfillment for the award of the degree
of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
Dr. N.G.P INSTITUTE OF TECHNOLOGY, COIMBATORE - 641048
AN AUTONOMOUS INSTITUTION
ANNA UNIVERSITY: CHENNAI 600 025
MAY 2025
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this Report titled “PREDICTIVE HYBRIDIZATION OF
ORPHAN CROPS USING MACHINE LEARNING MODELS” is the
bonafide work of JUSTIN VARGHESE (710720104037), KANISHKA V
(710720104040), KEVIN K R (710720104047) and MAHASMRITI S S
(710720104057), who carried out the work under my supervision.
SIGNATURE
HEAD OF THE DEPARTMENT
Dr. D. PALANIKKUMAR M.E, Ph.D.
Professor & Head,
Department of Computer Science and Engineering,
Dr. N. G. P Institute of Technology,
Coimbatore-641048.

SIGNATURE
SUPERVISOR
Dr. L. SRINIVASAN M.Tech, Ph.D.
Associate Professor,
Department of Computer Science and Engineering,
Dr. N. G. P Institute of Technology,
Coimbatore-641048.
Submitted for the End Semester Idea & Design Sprint Viva-Voce Examination held
on ___________________.
____________ _____________
INTERNAL EXAMINER EXTERNAL EXAMINER
DECLARATION
We hereby declare that the project work entitled “PREDICTIVE HYBRIDIZATION OF
ORPHAN CROPS USING MACHINE LEARNING MODELS”, submitted to the
autonomous 22UCS407/Idea & Design Sprint viva voce – June 2025, is the report of the
original project work done by us under the guidance of Dr. L. SRINIVASAN M.Tech,
Ph.D., Associate Professor, Department of Computer Science and Engineering, Dr. N. G. P
Institute of Technology, Coimbatore-641048.
NAME SIGNATURE
JUSTIN VARGHESE _________________________
KANISHKA V _________________________
KEVIN K R _________________________
MAHASMRITI S S _________________________
I certify that the declaration made by the above candidates is true.
Project Guide
Dr. L. SRINIVASAN M.Tech, Ph.D.
Associate Professor,
Department of Computer Science and
Engineering,
Dr. N. G. P Institute of Technology,
Coimbatore-641048.
ACKNOWLEDGEMENT
Words act as a gateway to express tokens of acknowledgement. First of all, we
would like to thank the supreme power, our parents and Almighty God, who have given
us the strength and courage to complete our work successfully.
We would like to express our profound gratitude and deep sense of thanks to
Dr. Nalla G. Palaniswami MD., AB (USA), Chairman of Kovai Medical Center &
Hospital, for providing us with the necessary facilities to complete our project work
effectively.
Our heartfelt gratitude goes to Dr. Thavamani D Palaniswami MD., AB (USA),
Secretary of Dr. N. G. P. Institute of Technology, for his generous attitude and constant
motivation, which have been among the main reasons for completing our project.
We are sincerely grateful to Dr. S. U. Prabha M.E, Ph.D., Principal, who has always
been a source of inspiration, a well-wisher and a pillar of support for all the students in
our institution through constant motivation.
We are highly indebted to Dr. D. Palanikkumar M.E, Ph.D., Professor and Head of
the Department of Computer Science and Engineering, whose dedication,
keen interest and readiness to help his students have helped us to a
very great extent in accomplishing this task.
We wish to thank our Idea & Design Sprint Coordinator Ms. R. P. Shermy, M.E,
Assistant Professor, Department of Computer Science and Engineering, for her
excellent assistance, inspiring guidance, regular feedback and invaluable, constructive
ideas.
We express our hearty thanks to our Project Guide Dr. L. SRINIVASAN M.Tech,
Ph.D., Associate Professor, Department of Computer Science and Engineering, for his
valuable guidance and timely help in completing our project.
Finally, we owe huge thanks to our teaching faculty members and non-teaching
staff members, whose support and insights have so deeply enriched our work.
ABSTRACT
In drought-prone regions, conventional breeding methods struggle to produce climate-
resilient crops efficiently, often taking years through trial-and-error approaches. To
address this challenge, this project integrates DNA sequencing, Evolutionary Algorithms
(EA), and Reinforcement Learning (RL) to optimize hybridization in orphan crops—
nutrient-rich but underutilized plants crucial for food security and biodiversity.
This machine learning-driven framework employs EA to simulate natural evolution,
generating genetically diverse candidate hybrids through crossover and mutation. Concurrently, RL
guides hybrid selection by analyzing multi-generational performance data, favoring
traits such as high yield, disease resistance, and adaptability to adverse
environmental conditions. The methodology is intended to shorten breeding cycles compared
to traditional methods while improving agronomic potential.
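As a rough illustration of the crossover-and-mutation step described above, the following Python sketch evolves a small population of binary marker vectors. Everything in it is an assumption made for illustration: the 0/1 genome encoding, the toy fitness function (a count of favourable alleles), the parameter values, and the function names. Simple truncation selection stands in for the RL-based selection, which is not modelled here.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rate, rng):
    """Flip each 0/1 marker independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def fitness(genome):
    """Toy stand-in for a trait score: number of favourable alleles (1s)."""
    return sum(genome)


def evolve(population, generations, mutation_rate, seed=0):
    """Return the fittest genome after a fixed number of generations."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Truncation selection: the better half survives and acts as parents.
        parents = sorted(population, key=fitness, reverse=True)[: size // 2]
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = parents + children
    return max(population, key=fitness)


# Hypothetical founder population: 20 lines, 16 markers each.
init_rng = random.Random(1)
founders = [[init_rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
best = evolve(founders, generations=30, mutation_rate=0.02)
print(fitness(best), best)
```

Because the surviving parents are carried over unchanged, the best score in the population never decreases from one generation to the next. A real pipeline would replace the toy fitness with trait predictions derived from genotype and environment data.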
By integrating genetic information with real-time environmental data, the proposed model
aims to improve breeding efficiency and support the development of climate-resilient crops capable
of withstanding extreme weather events, pest infestations, and soil degradation. The
incorporation of large-scale open datasets further strengthens scalability and real-world
applicability. This cost-effective machine learning approach offers a practical
tool for farmers, researchers, and policymakers, supporting a sustainable, data-driven
pathway to global food security and resilient agriculture.
Keywords: Machine learning, DNA sequencing, Evolutionary Algorithms (EA),
Reinforcement Learning (RL), hybrid crop prediction, food security, climate resilience,
agronomic potential.
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
GWAS – Genome-Wide Association Study
SNP – Single Nucleotide Polymorphism
QTL – Quantitative Trait Loci
HMM – Hidden Markov Model
ANN – Artificial Neural Network (specific to hybridization prediction)
RIL – Recombinant Inbred Line
MAS – Marker-Assisted Selection
GS – Genomic Selection
NIRS – Near-Infrared Spectroscopy
GxE – Genotype-by-Environment Interaction
PCA – Principal Component Analysis
DNN – Deep Neural Network
RF – Random Forest
k-mers – Short DNA Sequence Substrings Used in Genomic Analysis
SVM – Support Vector Machine
TABLE OF CONTENTS
CHAPTER NO TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Overview of Hybridization in Agriculture
1.2 Orphan Crops and Their Importance
1.3 Challenges in Traditional Crop Hybridization
1.4 Role of Machine Learning (ML) in Plant Breeding
1.5 ML Algorithms in Hybrid Crop Prediction
1.6 Need for Predictive Hybridization Models
1.7 Objectives of the Project
1.8 Organization of the Report
1.9 Summary
2 LITERATURE REVIEW
2.1 Introduction
2.2 Machine Learning in Agriculture
2.3 Predictive Modeling for Hybrid Crops
2.4 Feature Selection Techniques in Plant Trait Prediction
2.5 Studies on Orphan Crops and Yield Improvement
2.6 Gaps in Existing Research
2.7 Summary
3 PROBLEM STATEMENT AND PROPOSED SYSTEM
3.1 Introduction
3.2 Limitations of Traditional Hybridization Methods
3.3 Problem Statement
3.4 Proposed Machine Learning-Based Hybridization
3.5 Expected Advantages Over Traditional Methods
3.6 System Architecture Overview
3.7 Summary
4 SYSTEM SPECIFICATION
4.1 Introduction
4.2 Hardware Requirements
4.3 Software Requirements
4.4 Machine Learning Frameworks Used
4.5 Dataset Description and Collection
4.6 Preprocessing of Plant Trait Data
4.7 Summary
5 PROPOSED METHODOLOGY
5.1 Introduction
5.2 Modules of the System
5.2.1 Parent Plant Trait Extraction
5.2.2 Feature Engineering for Hybrid Prediction
5.2.3 Machine Learning Model Training
5.2.4 Random Forest for Hybrid Trait Prediction
5.2.5 Comparison with Other ML Algorithms
5.2.6 Model Evaluation Metrics
5.2.7 Deployment of the Model
5.3 Summary
6 IMPLEMENTATION AND RESULTS
6.1 Implementation of Machine Learning Models
6.2 Dataset Splitting and Training Process
6.3 Performance Analysis of Hybrid Predictions
6.4 Evaluation Metrics: Accuracy, Precision, and Recall
6.5 Hybrid Crop Yield Prediction Results
6.6 Comparison of Model Performance
6.7 Summary
7 CONCLUSION AND FUTURE WORK
7.1 Conclusion
7.2 Future Work and Enhancements