Jaison Iyer 2025 Empowering Democracy A Comprehensive Analysis and Predictive Modelling of Voter Turnout in Indian
Abstract
This study aims to break down the complex dynamics driving voter turnout in Indian general elections, providing a detailed examination
of the various elements that influence voters to engage in the democratic process. It investigates how voter engagement has changed
over the years, looking at socio-economic factors, regional differences and historical patterns that affect civic engagement. The
research utilizes a detailed exploratory data analysis to examine voter data from 1952 to 2019. Key factors influencing voting turnout
are identified through statistical methods and visualizations. In order to predict and comprehend voter behaviour based on various
socio-demographic parameters, the study uses advanced machine learning algorithms, such as Random Forest, XGBoost, LSTM and
other important models. This project contributes to the understanding of voter behaviour, providing actionable insights for improving
democratic participation in the Indian electoral landscape by utilizing hyper-localized constituency-wise data. Previous studies mostly
looked into the political landscape of other countries and did not use any hyper-localized data. The study reveals regional differences,
socio-economic linkages and important drivers of voter turnout. It highlights the value of focused campaigns, interventions tailored to
specific regions and the use of technology to increase political engagement.
Key Words
Voter Turnout, Democratic Participation, Indian General Election, Regional Variations, Socio-economic Factors, Data Analysis,
Machine Learning
Corresponding author:
Lakshmi Shankar Iyer, School of Business and Management, Christ University, Bangalore, Karnataka 560029, India.
E-mail: [email protected]
2 Vision
eastern states than in the northern heartland. Significant differences are also seen across the constituencies of different regions. The variations in region and between urban and rural areas highlight the complex issues involved in attracting votes from a wide range of Indian voters.
Several factors contributed to increased voter turnout in 2019. A significant factor was the rise in voter registration brought about by programmes like SVEEP. The increased political engagement was facilitated by awareness initiatives as well as the strong attraction of leaders such as Narendra Modi. The 2019 elections saw an increase in public interest, highlighting the importance of targeted initiatives to improve voter education and motivation. Although there is a general upward tendency, there are still a number of obstacles to overcome to guarantee inclusive and widespread involvement. The best voter turnout is still being hampered by elements including low awareness, worker migration, disillusionment with the political system, fear of violence, technical difficulties and scepticism of Electronic Voting Machines (EVMs). The ECI has taken initiatives to address these concerns, such as simplifying registration procedures and improving polling station accessibility. However, a more focused strategy may be required to address the specific hurdles that regional voters, migrants and other underprivileged groups experience.
Achieving a high voter turnout is essential for strong democracy and efficient government. Nations like Sweden and Belgium, with voter turnout rates of 87.2% (2019) and 88.4% (2018), respectively, showcase broad citizen participation in governance. In India, with a current turnout of 68%, a substantial portion of the electorate did not vote in the 2019 general election, potentially altering the outcome. Increased voter participation leads to a more inclusive and representative democracy, ensuring that the government reflects or represents diverse perspectives of the populace.
Predicting voter participation in India is crucial for democracy, societal well-being and the economy. Higher turnouts indicate an engaged electorate, contributing to the legitimacy of elected administrations. Understanding turnout patterns over time helps track changes in political opinions and habits. Predictions aid logistical arrangements and security measures on Election Day, allowing efficient allocation of resources. Voter turnout projections also guide political parties in resource allocation and campaign planning, fostering a more representative and inclusive democratic process.
This study enhances the integrity of the voting process and its continuous improvement. Increased voter turnout not only strengthens democracy but also impacts a country’s economic future. A robust democratic base attracts domestic and foreign investments, promoting economic growth and competitiveness globally. The ECI’s data reveal steady growth in population, electors and voters from 1951 to 2019. However, the widening gap between registered electors and actual voters raises concerns about effective democratic participation. Targeted initiatives are needed to bridge this gap, ensuring a more robust and inclusive democratic process.

The Major Impacts of Lesser Voter Turnout in India
Lesser turnout of voters is a major concern in any democratic setup. This is an indicator of questioning the legitimacy of governments, potentially suggesting a lack of democratic duty among elected leaders. Low turnout may favour more involved groups, leading to biased and skewed representation, particularly concerning diverse demographics in India. Lower voting increases the risk of control by the wealthy, potentially leading to corruption and laws favouring a select group. Non-participation results in the neglect of significant issues, missing opportunities to address the diverse demands of the population as well as opportunities to deserving candidates. Low turnout reflects discontent, apathy and alienation, perpetuating a cycle of disinterest in politics. Low political participation hinders inclusive development, leading to social and economic disparities. Low voter turnout weakens democratic accountability, discourse and representation, threatening the overall health of the democratic system. Hence, predicting voter turnout would help the government motivate eligible voters and strategize their visits to the polling booth on the voting day. This would help in planning the resources and ensure the rightful representation of the population in the Government.
The current study focuses on the prediction of voter turnout across various regions and states during the 2024 General Elections in India, which would help state governments prepare for the election day.

Literature Review
Multiple studies have been carried out with regard to the prediction of party win and voter’s behaviour during the polling season.
The article ‘Predicting Propensity to Vote with Machine Learning’ utilized voting records from the IPUMS-ASA United States Voting Behaviours dataset, compiled by the American Statistical Association, which contains US voting data from 2004 to 2018. The study used a machine learning environment built on TensorFlow. Three experiments were conducted with the best model achieving a Matthews correlation coefficient (MCC) of 0.39 on held-out 2018 data. The results indicate a moderately accurate inference of voting propensity. The results were weaker than those of a previous study by Challenor, which achieved a higher MCC of 0.74 using a support vector machine (SVM) model. Further work is recommended by incorporating additional demographic, geographic and psychographic data.
The study titled ‘Application of Artificial Neural Networks in Predicting Voter Turnout Based on the Analysis of Demographic Data’ obtained data from the Polish National Electoral Commission and Central Statistical Office, containing demographic information and voter turnout
Jaison and Iyer 3
records of Polish communes from 2005, 2007 and 2011 (Michalak, 2019). The machine learning methods used were random forest regressor and artificial neural networks implemented in Python using the sklearn, keras and tensorflow libraries. This study did not involve the non-geographical factors like political climate.
Further study used digital traces captured via a browser add-on and mobile app, recording domains visited and apps used over 4 months. Gradient-boosted decision trees were implemented in XGBoost and trained on the digital traces to predict self-reported voting behaviour and party preferences from the survey data (Bach et al., 2019). Hyperparameter tuning used 10-fold cross-validation. The study could have taken up larger and more diverse samples across multiple elections by extending it to other political contexts.
The study (Hare & Kutsuris, 2022) analyses the use of ensemble models to predict swing voters in U.S. presidential elections with data from the 2012 Cooperative Campaign Analysis Project survey with 43,998 respondents interviewed at multiple timepoints before and after the 2012 election. An ensemble of eight supervised machine learning models was constructed to predict swing voters. The ensemble model outperformed the baseline and individual component learners in predicting swing voters. The study (Moses & Box-Steffensmeier, 2020) utilizes synthetic data and survey data from the 2016 Cooperative Congressional Election Study with over 40,000 respondents to demonstrate machine learning approaches. A variety of machine learning algorithms are discussed, including classification and regression trees (CART), random forests, neural networks, SVMs and ensemble methods. Model tuning, cross-validation and hyperparameter optimization are highlighted in the study.
The study conducted by Reimer and Toby (2019) used data from a 2008 field experiment conducted in Michigan with 180,002 households randomly assigned to receive different mailers or a control. The causal machine learning models used include logistic regression with interactions, a two-model approach, and causal forests. Model tuning utilized cross-validation. The study by Hua et al. (2021) uses machine learning techniques to predict and understand voter turnout by using data from the Asian Barometer Survey conducted in Malaysia in 2010 and 2014, containing individuals’ information, demographics and other factors. The machine learning methods used were decision tree algorithms: CHAID, CART and C5.0. Age was the most significant predictor differentiating younger and older voters. Class imbalance in the data poses a challenge. The study could have explored other algorithms such as SVM and random forests to improve predictions.
The study by Garcia et al. (2018) utilized data from public opinion surveys conducted in Hong Kong and the United States, as well as census data from the United States Census Bureau. The surveys range from 2008 to 2016. Deep neural networks were implemented in Python using the Keras library. More geographic data could improve United States predictions. Factors like get-out-the-vote efforts may explain overestimation bias. In this study, there is a need to improve model interpretability. This demonstrates the potential of using neural networks in election forecasting. Nevertheless, the use of public data is a strength. The study by Desai et al. (2019) used techniques like logistic regression, SVMs, random forest and Naive Bayes classifier for data preprocessing and feature engineering. More robust data imputation and feature engineering could potentially improve model performance. Incorporating additional data sources may also help. Factors like get-out-the-vote efforts are not captured. The study provides a good framework but is limited by data quality.
The study by Frank and Coma in 2023 examined 579 elections in 80 democracies from 1945 to 2014 using turnout data from the International IDEA database. A total of 127 potential turnout predictors were derived from 44 previous studies. Extreme bound analysis was implemented, running over 15 million regressions with 70 variables using fixed effects, random effects and clustered standard error models. The results provide a systematic empirical basis for future turnout research. More work is needed on the potential interaction effects between variables and how results may vary across different country contexts or time periods. Various data sources, including official government records, large-scale surveys, commercial voter files, field experiments, and new digital trace data, contributed to understanding voting behaviour. Findings of the study by Harada et al. (2022) highlight the influence of individual factors, such as income, education, age and civic skills on turnout. Racial minorities and youth exhibit lower average turnout, mitigated by mobilization efforts. Research gaps include the need for new digital data sources, exploring long-term turnout habits and micro-targeting. Despite progress, voter turnout remains a pressing concern in ongoing efforts to comprehend and enhance democratic engagement. Later study utilized diverse data sources, including official government records, large-scale surveys, commercial voter files, field experiments and emerging digital trace data; researchers employ techniques such as observational studies, natural experiments, field experiments, data mining, machine learning and causal inference to explore turnout dynamics. Major findings reveal disparities in turnout based on resources, age, race, costs, institutions, mobilization, social ties and habit strength, yet questions persist about causal mechanisms and generalizability.
Research gaps emphasize the need for digital trace data to complement traditional sources, understanding specific causal mechanisms, assessing generalizability, exploring long-term turnout habits and evaluating the impact of misinformation. Significance of voter turnout through Government intervention plays a big role when it is analysed from a regional perspective, as in a country like India, state-wise dynamics play a big role in this aspect due to variation in demography.
Figure 1. Line Chart Showing Voters Turnout Over Years in Indian General Elections.
Figure 2. Clustered Bar Chart Comparing Average Voter Turnout in General Election with Voter Turnout in Assembly Election Over
the Years.
Figure 3. Stacked Bar Chart Comparing Non-voters with Margin Votes of Winning Candidate.
Table 2. Constituency-wise Average Voter Turnout Percentages for Top 10 Least Voter Turnout (%).

PC Name        Average of Voter Turnout (%)    Average of Local Election Turnout (%)
Anantnag       8.98                            66.4
Srinagar       14.43                           66.4
Baramulla      34.6                            66.4
Hyderabad      44.84                           79.7
Kalyan         45.31                           61.4
Patna Sahib    45.8                            58.7
Secundrabad    46.5                            79.7
Phulpur        48.7                            60.7
Nalanda        48.79                           58.7
Karakat        49.09                           58.7
Total          68.14                           70.77

voter engagement in these areas. Constituencies like Hyderabad, Kalyan and Patna Sahib that have high electors also demonstrate lower voter turnout percentages, ranging from 34.60% to 45.80%. This indicates a comparatively weak civic participation in these regions during elections.
The democratic process is showing a notable and alarming tendency, as seen by the stacked bar chart that compares the total number of non-voters to the total margin for each constituency. When the number of non-voters exceeds the margin of victory for each candidate in each constituency, it suggests that a significant proportion of eligible voters are not casting votes. Candidates may have lost out on opportunities to win over a substantial number of non-voting voters, as indicated by the difference between the margin of victory and the non-voter rate.
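The comparison between non-voters and the margin of victory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`electors`, `votes_polled`, `margin`) and the three sample records are hypothetical and are not drawn from the election data used in the study.

```python
# Hypothetical constituency records; names and figures are illustrative only.
constituencies = [
    {"name": "A", "electors": 1_500_000, "votes_polled": 900_000, "margin": 120_000},
    {"name": "B", "electors": 1_200_000, "votes_polled": 1_000_000, "margin": 250_000},
    {"name": "C", "electors": 1_800_000, "votes_polled": 810_000, "margin": 15_000},
]


def non_voter_summary(rows):
    """Return turnout (%), non-voters and a non-voters-exceed-margin flag per constituency."""
    summary = []
    for row in rows:
        # Non-voters are registered electors who did not cast a vote.
        non_voters = row["electors"] - row["votes_polled"]
        turnout_pct = 100.0 * row["votes_polled"] / row["electors"]
        summary.append({
            "name": row["name"],
            "turnout_pct": round(turnout_pct, 2),
            "non_voters": non_voters,
            # True when the pool of non-voters is larger than the winner's margin.
            "non_voters_exceed_margin": non_voters > row["margin"],
        })
    return summary


summary = non_voter_summary(constituencies)
for item in summary:
    print(item)
```

A constituency flagged `True` here is one where, in principle, the votes of those who stayed away could have changed the result, which is the pattern the stacked bar chart is meant to expose.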
Exploratory Data Analysis
The dependent variable’s histogram, which displays normality, indicates that the distribution of voter turnout percentages among constituencies is largely bell-shaped and symmetrical. This normal distribution compliance is crucial to statistical analysis because it suggests that voter turnout levels are generally distributed around a central value, with fewer extreme values on either end of the spectrum. Voter turnout is distributed normally when a large proportion of the constituency has ordinary or median levels of participation, and a smaller fraction of the constituency shows extraordinarily high or low turnout rates. Because of its normality, a variety of statistical techniques that presume a normal distribution can be used more easily, leading to more accurate and dependable evaluations and forecasts of voter participation.
The correlation matrix provides insight into the connections between different variables and voter turnout. From the correlation matrix provided, we observe that local election turnout shows the highest positive correlation with voter turnout in general elections. This suggests that areas with higher local election turnout also tend to exhibit higher voter turnout in general elections. Age groups, gender distribution, literacy rate, income per capita and municipal election turnout are among the other characteristics that show significant connections with voter turnout. These associations imply that sociodemographic characteristics are important in determining voting behaviour. These parameters could improve the accuracy of the model in non-linear regression for voter turnout prediction, especially given their non-linear correlations. Age groups, the literacy rate and per capita income, for example, may have non-linear effects on voter participation; hence, a robust non-linear regression model is required for a more sophisticated prediction.
Figure 6. Electors Strength Over Different Indian States.
The scatter plot in Figure 7 shows a non-linear trend line for the variables, indicating that there is no linear link between the factors and voter turnout. Rather, it suggests a more complicated and potentially curved relationship. This suggests that a straightforward linear regression model may not be sufficient to adequately capture the complex patterns in the data when it comes to forecasting voter participation. To accurately depict the underlying link between the predictor variables and the turnout percentages, it is more suitable to use non-linear regression approaches to model the complex dependencies and changes in voter turnout.

Model Building
Enough studies have been conducted with various machine learning algorithms; however, there is a need to conduct analysis over a longer duration in voter turnout prediction.
● Support Vector Machine (SVM): The SVM is a learning method used for classification and regression tasks. It finds the best hyperplane to fit the data, especially in high-dimensional spaces, and handles non-linear relationships using kernel functions.
● Decision Tree: Decision trees split data into subsets based on features to make predictions. They are good at capturing non-linear relationships and are easy to interpret.
● Random Forest: This method combines multiple decision trees to make predictions and handles non-linear relationships well. It uses randomness in building trees for better generalization (Figure 8).
● Gradient Boosting: Gradient Boosting combines weak learners, typically decision trees, sequentially to improve predictions. It captures complex non-linear relationships effectively and produces accurate results.
● XGBoost: XGBoost is an optimized version of gradient boosting that improves model performance through parallel processing and regularization. It excels at identifying complex patterns in non-linear regression problems (Figure 9).
● LSTM: LSTM is a type of recurrent neural network suitable for processing sequential data. It can capture long-term dependencies and complex temporal patterns, making it effective for non-linear regression tasks involving time-series data.
● Multilayer Perceptrons (MLPs): MLPs are feedforward neural networks with multiple layers. They apply transformations to input data through nodes and activation functions. MLPs are versatile and suitable for various regression problems involving non-linear relationships between input features and the target variable.
Figure 8. Actual Versus Predicted Plot for Random Forest.
Figure 9. Actual Versus Predicted Plot for XGBoost.
Statistical measures are a good indicator of the accuracy of the model behaviour while proposing the same for prediction.

Results and Discussions
With an outstanding R² of 0.9997, which indicates extraordinary prediction accuracy, XGBoost performs better than the other regression models that were assessed. With an RMSE of 0.0149 and an MAE of 0.0111, it shows minimal errors and accurate prediction. With an R² of 0.7989, random forest exhibits good predictive power and relatively low errors, making it another highly performing algorithm. MLPs and SVM also produce good results. LSTM, on the other hand, has a lower R² and somewhat larger errors even though it captures some patterns. Therefore, XGBoost and random forest are the most successful models, demonstrating the right fit for predicting voter turnout.
The XGBoost plot shows an almost straight line when comparing the real and forecasted data. This implies that there is nearly perfect alignment among the model’s predictions and the observed values. High prediction accuracy is indicated by this alignment, which is the desired outcome. When the
model’s predictions closely align with the observed data points over the entire range of values, the plot displays a straight line.
Within the random forest model, the points are closely distributed around the regression line when comparing the actual and predicted values. This shows that the model’s predictions are generally correct, since predicted outcomes closely match the observed values. The closeness of the points to the line indicates a strong fit and indicates that the model successfully captured the underlying patterns in the data. The random forest algorithm generates dependable predictions by approximating the relationships within the dataset, as evidenced by the general alignment between predicted and actual values.
Based on the importance matrix in Figure 10, we can say that the variables that are most relevant to predict voter turnout seem to be general electors, service electors, gender, average number of electors per polling station, postal voters, contestants, number of polling stations, local election turnout and literacy rate (Diwakar, 2008).

Findings
It is observed that Indian general elections have witnessed a consistent upward trend in voter turnout, reaching a peak of 68% in 2019. Factors contributing to this increase include enhanced access to polling places, civic education, technological advancements, election reforms and societal changes. Significant regional variations exist in voter turnout, with states like Jammu and Kashmir, Bihar, Uttar Pradesh and Maharashtra exhibiting lower turnout rates despite larger voter populations. Identifying and addressing obstacles to inclusive democratic processes is crucial for improving overall turnout. The analysis shows that states with high elector populations, such as Bihar, Uttar Pradesh and Maharashtra, continue to experience low voter turnout in assembly elections. Conversely, Jammu & Kashmir displays a contrasting pattern, with higher participation in local elections compared to general elections, potentially influenced by political dynamics post Article 370. Certain constituencies, such as Anantnag, Srinagar and Baramulla, exhibit very low voter involvement, while others with large electorates, such as Hyderabad, Kalyan and Patna Sahib, also experience disparities in turnout. These variations can be attributed to socio-economic conditions, political dynamics, historical backgrounds and local issues. In the 2019 general elections, northern states with higher electorates demonstrated lower voter turnout and a clear inclination towards the BJP, while South India and North Eastern states exhibited support for non-BJP parties. This spatial disparity reflects the complex relationships between party affiliations, regional sentiments and voter engagement. Concerns arise as the number of non-voters exceeds the margin of victory in certain constituencies, potentially impacting election outcomes. This disparity underscores the need for comprehensive strategies to enhance voter engagement and eliminate obstacles to turnout (Lane, 2021). The analysis reveals 297 million non-voters, a cumulative victory margin of 106 million, and an average turnout of 68.14% in general elections, emphasizing the challenges in engaging every voter. The higher average turnout of 70.77% in assembly elections underscores the importance of understanding voter behaviour at the local level. Various machine learning models predict voter turnout, with XGBoost and random forest demonstrating exceptional accuracy (R² of 0.9997).
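For readers less familiar with the statistical measures reported for the models (R², RMSE and MAE), the sketch below shows how they are computed from paired actual and predicted values. It is a minimal illustration, not the authors' evaluation code: the function name `regression_metrics` is hypothetical, and the four turnout fractions are made-up values rather than data from the study.

```python
import math


def regression_metrics(actual, predicted):
    """Compute R^2, RMSE and MAE for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,                # share of variance explained
        "rmse": math.sqrt(ss_res / n),              # root mean squared error
        "mae": sum(abs(r) for r in residuals) / n,  # mean absolute error
    }


# Illustrative turnout fractions (hypothetical, not taken from the study's data).
actual = [0.60, 0.70, 0.80, 0.50]
predicted = [0.62, 0.68, 0.79, 0.53]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

An R² close to 1 together with small RMSE and MAE values means the predictions track the observed turnout closely. RMSE and MAE are expressed in the units of the target variable, so their size has to be read against the scale on which turnout was modelled.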
The 2024 general elections saw a decline in the voter turnout to 63.88% across all phases. There is a significant drop in the turnout, especially in states such as Uttar Pradesh, Maharashtra and Gujarat, where the number of Lok Sabha seats is higher. This has hampered the results outcome. Events that occurred during the period 2019–2024, such as the Ram Mandir issue and Article 370, have made an impact on the voter turnout and these events are extraordinary in nature. With the census not being updated, it is of concern that the actual results reflect the voters’ desire to elect the right Government. It is a matter of concern when non-voters exceed the margin of victory. Literacy rates and gender aspects can be taken into consideration with a targeted approach to improve turnout. The government can deploy successful engagement strategies from high-turnout southern states. The most significant aspect is the average number of voters per polling station. This analysis can help plan resources at the granular level. This would help the officers optimize the resources and motivate citizens at the polling station level by watching the polling pattern throughout the voting day. Most of them hesitate to step out of the house due to queuing and waiting time. Social influencers at the local community level can be utilized to enhance community engagement. To engage with youth, governments can utilize technology and social media effectively. Research should evaluate methods for promoting civic engagement, including educational campaigns and technological interventions. Studying the relationship between voter turnout, democratic legitimacy and trust in government institutions can inform policy decisions. Long-term trends in voter turnout and electoral behaviour should be analysed to identify patterns and drivers of civic engagement.

Limitations of the Study
Inaccuracies or inconsistencies in historical election records, census data and PDFs could compromise the reliability and validity of the analysis. The absence of census data beyond 2011 and the lack of exact demographic and socio-economic data restrict the accuracy, especially in capturing changes in the population and age-specific voting trends. The exclusion of certain qualitative and quantitative factors may limit the model’s accuracy. Incorporating additional factors related to voter attitudes, local issues and candidate profiles could provide more nuanced insights. Relying on forecasted census values introduces a predictive element, and discrepancies between forecasted and actual demographic data may impact the precision of the analysis. Shifts in public sentiment towards political issues, parties and candidates are dynamic and challenging to capture in models, potentially leading to inaccuracies. Localized issues, candidate profiles, weather conditions and other context-specific factors can influence voter turnout but are difficult to incorporate systematically. Last-minute occurrences, such as violence, scandals, or alliances, can significantly influence turnout, making predictions challenging. Mistakes in election procedures, EVM failures and staff shortages are unpredictable factors that can suppress turnout and affect model accuracy. The movement of workers across states, especially in urban and migrant-heavy areas, poses challenges in predicting turnout accurately.

Future Scope
Some of the research questions that could be explored as future study could be related to the impact of emerging technologies like blockchain and online voting platforms on voter engagement and accessibility. Also, a study on the relationship between new socio-economic indicators, demographic changes and voter turnout to identify key factors influencing participation can be taken up. It would be a good idea to explore psychological factors influencing voter behaviour, such as trust in political institutions and the effects of negative campaigns. It is necessary to foster collaborative interdisciplinary research involving political scientists, data scientists, psychologists and policymakers to gain comprehensive insights into voter turnout dynamics.

Conclusion
In conclusion, this study provides important insights into improving civic engagement by highlighting trends and patterns in voter turnout. Although there have been improvements, voter participation gaps remain, which emphasizes the need for targeted interventions. Policymakers and electoral authorities can develop measures to address such barriers and promote more voter participation by utilizing the analysis’s findings. The strengthening of the Indian democracy depends on the pursuit of a more engaged and representative electorate. In the future, further research should focus on identifying and understanding the continuing difficulties associated with voting while also testing and putting new ideas into practice. Working together, we can close current differences and create a political environment in which all citizens feel empowered and represented as we work towards an inclusive democratic process guided by the findings of this study.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding
The authors received no financial support for the research, authorship and/or publication of this article.

ORCID iD
Lakshmi Shankar Iyer https://orcid.org/0000-0002-4765-8622