Kissan-Dost: Bridging the Last Mile in Smallholder Precision Agriculture with Conversational IoT

Muhammad Saad Ali, Daanish U. Khan, Laiba Intizar Ahmad, Umer Irfan, Maryam Mustafa, Naveed Anwar Bhatti,
Muhammad Hamad Alizai

Abstract

We present Kissan-Dost, a multilingual, sensor-grounded conversational system that turns live on-farm measurements and weather into plain-language guidance delivered over WhatsApp text or voice. The system couples commodity soil and climate sensors with retrieval-augmented generation, then enforces grounding, traceability, and proactive alerts through a modular pipeline. In a 90-day, two-site pilot with five participants, we ran three phases (baseline, dashboard only, chatbot only). Dashboard engagement was sporadic and faded, while the chatbot was used nearly daily and informed concrete actions. Controlled tests on 99 sensor-grounded crop queries achieved over 90 percent correctness with subsecond end-to-end latency, alongside high-quality translation outputs. Results show that careful last-mile integration, not novel circuitry, unlocks the latent value of existing Agri-IoT for smallholders.

I Introduction

Agricultural communities in low- and middle-income countries (LMICs) stand to gain immensely from the IoT revolution, yet a profound gap remains between technological potential and on-the-ground adoption. Modern IoT systems can capture fine-grained data on soil health, microclimate, and crop conditions (the raw ingredients of precision agriculture), but translating these complex data into understandable, actionable guidance for smallholder farmers remains a significant challenge. Agriculture is critical in these regions for food security and poverty reduction. In Pakistan, for example, agriculture is the largest sector of the economy but is projected to be among the hardest hit by climate change; the country was ranked the world’s 8th most climate-vulnerable nation [9]. To address this threat, Pakistan and similar countries must equip their producers with technology that promotes wider adoption of climate-smart practices. Prior studies show that farmers in resource-constrained settings struggle with conventional IoT solutions because of limited internet access, low literacy, and interfaces that assume technical expertise [43, 8]. The very users who could benefit most from data-driven farming are thus often excluded by designs that fail to account for local constraints. This reality motivates our work and raises our key research question:

How can IoT sensing be translated into sensor-cited, language-appropriate recommendations that smallholders trust and act on?

Our Approach: We argue that closing the adoption gap requires shifting effort from novel hardware to last-mile usability. Affordability remains a boundary condition, but the harder problem is translating live sensor streams into culturally grounded, language-appropriate guidance. We therefore developed Kissan-Dost (“farmer’s friend” in Urdu), an end-to-end pipeline that marries off-the-shelf sensors with a retrieval-augmented large language model (LLM) and delivers recommendations through WhatsApp, the de facto communication channel in rural Pakistan [28]. Figure 1 outlines the architecture: sensors transmit via the ESP-NOW protocol to an edge gateway; a cloud service fuses these streams with weather forecasts and a domain knowledge base; finally, a multilingual LLM generates crop-specific advice that reaches farmers as text and optional voice notes.

This design follows three principles. (i) Familiar medium: Delivering over WhatsApp avoids app installation and unfamiliar UI metaphors. (ii) Literacy independence: Voice notes in the local language let low-literate workers consume the same content as owners. (iii) Sensor grounding: Each recommendation cites the underlying measurement and forecast, enhancing transparency and trust. For instance, when Kissan-Dost judges that current soil moisture is inadequate for the crop and growth stage, considering soil type, recent conditions, and forecast rain, it automatically issues a literacy-independent voice prompt recommending timely irrigation:

Deployment Summary: We ran three consecutive 15-day phases at each site: (P1) Baseline (observation only), (P2) Dashboard only, and (P3) Chatbot only. Engagement during P2 was sporadic and largely ceased by week two. In the subsequent P3 chatbot phase, participants interacted nearly daily with Kissan-Dost.

Contributions: This paper makes three key contributions:

End-to-End IoT-Chatbot Framework. To our knowledge, the first end-to-end, deployed system that marries live sensor streams with an LLM and delivers localized advice via an existing channel (WhatsApp). The design is hardware-agnostic and open-source for reproducibility [2].

Human-centered pipeline. A modular chain (intent $\rightarrow$ retrieval $\rightarrow$ synthesis $\rightarrow$ proactive alerts) that emphasizes evidence grounding, traceability, and multilingual accessibility for low-literacy users.

Empirical Evidence of Impact. Beyond bench metrics (accuracy, grounding, latency, translation), a 90-day, two-site deployment shows sustained daily engagement and concrete decisions (e.g., irrigation postponement, acidity correction), highlighting real-world viability.

II Related Work

Kissan-Dost sits at the intersection of agricultural IoT monitoring, literacy-aware conversational interfaces, and emerging IoT–LLM integration.

Agricultural IoT platforms: Systems such as FarmBeats [39] and AgriSens [29] demonstrate robust rural sensing and networking, with follow-on work improving range, power, and bill of materials. However, many deployments assume that technically trained intermediaries interpret dashboards, leaving an “implementation gap” between sensing and actionable use in smallholder settings [43, 8]. Kissan-Dost keeps commodity sensing but focuses on translating measurements into decisions through a low-friction interface.

Figure 1: Kissan-Dost system architecture: data flow from sensors to the LLM-based conversational interface.

Conversational access in low-resource contexts: IVR and voice systems (e.g., Avaaj Otalo [31], FarmChat [15]) and other low-literacy designs can improve access to information, but are typically limited to static knowledge bases and lack personalization from plot-specific, real-time data [19]. Kissan-Dost grounds responses in a user’s own sensor streams and forecasts and delivers them through WhatsApp, a channel already embedded in daily communication [28].

IoT streams with LLMs: Recent work explores using LLMs to narrate or query sensor events [1], while prototypes such as AgriGPT [25] focus on natural-language access to agricultural knowledge. Empirical evidence from live, sensor-grounded deployments with proactive alerts remains limited [6]. Kissan-Dost contributes a field-tested system that couples retrieval-augmented generation with continuous sensing and evaluates both response quality and real-world engagement.

III Kissan-Dost: Design and Implementation

Kissan-Dost is an end-to-end platform that links advanced IoT sensing with practical, field-level decision support. Its design rests on three principles. (i) Actionable accessibility: the interface must remain intuitive for users who lack technical training, eliminating any need to interpret raw data. (ii) Context-aware integration: recommendations must mirror current soil and climate conditions by fusing sensor streams with external sources such as weather forecasts and local market prices. (iii) Cultural localization: interaction should fit local languages, dialects, and farming practices, relying on familiar channels like WhatsApp voice notes to foster trust and sustained use.

We next outline how these principles shaped both the hardware architecture for data collection and the software pipeline that converts measurements into plain-language guidance for smallholder farmers.

III-A System Overview

Kissan-Dost combines sensor devices, a wireless gateway, cloud services, and a multilingual chatbot interface (Figure 1). IoT nodes measure soil moisture, temperature, NPK, conductivity, and pH, then relay readings to a gateway that forwards the data to the cloud over cellular or Wi-Fi links. The cloud layer enriches these data with external feeds, passes the combined record into a retrieval-augmented generation module, and queries an LLM. Farmers use WhatsApp to send text or voice messages; incoming queries are translated into English for processing, and replies are rendered in the farmer’s preferred language, so users with varied literacy levels can participate. This round trip forms a continuous feedback loop that adapts advice to evolving field conditions.

During onboarding, farmers provide location, crop type, and language preferences. Kissan-Dost stores these settings and tailors each subsequent recommendation accordingly, turning heterogeneous sensor, climate, and market data into concise, actionable guidance that supports day-to-day farm management across diverse contexts.

III-B Hardware Design

We built a custom sensor–gateway stack for two main reasons. First, openness and interoperability: most commercial “plug and play” kits hide functionality behind opaque APIs, which hinders deep integration with our retrieval-augmented dialogue pipeline and prevents firmware-level changes. Second, transparent economics without turning affordability into a race to the bottom: by relying on commodity parts (an ESP32 microcontroller, an industrial 7-in-1 soil probe, and off-the-shelf batteries and enclosures), our bill of materials stays easy to verify and replication-friendly.

Our per-node component cost is $110, and an entry setup (one node + one ESP32-class gateway) totals $140. We emphasize that these figures are not meant to be the minimum possible cost, but rather representative of contemporary Agriculture-IoT builds using commodity MCUs and COTS probes.
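As a worked check on these figures, assume the entry setup contains nothing beyond one node and one gateway. Under that assumption the gateway accounts for about $30 ($140 − $110). The minimal Python sketch below extrapolates to larger deployments. `deployment_cost` and its `nodes_per_gateway` capacity are hypothetical, because the text does not state how many nodes one gateway serves.

```python
import math

NODE_COST_USD = 110     # per-node component cost reported in the text
ENTRY_SETUP_USD = 140   # one node + one gateway, as reported
# Assumption: the entry setup contains only one node and one gateway,
# so the gateway cost is the difference between the two figures.
GATEWAY_COST_USD = ENTRY_SETUP_USD - NODE_COST_USD  # 30


def deployment_cost(num_nodes: int, nodes_per_gateway: int = 10) -> int:
    """Estimate hardware cost in USD for num_nodes sensor nodes.

    nodes_per_gateway is a hypothetical capacity used for illustration;
    the paper does not state how many nodes one gateway serves.
    """
    if num_nodes < 1 or nodes_per_gateway < 1:
        raise ValueError("need at least one node and a positive gateway capacity")
    gateways = math.ceil(num_nodes / nodes_per_gateway)
    return num_nodes * NODE_COST_USD + gateways * GATEWAY_COST_USD


print(deployment_cost(1))  # 140, matches the entry setup
print(deployment_cost(4))  # 4*110 + 1*30 = 470
```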

Therefore, the value of Kissan-Dost lies less in cheaper hardware and more in a human-centered pipeline that translates readings from many nodes into clear, actionable advice for nontechnical farmers.

III-B1 Sensing

The Agri Sensor module in Figure 2 forms the core of our data-collection infrastructure, using a compact 7-in-1 sensor suite to track temperature, soil moisture, nitrogen, phosphorus, potassium, pH, and electrical conductivity (Table I). These variables guide irrigation scheduling, fertiliser management, and soil amendments [32]. Each node sends readings over ESP-NOW, a low-power, connectionless protocol that needs no Wi-Fi access point and therefore suits remote fields. Data reaches the cloud through a dedicated gateway and is then converted into actionable insights. All electronics sit inside a rugged IP65 enclosure for year-round reliability.
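The text does not specify the on-air packet format. As a minimal sketch, assume a hypothetical fixed little-endian layout with integer scaling (e.g., pH × 100). Under that layout one 7-in-1 reading packs into 20 bytes, well under the 250-byte payload limit of classic ESP-NOW. Python's `struct` module is used for readability; on the ESP32 the same layout would be a packed C struct. All field names and scale factors are assumptions.

```python
import struct

# Hypothetical wire layout (little-endian, no padding):
#   node_id uint16 | seq uint32 (loss/duplicate detection)
#   temp int16 (degC x 10) | moisture uint16 (% x 10) | ec uint16 (uS/cm)
#   ph uint16 (pH x 100)   | n, p, k uint16 each (mg/kg)
PAYLOAD_FMT = "<HIhHHHHHH"
ESPNOW_MAX_PAYLOAD = 250  # classic ESP-NOW payload limit in bytes


def pack_reading(node_id, seq, temp_c, moisture_pct, ec_us_cm, ph, n, p, k) -> bytes:
    """Encode one reading as a 20-byte payload (hypothetical layout)."""
    payload = struct.pack(
        PAYLOAD_FMT,
        node_id, seq,
        round(temp_c * 10),        # 0.1 degC resolution
        round(moisture_pct * 10),  # 0.1 % resolution
        round(ec_us_cm),
        round(ph * 100),           # 0.01 pH resolution
        round(n), round(p), round(k),
    )
    if len(payload) > ESPNOW_MAX_PAYLOAD:
        raise ValueError("payload exceeds ESP-NOW limit")
    return payload


def unpack_reading(payload: bytes) -> dict:
    """Decode a payload produced by pack_reading back into engineering units."""
    node_id, seq, t, m, ec, ph, n, p, k = struct.unpack(PAYLOAD_FMT, payload)
    return {"node_id": node_id, "seq": seq, "temp_c": t / 10,
            "moisture_pct": m / 10, "ec_us_cm": ec, "ph": ph / 100,
            "n": n, "p": p, "k": k}
```

A fixed binary layout keeps the radio frame short, which reduces transmit time and therefore energy per packet.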

TABLE I: Sensing parameters of the Agri Sensor module.

Category	Parameter	Agronomic Relevance	Key References
Soil Nutrients	N, P, K	Indicate macronutrient availability for plant growth; real-time data inform fertiliser strategies.	[41]
	pH	Governs nutrient solubility and root uptake; deviations can limit yields or cause toxicity.	[21, 20]
	Electrical Conductivity (EC)	Reflects soil salinity and salt accumulation; excess salinity hampers crop growth.	[26]
Environmental Factors	Temperature	Influences plant metabolic processes; sudden changes can signal heat stress or frost risk.	[35]
	Soil Moisture	Guides matching water use to crop requirements, preventing drought stress and nutrient leaching.	[14, 37]

III-B2 Gateway

The gateway, built on an ESP32, aggregates sensor data over ESP-NOW local links and forwards it to the cloud over a cellular (LTE/3G) uplink. Synchronized duty cycling coordinates the gateway and sensor nodes, lowering power demand without sacrificing responsiveness. Adaptive scheduling adjusts transmission frequency to match event urgency and link quality, while an onboard buffer preserves up to 72 hours of readings during connectivity outages. Housed in the same type of IP65 enclosure as the sensors, the gateway continues operating reliably under harsh field conditions.
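Here is a minimal sketch of the store-and-forward and scheduling behaviour described above. It assumes a hypothetical 15-minute sampling interval, so 72 hours corresponds to 288 readings. It also assumes a hypothetical policy: urgent events shorten the transmit interval, and weak links (RSSI below an assumed −85 dBm) lengthen it. `OutageBuffer` and `next_tx_interval_min` are illustrative names, not the deployed firmware.

```python
from collections import deque

BUFFER_HOURS = 72         # buffer horizon stated in the text
SAMPLE_INTERVAL_MIN = 15  # hypothetical sampling interval
CAPACITY = BUFFER_HOURS * 60 // SAMPLE_INTERVAL_MIN  # 288 readings


class OutageBuffer:
    """Store-and-forward buffer that keeps the newest `capacity` readings."""

    def __init__(self, capacity: int = CAPACITY):
        # deque(maxlen=...) silently discards the oldest item when full
        self._q = deque(maxlen=capacity)

    def add(self, reading) -> None:
        self._q.append(reading)

    def flush(self, send) -> int:
        """Send readings oldest-first; stop at the first failed send.

        `send` is a caller-supplied function returning True on success.
        Returns the number of readings delivered.
        """
        sent = 0
        while self._q:
            if not send(self._q[0]):
                break  # link still down: keep the rest for later
            self._q.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._q)


def next_tx_interval_min(base_min: float, urgent: bool, rssi_dbm: float) -> float:
    """Hypothetical adaptive schedule: faster when urgent, slower on weak links."""
    interval = base_min / 4 if urgent else base_min
    if rssi_dbm < -85:  # assumed weak-link threshold
        interval *= 2
    return interval
```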

III-C Software Architecture

The cloud backend translates live measurements into personalised advice. It is designed to scale elastically, so adding sensors or users should not create bottlenecks.

III-C1 WhatsApp interface

Farmers send text or voice notes to a dedicated bot. An in-line translator converts any non-English query to English for processing, after which the reply is rendered in the farmer’s language. We adopt this English-normalization step because prior work shows multilingual LLMs often reason more reliably in English [11]. This bidirectional translation supports Urdu, Punjabi, Sindhi, and other regional languages, reducing the literacy and language barriers that block many existing dashboards.
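The round trip can be sketched as a thin wrapper. `translate` and `answer_in_english` are hypothetical callables that stand in for whichever MT model and English-language pipeline are used; no particular translation API is implied. Language tags such as `ur`, `pa`, and `sd` are assumed ISO 639-1 codes.

```python
from typing import Callable

# translate(text, src, tgt) and answer_in_english(text) are hypothetical
# callables standing in for the MT model and the English-language pipeline.
Translate = Callable[[str, str, str], str]


def handle_message(text: str, user_lang: str,
                   translate: Translate,
                   answer_in_english: Callable[[str], str]) -> str:
    # 1. Normalize the query to English unless it already is English.
    query_en = text if user_lang == "en" else translate(text, user_lang, "en")
    # 2. Reason over sensors, forecasts, and retrieved passages in English.
    reply_en = answer_in_english(query_en)
    # 3. Render the reply back in the farmer's preferred language.
    return reply_en if user_lang == "en" else translate(reply_en, "en", user_lang)
```

Injecting the translator as a parameter keeps the normalization step swappable, which matters when models differ in per-language quality or script handling.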

III-C2 Chained-prompt LLM pipeline

Converting a farmer’s question plus sensor streams into a useful answer is a multi-stage task, so we split the logic into lightweight, specialised calls. The Query-Intent Parser inspects the message and emits a JSON request that specifies which data fields are required, for example, recent soil moisture and a two-day forecast. A Contextual Enricher attaches farm profile details such as crop type, location, and the last week of interaction history [4]. A Multilingual Translator handles conversion between the farmer’s preferred language and the English used for internal processing. The Data-Synthesis and Recommendation module then fuses context with live data and crafts a concise, actionable reply. Finally, a Proactive Alert Handler monitors incoming sensor packets and asks the LLM to assess whether values are atypical for the crop and growth stage; when it infers a risk, it issues a warning even if the farmer has not asked a question. Each call is much simpler than the pipeline as a whole, which makes errors easier to trace and future extensions easier to add.
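As an illustration of the first stage, the sketch below validates the parser's JSON before any data is fetched. This way a malformed or over-broad LLM response cannot trigger arbitrary lookups. The schema (`intent`, `fields`, `forecast_days`), the allowed field names, and the fetcher functions are all hypothetical; the text does not publish the actual request format.

```python
import json

# Hypothetical set of data fields the intent parser may request.
ALLOWED_FIELDS = {"soil_moisture", "temperature", "ph", "ec", "npk",
                  "forecast", "market_price", "kb_passages"}
EMPTY_REQUEST = {"intent": "unknown", "fields": [], "forecast_days": 0}


def parse_intent(raw: str) -> dict:
    """Parse and validate the intent parser's JSON output.

    LLM output can be malformed, so invalid JSON yields an empty request
    and unknown or non-string field names are dropped.
    """
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return dict(EMPTY_REQUEST)
    if not isinstance(req, dict):
        return dict(EMPTY_REQUEST)
    raw_fields = req.get("fields", [])
    if not isinstance(raw_fields, list):
        raw_fields = []
    fields = [f for f in raw_fields if isinstance(f, str) and f in ALLOWED_FIELDS]
    days = req.get("forecast_days", 0)
    if not isinstance(days, int) or days < 0:
        days = 0
    return {"intent": str(req.get("intent", "unknown")),
            "fields": fields, "forecast_days": days}


def gather_context(req: dict, fetchers: dict) -> dict:
    """Call only the (hypothetical) fetchers that the request asks for."""
    return {f: fetchers[f](req) for f in req["fields"] if f in fetchers}
```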

III-C3 Retrieval-augmented generation and knowledge base

Before the core LLM composes its answer, a semantic search retrieves passages from local extension manuals and best-practice guides. These excerpts are injected into the prompt, grounding the response in vetted agronomy and providing citations that the farmer can trace [12]. This retrieval step is intended to reduce hallucinations and keep fertiliser and irrigation advice aligned with local guidelines.
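Below is a minimal sketch of retrieve-then-inject, with one substitution to flag clearly. To keep the example self-contained, it ranks passages by cosine similarity of bag-of-words term counts. This is a lexical stand-in for the dense embeddings a semantic search would normally use. Passage ids, the prompt wording, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def _vec(text: str) -> Counter:
    # Lowercased word counts: a lexical stand-in for a dense embedding.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, passages: dict, k: int = 2) -> list:
    """Return the ids of the k passages most similar to the query."""
    q = _vec(query)
    ranked = sorted(passages, key=lambda pid: _cosine(q, _vec(passages[pid])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, sensors: dict, passages: dict, k: int = 2) -> str:
    """Inject retrieved excerpts with [id] tags so the answer can cite them."""
    ids = retrieve(query, passages, k)
    excerpts = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    readings = ", ".join(f"{name}={value}" for name, value in sensors.items())
    return (f"Sensor readings: {readings}\n"
            f"Reference excerpts:\n{excerpts}\n"
            f"Question: {query}\n"
            "Answer using only the readings and excerpts; cite excerpts by [id].")
```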

III-C4 Agronomic data repository

Historical sensor readings, farm profiles, and chat logs reside in a time-series database. Longitudinal analysis spots gradual shifts such as creeping salinity, while the dialogue engine can recall recent advice to avoid repetition.

III-C5 Weather and market data integration

External feeds supply short-term forecasts and crop-price trends. By combining these feeds with field data, the system can warn of imminent storms or suggest a favourable window to sell a harvest.
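In Kissan-Dost the LLM judges whether to irrigate. A deterministic rule that combines field data with the forecast, such as the hypothetical one below, could serve as a sanity check on that judgement. The refill threshold and the rain cut-offs (60% probability, 10 mm) are illustrative assumptions, not agronomic recommendations.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    rain_prob: float  # 0-1, probability of rain in the forecast window
    rain_mm: float    # expected rainfall over the window, in mm


def irrigation_advice(moisture_pct: float, refill_pct: float, fc: Forecast) -> str:
    """Hypothetical rule combining soil moisture with a short-term forecast.

    refill_pct: moisture level below which the crop is assumed to need water.
    """
    if moisture_pct >= refill_pct:
        return "skip: soil moisture adequate"
    if fc.rain_prob >= 0.6 and fc.rain_mm >= 10:  # assumed cut-offs
        return "postpone: meaningful rain is forecast"
    return "irrigate: soil is dry and little rain is expected"
```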

III-C6 Orchestrator

A lightweight service coordinates every module, schedules daily summaries, manages concurrent requests, and retries failed components, with the aim of keeping latency low as demand increases.
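Retries are commonly implemented with exponential backoff and jitter, so that a briefly unavailable service is not hammered. The sketch below is one such helper. The attempt count, base delay, and 10% jitter are assumed values, since the text does not describe the orchestrator's retry policy. The `sleep` parameter exists so tests can skip real waiting.

```python
import random
import time


def call_with_retry(fn, attempts: int = 3, base_delay: float = 0.5,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter.

    Delays grow as base_delay * 2**i (0.5 s, 1 s, ...) with up to 10% jitter;
    the last exception is re-raised once attempts are exhausted.
    """
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise
            delay = base_delay * (2 ** i)
            sleep(delay * (1 + random.uniform(0, 0.1)))
```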

III-C7 Personalised chatbot persona

A compact system prompt steers the LLM to respond as a practical agronomy advisor: concise, action-oriented, and tailored to the user’s crop, location, and recent conditions. We keep persona instructions lightweight and rely on sensor/forecast citations plus retrieved agronomy passages for grounding and consistency [34, 36].
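Here is a minimal sketch of how onboarding data (location, crop, language) might feed a compact persona prompt. `FarmProfile`, `system_prompt`, and the prompt wording are hypothetical. The language field is stored but not injected, on the assumption that the translation step handles output language.

```python
from dataclasses import dataclass


@dataclass
class FarmProfile:
    # Fields collected at onboarding, per the text.
    location: str
    crop: str
    language: str  # used by the translation step, not by the prompt


def system_prompt(profile: FarmProfile, recent_summary: str) -> str:
    """Hypothetical compact persona prompt; the wording is illustrative only."""
    return (
        "You are a practical agronomy advisor. "
        f"The farmer grows {profile.crop} near {profile.location}. "
        f"Recent conditions: {recent_summary}. "
        "Give concise, action-oriented advice, cite the sensor readings or "
        "forecast you rely on, and use only the provided reference excerpts."
    )
```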

III-D Operational Flow

Figure 3 brings the pieces together. A farmer sends a WhatsApp message, which is translated to English if needed. The Intent Parser produces a JSON request that lists the required inputs, typically the latest sensor readings, a weather forecast, market prices, and relevant knowledge-base passages. The orchestrator gathers those inputs, hands them to the Data-Synthesis module, and returns the reply in the farmer’s language, either as text or as a voice note. Scheduled summaries follow the same route, triggered automatically at the times chosen during onboarding.
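Scheduled summaries need the next occurrence of the time each farmer chose at onboarding. Below is a minimal sketch that uses naive local datetimes; time-zone handling is deliberately left out. `next_summary_time` is a hypothetical name.

```python
from datetime import datetime, time, timedelta


def next_summary_time(now: datetime, at: time) -> datetime:
    """Next occurrence of the farmer's chosen daily summary time (local, naive)."""
    candidate = now.replace(hour=at.hour, minute=at.minute,
                            second=0, microsecond=0)
    # If today's slot has already passed, schedule for tomorrow.
    return candidate if candidate > now else candidate + timedelta(days=1)
```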

While conversations proceed, all incoming sensor readings are logged, which allows trend analysis and richer future replies. In this manner Kissan-Dost delivers both reactive answers and proactive guidance through a single, low-friction chat interface.

IV System Performance

We evaluated Kissan-Dost across four dimensions. First, we measured conversational accuracy (the correctness, relevance, coherence, and conciseness of replies) using an LLM-as-a-Judge strategy that requires no human annotation [24], since obtaining large-scale, expert-labeled datasets in agriculture is impractical. Second, we quantified factual grounding with the RAGAS framework, focusing on answer relevance and faithfulness to retrieved sources [10]. Third, we evaluated multilingual fidelity by scoring Urdu, Punjabi, and Sindhi outputs with the reference-free metrics COMETKiwi [33] and MetricX-24 [18], which estimate fluency and semantic accuracy. Finally, we conducted controlled bench tests to validate the performance characteristics of our off-the-shelf hardware components.

IV-A Evaluation Methodology

High-quality public datasets for Punjab’s smallholder agriculture are scarce, so we built a synthetic benchmark to complement the field trial. Following recent work that uses LLMs to generate realistic test corpora for under-resourced domains [13], we created context-rich queries that reflect local agronomy and language.

Dataset tiers. We created 99 queries spanning three difficulty levels: Easy (single-fact), Medium (multi-factor), and Hard (sensor-driven inference), following prior LLM evaluation practice [42, 38].

Crop coverage. To keep the benchmark realistic, we chose three crops (maize, sugarcane, and spinach) covering staple grain, cash, and fast-cycle vegetable categories. Each crop contributed 11 queries per tier, yielding 99 total (Table II).

TABLE II: Crop-wise sensor data collection.

Crop	Season	Deployment Context	Location Type
Maize	Fall	Initial pilot	Commercial farm
Sugarcane	Fall–Winter	User Study 1	Commercial farm
Spinach	Spring	User Study 2	University farm

Language and context. Each query incorporated live sensor features (NPK, pH, EC) and weather forecasts, and was phrased after real farmer conversations to capture regional vocabulary and sentence structure. This approach stresses the system with authentic linguistic patterns while covering scenarios not encountered during deployment.

IV-B Results $\to$ Response Quality

We used an LLM-as-a-Jury protocol [40] with four judge models (Table III) to rate correctness, coherence, relevance, and conciseness over the 99 queries. Figure 5 shows high correctness across tiers, including 92.9% on Hard items, and relevance above 90% for all tiers. Coherence and conciseness remained stable (≥78%), indicating that the chained-prompt pipeline produces consistently interpretable answers under sensor-driven reasoning.
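The text does not state how the four judges' verdicts are combined. One common choice is majority voting per query and criterion. The sketch below assumes binary pass/fail verdicts and counts ties (2 of 4) as failures; both are hypothetical choices. It then reports the percentage of passing queries per tier.

```python
from collections import defaultdict


def jury_scores(verdicts):
    """verdicts: iterable of (tier, query_id, criterion, judge, passed: bool).

    A query passes a criterion when a strict majority of judges say so
    (ties count as failures, a hypothetical choice). Returns
    {(tier, criterion): percent of queries passing}.
    """
    votes = defaultdict(list)
    for tier, qid, crit, _judge, passed in verdicts:
        votes[(tier, qid, crit)].append(passed)
    totals = defaultdict(lambda: [0, 0])  # (tier, crit) -> [passed, total]
    for (tier, _qid, crit), vs in votes.items():
        majority = sum(vs) > len(vs) / 2
        totals[(tier, crit)][0] += majority
        totals[(tier, crit)][1] += 1
    return {key: 100 * p / n for key, (p, n) in totals.items()}
```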

IV-C Results $\to$ Retrieval Pipeline Fidelity (RAGAS)

We measured how firmly Kissan-Dost’s answers stay tied to evidence with the RAGAS framework [10], which reports two scores: answer relevance (does the response address the query topically?) and faithfulness (is the answer grounded in retrieved documents?). All 99 synthetic queries, built independently of the retrieval corpus, were evaluated.
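For reference, the RAGAS paper [10] defines the two scores roughly as follows. $S$ is the set of statements extracted from the answer, and $V \subseteq S$ is the subset supported by the retrieved context. $q$ is the original question, $q_1,\dots,q_n$ are questions an LLM generates back from the answer, and $\mathbf{e}$ denotes an embedding. Library versions may compute these slightly differently.

```latex
\mathrm{Faithfulness} = \frac{|V|}{|S|}, \qquad
\mathrm{AnswerRelevance} = \frac{1}{n}\sum_{i=1}^{n} \cos\!\left(\mathbf{e}_{q}, \mathbf{e}_{q_i}\right)
```

Because faithfulness counts unsupported statements, answers that add inference beyond the retrieved passages are penalized even when the inference is agronomically sound.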

Medium-difficulty items scored best (relevance 96.6%, faithfulness 87%), likely because they balance specificity and complexity. Easy items lagged (85.8%, 76.1%), plausibly because their vagueness yields more generic retrieval. Hard items remained highly relevant (94.2%), but faithfulness dipped to 74.7%, indicating occasional extrapolation when the model synthesises multiple sources. The wider spread in faithfulness mirrors earlier findings [10] and highlights the ongoing challenge of strict grounding in multi-factor agronomic advice.

IV-D Results $\to$ Translation Capability

Accurate translation underpins trust in Kissan-Dost’s multilingual deployments. We measured fidelity for Urdu, Punjabi, and Sindhi using the reference-free metrics COMETKiwi [33] and MetricX-24 [18] (Table IV).

• Urdu: national lingua franca.
• Punjabi: spoken by 73% of rural Punjab [30].
• Sindhi: primary tongue for 92% of rural Sindh [30].

Punjab and Sindh produce most of Pakistan’s wheat, rice, sugarcane, and cotton [27]; reliable translation into their dominant languages is therefore critical for adoption.

TABLE IV: Translation metrics

Metric	Score Range	Key Features
COMETKiwi	0–1 (↑ better)	Correlates with human fluency/adequacy judgment
MetricX-24	0–25 (↓ better)	Hybrid evaluation with MQM/DA grounding

TABLE V: Evaluated Translation Models

Model	Developer	Capabilities
GPT-4o	OpenAI	Strong few-shot multilingual performance
LLaMA 3.3 70B	Meta	Open-weight, fast inference, high accuracy
Mixtral 8x22B	Mistral	Efficient MoE model
Qwen2-VL 72B	Alibaba	Multimodal, regional language support
DeepSeek V3	DeepSeek	Tuned for low-cost multilingual deployment

TABLE VI: Average COMETKiwi Scores (Higher is Better)

Model	Urdu	Punjabi	Sindhi
GPT-4o	0.826	0.746*	0.767
DeepSeek V3	0.835*	0.719	0.775*
LLaMA 3.3	0.808	0.000†	0.676
Qwen2-VL 72B	0.617	0.513	0.369
Mixtral 8x22B	0.544	0.491	0.383
* Best score per language. † Assigned manually because of the Punjabi script mismatch discussed below.

TABLE VII: Average MetricX-24 Scores (Lower is Better)

Model	Urdu	Punjabi	Sindhi
GPT-4o	2.558	3.681	4.618
DeepSeek V3	2.143*	3.028*	4.558*
LLaMA 3.3	3.505	25.000†	7.714
Qwen2-VL 72B	10.004	11.040	13.506
Mixtral 8x22B	13.109	14.810	16.754
* Best score per language. † Assigned manually because of the Punjabi script mismatch discussed below.

We benchmarked five models (Table V) with a focus on agricultural terminology and correct script. Recent work shows “non-reasoning” MT models often beat reasoning-oriented ones on raw translation quality [5]; hence, our model set differs from the “thinking” judges used in Section IV-B.

Tables VI and VII list average scores. A specific adjustment was made for LLaMA 3.3’s Punjabi results: although semantically accurate, the model produced output in Gurmukhi script (used in Indian Punjab) rather than Shahmukhi script (used in Pakistan). Since this rendered the translation unreadable for our target users, we manually set the COMETKiwi score to 0 and the MetricX-24 score to 25 to reflect its lack of deployment usability for Punjabi.
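A script check of this kind could flag such outputs automatically before scoring. The sketch classifies text by counting code points in the Gurmukhi block (U+0A00–U+0A7F) against the Arabic-script blocks used by Shahmukhi. The majority-count heuristic and function name are assumptions, not part of the evaluation described above.

```python
# Unicode block ranges (inclusive).
GURMUKHI = (0x0A00, 0x0A7F)         # Gurmukhi, used in Indian Punjab
ARABIC_BLOCKS = [(0x0600, 0x06FF),  # Arabic (base of Shahmukhi)
                 (0x0750, 0x077F),  # Arabic Supplement
                 (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
                 (0xFE70, 0xFEFF)]  # Arabic Presentation Forms-B


def _in(cp: int, lo_hi: tuple) -> bool:
    lo, hi = lo_hi
    return lo <= cp <= hi


def script_of(text: str) -> str:
    """Classify text as 'gurmukhi', 'shahmukhi', or 'other' by code-point counts."""
    gur = sum(_in(ord(ch), GURMUKHI) for ch in text)
    arab = sum(any(_in(ord(ch), b) for b in ARABIC_BLOCKS) for ch in text)
    if gur == 0 and arab == 0:
        return "other"
    return "gurmukhi" if gur > arab else "shahmukhi"
```

For example, "Punjabi" written in Gurmukhi (`"\u0a2a\u0a70\u0a1c\u0a3e\u0a2c\u0a40"`) classifies as `gurmukhi`. The same word in Shahmukhi (`"\u067e\u0646\u062c\u0627\u0628\u06cc"`) classifies as `shahmukhi`.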

IV-E Results $\to$ Hardware validation

Table VIII summarizes key envelope metrics. The complete received-signal-strength profile appears in Figure 7. In line-of-sight tests, the packet delivery ratio remained above 90% out to 425 m, which exceeds the stable ranges reported for comparable 2.4 GHz links [16]. Transmit (TX), sensing, processing, and sleep currents were measured with a precision shunt-resistor fixture (Figure 7). Our hardware results are intended only to verify operability in representative field conditions, not to set new bounds. We therefore report typical ranges and conditions sufficient for replication and explicitly avoid generalizing beyond the tested farms. Independent studies of the sensor family report accuracy consistent with the manufacturer’s datasheet specifications [23].
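The shunt measurement reduces to Ohm's law: the load current is the shunt voltage divided by the shunt resistance, and power is the load voltage times that current. The second helper turns per-state powers into a time-weighted average. The powers in the usage example are the Table VIII figures. The state durations are hypothetical, because the text reports powers but not a duty cycle.

```python
def shunt_power_mw(v_shunt_mv: float, r_shunt_ohm: float, v_load_v: float) -> float:
    """Load power from a shunt measurement.

    I = V_shunt / R_shunt (Ohm's law), and P = V_load * I.
    mV / ohm gives mA, and V * mA gives mW.
    """
    current_ma = v_shunt_mv / r_shunt_ohm
    return v_load_v * current_ma


def average_power_mw(states: dict) -> float:
    """Time-weighted mean power; states maps name -> (power_mW, seconds)."""
    energy_mj = sum(p * t for p, t in states.values())  # mW * s = mJ
    period_s = sum(t for _p, t in states.values())
    return energy_mj / period_s


print(shunt_power_mw(100, 1.0, 3.3))  # ≈ 330 mW (100 mA at 3.3 V)

# Table VIII powers with hypothetical durations over a 900 s cycle.
cycle = {"tx": (1030, 0.1), "sense": (115, 2.0),
         "process": (482, 0.5), "sleep": (0.030, 897.4)}
print(average_power_mw(cycle))  # ≈ 0.668 mW
```

Under these assumed durations, sleep occupies over 99% of the cycle, yet transmission and processing account for most of the energy. For that reason, shortening active time usually matters more than further reducing sleep current.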

TABLE VIII: Hardware validation and sensor parameters. Envelope values drawn from prior Agriculture-IoT platforms [16, 22]

Hardware Validation
Domain	Metric	Ours	Envelope
Comm.	Stable range (typ. conditions)	>90% PDR to 425 m	10–100 m
Energy	Transmission power	1030 mW*	10–835 mW
Energy	Sensor power	115 mW	n/a
Energy	Processing power	482 mW	1–750 mW [22]
Energy	MCU sleep power	0.030 mW	0.001–0.825 mW [22]

* ESP32 power includes the MCU baseline; RF transmission adds 460–730 mW.

Sensor Parameters (values adopted from [7])
Metric	Range	Accuracy
Temperature	−40 to 80 °C	±0.5 °C
Humidity	0–100% RH	±2% (0–50%), ±3% (50–100%)
pH	3–9	±0.3
EC (Conductivity)	0–20,000 μS/cm	±3% (0–10,000), ±5% (10,000–20,000)
N, P, K	1–2999 mg/kg	≤5%
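A reading outside the probe's rated range more likely indicates a wiring or sensor fault than a real field condition. Such readings could therefore be filtered out before reaching the LLM. Below is a minimal sketch using the Table VIII ranges. The field names are hypothetical, and the humidity row is assumed to correspond to soil moisture.

```python
# Measurement ranges from the datasheet values in Table VIII.
VALID_RANGES = {
    "temp_c": (-40.0, 80.0),
    "moisture_pct": (0.0, 100.0),  # assumed to be the "Humidity" row
    "ph": (3.0, 9.0),
    "ec_us_cm": (0.0, 20000.0),
    "n": (1, 2999), "p": (1, 2999), "k": (1, 2999),
}


def out_of_range(reading: dict) -> list:
    """Return names of fields that are missing or outside the sensor's range."""
    bad = []
    for name, (lo, hi) in VALID_RANGES.items():
        value = reading.get(name)
        if value is None or not (lo <= value <= hi):
            bad.append(name)
    return bad
```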

IV-F System Performance Discussion

Across synthetic stress tests, Kissan-Dost maintained high relevance even on Hard queries, but faithfulness dipped when answers required multi-source synthesis (sensors + forecasts + agronomy text). This reflects a practical trade-off: strict retrieval grounding reduces hallucinations, while useful agronomic guidance often requires controlled inference. Multilingual evaluation also surfaced deployment details that matter (e.g., Punjabi script choice). Our results should be read as stress-test indicators rather than agronomist-verified field accuracy.

V Field Deployment and Pilot Study

To test real-world viability, we ran a 90-day formative pilot study after IRB approval. Rather than a large-scale quantitative trial, we opted for an in-depth, qualitative approach, installing Kissan-Dost at two contrasting Punjab sites: a 2.5-acre commercial sugarcane field and a 0.5-acre university plot that grows organic vegetables for faculty. Five people (two managers and three field workers, one of whom left midway for health reasons) used the system for 45 days at each site. Figure 8 shows the sensor layouts.

The mix of commercial and academic plots lets us observe both strategic planning and daily field work. This pilot aimed to (i) verify hardware and cloud robustness under field conditions, (ii) gather qualitative feedback on the WhatsApp interface, and (iii) surface challenges and opportunities for a larger-scale study, rather than to produce statistically significant yield outcomes.

V-A Methodology

Deployment phases: Each site followed three successive 15-day periods. The opening phase observed existing practices and captured decision-making through baseline interviews. Next, participants used a localized Urdu dashboard (Figure 10) that visualized temperature, conductivity, NPK, pH, and moisture with trend lines, serving as a visual benchmark. Finally, they switched to the Kissan-Dost WhatsApp chatbot (e.g., Figure 10), which delivered the same sensor insights in conversational form and in the user’s preferred language. Code and documentation are released via an anonymized GitHub repository [2].

User onboarding: During the baseline interview, we recorded each farmer’s phone number, language, crops, and location, then sent a test message via the WhatsApp Business API. A successful reply activated the account and locked in the chosen language, so all later chats were automatically localised.

Data collection and analysis: We combined system logs, field notes, and semi-structured interviews. Interview transcripts were coded thematically with an evolving codebook to track adoption and behaviour change. Triangulating qualitative insights with usage metrics and observations revealed patterns in accessibility, knowledge gain, and how trust formed across the three phases. Table IX summarizes the instruments.

TABLE IX: Data collection methods and analysis

Method	Description	Data Captured	Analysis Approach
System Logs	WhatsApp conversations with the chatbot	Query types, usage patterns, interaction frequency	Frequency analysis [17]
Semi-Structured Interviews	Pre- and post-deployment semi-structured interviews	Perceptions of trust, usability, knowledge gain, adoption barriers	Thematic analysis using structured codebook [3]
Field Observations	Researcher notes during site visits	Non-verbal behaviors, contextual challenges	Qualitative pattern identification
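For the frequency analysis of system logs, a per-participant summary might count messages and, separately, distinct active days. A few exchanges spread over many days signal different engagement than a burst on a single day. The sketch is hypothetical: `engagement`, its output keys, and the active-day share are illustrative, not the paper's exact measures. The 15-day default matches the phase length.

```python
from collections import Counter
from datetime import datetime


def engagement(timestamps: list, phase_days: int = 15) -> dict:
    """Summarize one participant's message timestamps for a phase."""
    per_day = Counter(ts.date() for ts in timestamps)  # messages per calendar day
    return {
        "messages": len(timestamps),
        "active_days": len(per_day),
        "active_share": len(per_day) / phase_days,
        "max_per_day": max(per_day.values(), default=0),
    }


log = [datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 18), datetime(2025, 1, 2, 8)]
print(engagement(log))  # 3 messages on 2 active days
```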

V-B User Experience

Our field deployment surfaced insights into decision-making, usability, trust, and knowledge gains across the three study phases.

Phase 1: Baseline agricultural practices: During baseline, decisions at both farms were largely experience-driven. Irrigation, weeding, and field preparation relied on visual inspection, seasonal heuristics, and peer input, with limited use of external data (e.g., weather forecasts).

Managers were comfortable with smartphones but had little exposure to digital agriculture tools. Generic weather apps were used occasionally but considered unreliable for plot-specific decisions, reinforcing skepticism toward automated advisories.

Phase 2: The dashboard usability challenge: Introducing the IoT web dashboard (Figure 10) led to consistently low engagement (measured as the number of times the dashboard was opened) across both sites (Figure 11, Phase 2). Managers opened it sporadically but reported that interpreting graphs required agronomic expertise, limiting sustained use. Workers largely disengaged; for example, the worker at Farm 1 never opened the dashboard during the 15-day phase, even after the demonstration. His sentiments (translated into English from the local language, as were all participant comments) were:

Overall, dashboards increased transparency but did not bridge the gap from raw sensor values to practical decisions for users without technical training.

Phase 3: Chatbot adoption and engagement: Participants quickly abandoned the dashboard but maintained steady WhatsApp use; Figure 11 contrasts Phase 3’s activity with Phase 2’s near-silence. This is consistent with our premise that farmers engage when advice is interpretable and delivered in their language. Because agronomic conditions shift over days, success is reflected in sustained daily use rather than message volume; two to three exchanges per day were sufficient. Farm 1 adopted Kissan-Dost immediately, while Farm 2 ramped up after early recommendations proved accurate. By Day 8, the worker at Farm 2 consulted the bot regularly, coinciding with warnings about declining crop health.

Two episodes illustrate impact. At Farm 1, the manager postponed irrigation after the chatbot flagged adequate soil moisture (Figure 12). At Farm 2, the bot detected a pH drop (Figure 13), issued repeated alerts, and suggested lime:

The worker could not afford the treatment but confirmed the diagnosis:

Across both farms, users highlighted the simple language, voice-note option, and familiar WhatsApp interface:

By translating sensor streams into clear, local-language guidance, Kissan-Dost was both accessible and actionable in day-to-day farming.

V-C Synthesized Insights and Implications

Analysis across both deployments revealed consistent themes regarding the chatbot’s impact:

Accessibility & reduced cognitive load: The chatbot’s simplified, conversational approach lowered the barrier to interpreting complex sensor data, particularly compared to dashboards. This proved effective for users regardless of literacy level. The familiar WhatsApp interface further eased adoption. Users frequently cited clarity and ease of understanding:

Trust formation: Trust grew from the chatbot’s perceived precision (e.g., site-specific, root-level insights), from advice that reinforced farmers’ intuition, and from relevant contextual cues. Participants appreciated advice aligned with local conditions, including planting times and observable field trends. For instance:

Knowledge expansion & decision support: Participants described gaining new insights into agricultural practices, including organic pest control and optimal fertilization timing. These knowledge gains translated into operational decisions. For example:

Limitations & future needs: Participants identified two key limitations. First, the chatbot lacked crop lifecycle awareness, limiting its contextual relevance during different growth stages. Second, the short duration of the study constrained its potential for long-term guidance:

VI Conclusion

Kissan-Dost shows that combining off-the-shelf soil sensors with retrieval-augmented generation and a WhatsApp chatbot can deliver precision-agriculture advice that smallholder farmers actually use. The system distills live sensor streams and vetted agronomy guidance into clear, multilingual text or voice messages. In controlled tests, it answered over 90% of sensor-grounded queries correctly. In a 90-day field deployment it sustained daily engagement, unlike the dashboard, and informed concrete decisions such as postponing irrigation. Overall, the findings highlight that last-mile delivery (language fit, cultural alignment, and trusted channels) is what unlocks the value of existing Agri-IoT hardware in LMIC contexts.

References

[1] A. Abgaryan et al. (2024) IoT-LLM: leveraging large language models for intuitive IoT data interpretation. In EWSN.
[2] (2025) Anonymized Kissan-Dost source code.
[3] T. Chakravorti et al. (2025) Open science practices by early career HCI researchers: perceptions, challenges, and benefits. arXiv.
[4] C. Chan et al. (2024) RQ-RAG: learning to refine queries for retrieval augmented generation. arXiv.
[5] C. Chen et al. (2024) DeepSeek vs. o3-mini: how well can reasoning LLMs evaluate MT and summarization?
[6] H. Chu et al. (2023) IoT solutions for smart farming: a comprehensive review on the current trends, challenges, and future prospects for sustainable agriculture. Journal of Forestry Science and Technology.
[7] CWT soil sensor (NPK type) manual v1.4. CWT CO., LIMITED. Product datasheet and user manual.
[8] T. Dibbern et al. (2024) Main drivers and barriers to the adoption of digital agriculture technologies. Smart Agricultural Technology.
[9] D. Eckstein et al. (2021) Global climate risk index 2021. Technical report, Germanwatch e. V., Bonn, Germany.
[10] S. Es et al. (2023) RAGAS: automated evaluation of retrieval augmented generation. arXiv.
[11] J. Etxaniz et al. (2023) Do multilingual language models think better in English? arXiv.
[12] Y. Gao et al. (2024) Retrieval-augmented generation for large language models: a survey. arXiv.
[13] X. Guo and Y. Chen (2024) Generative AI for synthetic data generation: methods, challenges and the future. arXiv.
[14] G. R. F. Ibrahim et al. (2023) Assessing how irrigation practices and soil moisture affect crop growth through monitoring Sentinel-1 and Sentinel-2 data. Environmental Monitoring and Assessment.
[15] M. Jain et al. (2018) FarmChat: a conversational agent to answer farmer queries. In UbiComp.
[16] H. M. Jawad et al. (2017) Energy-efficient wireless sensor networks for precision agriculture: a review. Sensors.
[17] M. Jung et al. (2024) Quantitative insights into large language model usage and trust in academia: an empirical study. arXiv.
[18] J. Juraska et al. (2024) MetricX-24: the Google submission to the WMT 2024 metrics shared task. In WMT.
[19] A. Kamilaris and F. X. Prenafeta-Boldú (2018) Deep learning in agriculture: a survey. Computers and Electronics in Agriculture.
[20] S. O. Kennedy (2022) Soil pH and its impact on nutrient availability and crop growth. IJGGE.
[21] F. Khaled and A. Sayed (2023) Soil pH and its influence on nutrient availability and plant health. IJACR.
[22] A. Khalifeh et al. (2022) Microcontroller unit-based wireless sensor network nodes: a review. Sensors.
[23] L. L. Kumar et al. (2024) Monitoring of soil nutrients using soil NPK sensor and Arduino. Eco. Env. & Cons.
[24] D. Li et al. (2025) From generation to judgment: opportunities and challenges of LLM-as-a-judge. arXiv:2411.16594.
[25] Y. Li et al. (2023) AgriGPT: customizing large language models for agricultural knowledge systems. In EDBT.
[26] R. M. A. Machado et al. (2017) Soil salinity: effect on vegetable crop growth. Management practices to prevent and mitigate soil salinization. Horticulturae.
[27] Ministry of National Food Security and Research (2024) Crops area & production by 2022–23 (district wise). Accessed April 2025.
[28] M. S. Nain, R. Singh, and J. R. Mishra (2019) Social networking of innovative farmers through WhatsApp messenger for learning exchange: a study of content sharing. The Indian Journal of Agricultural Sciences.
[29] T. Ojha et al. (2021) AgriSens: IoT-based dynamic irrigation scheduling system for water management of irrigated crops. IEEE Internet of Things Journal.
[30] Pakistan Bureau of Statistics (2023) Population by mother tongue.
[31] N. Patel et al. (2010) Avaaj Otalo: a field study of an interactive voice forum for small farmers in rural India. In CHI.
[32] P. Placidi et al. (2021) Monitoring soil and ambient parameters in the IoT precision agriculture scenario. Sensors.
[33] R. Rei et al. (2022) CometKiwi: IST-Unbabel 2022 submission for the quality estimation shared task. In WMT.
[34] K. I. Roumeliotis and N. D. Tselikas (2023) ChatGPT and Open-AI models: a preliminary review. Future Internet.
[35] S. Sharma et al. (2022) Impact of high temperature on germination, seedling growth and enzymatic activity of wheat. Agriculture (MDPI).
[36] N. Singh et al. (2024) Farmer.chat: scaling AI-powered agricultural services for smallholder farmers. arXiv.
[37] R. K. Soothar et al. (2021) Effect of different soil moisture regimes on plant growth and water use efficiency of sunflower: experimental study and modeling. Bulletin of the National Research Centre.
[38] W. Thorne et al. (2024) Increasing the difficulty of automatically generated questions via reinforcement learning with synthetic preference. arXiv.
[39] D. Vasisht et al. (2017) FarmBeats: an IoT platform for data-driven agriculture. In NSDI.
[40] P. Verga et al. (2024) Replacing judges with juries: evaluating LLM generations with a panel of diverse models.
[41] S. Viswambharan and P. K. H. (2024) Chemical fertiliser (NPK) and its importance in agriculture: a comprehensive overview. IJFMR.
[42] S. Wang et al. (2024) DomainRAG: a Chinese benchmark for evaluating domain-specific retrieval-augmented generation. arXiv.
[43] S. Wolfert et al. (2017) Big data in smart farming – a review. Agricultural Systems.

Model	Developer	Capabilities
GPT-4.1	OpenAI	Advanced reasoning, state-of-the-art decision-making
o3-mini	OpenAI	Lightweight, efficient; strong analytical skills
Claude 3.7 Sonnet	Anthropic	Balanced reasoning, safe outputs, fast responses
Gemini 2.5 Pro	Google DeepMind	Robust multi-turn analysis and contextual understanding