
Add a reusable “Web Search Context” mode for all AI features (primary use case: Add Tags / auto-tagging) #565

@jethrop

Your idea

Introduce a general-purpose Web Search Context mode that ThunderAI can apply to any AI function (summarize, rewrite, translate, classify, add tags, etc.) to optionally ground the model with lightweight web context before generating its final output.

This should be implemented as a shared, reusable capability (not a one-off for tagging), but with Add Tags / auto-tagging (prompt_add_tags) as the primary, first-class use case.

Key capabilities:

  • A global toggle to enable “Web Search Context,” plus per-feature controls (at minimum: tagging).
  • A user-editable Business context / custom instructions field that guides the web search step (for example, “I run a local bakery; prefer tags like Suppliers, Wholesale Orders, Delivery Apps…”).
  • A local RAG cache of web search results so repeated contexts do not require repeated web calls.
  • No changes to output schemas for existing prompts. For tagging specifically: do not embed sources, explanations, or metadata inside tag names; keep producing the same tag list output the tagging prompt already expects (for example {"tags":[...]}).

Value

  • Improves output quality for queries that benefit from up-to-date or niche information (vendors, tools, organizations, acronyms, regulations, product names).
  • Makes Add Tags / auto-tagging substantially more accurate for ambiguous senders without requiring users to maintain huge rule lists.
  • Reduces latency and cost over time via local caching (RAG-style reuse).
  • Keeps the UX consistent because each AI feature keeps its existing output format; this only adds upstream context.

Proposed UX and settings

Global settings:

  • Enable Web Search Context (default off)

  • Search provider (configurable; allow OpenAI-compatible provider options where relevant)

  • Business context / custom instructions (multi-line, optional; included in the web-search step)

  • RAG cache settings:

    • enable/disable
    • TTL (for example, 7 days)
    • max entries / storage limit
    • “Clear cache” button

Per-feature controls:

  • Apply Web Search Context to:

    • Add Tags / auto-tagging (primary)
    • Other AI features (optional checkboxes or per-command toggles)
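
To make the settings above concrete, here is a rough sketch of how they could be stored and checked. All key names, the `maxEntries` value, and both helper functions are made up for illustration; they are not existing ThunderAI preferences.

```javascript
// Hypothetical defaults for the proposed settings (illustrative names only).
const WEB_CONTEXT_DEFAULTS = Object.freeze({
  enabled: false,        // global toggle, default off
  searchProvider: "",    // provider identifier chosen by the user
  businessContext: "",   // optional multi-line guidance text
  cache: Object.freeze({
    enabled: true,
    ttlDays: 7,          // example TTL from the proposal
    maxEntries: 500,     // assumed storage limit
  }),
  features: Object.freeze({
    add_tags: false,     // per-feature opt-in (primary use case)
  }),
});

// Merge stored user settings over the defaults, one level deep for the
// nested "cache" and "features" groups.
function resolveWebContextSettings(stored = {}) {
  return {
    ...WEB_CONTEXT_DEFAULTS,
    ...stored,
    cache: { ...WEB_CONTEXT_DEFAULTS.cache, ...(stored.cache || {}) },
    features: { ...WEB_CONTEXT_DEFAULTS.features, ...(stored.features || {}) },
  };
}

// A feature uses web context only when both the global toggle and its own
// per-feature toggle are on.
function isWebContextEnabledFor(settings, featureId) {
  return Boolean(settings.enabled && settings.features[featureId]);
}
```

With this shape, a feature that has no entry under `features` is treated as opted out, which matches the "default off" intent.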

Suggested implementation plan (modular)

Phase 1: Shared “Web Search Context” module + tagging integration

  1. Create a shared module (for example js/mzta-web-context.js) that exposes:

    • getWebContext({ queryTerms, businessContext, scope, cacheKey }): { contextText, sourcesMeta }
  2. Query building defaults (privacy-first):

    • Tagging default query terms: sender domain + subject
    • Optional scope (explicit opt-in): include a sanitized snippet of the email
    • Include the Business context as guidance for query construction and/or result summarization
  3. Caching (RAG-lite first):

    • Cache raw/normalized snippets + metadata locally, keyed by sender domain and query signature
    • Reuse cached context when fresh; fall back to live web search otherwise
  4. Inject web context into the existing prompt pipeline:

    • Append a bounded “Web context” block (or provide a placeholder like {%web_context%})
    • Do not alter the existing output format requirements
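
A possible shape for Phase 1, as a sketch only. The `createWebContext` factory, the injected `searchFn`, the `ttlMs`/`now` parameters, the in-memory `Map`, and the `maxChars` default are all assumptions for illustration; a real version would likely persist the cache in extension storage and key it by sender domain plus query signature.

```javascript
// Sketch of the shared module. `searchFn` is a caller-supplied async function
// expected to resolve to [{ title, snippet, sourceDomain }, ...].
function createWebContext({ searchFn, ttlMs = 7 * 24 * 60 * 60 * 1000, now = Date.now }) {
  const cache = new Map(); // cacheKey -> { storedAt, value }

  async function getWebContext({ queryTerms, businessContext = "", scope = "minimal", cacheKey }) {
    // Fallback key when the caller does not pass one (assumed format).
    const key = cacheKey || `${scope}:${queryTerms.join(" ").toLowerCase()}`;
    const hit = cache.get(key);
    if (hit && now() - hit.storedAt < ttlMs) {
      return hit.value; // fresh cached context, no web call
    }
    const results = await searchFn({ queryTerms, businessContext, scope });
    const value = {
      contextText: results.map((r) => `${r.title}: ${r.snippet}`).join("\n"),
      sourcesMeta: results.map((r) => ({ title: r.title, sourceDomain: r.sourceDomain })),
    };
    cache.set(key, { storedAt: now(), value });
    return value;
  }

  return { getWebContext, clearCache: () => cache.clear() };
}

// Fill a {%web_context%} placeholder if the prompt has one; otherwise append
// a bounded "Web context" block. Output-format instructions already in the
// prompt are left untouched.
function injectWebContext(prompt, contextText, maxChars = 1500) {
  const bounded = contextText.slice(0, maxChars);
  if (prompt.includes("{%web_context%}")) {
    return prompt.split("{%web_context%}").join(bounded);
  }
  return bounded ? `${prompt}\n\nWeb context:\n${bounded}` : prompt;
}
```

Passing `searchFn` in from outside keeps the module independent of any one search provider, which also makes the cache behavior easy to test with a stub.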

Phase 2: Full local RAG cache (better reuse across features)

  • Store web results as small documents (title + snippet + source domain + timestamp).

  • Retrieval:

    • baseline lexical matching (domain + keywords)
    • optional semantic retrieval if embeddings are available/configured
  • For each AI function invocation, retrieve top-k relevant cached contexts and include them in the prompt context, bounded by strict limits.
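
For the baseline lexical retrieval, something along these lines might be enough to start with. The document fields (`title`, `snippet`, `sourceDomain`, `timestamp`) follow the list above; the scoring rule, the `domainBonus` weight, and the function names are assumptions, not a tuned ranking.

```javascript
// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Score = number of distinct query tokens found in the document's title and
// snippet, plus a bonus when the source domain equals the sender domain.
// Ties are broken by newest timestamp first.
function retrieveTopK(docs, { senderDomain, keywords }, k = 3, domainBonus = 2) {
  const queryTokens = new Set(keywords.flatMap(tokenize));
  return docs
    .map((doc) => {
      const docTokens = new Set(tokenize(`${doc.title} ${doc.snippet}`));
      let score = 0;
      for (const token of queryTokens) {
        if (docTokens.has(token)) score += 1;
      }
      if (senderDomain && doc.sourceDomain === senderDomain) score += domainBonus;
      return { doc, score };
    })
    .filter((entry) => entry.score > 0) // drop documents with no overlap
    .sort((a, b) => b.score - a.score || b.doc.timestamp - a.doc.timestamp)
    .slice(0, k)
    .map((entry) => entry.doc);
}
```

Semantic retrieval could later replace the scoring step without changing the call site, since callers only see the top-k documents.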

Phase 3: Provider-native web grounding (optional)

  • If a configured provider supports native web grounding, allow selecting that mode.
  • Still cache the resulting context locally to avoid repeating work.

Guardrails and privacy notes

  • Default to sending minimal data for search (sender domain + subject for tagging).

  • Clearly disclose in the UI that enabling Web Search Context may transmit:

    • derived query terms
    • business context text (or a derived form of it)
    • optionally a sanitized snippet if explicitly enabled
  • Enforce strict size limits for:

    • business context
    • injected web context
    • cached documents
  • Failure behavior (fail open):

    • if web search fails, continue normally without web context (no blocking error)
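
To illustrate the minimal-data default and the size limits, a sketch of how tagging query terms could be built. The limit values, the redaction patterns, the `message` field names (`from`, `subject`, `body`), and every function name here are assumptions; the proposal asks for strict limits but does not specify numbers.

```javascript
// Assumed character limits (placeholders, not proposed values).
const LIMITS = Object.freeze({ businessContext: 500, snippet: 200, subject: 150 });

// Truncate to at most `max` characters; treats missing text as empty.
function clamp(text, max) {
  return (text || "").slice(0, max);
}

// Extract the domain from an address such as "Name <user@example.com>".
function senderDomainOf(from) {
  const match = /@([a-z0-9.-]+)/i.exec(from || "");
  return match ? match[1].toLowerCase() : "";
}

// Redact email addresses and long digit runs before a snippet is sent out.
function sanitizeSnippet(text) {
  return (text || "")
    .replace(/[^\s@<>]+@[^\s@<>]+/g, "[email]")
    .replace(/\d{6,}/g, "[number]");
}

// Default: sender domain + subject only. The body snippet is added only
// when the caller explicitly opts in.
function buildTaggingQuery(message, { includeSnippet = false, businessContext = "" } = {}) {
  const queryTerms = [
    senderDomainOf(message.from),
    clamp(message.subject, LIMITS.subject),
  ].filter(Boolean);
  if (includeSnippet) {
    const snippet = clamp(sanitizeSnippet(message.body), LIMITS.snippet);
    if (snippet) queryTerms.push(snippet);
  }
  return { queryTerms, businessContext: clamp(businessContext, LIMITS.businessContext) };
}
```

The redaction here is deliberately crude; it only shows where a sanitizing step would sit, not what a sufficient one looks like.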

Tagging-specific acceptance criteria (primary)

  • When enabled, tagging is measurably more accurate on emails from ambiguous vendor/tool senders than when the same emails are tagged with the feature disabled.
  • Tag output remains exactly the same shape as today (for example {"tags":[...]}); no extra text in tag values.
  • Cache reuse reduces repeated searches for the same sender domain over time.
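
The output-shape criterion could be checked with a small validator like the one below. This is a sketch: `parseTagOutput` is a hypothetical helper, and the strict single-key check, the `maxTagLength` cap, and the URL test are assumptions about what "no extra text in tag values" should mean in practice.

```javascript
// Parse a model reply and return the tag list, or null when the reply does
// not have the expected {"tags":[...]} shape.
function parseTagOutput(raw, maxTagLength = 50) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return null;
  }
  const keys = Object.keys(parsed);
  if (keys.length !== 1 || keys[0] !== "tags" || !Array.isArray(parsed.tags)) {
    return null; // extra keys such as "sources" are rejected
  }
  const clean = parsed.tags.every(
    (tag) =>
      typeof tag === "string" &&
      tag.trim().length > 0 &&
      tag.length <= maxTagLength &&
      !/https?:\/\//i.test(tag) // no source URLs inside tag names
  );
  return clean ? parsed.tags : null;
}
```

Running the same validator with the feature on and off would give a quick regression check that web context never leaks into the tag list.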

General acceptance criteria

  • No behavior change when the feature is disabled.
  • Works as a shared capability that other AI functions can opt into without duplicating code.
  • Web search failures never break the main AI action; they only remove the extra context.
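
The first and third criteria can be sketched as a single wrapper around any AI action. Here `runAction(prompt)` stands in for whatever function currently sends a prompt to the model, and `getWebContext` follows the signature proposed in Phase 1; the wrapper name and its parameter object are hypothetical.

```javascript
// Wrap an AI action so that web context is strictly additive.
async function runWithOptionalWebContext({ enabled, prompt, query, getWebContext, runAction }) {
  if (!enabled) {
    return runAction(prompt); // disabled: same call as without the feature
  }
  let contextText = "";
  try {
    ({ contextText } = await getWebContext(query));
  } catch (err) {
    // Fail open: a search error only removes the extra context.
    console.warn("Web context unavailable, continuing without it:", err && err.message);
  }
  const finalPrompt = contextText
    ? `${prompt}\n\nWeb context:\n${contextText}`
    : prompt;
  return runAction(finalPrompt);
}
```

Because the disabled path never touches `getWebContext`, features that do not opt in cannot be affected by search latency or errors.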

Additional information

No response

Metadata

Assignees

No one assigned

    Labels

    api_claude: Claude API Integration (it was Anthropic)
    api_google_gemini: Google Gemini API Integration
    api_ollama: Ollama API Integration
    api_openai: OpenAI API Integration for Chatgpt
    api_openai_comp: Compatible OpenAI API Integration for local LLMs
    new feature

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
