Privacy

AI Processing Policy

When you use MindWiki AI features, parts of your vault may be sent to upstream AI providers to generate a response. This document tells you exactly which providers, what data they receive, what they're contractually allowed to do with it, and how to turn AI off.

Updated 2026-05-25

Which AI features exist, and what they do

Vault-grounded chat

When you ask a question of MindWiki AI, the system:

  • Computes a vector embedding of your prompt and searches your vault for the most relevant page snippets.
  • Constructs a prompt that includes (a) the system prompt, (b) the retrieved snippets as grounding, and (c) your message.
  • Sends that prompt to an upstream LLM provider for completion.
  • Streams the response back, attributes it to the source pages, and displays it.
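
The steps above can be sketched in a few lines. This is an illustration only, not MindWiki's implementation: the bag-of-words `embed` function stands in for a real embedding model, the `Snippet` type and `build_prompt` helper are hypothetical, and the upstream provider call is left out.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    page: str   # source page title, kept so the answer can be attributed
    text: str


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt: str, vault: list[Snippet], k: int = 2) -> list[Snippet]:
    # Step 1: embed the prompt and rank vault snippets by similarity.
    q = embed(prompt)
    return sorted(vault, key=lambda s: cosine(q, embed(s.text)), reverse=True)[:k]


def build_prompt(system: str, snippets: list[Snippet], message: str) -> str:
    # Step 2: system prompt + retrieved snippets + the user's message.
    grounding = "\n".join(f"[{s.page}] {s.text}" for s in snippets)
    return f"{system}\n\nContext:\n{grounding}\n\nUser: {message}"


vault = [
    Snippet("Gardening", "tomatoes need full sun and regular water"),
    Snippet("Taxes", "quarterly estimated payments are due in april"),
    Snippet("Recipes", "roast tomatoes with olive oil and garlic"),
]
hits = retrieve("how do I water tomatoes", vault, k=2)
prompt = build_prompt("Answer only from the context.", hits, "how do I water tomatoes")
# Step 3 would send only `prompt` upstream; unretrieved pages are not included.
```

In this sketch the "Taxes" page never enters the prompt because it does not rank among the top matches, which mirrors the point that only retrieved snippets accompany your message.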

Live voice conversation

On iOS, Live Conversation streams microphone audio through LiveKit (real-time voice infrastructure) to a server-side agent that combines speech-to-text, vault-grounded reasoning, and text-to-speech. The audio is transient — not stored beyond the duration of the session. The transcript may be stored if you choose to save a session.

Voice capture

Voice Capture transcribes spoken audio into a new vault page. The audio is uploaded to our object store (R2) and attached to the page; the transcript is generated server-side and inserted into the page. You can disable saving the audio attachment in Settings.

Embeddings (search and retrieval)

For every page we generate vector embeddings that power semantic search and the retrieval step of chat. Embeddings are stored in Cloudflare Vectorize. Re-embedding happens when pages are edited.
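
The lifecycle described here (one set of embeddings per page, refreshed on edit) behaves like an upsert keyed by page ID. The sketch below is a minimal in-memory illustration under stated assumptions: `EmbeddingIndex` and `fake_embed` are hypothetical names, the hash-based vector is a placeholder for a real model, and none of this uses the Cloudflare Vectorize API.

```python
import hashlib


def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embedding model: derives 4 floats from a hash.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


class EmbeddingIndex:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def on_page_saved(self, page_id: str, text: str) -> None:
        # Creating or editing a page overwrites its vector (re-embedding).
        self._vectors[page_id] = fake_embed(text)

    def on_page_deleted(self, page_id: str) -> None:
        # Deleting a page removes its vector as well.
        self._vectors.pop(page_id, None)

    def has(self, page_id: str) -> bool:
        return page_id in self._vectors

    def vector(self, page_id: str) -> list[float]:
        return self._vectors[page_id]


index = EmbeddingIndex()
index.on_page_saved("p1", "first draft")
before = index.vector("p1")
index.on_page_saved("p1", "edited draft")   # an edit triggers re-embedding
after = index.vector("p1")
index.on_page_deleted("p1")
```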

Optional automation features (Pro and Power tiers)

Auto-Linker, weekly capture review, pattern detection, and other automation features run scheduled jobs against your vault, and those jobs include LLM calls. Each feature can be turned off independently in Settings.

Which providers receive what

The full list of subprocessors is on the Subprocessors page. The AI-relevant providers, summarized:

  • Anthropic — primary LLM for chat and agentic features. Receives the retrieved snippets, your message, and the system prompt. Does not train on inputs from our API account.
  • OpenAI — selectively used for voice transcription (Whisper) and certain embedding models. Receives transient audio / text fragments. Does not train on API inputs.
  • Google AI — selectively used for cost-efficient embeddings. Receives text fragments. Does not train on API inputs.
  • LiveKit — real-time WebRTC audio transit for Live Conversation. Audio is ephemeral; no LiveKit-side persistence beyond session connection metadata.
  • Cloudflare AI / Vectorize — vector storage and (in some cases) on-platform inference. Same provider as the underlying infrastructure.

What we send vs. what we don't

We deliberately limit what crosses our boundary into third-party AI infrastructure.

What we send

  • The text of your prompt or voice utterance.
  • The page snippets retrieved from your vault that are most relevant to the prompt (typically a few kilobytes per turn).
  • System prompts and tool definitions for vault-aware features.
  • For Auto-Linker and similar features running on a schedule, the title, frontmatter, and a passage from each candidate page.

What we don't send

  • Your full vault to any provider as a bulk transfer.
  • Your account email, password, payment information, or device identifiers as part of an AI prompt.
  • Content from other users (each prompt is built only from the requesting user's own vault; no data crosses tenants).
  • Audio recordings beyond the active session, in any feature where we've told you nothing is kept.
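
One common way to enforce a boundary like this is an allowlist: the outbound payload is assembled from explicitly named fields, so account data cannot slip in by accident. The sketch below illustrates the idea only; the field names and the `build_outbound_payload` helper are hypothetical and do not describe MindWiki's actual schema.

```python
# Hypothetical field names; only these keys may cross to an AI provider.
ALLOWED_FIELDS = ("system_prompt", "tool_definitions", "snippets", "message")


def build_outbound_payload(request: dict) -> dict:
    # Copy only allowlisted keys; everything else is dropped.
    return {k: request[k] for k in ALLOWED_FIELDS if k in request}


request = {
    "system_prompt": "Answer from the provided context.",
    "snippets": ["Page A: meeting notes excerpt"],
    "message": "What did we decide on Tuesday?",
    "account_email": "user@example.com",   # never part of an AI prompt
    "device_id": "ABC-123",                # never part of an AI prompt
}
payload = build_outbound_payload(request)
```

An allowlist fails closed: a field added to the request later stays out of the payload until someone deliberately lists it.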

Retention

  • Chat conversation history — stored in your vault for as long as you keep the conversation; deleted when you delete the conversation. Used by us only to render the conversation back to you.
  • LLM provider retention — each provider has its own retention. Anthropic, OpenAI, and Google AI all retain API inputs and outputs for limited windows (typically 30 days) for abuse monitoring; they do not train on the content. We do not extend their retention.
  • Voice audio — transient in transit through LiveKit; saved long-term only as an attachment in your vault if Voice Capture was used with the "Save audio attachment" option on.
  • Vector embeddings — stored as long as the source page exists. Deleted when the page is deleted; rebuilt when the page is edited.
  • Cached completions — we may cache responses to identical retrieval-grounded prompts within your account for a short window (hours) to improve perceived speed. Cache is per-account; not shared across users.
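
The per-account, short-lived cache in the last item can be pictured as a lookup keyed by account and prompt, with an expiry check on read. This is a sketch under assumptions, not the production design: the `CompletionCache` class is hypothetical, and the three-hour window is an illustrative value, since the policy says only "hours".

```python
import hashlib
import time
from typing import Optional


class CompletionCache:
    """Hypothetical per-account completion cache with a time-to-live."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: dict[tuple[str, str], tuple[float, str]] = {}

    @staticmethod
    def _key(account_id: str, prompt: str) -> tuple[str, str]:
        # The account ID is part of the key, so entries are never shared.
        return account_id, hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, account_id: str, prompt: str, response: str,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._store[self._key(account_id, prompt)] = (now, response)

    def get(self, account_id: str, prompt: str,
            now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        key = self._key(account_id, prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if now - stored_at > self.ttl:
            del self._store[key]   # expired: evict and report a miss
            return None
        return response


cache = CompletionCache(ttl_seconds=3 * 3600)  # assumed window of 3 hours
cache.put("alice", "same prompt", "cached answer", now=0)
```

Because the account ID is part of the key, an identical prompt from a different account is always a cache miss.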

Training

We do not use your private vault content to train AI models — our own or third-party.

  • We use API-tier access at every upstream provider. On API tiers, providers do not train on inputs by default; the providers' published terms and our contracts confirm this.
  • We do not feed vault content into evaluations, fine-tuning datasets, or model improvement programs.
  • The exception is anonymized, aggregated metrics about MindWiki usage (counts, latencies, error rates) — no personal content, used only to operate the service.
  • If we ever want to use your content to train or evaluate something specific to your account (for example, a personalized retrieval model), we will ask for opt-in consent.

How to turn AI off

iOS

  • Settings → MindWiki AI → Voice conversations — toggle off to disable Live Conversation entry points.
  • Settings → Notifications → Product updates & Marketing emails — control the email channels independently.
  • iOS Settings app → MindWiki → Microphone — revoke microphone access entirely. The app cannot use voice features without the OS permission.

macOS

Preferences → AI → toggle individual features. Disabling AI here disables AI surfaces across the app, including chat and Auto-Linker.

MindWiki Cloud

Account → AI → toggle individual features. Same scope as macOS.

Full opt-out via support

If you want a hard server-side block — no AI processing of your account, period — email dpo@mindwiki.io and we'll flag the account. AI-dependent features (chat, voice, retrieval-grounded search, Auto-Linker) will be unavailable on the account after the flag is set.

When AI is required vs. optional

  • Required for the feature to function — chat, voice conversation, voice capture transcription, semantic search, Auto-Linker, pattern detection. Turning these features off disables them; we can't run them locally on your device today.
  • Optional / not enabled by default — pattern detection daily summaries, automated weekly capture review, marketing-channel push notifications.
  • Never AI-touched — sign-in, payment processing, sync, account management, vault export, account deletion. These flows do not pass through any AI provider.

Outputs are not guaranteed to be correct

AI outputs can be wrong, biased, incomplete, or fabricated. The Service is provided "as is" with respect to AI output; see the No-warranty section of the Acceptable Use Policy and the High-risk use cases section, which prohibits using MindWiki AI as the sole authority for medical, legal, financial, or safety-of-life decisions.

Connecting your own AI agents (MCP)

When you connect Claude, ChatGPT, Codex, Cursor, Claude Code, or any other MCP-compatible agent to your vault, the agent is a third party operating under your direction. The data flow is:

  • The agent reads pages from your vault via the MCP server when you ask it to.
  • The agent sends what it read to the model behind it (Claude, GPT-4, etc.) as part of its own request.
  • That model is governed by the agent's privacy policy, not ours — the agent operator is the data processor for that step.
  • You remain responsible for what your agents do. Set MCP scopes conservatively, review writes, and revoke keys you no longer use.
