Knowledge Source Types

Knowledge sources are the content your RAG agents search and draw on when they answer questions. PrimePilot supports multiple source types, each with its own ingestion and processing pipeline. The same source types are available for both knowledge bases and agent-specific knowledge.

Available Source Types

Documents

Upload PDFs, Word documents, Excel spreadsheets, PowerPoint presentations, plain-text files, CSV files, and RTF files. Documents are processed to extract text, which is then chunked and embedded for retrieval.

Supported formats: PDF, Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx), Text (.txt), CSV, RTF
Processing: Text extraction → chunking → embedding → vector store indexing
Plan: Available on all plans
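A minimal sketch of the first gate in this pipeline, assuming uploads are accepted or rejected by file extension. The extension set mirrors the supported formats listed above; `accepts_document` is a hypothetical helper for illustration, not part of PrimePilot, and the product may also inspect file contents.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
# Hypothetical helper: PrimePilot's real validation is not documented here.
SUPPORTED_DOCUMENT_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx",
    ".ppt", ".pptx", ".txt", ".csv", ".rtf",
}

def accepts_document(filename: str) -> bool:
    """Return True if the file's last extension is a supported document format."""
    # Path.suffix gives the final extension (e.g. ".docx"); lower() makes the check case-insensitive.
    return Path(filename).suffix.lower() in SUPPORTED_DOCUMENT_EXTENSIONS

print(accepts_document("Q3-report.PDF"))  # True
print(accepts_document("diagram.png"))    # False
```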

Individual URLs

Add specific webpage URLs to index. Useful for blog posts, articles, documentation pages, or any single web page you want your agent to reference.

Use cases: Blog posts, news articles, documentation pages, support articles
Processing: Web scraping → text extraction → chunking → embedding → indexing
Plan: Available on all plans
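The text-extraction step can be sketched with Python's standard `html.parser` module. This is an illustration under simple assumptions, not PrimePilot's scraper: it keeps all text outside `<script>` and `<style>` and does no boilerplate removal. The page is passed in as a string, so fetching (for example with `urllib.request.urlopen`) is left out; `VisibleTextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text content from HTML, skipping <script> and <style> blocks."""
    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    """Return the page's visible text joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>Returns policy</h1><p>Refunds take 5 days.</p>"
        "<script>track()</script></body></html>")
print(extract_text(page))  # Returns policy Refunds take 5 days.
```

The extracted string is what would then move on to chunking and embedding.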

Audio Recordings

Upload audio files for transcription and indexing. PrimePilot transcribes the audio to text, then processes the transcript like a document.

Supported formats: MP3, WAV, M4A, WebM, OGG (max 100 MB per file)
Use cases: Meeting recordings, interviews, podcasts, voice memos
Processing: Transcription (speech-to-text) → chunking → embedding → indexing
Plan: Available on all plans
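A minimal pre-upload check for the limits above might look like the sketch below. It assumes "100 MB" means 100 × 1024 × 1024 bytes (the product may use 100 × 1000 × 1000 instead), and `check_audio_upload` is a hypothetical helper, not a PrimePilot function.

```python
from pathlib import Path

SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}
# Assumption: the 100 MB limit is read as binary megabytes.
MAX_AUDIO_BYTES = 100 * 1024 * 1024

def check_audio_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_AUDIO_EXTENSIONS:
        problems.append(f"unsupported format: {filename}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_AUDIO_BYTES})")
    return problems

print(check_audio_upload("standup.m4a", 12_000_000))  # []
print(check_audio_upload("call.flac", 150 * 1024 * 1024))
```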

Sitemaps

Provide the URL of an XML sitemap to index many pages at once. PrimePilot reads the page URLs listed in the sitemap and processes each page.

Use cases: Documentation sites, blogs, product catalogs with sitemaps
Processing: Sitemap parsing → URL discovery → page scraping → chunking → embedding → indexing
Plan: Available on all plans
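Sitemap parsing follows the sitemaps.org protocol: a `<urlset>` lists page URLs in `<url><loc>` elements, and a `<sitemapindex>` lists further sitemaps in `<sitemap><loc>` elements. The sketch below uses only the standard library to show how URL discovery could work; `discover_urls` is a hypothetical helper, and fetching the XML and following child sitemaps are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def discover_urls(sitemap_xml: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) from a <urlset> or <sitemapindex> document."""
    root = ET.fromstring(sitemap_xml)
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if loc.text]
    return pages, children

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/setup</loc></url>
  <url><loc>https://example.com/docs/billing</loc></url>
</urlset>"""
pages, children = discover_urls(sample)
print(pages)     # ['https://example.com/docs/setup', 'https://example.com/docs/billing']
print(children)  # []
```

Each discovered page URL would then go through the same scraping, chunking, and embedding steps as an individual URL.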

Google Product Feed

Import and sync product data from Google Shopping feeds. Ideal for e-commerce and product-focused agents.

Use cases: Product catalogs, inventory, pricing, product descriptions
Processing: Feed parsing → product data extraction → chunking → embedding → indexing
Plan: Available on all plans
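Google product feeds in XML are commonly RSS 2.0 documents whose product attributes use the `g:` namespace (`http://base.google.com/ns/1.0`). The sketch below assumes that layout and turns each `<item>` into a short text block ready for chunking. It reads only a handful of `g:`-prefixed attributes (`id`, `title`, `description`, `price`, `availability`); `products_from_feed` and `product_to_text` are hypothetical helpers, not PrimePilot's parser.

```python
import xml.etree.ElementTree as ET

G_NS = {"g": "http://base.google.com/ns/1.0"}
# Subset of product attributes this sketch extracts (an assumption, not a full list).
FIELDS = ("id", "title", "description", "price", "availability")

def products_from_feed(feed_xml: str) -> list[dict[str, str]]:
    """Extract the selected g: attributes from every RSS <item>; missing ones become ''."""
    root = ET.fromstring(feed_xml)
    products = []
    for item in root.iter("item"):
        products.append({
            field: item.findtext(f"g:{field}", default="", namespaces=G_NS).strip()
            for field in FIELDS
        })
    return products

def product_to_text(product: dict[str, str]) -> str:
    """Render one product as 'field: value' lines, skipping empty fields."""
    return "\n".join(f"{key}: {value}" for key, value in product.items() if value)

feed = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"><channel>
  <item><g:id>SKU-1</g:id><g:title>Trail Shoe</g:title><g:price>89.00 USD</g:price>
  <g:availability>in_stock</g:availability></item>
</channel></rss>"""
for product in products_from_feed(feed):
    print(product_to_text(product))
```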

Google Drive (Pro)

Connect and sync files from Google Drive. PrimePilot automatically indexes the documents in the folders you select.

Use cases: Shared drives, team documents, cloud storage
Plan: Pro plan only

Google Calendar (Pro)

Sync events and appointments from Google Calendar so your agent has context for scheduling and availability questions.

Plan: Pro plan only

Full Websites (Pro)

Crawl and index entire websites. Configure crawl depth, frequency, and scope to index large sites.

Use cases: Corporate sites, documentation portals, help centers
Plan: Pro plan only
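Crawl depth and scope can be pictured as a breadth-first traversal that stops following links past a maximum depth and ignores links to other hosts. The sketch assumes "scope" means "same host as the start URL" and replaces real HTTP fetching with a `get_links` callback so it runs offline; crawl frequency (re-crawl scheduling) is not modeled. `crawl`, `max_depth`, and `max_pages` are hypothetical names, not PrimePilot settings.

```python
from collections import deque
from typing import Callable
from urllib.parse import urlparse

def crawl(start_url: str, get_links: Callable[[str], list[str]],
          max_depth: int = 2, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl limited by link depth, page count, and the start URL's host."""
    allowed_host = urlparse(start_url).netloc
    seen = {start_url}
    visited: list[str] = []
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)           # in a real crawler: fetch and index this page
        if depth == max_depth:
            continue                  # do not follow links beyond the depth limit
        for link in get_links(url):
            if urlparse(link).netloc == allowed_host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

site = {
    "https://example.com/": ["https://example.com/help", "https://other.org/"],
    "https://example.com/help": ["https://example.com/help/billing"],
    "https://example.com/help/billing": ["https://example.com/help/billing/refunds"],
}
print(crawl("https://example.com/", lambda u: site.get(u, []), max_depth=2))
# ['https://example.com/', 'https://example.com/help', 'https://example.com/help/billing']
```

Breadth-first order means shallow pages, which are usually the most linked and most important, are indexed before the page budget runs out.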

Processing Pipeline

When you add a knowledge source, it goes through these stages:

Ingestion: Content is uploaded or fetched (e.g., from a URL or Google Drive)
Extraction: Text is extracted (from documents, web pages, or transcribed audio)
Chunking: Text is split into smaller segments for retrieval
Embedding: Chunks are converted to vector embeddings using your configured embedding model
Indexing: Embeddings are stored in your vector store for similarity search
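The chunking, embedding, and similarity-search stages can be illustrated end to end in a few lines. This is a toy sketch under stated assumptions: chunks are fixed-size word windows with overlap (one common strategy; PrimePilot's actual chunker is not documented here), and `toy_embed` is a hashed bag-of-words stand-in for your configured embedding model. The in-memory `index` list plays the role of the vector store. All function names are hypothetical.

```python
import math
import re
import zlib

def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbouring windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once a window would only repeat words already covered by the previous one.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query: str, index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Rank stored chunks by cosine similarity to the query (dot product of unit vectors)."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

doc = ("Refunds are issued within five business days. "
       "Shipping is free on orders over fifty dollars.")
chunks = chunk_words(doc, size=8, overlap=2)   # ingestion and extraction assumed done
index = [(c, toy_embed(c)) for c in chunks]     # embedding + "vector store" indexing
print(top_k("how long do refunds take", index))
```

Because the toy vectors are L2-normalised, the dot product equals cosine similarity. A production vector store does the same kind of ranking over learned embeddings, typically with nearest-neighbour indexes so it scales beyond a linear scan.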

Processing status is shown per source: pending, processing, completed, or failed. You can retry failed sources from the management pages.
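The four statuses behave like a small state machine. The sketch below encodes one plausible set of transitions, assuming a retry sends a failed source back to `pending`; the exact transitions PrimePilot allows are not documented here, and `SourceStatus`, `transition`, and `retry` are hypothetical names.

```python
from enum import Enum

class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

# Assumed transition table (hypothetical): pending → processing → completed or failed,
# and retrying a failed source re-queues it as pending.
ALLOWED_TRANSITIONS: dict[SourceStatus, set[SourceStatus]] = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING},
    SourceStatus.PROCESSING: {SourceStatus.COMPLETED, SourceStatus.FAILED},
    SourceStatus.COMPLETED: set(),
    SourceStatus.FAILED: {SourceStatus.PENDING},
}

def transition(current: SourceStatus, new: SourceStatus) -> SourceStatus:
    """Return the new status, or raise if the move is not in the assumed table."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move a source from {current.value} to {new.value}")
    return new

def retry(current: SourceStatus) -> SourceStatus:
    """Re-queue a failed source; any other status raises ValueError."""
    return transition(current, SourceStatus.PENDING)

status = SourceStatus.PENDING
status = transition(status, SourceStatus.PROCESSING)
status = transition(status, SourceStatus.FAILED)
status = retry(status)
print(status.value)  # pending
```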

Managing Sources

Sources can be managed from:

Knowledge Base detail page → Knowledge Sources tab → Manage on each card
Agent detail page → Knowledge Sources tab → Manage on each card

Each management page lists only the sources for its context (agent or knowledge base) and provides search, filters, and actions: upload, add, delete, process, and download.
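The search and filters on a management page amount to narrowing one context's sources by name and attributes. The sketch assumes a hypothetical `Source` record with `context`, `kind`, and `status` fields; it illustrates the filtering logic only and is not PrimePilot's data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    name: str
    context: str   # hypothetical key, e.g. "agent:42" or "kb:7"
    kind: str      # e.g. "document", "url", "audio"
    status: str    # "pending", "processing", "completed", or "failed"

def filter_sources(sources: list[Source], context: str, search: str = "",
                   kind: Optional[str] = None, status: Optional[str] = None) -> list[Source]:
    """Keep one context's sources whose name contains `search` and that match the optional filters."""
    term = search.lower()
    return [
        s for s in sources
        if s.context == context
        and term in s.name.lower()
        and (kind is None or s.kind == kind)
        and (status is None or s.status == status)
    ]

sources = [
    Source("pricing.pdf", "kb:7", "document", "completed"),
    Source("onboarding-call.mp3", "kb:7", "audio", "failed"),
    Source("faq page", "agent:42", "url", "failed"),
]
failed_in_kb = filter_sources(sources, "kb:7", status="failed")
print([s.name for s in failed_in_kb])  # ['onboarding-call.mp3']
```

A filter like `status="failed"` is a natural first step before retrying failed sources in bulk.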