Knowledge Source Types
Knowledge sources are the content you add to your RAG agents; it is indexed so agents can retrieve it when answering questions. PrimePilot supports multiple source types, each with its own ingestion and processing pipeline. The same source types are available for both knowledge bases and agent-specific knowledge.
Available Source Types
Documents
Upload PDFs, Word documents, Excel spreadsheets, PowerPoint presentations, and text-based files such as TXT, CSV, and RTF. PrimePilot extracts the text from each document, then chunks and embeds it for retrieval.
- Supported formats: PDF, Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx), Text (.txt), CSV, RTF
- Processing: Text extraction → chunking → embedding → vector store indexing
- Plan: Available on all plans
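If you upload files from a script, a pre-upload check against the supported formats can save a failed ingestion. The sketch below is a hypothetical client-side helper (`is_supported_document` is not part of PrimePilot); it only compares file extensions with the list above and does not inspect file contents.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
# is_supported_document is a hypothetical helper, not a PrimePilot API.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx",
                       ".ppt", ".pptx", ".txt", ".csv", ".rtf"}

def is_supported_document(filename: str) -> bool:
    """Return True if the file extension (case-insensitive) is a supported document format."""
    return Path(filename).suffix.lower() in DOCUMENT_EXTENSIONS

for name in ["Q3-report.PDF", "notes.md", "pricing.xlsx"]:
    print(name, is_supported_document(name))
```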
Individual URLs
Add specific webpage URLs to index. Useful for blog posts, articles, documentation pages, or any single web page you want your agent to reference.
- Use cases: Blog posts, news articles, documentation pages, support articles
- Processing: Web scraping → text extraction → chunking → embedding → indexing
- Plan: Available on all plans
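PrimePilot's scraper is not documented here. As a rough illustration of the "text extraction" step for a web page, the sketch below uses Python's standard `html.parser` to collect visible text while skipping `<script>` and `<style>` content. The class and function names are hypothetical, and real extractors usually also drop navigation, headers, and footers, which this one does not.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

page = ("<html><head><style>p{}</style></head><body><h1>Returns</h1>"
        "<p>Items ship back free.</p><script>track()</script></body></html>")
print(extract_text(page))  # Returns Items ship back free.
```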
Audio Recordings
Upload audio files for transcription and indexing. PrimePilot transcribes the audio to text, then processes it like a document.
- Supported formats: MP3, WAV, M4A, WebM, OGG (maximum 100 MB per file)
- Use cases: Meeting recordings, interviews, podcasts, voice memos
- Processing: Transcription (voice-to-text) → text chunking → embedding → indexing
- Plan: Available on all plans
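A similar pre-upload check works for audio, covering both the format list and the per-file size limit. This is a hypothetical helper, and it assumes "100 MB" means 100 MiB (100 × 1024 × 1024 bytes); PrimePilot's exact byte limit may differ.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".webm", ".ogg"}
# Assumption: the 100 MB limit is interpreted as 100 MiB.
MAX_AUDIO_BYTES = 100 * 1024 * 1024

def check_audio_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if Path(filename).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append(f"unsupported format: {filename}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems

print(check_audio_upload("standup.m4a", 12_000_000))           # []
print(check_audio_upload("interview.flac", 150 * 1024 * 1024))  # two problems
```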
Sitemaps
Provide the URL of an XML sitemap to index multiple pages at once. PrimePilot discovers the page URLs listed in the sitemap and processes each page.
- Use cases: Documentation sites, blogs, product catalogs with sitemaps
- Processing: Sitemap parsing → URL discovery → page scraping → chunking → embedding → indexing
- Plan: Available on all plans
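The "sitemap parsing → URL discovery" step follows the sitemaps.org protocol, where each page appears as a `<loc>` inside a `<url>` element. The sketch below extracts those URLs with the standard library. It also recognizes a sitemap index (a `<sitemapindex>` whose `<loc>` entries point to further sitemaps); whether PrimePilot follows sitemap indexes is not stated here, so treat that part as an assumption.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def is_sitemap_index(sitemap_xml: str) -> bool:
    """True if the document is a <sitemapindex> rather than a <urlset>."""
    return ET.fromstring(sitemap_xml).tag == f"{{{SITEMAP_NS}}}sitemapindex"

def discover_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value: page URLs for a urlset, child sitemaps for an index."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("./*/sm:loc", {"sm": SITEMAP_NS})
            if loc.text]

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc><lastmod>2024-01-15</lastmod></url>
</urlset>"""
print(discover_urls(sitemap))
```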
Google Product Feed
Import and sync product data from Google Shopping feeds. Ideal for e-commerce and product-focused agents.
- Use cases: Product catalogs, inventory, pricing, product descriptions
- Processing: Feed parsing → product data extraction → chunking → embedding → indexing
- Plan: Available on all plans
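As an illustration of "product data extraction", the sketch below turns each `<item>` in an RSS-style product feed into one text block ready for chunking. It assumes the common Google Merchant Center layout, with attributes in the `g:` namespace (`http://base.google.com/ns/1.0`) and a fallback to plain RSS elements such as `<title>`. The field selection and output format are illustrative choices, not PrimePilot's actual mapping.

```python
import xml.etree.ElementTree as ET

G_NS = "http://base.google.com/ns/1.0"

def _field(item: ET.Element, name: str) -> str:
    # Prefer the g:-namespaced element, then fall back to the plain RSS element.
    value = item.findtext(f"{{{G_NS}}}{name}") or item.findtext(name) or ""
    return value.strip()

def product_texts(feed_xml: str) -> list[str]:
    """Turn each <item> of a product feed into one text block for chunking."""
    root = ET.fromstring(feed_xml)
    texts = []
    for item in root.iter("item"):
        lines = [f"{_field(item, 'title')} (id {_field(item, 'id')})"]
        for label, name in [("Description", "description"),
                            ("Price", "price"),
                            ("Availability", "availability")]:
            value = _field(item, name)
            if value:  # skip attributes the feed does not provide
                lines.append(f"{label}: {value}")
        texts.append("\n".join(lines))
    return texts

feed = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
  <title>Example Store</title>
  <item>
    <g:id>SKU-1</g:id>
    <title>Trail Shoe</title>
    <g:price>89.00 USD</g:price>
    <g:availability>in_stock</g:availability>
  </item>
</channel>
</rss>"""
print(product_texts(feed)[0])
```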
Google Drive (Pro)
Connect and sync files from Google Drive. Automatically indexes documents in selected folders.
- Use cases: Shared drives, team documents, cloud storage
- Plan: Pro plan only
Google Calendar (Pro)
Sync events and appointments from Google Calendar so your agent has context for scheduling and availability questions.
- Plan: Pro plan only
Full Websites (Pro)
Crawl and index entire websites. Configure crawl depth, frequency, and scope to index large sites.
- Use cases: Corporate sites, documentation portals, help centers
- Plan: Pro plan only
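Crawl depth and scope can be pictured as a breadth-first traversal that stops expanding links at a maximum depth and ignores links to other hosts. The sketch below is a hypothetical model of those two settings, not PrimePilot's crawler: `get_links` stands in for fetching a page and extracting its links, and crawl frequency (recrawl scheduling) is not modeled.

```python
from collections import deque
from collections.abc import Callable, Iterable
from urllib.parse import urlparse

def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> list[str]:
    """Breadth-first crawl limited by depth and to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: index this page but do not follow its links
        for link in get_links(url):
            if urlparse(link).netloc == host and link not in seen:  # scope: same host only
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# An in-memory link graph stands in for real HTTP fetches.
SITE = {
    "https://example.com/": ["https://example.com/help", "https://other.org/"],
    "https://example.com/help": ["https://example.com/help/billing"],
    "https://example.com/help/billing": ["https://example.com/help/billing/refunds"],
}
print(crawl("https://example.com/", lambda u: SITE.get(u, []), max_depth=2))
```

With `max_depth=2`, the refunds page three links away is never reached, and the off-site link is ignored.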
Processing Pipeline
When you add a knowledge source:
- Ingestion: Content is uploaded or fetched (e.g., from a URL or Google Drive)
- Extraction: Text is extracted (from documents, web pages, or transcribed audio)
- Chunking: Text is split into smaller segments for retrieval
- Embedding: Chunks are converted to vector embeddings using your configured embedding model
- Indexing: Embeddings are stored in your vector store for similarity search
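The chunking, embedding, indexing, and similarity-search steps above can be sketched end to end in a few dozen lines. This is a toy model under loud assumptions: chunks are overlapping word windows (40 words with a 10-word overlap, chosen arbitrarily), a normalized bag-of-words vector stands in for your configured embedding model, and a Python list stands in for the vector store. None of these names are PrimePilot APIs.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': an L2-normalized bag-of-words vector, standing in for a real model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.items: list[tuple[str, dict[str, float]]] = []

    def add_source(self, text: str) -> None:
        # Chunking -> embedding -> indexing.
        for piece in chunk(text):
            self.items.append((piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Similarity search: rank stored chunks by cosine similarity to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add_source("Refunds are issued within 5 business days of receiving the returned item.")
store.add_source("Our office is open Monday to Friday from 9am to 5pm.")
print(store.search("How long do refunds take?", k=1))  # returns the refunds chunk
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, which is a common reason to overlap windows.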
Processing status is shown per source: pending, processing, completed, or failed. You can retry failed sources from the management pages.
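The four status values can be modeled as a small state machine. Only the status names come from this page; the allowed transitions, and the assumption that retrying a failed source re-queues it as pending, are illustrative. The sketch covers only the retry path, not reprocessing of completed sources.

```python
from enum import Enum

class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

# Assumed transitions; retrying a failed source sends it back to pending.
ALLOWED = {
    SourceStatus.PENDING: {SourceStatus.PROCESSING},
    SourceStatus.PROCESSING: {SourceStatus.COMPLETED, SourceStatus.FAILED},
    SourceStatus.COMPLETED: set(),
    SourceStatus.FAILED: {SourceStatus.PENDING},
}

def transition(current: SourceStatus, new: SourceStatus) -> SourceStatus:
    """Move to `new`, rejecting transitions the model does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new

def retry(status: SourceStatus) -> SourceStatus:
    """Models the retry action: only a failed source can be retried."""
    return transition(status, SourceStatus.PENDING)

s = SourceStatus.PENDING
s = transition(s, SourceStatus.PROCESSING)
s = transition(s, SourceStatus.FAILED)
s = retry(s)
print(s.value)  # pending
```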
Managing Sources
Sources can be managed from:
- Knowledge Base detail page → Knowledge Sources tab → Manage on each card
- Agent detail page → Knowledge Sources tab → Manage on each card
Each management page shows a filtered list of sources for that context (agent or knowledge base), with search, filters, and actions (upload, add, delete, process, download).