Knowledge Base - OrbitAI

Overview

Knowledge Bases in OrbitAI enable agents to access and retrieve information from external documents and data sources. Using Retrieval Augmented Generation (RAG), agents can ground their responses in factual information, answer questions about specific domains, and provide accurate, contextual assistance based on your organization’s knowledge.

RAG-Enabled: Retrieval Augmented Generation for grounded responses
Multi-Format: Support for PDF, Markdown, JSON, and text documents
Semantic Search: Embedding-based retrieval finds relevant information
Automatic: Agents automatically query knowledge during execution
Scalable: Handle large document collections efficiently
Dynamic: Add or update knowledge sources at runtime

Key Capabilities

Retrieval Augmented Generation (RAG)

RAG combines the power of LLMs with factual information from your knowledge base. Agents automatically retrieve relevant context from documents and use it to generate accurate, grounded responses.

Semantic Document Search

Using vector embeddings, the knowledge base performs semantic search to find relevant information even when queries don’t exactly match document text. This enables natural language queries over your documents.

Automatic Context Injection

When agents execute tasks, relevant knowledge is automatically retrieved and injected into the LLM context, requiring no manual intervention or query operations.

Multi-Document Synthesis

Agents can synthesize information from multiple documents simultaneously, creating comprehensive answers that draw from your entire knowledge base.

How Knowledge Bases Work

RAG Architecture

Knowledge Base RAG Pipeline
    ├── 1. Document Ingestion
    │   ├── Load documents from file paths
    │   ├── Parse content (PDF, MD, JSON, TXT)
    │   ├── Split into chunks (with overlap)
    │   └── Extract metadata
    │
    ├── 2. Embedding & Indexing
    │   ├── Generate embeddings for each chunk
    │   ├── Create vector index
    │   ├── Store in vector database
    │   └── Build metadata index
    │
    ├── 3. Query Processing
    │   ├── Agent generates query from task
    │   ├── Embed query using same model
    │   ├── Vector similarity search
    │   └── Rank results by relevance
    │
    ├── 4. Context Retrieval
    │   ├── Retrieve top-k relevant chunks
    │   ├── Apply similarity threshold
    │   ├── Rerank if needed
    │   └── Format as context
    │
    └── 5. Response Generation
        ├── Inject retrieved context into prompt
        ├── LLM generates response
        ├── Ground response in source documents
        └── Return with citations (optional)

Embedding-Based Retrieval

Knowledge bases use vector embeddings to enable semantic search:

How It Works

Step 1: Document Processing

Document: "The OrbitAI framework enables multi-agent orchestration"
↓
Chunks: ["The OrbitAI framework enables", "enables multi-agent orchestration"]
↓
Embeddings: [0.234, -0.112, 0.445, ...], [0.556, -0.223, 0.334, ...]
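
The splitting step can be sketched in a few lines. This is a minimal illustration that assumes word-based chunks with a fixed overlap; `chunkWords`, its parameters, and the sizes used are hypothetical and are not OrbitAI's actual chunker, which may split on tokens, sentences, or headings instead.

```swift
// Hypothetical helper (not part of OrbitAI): split text into
// word-based chunks where consecutive chunks share `overlap` words.
func chunkWords(_ text: String, size: Int, overlap: Int) -> [String] {
    precondition(size > 0 && overlap >= 0 && overlap < size,
                 "overlap must be smaller than the chunk size")

    let words = text.split(separator: " ").map(String.init)
    guard !words.isEmpty else { return [] }

    var chunks: [String] = []
    let step = size - overlap  // how far the window advances each time
    var start = 0

    while start < words.count {
        let end = min(start + size, words.count)
        chunks.append(words[start..<end].joined(separator: " "))
        if end == words.count { break }  // last window reached the end
        start += step
    }
    return chunks
}

let exampleChunks = chunkWords(
    "The OrbitAI framework enables multi-agent orchestration",
    size: 4,
    overlap: 1
)
print(exampleChunks)
```

With a chunk size of 4 words and an overlap of 1, the sample sentence produces the two chunks shown in Step 1, with "enables" shared between them.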

Step 2: Query Processing

Query: "How does OrbitAI work?"
↓
Query Embedding: [0.245, -0.108, 0.432, ...]
↓
Similarity Search: Find closest document embeddings

Step 3: Retrieval

Similarity Scores:
  Chunk 1: 0.92 (very relevant) ✓
  Chunk 2: 0.88 (very relevant) ✓
  Chunk 3: 0.65 (less relevant) ✗
↓
Return top-k chunks above threshold
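
The retrieval step can be illustrated with a small, self-contained sketch. It assumes cosine similarity as the scoring function and uses toy two-dimensional vectors; the types `EmbeddedChunk` and `ScoredChunk` and the function `retrieve` are hypothetical names for illustration, and the similarity metric OrbitAI actually uses is not stated here.

```swift
// Hypothetical types for illustration only.
struct EmbeddedChunk {
    let id: String
    let embedding: [Double]
}

struct ScoredChunk {
    let id: String
    let score: Double
}

// Cosine similarity: dot product divided by the product of vector lengths.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

// Score every chunk, drop those below the threshold, keep the k best.
func retrieve(query: [Double],
              from chunks: [EmbeddedChunk],
              k: Int,
              threshold: Double) -> [ScoredChunk] {
    let scored = chunks.map {
        ScoredChunk(id: $0.id, score: cosineSimilarity(query, $0.embedding))
    }
    let aboveThreshold = scored.filter { $0.score >= threshold }
    let ranked = aboveThreshold.sorted { $0.score > $1.score }
    return Array(ranked.prefix(k))
}

let toyIndex = [
    EmbeddedChunk(id: "A", embedding: [1, 0]),
    EmbeddedChunk(id: "B", embedding: [1, 1]),
    EmbeddedChunk(id: "C", embedding: [0, 1])
]

let hits = retrieve(query: [1, 0], from: toyIndex, k: 2, threshold: 0.7)
for hit in hits {
    print(hit.id, hit.score)
}
```

A linear scan like this is only practical for small collections; a real vector index would use an approximate nearest-neighbor structure instead.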

System Integration

OrbitAI Execution Flow with Knowledge Base
    ├── User Request
    │   └── "What are the agent capabilities?"
    │
    ├── Orbit Orchestrator
    │   └── Routes to appropriate agent
    │
    ├── Agent Execution
    │   ├── Analyzes task requirements
    │   ├── Formulates knowledge query
    │   └── Triggers knowledge retrieval
    │
    ├── Knowledge Base Query
    │   ├── Embeds query
    │   ├── Searches vector index
    │   ├── Retrieves relevant chunks
    │   └── Returns context
    │
    ├── LLM Processing
    │   ├── Context: Retrieved knowledge
    │   ├── Task: User request
    │   ├── Agent: Role and purpose
    │   └── Generates response
    │
    └── Response
        └── "Agents have tools, memory, and knowledge..."

Knowledge retrieval happens automatically during agent execution. You don’t need to manually query the knowledge base—agents do it for you based on task requirements.

Configuration and Usage

Basic Configuration

Add knowledge sources to agents or orbits using simple file paths:

import OrbitAI

let agent = Agent(
    role: "Documentation Assistant",
    purpose: "Answer questions about product documentation",
    context: "Expert assistant with access to all product docs",
    knowledgeSources: [
        "./docs/user-guide.pdf",
        "./docs/api-reference.md",
        "./docs/faq.txt"
    ]
)

Agent-Level Knowledge Base

Configure knowledge bases for individual agents:

Define Knowledge Sources

Specify document paths when creating the agent:

let knowledgeAgent = Agent(
    role: "Knowledge Expert",
    purpose: "Provide expert answers using company knowledge",
    context: """
    Expert assistant with deep knowledge of:
    - Company policies
    - Product specifications
    - Customer FAQs
    - Technical documentation
    """,
    knowledgeSources: [
        // Company policies
        "./knowledge/policies/employee-handbook.pdf",
        "./knowledge/policies/code-of-conduct.md",

        // Product information
        "./knowledge/products/catalog.json",
        "./knowledge/products/specifications.pdf",

        // Support documentation
        "./knowledge/support/faq.txt",
        "./knowledge/support/troubleshooting.md"
    ]
)

Create Tasks

Create tasks that leverage the knowledge base:

let queryTask = ORTask(
    description: """
    Answer the user's question using information from the knowledge base.
    Provide accurate, detailed responses with specific references.
    """,
    expectedOutput: "Comprehensive answer with source references",
    agent: knowledgeAgent
)

Execute

Run the orbit—knowledge is automatically queried:

let orbit = try await Orbit.create(
    name: "Knowledge Query System",
    agents: [knowledgeAgent],
    tasks: [queryTask],
    inputs: ["user_query": "What is our return policy?"]
)

let result = try await orbit.run()
// Agent automatically queried knowledge base
// Response grounded in return policy document
print(result.output)

Orbit-Level Knowledge Base

Share knowledge across all agents in an orbit:

let orbit = try await Orbit.create(
    name: "Documentation System",
    agents: [searchAgent, summaryAgent, answerAgent],
    tasks: [searchTask, summaryTask, answerTask],
    process: .sequential,
    knowledgeSources: [
        // Shared knowledge for all agents
        "./docs/product-manual.pdf",
        "./docs/api-reference.md",
        "./data/product-catalog.json"
    ]
)

When to Use

Agent-level knowledge base:

Agents need different domain knowledge
Specialized agents with unique document sets
Fine-grained control over knowledge access
Separate knowledge bases per role

Orbit-level knowledge base:

All agents need same knowledge
Collaborative workflows
Shared company knowledge
Simplified configuration

Example:

// Different knowledge per agent
let legalAgent = Agent(
    role: "Legal Advisor",
    knowledgeSources: ["./legal/contracts.pdf"]
)

let hrAgent = Agent(
    role: "HR Assistant",
    knowledgeSources: ["./hr/policies.pdf"]
)

// vs. shared knowledge
let orbit = try await Orbit.create(
    name: "Company Assistant",
    agents: [generalAgent1, generalAgent2],
    knowledgeSources: [  // Shared by all
        "./company/handbook.pdf"
    ]
)

Dynamic Knowledge Sources

Add knowledge sources dynamically after creation:

// Create orbit
let orbit = try await Orbit.create(
    name: "Adaptive System",
    agents: [agent],
    tasks: [task],
    knowledgeSources: [
        "./initial-knowledge.pdf"
    ]
)

// Add new knowledge source at runtime
await orbit.addKnowledgeSource("./new-document.pdf")
await orbit.addKnowledgeSource("./updated-policy.md")

// Knowledge base automatically updates
// Agents can now access new documents
let result = try await orbit.run()

Use Cases:

Loading documents based on user input
Adding knowledge as it becomes available
A/B testing different knowledge sources
Incremental knowledge base building

Knowledge Sources

OrbitAI supports multiple document formats for knowledge ingestion:

Supported File Formats

PDF Documents

Format: .pdf
Use Cases: Manuals, reports, research papers, books
Features:

Text extraction from pages
Metadata preservation
Table detection
Multi-page support

Example:

knowledgeSources: [
    "./manuals/user-guide.pdf",
    "./reports/annual-report-2024.pdf",
    "./research/whitepaper.pdf"
]

Markdown Files

Format: .md, .mdx
Use Cases: Documentation, wikis, README files
Features:

Native markdown parsing
Header hierarchy preservation
Code block handling
Link resolution

Example:

knowledgeSources: [
    "./wiki/getting-started.md",
    "./docs/api-reference.mdx",
    "./README.md"
]

JSON Data

Format: .json
Use Cases: Structured data, catalogs, configuration
Features:

Structured data parsing
Nested object handling
Array processing
Schema-aware search

Example:

knowledgeSources: [
    "./data/product-catalog.json",
    "./config/settings.json",
    "./customers/profiles.json"
]

Plain Text

Format: .txt
Use Cases: FAQs, notes, transcripts, logs
Features:

Simple text ingestion
Fast processing
No formatting overhead
Universal compatibility

Example:

knowledgeSources: [
    "./support/faq.txt",
    "./notes/meeting-notes.txt",
    "./data/customer-feedback.txt"
]
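
Before handing a list of sources to an agent, it can help to check extensions up front. The sketch below is a hypothetical pre-flight helper, not an OrbitAI API; it assumes the supported extensions are exactly the ones listed in this section and compares them case-insensitively. Directory paths have no extension and would need separate handling.

```swift
import Foundation

// Extensions taken from the formats listed in this section.
let supportedExtensions: Set<String> = ["pdf", "md", "mdx", "json", "txt"]

// Hypothetical helper: separate paths with a supported extension
// from those without one. It does not touch the file system.
func partitionSources(_ paths: [String]) -> (supported: [String], unsupported: [String]) {
    var supported: [String] = []
    var unsupported: [String] = []

    for path in paths {
        // Lowercase so "GUIDE.PDF" is treated the same as "guide.pdf".
        let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
        if supportedExtensions.contains(ext) {
            supported.append(path)
        } else {
            unsupported.append(path)
        }
    }
    return (supported, unsupported)
}

let checked = partitionSources([
    "./manuals/user-guide.PDF",
    "./notes/meeting-notes.docx",
    "./data/product-catalog.json"
])
print("Supported:", checked.supported)
print("Unsupported:", checked.unsupported)
```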

File Path Specifications

Knowledge sources accept several path formats: relative paths, absolute paths, directory paths, and URLs.

Relative paths are resolved against the current working directory:

knowledgeSources: [
    "./docs/guide.pdf",              // Current dir + docs/
    "../shared/knowledge.md",        // Parent dir + shared/
    "local-file.txt"                 // Current dir
]

Best for: Project-relative documents

Document Organization

Organize knowledge sources for optimal retrieval:

// Poor organization
knowledgeSources: [
    "./doc1.pdf",
    "./doc2.pdf",
    "./doc3.pdf",
    "./doc4.pdf",
    "./doc5.pdf"  // What are these?
]

// Good organization
knowledgeSources: [
    // Product documentation
    "./knowledge/products/user-guide.pdf",
    "./knowledge/products/api-reference.md",

    // Company policies
    "./knowledge/policies/employee-handbook.pdf",
    "./knowledge/policies/security-policy.md",

    // Support resources
    "./knowledge/support/faq.txt",
    "./knowledge/support/troubleshooting.md",

    // Data resources
    "./knowledge/data/product-catalog.json",
    "./knowledge/data/pricing.json"
]

Organization Best Practice: Use a clear directory structure that mirrors your knowledge domains. This makes maintenance easier and helps with debugging retrieval issues.

Knowledge Source Examples

Customer Support System

let supportAgent = Agent(
    role: "Customer Support Agent",
    purpose: "Help customers with issues using knowledge base",
    context: "Friendly support agent with comprehensive product knowledge",
    knowledgeSources: [
        // Product documentation
        "./kb/products/user-manual.pdf",
        "./kb/products/quick-start-guide.pdf",

        // Troubleshooting guides
        "./kb/support/common-issues.md",
        "./kb/support/error-codes.txt",
        "./kb/support/troubleshooting-steps.md",

        // FAQ database
        "./kb/faq/general-faq.txt",
        "./kb/faq/technical-faq.md",

        // Policy documents
        "./kb/policies/return-policy.pdf",
        "./kb/policies/warranty-info.pdf"
    ]
)

let supportTask = ORTask(
    description: """
    Answer the customer's question using the knowledge base.
    Provide clear, accurate information with specific references.
    If information is not in knowledge base, say so clearly.
    """,
    expectedOutput: "Helpful answer with source references"
)

Legal Document Analysis

let legalAgent = Agent(
    role: "Legal Research Assistant",
    purpose: "Research legal documents and provide analysis",
    context: "Legal expert with access to contracts and case law",
    knowledgeSources: [
        // Contracts
        "./legal/contracts/vendor-agreements/",
        "./legal/contracts/employment/",

        // Policies
        "./legal/policies/data-privacy.pdf",
        "./legal/policies/terms-of-service.md",

        // Case references
        "./legal/cases/precedents.json",
        "./legal/cases/summaries.md",

        // Regulations
        "./legal/regulations/compliance-requirements.pdf"
    ]
)

Product Recommendation Engine

let recommendationAgent = Agent(
    role: "Product Advisor",
    purpose: "Recommend products based on customer needs",
    context: "Expert advisor with complete product knowledge",
    knowledgeSources: [
        // Product catalog
        "./products/catalog.json",
        "./products/specifications.pdf",

        // Reviews and ratings
        "./products/reviews.json",
        "./products/customer-feedback.txt",

        // Comparison guides
        "./products/comparison-charts.md",
        "./products/buying-guides.pdf",

        // Inventory information
        "./products/availability.json",
        "./products/pricing.json"
    ]
)

Technical Documentation Assistant

let techDocsAgent = Agent(
    role: "Technical Writer Assistant",
    purpose: "Help with technical documentation queries",
    context: "Expert in API documentation and technical writing",
    knowledgeSources: [
        // API documentation
        "./docs/api/rest-api.md",
        "./docs/api/graphql-api.md",
        "./docs/api/webhooks.md",

        // SDK documentation
        "./docs/sdks/swift-sdk.md",
        "./docs/sdks/python-sdk.md",

        // Architecture guides
        "./docs/architecture/system-design.pdf",
        "./docs/architecture/deployment.md",

        // Code examples
        "./docs/examples/quickstart.md",
        "./docs/examples/tutorials/",

        // Changelog
        "./docs/changelog.md"
    ]
)

Healthcare Information System

let healthcareAgent = Agent(
    role: "Medical Information Assistant",
    purpose: "Provide medical information from approved sources",
    context: "Medical assistant with access to clinical guidelines",
    knowledgeSources: [
        // Clinical guidelines
        "./medical/guidelines/treatment-protocols.pdf",
        "./medical/guidelines/diagnostic-criteria.md",

        // Drug information
        "./medical/drugs/formulary.json",
        "./medical/drugs/interactions.pdf",

        // Procedures
        "./medical/procedures/standard-procedures.md",
        "./medical/procedures/safety-protocols.pdf",

        // Patient education
        "./medical/education/condition-guides.pdf",
        "./medical/education/preventive-care.md"
    ]
)

Medical Disclaimer: Healthcare applications require careful validation and should not replace professional medical advice. Always ensure compliance with healthcare regulations (HIPAA, etc.).

Integration Patterns

Knowledge Base + Memory

Combine knowledge bases with memory systems for powerful agents:

let hybridAgent = Agent(
    role: "Adaptive Assistant",
    purpose: "Provide personalized help using both knowledge and memory",
    context: """
    Intelligent assistant that:
    - Uses knowledge base for factual information
    - Uses memory for user preferences and history
    - Combines both for personalized, accurate responses
    """,
    // Memory for user interactions
    memory: true,
    longTermMemory: true,
    // Knowledge base for factual information
    knowledgeSources: [
        "./knowledge/product-docs.pdf",
        "./knowledge/company-info.md"
    ]
)

How it works:

User Query: "What are the features of Product X?"
    ↓
Agent Processing:
    ├── Memory Check: "User previously asked about Product X pricing"
    ├── Knowledge Query: "Product X features from documentation"
    └── Synthesis: Personalized response combining both
    ↓
Response: "Product X has features A, B, C (from knowledge base).
           Based on your previous interest in pricing (from memory),
           you might also want to know that it's available at..."

Use Case: Personalized Support

let personalizedSupport = Agent(
    role: "Personal Support Agent",
    purpose: "Provide personalized customer support",
    context: "Support agent with knowledge and memory",

    // Remember customer history
    memory: true,
    longTermMemory: true,

    // Access support knowledge
    knowledgeSources: [
        "./support/faq.txt",
        "./support/troubleshooting.md"
    ]
)

// First interaction
// User: "How do I reset my password?"
// Agent uses: Knowledge base for procedure
// Agent stores: User asked about password reset

// Second interaction (later)
// User: "I'm still having issues"
// Agent uses: Memory (knows about password reset)
//            + Knowledge base (troubleshooting steps)
// Response: Contextual help for password reset issues

Knowledge Base + Tools

Combine knowledge bases with tools for action-oriented agents:

let actionableAgent = Agent(
    role: "Executive Assistant",
    purpose: "Answer questions and take actions",
    context: "Assistant with knowledge and capabilities to act",

    // Knowledge for information
    knowledgeSources: [
        "./calendar-policies.md",
        "./company-contacts.json"
    ],

    // Tools for actions
    tools: [
        "apple_calendar",  // Create calendar events
        "send_email",      // Send emails
        "web_search"       // Search for info not in KB
    ]
)

let task = ORTask(
    description: """
    Check the company contacts in the knowledge base for John's email,
    then send him an email about the meeting using the send_email tool,
    and create a calendar event using the calendar tool based on
    the meeting policies in the knowledge base.
    """,
    expectedOutput: "Confirmation of email sent and event created"
)

Agent workflow:

Query knowledge base: Find John’s email in contacts
Query knowledge base: Check meeting policies for defaults
Use tool: Send email to John
Use tool: Create calendar event
Return: Confirmation with details

Knowledge Base Access in Tasks

Access knowledge base programmatically in custom tasks:

let customTask = ORTask(
    description: "Custom knowledge retrieval task",
    expectedOutput: "Processed knowledge results",
    customHandler: { context in
        guard let knowledgeBase = context.knowledgeBase else {
            return "Knowledge base not available"
        }

        // Query knowledge base
        let results = try await knowledgeBase.query(
            query: "product specifications",
            limit: 5,
            threshold: 0.75
        )

        // Process results
        var output = "Found \(results.count) relevant documents:\n\n"

        for (index, result) in results.enumerated() {
            output += "\(index + 1). \(result.metadata.filename)\n"
            output += "   Relevance: \(result.score)\n"
            output += "   Content: \(result.content.prefix(100))...\n\n"
        }

        return output
    }
)

Best Practices

Document Preparation

Clean Documents

Prepare documents for optimal retrieval.

Do:

Remove unnecessary formatting
Use clear headings and structure
Include relevant metadata
Keep content focused

Don’t:

Include excessive boilerplate
Use unclear abbreviations
Mix unrelated topics
Keep outdated information

Chunk-Friendly Content

Structure content for effective chunking.

Good structure:

## Feature Name

Brief description of the feature.

### How It Works

Detailed explanation...

### Use Cases

- Use case 1
- Use case 2

Poor structure:

FeatureName:desc:works:cases...
[All in one block]

Rich Metadata

Include metadata for better retrieval.

PDF: Use title, author, subject fields

Markdown: Include frontmatter

---
title: Feature Documentation
category: User Guide
version: 2.0
---

JSON: Structure with metadata

{
  "metadata": {
    "category": "products",
    "updated": "2024-01-15"
  },
  "content": {...}
}

Document Size

Optimal document sizing:

Too small: < 1 page

Merge related documents
Create topic-based documents

Optimal: 5-50 pages

Good chunk coverage
Manageable retrieval

Too large: > 100 pages

Split into logical sections
Create separate documents per topic

Knowledge Base Architecture

Small Projects (< 10 documents)

// Simple, flat structure
let agent = Agent(
    role: "Assistant",
    knowledgeSources: [
        "./docs/guide.pdf",
        "./docs/faq.txt",
        "./data/catalog.json"
    ]
)

Characteristics:

Single agent with all knowledge
Flat file structure
No complex organization needed

Retrieval Optimization

Tune Similarity Thresholds

Adjust thresholds based on retrieval quality:

// Low threshold (0.6-0.7): Broad retrieval
// More results, some may be less relevant
let broadConfig = KnowledgeConfiguration(
    similarityThreshold: 0.65
)

// Medium threshold (0.75-0.8): Balanced
// Good balance of recall and precision
let balancedConfig = KnowledgeConfiguration(
    similarityThreshold: 0.75  // Recommended
)

// High threshold (0.85-0.95): Precise
// Fewer results, high relevance
let preciseConfig = KnowledgeConfiguration(
    similarityThreshold: 0.90
)

Testing approach:

// Test different thresholds
for threshold in [0.6, 0.7, 0.75, 0.8, 0.85, 0.9] {
    let results = try await kb.query(
        query: testQuery,
        threshold: threshold
    )
    print("Threshold \(threshold): \(results.count) results")
    // Evaluate quality and adjust
}
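
Counting results per threshold says little about quality on its own. If you have a small set of queries with hand-labeled relevant chunks, precision and recall at each threshold give a clearer picture. The sketch below is a hypothetical evaluation helper, not an OrbitAI API; the chunk ids, scores, and labels are made-up sample data.

```swift
// Hypothetical helper: precision and recall of the chunks that
// pass a similarity threshold, judged against hand-labeled ids.
func precisionRecall(scored: [(id: String, score: Double)],
                     relevant: Set<String>,
                     threshold: Double) -> (precision: Double, recall: Double) {
    let retrieved = Set(scored.filter { $0.score >= threshold }.map { $0.id })
    let hits = Double(retrieved.intersection(relevant).count)

    // Precision: share of retrieved chunks that are relevant.
    let precision = retrieved.isEmpty ? 0 : hits / Double(retrieved.count)
    // Recall: share of relevant chunks that were retrieved.
    let recall = relevant.isEmpty ? 0 : hits / Double(relevant.count)
    return (precision, recall)
}

// Made-up scores for one test query; "a", "b", "c" are labeled relevant.
let sampleScores: [(id: String, score: Double)] = [
    (id: "a", score: 0.92),
    (id: "b", score: 0.88),
    (id: "c", score: 0.65),
    (id: "d", score: 0.60)
]
let labeledRelevant: Set<String> = ["a", "b", "c"]

for threshold in [0.6, 0.75, 0.9] {
    let pr = precisionRecall(scored: sampleScores,
                             relevant: labeledRelevant,
                             threshold: threshold)
    print("Threshold \(threshold): precision \(pr.precision), recall \(pr.recall)")
}
```

Raising the threshold generally trades recall for precision, which is the balance the ranges above describe.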

Limit Result Count

Retrieve an appropriate number of results:

// Too few (1-2): May miss relevant info
let tooFew = try await kb.query(query: query, limit: 1)

// Optimal (3-10): Good context without overload
let optimal = try await kb.query(query: query, limit: 5)

// Too many (20+): Context overload, slower
let tooMany = try await kb.query(query: query, limit: 25)

Guidelines:

Quick answers: 3-5 results
Comprehensive analysis: 5-10 results
Research tasks: 10-20 results
Monitor context window: Don’t exceed LLM limits
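
One way to respect the context window is to cap retrieved chunks by an estimated token budget rather than by count alone. The sketch below is hypothetical and not an OrbitAI API: `RetrievedChunk` and `fitToBudget` are illustrative names, and the estimate of roughly four characters per token is a rough assumption that varies by model and language.

```swift
// Hypothetical result type for illustration only.
struct RetrievedChunk {
    let content: String
    let score: Double
}

// Keep the highest-scoring chunks until the estimated token budget
// would be exceeded. Token cost is approximated from character count.
func fitToBudget(_ chunks: [RetrievedChunk],
                 maxTokens: Int,
                 charsPerToken: Int = 4) -> [RetrievedChunk] {
    precondition(charsPerToken > 0, "charsPerToken must be positive")

    var usedTokens = 0
    var kept: [RetrievedChunk] = []

    for chunk in chunks.sorted(by: { $0.score > $1.score }) {
        // Round up so short chunks still cost at least one token.
        let cost = (chunk.content.count + charsPerToken - 1) / charsPerToken
        if usedTokens + cost > maxTokens { break }
        usedTokens += cost
        kept.append(chunk)
    }
    return kept
}

let candidates = [
    RetrievedChunk(content: String(repeating: "a", count: 40), score: 0.70),
    RetrievedChunk(content: String(repeating: "b", count: 40), score: 0.90),
    RetrievedChunk(content: String(repeating: "c", count: 40), score: 0.80)
]
let trimmed = fitToBudget(candidates, maxTokens: 25)
print("Kept \(trimmed.count) of \(candidates.count) chunks")
```

Stopping at the first chunk that does not fit preserves the relevance ordering; skipping it and trying smaller chunks further down is an alternative if filling the budget matters more.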

Query Formulation

Formulate effective queries.

Poor queries:

"product"          // Too vague
"x"                // Too short
"aslkdjf"          // Nonsense

Good queries:

"product features and specifications"
"return policy for damaged items"
"API authentication methods"

Best practice:

// Let agents formulate queries naturally
let task = ORTask(
    description: """
    Answer the user's question about our return policy.
    Be specific about damaged items versus change of mind.
    """,
    expectedOutput: "Detailed return policy explanation"
)
// Agent will formulate appropriate knowledge query

Caching Strategies

Cache frequently accessed knowledge:

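// Note: not thread-safe as written; use an actor if queried concurrently.
final class CachedKnowledgeBase {
    // Cached results plus the time they were stored
    private struct Entry {
        let results: [KnowledgeResult]
        let storedAt: Date
    }

    private var queryCache: [String: Entry] = [:]
    private let cacheExpiry: TimeInterval = 3600  // 1 hour

    func query(
        query: String,
        limit: Int,
        threshold: Double
    ) async throws -> [KnowledgeResult] {
        // Key on every parameter that affects the result set
        let key = "\(query)|\(limit)|\(threshold)"

        // Check cache, ignoring entries older than cacheExpiry
        if let entry = queryCache[key],
           Date().timeIntervalSince(entry.storedAt) < cacheExpiry {
            return entry.results
        }

        // Query knowledge base
        let results = try await performQuery(
            query: query,
            limit: limit,
            threshold: threshold
        )

        // Cache results with a timestamp
        queryCache[key] = Entry(results: results, storedAt: Date())

        return results
    }

    // Periodic cache cleanup
    func cleanupCache() {
        // Remove expired entries
        let now = Date()
        queryCache = queryCache.filter {
            now.timeIntervalSince($0.value.storedAt) < cacheExpiry
        }
    }
}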

Production Best Practices

Version Control

Track knowledge base changes:

# Store knowledge in version control
git add knowledge/
git commit -m "Update product documentation"

# Tag knowledge versions
git tag -a kb-v1.2 -m "Knowledge base v1.2"

Benefits:

Track document changes
Rollback if needed
Coordinate with code releases

Validation

Validate knowledge base setup:

func validateKnowledgeBase() async throws {
    // Check files exist
    for source in knowledgeSources {
        guard FileManager.default.fileExists(
            atPath: source
        ) else {
            throw ValidationError.fileNotFound(source)
        }
    }

    // Test retrieval
    let testQuery = "test query"
    let results = try await kb.query(
        query: testQuery,
        limit: 1
    )

    guard !results.isEmpty else {
        throw ValidationError.noResults
    }

    print("✓ Knowledge base validated")
}

Monitoring

Monitor knowledge base usage:

// Log queries
let stats = KnowledgeStats()

func query(...) async throws -> [Result] {
    stats.queryCount += 1
    let start = Date()

    let results = try await performQuery(...)

    stats.avgLatency = updateAverage(
        Date().timeIntervalSince(start)
    )
    stats.avgResultCount = updateAverage(
        results.count
    )

    return results
}

// Regular reporting
print("""
Knowledge Base Stats:
  Queries: \(stats.queryCount)
  Avg Latency: \(stats.avgLatency)s
  Avg Results: \(stats.avgResultCount)
""")

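The `updateAverage` helper in the snippet above is not defined there. One way to maintain those averages without storing every sample is an incremental mean. The sketch below is a hypothetical stand-in, not OrbitAI's `KnowledgeStats`; `QueryStats` and `record` are illustrative names.

```swift
// Hypothetical stats tracker using an incremental (running) mean:
// newMean = oldMean + (sample - oldMean) / n
struct QueryStats {
    private(set) var queryCount = 0
    private(set) var avgLatency = 0.0      // seconds
    private(set) var avgResultCount = 0.0

    mutating func record(latency: Double, resultCount: Int) {
        queryCount += 1
        let n = Double(queryCount)
        avgLatency += (latency - avgLatency) / n
        avgResultCount += (Double(resultCount) - avgResultCount) / n
    }
}

var queryStats = QueryStats()
queryStats.record(latency: 0.2, resultCount: 4)
queryStats.record(latency: 0.4, resultCount: 6)

print("""
Queries: \(queryStats.queryCount)
Avg Latency: \(queryStats.avgLatency)s
Avg Results: \(queryStats.avgResultCount)
""")
```

Because the struct is a value type with a mutating method, it needs external synchronization (for example, ownership by an actor) if queries are recorded from multiple tasks.
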
Documentation

Document your knowledge base:

# Knowledge Base Documentation

## Structure
- `/knowledge/products/` - Product docs
- `/knowledge/support/` - Support docs
- `/knowledge/company/` - Company info

## Update Process
1. Update source documents
2. Validate changes
3. Deploy to production
4. Monitor retrieval quality

## Maintenance
- Review quarterly
- Remove outdated docs
- Add new information

Troubleshooting

Common Issues

Knowledge Base Not Loading

Symptom: The agent can’t access the knowledge base, or documents are not found.

Causes:

Invalid file paths
Missing files
Permission issues
Unsupported file format

Diagnosis:

// Check file existence
for source in knowledgeSources {
    let exists = FileManager.default.fileExists(atPath: source)
    print("\(source): \(exists ? "✓" : "✗ NOT FOUND")")

    if exists {
        let isReadable = FileManager.default.isReadableFile(
            atPath: source
        )
        print("  Readable: \(isReadable ? "✓" : "✗")")
    }
}

// Check file format
let path = knowledgeSources[0]
let ext = (path as NSString).pathExtension
print("File extension: .\(ext)")
// Compare case-insensitively so ".PDF" is treated like ".pdf"
print("Supported: \(["pdf", "md", "mdx", "json", "txt"].contains(ext.lowercased()))")

Solutions:

// 1. Use absolute paths
let homeDir = FileManager.default.homeDirectoryForCurrentUser
let docPath = homeDir.appendingPathComponent("Documents/knowledge/doc.pdf")

knowledgeSources: [
    docPath.path  // Absolute path
]

// 2. Verify paths before creating agent
func validatePaths(_ paths: [String]) throws {
    for path in paths {
        guard FileManager.default.fileExists(atPath: path) else {
            throw KBError.fileNotFound(path)
        }
    }
}

try validatePaths(knowledgeSources)

// 3. Create directory if needed
let kbDir = "./knowledge"
try FileManager.default.createDirectory(
    atPath: kbDir,
    withIntermediateDirectories: true
)

// 4. Check file permissions
// Ensure files are readable (not protected)

Poor Retrieval Quality

Symptom: Irrelevant results or missing relevant information.

Causes:

Threshold too high or too low
Poor embedding model
Document quality issues
Query formulation problems
Insufficient knowledge coverage

Diagnosis:

// Test with known queries
let testCases = [
    ("product features", 5),
    ("return policy", 3),
    ("technical specifications", 5)
]

for (query, expectedMin) in testCases {
    let results = try await kb.query(
        query: query,
        limit: 10,
        threshold: 0.7
    )

    print("\nQuery: \(query)")
    print("Results: \(results.count) (expected ≥ \(expectedMin))")

    if results.isEmpty {
        print("⚠️  No results - check documents contain this info")
    }

    for (i, result) in results.prefix(3).enumerated() {
        print("\(i+1). Score: \(result.score) - \(result.metadata.filename)")
        print("   \(result.content.prefix(100))...")
    }
}

Solutions:

// 1. Lower similarity threshold
let config = KnowledgeConfiguration(
    similarityThreshold: 0.65  // Down from 0.75
)

// 2. Increase result limit
let results = try await kb.query(
    query: query,
    limit: 10  // Up from 5
)

// 3. Improve documents
// - Add more context
// - Use clear headings
// - Include synonyms and related terms

// 4. Better embedding model
let largeModelConfig = KnowledgeConfiguration(
    embeddingModel: "text-embedding-3-large"  // Higher quality
)

// 5. Add missing documents
await orbit.addKnowledgeSource("./additional-docs.pdf")

// 6. Test query variations
let queries = [
    "product features",
    "what are the features",
    "product capabilities",
    "features and benefits"
]

for query in queries {
    let results = try await kb.query(query: query, limit: 5)
    print("\(query): \(results.count) results")
}

Slow Knowledge Retrieval

Symptom: Knowledge queries take too long.

Causes:

Large knowledge base
Expensive embedding generation
No caching
Inefficient vector search
Network latency (remote embeddings)

Diagnosis:

// Measure query time
let start = Date()
let results = try await kb.query(
    query: "test query",
    limit: 5,
    threshold: 0.75
)
let duration = Date().timeIntervalSince(start)

print("Query took: \(duration)s")

if duration > 2.0 {
    print("⚠️  Slow query (> 2s)")
}

// Profile components
let embedStart = Date()
let embedding = try await generateEmbedding("test")
print("Embedding: \(Date().timeIntervalSince(embedStart))s")

let searchStart = Date()
let searchResults = try await vectorSearch(embedding)
print("Search: \(Date().timeIntervalSince(searchStart))s")

Solutions:

// 1. Implement caching
let cachedKB = CachedKnowledgeBase(underlying: kb)

// 2. Use faster embedding model
let config = KnowledgeConfiguration(
    embeddingModel: "text-embedding-3-small"  // Faster
)

// 3. Reduce knowledge base size
// Remove unused documents
// Split into specialized agents

// 4. Use vector database for large scale
let vectorDB = VectorDatabaseConfig(
    provider: "pinecone"  // Optimized for fast retrieval
)

// 5. Prefetch likely queries
Task.detached {
    let commonQueries = ["faq", "pricing", "features"]
    for query in commonQueries {
        _ = try await kb.query(query: query, limit: 5)
        // Warms cache
    }
}

// 6. Batch embeddings
let embeddings = try await generateEmbeddingsBatch(queries)
// Faster than individual calls

Knowledge Not Updating

Symptom: Updated documents are not reflected in retrieval.

Causes:

Cache not invalidated
Index not refreshed
Using old orbit instance
Documents not reprocessed

Diagnosis:

// Check document modification time
let path = "./knowledge/doc.pdf"
let attrs = try FileManager.default.attributesOfItem(atPath: path)
let modDate = attrs[.modificationDate] as? Date
let modDescription = modDate.map { "\($0)" } ?? "unknown"
print("Document modified: \(modDescription)")

// Check index update time
let indexDate = try await kb.getLastIndexUpdate()
print("Index updated: \(indexDate)")

if let mod = modDate, let idx = indexDate, mod > idx {
    print("⚠️  Document newer than index - needs reindex")
}

Solutions:

// 1. Recreate knowledge base
let newOrbit = try await Orbit.create(
    name: "Updated System",
    agents: agents,
    tasks: tasks,
    knowledgeSources: knowledgeSources  // Will reprocess
)

// 2. Clear cache
await kb.clearCache()

// 3. Force reindex
await kb.reindex()

// 4. Dynamic update
// Remove old source
await orbit.removeKnowledgeSource("./old-doc.pdf")
// Add updated source
await orbit.addKnowledgeSource("./updated-doc.pdf")

// 5. Implement auto-refresh
Task {
    while isRunning {
        try await Task.sleep(nanoseconds: 3600 * 1_000_000_000)
        await kb.reindexIfNeeded()  // Check for changes
    }
}

Out of Memory Errors

Symptom: The application crashes or reports memory errors when loading the knowledge base.

Causes:

Too many documents
Documents too large
All documents loaded at once
Embeddings cached in memory

Diagnosis:

// Check memory usage
let memoryUsage = getMemoryUsage()
print("Memory: \(memoryUsage) MB")

// Count documents
print("Knowledge sources: \(knowledgeSources.count)")

// Check file sizes
var totalSize: Int64 = 0
for source in knowledgeSources {
    let attrs = try FileManager.default.attributesOfItem(
        atPath: source
    )
    let size = attrs[.size] as? Int64 ?? 0
    totalSize += size
    print("\(source): \(size / 1024 / 1024) MB")
}
print("Total: \(totalSize / 1024 / 1024) MB")

Solutions:

// 1. Reduce knowledge base size
// Use only essential documents
let knowledgeSources = [
    "./essential-docs.pdf"  // Not entire library
]

// 2. Split into specialized agents
let agent1 = Agent(
    role: "Product Expert",
    knowledgeSources: ["./products/"]  // Subset
)

let agent2 = Agent(
    role: "Support Expert",
    knowledgeSources: ["./support/"]  // Different subset
)

// 3. Use lazy loading
let config = KnowledgeConfiguration(
    loadingStrategy: .lazy  // Load on demand
)

// 4. External vector database
// Don't load all in memory
let vectorDB = VectorDatabaseConfig(
    provider: "pinecone"  // External storage
)

// 5. Limit document size
// Split large PDFs into smaller documents

// 6. Clear embeddings cache periodically
await kb.clearEmbeddingsCache()

Debugging Knowledge Bases

Create debugging utilities for knowledge base inspection:

final class KnowledgeBaseDebugger {
    let knowledgeBase: KnowledgeBase

    init(knowledgeBase: KnowledgeBase) {
        self.knowledgeBase = knowledgeBase
    }

    func printStatistics() async throws {
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
        print("Knowledge Base Statistics")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")

        let stats = try await knowledgeBase.getStatistics()

        print("Documents: \(stats.documentCount)")
        print("Chunks: \(stats.chunkCount)")
        print("Total size: \(stats.totalSizeBytes / 1024 / 1024) MB")
        print("Avg chunks per doc: \(stats.chunkCount / max(stats.documentCount, 1))")
        print("Index size: \(stats.indexSizeBytes / 1024 / 1024) MB")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }

    func listDocuments() async throws {
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
        print("Knowledge Base Documents")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")

        let docs = try await knowledgeBase.listDocuments()

        for (index, doc) in docs.enumerated() {
            print("\n[\(index + 1)] \(doc.filename)")
            print("   Path: \(doc.path)")
            print("   Format: \(doc.format)")
            print("   Size: \(doc.sizeBytes / 1024) KB")
            print("   Chunks: \(doc.chunkCount)")
            print("   Indexed: \(doc.indexedDate)")
        }

        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }

    func testQuery(_ query: String) async throws {
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
        print("Testing Query: \"\(query)\"")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")

        // Test various thresholds
        for threshold in [0.6, 0.7, 0.75, 0.8, 0.9] {
            let results = try await knowledgeBase.query(
                query: query,
                limit: 10,
                threshold: threshold
            )

            print("\nThreshold \(threshold): \(results.count) results")

            for (i, result) in results.prefix(3).enumerated() {
                print("  \(i+1). Score: \(String(format: "%.3f", result.score))")
                print("     Source: \(result.metadata.filename)")
                print("     Content: \(result.content.prefix(80))...")
            }
        }

        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }

    func validateSources(_ sources: [String]) {
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
        print("Validating Knowledge Sources")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")

        var valid = 0
        var invalid = 0

        for source in sources {
            let exists = FileManager.default.fileExists(atPath: source)
            let readable = FileManager.default.isReadableFile(atPath: source)

            if exists && readable {
                print("✓ \(source)")
                valid += 1
            } else {
                print("✗ \(source)")
                if !exists {
                    print("  Error: File not found")
                } else if !readable {
                    print("  Error: Not readable")
                }
                invalid += 1
            }
        }

        print("\nSummary: \(valid) valid, \(invalid) invalid")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }
}

// Usage
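guard let knowledgeBase = orbit.knowledgeBase else {
    fatalError("Orbit has no knowledge base configured")
}
let debugger = KnowledgeBaseDebugger(knowledgeBase: knowledgeBase)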
try await debugger.printStatistics()
try await debugger.listDocuments()
try await debugger.testQuery("product features")
debugger.validateSources(knowledgeSources)

Advanced Configuration

Knowledge Base Configuration Object

For advanced use cases, configure knowledge base behavior with KnowledgeConfiguration:

let knowledgeConfig = KnowledgeConfiguration(
    // Chunking parameters
    chunkSize: 512,              // Characters per chunk
    chunkOverlap: 50,            // Overlap between chunks

    // Retrieval parameters
    retrievalLimit: 10,          // Max chunks to retrieve
    similarityThreshold: 0.75,   // Minimum similarity score

    // Embedding configuration
    embeddingModel: "text-embedding-ada-002",
    embeddingDimensions: 1536,

    // Processing options
    preprocessText: true,        // Clean text before embedding
    extractMetadata: true,       // Parse document metadata

    // Caching
    enableCache: true,
    cacheExpiry: 3600,          // Cache TTL in seconds

    // Vector storage
    vectorStore: .inMemory,     // or .persistent, .pinecone, etc.
    persistencePath: "./kb-vectors"
)

let agent = Agent(
    role: "Advanced Knowledge Agent",
    knowledgeSources: ["./docs/"],
    knowledgeConfig: knowledgeConfig
)

Chunking Strategies

Different chunking strategies for different document types:

Available strategies: Fixed-Size Chunking, Semantic Chunking, Structural Chunking, Sliding Window.

Fixed-Size Chunking

Best for: General documents, mixed content

let fixedConfig = KnowledgeConfiguration(
    chunkSize: 512,           // Fixed size
    chunkOverlap: 50,         // 10% overlap
    chunkingStrategy: .fixedSize
)

Pros:

Predictable chunk sizes
Simple implementation
Works for most documents

Cons:

May split mid-sentence
Doesn’t respect structure
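
To make the trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not OrbitAI's internal chunker; `fixedSizeChunks` is a hypothetical helper that counts characters, matching the "characters per chunk" comment in the configuration above.

```swift
/// Splits `text` into chunks of at most `size` characters. Each chunk starts
/// `size - overlap` characters after the previous one, so consecutive chunks
/// share `overlap` characters. (Hypothetical helper, not an OrbitAI API.)
func fixedSizeChunks(_ text: String, size: Int, overlap: Int) -> [String] {
    precondition(size > 0 && overlap >= 0 && overlap < size, "require 0 <= overlap < size")
    let characters = Array(text)
    guard !characters.isEmpty else { return [] }

    let step = size - overlap
    var chunks: [String] = []
    var start = 0
    while start < characters.count {
        let end = min(start + size, characters.count)
        chunks.append(String(characters[start..<end]))
        if end == characters.count { break }  // the last chunk reached the end
        start += step
    }
    return chunks
}

let demoChunks = fixedSizeChunks("abcdefghij", size: 4, overlap: 1)
// demoChunks == ["abcd", "defg", "ghij"]
```

Because the split points depend only on character offsets, a chunk can begin or end in the middle of a word or sentence, which is the first drawback listed above.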

Vector Store Options

Choose the appropriate vector storage backend:

In-Memory Store

Best for: Development, small knowledge bases

let config = KnowledgeConfiguration(
    vectorStore: .inMemory
)

Characteristics:

✅ Fast retrieval
✅ No setup required
✅ Simple debugging
❌ Lost on restart
❌ Memory limited
❌ Single instance only

Recommended: fewer than 1,000 documents

Persistent Store

Best for: Production, medium knowledge bases

let config = KnowledgeConfiguration(
    vectorStore: .persistent,
    persistencePath: "./kb-vectors"
)

Characteristics:

✅ Survives restarts
✅ Reasonable performance
✅ No external dependencies
❌ Slower than in-memory
❌ Limited scalability

Recommended: 1,000-10,000 documents

Pinecone

Best for: Large-scale production

let config = KnowledgeConfiguration(
    vectorStore: .pinecone,
    pineconeConfig: PineconeConfig(
        apiKey: "your-api-key",
        environment: "us-west1-gcp",
        indexName: "knowledge-base"
    )
)

Characteristics:

✅ Massive scalability
✅ Fast at any scale
✅ Managed service
❌ External dependency
❌ Additional cost

Recommended: 10,000+ documents

Custom Backend

Best for: Specialized requirements

final class CustomVectorStore: VectorStore {
    func store(
        vectors: [Vector],
        metadata: [Metadata]
    ) async throws {
        // Custom storage logic
    }

    func search(
        query: [Double],
        limit: Int
    ) async throws -> [VectorResult] {
        // Custom search logic
        return []  // Placeholder so the stub compiles
    }
}

let config = KnowledgeConfiguration(
    vectorStore: .custom(CustomVectorStore())
)

Use cases:

Integration with existing systems
Specialized search algorithms
Custom security requirements
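
As an illustration of what the search logic in a custom backend might do, the sketch below implements brute-force cosine-similarity search in memory. It is self-contained and does not conform to OrbitAI's `VectorStore` protocol: `StoredVector`, `SearchHit`, and `BruteForceVectorStore` are hypothetical types introduced only for this example.

```swift
// Hypothetical types for illustration; OrbitAI's Vector and VectorResult may differ.
struct StoredVector {
    let id: String
    let embedding: [Double]
}

struct SearchHit {
    let id: String
    let score: Double
}

/// Cosine similarity; returns 0 for mismatched lengths or zero-length vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

final class BruteForceVectorStore {
    private var vectors: [StoredVector] = []

    func store(_ newVectors: [StoredVector]) {
        vectors.append(contentsOf: newVectors)
    }

    /// Scores every stored vector against the query and returns the best `limit`.
    func search(query: [Double], limit: Int) -> [SearchHit] {
        let hits = vectors
            .map { SearchHit(id: $0.id, score: cosineSimilarity(query, $0.embedding)) }
            .sorted { $0.score > $1.score }
        return Array(hits.prefix(max(limit, 0)))
    }
}

let store = BruteForceVectorStore()
store.store([
    StoredVector(id: "pricing", embedding: [1, 0, 0]),
    StoredVector(id: "returns", embedding: [0, 1, 0]),
    StoredVector(id: "pricing-faq", embedding: [0.9, 0.1, 0])
])
let hits = store.search(query: [1, 0, 0], limit: 2)
```

A linear scan like this is fine for small collections. Specialized backends typically replace it with an approximate nearest-neighbor index.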

Metadata Extraction and Filtering

Extract and use metadata for enhanced retrieval:

// Configure metadata extraction
let config = KnowledgeConfiguration(
    extractMetadata: true,
    metadataFields: [
        .title,
        .author,
        .createdDate,
        .modifiedDate,
        .category,
        .tags
    ]
)

// Query with metadata filters
let results = try await knowledgeBase.query(
    query: "product specifications",
    filters: [
        .category("products"),
        .dateRange(from: startDate, to: endDate),
        .tags(["version-2.0", "approved"])
    ]
)

// Example: Multi-tenant knowledge base
let customerResults = try await knowledgeBase.query(
    query: "pricing information",
    filters: [
        .metadata("customer_id", equals: customerId),
        .metadata("access_level", greaterThan: 2)
    ]
)

Reranking Strategies

Improve retrieval quality with reranking:

let config = KnowledgeConfiguration(
    retrievalLimit: 20,          // Initial broad retrieval
    rerankingEnabled: true,
    rerankingStrategy: .crossEncoder,
    rerankingLimit: 5,           // Final results after reranking
    crossEncoderModel: "cross-encoder/ms-marco-MiniLM-L-12-v2"
)

Available strategies: No Reranking, Cross-Encoder, LLM Reranking, Hybrid.

No Reranking

Speed: Fastest
Quality: Good

let config = KnowledgeConfiguration(
    retrievalLimit: 5,
    rerankingEnabled: false
)

Results are ranked by vector similarity only.
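
The two-stage idea behind reranking (retrieve broadly, then re-score a short list) can be sketched as follows. This is an illustration under stated assumptions, not OrbitAI's implementation: `Candidate`, `rerank`, and `termOverlap` are hypothetical names, and the term-overlap scorer is only a stand-in for a real cross-encoder or LLM scorer.

```swift
// Hypothetical type for illustration.
struct Candidate {
    let content: String
    let vectorScore: Double
}

/// Re-scores the candidates with `scorer` and keeps the best `rerankingLimit`.
func rerank(
    query: String,
    candidates: [Candidate],
    rerankingLimit: Int,
    scorer: (String, String) -> Double
) -> [Candidate] {
    let rescored = candidates.map { (candidate: $0, score: scorer(query, $0.content)) }
    let ordered = rescored.sorted { lhs, rhs in
        // Ties on the new score fall back to the original vector score.
        lhs.score != rhs.score
            ? lhs.score > rhs.score
            : lhs.candidate.vectorScore > rhs.candidate.vectorScore
    }
    return ordered.prefix(max(rerankingLimit, 0)).map { $0.candidate }
}

/// Placeholder scorer: fraction of query terms that appear in the passage.
func termOverlap(_ query: String, _ passage: String) -> Double {
    let queryTerms = Set(query.lowercased().split(separator: " "))
    let passageTerms = Set(passage.lowercased().split(separator: " "))
    guard !queryTerms.isEmpty else { return 0 }
    return Double(queryTerms.intersection(passageTerms).count) / Double(queryTerms.count)
}

let candidates = [
    Candidate(content: "Shipping times vary by region", vectorScore: 0.82),
    Candidate(content: "The return policy allows refunds within 30 days", vectorScore: 0.80),
    Candidate(content: "Refunds are issued to the original payment method", vectorScore: 0.78)
]
let top = rerank(
    query: "return policy refunds",
    candidates: candidates,
    rerankingLimit: 2,
    scorer: termOverlap
)
```

In this example the passage with the highest vector score drops out of the final list because the second-stage scorer finds it less relevant to the query.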

Performance Optimization

Optimization Checklist

Optimize Chunk Size

Test different chunk sizes for your content:

// Test chunk sizes
let chunkSizes = [256, 512, 1024, 2048]

for size in chunkSizes {
    let config = KnowledgeConfiguration(chunkSize: size)
    let kb = try await KnowledgeBase(
        sources: testSources,
        config: config
    )

    // Measure retrieval quality
    let results = try await kb.query(query: testQuery)
    print("Chunk size \(size): \(results.count) results")
    // Evaluate quality manually
}

Guidelines:

Small (256-512): Precise retrieval, technical docs
Medium (512-1024): Balanced, general use
Large (1024-2048): Broader context, narratives
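
Chunk size also determines how many chunks, and therefore embeddings, a document produces. A rough estimate, assuming fixed-size chunking measured in characters, is sketched below; `estimatedChunkCount` is a hypothetical helper, not an OrbitAI API.

```swift
/// Estimates the number of chunks produced by fixed-size chunking with overlap.
/// Each new chunk advances by `chunkSize - chunkOverlap` characters.
func estimatedChunkCount(documentLength: Int, chunkSize: Int, chunkOverlap: Int) -> Int {
    precondition(chunkSize > 0 && chunkOverlap >= 0 && chunkOverlap < chunkSize)
    guard documentLength > 0 else { return 0 }
    guard documentLength > chunkSize else { return 1 }
    let step = chunkSize - chunkOverlap
    // Integer form of ceil((documentLength - chunkSize) / step) + 1
    return (documentLength - chunkSize + step - 1) / step + 1
}

for size in [256, 512, 1024, 2048] {
    let count = estimatedChunkCount(
        documentLength: 100_000,
        chunkSize: size,
        chunkOverlap: size / 10
    )
    print("chunkSize \(size): about \(count) chunks")
}
```

Halving the chunk size roughly doubles the number of embeddings to generate and store, which is worth weighing against the gain in retrieval precision.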

Tune Retrieval Parameters

Optimize retrieval for your use case:

let config = KnowledgeConfiguration(
    retrievalLimit: 5,           // Start small
    similarityThreshold: 0.75,   // Adjust based on quality
    rerankingEnabled: true       // Enable for better results
)

Performance tips:

Lower retrievalLimit = faster, may miss information
Higher similarityThreshold = fewer but better results
Enable reranking for quality, disable for speed
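
The interaction between the two parameters can be seen in a small sketch. It assumes the threshold is applied as an inclusive minimum score before the limit is applied. `ScoredChunk` and `selectResults` are hypothetical names for illustration only.

```swift
// Hypothetical type for illustration.
struct ScoredChunk {
    let content: String
    let score: Double
}

/// Keeps chunks scoring at least `similarityThreshold`, best first,
/// and returns at most `retrievalLimit` of them.
func selectResults(
    from scored: [ScoredChunk],
    similarityThreshold: Double,
    retrievalLimit: Int
) -> [ScoredChunk] {
    let passing = scored.filter { $0.score >= similarityThreshold }
    let ranked = passing.sorted { $0.score > $1.score }
    return Array(ranked.prefix(max(retrievalLimit, 0)))
}

let scored = [
    ScoredChunk(content: "A", score: 0.91),
    ScoredChunk(content: "B", score: 0.78),
    ScoredChunk(content: "C", score: 0.74),
    ScoredChunk(content: "D", score: 0.62)
]

// A stricter threshold returns fewer chunks even with a generous limit.
let strict = selectResults(from: scored, similarityThreshold: 0.75, retrievalLimit: 5)
// A looser threshold lets the limit become the binding constraint.
let loose = selectResults(from: scored, similarityThreshold: 0.60, retrievalLimit: 3)
```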

Implement Caching

Cache frequently accessed queries:

let config = KnowledgeConfiguration(
    enableCache: true,
    cacheExpiry: 3600,         // 1 hour
    cacheStrategy: .lru,       // Least Recently Used
    maxCacheSize: 1000         // Max cached queries
)

Cache strategies:

LRU: Good for varied queries
LFU: Good for repeated queries
TTL: Good for time-sensitive data
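
For readers unfamiliar with LRU eviction, here is a minimal sketch of the policy. It is not OrbitAI's cache implementation: `LRUCache` is a hypothetical type, and it trades efficiency for brevity (each access scans the recency list).

```swift
/// A small least-recently-used cache. Lookups and inserts are O(n) in the
/// number of entries because of the linear scan over `order`.
struct LRUCache<Key: Hashable, Value> {
    private let capacity: Int
    private var storage: [Key: Value] = [:]
    private var order: [Key] = []  // least recently used first

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    mutating func value(forKey key: Key) -> Value? {
        guard let value = storage[key] else { return nil }
        touch(key)
        return value
    }

    mutating func insert(_ value: Value, forKey key: Key) {
        if storage[key] == nil, storage.count >= capacity, let oldest = order.first {
            // Evict the least recently used entry to make room.
            storage[oldest] = nil
            order.removeFirst()
        }
        storage[key] = value
        touch(key)
    }

    /// Marks `key` as the most recently used entry.
    private mutating func touch(_ key: Key) {
        if let index = order.firstIndex(of: key) {
            order.remove(at: index)
        }
        order.append(key)
    }
}

var cache = LRUCache<String, [String]>(capacity: 2)
cache.insert(["chunk-1"], forKey: "pricing")
cache.insert(["chunk-2"], forKey: "returns")
_ = cache.value(forKey: "pricing")             // "pricing" is now the most recent
cache.insert(["chunk-3"], forKey: "warranty")  // evicts "returns"
```

An LFU cache would instead track an access count per key and evict the lowest count, and a TTL cache would store an expiry time with each entry.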

Choose Efficient Embedding Model

Balance quality vs. performance:

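// Fast and low-cost: a reasonable default for development and many production workloads
embeddingModel: "text-embedding-3-small"

// Previous generation: mainly useful when an existing index was built with it
embeddingModel: "text-embedding-ada-002"

// Highest quality: larger vectors and higher cost
embeddingModel: "text-embedding-3-large"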

Batch Processing

Process documents in batches:

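let sources: [String] = []  // ... large list of sources
let batchSize = 10

// chunked(into:) is not part of the Swift standard library;
// it requires an Array extension in your project
for batch in sources.chunked(into: batchSize) {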
    try await knowledgeBase.addSourcesBatch(batch)
    // Process in manageable chunks
}
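
The loop above relies on a `chunked(into:)` method that the Swift standard library does not provide. One possible definition, offered as a sketch rather than as part of OrbitAI, is:

```swift
extension Array {
    /// Splits the array into consecutive slices of at most `size` elements.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "size must be positive")
        return stride(from: 0, to: count, by: size).map { start in
            // Swift.min avoids resolving to Array's own min() method.
            Array(self[start..<Swift.min(start + size, count)])
        }
    }
}

let paths = (1...7).map { "./docs/file-\($0).md" }  // Hypothetical file names
let batches = paths.chunked(into: 3)
// batches has three elements with 3, 3, and 1 paths respectively
```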

Lazy Loading

Load documents on-demand:

let config = KnowledgeConfiguration(
    loadingStrategy: .lazy,    // Load when needed
    preloadPriority: [         // Preload critical docs
        "./critical-docs.pdf"
    ]
)

Performance Benchmarking

Benchmark your knowledge base setup:

import Foundation

final class KnowledgeBaseBenchmark {
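    let knowledgeBase: KnowledgeBase

    // Classes get no memberwise initializer, so declare one explicitly
    init(knowledgeBase: KnowledgeBase) {
        self.knowledgeBase = knowledgeBase
    }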

    func runBenchmark() async throws -> BenchmarkResults {
        var results = BenchmarkResults()

        // Test queries
        let testQueries = [
            "product features",
            "pricing information",
            "technical specifications",
            "return policy",
            "customer support"
        ]

        // Warm up
        for query in testQueries {
            _ = try await knowledgeBase.query(query: query, limit: 5)
        }

        // Benchmark retrieval speed
        var retrievalTimes: [TimeInterval] = []

        for query in testQueries {
            let start = Date()
            let queryResults = try await knowledgeBase.query(
                query: query,
                limit: 5,
                threshold: 0.75
            )
            let duration = Date().timeIntervalSince(start)

            retrievalTimes.append(duration)
            results.queryResults[query] = queryResults.count
        }

        results.avgRetrievalTime = retrievalTimes.reduce(0, +) / Double(retrievalTimes.count)
        results.minRetrievalTime = retrievalTimes.min() ?? 0
        results.maxRetrievalTime = retrievalTimes.max() ?? 0

        // Benchmark embedding generation
        let embeddingStart = Date()
        _ = try await knowledgeBase.generateEmbedding(for: "test query")
        results.embeddingTime = Date().timeIntervalSince(embeddingStart)

        // Memory usage
        results.memoryUsage = getMemoryUsage()

        // Cache hit rate
        results.cacheHitRate = knowledgeBase.getCacheHitRate()

        return results
    }

    func printResults(_ results: BenchmarkResults) {
        print("""
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        Knowledge Base Benchmark Results
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

        Retrieval Performance:
          Average: \(String(format: "%.3f", results.avgRetrievalTime))s
          Min: \(String(format: "%.3f", results.minRetrievalTime))s
          Max: \(String(format: "%.3f", results.maxRetrievalTime))s

        Embedding Generation: \(String(format: "%.3f", results.embeddingTime))s
        Memory Usage: \(results.memoryUsage) MB
        Cache Hit Rate: \(String(format: "%.1f", results.cacheHitRate * 100))%

        Query Results:
        """)

        for (query, count) in results.queryResults {
            print("  \(query): \(count) results")
        }

        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }

    private func getMemoryUsage() -> Int {
        var info = mach_task_basic_info()
        var count = mach_msg_type_number_t(MemoryLayout<mach_task_basic_info>.size)/4

        let kerr: kern_return_t = withUnsafeMutablePointer(to: &info) {
            $0.withMemoryRebound(to: integer_t.self, capacity: 1) {
                task_info(mach_task_self_, task_flavor_t(MACH_TASK_BASIC_INFO), $0, &count)
            }
        }

        guard kerr == KERN_SUCCESS else { return 0 }
        return Int(info.resident_size) / 1024 / 1024
    }
}

struct BenchmarkResults {
    var avgRetrievalTime: TimeInterval = 0
    var minRetrievalTime: TimeInterval = 0
    var maxRetrievalTime: TimeInterval = 0
    var embeddingTime: TimeInterval = 0
    var memoryUsage: Int = 0
    var cacheHitRate: Double = 0
    var queryResults: [String: Int] = [:]
}

// Usage
let benchmark = KnowledgeBaseBenchmark(knowledgeBase: kb)
let results = try await benchmark.runBenchmark()
benchmark.printResults(results)
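
Averages can hide slow outliers, so it is often useful to report percentiles alongside them. The sketch below uses the nearest-rank method; `percentile` is a hypothetical helper and the latency values are made-up sample data, not measurements.

```swift
/// Nearest-rank percentile: the smallest sample such that at least a
/// fraction `p` (0...1) of all samples are less than or equal to it.
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, (0...1).contains(p) else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Illustrative retrieval times in seconds (made-up values).
let latencies = [0.120, 0.095, 0.110, 0.480, 0.105]
let p50 = percentile(latencies, 0.50)
let p95 = percentile(latencies, 0.95)
```

With these samples the median is 0.110 s while the 95th percentile is 0.480 s, a gap that the average alone would not make obvious.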

Security and Privacy

Sensitive Data Handling

Implement safeguards for sensitive information:

final class SecureKnowledgeBase: KnowledgeBase {
    // Declared as var so setCustomPattern(name:pattern:) can add entries
    private var sensitivePatterns: [String: NSRegularExpression] = [
        "ssn": try! NSRegularExpression(pattern: #"\d{3}-\d{2}-\d{4}"#),
        "credit_card": try! NSRegularExpression(pattern: #"\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}"#),
        "email": try! NSRegularExpression(pattern: #"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"#, options: .caseInsensitive),
        "phone": try! NSRegularExpression(pattern: #"\+?\d{10,}"#)
    ]

    override func processDocument(_ content: String) async throws -> ProcessedDocument {
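        // Redact sensitive information.
        // Note: dictionary iteration order is unspecified, so overlapping patterns
        // (for example phone and credit card) may be applied in any order.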
        var sanitized = content

        for (type, pattern) in sensitivePatterns {
            let range = NSRange(sanitized.startIndex..., in: sanitized)
            sanitized = pattern.stringByReplacingMatches(
                in: sanitized,
                range: range,
                withTemplate: "[\(type.uppercased())]"
            )
        }

        return try await super.processDocument(sanitized)
    }

    func setCustomPattern(name: String, pattern: String) throws {
        let regex = try NSRegularExpression(pattern: pattern)
        sensitivePatterns[name] = regex
    }
}

// Usage
let secureKB = SecureKnowledgeBase()
try secureKB.setCustomPattern(
    name: "employee_id",
    pattern: #"EMP-\d{6}"#
)
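
Because some patterns overlap (a 16-digit card number also matches the phone pattern), the order in which patterns run affects the labels in the output. The standalone sketch below applies rules in a fixed order, most specific first. `RedactionRule`, `redactionRules`, and `redact` are hypothetical names, and the word-boundary anchors are an addition not present in the patterns above.

```swift
import Foundation

// Hypothetical rule type for illustration.
struct RedactionRule {
    let label: String
    let pattern: String
}

/// Rules are applied in array order, so more specific patterns come first.
let redactionRules = [
    RedactionRule(label: "SSN", pattern: #"\b\d{3}-\d{2}-\d{4}\b"#),
    RedactionRule(label: "CREDIT_CARD", pattern: #"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"#),
    RedactionRule(label: "PHONE", pattern: #"\+?\d{10,}"#)
]

func redact(_ text: String) throws -> String {
    var sanitized = text
    for rule in redactionRules {
        let regex = try NSRegularExpression(pattern: rule.pattern)
        // Recompute the range each pass because the string length changes.
        let range = NSRange(sanitized.startIndex..., in: sanitized)
        sanitized = regex.stringByReplacingMatches(
            in: sanitized,
            options: [],
            range: range,
            withTemplate: "[\(rule.label)]"
        )
    }
    return sanitized
}

let sample = "SSN 123-45-6789, card 4111111111111111, phone +15551234567"
let redacted = try redact(sample)
// redacted == "SSN [SSN], card [CREDIT_CARD], phone [PHONE]"
```

Regex-based redaction is a best-effort safeguard. It will miss formats the patterns do not anticipate, so it should complement, not replace, controls on which documents are ingested.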

Access Control

Implement role-based access to knowledge:

final class AccessControlledKnowledgeBase: KnowledgeBase {
    private var documentPermissions: [String: Set<String>] = [:]

    func setDocumentPermissions(
        documentPath: String,
        roles: Set<String>
    ) {
        documentPermissions[documentPath] = roles
    }

    // Not an override: the extra userRole parameter makes this a new overload
    func query(
        query: String,
        userRole: String,
        limit: Int,
        threshold: Double? = nil
    ) async throws -> [KnowledgeResult] {
        // Get all results
        let allResults = try await super.query(
            query: query,
            limit: limit * 2,  // Get more to filter
            threshold: threshold
        )

        // Filter by permissions
        let filtered = allResults.filter { result in
            guard let requiredRoles = documentPermissions[result.sourcePath] else {
                return true  // No restrictions
            }
            return requiredRoles.contains(userRole)
        }

        return Array(filtered.prefix(limit))
    }
}

// Usage
let acKB = AccessControlledKnowledgeBase()

// Set permissions
acKB.setDocumentPermissions(
    documentPath: "./confidential/executive-plan.pdf",
    roles: ["executive", "board"]
)

acKB.setDocumentPermissions(
    documentPath: "./public/user-guide.pdf",
    roles: ["employee", "customer", "public"]
)

// Query with role
let results = try await acKB.query(
    query: "company strategy",
    userRole: currentUser.role,
    limit: 5
)
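
Note that the filter above treats documents without a permission entry as unrestricted. Where a default-deny policy is preferable, the check might be factored out as in the following sketch. `UnlistedDocumentPolicy` and `canRead` are hypothetical names, not OrbitAI APIs.

```swift
/// What to do with documents that have no entry in the permission table.
enum UnlistedDocumentPolicy {
    case allow
    case deny
}

/// Returns true when `userRole` may read the document at `sourcePath`.
func canRead(
    sourcePath: String,
    userRole: String,
    permissions: [String: Set<String>],
    unlisted: UnlistedDocumentPolicy = .deny
) -> Bool {
    guard let allowedRoles = permissions[sourcePath] else {
        return unlisted == .allow
    }
    return allowedRoles.contains(userRole)
}

let permissions: [String: Set<String>] = [
    "./confidential/executive-plan.pdf": ["executive", "board"],
    "./public/user-guide.pdf": ["employee", "customer", "public"]
]

let executiveCanReadPlan = canRead(
    sourcePath: "./confidential/executive-plan.pdf",
    userRole: "executive",
    permissions: permissions
)
let customerCanReadPlan = canRead(
    sourcePath: "./confidential/executive-plan.pdf",
    userRole: "customer",
    permissions: permissions
)
// "./drafts/notes.md" has no entry, so the default policy denies access.
let unlistedIsReadable = canRead(
    sourcePath: "./drafts/notes.md",
    userRole: "employee",
    permissions: permissions
)
```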

Encryption

Encrypt knowledge base storage:

import CryptoKit

final class EncryptedKnowledgeBase: KnowledgeBase {
    private let encryptionKey: SymmetricKey

    init(encryptionKey: SymmetricKey) {
        self.encryptionKey = encryptionKey
        super.init()
    }

    override func persistVectors(
        vectors: [Vector],
        path: String
    ) async throws {
        // Serialize vectors
        let data = try JSONEncoder().encode(vectors)

        // Encrypt
        let sealedBox = try AES.GCM.seal(
            data,
            using: encryptionKey
        )

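        // Write encrypted data (nonce + ciphertext + tag); fail loudly if unavailable
        guard let combined = sealedBox.combined else {
            throw CocoaError(.fileWriteUnknown)
        }
        try combined.write(to: URL(fileURLWithPath: path), options: .atomic)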
    }

    override func loadVectors(from path: String) async throws -> [Vector] {
        // Read encrypted data
        let encryptedData = try Data(contentsOf: URL(fileURLWithPath: path))

        // Decrypt
        let sealedBox = try AES.GCM.SealedBox(combined: encryptedData)
        let decryptedData = try AES.GCM.open(sealedBox, using: encryptionKey)

        // Deserialize
        return try JSONDecoder().decode([Vector].self, from: decryptedData)
    }
}

// Usage
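// A freshly generated key exists only in memory. Store it securely (for example
// in the Keychain) and reload it on launch, or previously persisted vectors
// cannot be decrypted.
let encryptionKey = SymmetricKey(size: .bits256)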
let encryptedKB = EncryptedKnowledgeBase(encryptionKey: encryptionKey)

Audit Logging

Track knowledge base access:

final class AuditedKnowledgeBase: KnowledgeBase {
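    private let auditLogger: AuditLogger

    init(auditLogger: AuditLogger) {
        self.auditLogger = auditLogger
        super.init()
    }

    // Not an override: the extra userId parameter makes this a new overload
    func query(
        query: String,
        userId: String,
        limit: Int,
        threshold: Double? = nil
    ) async throws -> [KnowledgeResult] {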
        // Log query
        await auditLogger.log(
            event: .knowledgeQuery,
            userId: userId,
            details: [
                "query": query,
                "limit": "\(limit)",
                // Log "default" rather than guessing the configured threshold
                "threshold": threshold.map { String($0) } ?? "default"
            ]
        )

        let results = try await super.query(
            query: query,
            limit: limit,
            threshold: threshold
        )

        // Log results
        await auditLogger.log(
            event: .knowledgeResults,
            userId: userId,
            details: [
                "query": query,
                "resultCount": "\(results.count)",
                "sources": results.map { $0.sourcePath }.joined(separator: ", ")
            ]
        )

        return results
    }
}

// Audit Logger
actor AuditLogger {
    private var logs: [AuditLog] = []

    func log(event: AuditEvent, userId: String, details: [String: String]) {
        let log = AuditLog(
            timestamp: Date(),
            event: event,
            userId: userId,
            details: details
        )
        logs.append(log)

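        // Persist to secure storage; report failures instead of dropping them silently
        Task {
            do {
                try await persistLog(log)
            } catch {
                print("Audit log persistence failed: \(error)")
            }
        }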
    }

    private func persistLog(_ log: AuditLog) async throws {
        // Write to secure audit log
    }
}

struct AuditLog {
    let timestamp: Date
    let event: AuditEvent
    let userId: String
    let details: [String: String]
}

enum AuditEvent {
    case knowledgeQuery
    case knowledgeResults
    case documentAdded
    case documentRemoved
}

Real-World Examples

Example 1: Customer Support Bot

Complete implementation of a knowledge-powered support system:

import OrbitAI

// Configure knowledge base
let supportKB = KnowledgeConfiguration(
    chunkSize: 512,
    retrievalLimit: 5,
    similarityThreshold: 0.75,
    rerankingEnabled: true,
    enableCache: true
)

// Create support agent
let supportAgent = Agent(
    role: "Customer Support Specialist",
    purpose: """
    Provide accurate customer support using the knowledge base.
    Always cite sources and provide helpful, friendly responses.
    """,
    context: """
    Expert support agent with access to:
    - Product documentation
    - Troubleshooting guides
    - FAQ database
    - Return policies

    Guidelines:
    - Always check knowledge base before responding
    - Provide specific references to documentation
    - Escalate if information not in knowledge base
    - Be friendly and empathetic
    """,
    knowledgeSources: [
        "./kb/products/user-manual.pdf",
        "./kb/support/troubleshooting.md",
        "./kb/support/faq.txt",
        "./kb/policies/returns.pdf",
        "./kb/policies/warranty.pdf"
    ],
    knowledgeConfig: supportKB,
    memory: true,  // Remember conversation
    tools: [
        "create_support_ticket",
        "check_order_status",
        "send_email"
    ]
)

// Create tasks
let analyzeQuery = ORTask(
    description: """
    Analyze the customer's question and determine:
    1. What information they need
    2. Which knowledge sources are relevant
    3. If tools are needed (order lookup, ticket creation)
    """,
    expectedOutput: "Analysis of customer needs"
)

let provideAnswer = ORTask(
    description: """
    Using the knowledge base and any tool results:
    1. Answer the customer's question accurately
    2. Cite specific documentation sources
    3. Provide step-by-step instructions if needed
    4. Offer additional relevant information
    """,
    expectedOutput: "Complete answer with citations"
)

let followUp = ORTask(
    description: """
    Based on the answer provided:
    1. Check if question was fully answered
    2. Suggest related resources
    3. Ask if customer needs further assistance
    4. Create support ticket if needed
    """,
    expectedOutput: "Follow-up message"
)

// Create orbit
let supportOrbit = try await Orbit.create(
    name: "Customer Support System",
    agents: [supportAgent],
    tasks: [analyzeQuery, provideAnswer, followUp],
    process: .sequential,
    verbose: true
)

// Handle customer inquiry
func handleCustomerInquiry(_ inquiry: String) async throws -> String {
    let result = try await supportOrbit.run(
        inputs: [
            "customer_inquiry": inquiry,
            "timestamp": Date().description
        ]
    )

    return result.output
}

// Example usage
let response = try await handleCustomerInquiry(
    "How do I reset my password? I've tried the forgot password link but didn't receive an email."
)

print(response)
// Output includes:
// - Steps from user manual
// - Troubleshooting tips from knowledge base
// - Offer to check email settings
// - Create support ticket if issue persists

Example 2: Legal Document Analysis

RAG-powered legal research assistant:

// Legal knowledge configuration
let legalKBConfig = KnowledgeConfiguration(
    chunkSize: 1024,           // Larger chunks for legal text
    chunkOverlap: 200,         // High overlap for context
    retrievalLimit: 10,        // More results for comprehensive analysis
    similarityThreshold: 0.80, // Higher precision
    rerankingEnabled: true,
    rerankingStrategy: .crossEncoder,
    extractMetadata: true,
    metadataFields: [.title, .createdDate, .category, .tags]
)

// Legal research agent
let legalAgent = Agent(
    role: "Legal Research Assistant",
    purpose: "Analyze legal documents and provide research summaries",
    context: """
    Expert legal researcher with access to:
    - Contract templates and precedents
    - Case law summaries
    - Regulatory documents
    - Legal opinions

    Guidelines:
    - Provide accurate citations
    - Note jurisdictional differences
    - Identify relevant precedents
    - Flag potential issues
    - Maintain confidentiality
    """,
    knowledgeSources: [
        "./legal/contracts/vendor-agreements/",
        "./legal/contracts/employment/",
        "./legal/cases/precedents/",
        "./legal/regulations/compliance/",
        "./legal/opinions/internal/"
    ],
    knowledgeConfig: legalKBConfig,
    longTermMemory: true,      // Track research history
    entityMemory: true         // Track cases, statutes, parties
)

// Research workflow
let researchTask = ORTask(
    description: """
    Research the legal question using the knowledge base:
    1. Identify relevant legal documents
    2. Extract key clauses and precedents
    3. Note jurisdictional considerations
    4. Summarize findings with citations
    """,
    expectedOutput: "Legal research memo with citations"
)

let analysisTask = ORTask(
    description: """
    Analyze the research findings:
    1. Identify potential legal issues
    2. Compare with precedents
    3. Note compliance requirements
    4. Recommend next steps
    """,
    expectedOutput: "Legal analysis with recommendations"
)

// Create orbit (metadata filters are supplied per request below)
let legalOrbit = try await Orbit.create(
    name: "Legal Research System",
    agents: [legalAgent],
    tasks: [researchTask, analysisTask],
    process: .sequential
)

// Perform research with filters
func performLegalResearch(
    question: String,
    jurisdiction: String,
    dateRange: (Date, Date)?
) async throws -> String {
    var filters: [MetadataFilter] = [
        .metadata("jurisdiction", equals: jurisdiction)
    ]

    if let (start, end) = dateRange {
        filters.append(.dateRange(from: start, to: end))
    }

    let result = try await legalOrbit.run(
        inputs: [
            "question": question,
            "jurisdiction": jurisdiction,
            "filters": filters.description
        ]
    )

    return result.output
}

Example 3: Medical Information System

A medical knowledge system configured with HIPAA requirements in mind:

// Medical knowledge configuration
let medicalKBConfig = KnowledgeConfiguration(
    chunkSize: 768,
    retrievalLimit: 5,
    similarityThreshold: 0.85,  // High precision for medical info
    rerankingEnabled: true,
    enableCache: false,         // Don't cache sensitive data
    extractMetadata: true
)

// Secure medical knowledge base
let secureMedicalKB = SecureKnowledgeBase()

// Medical information agent
let medicalAgent = Agent(
    role: "Medical Information Specialist",
    purpose: "Provide evidence-based medical information",
    context: """
    Medical information specialist with access to:
    - Clinical guidelines
    - Treatment protocols
    - Drug formulary
    - Patient education materials

    IMPORTANT:
    - Only provide information from approved sources
    - Include proper disclaimers
    - Never diagnose or prescribe
    - Refer to healthcare providers when appropriate
    - Maintain HIPAA compliance
    """,
    knowledgeSources: [
        "./medical/guidelines/treatment-protocols.pdf",
        "./medical/guidelines/diagnostic-criteria.pdf",
        "./medical/drugs/formulary.json",
        "./medical/procedures/standard-procedures.md",
        "./medical/education/patient-guides.pdf"
    ],
    knowledgeConfig: medicalKBConfig,
    memory: false,              // No persistent memory (privacy)
    tools: [
        "check_drug_interactions",
        "search_medical_literature"
    ]
)

// Add medical disclaimer
let disclaimerTask = ORTask(
    description: """
    Add appropriate medical disclaimers:
    - Information is for educational purposes only
    - Not a substitute for professional medical advice
    - Consult healthcare provider for medical decisions
    """,
    expectedOutput: "Response with disclaimer"
)

let medicalOrbit = try await Orbit.create(
    name: "Medical Information System",
    agents: [medicalAgent],
    tasks: [disclaimerTask],
    process: .sequential,
    memory: false  // Do not retain data between runs (privacy)
)

Medical Applications: Healthcare applications must comply with regulations (HIPAA, GDPR, etc.). This example is for educational purposes. Consult legal and compliance experts before deploying medical AI systems.

Migration and Maintenance

Migrating Existing Knowledge

Migrate from other RAG systems to OrbitAI:

// Migration from LangChain
func migrateLangChainKnowledgeBase(
    langchainVectorStore: String
) async throws {
    // Load LangChain vectors
    let vectors = try await loadLangChainVectors(from: langchainVectorStore)

    // Convert to OrbitAI format
    let orbitVectors = vectors.map { lc in
        Vector(
            id: lc.id,
            embedding: lc.embedding,
            metadata: Metadata(from: lc.metadata),
            content: lc.pageContent
        )
    }

    // Create OrbitAI knowledge base
    let kb = try await KnowledgeBase()
    try await kb.importVectors(orbitVectors)

    print("Migrated \(orbitVectors.count) vectors")
}

// Migration from custom system
func migrateCustomKnowledgeBase(
    documentsPath: String
) async throws {
    // List all documents, skipping hidden files such as .DS_Store
    let fileManager = FileManager.default
    let documents = try fileManager.contentsOfDirectory(atPath: documentsPath)
        .filter { !$0.hasPrefix(".") }

    // Create knowledge sources list
    let sources = documents.map { doc in
        "\(documentsPath)/\(doc)"
    }

    // Create new knowledge base
    let kb = try await KnowledgeBase(sources: sources)

    print("Migrated \(documents.count) documents")
}

Knowledge Base Versioning

Version your knowledge base for rollback capability:

final class VersionedKnowledgeBase {
    private var versions: [String: KnowledgeBase] = [:]
    private var currentVersion: String

    init(initialVersion: String = "v1.0") {
        self.currentVersion = initialVersion
    }

    func createVersion(
        version: String,
        sources: [String],
        config: KnowledgeConfiguration
    ) async throws {
        let kb = try await KnowledgeBase(
            sources: sources,
            config: config
        )

        // Persist version metadata first so a failed write
        // does not leave an unrecorded version registered
        try await saveVersionMetadata(version: version, sources: sources)

        versions[version] = kb
    }

    func switchVersion(_ version: String) throws {
        guard versions[version] != nil else {
            throw KBError.versionNotFound(version)
        }

        currentVersion = version
        print("Switched to version \(version)")
    }

    func getCurrentKB() -> KnowledgeBase? {
        return versions[currentVersion]
    }

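    func listVersions() -> [String] {
        // Numeric comparison keeps "v2.0" ahead of "v10.0"
        return versions.keys.sorted {
            $0.compare($1, options: .numeric) == .orderedAscending
        }
    }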

    private func saveVersionMetadata(
        version: String,
        sources: [String]
    ) async throws {
        let metadata = VersionMetadata(
            version: version,
            createdAt: Date(),
            sources: sources
        )

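        // Persist to disk, creating the versions directory if it is missing
        let directory = URL(fileURLWithPath: "./kb-versions", isDirectory: true)
        try FileManager.default.createDirectory(
            at: directory,
            withIntermediateDirectories: true
        )
        let data = try JSONEncoder().encode(metadata)
        try data.write(to: directory.appendingPathComponent("\(version).json"))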
    }
}

struct VersionMetadata: Codable {
    let version: String
    let createdAt: Date
    let sources: [String]
}

// Usage
let versionedKB = VersionedKnowledgeBase()

// Create v1.0
try await versionedKB.createVersion(
    version: "v1.0",
    sources: ["./docs/v1/"],
    config: config
)

// Create v2.0 with updated docs
try await versionedKB.createVersion(
    version: "v2.0",
    sources: ["./docs/v2/"],
    config: config
)

// Rollback if needed
try versionedKB.switchVersion("v1.0")
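
The class above keeps knowledge bases in memory only, so after a restart the persisted metadata is what tells you which sources each version was built from. The following sketch shows one way to write and read that metadata with Foundation's JSON coders. It repeats the `VersionMetadata` struct so it stands alone; the function names `saveMetadata` and `loadMetadata` and the ISO 8601 date strategy are assumptions chosen for illustration.

```swift
import Foundation

struct VersionMetadata: Codable, Equatable {
    let version: String
    let createdAt: Date
    let sources: [String]
}

/// Hypothetical helper: writes metadata to <directory>/<version>.json.
func saveMetadata(_ metadata: VersionMetadata, in directory: URL) throws -> URL {
    try FileManager.default.createDirectory(
        at: directory,
        withIntermediateDirectories: true
    )
    let encoder = JSONEncoder()
    encoder.dateEncodingStrategy = .iso8601  // human-readable timestamps
    let fileURL = directory.appendingPathComponent("\(metadata.version).json")
    try encoder.encode(metadata).write(to: fileURL, options: .atomic)
    return fileURL
}

/// Hypothetical helper: reads metadata back for a given version.
func loadMetadata(version: String, from directory: URL) throws -> VersionMetadata {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601  // must match the encoder
    let fileURL = directory.appendingPathComponent("\(version).json")
    return try decoder.decode(VersionMetadata.self, from: Data(contentsOf: fileURL))
}
```

With the loaded `sources`, a version can be rebuilt by calling `createVersion` again. Note that ISO 8601 encoding drops fractional seconds, which is normally acceptable for a creation timestamp.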

Maintenance Operations

Run regular maintenance to keep the index consistent with your source documents and retrieval fast:

final class KnowledgeBaseMaintenanceManager {
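    let knowledgeBase: KnowledgeBase

    // Classes do not get a memberwise initializer, so declare one explicitly
    init(knowledgeBase: KnowledgeBase) {
        self.knowledgeBase = knowledgeBase
    }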

    // Remove duplicate documents
    func deduplicateDocuments() async throws {
        let docs = try await knowledgeBase.listDocuments()
        var seen: Set<String> = []
        var duplicates: [String] = []

        for doc in docs {
            let hash = try await doc.contentHash()
            if seen.contains(hash) {
                duplicates.append(doc.path)
            } else {
                seen.insert(hash)
            }
        }

        // Remove duplicates
        for duplicate in duplicates {
            try await knowledgeBase.removeSource(duplicate)
            print("Removed duplicate: \(duplicate)")
        }

        print("Removed \(duplicates.count) duplicates")
    }

    // Reindex modified documents
    func reindexModifiedDocuments() async throws {
        let docs = try await knowledgeBase.listDocuments()
        var reindexed = 0

        for doc in docs {
            let fileModDate = try await doc.fileModificationDate()
            let indexDate = doc.indexedDate

            if fileModDate > indexDate {
                try await knowledgeBase.reindexDocument(doc.path)
                reindexed += 1
                print("Reindexed: \(doc.filename)")
            }
        }

        print("Reindexed \(reindexed) documents")
    }

    // Remove orphaned vectors
    func cleanupOrphanedVectors() async throws {
        let vectors = try await knowledgeBase.getAllVectors()
        let docs = try await knowledgeBase.listDocuments()
        let validPaths = Set(docs.map { $0.path })

        var removed = 0

        for vector in vectors {
            if !validPaths.contains(vector.sourcePath) {
                try await knowledgeBase.removeVector(vector.id)
                removed += 1
            }
        }

        print("Removed \(removed) orphaned vectors")
    }

    // Optimize vector index
    func optimizeIndex() async throws {
        print("Optimizing vector index...")
        try await knowledgeBase.optimizeIndex()
        print("Index optimization complete")
    }

    // Run full maintenance
    func runFullMaintenance() async throws {
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
        print("Knowledge Base Maintenance")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")

        try await deduplicateDocuments()
        try await reindexModifiedDocuments()
        try await cleanupOrphanedVectors()
        try await optimizeIndex()

        print("Maintenance complete")
        print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    }
}

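// Schedule regular maintenance
let maintenanceTask = Task {
    let maintenanceManager = KnowledgeBaseMaintenanceManager(
        knowledgeBase: kb
    )
    let oneWeek: UInt64 = 7 * 24 * 3600 * 1_000_000_000  // nanoseconds

    while !Task.isCancelled {
        do {
            // Run weekly; sleep throws CancellationError on cancel
            try await Task.sleep(nanoseconds: oneWeek)
            try await maintenanceManager.runFullMaintenance()
        } catch is CancellationError {
            break
        } catch {
            // Log and keep the schedule alive after a failed run
            print("Maintenance failed: \(error)")
        }
    }
}

// Call maintenanceTask.cancel() to stop the schedule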
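
The reindexing step above compares each document's file modification date with the date it was last indexed. As a minimal sketch of how that comparison could be made with Foundation alone, the hypothetical helper `needsReindex` below reads the modification date through `FileManager.attributesOfItem(atPath:)`. It is an illustration of the staleness check, not the implementation behind `doc.fileModificationDate()`.

```swift
import Foundation

/// Hypothetical helper: true when the file at `path` changed after `indexedDate`.
func needsReindex(path: String, indexedDate: Date) throws -> Bool {
    let attributes = try FileManager.default.attributesOfItem(atPath: path)
    guard let modified = attributes[.modificationDate] as? Date else {
        // No modification date available: reindex to be safe
        return true
    }
    return modified > indexedDate
}
```

The call throws if the file no longer exists, which is a useful signal that the document's vectors may be orphaned and should be cleaned up rather than reindexed.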

Next Steps

Memory Systems

Learn about memory systems for dynamic knowledge

Agent Configuration

Configure agents with knowledge bases

Tools

Combine knowledge with tools for action-oriented agents

Orbit Workflows

Orchestrate knowledge-powered agent workflows

Pro Tip: Start with a small, well-organized knowledge base (5-10 essential documents) and expand based on retrieval gaps. Monitor which queries return poor results and add targeted documents to fill those gaps.
