Lesson 6 of 6 · 11 min read

Making Company Knowledge Accessible

The theory is in place. Now for the practical side: how do you make your real company knowledge, spread across Confluence, SharePoint, emails, PDFs, and databases, accessible to an AI system?

Step 1: Inventory Knowledge Sources

Create an overview of all knowledge sources:

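Source          | Type         | Approx. volume     | Update frequency | Priority
----------------|--------------|--------------------|------------------|---------
Confluence      | Wiki         | ~2,000 pages       | Weekly           | High
SharePoint      | Files        | ~10,000 documents  | Monthly          | High
Email archives  | Unstructured | ~500,000 emails    | Daily            | Medium
Internal DB     | Structured   | ~50 tables         | Real-time        | High
Slack/Teams     | Chat         | ~1M messages       | Real-time        | Low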
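Keeping the inventory as data, not only as a table in a slide, lets the scheduler and pipeline read from it. Below is a minimal Python sketch; the field names and the rule "onboard High before Medium before Low" are illustrative assumptions, not part of any specific tool.

```python
from dataclasses import dataclass

# Hypothetical ranking: lower number = onboard earlier.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

@dataclass(frozen=True)
class KnowledgeSource:
    name: str
    kind: str              # e.g. "Wiki", "Files", "Chat"
    approx_volume: int     # count in the source's own unit (pages, docs, tables, ...)
    update_frequency: str  # "Real-time", "Daily", "Weekly", "Monthly"
    priority: str          # "High", "Medium", "Low"

INVENTORY = [
    KnowledgeSource("Confluence", "Wiki", 2_000, "Weekly", "High"),
    KnowledgeSource("SharePoint", "Files", 10_000, "Monthly", "High"),
    KnowledgeSource("Email archives", "Unstructured", 500_000, "Daily", "Medium"),
    KnowledgeSource("Internal DB", "Structured", 50, "Real-time", "High"),
    KnowledgeSource("Slack/Teams", "Chat", 1_000_000, "Real-time", "Low"),
]

def rollout_order(sources):
    """Order sources by priority; sorted() is stable, so ties keep inventory order."""
    return sorted(sources, key=lambda s: PRIORITY_RANK[s.priority])
```

Volumes are not used for ordering because they are counted in different units (pages vs. tables vs. messages) and are not directly comparable.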

Step 2: Build Connectors

Confluence

  • Atlassian REST API for page content and metadata
  • CQL (Confluence Query Language) for targeted extraction
  • Webhooks for incremental updates
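A minimal sketch of an incremental Confluence pull, assuming the v1 content search endpoint (`/rest/api/content/search`) with a CQL query and `expand=body.storage,version`. Check which API version your instance exposes; the base URL, space key, and helper names here are hypothetical. A webhook handler could call the same function instead of a fixed schedule.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

# Hypothetical helper names; the endpoint path, CQL fields, and response keys
# follow Confluence's v1 REST API. base_url is e.g. "https://<your-site>.atlassian.net/wiki".

def build_incremental_cql(space_key: str, since: datetime) -> str:
    """CQL selecting pages in one space that changed since the last sync."""
    # CQL accepts dates in the form "yyyy-MM-dd HH:mm".
    stamp = since.strftime("%Y-%m-%d %H:%M")
    return f'space = "{space_key}" AND type = page AND lastmodified >= "{stamp}"'

def search_url(base_url: str, cql: str, start: int = 0, limit: int = 50) -> str:
    """Search URL that also expands the storage-format body and version number."""
    query = urllib.parse.urlencode(
        {"cql": cql, "expand": "body.storage,version", "start": start, "limit": limit}
    )
    return f"{base_url}/rest/api/content/search?{query}"

def parse_page(result: dict) -> dict:
    """Keep only the fields the pipeline needs from one search result."""
    return {
        "id": result["id"],
        "title": result["title"],
        "html": result["body"]["storage"]["value"],
        "version": result["version"]["number"],
    }

def fetch_changed_pages(base_url: str, space_key: str, since: datetime, auth_header: str) -> list:
    """Page through the search results (performs network calls)."""
    cql = build_incremental_cql(space_key, since)
    pages, start = [], 0
    while True:
        req = urllib.request.Request(
            search_url(base_url, cql, start), headers={"Authorization": auth_header}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        results = data.get("results", [])
        pages.extend(parse_page(r) for r in results)
        # A "next" link in _links means more results remain.
        if not results or "next" not in data.get("_links", {}):
            return pages
        start += len(results)
```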

SharePoint / OneDrive

  • Microsoft Graph API
  • Delta queries for incremental syncs
  • Carry over permission filters (important for compliance!)
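Delta queries return changes page by page: follow `@odata.nextLink` until a response contains `@odata.deltaLink`, then store that link and start from it on the next sync. The sketch below assumes an injected `fetch(url)` function (hypothetical) that performs an authenticated GET and returns parsed JSON; the first URL would be something like `https://graph.microsoft.com/v1.0/drives/{drive-id}/root/delta`. Delta responses do not carry sharing permissions, so those would have to be read separately (for example from each item's `permissions` endpoint) and stored with the document.

```python
from typing import Callable

def delta_sync(fetch: Callable[[str], dict], start_url: str):
    """
    Follow Microsoft Graph delta pages until a deltaLink is returned.
    Returns (changed_files, deleted_ids, delta_link). Persist delta_link and pass
    it as start_url next time to receive only changes made after this sync.
    `fetch` is a hypothetical injected function: authenticated GET -> parsed JSON.
    """
    changed, deleted = [], []
    url = start_url
    while True:
        page = fetch(url)
        for item in page.get("value", []):
            if "deleted" in item:      # removed items carry a "deleted" facet
                deleted.append(item["id"])
            elif "file" in item:       # keep files, skip folders and the root
                changed.append(item)
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]
        else:
            return changed, deleted, page["@odata.deltaLink"]
```

Deleted IDs matter as much as changed files: their chunks must also be removed from the vector DB, or the assistant keeps citing documents that no longer exist.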

Emails

  • IMAP/Exchange connector
  • Only internal emails and threads with business relevance
  • PII detection and masking before ingestion
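PII masking can start as simply as replacing detected spans with typed placeholders. This sketch covers only email addresses and international phone numbers written with a leading `+`; both patterns are assumptions and deliberately narrow. A production setup would more likely use a dedicated PII detection service or NER model, since regexes miss names, addresses, and most ID formats.

```python
import re

# Illustrative patterns only. Order matters: emails are masked before phone numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+\d[\d\s/-]{6,}\d"),
}

def mask_pii(text: str):
    """Replace each detected span with a typed placeholder; return (text, counts)."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Returning the counts makes it easy to log how much was masked per source, which helps when auditing the filter itself.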

Databases

  • SQL views as defined interfaces
  • Change Data Capture for real-time updates
  • Embed schema documentation as additional context
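Schema documentation can be generated from the view definition and enriched with human-written column notes. The sketch uses SQLite only so it runs standalone; the table `orders`, the view `orders_v`, and `schema_doc` are hypothetical. The view exposes three of the table's four columns, which is what makes it a defined interface.

```python
import sqlite3

def schema_doc(conn: sqlite3.Connection, view_name: str, column_notes: dict) -> str:
    """Render a view's columns as plain text that can be embedded next to the data."""
    # pragma_table_info works for tables and views and accepts a bound parameter.
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (view_name,)
    ).fetchall()
    lines = [f"View {view_name}:"]
    for name, col_type in rows:
        note = column_notes.get(name, "no description")
        lines.append(f"- {name} ({col_type or 'untyped'}): {note}")
    return "\n".join(lines)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                             total_eur REAL, internal_note TEXT);
        -- The view is the interface: internal_note is deliberately not exposed.
        CREATE VIEW orders_v AS SELECT id, customer, total_eur FROM orders;
    """)
    print(schema_doc(conn, "orders_v", {"total_eur": "Order total in EUR"}))
```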

Step 3: Data Pipeline

Source → Extraction → Cleaning → PII Filter → Chunking → Embedding → Vector DB
           ↓                                                              ↓
        Scheduler                                                    Metadata Store
     (daily/weekly)                                               (source, date, permissions)
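In the Chunking step, each chunk should carry the metadata the store records (source, date, permissions) so it can be filtered at query time. A minimal sketch using fixed-size character windows with overlap; the sizes are placeholder values and the names are hypothetical. Many pipelines split by tokens or by document structure (headings, paragraphs) instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str                # e.g. "confluence:12345" (hypothetical ID scheme)
    updated: str               # ISO date of the source document
    allowed_groups: frozenset  # permissions copied from the source system

def chunk_document(text, source, updated, allowed_groups, size=800, overlap=100):
    """Split text into overlapping character windows, each carrying the document's metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    groups = frozenset(allowed_groups)
    # A window starting at or after len(text) - overlap would only repeat
    # text the previous window already covers, so the range stops before it.
    return [
        Chunk(text[start:start + size], source, updated, groups)
        for start in range(0, max(len(text) - overlap, 1), step)
    ]
```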

Step 4: Access Control

Critical: The RAG pipeline must only return information that the requesting user is authorized to see.

  • Carry over permissions from source systems
  • Filter against user roles at query time
  • Regular audits of access patterns
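Query-time filtering can be as simple as intersecting the user's groups with the groups copied from the source system, and logging every answer for later audits. This sketch uses hypothetical names and an in-memory list as a stand-in for a real audit sink. Filtering after retrieval, as done here, can leave fewer than `k` results; where the vector database supports metadata filters, pushing the check into the query itself is usually preferable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    source: str
    allowed_groups: frozenset
    score: float

AUDIT_LOG = []  # hypothetical stand-in for a real audit sink

def authorized_results(user_id, user_groups, candidates, k=5):
    """Drop chunks the user may not see, keep the top-k by score, and record the access."""
    user_groups = frozenset(user_groups)
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    top = sorted(visible, key=lambda c: c.score, reverse=True)[:k]
    AUDIT_LOG.append({
        "user": user_id,
        "at": datetime.now(timezone.utc).isoformat(),
        "returned": [c.source for c in top],
        "filtered_out": len(candidates) - len(visible),
    })
    return top
```

The key point is that the filter runs before anything reaches the language model: once an unauthorized chunk is in the prompt, it can leak into the answer.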

Common Pitfalls

  • ❌ Migrating everything at once (instead of iterating)
  • ❌ Ignoring permissions
  • ❌ Not flagging outdated documents
  • ❌ No feedback loop with users
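For the outdated-documents pitfall, one option is to attach a visible age warning to each retrieved chunk, so both the model and the user see it. A minimal sketch; `staleness_label` is a hypothetical helper, and the 365-day threshold is an arbitrary assumption that would likely differ per source.

```python
from datetime import date

def staleness_label(last_modified: date, today: date, max_age_days: int = 365):
    """Return a warning string for documents older than the threshold, else None."""
    age = (today - last_modified).days
    if age > max_age_days:
        return f"Possibly outdated: last updated {last_modified.isoformat()} ({age} days ago)"
    return None
```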

Practical tip: Start with a single source (e.g., Confluence) and 10 power users. Collect feedback, optimize, then scale to more sources. A RAG system thrives on iterative improvement.