Making Company Knowledge Accessible

With the theory covered, let's get practical: how do you make your real company knowledge, which is spread across Confluence, SharePoint, emails, PDFs, and databases, accessible to AI?

Step 1: Inventory Knowledge Sources

Create an overview of all knowledge sources:

Source	Type	Volume	Update Frequency	Priority
Confluence	Wiki	~2,000 pages	Weekly	High
SharePoint	Files	~10,000 docs	Monthly	High
Email archives	Unstructured	~500,000 emails	Daily	Medium
Internal DB	Structured	~50 tables	Real-time	High
Slack/Teams	Chat	~1M messages	Real-time	Low
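The inventory is easier to act on when it is kept as data rather than only as a document. The following is a minimal sketch that encodes the table above and derives an onboarding order from the Priority column. The class name `KnowledgeSource`, the field names, and the function `onboarding_order` are illustrative assumptions, not part of any tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KnowledgeSource:
    name: str
    kind: str
    volume: str
    update_frequency: str
    priority: str  # "High", "Medium", or "Low"


# Lower rank means "connect this source earlier".
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

INVENTORY = [
    KnowledgeSource("Confluence", "Wiki", "~2,000 pages", "Weekly", "High"),
    KnowledgeSource("SharePoint", "Files", "~10,000 docs", "Monthly", "High"),
    KnowledgeSource("Email archives", "Unstructured", "~500,000", "Daily", "Medium"),
    KnowledgeSource("Internal DB", "Structured", "~50 tables", "Real-time", "High"),
    KnowledgeSource("Slack/Teams", "Chat", "~1M messages", "Real-time", "Low"),
]


def onboarding_order(sources: List[KnowledgeSource]) -> List[KnowledgeSource]:
    """Sort by priority; sorted() is stable, so ties keep their table order."""
    return sorted(sources, key=lambda s: PRIORITY_RANK[s.priority])


for source in onboarding_order(INVENTORY):
    print(f"{source.priority:<6} {source.name}")
```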

Step 2: Build Connectors

Confluence

Atlassian REST API for page content and metadata
CQL (Confluence Query Language) for targeted extraction
Webhooks for incremental updates
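As an illustration of targeted extraction with CQL, the sketch below only builds the search URL and does not send a request. It assumes the v1 content search endpoint (`/rest/api/content/search`) under a Confluence Cloud base URL such as `https://example.atlassian.net/wiki`; the host, the space key, and the helper name `build_cql_search_url` are placeholders, and the endpoint path should be checked against the API version your instance exposes.

```python
from urllib.parse import urlencode


def build_cql_search_url(base_url: str, space_key: str, modified_since: str,
                         limit: int = 50, start: int = 0) -> str:
    """Build a search URL for pages in one space changed since a given date.

    modified_since is a date string in yyyy-MM-dd form.
    """
    if '"' in space_key or '"' in modified_since:
        # Avoid breaking out of the quoted CQL literals.
        raise ValueError("quotes are not allowed in CQL arguments")
    cql = (f'space = "{space_key}" AND type = page '
           f'AND lastmodified >= "{modified_since}"')
    params = {
        "cql": cql,
        "expand": "body.storage,version",  # page body plus version metadata
        "limit": limit,
        "start": start,
    }
    return f"{base_url.rstrip('/')}/rest/api/content/search?{urlencode(params)}"


url = build_cql_search_url("https://example.atlassian.net/wiki", "ENG", "2024-01-01")
print(url)
```

Restricting the query by modification date keeps scheduled runs small; webhooks can then cover changes between runs.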

SharePoint / OneDrive

Microsoft Graph API
Delta queries for incremental syncs
Carry over the source permissions with each document so they can be enforced later (important for compliance!)
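A delta sync against Microsoft Graph follows `@odata.nextLink` until the last page, which carries an `@odata.deltaLink` to store for the next run. The sketch below shows only that paging loop. HTTP and authentication are left out on purpose: `fetch_json` is a hypothetical callable you would supply (for example, a wrapper around an authenticated GET), and the start URL would typically be a delta endpoint such as `/drives/{drive-id}/root/delta`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def sync_delta(fetch_json: Callable[[str], Dict],
               start_url: str) -> Tuple[List[Dict], Optional[str]]:
    """Collect all changed items and return them with the next delta link.

    fetch_json(url) must return the parsed JSON body of a Graph response.
    """
    items: List[Dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = fetch_json(url)
        items.extend(page.get("value", []))
        # Only the final page carries the delta link.
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")
    return items, delta_link


# Usage with canned responses instead of real HTTP calls:
fake_pages = {
    "start": {"value": [{"id": "1"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"id": "2"}], "@odata.deltaLink": "next-run-url"},
}
changed, next_run_url = sync_delta(fake_pages.__getitem__, "start")
```

Persisting the returned delta link and starting the next run from it is what makes the sync incremental.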

Emails

IMAP/Exchange connector
Ingest only internal emails and threads with business relevance
PII detection and masking before ingestion
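To make the masking step concrete, here is a deliberately simple sketch based on regular expressions. It covers only email addresses and phone-like digit runs, and the patterns are assumptions chosen for illustration: they will miss names and addresses, and they can mask other long digit sequences by mistake. A production pipeline would more likely rely on a dedicated PII detection tool.

```python
import re

# Illustrative patterns only; they are neither complete nor locale-aware.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(mask_pii("Contact jane.doe@example.com or +49 30 1234567."))
# → Contact [EMAIL] or [PHONE].
```

Masking before chunking and embedding matters because anything that reaches the vector store can later surface in an answer.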

Databases

SQL views as stable, defined interfaces for extraction
Change Data Capture for real-time updates
Embed schema documentation as additional context
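One way to combine a view-based interface with schema documentation is to render each view as a short text block that can be embedded like any other document. The sketch below uses an in-memory SQLite database purely so that it runs anywhere; the table, the view `v_customers`, the column descriptions, and the helper `describe_view` are all made up for the example.

```python
import sqlite3
from typing import Dict


def describe_view(conn: sqlite3.Connection, view_name: str,
                  column_docs: Dict[str, str]) -> str:
    """Render the columns of a view as plain text suitable for embedding."""
    if not view_name.isidentifier():
        raise ValueError(f"unexpected view name: {view_name!r}")
    # LIMIT 0 returns no rows but still populates cursor.description.
    cursor = conn.execute(f'SELECT * FROM "{view_name}" LIMIT 0')
    lines = [f"View {view_name} exposes these columns:"]
    for column in (d[0] for d in cursor.description):
        doc = column_docs.get(column, "no description available")
        lines.append(f"- {column}: {doc}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, internal_note TEXT)"
)
# The view exposes only the columns that are meant to be shared.
conn.execute("CREATE VIEW v_customers AS SELECT id, name FROM customers")

schema_text = describe_view(
    conn, "v_customers",
    {"id": "Customer identifier", "name": "Legal company name"},
)
print(schema_text)
```

Because the text is generated from the view rather than the base table, columns hidden by the view (here `internal_note`) never reach the index.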

Step 3: Data Pipeline

Source → Extraction → Cleaning → PII Filter → Chunking → Embedding → Vector DB
           ↓                                                              ↓
        Scheduler                                                    Metadata Store
     (daily/weekly)                                               (source, date, permissions)
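The stages in the diagram can be sketched as a small function chain. This is a simplified illustration, not a reference implementation: chunking is word-based with a fixed overlap, and the PII filter and the embedding model are passed in as callables (`pii_filter`, `embed`) because the choice of both is outside the scope of this section. All names are assumptions.

```python
import re
from typing import Callable, Dict, List


def clean(text: str) -> str:
    """Collapse runs of whitespace left over from extraction."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def run_pipeline(raw_text: str, metadata: Dict[str, object],
                 pii_filter: Callable[[str], str],
                 embed: Callable[[str], List[float]],
                 size: int = 50, overlap: int = 10) -> List[Dict[str, object]]:
    """Cleaning -> PII filter -> chunking -> embedding, with metadata attached."""
    text = pii_filter(clean(raw_text))
    records = []
    for index, chunk in enumerate(chunk_words(text, size, overlap)):
        records.append({
            "text": chunk,
            "vector": embed(chunk),  # would be written to the vector DB
            "metadata": {**metadata, "chunk_index": index},  # metadata store
        })
    return records
```

Keeping source, date, and permissions in the metadata of every chunk is what later allows filtering and freshness checks at query time.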

Step 4: Access Control

Critical: The RAG pipeline must only return information that the requesting user is authorized to see.

Carry over permissions from source systems
Filter against user roles at query time
Regular audits of access patterns
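Filtering at query time can be sketched as a check of each retrieved chunk's allowed groups against the groups of the requesting user. The data model here (`RetrievedChunk`, `allowed_groups`) is an assumption for illustration. Many vector databases can apply such a condition as a metadata filter inside the search itself, which is preferable because unauthorized chunks then never leave the store; the post-filter below just makes the logic visible.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List


@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    score: float
    allowed_groups: FrozenSet[str]  # carried over from the source system


def filter_authorized(chunks: List[RetrievedChunk],
                      user_groups: AbstractSet[str]) -> List[RetrievedChunk]:
    """Keep only chunks that share at least one group with the user.

    Chunks without any group information are dropped (deny by default).
    """
    return [c for c in chunks if c.allowed_groups & user_groups]


results = [
    RetrievedChunk("Salary bands 2024", 0.91, frozenset({"hr"})),
    RetrievedChunk("VPN setup guide", 0.84, frozenset({"all-staff"})),
    RetrievedChunk("Unlabeled export", 0.80, frozenset()),
]
visible = filter_authorized(results, {"all-staff", "engineering"})
print([c.text for c in visible])
# → ['VPN setup guide']
```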

Common Pitfalls

❌ Migrating everything at once (instead of iterating)
❌ Ignoring permissions
❌ Not flagging outdated documents
❌ No feedback loop with users
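The outdated-documents pitfall in particular is cheap to address once a last-modified date is stored with each document. Below is a minimal sketch that flags documents older than a threshold; the 365-day default and the field names `last_modified` and `outdated` are arbitrary assumptions that should be tuned per source.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def is_outdated(last_modified: datetime, now: datetime,
                max_age_days: int = 365) -> bool:
    """Return True if the document is older than max_age_days."""
    return now - last_modified > timedelta(days=max_age_days)


def flag_document(metadata: Dict[str, object], now: datetime,
                  max_age_days: int = 365) -> Dict[str, object]:
    """Return a copy of the metadata with an 'outdated' flag added."""
    flagged = dict(metadata)
    flagged["outdated"] = is_outdated(metadata["last_modified"], now, max_age_days)
    return flagged


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
doc = {"source": "confluence",
       "last_modified": datetime(2023, 6, 1, tzinfo=timezone.utc)}
print(flag_document(doc, now)["outdated"])
# → True
```

The flag can then be shown next to an answer or used to rank fresher documents higher.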

Practical tip: Start with a single source (e.g., Confluence) and 10 power users. Collect feedback, optimize, then scale to more sources. A RAG system thrives on iterative improvement.