Enterprise Feature

Document Intelligence

Enterprise-grade RAG with hybrid search. Your AI agents answer questions from your documents with automatic citations.

What is Document Intelligence?

DruidX Document Intelligence uses enterprise-grade RAG (Retrieval-Augmented Generation) with hybrid search that combines dense semantic embeddings and sparse keyword matching. Upload PDF, DOCX, TXT, and other files, and DruidX automatically chunks, embeds, and indexes them. When users ask questions, agents retrieve relevant passages and generate responses with automatic citations linking to the source documents. Answers stay grounded in your actual data, and every claim can be checked against its source.

How Document Intelligence Works

From upload to accurate, cited answers in three simple steps.

1. Upload Documents

Upload PDFs, Word docs, text files, or CSVs. DruidX extracts the text, splits it into retrieval-sized chunks, and creates a vector embedding for each chunk.
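To make the chunking step concrete, here is a minimal sketch of overlapping, word-based chunking. The function name `chunk_text`, the chunk size, and the overlap are illustrative assumptions; this page does not describe DruidX's actual chunking strategy.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks that overlap with their neighbours.

    chunk_size and overlap are counted in words; both defaults are
    assumptions for illustration, not DruidX settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which usually helps retrieval.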

2. Hybrid Search

When you ask a question, DruidX searches using both semantic similarity and keyword matching to find the most relevant passages.
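As a rough illustration of why two retrieval methods are used, the sketch below ranks the same passages twice: once by cosine similarity over vectors and once by exact keyword overlap. The two-dimensional vectors are hand-made stand-ins for real embeddings, and every name here is hypothetical rather than part of any DruidX API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, passage: str) -> int:
    """Count distinct query terms that appear verbatim in the passage."""
    passage_terms = set(passage.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in passage_terms)


def rank(scores: dict[str, float]) -> list[str]:
    """Return ids ordered from highest to lowest score."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Toy corpus: id -> (text, hand-made "embedding"). Real embeddings come from a model.
passages = {
    "p1": ("Refunds are issued within 30 days", [0.9, 0.1]),
    "p2": ("Error code E42 means the battery is low", [0.1, 0.9]),
    "p3": ("You can get your money back after returning an item", [0.8, 0.2]),
}
query, query_vec = "refunds policy", [1.0, 0.0]

dense_ranking = rank({pid: cosine(query_vec, vec) for pid, (_, vec) in passages.items()})
sparse_ranking = rank({pid: keyword_score(query, text) for pid, (text, _) in passages.items()})
```

Here `p3` shares no keyword with the query, so only the dense ranking places it near the top, while an exact term such as an error code would be caught by the keyword ranking.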

3. Cited Answers

The AI generates responses grounded in your documents with numbered citations [1][2] so you can verify every claim.

Enterprise-Grade Capabilities

Built for organizations that need accurate, secure document AI.

Easy Document Upload

Upload PDFs, DOCX, TXT, CSV, and more. DruidX automatically extracts, chunks, and indexes content.

Hybrid Search

Combines dense (semantic) and sparse (keyword) embeddings for more accurate retrieval than either method alone.

Automatic Citations

Every answer includes numbered citations [1][2] linking back to source documents.

Persistent Knowledge Base

Build permanent knowledge bases for your agents. Documents persist across conversations.

Multi-Collection Support

Create separate document collections for different agents, projects, or departments.

Real-Time Indexing

New documents are typically indexed within seconds to a few minutes. Already-indexed content is searchable right away, even while larger files are still processing.

Supported File Formats

Upload documents in multiple formats. More coming soon.

  • PDF: Reports, manuals, contracts
  • DOCX: Word documents, proposals
  • TXT: Plain text, logs, notes
  • CSV: Spreadsheets, data exports
  • MD: Markdown documentation
  • HTML: Web content, articles

Popular Use Cases

How teams use Document Intelligence to unlock their knowledge.

Customer Support

Upload product docs, FAQs, and support articles. Agents answer customer questions with accurate, cited information.

Legal & Compliance

Index contracts, policies, and regulations. Quickly find relevant clauses with exact document references.

Research & Analysis

Upload research papers, reports, and data. Agents synthesize insights across multiple sources.

Employee Onboarding

Index employee handbooks, training materials, and SOPs. New hires get instant, accurate answers.

Technical Architecture

Enterprise-grade infrastructure for accurate document retrieval.

Hybrid Search Architecture

  • Dense Vectors: OpenAI text-embedding-3-small (1024 dim)
  • Sparse Vectors: SPLADE for keyword matching
  • Ranking: Reciprocal Rank Fusion (RRF)
  • Database: Qdrant vector database
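Reciprocal Rank Fusion merges several ranked lists by giving each document a score of 1 / (k + rank) per list and summing those scores. The sketch below is a generic implementation; the constant k = 60 is a commonly used default and an assumption here, since this page does not state the value DruidX uses.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids into one list sorted by RRF score.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k = 60 is an assumed default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example: one ranking from dense search, one from sparse search.
dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Because RRF uses only ranks, it does not need the dense and sparse scores to be on the same scale, which is one reason it is a common choice for hybrid search.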

Security & Compliance

  • SOC 2 Type II compliant infrastructure
  • Encrypted at rest (AES-256) and in transit (TLS 1.3)
  • Isolated vector collections per workspace
  • On-premise deployment available (Enterprise)

Frequently Asked Questions

Everything you need to know about Document Intelligence

RAG is a technique that combines document retrieval with AI generation. When you ask a question, DruidX first searches your documents to find relevant passages, then uses those passages as context for the AI to generate an accurate, grounded response with citations.

Hybrid search combines two retrieval methods: dense embeddings (semantic understanding—finds conceptually similar content) and sparse embeddings (keyword matching—finds exact terms). Together, they catch both conceptual matches and specific terminology, resulting in more accurate retrieval than either method alone.

DruidX supports PDF, DOCX, TXT, CSV, MD (Markdown), and HTML files. Scanned PDFs are processed with OCR. We're continuously adding support for more formats, including Excel, PowerPoint, and images with text.

Individual files can be up to 50MB. Total storage depends on your plan: Growth (1GB), Scale (10GB), Enterprise (unlimited). Documents are chunked into smaller segments for optimal retrieval, so even very long documents work well.
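The limits above can be expressed as a simple pre-upload check. This is a sketch only: the function `can_upload` is hypothetical, and treating MB and GB as binary units (1024-based) is an assumption, since the page does not say which convention applies.

```python
MB = 1024 * 1024  # assumption: binary megabytes; the page does not specify the unit
GB = 1024 * MB

MAX_FILE_BYTES = 50 * MB
# Storage quota per plan, taken from the text above; None means unlimited.
PLAN_QUOTA_BYTES = {"growth": 1 * GB, "scale": 10 * GB, "enterprise": None}


def can_upload(plan: str, used_bytes: int, file_bytes: int) -> tuple[bool, str]:
    """Return (allowed, reason) for uploading one file under the given plan."""
    if plan not in PLAN_QUOTA_BYTES:
        return False, f"unknown plan: {plan}"
    if file_bytes > MAX_FILE_BYTES:
        return False, "file exceeds the 50MB per-file limit"
    quota = PLAN_QUOTA_BYTES[plan]
    if quota is not None and used_bytes + file_bytes > quota:
        return False, "upload would exceed the plan's storage quota"
    return True, "ok"
```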

Different agents can use different document collections. You can create separate collections for different agents or use cases. For example, a customer support agent might have access to product docs, while a legal agent accesses contracts. Collections can also be shared across multiple agents.
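One way to picture the relationship between agents and collections is a small access map. The `Workspace` class, its fields, and the collection and file names below are all hypothetical, chosen only to illustrate separate and shared collections.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    # collection name -> filenames stored in it (illustrative model only)
    collections: dict[str, set[str]] = field(default_factory=dict)
    # agent name -> names of collections the agent may search
    agent_access: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, agent: str, collection: str) -> None:
        """Allow an agent to search an existing collection."""
        if collection not in self.collections:
            raise KeyError(f"no such collection: {collection}")
        self.agent_access.setdefault(agent, set()).add(collection)

    def searchable_files(self, agent: str) -> set[str]:
        """All files the agent can retrieve from, across its collections."""
        files: set[str] = set()
        for name in self.agent_access.get(agent, set()):
            files |= self.collections[name]
        return files


ws = Workspace(collections={
    "product-docs": {"manual.pdf"},
    "contracts": {"msa.docx"},
    "faq": {"faq.md"},
})
ws.grant("support", "product-docs")
ws.grant("support", "faq")
ws.grant("legal", "contracts")
ws.grant("sales", "faq")  # the same collection shared with a second agent
```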

When an agent uses information from your documents, it automatically includes numbered citations like [1][2]. At the end of the response, you'll see 'Sources Referenced' with the exact filename and relevant quote. This ensures transparency and lets users verify information.
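The citation format described above can be sketched as follows. The `Passage` type, the `render_answer` function, and the exact layout of the 'Sources Referenced' block are assumptions for illustration; the page specifies only numbered citations plus the filename and relevant quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    filename: str
    quote: str


def render_answer(segments: list[tuple[str, list[Passage]]]) -> str:
    """Render sentences with [n] markers and a 'Sources Referenced' list.

    segments pairs each sentence with the passages that support it.
    Citation numbers are assigned in order of first use and reused
    when the same passage is cited again.
    """
    numbers: dict[Passage, int] = {}
    body = []
    for sentence, passages in segments:
        marks = ""
        for passage in passages:
            n = numbers.setdefault(passage, len(numbers) + 1)
            marks += f"[{n}]"
        body.append(f"{sentence} {marks}".rstrip())
    lines = [" ".join(body), "", "Sources Referenced"]
    for passage, n in numbers.items():
        lines.append(f'[{n}] {passage.filename}: "{passage.quote}"')
    return "\n".join(lines)


warranty = Passage("warranty.pdf", "Coverage lasts 24 months.")
returns = Passage("returns.md", "Items may be returned within 30 days.")
answer = render_answer([
    ("The warranty lasts two years.", [warranty]),
    ("Returns are accepted for 30 days.", [returns, warranty]),
])
```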

Your documents are protected at every stage. They are encrypted at rest (AES-256) and in transit (TLS 1.3), each workspace has isolated vector collections, and the infrastructure is SOC 2 Type II compliant. Enterprise plans can include on-premise deployment for maximum data control.

Most documents are indexed within seconds to a few minutes, depending on size. You can start querying immediately—even while larger documents are still processing, already-indexed content is searchable.

Agents can search both your documents and the web in the same query. For example, an agent might pull product specs from your docs and compare them against competitor information found online.

DruidX uses OpenAI's text-embedding-3-small (1024 dimensions) for dense vectors and SPLADE for sparse/keyword vectors. This combination provides strong retrieval quality while keeping costs reasonable.

Make Your Documents Intelligent

Upload your docs and start getting accurate, cited AI answers in minutes.