Document Intelligence
Enterprise-grade RAG with hybrid search. Your AI agents answer questions from your documents with automatic citations.
What is Document Intelligence?
DruidX Document Intelligence uses enterprise-grade RAG (Retrieval-Augmented Generation) with hybrid search, which combines dense semantic embeddings with sparse keyword matching. Upload PDFs, DOCX, TXT, and other files, and DruidX automatically chunks, embeds, and indexes them. When users ask questions, agents retrieve relevant passages and generate responses with automatic citations linking to source documents. This grounds AI answers in your actual data and reduces the risk of hallucination.
How Document Intelligence Works
From upload to accurate, cited answers in three simple steps.
Upload Documents
Upload PDFs, Word docs, text files, or CSVs. DruidX extracts the text, splits it into chunks sized for retrieval, and creates a vector embedding for each chunk.
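As a rough illustration of the chunking step, the sketch below splits extracted text into overlapping word windows. The function name chunk_text, the chunk size, and the overlap are assumptions made for this example; DruidX's actual chunking parameters are not stated on this page.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    (Hypothetical helper; sizes are illustrative, not DruidX defaults.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be embedded and indexed separately, which is why long documents can still be searched at passage level.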
Hybrid Search
When you ask a question, DruidX searches using both semantic similarity and keyword matching to find the most relevant passages.
Cited Answers
The AI generates responses grounded in your documents with numbered citations [1][2] so you can verify every claim.
Enterprise-Grade Capabilities
Built for organizations that need accurate, secure document AI.
Easy Document Upload
Upload PDFs, DOCX, TXT, CSV, and more. DruidX automatically extracts, chunks, and indexes content.
Hybrid Search
Combines dense (semantic) and sparse (keyword) embeddings for more accurate retrieval than either method alone.
Automatic Citations
Every answer includes numbered citations [1][2] linking back to source documents.
Persistent Knowledge Base
Build permanent knowledge bases for your agents. Documents persist across conversations.
Multi-Collection Support
Create separate document collections for different agents, projects, or departments.
Real-Time Indexing
New documents are typically indexed within seconds to a few minutes, depending on size. Content that is already indexed can be queried right away.
Supported File Formats
Upload documents in multiple formats: PDF, DOCX, TXT, CSV, MD (Markdown), and HTML. More coming soon.
Popular Use Cases
How teams use Document Intelligence to unlock their knowledge.
Customer Support
Upload product docs, FAQs, and support articles. Agents answer customer questions with accurate, cited information.
Legal & Compliance
Index contracts, policies, and regulations. Quickly find relevant clauses with exact document references.
Research & Analysis
Upload research papers, reports, and data. Agents synthesize insights across multiple sources.
Employee Onboarding
Index employee handbooks, training materials, and SOPs. New hires get instant, accurate answers.
Technical Architecture
Enterprise-grade infrastructure for accurate document retrieval.
Hybrid Search Architecture
- Dense Vectors: OpenAI text-embedding-3-small (1024 dim)
- Sparse Vectors: SPLADE for keyword matching
- Ranking: Reciprocal Rank Fusion (RRF)
- Database: Qdrant vector database
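The ranking step listed above can be sketched in a few lines. Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) in every ranked list it appears in and sums those scores. The function name and the constant k = 60 (a commonly used default) are assumptions for this example; the value DruidX uses is not stated here.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list, with rank starting at 1.
    k = 60 is an assumed, commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

dense_hits = ["a", "b", "c"]   # illustrative result of the dense search
sparse_hits = ["b", "d", "a"]  # illustrative result of the sparse search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF only looks at ranks, it does not need the dense and sparse scores to be on the same scale.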
Security & Compliance
- SOC 2 Type II compliant infrastructure
- Encrypted at rest (AES-256) and in transit (TLS 1.3)
- Isolated vector collections per workspace
- On-premise deployment available (Enterprise)
Frequently Asked Questions
Everything you need to know about Document Intelligence
What is RAG (Retrieval-Augmented Generation)?
RAG is a technique that combines document retrieval with AI generation. When you ask a question, DruidX first searches your documents to find relevant passages, then uses those passages as context for the AI to generate an accurate, grounded response with citations.
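The retrieve-then-generate flow can be sketched as follows. This is a toy under stated assumptions: the retriever ranks passages by shared words instead of embeddings, the function names and prompt wording are invented for illustration, and the call to a language model is left out.

```python
def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by how many question words they share."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(p.lower().split())), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only passages that share at least one word with the question.
    return [p for score, p in scored[:top_k] if score > 0]

def build_prompt(question: str, context: list[str]) -> str:
    """Number the retrieved passages and place them ahead of the question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the numbered passages below and cite them like [1].\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

passages = [
    "The refund window is 30 days.",
    "Shipping takes 5 business days.",
    "Our office is in Berlin.",
]
question = "What is the refund window?"
prompt = build_prompt(question, retrieve(question, passages))
```

The resulting prompt is what would be sent to the model, so the answer can only draw on passages that retrieval surfaced.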
What is hybrid search?
Hybrid search combines two retrieval methods: dense embeddings (semantic understanding, which finds conceptually similar content) and sparse embeddings (keyword matching, which finds exact terms). Together, they catch both conceptual matches and specific terminology, resulting in more accurate retrieval than either method alone.
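To make the two signals concrete, the sketch below scores the same query against two documents with a dense cosine similarity and a sparse term-weight dot product. All vectors, weights, and document names are hand-made for illustration; real dense vectors have many more dimensions and real sparse weights come from a model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product between two sparse term-weight vectors."""
    return sum(weight * b[term] for term, weight in a.items() if term in b)

# Hypothetical documents with toy dense and sparse representations.
docs = {
    "doc_a": {"dense": [0.9, 0.1], "sparse": {"reset": 1.0, "password": 1.2}},
    "doc_b": {"dense": [0.2, 0.8], "sparse": {"error": 1.0, "e1234": 2.0}},
}
# A query that is semantically close to doc_a but names an exact code from doc_b.
query = {"dense": [0.8, 0.3], "sparse": {"e1234": 1.5}}

dense_ranking = sorted(
    docs, key=lambda d: cosine(query["dense"], docs[d]["dense"]), reverse=True
)
sparse_ranking = sorted(
    docs, key=lambda d: sparse_dot(query["sparse"], docs[d]["sparse"]), reverse=True
)
```

Here the dense ranking prefers doc_a while the sparse ranking prefers doc_b, which is the kind of disagreement a fusion step such as the Reciprocal Rank Fusion listed under Technical Architecture is meant to resolve.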
What file formats are supported?
DruidX supports PDF, DOCX, TXT, CSV, MD (Markdown), and HTML files. Scanned PDFs are processed with OCR. We're continuously adding support for more formats, including Excel, PowerPoint, and images with text.
Are there file size or storage limits?
Individual files can be up to 50 MB. Total storage depends on your plan: Growth (1 GB), Scale (10 GB), Enterprise (unlimited). Documents are chunked into smaller segments for optimal retrieval, so even very long documents work well.
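The limits above can be expressed as a small pre-upload check. This is a sketch, not DruidX code: the function and constant names are hypothetical, and it assumes binary units (1 MB = 1024 × 1024 bytes), which the page does not specify.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB

MAX_FILE_BYTES = 50 * MB
# None means no storage cap (Enterprise).
PLAN_QUOTA_BYTES: dict[str, Optional[int]] = {
    "growth": 1 * GB,
    "scale": 10 * GB,
    "enterprise": None,
}

def can_upload(plan: str, used_bytes: int, file_bytes: int) -> bool:
    """Return True if the file fits both the per-file and per-plan limits."""
    if file_bytes > MAX_FILE_BYTES:
        return False
    quota = PLAN_QUOTA_BYTES[plan]
    return quota is None or used_bytes + file_bytes <= quota
```

For example, a 10 MB file is accepted on an empty Growth workspace but rejected once less than 10 MB of the 1 GB quota remains.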
Can I create separate document collections for different agents?
Yes. You can create separate collections for different agents or use cases. For example, a customer support agent might have access to product docs, while a legal agent accesses contracts. Collections can also be shared across multiple agents.
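One way to picture this is a mapping from agents to the collections they may search. The agent names, collection names, and helper functions below are hypothetical and only illustrate the idea of separate and shared collections.

```python
# Hypothetical agent-to-collection mapping; "faq" is shared by both agents.
AGENT_COLLECTIONS: dict[str, set[str]] = {
    "support_agent": {"product_docs", "faq"},
    "legal_agent": {"contracts", "faq"},
}

def searchable_collections(agent: str) -> set[str]:
    """Collections the given agent may search (empty for unknown agents)."""
    return AGENT_COLLECTIONS.get(agent, set())

def can_search(agent: str, collection: str) -> bool:
    """True if the agent has been granted access to the collection."""
    return collection in searchable_collections(agent)
```

With this mapping, the support agent never sees contracts, while both agents can draw on the shared FAQ collection.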
How do citations work?
When an agent uses information from your documents, it automatically includes numbered citations like [1][2]. At the end of the response, you'll see 'Sources Referenced' with the exact filename and relevant quote. This ensures transparency and lets users verify information.
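A minimal sketch of how such a 'Sources Referenced' block could be assembled: it scans the answer for [n] markers and lists the matching passages with filename and quote. The function name, the passage dictionary shape, and the exact output layout are assumptions for illustration.

```python
import re

def sources_referenced(answer: str, passages: list[dict[str, str]]) -> str:
    """Build a 'Sources Referenced' block for the [n] markers in an answer.

    `passages` is the numbered context given to the model; passage 1 is
    passages[0]. Markers that point outside the list are ignored.
    """
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    lines = ["Sources Referenced"]
    for n in cited:
        if 1 <= n <= len(passages):
            p = passages[n - 1]
            lines.append(f'[{n}] {p["filename"]}: "{p["quote"]}"')
    return "\n".join(lines)

passages = [
    {"filename": "refund_policy.pdf", "quote": "Refunds are accepted within 30 days."},
    {"filename": "shipping.md", "quote": "Orders ship in 5 business days."},
]
answer = "Refunds are accepted within 30 days [1]. Orders ship in 5 business days [2][1]."
block = sources_referenced(answer, passages)
```

Repeated markers are listed once, so each source appears a single time however often it is cited.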
Is my data secure?
Yes. Documents are encrypted at rest (AES-256) and in transit (TLS 1.3). Each workspace has isolated vector collections, and the infrastructure is SOC 2 Type II compliant. Enterprise plans can include on-premise deployment for maximum data control.
How long does indexing take?
Most documents are indexed within seconds to a few minutes, depending on size. You can start querying immediately: even while larger documents are still processing, already-indexed content is searchable.
Can agents search my documents and the web together?
Yes. Agents can search your documents and the web in the same query. For example, an agent might pull product specs from your docs and compare them against competitor information found online.
Which embedding models does DruidX use?
DruidX uses OpenAI's text-embedding-3-small (1024 dimensions) for dense vectors and SPLADE for sparse/keyword vectors. This combination provides excellent retrieval quality while keeping costs reasonable.
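text-embedding-3-small returns 1536-dimensional vectors by default, so the 1024-dimension figure presumably relies on the `dimensions` parameter of the OpenAI embeddings API. That is an inference, not something this page confirms. The sketch below only builds the request body for the embeddings endpoint and does not send it; the helper name is hypothetical.

```python
import json

def embedding_request(texts: list[str], dimensions: int = 1024) -> dict:
    """Build a JSON body for the OpenAI embeddings endpoint (POST /v1/embeddings).

    `dimensions` asks the model for shortened vectors; 1024 matches the
    figure quoted above and is assumed to be how that figure is reached.
    """
    return {
        "model": "text-embedding-3-small",
        "input": texts,
        "dimensions": dimensions,
    }

body = json.dumps(embedding_request(["How do I reset my password?"]))
```

Whatever dimension is chosen must match the vector size configured for the collection in the vector database, otherwise inserts and queries will be rejected.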
Make Your Documents Intelligent
Upload your docs and start getting accurate, cited AI answers in minutes.