AI for Automation
2026-03-27 · RAG · AI search · LlamaIndex · Ollama · open source · tutorial · practical AI

He built a RAG system from scratch — 738K documents for €184

A developer with zero RAG experience indexed 451GB of company files into a searchable AI chat — for €184. His 6 failures and fixes are a masterclass in practical AI.


A developer named Andros Fenollosa had never built a RAG system (retrieval-augmented generation, a technique that lets an AI search and answer questions about your own documents) before. His company asked him to make a decade of project files — 451 gigabytes — searchable through an AI chat interface.

He did it. The total infrastructure cost: €184. The article documenting his journey hit 288 points on Hacker News and triggered a massive discussion about what actually works when building AI-powered search.


The mission: make 10 years of files talk back

The requirements were deceptively simple: build an internal chat tool where employees could ask questions about old project files and get answers with source citations. The catch? Nearly half a terabyte of mixed documents — PDFs, Word files, spreadsheets, presentations, emails, plus video and simulation files mixed in.

No cloud AI services allowed. Everything had to run locally for data privacy. And it needed to be fast.

6 problems that almost killed the project

1. Picking the right tools with zero experience

After research, he landed on this stack — all free and open-source:

Ollama + Llama 3.2 — runs the AI model locally on your machine (no API costs)

nomic-embed-text — converts documents into numbers the AI can search through (called "embeddings")

ChromaDB — stores those numbers for fast similarity searching (a "vector database")

LlamaIndex — connects all the pieces together (the RAG framework)

Flask + Streamlit — the web interface employees actually use
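As a hedged illustration (not the author's published code), these pieces could be wired together roughly like this with a recent LlamaIndex release. It assumes the separate integration packages (llama-index-llms-ollama, llama-index-embeddings-ollama, llama-index-vector-stores-chroma) are installed and an Ollama server is already running locally; the folder path, collection name, and sample question are made up:

```python
# Sketch only: requires a running Ollama server with both models pulled,
# plus the LlamaIndex Ollama/Chroma integration packages.
import chromadb
from llama_index.core import (Settings, SimpleDirectoryReader,
                              StorageContext, VectorStoreIndex)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Local LLM for answers, local embedding model for search vectors
Settings.llm = Ollama(model="llama3.2:3b", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Persistent ChromaDB collection so the index survives restarts
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("company_docs")
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=collection)
)

# Load a small folder of documents and build the searchable index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

print(index.as_query_engine().query("What did project Alpha deliver?"))
```

The key design point is that ChromaDB persists to disk, so the expensive embedding work is done once rather than on every restart.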

2. 451GB of chaos crashed everything

The first attempt? LlamaIndex tried to process everything — including video files, simulations, and backups. The laptop's RAM overflowed within minutes, freezing the entire operating system.

The fix: He built a filtering system that excluded non-text files (videos, images, executables, archives) and converted supported formats (PDF, DOCX, XLSX, PPTX) to plain text first. Result: 54% fewer files to process.
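A minimal sketch of that kind of extension filter, with illustrative allow/deny lists (the article doesn't publish its exact lists):

```python
from pathlib import Path

# Formats with extractable text vs. binary formats to skip.
# Illustrative lists -- the real system's lists were surely longer.
TEXT_LIKE = {".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".md", ".csv"}
SKIP = {".mp4", ".avi", ".exe", ".zip", ".tar", ".iso", ".bak", ".png", ".jpg"}

def should_index(path: Path) -> bool:
    """Keep only files whose extension suggests extractable text."""
    ext = path.suffix.lower()
    if ext in SKIP:
        return False
    return ext in TEXT_LIKE

files = [Path("report.pdf"), Path("demo.mp4"), Path("notes.txt"), Path("sim.bak")]
to_process = [f for f in files if should_index(f)]
print([f.name for f in to_process])  # ['report.pdf', 'notes.txt']
```

Filtering before any parsing happens means videos and backups never reach the memory-hungry extraction step.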

3. Every restart meant starting over

The default indexing stored everything in JSON. With hundreds of gigabytes, any crash or restart meant reprocessing every document from scratch — a process that took days.

The fix: Switched to ChromaDB with a checkpoint system. The indexer processes files in batches of 150, saving progress after each batch. Power goes out? It picks up exactly where it left off.
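The resume logic can be sketched like this; the checkpoint filename and the batch callback are assumptions, not the author's code:

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # assumed filename
BATCH_SIZE = 150                      # batch size from the article

def load_done() -> set:
    """Read the set of already-indexed files, if a checkpoint exists."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def index_all(files, index_batch):
    """Process files in batches, persisting progress after each batch."""
    done = load_done()
    todo = [f for f in files if f not in done]
    for i in range(0, len(todo), BATCH_SIZE):
        batch = todo[i:i + BATCH_SIZE]
        index_batch(batch)  # embed + store this batch (stand-in callback)
        done.update(batch)
        # Saved after every batch: a crash loses at most 150 files of work
        CHECKPOINT.write_text(json.dumps(sorted(done)))

files = [f"doc_{n}.txt" for n in range(400)]
index_all(files, index_batch=lambda batch: None)
print(len(load_done()))  # 400 (on a fresh run)
```

Running `index_all` again after an interruption skips everything already in the checkpoint, which is exactly the "picks up where it left off" behavior described above.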

4. A laptop GPU was 100x too slow

The integrated GPU processed 500MB of documents in 4-5 hours. At that rate, 451GB would take months.
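Quick arithmetic on the article's own figures (roughly 500MB per 4-5 hours) shows why:

```python
# Back-of-envelope estimate using the midpoint of the 4-5 hour figure
total_gb = 451
sample_gb = 0.5      # 500MB processed per sample run
sample_hours = 4.5   # midpoint of "4-5 hours"

hours = total_gb / sample_gb * sample_hours
days = hours / 24
print(f"{hours:.0f} hours = {days:.0f} days")  # 4059 hours = 169 days
```

Well over five months on the laptop GPU, versus 2-3 weeks on the rented server.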

The fix: Rented an NVIDIA RTX 4000 server from Hetzner (20GB of GPU memory). Total rental cost for the multi-week indexing job: €184. That's roughly one day of developer salary in Europe.

5. Corrupt files broke the entire pipeline

Broken PDFs, Word files with corrupted macros, and malformed spreadsheets would crash the indexer and halt all processing.

The fix: Added error tolerance — if one file fails, log it and move on. Don't let one bad file stop 738,000 others.
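A sketch of that per-file error tolerance, with a stand-in extraction function in place of the real PDF/DOCX parsers:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("indexer")

def extract_text(path: str) -> str:
    """Stand-in for the real PDF/DOCX/XLSX extraction step."""
    if "corrupt" in path:
        raise ValueError(f"unreadable file: {path}")
    return f"text of {path}"

def index_tolerant(files):
    """Index every file it can; log and skip the ones that blow up."""
    indexed, failed = [], []
    for path in files:
        try:
            indexed.append(extract_text(path))
        except Exception as exc:  # one bad file must not halt the rest
            log.warning("skipping %s: %s", path, exc)
            failed.append(path)
    return indexed, failed

ok, bad = index_tolerant(["a.pdf", "corrupt.docx", "b.xlsx"])
print(len(ok), len(bad))  # 2 1
```

Keeping a log of the failed paths also gives you a worklist to repair or exclude later, instead of silently losing documents.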

6. 451GB didn't fit on the production server

The 100GB production VM couldn't hold the original files (451GB) plus the search index (54GB) plus the AI model (10GB).

The fix: Moved original documents to Azure cloud storage. The production server only keeps the search index and AI model. When the AI cites a source, it generates a temporary download link to the original file in the cloud.
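Azure implements this with SAS tokens generated through its SDK; as a generic, stdlib-only sketch of the same idea, a server-side secret can sign a link that expires after a few minutes (the domain, paths, and secret below are made up):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # assumed; Azure's storage account key plays this role

def signed_url(blob_path: str, ttl_seconds: int = 600) -> str:
    """Build a time-limited download link the server can later verify."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{blob_path}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "sig": sig})
    return f"https://files.example.com/{blob_path}?{query}"

def verify(blob_path: str, expires: int, sig: str) -> bool:
    """Reject expired links and links whose signature doesn't match."""
    if time.time() > expires:
        return False
    payload = f"{blob_path}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = signed_url("projects/2016/report.pdf")
print(url.split("?")[0])
```

In production you would use the cloud provider's own signed-URL mechanism rather than rolling your own, but the principle is the same: the heavy files live in cheap storage, and the chat server only hands out short-lived pointers to them.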

The final numbers

738,470 document chunks indexed and searchable

54GB search index size

451GB of original source documents

€184 total infrastructure cost (GPU rental)

2-3 weeks of GPU-accelerated indexing time

$0/month ongoing AI costs (everything runs locally)

What the Hacker News crowd added

The HN discussion (288 points, 87 comments) surfaced real-world wisdom from engineers who've built similar systems:

"RAG is not dead — bad RAG is dead." Simple embed-and-search approaches consistently underperform. Combining traditional keyword search with AI-powered semantic search (hybrid search) produces much better results.

Data quality beats everything. The author's own conclusion: "Spend time building the best possible data. If the source is not relevant enough, the LLM won't be able to generate good answers." Multiple commenters agreed — data preparation is 80% of the work.

Stale docs are worse than no docs. Outdated documentation actively misleads AI models, which then give confidently wrong answers. Regular data refreshes are essential.
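The hybrid-search point can be made concrete with reciprocal rank fusion (RRF), a common way to merge a keyword ranking with a semantic ranking into one list; the document IDs here are made up:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each document scores 1/(k + rank) in every list it appears in,
    so items ranked well by BOTH searches rise to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc_a", "doc_b", "doc_c"]   # e.g. BM25 / full-text results
semantic_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. vector-similarity results
print(rrf([keyword_hits, semantic_hits]))  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Note how `doc_b`, ranked decently by both searches, beats `doc_a`, which topped only the keyword list; that is the behavior hybrid-search advocates are pointing at.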

Try this yourself — the starter stack

If you want to build something similar (smaller scale), here's the minimum setup:

# Install Ollama (runs AI models locally)
curl -fsSL https://ollama.com/install.sh | sh

# Download the AI model and embedding model
ollama pull llama3.2:3b
ollama pull nomic-embed-text

# Install the Python libraries
pip install llama-index chromadb flask streamlit

The author's key advice: start with a small folder of documents (under 1GB), get the pipeline working end-to-end, then scale up. Don't try to index everything on day one.

Who this matters to

If you're a small business drowning in years of files nobody can search — this proves you can build a private AI search tool for under €200. If you're a developer considering RAG for the first time — this is the most honest guide available, failures included. If you're evaluating enterprise AI search tools that cost $50K+ per year — this shows what's possible with free, open-source alternatives.

The full article with code snippets and architecture diagrams is on Andros Fenollosa's blog.
