Local RAG Knowledge Base

A self-hosted AI knowledge base using Ollama and AnythingLLM for private, local document retrieval and chat.

Tags: Ollama, AnythingLLM, RAG, LLM

The Idea

I wanted something that felt like ChatGPT, but for my own data, without sending anything to the cloud. I’ve spent years building systems and documenting things, but finding that information later is always harder than it should be. Search works when you know what you're looking for. It falls apart when you don’t.

The idea here is simple: take everything I care about (documents, notes, configs, random files, eventually email and shared drives) and make it searchable through natural language. Ask a question, get an answer, and know where that answer came from.

I also wanted to remove the dependency on API tokens and external services. Not because they’re bad, but because they come with tradeoffs: per-request cost, latency, and most importantly, data leaving your environment. This system is meant to run entirely locally. No usage limits. No surprises.

At a higher level, this is less about building a tool and more about learning how modern AI systems actually work under the hood, especially the kind companies are starting to deploy internally.

The Stack

Core Stack

  • Ollama: local LLM inference
  • AnythingLLM: RAG pipeline + UI
  • Docker: container orchestration
  • Ubuntu Server: host OS

Hardware

  • RTX 3070: primary inference GPU
  • 64GB ECC RAM: memory for embeddings + vector DB
  • NVMe storage: fast retrieval + indexing

Planning

I forced myself to keep this simple. It’s really easy to over-engineer this kind of project before you even know what’s useful. The first version is a single-node system:

  • Ollama for running models locally
  • AnythingLLM for ingestion, embeddings, and chat
  • Docker on Ubuntu to keep everything contained and repeatable

Hardware-wise, I’m using a repurposed server with 64GB of RAM and an RTX 3070 for inference. It’s more than enough to get something real working without turning this into a science project.

I’m being intentional about what gets indexed. I’m not trying to dump an entire filesystem into a vector database and hope for the best. Instead, I’m starting with curated data (documents that actually matter) and expanding from there.

The plan is to layer this out over time:

  • Start with local files and a clean knowledge base
  • Add Synology and structured folders
  • Integrate Google Drive and Gmail with filtering
  • Eventually bring in system data and logs

Longer term, this is also about portability. I want something that can evolve from a home lab into something I could realistically stand up inside a company, on hardware we already own.

Execution

Right now, this is in the build phase. The initial focus is getting the core loop working end-to-end:

  • Ingest documents
  • Chunk and embed them
  • Store them in a vector database
  • Query them through a local model
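The loop above can be sketched end-to-end in a few lines. This is a toy illustration, not the actual AnythingLLM pipeline: the hash-based `embed` function stands in for a real embedding model, and a plain list stands in for the vector database. The shape of the loop (embed chunks, store vectors, rank by cosine similarity at query time) is the part that carries over.

```python
import hashlib
import math

def embed(text: str, dims: int = 256) -> list[float]:
    """Toy embedding: hash lowercase character trigrams into a fixed-size
    vector, then L2-normalize. A stand-in for a real embedding model."""
    text = text.lower()
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# "Vector database": a list of (chunk, embedding) pairs.
index: list[tuple[str, list[float]]] = []

def ingest(chunks: list[str]) -> None:
    for chunk in chunks:
        index.append((chunk, embed(chunk)))

def query(question: str, top_k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

ingest([
    "The backup job runs nightly at 02:00 and writes to the Synology NAS.",
    "Ollama serves models over a local HTTP API.",
    "Docker Compose keeps the containers on one network.",
])
print(query("When do backups run?", top_k=1))
```

In the real system, the retrieved chunks get stuffed into the prompt context and handed to the local model; the only new piece is the generation step at the end.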

That sounds straightforward, but the details matter. Chunking strategy, embedding quality, and filtering have a bigger impact on results than the model itself. A smaller model with good context consistently beats a larger model with bad data.
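To make the chunking point concrete, here is a common baseline: a sliding window with overlap, so a sentence that straddles a chunk boundary is still retrievable from at least one chunk. The sizes are illustrative defaults I picked, not AnythingLLM's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into overlapping character windows.

    Each window is chunk_size characters; consecutive windows share
    `overlap` characters so boundary-spanning sentences survive.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Token-aware or sentence-aware splitting usually retrieves better than raw character windows, but this is the shape every variant shares: window size trades recall against context dilution, and overlap trades storage against boundary loss.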

I’m also paying attention to performance. Local systems don’t have token limits, but they do have real constraints: GPU memory, disk speed, and indexing time. So part of this is figuring out what actually scales and what just looks good on paper.

The goal isn’t to make something flashy. It’s to make something I’d actually use. If I can’t rely on it to answer real questions about my own data, it’s not done.

What I’m Learning

  • Most of the value comes from the data pipeline, not the model
  • Indexing everything is worse than indexing the right things
  • RAG systems are only as good as their context quality
  • Local AI trades cost limits for infrastructure responsibility
  • Simple architectures are easier to evolve than “perfect” ones

What’s Next

  • Improve ingestion with incremental updates instead of full re-indexing
  • Add better metadata and filtering for more accurate retrieval
  • Split data into logical workspaces (personal vs work)
  • Experiment with agents for multi-step workflows
  • Test a multi-node setup using spare server hardware
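The incremental-update idea in the list above can be sketched simply: track a content hash per file and only reprocess files that are new or changed. This is a hypothetical helper of my own, not how AnythingLLM tracks documents internally.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(root: Path, seen: dict[str, str]) -> list[Path]:
    """Return files under root that are new or whose content changed
    since the last run, updating `seen` in place.

    `seen` maps relative path -> sha256 of the last-indexed content;
    persist it between runs (e.g. as JSON) to skip unchanged files.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = str(path.relative_to(root))
        digest = file_digest(path)
        if seen.get(key) != digest:
            seen[key] = digest
            changed.append(path)
    return changed
```

Content hashing is slower than checking modification times but immune to tools that rewrite timestamps without changing bytes, which matters when a re-embed of the whole corpus is the thing you're trying to avoid.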

Why This Matters

This isn’t just a home lab project. It’s a way to understand how AI can actually be used inside an organization without handing data off to third parties. If this works the way I expect, the same pattern can be applied to:

  • internal documentation
  • support knowledge bases
  • operational data
  • institutional knowledge that usually gets lost over time

The end goal is simple: make information easier to access, without adding more tools or complexity.