I Run My Own ChatGPT on a Mini ITX PC and It Works
troysk
May 27, 2026 · 3 min read
I use large language models every day for coding and writing and asking questions I could Google but the AI answers faster, and I was not comfortable with my conversations sitting on OpenAI’s servers where every prompt and every code snippet and every half-baked idea becomes part of their training data. So I moved my AI to my own hardware using Ollama, which runs large language models locally, and Open WebUI, which gives it a ChatGPT-like interface, and together they replace ChatGPT on your own server.
You do not need a datacenter or an expensive GPU to run this. My refurbished Intel i3 8th gen in a Mini ITX cabinet with 32 gigs of RAM runs some genuinely impressive models. The DeepSeek V4 Flash distill, specifically Qwen3.5-9B-DeepSeek-V4-Flash-GGUF from the Jackrong collection, is a 9 billion parameter model that needs about 6 gigs of storage and 10 gigs of RAM at Q4 quantization, and it handles coding and reasoning and writing at a level that is hard to believe is running on hardware that sits in my closet. I run this model daily and it works great.
The Docker setup runs Ollama as one service and Open WebUI as another. Ollama exposes an API on port eleven thousand four hundred thirty-four, and Open WebUI connects to it through the OLLAMA_BASE_URL environment variable. You run docker compose up, pull a model through the Ollama CLI, and Open WebUI gives you a familiar chat interface connected to a model running on your own hardware. The first time I saw that interface responding to my prompts from a machine in my closet I felt like I had hacked the matrix.
Open WebUI lets you switch between different models mid-conversation, which is useful because different models excel at different tasks. My daily driver is the DeepSeek V4 Flash distill running on Ollama, specifically the Qwen3.5-9B-DeepSeek-V4-Flash-GGUF model from the Jackrong collection on Hugging Face. This thing is a 9 billion parameter model distilled from DeepSeek V4 and it punches way above its weight class, handling coding and reasoning and creative writing with a quality that honestly surprised me for something running on consumer hardware.
The RAG feature is what makes this truly useful for daily work. You upload documents like PDFs or text files through the chat interface and ask questions about them, and Open WebUI embeds the document content into the prompt so the model can answer based on it. I use this for summarizing long articles and asking questions about technical documentation and analyzing server logs, and no data ever leaves my server.
Other tools can use Ollama’s API directly for code completion in editors or workflow automation through n8n or any custom integration you build. I have an n8n workflow that summarizes my RSS feeds every morning using Ollama, running at six AM and sending me a digest, all local and all private.
The DeepSeek V4 distill models are genuinely impressive, they pack near-frontier reasoning into a size that runs comfortably on a machine with 32 gigs of RAM. I have replaced ChatGPT entirely with local models and I do not feel like I am compromising on quality. The conversations stay on my hardware, there is no subscription fee, and the models keep getting better every few weeks as the open source community releases new distillations. You do not need a GPU or a datacenter, just a small ITX PC and fifteen minutes to pull your first model.
If this resonates why not subscribe to the newsletter? I write about self-hosting and local AI.
Get New Articles
Weekly guides on self-hosting, privacy, and infrastructure.
No spam. Unsubscribe anytime.