
ComfyUI Local-First
Part 1/2 of the n8n AI Studio Journey Series
The Vision: Why Local-First AI Matters
Let me be honest upfront: this isn't a "weekend project" story. This is months of part-time learning, breaking things, starting over, and slowly building something that actually works. If you've ever tried to self-host GPU-accelerated AI services, you know exactly what I mean.
I'm building what I call an "n8n AI Studio"—a local-first Docker environment that orchestrates multiple AI services to create content pipelines. The goal? Go from a text prompt to a fully produced talking avatar video, entirely on my own hardware, with no API dependencies on external services.
Why local-first?
Privacy: My content, my data, my control
Cost: No per-request charges eating into budgets
Flexibility: Experiment without worrying about rate limits
Learning: Understanding these systems from the ground up
The stack includes ComfyUI for image generation, Chatterbox TTS for voice synthesis, Remotion for video composition, and n8n for workflow automation—all running in Docker containers on a single RTX 3090 24GB GPU.
But before any of that sophisticated orchestration can happen, I need to get ComfyUI working. And that's what this post is about.
The Challenge: Why This Isn't Plug-and-Play
If you've looked at ComfyUI's GitHub and thought "oh, just pull the image and run it," let me share what I've learned: getting the container to start is about 10% of the journey.
Here's what makes this genuinely challenging:
1. GPU Passthrough Complexity
Docker + NVIDIA GPU passthrough + proper CUDA memory management isn't as simple as --gpus all (there's a quick smoke test after this list). You're managing:
NVIDIA Container Toolkit configuration
CUDA memory allocation strategies (my RTX 3090 shares resources across multiple services)
Proper device capabilities (compute,utility,video,graphics)
Security exceptions (GPU access requires no-new-privileges:false)
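Before layering ComfyUI on top, it's worth confirming the toolkit actually hands the card to a container at all. Here's a minimal smoke test, assuming a working NVIDIA driver and Container Toolkit install; the CUDA image tag is just an example:

    # Verify the NVIDIA Container Toolkit exposes the GPU inside a container.
    # The CUDA image tag is illustrative; any recent nvidia/cuda base image works.
    docker run --rm --gpus all \
      -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,graphics \
      nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If nvidia-smi prints the RTX 3090 from inside the container, the passthrough layer is healthy and later failures are configuration problems, not driver problems.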
2. Model Management Nightmare
ComfyUI is nothing without models. And "downloading models" sounds simple until you realize (sample download commands follow this list):
Foundational models alone are 150GB+
Many models are gated (requiring Hugging Face authentication and ToS acceptance)
Download speeds vary wildly (even with HF Transfer acceleration)
Organizing models across drives (I'm using a multi-drive setup: code on SSD, models on HDD)
Verifying integrity without built-in checksums
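To make that concrete, here's roughly what one of those gated downloads looks like on my machine; treat it as a sketch, not a canonical recipe. FLUX.1-dev is the real gated repo (you accept its terms on the Hugging Face model page first), while the destination path is just my external-drive layout:

    # One-time setup: hf_transfer extra plus CLI authentication for gated repos
    pip install -U "huggingface_hub[hf_transfer]"
    huggingface-cli login

    # Pull the main checkpoint with hf_transfer acceleration enabled;
    # adjust --local-dir to wherever your models live.
    export HF_HUB_ENABLE_HF_TRANSFER=1
    huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors \
      --local-dir /media/inky/abc1/models/checkpoints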
3. Multi-Service Architecture
ComfyUI doesn't run in isolation. In my setup, it needs to (a quick connectivity check follows this list):
Communicate with n8n for workflow triggers (internal Docker DNS)
Share output directories with downstream services (bind mounts)
Respect shared GPU memory limits (other services need VRAM too)
Proxy through Nginx for local network access
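Once the services share a Docker network, a quick sanity check is to hit ComfyUI's status route by its DNS name from a throwaway container. A minimal sketch, assuming everything sits on one compose network (the network name below is a placeholder):

    # Reachability check over the shared Docker network.
    # "ai-studio_default" stands in for whatever network your compose project creates.
    docker run --rm --network ai-studio_default curlimages/curl -s \
      http://comfyui-main:8188/system_stats

If that returns JSON, n8n can reach ComfyUI the same way; if it times out, the problem is networking, not the workflow.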
4. Documentation Gaps
The ComfyUI community is amazing, but documentation is scattered:
Official docs focus on desktop installation
Docker examples are outdated or incomplete
Custom node installation is trial-and-error
Multi-model-path configuration is barely documented
I'm not complaining—this is open source at its finest. But it means you're reading logs, debugging YAML syntax, and learning by breaking things.
My Approach: Infrastructure First, Capabilities Second
After several false starts, I settled on a philosophy: build the foundation rock-solid before adding complexity.
Phase 1: Infrastructure (Where I Am Now)
Get ComfyUI running with foundational models:
✅ Docker environment configured (v27.5.1)
✅ NVIDIA Container Toolkit working
✅ Multi-drive model storage setup
✅ Web UI accessible
🔄 Model downloads in progress (as of 11 October 2025); see the model list I've decided on and am busy gathering
⏳ First inference test pending
Phase 2: Text-to-Image Foundation
Master the basics before adding video:
FLUX.1-dev for high-quality T2I generation
LoRA integration for style control
Upscaling workflows (SUPIR + RealESRGAN)
Batch generation pipelines
Phase 3: Image-to-Video Expansion
Add temporal dimension:
WAN 2.2 video generation models
Motion consistency workflows
Frame interpolation techniques
Phase 4: Talking Avatar Integration
The final piece:
InfiniteTalk lip-sync generation
Audio-driven animation (Chatterbox TTS integration)
Full pipeline: text → speech → avatar video
Phase 5: Production Hardening
Make it reliable:
Error recovery mechanisms
Queue management optimization
Monitoring and alerting
Documentation for future me
Why this sequence? Each phase builds on the previous one. If text-to-image doesn't work reliably, adding video generation just multiplies the failure points.
Where We Are Now: The Model Download Marathon
Here's the current state of affairs:
✅ What's Working
Docker Container: ComfyUI boots successfully
GPU Detection: RTX 3090 visible and CUDA operational
Web UI: Accessible at http://192.168.1.13:8188 (direct) and http://192.168.1.13:8080 (via Nginx)
Network Integration: Internal Docker DNS resolves comfyui-main from other containers
Volume Mounts: Both internal storage and external model drive mounted correctly
🔄 In Progress: The Model Download Reality Check
This is where theory met reality.
I knew the models were large. I expected to download 150GB+. What I didn't fully appreciate was:
The Scale of Individual Models:
FLUX.1-dev: 24GB (overnight download)
WAN 2.2 T2I: 20GB (another overnight)
InfiniteTalk 14B: 28GB (you get the idea)
Plus dozens of smaller models (encoders, VAEs, LoRAs, ControlNets)
The Waiting Game: When you kick off a 24GB download at 2AM hoping it'll be done by morning, you need to know:
Is it still downloading or did it stall?
How much progress has been made?
What's the transfer rate?
How many files have actually arrived?
I built a simple monitoring script (monitor_download.sh) that checks directory size, file count, and calculates progress. It's not sophisticated, but it's honest feedback: "Yes, it's still working. No, it's not frozen. Yes, you should go to bed."
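The script isn't anything fancy; it's roughly this shape (a sketch with illustrative defaults, since the real thing only needs du, find, and a timestamp):

    #!/usr/bin/env bash
    # monitor_download.sh (sketch): directory size, file count, rough percent complete.
    # The default path and expected total reflect my setup; pass your own as arguments.
    TARGET_DIR="${1:-/media/inky/abc1/models}"
    EXPECTED_GB="${2:-150}"

    CURRENT_BYTES=$(du -sb "$TARGET_DIR" | cut -f1)
    CURRENT_GB=$(( CURRENT_BYTES / 1024 / 1024 / 1024 ))
    FILE_COUNT=$(find "$TARGET_DIR" -type f | wc -l)
    PERCENT=$(( CURRENT_GB * 100 / EXPECTED_GB ))

    echo "$(date '+%Y-%m-%d %H:%M:%S')  ${CURRENT_GB} GB of ~${EXPECTED_GB} GB (${PERCENT}%), ${FILE_COUNT} files"

Run it under watch -n 300 and it answers the 2AM question without babysitting a terminal.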
The Storage Dance: My setup uses:
Primary SSD: Docker volumes, code, configs
External HDD: Model storage (145GB+ and counting)
ComfyUI needs to know about both. Enter extra_model_paths.yaml—a configuration file that maps multiple storage locations so ComfyUI can discover models across drives.
    main_models:
        base_path: /comfy/mnt/ComfyUI/models/
    external_drive:
        base_path: /media/inky/abc1/models/
        checkpoints: checkpoints/
        loras: loras/
        [... more paths ...]
Getting this right was crucial. Get it wrong and ComfyUI can't find your painstakingly downloaded models.
⏳ What's Next
Once the foundational models finish downloading:
First Inference Test: Can I generate a simple image from a text prompt?
Custom Node Installation: Add WanVideoWrapper, InfiniteTalk, SUPIR
Workflow Testing: Build and export basic T2I workflows
n8n Integration: Connect ComfyUI to automation pipelines
Lessons So Far
1. Patience Is a Technical Skill
Large model downloads aren't a coffee break—they're overnight affairs. Plan accordingly. Start downloads before bed, not before important calls.
2. Monitoring Tools Prevent Anxiety
Building that download monitor wasn't just useful—it was necessary for my sanity. When you're downloading 24GB files, "I think it's working" isn't good enough.
3. Documentation Is Your Future Self's Best Friend
I'm writing this blog post partly for you, but mostly for me in three months when I need to rebuild this because I upgraded Docker and broke everything.
4. The Community Is Gold
Every solution I've implemented builds on someone else's work:
Docker image: mmartial/comfyui-nvidia-docker
Model organization strategies from Reddit threads
CUDA optimization tips from GitHub issues
Nginx configurations adapted from production setups
Standing on the shoulders of giants isn't just a phrase—it's how this gets done.
5. Breaking Things Is Part of Building
I've rebuilt this setup three times. Each iteration taught me something:
Iteration 1: Focused on getting it running (any way possible)
Iteration 2: Focused on making it maintainable (proper configs, secrets management)
Iteration 3 (current): Focused on making it shareable (documentation, reproducibility)
What's Next: The Roadmap Forward
Immediate (This Week)
✅ Complete foundational model downloads
⏳ Run first text-to-image inference test
⏳ Verify model discovery in the ComfyUI web UI
⏳ Export working T2I workflow as JSON
Short-Term (Next 2 Weeks)
Install and test custom nodes (WanVideo, InfiniteTalk, SUPIR)
Build image-to-video workflow
Test upscaling pipeline
Document n8n integration patterns
Next Blog Post
"First Inference: From Prompt to Pixels"
The moment of truth: does it actually work?
Troubleshooting the inevitable issues
Performance benchmarks (how fast is generation on RTX 3090?)
What works, what doesn't, what surprised me
Resources & Following Along
If you're building something similar or want to replicate this setup:
📄 Technical Documentation
Deep-dive technical specs, YAML configs, and command references:
ComfyUI Production Setup Roadmap (Google Doc)
🤖 Model Download Reference
Complete list of models, download commands, and monitoring tools:
ComfyUI Foundational Models
💬 Let's Connect
Building something similar? Hit a different issue? I'd love to hear about your setup. Drop a comment or reach out—this journey is better when it's shared.
Final Thoughts
This isn't a "look how easy it is" post. It's a "here's what I'm learning" post.
I'm documenting this journey because:
Future me will need these notes
Someone else is facing the same challenges right now
Transparency builds better solutions
The goal isn't perfection—it's progress. And right now, progress looks like:
A working ComfyUI container ✅
Models slowly accumulating on disk 🔄
A clear path forward ⏳
Next stop: first inference test. That's when we find out if all this infrastructure work actually... works.
Stay tuned for Part 2: "First Inference: From Prompt to Pixels"
This is Part 1 of a multi-part series documenting the build of a local-first AI content studio. Follow along as I share the wins, the failures, and the lessons learned along the way.
Series Navigation:
Part 1: ComfyUI Foundation (You are here)
Part 2: First Inference (Coming soon)
Part 3: Image-to-Video Pipeline (Planned)
Part 4: Talking Avatar Integration (Planned)
Part 5: Production Lessons Learned (Planned)
