AI development, full-stack engineering, and infrastructure consulting.
From prototype to production — no hype, just results.
I'm Charles Chen — a software engineer and AI developer who runs his own multi-node GPU cluster and builds production AI systems from scratch. My cluster spans 550+ GB of VRAM across NVIDIA RTX PRO 6000 and DGX Spark hardware, interconnected with high-speed QSFP links — all local-first AI infrastructure I built and operate myself.
I don't just talk about AI — I train models, build custom inference servers, write Rust-based runtimes, and deploy multi-agent systems that run 24/7. Whether you need a private LLM deployment, a custom AI pipeline, or someone to architect your GPU infrastructure — I've already done it for myself and I can do it for you.
End-to-end engineering across the AI and software stack.
Custom model training, fine-tuning, RAG pipelines, and LLM integration. I run my own GPU fleet and deploy models locally — no cloud bills, no data leaving your network, full control.
End-to-end application development from database design to polished UIs. I build fast, reliable software with clean architecture that your team can maintain.
On-prem GPU clusters, networking, and deployment automation. I build the same infrastructure I run daily — multi-node GPU fleets with automated health monitoring, auto-restart, and zero-downtime serving.
Architecture reviews, technology strategy, and team mentoring. I help engineering teams make better technical decisions and ship faster.
Extract structured data from PDFs, images, and scanned documents using local OCR and LLM pipelines. No cloud APIs, no data leaving your network — everything runs on-premise with GPU acceleration.
Build local semantic search systems with vector embeddings, hybrid retrieval (BM25 + cosine similarity), and intelligent ranking. Power your apps with meaning-aware search that runs entirely on your hardware.
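Conceptually, the scoring behind that hybrid retrieval is simple: a lexical score (BM25) blended with a semantic one (cosine similarity between embeddings). A minimal Python sketch, with illustrative function names and a hypothetical blend weight:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hybrid_score(bm25: float, cosine: float, alpha: float = 0.5) -> float:
    """Blend a pre-normalized BM25 score with cosine similarity.

    alpha weights the lexical side; (1 - alpha) weights the semantic side.
    The 0.5 default is a placeholder, not a tuned value.
    """
    return alpha * bm25 + (1 - alpha) * cosine
```

A production system adds score normalization and rank-based fusion on top, but every hybrid ranker reduces to some version of this blend.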
Real projects, real infrastructure, all running in production.
Designed and operate a private AI compute cluster spanning 550+ GB VRAM across 3x NVIDIA RTX PRO 6000 (294 GB) and 2x DGX Spark (128 GB each), interconnected via high-speed QSFP links. Runs 28 parallel inference workers across Ollama and vLLM backends with automated fleet management, health monitoring, and auto-restart capabilities. Processes thousands of AI requests daily with zero cloud dependency.
Built a high-performance Rust runtime for orchestrating AI agents with an event-driven architecture using Tokio, broadcast channels, and a custom EventBus. Features a deep research pipeline that analyzes codebases across the GPU fleet, extracting 21 structured fields per file with semantic embeddings and cross-project pattern detection.
Built an autonomous knowledge extraction pipeline that analyzes 50+ open-source repositories across the GPU fleet. Extracts patterns into a 50+ table knowledge base with 542K+ pattern rows and 32K+ semantic embeddings. Uses hybrid search (FTS5 BM25 + vector cosine similarity + RRF fusion) to surface actionable gap analysis, then auto-executes upgrades in prioritized waves.
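The RRF fusion step mentioned above can be sketched in a few lines: each result list votes for its documents with a reciprocal-rank score, and the sums decide the merged order. This is a generic illustration of the technique, not the pipeline's actual code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge multiple ranked result lists.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both the BM25 list and the cosine-similarity list float to the top, without needing the two score scales to be comparable — that scale-independence is why RRF is a popular fusion choice.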
Architected a team of 7 specialized AI agents that coordinate autonomously — handling project management, code generation, testing, security audits, ops, and research. Each agent maintains persistent memory across sessions with a shared knowledge base of 3,300+ entries powering cross-agent learning.
Distributed code analysis system that processes repositories at ~21 files/min across 28 GPU workers running qwen3-coder-next 80B MoE. Extracts 21 structured fields per file — architecture patterns, security signals, performance optimizations, concurrency patterns, and more. Fully resumable with incremental git-diff updates and automated database syncing.
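The resumable, diff-driven design above boils down to checkpointing a content hash per file and only reprocessing what changed. A hypothetical sketch (a real pipeline would derive the change set from `git diff` rather than rehashing content):

```python
import hashlib

def incremental_batch(files: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return the files whose content changed since the last run.

    `files` maps path -> content; `seen` maps path -> content hash from the
    previous run and acts as the resumable checkpoint. Both names are
    illustrative.
    """
    todo = []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if seen.get(path) != digest:
            todo.append(path)
            seen[path] = digest  # checkpoint so a rerun skips this file
    return todo
```

Because the checkpoint is updated per file, a crashed run resumes exactly where it stopped instead of re-analyzing the whole repository.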
Built end-to-end embedding and semantic search pipelines running entirely on local hardware. Uses nomic-embed-text for 768-dim vector generation, Qdrant for vector similarity, and custom RRF fusion combining BM25 full-text with cosine similarity. Powers cross-project pattern discovery, deduplication, and intelligent retrieval across 32K+ embedded documents.
Built and deployed custom MCP servers that extend AI assistants with persistent memory, tool integrations, and context management. Enables AI agents to maintain cross-session knowledge, search structured databases, and coordinate through shared protocols — all running locally with no external API dependencies.
Document processing pipeline combining OCR, layout analysis, and LLM-powered extraction for structured data capture from PDFs, images, and scanned documents. Runs entirely on local GPU infrastructure — no cloud OCR APIs, no data leaving the network. Handles legal documents, invoices, and complex multi-page layouts.
Database architecture and analysis across massive datasets — schema design, query optimization, migration pipelines, and analytics dashboards. Built unified knowledge bases spanning 50+ tables with FTS5 full-text search, B-tree indexing strategies, and cross-database synchronization handling 542K+ rows with sub-second query times.
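SQLite's FTS5 extension, mentioned above, ships with Python's standard `sqlite3` module in most builds. A minimal, self-contained example with an illustrative schema — note that FTS5's `bm25()` returns lower values for better matches:

```python
import sqlite3

# In-memory database with an FTS5 virtual table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("indexing", "B-tree indexes speed up range queries"),
        ("search", "FTS5 ranks matches with the built-in bm25 function"),
        ("sync", "cross-database synchronization keeps replicas consistent"),
    ],
)
# bm25() is lower-is-better, so sort ascending for best matches first.
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("bm25",),
).fetchall()
```

The same virtual-table approach scales to the multi-hundred-thousand-row knowledge bases described above, with ordinary B-tree indexes handling the structured columns alongside the full-text index.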
Automated fleet management system with NVML-based GPU discovery, thread-safe endpoint pooling with quarantine/restore lifecycle, watchdog health monitoring, and auto-restart of crashed inference servers. Tracks GPU temps, VRAM utilization, inference latency, and data quality metrics across the entire cluster in real time.
Full bare-metal server setup, OS installation, driver configuration, and service porting across multi-node environments. NVIDIA driver stacks, CUDA toolkits, networking (Tailscale mesh, QSFP direct links), containerized deployments, and migrating workloads between machines with zero downtime. From racking hardware to running inference.
Full-stack AI platform for a law firm, featuring intelligent document analysis, case research automation, and client management. Combines custom LLM pipelines with domain-specific fine-tuning for legal document understanding and generation.
Full-stack e-commerce platform with product catalog management, inventory tracking, order processing, and customer-facing storefront. Integrated payment processing, search functionality, and responsive design optimized for mobile shoppers.
Professional website design and development — from marketing landing pages to full web applications. Responsive design, SEO optimization, Cloudflare deployment, custom domains, email routing setup, and ongoing maintenance. Fast, clean, and built to convert.
Have a project in mind? I'm always interested in hearing about new challenges — whether it's a greenfield AI project, a complex infrastructure problem, or scaling an existing system.