AI development, full-stack engineering, and infrastructure consulting.
From prototype to production — no hype, just results.
I'm Charles Chen — a software engineer and AI developer who runs his own multi-node GPU cluster and builds production AI systems from scratch. My cluster spans 550+ GB of VRAM across NVIDIA RTX PRO 6000 and DGX Spark hardware, interconnected with high-speed QSFP links — all local-first AI infrastructure I built and operate myself.
I don't just talk about AI — I train models, build custom inference servers, write Rust-based runtimes, and deploy multi-agent systems that run 24/7. Whether you need a private LLM deployment, a custom AI pipeline, or someone to architect your GPU infrastructure — I've already done it for myself and I can do it for you.
End-to-end engineering across the AI and software stack.
Custom model training, fine-tuning, RAG pipelines, and LLM integration. I run my own GPU fleet and deploy models locally — no cloud bills, no data leaving your network, full control.
End-to-end application development from database design to polished UIs. I build fast, reliable software with clean architecture that your team can maintain.
On-prem GPU clusters, networking, and deployment automation. I build the same infrastructure I run daily — multi-node GPU fleets with automated health monitoring, auto-restart, and zero-downtime serving.
Architecture reviews, technology strategy, and team mentoring. I help engineering teams make better technical decisions and ship faster.
Extract structured data from PDFs, images, and scanned documents using local OCR and LLM pipelines. No cloud APIs, no data leaving your network — everything runs on-premise with GPU acceleration.
Build local semantic search systems with vector embeddings, hybrid retrieval (BM25 + cosine similarity), and intelligent ranking. Power your apps with meaning-aware search that runs entirely on your hardware.
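Conceptually, the scoring behind that hybrid retrieval is simple: a lexical score (BM25) blended with a semantic one (cosine similarity between embeddings). A minimal Python sketch, with illustrative function names and a hypothetical blend weight:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hybrid_score(bm25: float, cosine: float, alpha: float = 0.5) -> float:
    """Blend a pre-normalized BM25 score with cosine similarity.

    alpha weights the lexical side; (1 - alpha) weights the semantic side.
    The 0.5 default is a placeholder, not a tuned value.
    """
    return alpha * bm25 + (1 - alpha) * cosine
```

A production system adds score normalization and rank-based fusion on top, but every hybrid ranker reduces to some version of this blend.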
Real projects, real infrastructure, all running in production.
Designed and operate a private AI compute cluster spanning 550+ GB VRAM across 3x NVIDIA RTX PRO 6000 (294 GB) and 2x DGX Spark (128 GB each), interconnected via high-speed QSFP links. Runs 28 parallel inference workers across Ollama and vLLM backends with automated fleet management, health monitoring, and auto-restart capabilities. Processes thousands of AI requests daily with zero cloud dependency.
Built a high-performance Rust runtime for orchestrating AI agents with an event-driven architecture using Tokio, broadcast channels, and a custom EventBus. Features a deep research pipeline that analyzes codebases across the GPU fleet, extracting 21 structured fields per file with semantic embeddings and cross-project pattern detection.
Built an autonomous knowledge extraction pipeline that analyzes 50+ open-source repositories across the GPU fleet. Extracts patterns into a 50+ table knowledge base with 542K+ pattern rows and 32K+ semantic embeddings. Uses hybrid search (FTS5 BM25 + vector cosine similarity + RRF fusion) to surface actionable gap analysis, then auto-executes upgrades in prioritized waves.
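The RRF fusion step mentioned above can be sketched in a few lines: each result list votes for its documents with a reciprocal-rank score, and the sums decide the merged order. This is a generic illustration of the technique, not the pipeline's actual code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge multiple ranked result lists.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both the BM25 list and the cosine-similarity list float to the top, without needing the two score scales to be comparable — that scale-independence is why RRF is a popular fusion choice.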
Architected a team of 7 specialized AI agents that coordinate autonomously — handling project management, code generation, testing, security audits, ops, and research. Each agent maintains persistent memory across sessions with a shared knowledge base of 3,300+ entries powering cross-agent learning.
Distributed code analysis system that processes repositories at ~21 files/min across 28 GPU workers running qwen3-coder-next 80B MoE. Extracts 21 structured fields per file — architecture patterns, security signals, performance optimizations, concurrency patterns, and more. Fully resumable with incremental git-diff updates and automated database syncing.
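The resumable, diff-driven design above boils down to checkpointing a content hash per file and only reprocessing what changed. A hypothetical sketch (a real pipeline would derive the change set from `git diff` rather than rehashing content):

```python
import hashlib

def incremental_batch(files: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return the files whose content changed since the last run.

    `files` maps path -> content; `seen` maps path -> content hash from the
    previous run and acts as the resumable checkpoint. Both names are
    illustrative.
    """
    todo = []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if seen.get(path) != digest:
            todo.append(path)
            seen[path] = digest  # checkpoint so a rerun skips this file
    return todo
```

Because the checkpoint is updated per file, a crashed run resumes exactly where it stopped instead of re-analyzing the whole repository.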
Built end-to-end embedding and semantic search pipelines running entirely on local hardware. Uses nomic-embed-text for 768-dim vector generation, Qdrant for vector similarity, and custom RRF fusion combining BM25 full-text with cosine similarity. Powers cross-project pattern discovery, deduplication, and intelligent retrieval across 32K+ embedded documents.
Built and deployed custom MCP servers that extend AI assistants with persistent memory, tool integrations, and context management. Enables AI agents to maintain cross-session knowledge, search structured databases, and coordinate through shared protocols — all running locally with no external API dependencies.
Document processing pipeline combining OCR, layout analysis, and LLM-powered extraction for structured data capture from PDFs, images, and scanned documents. Runs entirely on local GPU infrastructure — no cloud OCR APIs, no data leaving the network. Handles legal documents, invoices, and complex multi-page layouts.
Database architecture and analysis across massive datasets — schema design, query optimization, migration pipelines, and analytics dashboards. Built unified knowledge bases spanning 50+ tables with FTS5 full-text search, B-tree indexing strategies, and cross-database synchronization handling 542K+ rows with sub-second query times.
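SQLite's FTS5 extension, mentioned above, ships with Python's standard `sqlite3` module in most builds. A minimal, self-contained example with an illustrative schema — note that FTS5's `bm25()` returns lower values for better matches:

```python
import sqlite3

# In-memory database with an FTS5 virtual table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("indexing", "B-tree indexes speed up range queries"),
        ("search", "FTS5 ranks matches with the built-in bm25 function"),
        ("sync", "cross-database synchronization keeps replicas consistent"),
    ],
)
# bm25() is lower-is-better, so sort ascending for best matches first.
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("bm25",),
).fetchall()
```

The same virtual-table approach scales to the multi-hundred-thousand-row knowledge bases described above, with ordinary B-tree indexes handling the structured columns alongside the full-text index.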
Automated fleet management system with NVML-based GPU discovery, thread-safe endpoint pooling with quarantine/restore lifecycle, watchdog health monitoring, and auto-restart of crashed inference servers. Tracks GPU temps, VRAM utilization, inference latency, and data quality metrics across the entire cluster in real time.
Full bare-metal server setup, OS installation, driver configuration, and service porting across multi-node environments. NVIDIA driver stacks, CUDA toolkits, networking (Tailscale mesh, QSFP direct links), containerized deployments, and migrating workloads between machines with zero downtime. From racking hardware to running inference.
Full-stack AI platform for a law firm, featuring intelligent document analysis, case research automation, and client management. Combines custom LLM pipelines with domain-specific fine-tuning for legal document understanding and generation.
Full-stack e-commerce platform with product catalog management, inventory tracking, order processing, and customer-facing storefront. Integrated payment processing, search functionality, and responsive design optimized for mobile shoppers.
Professional website design and development — from marketing landing pages to full web applications. Responsive design, SEO optimization, Cloudflare deployment, custom domains, email routing setup, and ongoing maintenance. Fast, clean, and built to convert.
Have a project in mind? I'm always interested in hearing about new challenges — whether it's a greenfield AI project, a complex infrastructure problem, or scaling an existing system.