Available for new projects

Building intelligent systems
that actually ship.

AI development, full-stack engineering, and infrastructure consulting.
From prototype to production — no hype, just results.

550GB+: GPU VRAM Fleet
Local-First: AI Infrastructure
Full Stack: Prototype to Prod

Engineer first.
Consultant second.

I'm Charles Chen, a software engineer and AI developer. I run my own multi-node GPU cluster and build production AI systems from scratch. I operate 550+ GB of VRAM across NVIDIA RTX PRO 6000 and DGX Spark hardware, interconnected with high-speed QSFP links, all running local-first AI infrastructure I built myself.

I don't just talk about AI — I train models, build custom inference servers, write Rust-based runtimes, and deploy multi-agent systems that run 24/7. Whether you need a private LLM deployment, a custom AI pipeline, or someone to architect your GPU infrastructure — I've already done it for myself and I can do it for you.

On-Prem GPU Fleet: 3x RTX PRO 6000 + 2x DGX Spark
Local-First AI: Private inference, no cloud dependency
Rust + Python: High-performance systems & AI pipelines

Tech Stack

AI / ML

PyTorch, Transformers, LLMs, RAG, Fine-tuning, DPO/RLHF, vLLM, Ollama, CUDA

Backend

Rust, Python, Node.js, Tokio, PostgreSQL, SQLite, Redis

Frontend

React, TypeScript, Next.js, Tailwind

Infrastructure

NVIDIA GPUs, Docker, Linux, Tailscale, Cloudflare, CI/CD

What I build.

End-to-end engineering across the AI and software stack.

AI & Machine Learning

Custom model training, fine-tuning, RAG pipelines, and LLM integration. I run my own GPU fleet and deploy models locally — no cloud bills, no data leaving your network, full control.

  • Custom LLM fine-tuning (DPO, RLHF, SFT)
  • RAG systems & semantic knowledge bases
  • Multi-agent AI architectures
  • Model quantization & inference optimization

Full-Stack Development

End-to-end application development from database design to polished UIs. I build fast, reliable software with clean architecture that your team can maintain.

  • Web applications & APIs
  • Database architecture & optimization
  • Real-time systems & microservices
  • Performance engineering

Infrastructure & DevOps

On-prem GPU clusters, networking, and deployment automation. I build the same infrastructure I run daily — multi-node GPU fleets with automated health monitoring, auto-restart, and zero-downtime serving.

  • Multi-GPU cluster design & deployment
  • High-speed interconnects (QSFP, InfiniBand)
  • Fleet management & automated monitoring
  • Local-first AI infrastructure (no cloud lock-in)

Technical Consulting

Architecture reviews, technology strategy, and team mentoring. I help engineering teams make better technical decisions and ship faster.

  • Architecture review & planning
  • AI strategy & feasibility analysis
  • Code audits & tech debt reduction
  • Team training & mentoring

OCR & Document Processing

Extract structured data from PDFs, images, and scanned documents using local OCR and LLM pipelines. No cloud APIs, no data leaving your network — everything runs on-premise with GPU acceleration.

  • Local OCR with GPU acceleration
  • PDF & image data extraction
  • Legal & financial document processing
  • Layout analysis & table extraction

Embeddings & Semantic Search

Build local semantic search systems with vector embeddings, hybrid retrieval (BM25 + cosine similarity), and intelligent ranking. Power your apps with meaning-aware search that runs entirely on your hardware.

  • Vector embedding pipelines
  • Hybrid search (FTS5 + vector)
  • Knowledge base construction
  • Semantic deduplication & ranking

What I've built.

Real projects, real infrastructure, all running in production.

Rust Runtime

Kaiju — AI Agent Runtime Engine

Built a high-performance Rust runtime for orchestrating AI agents with an event-driven architecture using Tokio, broadcast channels, and a custom EventBus. Features a deep research pipeline that analyzes codebases across the GPU fleet, extracting 21 structured fields per file with semantic embeddings and cross-project pattern detection.

Rust, Tokio, SQLite, PyTorch, Docker

Knowledge Pipeline

Perfect Cell — AI Knowledge Extraction

Built an autonomous knowledge extraction pipeline that analyzes 50+ open-source repositories across the GPU fleet. Extracts patterns into a 50+ table knowledge base with 542K+ pattern rows and 32K+ semantic embeddings. Uses hybrid search (FTS5 BM25 + vector cosine similarity + RRF fusion) to surface actionable gap analysis, then auto-executes upgrades in prioritized waves.

Python, SQLite FTS5, Qdrant, Embeddings, LLMs, Jinja2

Multi-Agent AI

Autonomous AI Agent Team

Architected a team of 7 specialized AI agents that coordinate autonomously — handling project management, code generation, testing, security audits, ops, and research. Each agent maintains persistent memory across sessions with a shared knowledge base of 3,300+ entries powering cross-agent learning.

LLMs, RAG, Python, SQLite, Embeddings

Deep Research

GPU Fleet Code Analysis Engine

Distributed code analysis system that processes repositories at ~21 files/min across 28 GPU workers running qwen3-coder-next 80B MoE. Extracts 21 structured fields per file — architecture patterns, security signals, performance optimizations, concurrency patterns, and more. Fully resumable with incremental git-diff updates and automated database syncing.
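
To put the throughput figure in perspective, here is a back-of-the-envelope calculation. It assumes the ~21 files/min is the aggregate rate across all 28 workers (the text does not say so explicitly), and the 5,000-file repository is a hypothetical size chosen only for illustration.

```python
FILES_PER_MIN = 21  # approximate aggregate rate quoted above (assumed to be fleet-wide)
WORKERS = 28

# Files per minute handled by a single worker.
per_worker_rate = FILES_PER_MIN / WORKERS  # 0.75

# Wall-clock seconds one worker spends on one file.
seconds_per_file = 60 / per_worker_rate  # 80.0


def minutes_to_process(file_count: int, files_per_min: float = FILES_PER_MIN) -> float:
    """Estimated wall-clock minutes for a full pass over file_count files."""
    return file_count / files_per_min


if __name__ == "__main__":
    # Hypothetical 5,000-file repository: roughly 238 minutes, or about 4 hours.
    print(round(minutes_to_process(5000)))
```

At roughly 80 seconds of model time per file, a full pass over a large repository takes hours, which is why resumability and incremental updates matter.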

Python, Ollama, vLLM, nomic-embed-text, SQLite

Embeddings & Search

Local Semantic Search Infrastructure

Built end-to-end embedding and semantic search pipelines running entirely on local hardware. Uses nomic-embed-text for 768-dim vector generation, Qdrant for vector similarity, and custom RRF fusion combining BM25 full-text with cosine similarity. Powers cross-project pattern discovery, deduplication, and intelligent retrieval across 32K+ embedded documents.
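
The fusion step can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion over two ranked lists, not the production implementation: the function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d).

    Each ranking is a list of document IDs, best first. Documents missing
    from a ranking simply contribute nothing for that ranking.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results: one list from BM25 full-text search,
# one from vector cosine similarity.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])

if __name__ == "__main__":
    for doc_id, score in fused:
        print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, whose raw values are on different scales.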

nomic-embed-text, Qdrant, FTS5, Python, CUDA

MCP Integration

Model Context Protocol Servers

Built and deployed custom MCP servers that extend AI assistants with persistent memory, tool integrations, and context management. Enables AI agents to maintain cross-session knowledge, search structured databases, and coordinate through shared protocols — all running locally with no external API dependencies.
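
MCP messages are carried as JSON-RPC 2.0, so the core of such a server is a request dispatcher. The sketch below shows only that generic dispatch pattern with an in-memory store. It does not use the MCP SDK or implement MCP's own method set, and the method names `memory.put` and `memory.get` are hypothetical.

```python
import json

MEMORY = {}  # stand-in for a persistent store such as SQLite


def memory_put(key, value):
    MEMORY[key] = value
    return True


def memory_get(key):
    return MEMORY.get(key)


# Hypothetical method table; a real MCP server exposes the protocol's own methods.
METHODS = {"memory.put": memory_put, "memory.get": memory_get}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the serialized response."""
    request = json.loads(raw)
    response = {"jsonrpc": "2.0", "id": request.get("id")}
    method = METHODS.get(request.get("method"))
    if method is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        response["error"] = {"code": -32601, "message": "Method not found"}
    else:
        response["result"] = method(**request.get("params", {}))
    return json.dumps(response)


if __name__ == "__main__":
    print(handle('{"jsonrpc": "2.0", "id": 1, "method": "memory.put", '
                 '"params": {"key": "link", "value": "qsfp0"}}'))
    print(handle('{"jsonrpc": "2.0", "id": 2, "method": "memory.get", '
                 '"params": {"key": "link"}}'))
```

Swapping the dictionary for a database-backed store is what would give an assistant memory that survives across sessions.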

MCP, Python, SQLite, LLMs, JSON-RPC

OCR & Documents

Local Document Intelligence

Document processing pipeline combining OCR, layout analysis, and LLM-powered extraction for structured data capture from PDFs, images, and scanned documents. Runs entirely on local GPU infrastructure — no cloud OCR APIs, no data leaving the network. Handles legal documents, invoices, and complex multi-page layouts.

OCR, PyTorch, LLMs, Python, CUDA

Database Engineering

Large-Scale Data Analysis & Migration

Database architecture and analysis across massive datasets — schema design, query optimization, migration pipelines, and analytics dashboards. Built unified knowledge bases spanning 50+ tables with FTS5 full-text search, B-tree indexing strategies, and cross-database synchronization handling 542K+ rows with sub-second query times.

SQLite, PostgreSQL, FTS5, Python, Migration Scripts

Fleet Ops

GPU Fleet Management & Monitoring

Automated fleet management system with NVML-based GPU discovery, thread-safe endpoint pooling with quarantine/restore lifecycle, watchdog health monitoring, and auto-restart of crashed inference servers. Tracks GPU temps, VRAM utilization, inference latency, and data quality metrics across the entire cluster in real time.
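
A quarantine/restore lifecycle for inference endpoints can be sketched roughly like this. It is an illustration under assumptions, not the fleet manager itself: the class, the failure threshold, the round-robin policy, and the endpoint URLs are all hypothetical.

```python
import threading


class EndpointPool:
    """Thread-safe pool that quarantines endpoints after repeated failures."""

    def __init__(self, endpoints, max_failures=3):
        self._lock = threading.Lock()
        self._healthy = list(endpoints)
        self._quarantined = []
        self._failures = {url: 0 for url in endpoints}
        self._max_failures = max_failures
        self._next = 0

    def acquire(self):
        """Return the next healthy endpoint in round-robin order."""
        with self._lock:
            if not self._healthy:
                raise RuntimeError("no healthy endpoints available")
            url = self._healthy[self._next % len(self._healthy)]
            self._next += 1
            return url

    def report_failure(self, url):
        """Count a failure; move the endpoint to quarantine at the threshold."""
        with self._lock:
            self._failures[url] += 1
            if self._failures[url] >= self._max_failures and url in self._healthy:
                self._healthy.remove(url)
                self._quarantined.append(url)

    def restore(self, url):
        """Return a quarantined endpoint to service, e.g. after a health check passes."""
        with self._lock:
            if url in self._quarantined:
                self._quarantined.remove(url)
                self._failures[url] = 0
                self._healthy.append(url)

    def snapshot(self):
        with self._lock:
            return list(self._healthy), list(self._quarantined)


if __name__ == "__main__":
    pool = EndpointPool(["http://gpu-0:8000", "http://gpu-1:8000"], max_failures=2)
    pool.report_failure("http://gpu-1:8000")
    pool.report_failure("http://gpu-1:8000")
    print(pool.snapshot())  # gpu-1 is now quarantined
    pool.restore("http://gpu-1:8000")
    print(pool.snapshot())  # both endpoints are healthy again
```

A watchdog thread would typically call `restore` once a quarantined server answers its health probe, so a crashed node rejoins the pool without manual intervention.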

Python, NVML, Tailscale, systemd, Threading

Server & DevOps

Bare-Metal Server Builds & Porting

Full bare-metal server setup, OS installation, driver configuration, and service porting across multi-node environments. Covers NVIDIA driver stacks, CUDA toolkits, networking (Tailscale mesh, QSFP direct links), containerized deployments, and workload migration between machines with zero downtime. From racking hardware to running inference.

Linux, NVIDIA Drivers, Docker, Tailscale, systemd

Legal AI

Legal Practice AI Platform

Full-stack AI platform for a law firm, featuring intelligent document analysis, case research automation, and client management. Combines custom LLM pipelines with domain-specific fine-tuning for legal document understanding and generation.

LLMs, Fine-tuning, React, Node.js, PostgreSQL

E-Commerce

E-Commerce Marketplace Platform

Full-stack e-commerce platform with product catalog management, inventory tracking, order processing, and customer-facing storefront. Integrated payment processing, search functionality, and responsive design optimized for mobile shoppers.

React, Node.js, PostgreSQL, Stripe, Tailwind

Web Development

Custom Websites & Web Applications

Professional website design and development — from marketing landing pages to full web applications. Responsive design, SEO optimization, Cloudflare deployment, custom domains, email routing setup, and ongoing maintenance. Fast, clean, and built to convert.

HTML/CSS/JS, React, Cloudflare Pages, SEO, Responsive

Let's build
something great.

Have a project in mind? I'm always interested in hearing about new challenges — whether it's a greenfield AI project, a complex infrastructure problem, or scaling an existing system.