Cogs

Modular humanoid face platform that can see, hear, speak, emote, recognize people, remember, and dream

Humanoid AI • Computer Vision • Voice Synthesis • Memory Systems • Docker • PostgreSQL • FastAPI

Overview

Cogs is a humanoid face platform that runs locally on a laptop and can be extended to real hardware (servos, mic arrays, depth cameras) without changing the UIs. Its modular architecture comprises 15+ microservices that handle vision, perception, speech, emotion, memory, and conversation.

System Architecture
Modular microservices architecture with Docker Compose

Front End (HDMI-1)

Face UI with Canvas/WebGL display, person bubbles, viseme animations, and status toasts

Back End (HDMI-2)

Control Panel with status dashboard, relationship cards, dream reports, and system metrics

Vision System
  • Face recognition & tracking
  • RealSense depth camera support
  • Person detection & identification
  • Familiarity scoring
Perception System
  • Sound pressure level monitoring
  • Sound source localization
  • ReSpeaker mic array support
  • Real-time audio processing
Dialog & TTS
  • Natural language conversation
  • Viseme timing generation
  • Context-aware responses
  • ElevenLabs voice integration
Memory & Relations
  • PostgreSQL with pgvector
  • Semantic & hybrid search
  • Relationship card system
  • Interaction history tracking
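
To illustrate what "hybrid search" over relationship cards could mean, here is a minimal in-memory sketch that blends a vector similarity score with a keyword-overlap score. It is not the Cogs implementation: in the real stack the vector ranking would presumably run inside PostgreSQL via pgvector. The `Card` type, the `alpha` weight, and the function names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Card:
    """Hypothetical relationship card with a precomputed embedding."""
    person: str
    text: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_overlap(query, text):
    """Fraction of query words that also appear in the card text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_embedding, cards, alpha=0.7, top_k=3):
    """Rank cards by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine(query_embedding, c.embedding)
         + (1 - alpha) * keyword_overlap(query, c.text), c)
        for c in cards
    ]
    # Sort on the score only, so Card objects are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The `alpha` weight trades semantic recall against exact keyword matches; 0.7 is an arbitrary starting point, not a tuned value.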

Dream Mode

Nightly Embedding & Learning
Autonomous learning system that processes memories during idle time

The Dream service schedules re-embedding of relationship cards and updates preferences from conversation transcripts. This allows the system to consolidate memories and improve its understanding of people over time.
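
One way to picture the re-embedding pass is as a job that finds cards edited since their last embedding and refreshes only those. The sketch below assumes a hypothetical `RelationshipCard` record with `updated_at` and `embedded_at` timestamps and takes the embedding function as a parameter, so no particular embeddings API is implied.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class RelationshipCard:
    """Hypothetical card record; field names are assumptions."""
    person_id: str
    summary: str
    updated_at: datetime
    embedded_at: Optional[datetime] = None
    embedding: Optional[list[float]] = None


def stale_cards(cards):
    """Cards never embedded, or edited after their last embedding."""
    return [
        c for c in cards
        if c.embedded_at is None or c.embedded_at < c.updated_at
    ]


def run_dream_pass(cards, embed: Callable[[str], list[float]], now: datetime) -> int:
    """Re-embed every stale card in place and return how many were refreshed."""
    refreshed = 0
    for card in stale_cards(cards):
        card.embedding = embed(card.summary)
        card.embedded_at = now
        refreshed += 1
    return refreshed
```

Passing `embed` in keeps the pass testable with a fake function and leaves the choice of embedding provider outside the scheduling logic.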

OpenAI Embeddings • pgvector Vector DB • Semantic Search

Hardware Development Plan

Two build configurations: fast prototype path and production-ready premium build

Prototype / Budget Build
Fastest path to working demo, fully upgradable

$1.5K–$1.8K

Jetson Orin Nano Super (8 GB)

$249 • Starter brain with JetPack 6

Luxonis OAK-D Pro (Wide)

$399 • Depth+RGB+IR, onboard AI

ReSpeaker Mic Array v2.0

$64 • Far-field + DoA/beamforming

1TB NVMe SSD + Micro Servos

8-12 MG90S servos for face/pan/tilt

✓ Upgradable to AGX Orin later
✓ Full software stack included
✓ 3D-printed head shell

Production / Premium Build
Rich awareness, smoother motion, 24/7 operation

$3.2K–$4.0K

Jetson AGX Orin 64 GB Dev Kit

$1,999 • ~275 TOPS, local RAG/Dream Mode

Luxonis OAK-D Pro + Smart Servos

Dynamixel XL-330/XW with feedback

60 GHz mmWave + VOC/CO₂

Human presence, air quality sensing

2TB NVMe + Production Shell

Shielding, serviceability, premium finish

✓ 360° situational awareness (opt. LiDAR)
✓ Advanced emotion detection
✓ Nightly dream consolidation

Prototype Build - Bill of Materials
Complete component list for budget build (no LiDAR/VOC)
Subsystem | Part / Model | Qty | Est. $ | Notes
--- | --- | --- | --- | ---
Compute | Jetson Orin Nano Super (8 GB) | 1 | 249 | Starter brain; JetPack 6
Storage | NVMe SSD 1 TB (PCIe 4.0) | 1 | 120 | Transcripts, embeddings, logs
Vision | Luxonis OAK-D Pro (Wide) | 1 | 399 | Depth+RGB+IR, onboard AI
Audio In | ReSpeaker Mic Array v2.0 (USB) | 1 | 64 | Far-field + DoA/beamforming
Audio Out | Compact powered speakers (3.5 mm) | 1 | 30 | TTS output
Motion MCU | Teensy 4.1 | 1 | 30 | Real-time servo control
Servo Expander | PCA9685 16-ch (opt.) | 1 | 15 | More PWM channels
Actuators | Micro servos (MG90S class) | 8–12 | ~80 | Face + pan/tilt
Displays | Front LCD ~11.6″ HDMI IPS | 1 | 174 | Face UI
Displays | Rear status touch LCD ~7″ | 1 | 70 | Config/diagnostics
Power (servos) | 5 V 10–20 A regulated PSU | 1 | 75 | Isolated from Jetson PSU
USB / IO | Powered USB 3.0 hub (7-port) | 1 | 50 | Stable power for OAK-D + mics
Env sensors | BME280 + Ambient light sensor | 1 | 15 | Comfort + auto-dim
Presence (opt.) | 60 GHz mmWave human-presence | 1 | 30–45 | Detect nearby in dark
Mechanical | Head shell + mounts (3D-print) | 1 | 250–500 | Brackets, trays, covers
Wiring/Misc | Cables, harness, standoffs, heat-shrink | 1 set | 100 | Build kit
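
The prototype line items above can be totaled with a few lines of code, which makes it easy to see how the optional rows move the estimate. This sketch assumes each Est. $ figure is a total for its row rather than a per-unit price, and treats "~80" as exactly 80; both are assumptions, and the names are illustrative.

```python
# (part, low_usd, high_usd, optional), transcribed from the prototype table.
PROTOTYPE_BOM = [
    ("Jetson Orin Nano Super (8 GB)", 249, 249, False),
    ("NVMe SSD 1 TB", 120, 120, False),
    ("Luxonis OAK-D Pro (Wide)", 399, 399, False),
    ("ReSpeaker Mic Array v2.0", 64, 64, False),
    ("Compact powered speakers", 30, 30, False),
    ("Teensy 4.1", 30, 30, False),
    ("PCA9685 16-ch", 15, 15, True),
    ("Micro servos (MG90S class)", 80, 80, False),  # "~80" taken as 80
    ("Front LCD ~11.6 in", 174, 174, False),
    ("Rear status touch LCD ~7 in", 70, 70, False),
    ("5 V regulated servo PSU", 75, 75, False),
    ("Powered USB 3.0 hub", 50, 50, False),
    ("BME280 + ambient light sensor", 15, 15, False),
    ("60 GHz mmWave presence sensor", 30, 45, True),
    ("Head shell + mounts", 250, 500, False),
    ("Cables, harness, standoffs", 100, 100, False),
]


def bom_total(rows, include_optional=True):
    """Return (low, high) totals in USD, optionally skipping optional rows."""
    kept = [r for r in rows if include_optional or not r[3]]
    low = sum(r[1] for r in kept)
    high = sum(r[2] for r in kept)
    return low, high


if __name__ == "__main__":
    print("with optional rows:", bom_total(PROTOTYPE_BOM))
    print("required rows only:", bom_total(PROTOTYPE_BOM, include_optional=False))
```

Most of the spread between the low and high totals comes from the 3D-printed head shell, so that row is the first place to look when trimming the budget.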
Production Build - Bill of Materials
Complete component list for premium build (no LiDAR)
Subsystem | Part / Model | Qty | Est. $ | Notes
--- | --- | --- | --- | ---
Compute | Jetson AGX Orin 64 GB Dev Kit | 1 | 1,999 | ~275 TOPS; local RAG/Dream Mode
Storage | NVMe SSD 2 TB (PCIe 4.0+) | 1 | 200 | Transcripts, embeddings, snapshots
Vision | Luxonis OAK-D Pro (Wide) | 1 | 399 | Low-light depth; offload inference
Audio In | ReSpeaker Mic Array v2.0 | 1 | 64 | Far-field + DoA
Audio Out | Compact powered speakers | 1 | 30 | TTS
Audio Fusion SW | Whisper + openSMILE + emotion model | SW | | Pipeline (direction+tone+text)
Motion MCU | Teensy 4.1 | 1 | 30 | Deterministic PWM + watchdog
Servo Control | PCA9685 or Dynamixel interface | 1 | 15–60 | Choose per actuator type
Actuators | Smart servos (Dynamixel XL-330/XW) | 8–12 | 400–1,200 | Smoother, feedback, safer
Displays | Front LCD ~11.6″ HDMI IPS | 1 | 174 | Face UI
Displays | Rear status touch LCD ~7″ | 1 | 70 | Relationship cards, logs
Presence | 60 GHz mmWave sensor | 1 | 30–45 | Human presence/breathing
Env sensors | BME280 + Ambient light sensor | 1 | 15 | Comfort + auto-dim
Air quality | VOC + CO₂ module | 1 | 20–60 | Context + safety logging
Situational (opt.) | RPLIDAR A2 (2D 360°) | 1 | 230 | 360° approach awareness
Power (servos) | 5 V 20 A PSU (fused rail) | 1 | 90 | Isolated from Jetson
Networking | Powered USB 3.0 hub + Wi-Fi 6E dongle | 1 each | 50 + 60 | Bandwidth + fast backhaul
Mechanical | Production head shell/brackets | 1 | 400–800 | Shielding, serviceability
Dream Mode | (nightly jobs; included in SW stack) | | | Summarize/prune/re-index
Config B Vision

OAK-D Pro (Wide) for robust low-light depth and onboard AI acceleration

Audio Fusion

ReSpeaker → VAD/DoA/SPL → Whisper ASR → openSMILE/emotion → fused event
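
As a rough sketch of the last step in that pipeline, the code below gates on signal level and bundles direction, text, and emotion into a single event. It is illustrative only: the level is computed in dBFS because absolute SPL needs a calibrated microphone, the energy threshold is a simple stand-in for a real VAD, and the DoA, transcript, and emotion label are taken as inputs rather than produced by ReSpeaker, Whisper, or openSMILE. All names and the default threshold are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


@dataclass
class FusedAudioEvent:
    """Hypothetical fused event: loudness, direction, text, and tone."""
    level_db: float
    doa_degrees: Optional[float]
    text: str
    emotion: Optional[str]


def fuse(samples, doa_degrees, transcript, emotion, vad_threshold_dbfs=-40.0):
    """Return a fused event, or None if the frame is below the level gate."""
    level = rms_dbfs(samples)
    if level < vad_threshold_dbfs:
        return None  # treated as silence by this simple energy gate
    return FusedAudioEvent(level, doa_degrees, transcript, emotion)
```

Keeping the upstream stages as plain inputs means each one can be swapped or mocked without touching the fusion step.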

Dream Mode

Nightly summarization, pruning, vector re-index for long-term relationship memory

Expansion Ready: Headers reserved for LiDAR and VOC/CO₂ sensors. Add them later without rewiring.

Microservices Architecture

Face UI • :8070 • Front-facing display
Control Panel • :8090 • Operator interface
Vision • :8085 • Face recognition
Perception • :8086 • Audio processing
TTS • :8087 • Speech synthesis
Anim • :8089 • Servo control
Relations-PG • :8092 • Relationship DB
Dialog • :8093 • Conversation AI
Dream • :8096 • Memory consolidation
Telemetry • :8095 • System metrics
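
A client or operator script might keep these port assignments in one place and build URLs from them. The sketch below is one way to do that. The lowercase service keys, the `localhost` default, and the `/health` path in the usage example are assumptions and are not confirmed by the service list.

```python
# Port assignments copied from the service list; key names are illustrative.
SERVICE_PORTS = {
    "face-ui": 8070,
    "control-panel": 8090,
    "vision": 8085,
    "perception": 8086,
    "tts": 8087,
    "anim": 8089,
    "relations-pg": 8092,
    "dialog": 8093,
    "dream": 8096,
    "telemetry": 8095,
}


def service_url(name: str, host: str = "localhost", path: str = "/") -> str:
    """Build an HTTP URL for a named service; raises KeyError if unknown."""
    port = SERVICE_PORTS[name]
    return f"http://{host}:{port}/{path.lstrip('/')}"


if __name__ == "__main__":
    # "/health" is a hypothetical endpoint used only for illustration.
    for name in SERVICE_PORTS:
        print(name, service_url(name, path="/health"))
```

Centralizing the mapping makes port collisions easy to spot and keeps host changes (laptop versus Jetson) to a single argument.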

Technology Stack

FastAPI • Backend
PostgreSQL • Database
Docker • Containers
Node.js • Frontend