Cogs

The Affect-Aware AI

Bridging the gap between artificial intelligence and human emotional cognition through local processing and persistent memory

Research Platform | AI Cognition | Emotional Intelligence | 23 Microservices | Memory Systems | Relationship Learning | Privacy-First | Open Research

Research Hypothesis

Can AI develop true cognition and emotional awareness through relationship-building rather than just responding to emotional prompts?

50,000+ lines of code | 40+ docs | Active Research 2024-2025


Note: Live demo access requires credentials. Contact for research access.

Four Pillars of AI Cognition

Cogs explores whether AI can develop genuine cognition through these fundamental principles

Develop Relationships Over Time

• Build familiarity through repeated interactions

• Remember past conversations and contexts

• Recognize behavioral patterns and preferences

Learn from Interactions

• Adapt responses based on feedback

• Build models of individual communication styles

• Understand personal contexts (location, time, activities)

Know WHEN to Respond with Emotion

• Move beyond "emotionally prompted" responses

• Understand appropriate emotional contexts

• Develop genuine emotional intelligence vs. emotional mimicry

Build Contextual Understanding Through Memory

• Store and retrieve relevant past experiences

• Connect current situations to historical patterns

• Use context to inform emotional responses

Multi-Modal Humanoid Interface

Cogs is designed as a modular humanoid face platform that can see, hear, speak, emote, recognize people, remember, and dream. The platform runs locally on a laptop or Jetson Nano and can be extended to real hardware (servos, mic arrays, depth cameras) without changing the UIs.

23 microservices work together to create a cohesive AI system that builds relationships, learns boundaries, and develops contextual emotional awareness over time.

System Architecture

Modular microservices architecture with Docker Compose

Front End (HDMI-1)

Face UI with Canvas/WebGL display, person bubbles, viseme animations, and status toasts

Back End (HDMI-2)

Control Panel with status dashboard, relationship cards, dream reports, and system metrics

Vision System

• Face recognition & tracking
• RealSense depth camera support
• Person detection & identification
• Familiarity scoring

Perception System

• Sound pressure level monitoring
• Sound source localization
• ReSpeaker mic array support
• Real-time audio processing

Dialog & TTS

• Natural language conversation
• Viseme timing generation
• Context-aware responses
• ElevenLabs voice integration

Memory & Relations

• PostgreSQL with pgvector
• Semantic & hybrid search
• Relationship card system
• Interaction history tracking

CESS - Cogs Emotional State System

Cogs maintains its own internal emotional state using four "needs buckets" that decay over time, creating authentic emotional responses based on internal state rather than just mirroring user emotions.

Connection

Social need, loneliness

Decay: -1 per hour

Impact: Initiates conversation when low

Competence

Achievement, confidence

Decay: -0.5 per day

Impact: Seeks validation when low

Curiosity

Growth, learning

Decay: -0.5 per day

Impact: Asks speculative questions when low

Safety

Stability, calmness

Decay: No decay

Impact: More cautious when low
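
The decay rates above can be sketched as a small state object. This is a minimal illustration, not the project's code: the 0-100 scale, the starting value of 50, the `low_needs` threshold, and all names (`NeedsState`, `DECAY_PER_HOUR`) are assumptions made for the example. Only the four buckets and their decay rates come from the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Decay rates from the bucket descriptions, converted to points per hour.
DECAY_PER_HOUR: Dict[str, float] = {
    "connection": 1.0,        # -1 per hour
    "competence": 0.5 / 24,   # -0.5 per day
    "curiosity": 0.5 / 24,    # -0.5 per day
    "safety": 0.0,            # no decay
}


@dataclass
class NeedsState:
    # Assumption: every bucket sits on a 0-100 scale and starts at 50.
    levels: Dict[str, float] = field(
        default_factory=lambda: {name: 50.0 for name in DECAY_PER_HOUR}
    )

    def decay(self, hours: float) -> None:
        """Drain each bucket for the elapsed time, clamping at zero."""
        for name, rate in DECAY_PER_HOUR.items():
            self.levels[name] = max(0.0, self.levels[name] - rate * hours)

    def low_needs(self, threshold: float = 20.0) -> List[str]:
        """Buckets below the (hypothetical) threshold that should shape behaviour."""
        return [name for name, level in self.levels.items() if level < threshold]


state = NeedsState()
state.decay(hours=36)       # a day and a half without interaction
print(state.levels)
print(state.low_needs())    # only Connection has dropped below 20
```

With these assumed starting values, Connection drains far faster than the daily buckets, which matches the idea that loneliness is what prompts Cogs to initiate conversation.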

Derived Moods

Moods such as Flourishing, Satisfied, Anxious, Lonely, Restless, and Drained are derived from Cogs' internal state rather than mirrored from the user's emotions.

Personality Learning System

Cogs learns boundaries and preferences through interaction, developing genuine understanding of what's appropriate rather than following scripted responses.

Boundary Detection

• Recognizes when questions are inappropriate

• Stores boundary violations (severity 1-5 scale)

• Never asks about sensitive topics again

Genuine Apologies

• Understands why something was wrong

• Not scripted responses

• Learns from mistakes

Topic Sensitivity Tracking

• Remembers what topics to avoid

• Tracks sensitivity levels (1-5 scale)

• Adapts communication style

Clarification Queue

• Asks permission before clarifying

• One question at a time

• Finds appropriate moments
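
One way to picture the three rules above (ask permission, one question at a time, wait for an appropriate moment) is a small queue. This is a hedged sketch: the class name, method names, the `good_moment` flag, and the wording of the permission prompt are all hypothetical, and dropping a declined question is an assumption rather than documented behaviour.

```python
from collections import deque
from typing import Deque, Optional


class ClarificationQueue:
    """Holds pending clarifying questions and releases them one at a time."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._in_flight: Optional[str] = None  # question waiting on permission

    def add(self, question: str) -> None:
        self._pending.append(question)

    def request_permission(self, good_moment: bool) -> Optional[str]:
        """Return a permission prompt, or None if now is not the time."""
        if not good_moment or self._in_flight is not None or not self._pending:
            return None
        self._in_flight = self._pending.popleft()
        return "May I ask a quick question about something you said earlier?"

    def answer_permission(self, granted: bool) -> Optional[str]:
        """Release the question if permission was granted; otherwise drop it."""
        question, self._in_flight = self._in_flight, None
        return question if granted else None


queue = ClarificationQueue()
queue.add("Which Sam did you mean?")
queue.add("Is Friday still the deadline?")
first = queue.request_permission(good_moment=True)
blocked = queue.request_permission(good_moment=True)  # None: one at a time
asked = queue.answer_permission(granted=True)
print(first, blocked, asked, sep="\n")
```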

Example Learning Flow

User: "That's none of your business"

→ Cogs stores boundary (severity=4)

→ Apologizes genuinely

→ Never asks about that topic again
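
The flow above could be sketched as a per-person boundary store. This is an illustration under stated assumptions: how Cogs detects the boundary and picks the topic and severity is not described here, so both are passed in as inputs. `BoundaryStore`, `record`, and `may_ask` are hypothetical names, and keying boundaries by person is an assumption.

```python
from typing import Dict


class BoundaryStore:
    """Remembers topics a person has marked as off limits."""

    def __init__(self) -> None:
        # person_id -> {topic: severity on the 1-5 scale}
        self._severity: Dict[str, Dict[str, int]] = {}

    def record(self, person_id: str, topic: str, severity: int) -> None:
        if not 1 <= severity <= 5:
            raise ValueError("severity must be between 1 and 5")
        topics = self._severity.setdefault(person_id, {})
        # Keep the highest severity seen so far for this topic.
        topics[topic] = max(severity, topics.get(topic, 0))

    def may_ask(self, person_id: str, topic: str) -> bool:
        """Any stored boundary blocks the topic, whatever its severity."""
        return topic not in self._severity.get(person_id, {})


store = BoundaryStore()
store.record("alex", "salary", severity=4)   # "That's none of your business"
print(store.may_ask("alex", "salary"))   # False
print(store.may_ask("alex", "hobbies"))  # True
print(store.may_ask("sam", "salary"))    # True: boundaries are per person
```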

Personal Memory Lake (PML)

Privacy-first system for importing decades of personal data. Data is encrypted at rest and never leaves the device.

Raw Data: 60-230 GB

Embeddings: ~15 GB

Local Storage: 100%

Data Sources

• Google Takeout: Gmail, Calendar, Photos, Location, Drive

• Amazon: Orders, Kindle, Alexa

• Facebook: Posts, Messages, Photos

Privacy Architecture

• Encrypted at rest (LUKS volumes)

• Never leaves device

• Multi-layer filtering

• Presence-aware access

• Voice authentication required

Dream Mode

Nightly Consolidation Process

Autonomous learning system that processes memories during idle time

The Dream service schedules re-embedding of relationship cards and updates preferences from conversation transcripts. This allows the system to consolidate memories and improve its understanding of people over time.

• Re-embed relationship cards for semantic search

• Discover patterns across conversations

• Update personality preferences

• Consolidate learnings

• Boost Curiosity emotional bucket

• Optimize memory retrieval
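
A nightly pass along these lines might look like the sketch below. It is a simplified illustration, not the Dream service itself: `placeholder_embed` is a stand-in that hashes text into a short vector so the example runs offline (a real deployment would call an embedding model), and the function names, the boost amount, and the 0-100 cap on Curiosity are assumptions.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def placeholder_embed(text: str, dim: int = 8) -> Vector:
    """Stand-in embedding: SHA-256 bytes scaled to [0, 1]. Not a real model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


def dream_cycle(
    cards: Dict[str, str],
    embed: Callable[[str], Vector],
    curiosity: float,
    boost: float = 5.0,
) -> Tuple[Dict[str, Vector], float]:
    """Re-embed every relationship card and boost the Curiosity bucket."""
    vectors = {person: embed(text) for person, text in cards.items()}
    return vectors, min(100.0, curiosity + boost)


cards = {
    "alex": "Likes hiking. Avoid asking about salary.",
    "sam": "Prefers short answers.",
}
vectors, curiosity = dream_cycle(cards, placeholder_embed, curiosity=40.0)
print(sorted(vectors), curiosity)
```

Passing the embedding function in as a parameter keeps the nightly job testable without network access.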

OpenAI: Embeddings

pgvector: Vector DB

Semantic

Hardware Evolution

From prototype to production-ready humanoid platform

Current Version (2024-2025)

Development platform with Logitech webcam and Jetson Nano

Image: Current Cogs setup with Logitech webcam, display showing animated face interface, and Jetson Nano hardware

• Logitech C922 webcam for face recognition

• Jetson Nano 4GB development board

• Portable display with animated face UI

• 23 microservices running in Docker

• Full software stack operational

Future Vision (2025-2026)

Production humanoid with Luxonis OAK-D and servo-driven expressions

Image: Future Cogs design with Luxonis OAK-D camera, tablet display, Dynamixel servos, and Jetson AGX Orin in custom enclosure

• Luxonis OAK-D Pro depth camera with onboard AI

• Jetson AGX Orin 64GB for local LLM inference

• Tablet display with animated expressions

• Dynamixel smart servos for smooth motion

• Custom enclosure with pan/tilt mechanism

• ReSpeaker mic array for far-field audio

What Makes Cogs Different

Traditional AI Chatbots

❌ Stateless (no memory between sessions)

❌ Emotionally prompted (respond to sentiment in text)

❌ Context-free (no awareness of situation)

❌ Relationship-blind (treat everyone the same)

Cogs Research Platform

✅ Stateful: Remembers all interactions

✅ Emotionally Aware: Decides when emotion is appropriate

✅ Context-Rich: Knows time, place, weather, activity

✅ Relationship-Oriented: Builds individual profiles

✅ Self-Aware: Has internal emotional state (CESS)

✅ Learning: Adapts based on feedback and boundaries

Early Success Signals

524

Interactions Completed

Recognition

Successfully identifies family members and requests introductions for new faces

Baseline Shifts

Adjusts responses when users shift from "Happy" to "Sad" baselines during test scenarios

Loneliness Detection

The "Connection" bucket drains over time, prompting Cogs to express loneliness if it is ignored

"In real life, when you perceive someone else as emotional, your brain combines signals from your eyes, ears, nose, mouth... An AI model would need much more of this information."

— Lisa Feldman Barrett, Neuroscientist

This is why Cogs integrates multi-modal inputs: vision, voice, context (weather, time, location), and memory to build a richer understanding of emotional states.

Current Capabilities (2025)

What Cogs Can Do Now

✅ Recognize faces and remember people

✅ Maintain conversation history with semantic search

✅ Enrich conversations with weather, location, time context

✅ Track relationship development over time

✅ Maintain internal emotional state (CESS)

✅ Learn boundaries and adapt communication style

✅ Ask clarifying questions thoughtfully

✅ Apologize genuinely when crossing boundaries

✅ Consolidate memories during "dream mode"

✅ Import personal data from Google/Facebook/Amazon

✅ Handle phone calls with voice AI (Twilio)

✅ Generate speech with viseme animation

✅ Control physical servos for facial expressions

What's In Development

🚀 Hume AI emotion detection from voice

🚀 Facial expression emotion analysis

🚀 Proactive conversation initiation

🚀 "On this day" memory surfacing

🚀 Cross-source data correlation (PML)

🚀 Voice authentication for privacy

Hardware Development Plan

Two build configurations: fast prototype path and production-ready premium build

Prototype / Budget Build

Fastest path to working demo, fully upgradable

$1.5K - $1.8K

Jetson Orin Nano Super (8 GB)

$249 • Starter brain with JetPack 6

Luxonis OAK-D Pro (Wide)

$399 • Depth+RGB+IR, onboard AI

ReSpeaker Mic Array v2.0

$64 • Far-field + DoA/beamforming

1TB NVMe SSD + Micro Servos

8-12 MG90S servos for face/pan/tilt

✓ Upgradable to AGX Orin later
✓ Full software stack included
✓ 3D-printed head shell

Production / Premium Build

Rich awareness, smoother motion, 24/7 operation

$3.2K - $4.0K

Jetson AGX Orin 64 GB Dev Kit

$1,999 • ~275 TOPS, local RAG/Dream Mode

Luxonis OAK-D Pro + Smart Servos

Dynamixel XL-330/XW with feedback

60 GHz mmWave + VOC/CO₂

Human presence, air quality sensing

2TB NVMe + Production Shell

Shielding, serviceability, premium finish

✓ 360° situational awareness (opt. LiDAR)
✓ Advanced emotion detection
✓ Nightly dream consolidation

Prototype Build - Bill of Materials

Complete component list for budget build (no LiDAR/VOC)

Subsystem	Part / Model	Qty	Est. $	Notes
Compute	Jetson Orin Nano Super (8 GB)	1	249	Starter brain; JetPack 6
Storage	NVMe SSD 1 TB (PCIe 4.0)	1	120	Transcripts, embeddings, logs
Vision	Luxonis OAK-D Pro (Wide)	1	399	Depth+RGB+IR, onboard AI
Audio In	ReSpeaker Mic Array v2.0 (USB)	1	64	Far-field + DoA/beamforming
Audio Out	Compact powered speakers (3.5 mm)	1	30	TTS output
Motion MCU	Teensy 4.1	1	30	Real-time servo control
Servo Expander	PCA9685 16-ch (opt.)	1	15	More PWM channels
Actuators	Micro servos (MG90S class)	8–12	~80	Face + pan/tilt
Displays	Front LCD ~11.6″ HDMI IPS	1	174	Face UI
	Rear status touch LCD ~7″	1	70	Config/diagnostics
Power (servos)	5 V 10–20 A regulated PSU	1	75	Isolated from Jetson PSU
USB / IO	Powered USB 3.0 hub (7-port)	1	50	Stable power for OAK-D + mics
Env sensors	BME280 + Ambient light sensor	1	15	Comfort + auto-dim
Presence (opt.)	60 GHz mmWave human-presence	1	30–45	Detect nearby in dark
Mechanical	Head shell + mounts (3D-print)	1	250–500	Brackets, trays, covers
Wiring/Misc	Cables, harness, standoffs, heat-shrink	1 set	100	Build kit
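
As a quick sanity check, the estimates in the table above can be summed. The sketch below copies each row's "Est. $" value as a (low, high) pair; the shortened part labels and the variable names are only for illustration. Rows marked "(opt.)" are totalled separately so the build can be priced with or without them.

```python
from typing import Dict, Tuple

# (low, high) USD estimates copied from the prototype table.
PROTOTYPE_BOM: Dict[str, Tuple[int, int]] = {
    "Jetson Orin Nano Super (8 GB)": (249, 249),
    "NVMe SSD 1 TB": (120, 120),
    "OAK-D Pro (Wide)": (399, 399),
    "ReSpeaker Mic Array v2.0": (64, 64),
    "Powered speakers": (30, 30),
    "Teensy 4.1": (30, 30),
    "PCA9685 16-ch (opt.)": (15, 15),
    "Micro servos": (80, 80),
    "Front LCD": (174, 174),
    "Rear touch LCD": (70, 70),
    "Servo PSU": (75, 75),
    "Powered USB hub": (50, 50),
    "Env sensors": (15, 15),
    "mmWave presence (opt.)": (30, 45),
    "Head shell + mounts": (250, 500),
    "Wiring/misc": (100, 100),
}

low = sum(lo for lo, _ in PROTOTYPE_BOM.values())
high = sum(hi for _, hi in PROTOTYPE_BOM.values())
optional_low = sum(lo for part, (lo, _) in PROTOTYPE_BOM.items() if "(opt.)" in part)
optional_high = sum(hi for part, (_, hi) in PROTOTYPE_BOM.items() if "(opt.)" in part)

print(f"All parts:        ${low:,} - ${high:,}")
print(f"Without optional: ${low - optional_low:,} - ${high - optional_high:,}")
```

With the figures as listed, the full list comes to $1,751 - $2,016 and $1,706 - $1,956 without the optional rows, which is somewhat above the $1.5K - $1.8K headline range, so both should be read as rough estimates.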

Production Build - Bill of Materials

Complete component list for premium build (LiDAR optional)

Subsystem	Part / Model	Qty	Est. $	Notes
Compute	Jetson AGX Orin 64 GB Dev Kit	1	1,999	~275 TOPS; local RAG/Dream Mode
Storage	NVMe SSD 2 TB (PCIe 4.0+)	1	200	Transcripts, embeddings, snapshots
Vision	Luxonis OAK-D Pro (Wide)	1	399	Low-light depth; offload inference
Audio In	ReSpeaker Mic Array v2.0	1	64	Far-field + DoA
Audio Out	Compact powered speakers	1	30	TTS
Audio Fusion SW	Whisper + openSMILE + emotion model	—	SW	Pipeline (direction+tone+text)
Motion MCU	Teensy 4.1	1	30	Deterministic PWM + watchdog
Servo Control	PCA9685 or Dynamixel interface	1	15–60	Choose per actuator type
Actuators	Smart servos (Dynamixel XL-330/XW)	8–12	400–1,200	Smoother, feedback, safer
Displays	Front LCD ~11.6″ HDMI IPS	1	174	Face UI
	Rear status touch LCD ~7″	1	70	Relationship cards, logs
Presence	60 GHz mmWave sensor	1	30–45	Human presence/breathing
Env sensors	BME280 + Ambient light sensor	1	15	Comfort + auto-dim
Air quality	VOC + CO₂ module	1	20–60	Context + safety logging
Situational (opt.)	RPLIDAR A2 (2D 360°)	1	230	360° approach awareness
Power (servos)	5 V 20 A PSU (fused rail)	1	90	Isolated from Jetson
Networking	Powered USB 3.0 hub + Wi-Fi 6E dongle	1 each	50 + 60	Bandwidth + fast backhaul
Mechanical	Production head shell/brackets	1	400–800	Shielding, serviceability
Dream Mode	(nightly jobs; included in SW stack)	—	—	Summarize/prune/re-index

Config B Vision

OAK-D Pro (Wide) for robust low-light depth and onboard AI acceleration

Audio Fusion

ReSpeaker → VAD/DoA/SPL → Whisper ASR → openSMILE/emotion → fused event
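
The end of that pipeline, the fused event, could be represented roughly as follows. This is a sketch under several assumptions: the field names and the `fuse` helper are hypothetical, the transcript and emotion label are taken as already produced by the ASR and emotion stages, and the level is computed in dBFS (relative to digital full scale) as a stand-in, since true SPL needs a calibrated microphone.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on silence


@dataclass
class FusedAudioEvent:
    direction_deg: float      # direction of arrival from the mic array
    level_dbfs: float         # loudness of the utterance
    transcript: str           # text from the ASR stage
    emotion: Optional[str]    # label from the emotion stage, if any


def fuse(
    samples: Sequence[float],
    direction_deg: float,
    transcript: str,
    emotion: Optional[str] = None,
) -> FusedAudioEvent:
    """Combine direction, tone, and text into one event."""
    return FusedAudioEvent(direction_deg % 360, level_dbfs(samples), transcript, emotion)


event = fuse([0.5, -0.5, 0.5, -0.5], direction_deg=370, transcript="hello", emotion="calm")
print(event)
```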

Dream Mode

Nightly summarization, pruning, vector re-index for long-term relationship memory

Expansion Ready: Headers reserved for LiDAR and VOC/CO₂ sensors. Add them later without rewiring.

Microservices Architecture

Service	Port	Role
Face UI	:8070	Front-facing display
Control Panel	:8090	Operator interface
Vision	:8085	Face recognition
Perception	:8086	Audio processing
TTS	:8087	Speech synthesis
Anim	:8089	Servo control
Relations-PG	:8092	Relationship DB
Dialog	:8093	Conversation AI
Dream	:8096	Memory consolidation
Telemetry	:8095	System metrics
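
For scripts that talk to these services, the ports listed above can be kept in one place. The sketch below is illustrative only: the lowercase service keys, the `localhost` default, and the `/health` path in the usage example are assumptions, not documented endpoints. Only the port numbers come from the list above.

```python
from typing import Dict

# Ports as listed for each service.
SERVICE_PORTS: Dict[str, int] = {
    "face-ui": 8070,
    "control-panel": 8090,
    "vision": 8085,
    "perception": 8086,
    "tts": 8087,
    "anim": 8089,
    "relations-pg": 8092,
    "dialog": 8093,
    "dream": 8096,
    "telemetry": 8095,
}


def service_url(name: str, host: str = "localhost", path: str = "/") -> str:
    """Build a base URL for a service; raises KeyError for unknown names."""
    port = SERVICE_PORTS[name]
    return f"http://{host}:{port}/{path.lstrip('/')}"


def ports_are_unique(ports: Dict[str, int]) -> bool:
    """Two services bound to the same port would clash on one host."""
    return len(set(ports.values())) == len(ports)


print(service_url("vision", path="/health"))  # hypothetical health endpoint
print(ports_are_unique(SERVICE_PORTS))
```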

Technology Stack

FastAPI: Backend

PostgreSQL: Database

Docker: Containers

Node.js: Frontend