Architecture High-Level Design: Surge

Executive Summary

This Architecture High-Level Design establishes the technical foundation for Surge, a mobile application enabling users to discover and complete structured self-improvement challenges. Building upon the Feature Definition's prioritization of the daily check-in experience and streak psychology, this architecture emphasizes responsive local-first interactions, reliable data synchronization, and a foundation that supports future social features without over-engineering the MVP.

The design balances immediate delivery needs with strategic positioning for Phase 2 social capabilities, ensuring the core tracking experience remains fast and satisfying even under poor network conditions.


System Architecture Overview

```mermaid
graph TB
    subgraph "Client Layer"
        MA[Mobile App<br/>React Native]
        LS[(Local Storage<br/>WatermelonDB on SQLite)]
    end

    subgraph "API Layer"
        AG[API Gateway<br/>AWS API Gateway]
        AUTH[Auth Service<br/>Firebase Auth]
    end

    subgraph "Application Layer"
        US[User Service]
        CS[Challenge Service]
        PS[Progress Service]
    end

    subgraph "Data Layer"
        PG[(PostgreSQL<br/>Primary DB)]
        RC[(Redis<br/>Cache/Sessions)]
    end

    subgraph "Supporting Services"
        PN[Push Notifications<br/>Firebase FCM]
        AN[Analytics<br/>Mixpanel/Amplitude]
    end

    MA <--> LS
    MA <--> AG
    AG <--> AUTH
    AG <--> US
    AG <--> CS
    AG <--> PS
    US <--> PG
    CS <--> PG
    PS <--> PG
    PS <--> RC
    US <--> PN
    MA --> AN
```

Technology Stack

Mobile Application

| Layer | Technology | Rationale |
|---|---|---|
| Framework | React Native | Cross-platform efficiency, strong ecosystem, team familiarity |
| State Management | Zustand | Lightweight, minimal boilerplate, excellent for offline-first patterns |
| Local Database | WatermelonDB | Optimized for React Native, built-in sync capabilities, lazy loading |
| Navigation | React Navigation | Industry standard, deep linking support |
| UI Components | Custom + React Native Reanimated | Bold, high-energy design requires custom animations |

Backend Services

| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js with TypeScript | Type safety, shared models with frontend, async performance |
| Framework | Fastify | High performance, schema validation, lower overhead than Express |
| Database | PostgreSQL 15 | ACID compliance, JSON support, proven reliability for user data |
| Cache | Redis | Session management, streak calculations, leaderboard preparation |
| Authentication | Firebase Auth | Rapid implementation, social login support, secure token management |

Infrastructure

| Component | Technology | Rationale |
|---|---|---|
| Cloud Provider | AWS | Comprehensive services, reliable, cost-effective at scale |
| Container Orchestration | AWS ECS Fargate | Serverless containers, reduced operational overhead |
| API Management | AWS API Gateway | Rate limiting, request validation, easy Lambda integration if needed |
| CDN | CloudFront | Challenge asset delivery, global edge caching |
| CI/CD | GitHub Actions | Integrated with codebase, cost-effective, extensive marketplace |

Core Component Design

Challenge Service

Manages the challenge library and challenge definitions. As noted in the Feature Definition, launching with five well-documented challenges takes priority over offering a large library.

```mermaid
classDiagram
    class Challenge {
        +uuid id
        +string name
        +string description
        +int duration_days
        +DailyRequirement[] requirements
        +DifficultyLevel difficulty
        +string[] tags
        +boolean is_active
    }

    class DailyRequirement {
        +uuid id
        +string title
        +string description
        +RequirementType type
        +json validation_rules
        +int sort_order
    }

    class RequirementType {
        <<enumeration>>
        BOOLEAN
        NUMERIC
        DURATION
        PHOTO_PROOF
    }

    Challenge "1" --> "*" DailyRequirement
    DailyRequirement --> RequirementType
```

Design Decisions:

  • Challenge definitions are admin-managed, cached aggressively on device
  • Requirement types support future extensibility (photo proof for social features)
  • Validation rules stored as JSON for flexible challenge-specific logic
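
The TypeScript sketch below shows one way to interpret JSON `validation_rules` for each requirement type when checking a submitted `completion_data` object. The rule fields (`min`, `max`, `minMinutes`) and completion fields (`done`, `value`, `minutes`, `photoUrl`) are hypothetical shapes chosen for illustration. They are not a settled contract.

```typescript
// Mirrors the RequirementType enumeration in the class diagram.
type RequirementType = "BOOLEAN" | "NUMERIC" | "DURATION" | "PHOTO_PROOF";

// Hypothetical shape of the JSON stored in validation_rules.
interface ValidationRules {
  min?: number;        // NUMERIC: inclusive lower bound
  max?: number;        // NUMERIC: inclusive upper bound
  minMinutes?: number; // DURATION: minimum minutes required
}

// Hypothetical shape of completion_data submitted by the client.
interface CompletionData {
  done?: boolean;
  value?: number;
  minutes?: number;
  photoUrl?: string;
}

interface DailyRequirement {
  id: string;
  type: RequirementType;
  validation_rules: ValidationRules;
}

// Returns null when the completion satisfies the requirement, otherwise a human-readable reason.
function validateCompletion(req: DailyRequirement, data: CompletionData): string | null {
  const rules = req.validation_rules;
  switch (req.type) {
    case "BOOLEAN":
      return data.done === true ? null : "task not marked done";
    case "NUMERIC": {
      if (typeof data.value !== "number") return "missing numeric value";
      if (rules.min !== undefined && data.value < rules.min) return `value below ${rules.min}`;
      if (rules.max !== undefined && data.value > rules.max) return `value above ${rules.max}`;
      return null;
    }
    case "DURATION": {
      if (typeof data.minutes !== "number") return "missing duration";
      const required = rules.minMinutes ?? 0;
      return data.minutes >= required ? null : `need at least ${required} minutes`;
    }
    case "PHOTO_PROOF":
      // Only checks that a photo reference exists; moderation or storage checks are out of scope.
      return data.photoUrl ? null : "photo required";
  }
}
```

For example, a NUMERIC requirement with `{ "min": 8 }` rejects `{ "value": 6 }` with `"value below 8"`. The function is pure, so it could live in a shared package used by both the app and the Challenge and Progress services. That fits the shared-models rationale for choosing TypeScript.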

Progress Service

The heart of the user experience. Following the Feature Definition's emphasis on making check-ins "fast, satisfying, and visually rewarding," this service prioritizes write performance and immediate feedback.

```mermaid
classDiagram
    class UserChallenge {
        +uuid id
        +uuid user_id
        +uuid challenge_id
        +date start_date
        +ChallengeStatus status
        +int current_streak
        +int longest_streak
        +int attempt_number
    }

    class DailyProgress {
        +uuid id
        +uuid user_challenge_id
        +date progress_date
        +int day_number
        +boolean is_complete
        +timestamp completed_at
    }

    class TaskCompletion {
        +uuid id
        +uuid daily_progress_id
        +uuid requirement_id
        +json completion_data
        +timestamp completed_at
    }

    class ChallengeStatus {
        <<enumeration>>
        ACTIVE
        COMPLETED
        FAILED
        PAUSED
    }

    UserChallenge "1" --> "*" DailyProgress
    DailyProgress "1" --> "*" TaskCompletion
    UserChallenge --> ChallengeStatus
```

Streak Calculation Strategy:

  • Current streak calculated on write (not read) for instant UI updates
  • Redis maintains hot streak data for active users
  • Nightly batch job reconciles any sync discrepancies
  • attempt_number tracks restarts, supporting Feature Definition's "encouraging restart experience"
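
Here is a minimal sketch of calculating the streak on write. It assumes completion dates arrive as `YYYY-MM-DD` calendar days in the user's timezone. The `last_completed_date` field mirrors `last_date` in the Redis streak entry; it is not a column in the PostgreSQL schema. The rule that a missed day resets the current streak to 1, rather than marking the attempt FAILED, is also an assumption.

```typescript
interface StreakState {
  current_streak: number;
  longest_streak: number;
  last_completed_date: string | null; // hypothetical; mirrors Redis last_date
}

// Whole days between two YYYY-MM-DD dates. Parsing both as UTC midnight
// keeps daylight saving transitions from producing 23- or 25-hour "days".
function daysBetween(from: string, to: string): number {
  const ms = Date.parse(`${to}T00:00:00Z`) - Date.parse(`${from}T00:00:00Z`);
  return Math.round(ms / 86_400_000);
}

// Called when a day is marked complete; returns the updated counters.
function applyDayCompletion(state: StreakState, completedDate: string): StreakState {
  if (state.last_completed_date === null) {
    // First completed day of this attempt.
    return {
      current_streak: 1,
      longest_streak: Math.max(state.longest_streak, 1),
      last_completed_date: completedDate,
    };
  }
  const gap = daysBetween(state.last_completed_date, completedDate);
  if (gap <= 0) return state; // same-day replay or out-of-order write: leave unchanged
  const current = gap === 1 ? state.current_streak + 1 : 1; // assumption: a missed day restarts the count
  return {
    current_streak: current,
    longest_streak: Math.max(state.longest_streak, current),
    last_completed_date: completedDate,
  };
}
```

This sketch ignores writes that arrive out of order, such as an earlier day synced after a later one, instead of recomputing the streak. The nightly reconciliation job would repair that kind of discrepancy from the full `daily_progress` history.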

User Service

Handles authentication, profile management, and notification preferences.

```mermaid
sequenceDiagram
    participant App
    participant Firebase
    participant API
    participant DB

    App->>Firebase: Social Login (Google/Apple)
    Firebase-->>App: ID Token
    App->>API: POST /auth/verify
    API->>Firebase: Verify Token
    Firebase-->>API: User Claims
    API->>DB: Upsert User
    DB-->>API: User Record
    API-->>App: JWT + User Profile
    App->>App: Store JWT Securely
```
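
The verify step in this sequence could be served by a handler along these lines. Dependencies are injected so the flow can be tested without Firebase or PostgreSQL. In practice, `verifyIdToken` would wrap the Firebase Admin SDK's ID-token verification, and `issueJwt` would sign the app's own short-lived token. The names `AuthDeps`, `verifyAndSignIn`, and `UPSERT_USER_SQL` are hypothetical. The upsert relies on the unique `firebase_uid` index.

```typescript
// Claims this sketch reads from a verified Firebase ID token.
interface VerifiedClaims {
  uid: string;
  email?: string;
  name?: string;
}

interface UserRecord {
  id: string;
  firebase_uid: string;
  email: string | null;
  display_name: string | null;
}

// Hypothetical dependencies, injected for testability.
interface AuthDeps {
  verifyIdToken(idToken: string): Promise<VerifiedClaims>; // should reject invalid/expired tokens
  upsertUser(claims: VerifiedClaims): Promise<UserRecord>;  // e.g. runs UPSERT_USER_SQL
  issueJwt(userId: string, ttlSeconds: number): string;     // app-issued access token
}

// PostgreSQL upsert keyed on the unique firebase_uid index
// (gen_random_uuid() is built in from PostgreSQL 13 onward).
const UPSERT_USER_SQL = `
  INSERT INTO users (id, firebase_uid, email, display_name, created_at, updated_at)
  VALUES (gen_random_uuid(), $1, $2, $3, now(), now())
  ON CONFLICT (firebase_uid)
  DO UPDATE SET email = EXCLUDED.email, display_name = EXCLUDED.display_name, updated_at = now()
  RETURNING id, firebase_uid, email, display_name`;

const ACCESS_TOKEN_TTL_SECONDS = 60 * 60; // 1 hour, per the security measures

// POST /auth/verify: verify with Firebase, upsert the user, return JWT + profile.
async function verifyAndSignIn(
  idToken: string,
  deps: AuthDeps,
): Promise<{ jwt: string; user: UserRecord }> {
  const claims = await deps.verifyIdToken(idToken); // a rejection propagates as an auth failure
  const user = await deps.upsertUser(claims);
  return { jwt: deps.issueJwt(user.id, ACCESS_TOKEN_TTL_SECONDS), user };
}
```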

Offline-First Architecture

Given that daily check-ins are the core interaction, the app must function reliably regardless of network conditions.

```mermaid
graph LR
    subgraph "User Action"
        A[Complete Task]
    end

    subgraph "Local First"
        B[Write to Local DB]
        C[Update UI Immediately]
        D[Queue Sync Operation]
    end

    subgraph "Background Sync"
        E{Network Available?}
        F[Sync to Server]
        G[Retry with Backoff]
        H[Conflict Resolution]
    end

    A --> B
    B --> C
    B --> D
    D --> E
    E -->|Yes| F
    E -->|No| G
    F --> H
    G -.->|Retry| E
```
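
The local-first write path in this diagram can be sketched as a single function that never waits on the network. The `LocalFirstDeps` interface and its method names are hypothetical. In the app, they would be backed by a WatermelonDB write, a persisted sync queue, and a Zustand store update.

```typescript
// Payload for one task completion, written locally and later sent to the server.
interface TaskCompletionPayload {
  dailyProgressId: string;
  requirementId: string;
  completionData: unknown;
}

// One queued sync operation; a client-generated opId lets the server skip replays (assumption).
interface SyncOp {
  opId: string;
  payload: TaskCompletionPayload;
  clientUpdatedAt: number; // epoch ms when the user acted
}

// Hypothetical ports for the local database, queue, and UI.
interface LocalFirstDeps {
  saveLocally(payload: TaskCompletionPayload, at: number): Promise<void>;
  enqueue(op: SyncOp): Promise<void>; // must be durable so ops survive app termination
  onLocalChange(): void;               // e.g. re-render the check-in screen
  newId(): string;
  now(): number;
}

// Write to the local DB, update the UI, then queue the sync; no network call here.
async function completeTask(deps: LocalFirstDeps, payload: TaskCompletionPayload): Promise<SyncOp> {
  const at = deps.now();
  await deps.saveLocally(payload, at);
  deps.onLocalChange();
  const op: SyncOp = { opId: deps.newId(), payload, clientUpdatedAt: at };
  await deps.enqueue(op);
  return op;
}
```

A separate background worker drains the queue. That separation keeps a check-in feeling instant even when the device is offline.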

Sync Strategy:

  • All progress writes happen locally first, providing instant feedback
  • Background sync with exponential backoff (5s, 15s, 45s, 2min max)
  • Last-write-wins conflict resolution (acceptable for single-user MVP)
  • Server timestamp used as source of truth for streak calculations
  • Sync queue persisted to survive app termination
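
The sketch below reads the backoff schedule as a factor-of-3 progression starting at 5 seconds and capped at 2 minutes, and pairs it with a last-write-wins merge. The constant names are assumptions. So is the tie-breaking rule: the server copy wins on equal timestamps, consistent with the server acting as source of truth.

```typescript
const BASE_DELAY_MS = 5_000;
const BACKOFF_FACTOR = 3;
const MAX_DELAY_MS = 120_000;

// attempt 0 -> 5s, 1 -> 15s, 2 -> 45s, 3 and later -> capped at 2 min.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * BACKOFF_FACTOR ** attempt, MAX_DELAY_MS);
}

interface Versioned<T> {
  value: T;
  updatedAt: number; // epoch ms; assumed to come from a single clock (e.g. server-assigned)
}

// Last-write-wins: keep whichever copy was written later; ties go to the server copy.
function lastWriteWins<T>(local: Versioned<T>, server: Versioned<T>): Versioned<T> {
  return local.updatedAt > server.updatedAt ? local : server;
}
```

Last-write-wins is only safe when both timestamps come from comparable clocks. That is one reason to keep the server timestamp authoritative for streak math.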

Data Architecture

PostgreSQL Schema (Simplified)

```
-- Core tables with indexes optimized for common queries
users (id, firebase_uid, email, display_name, created_at, updated_at)
  INDEX: firebase_uid (unique), email

challenges (id, name, slug, duration_days, difficulty, is_active, metadata)
  INDEX: slug (unique), is_active

challenge_requirements (id, challenge_id, title, type, validation_rules, sort_order)
  INDEX: challenge_id

user_challenges (id, user_id, challenge_id, start_date, status, current_streak, longest_streak, attempt_number)
  INDEX: (user_id, status), (user_id, challenge_id)

daily_progress (id, user_challenge_id, progress_date, day_number, is_complete, completed_at)
  INDEX: (user_challenge_id, progress_date) UNIQUE

task_completions (id, daily_progress_id, requirement_id, completion_data, completed_at)
  INDEX: daily_progress_id
```

Redis Data Structures

```
# Active user streaks (hot data)
streak:{user_id}:{challenge_id} -> { current: 45, longest: 45, last_date: "2024-01-15" }
TTL: 7 days (refreshed on activity)

# Session management
session:{token} -> { user_id, expires_at, device_id }
TTL: 30 days

# Future: Leaderboard preparation
leaderboard:{challenge_id}:daily -> Sorted Set (user_id -> streak)
```
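
Here is a sketch of writing and reading the hot streak entry. `RedisLike` is a hypothetical minimal interface; its `set(key, value, "EX", seconds)` shape follows ioredis-style calls. The value is stored as a JSON string here, though a Redis hash would work equally well.

```typescript
interface StreakCacheValue {
  current: number;
  longest: number;
  last_date: string; // YYYY-MM-DD
}

// Minimal client surface assumed by this sketch.
interface RedisLike {
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
  get(key: string): Promise<string | null>;
}

const STREAK_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days

const streakKey = (userId: string, challengeId: string): string => `streak:${userId}:${challengeId}`;

// SET with EX stores the value and resets the TTL in one command,
// so each activity write keeps the entry hot for another 7 days.
async function cacheStreak(redis: RedisLike, userId: string, challengeId: string, value: StreakCacheValue): Promise<void> {
  await redis.set(streakKey(userId, challengeId), JSON.stringify(value), "EX", STREAK_TTL_SECONDS);
}

// Returns null on a cache miss; the caller then falls back to PostgreSQL.
async function readStreak(redis: RedisLike, userId: string, challengeId: string): Promise<StreakCacheValue | null> {
  const raw = await redis.get(streakKey(userId, challengeId));
  return raw === null ? null : (JSON.parse(raw) as StreakCacheValue);
}
```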

API Design

RESTful API with consistent patterns. Key endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| /challenges | GET | List active challenges (cached) |
| /challenges/{id} | GET | Challenge details with requirements |
| /me/challenges | GET | User's active and past challenges |
| /me/challenges | POST | Start a new challenge |
| /me/challenges/{id}/progress | GET | Full progress for a challenge |
| /me/challenges/{id}/today | GET | Today's tasks and completion status |
| /me/challenges/{id}/today | PATCH | Update task completions |
| /sync | POST | Batch sync for offline changes |
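
The `today` endpoints could return tasks, day progress, and streak together so the check-in screen needs only a single round trip. The response and PATCH body shapes below, and the `buildTodayResponse` helper, are hypothetical illustrations rather than a defined contract.

```typescript
interface TodayTask {
  requirementId: string;
  title: string;
  completed: boolean;
}

// Hypothetical GET /me/challenges/{id}/today response: everything the check-in screen renders.
interface TodayResponse {
  dayNumber: number;
  durationDays: number;
  tasks: TodayTask[];
  isComplete: boolean;
  currentStreak: number;
  longestStreak: number;
}

// Hypothetical PATCH /me/challenges/{id}/today body: only the changed tasks are sent.
interface TodayPatchBody {
  completions: { requirementId: string; completionData: unknown }[];
}

function buildTodayResponse(
  requirements: { id: string; title: string; sort_order: number }[],
  completedRequirementIds: Set<string>,
  progress: { day_number: number; duration_days: number; current_streak: number; longest_streak: number },
): TodayResponse {
  // Present tasks in the challenge's defined order.
  const tasks = [...requirements]
    .sort((a, b) => a.sort_order - b.sort_order)
    .map((r) => ({ requirementId: r.id, title: r.title, completed: completedRequirementIds.has(r.id) }));
  return {
    dayNumber: progress.day_number,
    durationDays: progress.duration_days,
    tasks,
    isComplete: tasks.length > 0 && tasks.every((t) => t.completed),
    currentStreak: progress.current_streak,
    longestStreak: progress.longest_streak,
  };
}
```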

Response Time Targets:

  • Challenge library: <100ms (CDN cached)
  • Today's progress: <150ms (Redis + DB)
  • Task completion: <200ms (write path)

Security Architecture

```mermaid
graph TB
    subgraph "Client Security"
        A[Secure Token Storage<br/>iOS Keychain / Android Keystore]
        B[Certificate Pinning]
        C[Biometric Lock Option]
    end

    subgraph "Transport Security"
        D[TLS 1.3]
        E[API Gateway Rate Limiting]
    end

    subgraph "Backend Security"
        F[JWT Validation]
        G[Row-Level Security]
        H[Input Validation<br/>Fastify Schemas]
    end

    A --> D
    B --> D
    D --> E
    E --> F
    F --> G
    F --> H
```

Key Security Measures:

  • Firebase Auth handles credential security
  • Short-lived JWTs (1 hour) with refresh token rotation
  • All user data queries filtered by authenticated user_id
  • Rate limiting: 100 requests/minute per user
  • Input validation at API gateway and service layers
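
Here is a minimal per-user fixed-window limiter that illustrates the 100 requests/minute rule. It keeps counters in process memory and never evicts old entries. With two ECS tasks, each task would also count separately, so a true per-user limit would need a shared counter, for example in Redis. `FixedWindowRateLimiter` is a hypothetical name.

```typescript
// At most `limit` requests per user within each aligned window of `windowMs`.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { windowStart: number; count: number }>();

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // true = allow the request; false = respond with HTTP 429.
  allow(userId: string, now: number = Date.now()): boolean {
    const start = now - (now % this.windowMs); // start of the current window
    const w = this.windows.get(userId);
    if (!w || w.windowStart !== start) {
      // First request in a new window resets the count.
      this.windows.set(userId, { windowStart: start, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

A fixed window is the simplest option, but it can admit up to twice the limit across a window boundary. A sliding window or token bucket smooths this at the cost of slightly more state.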

Scalability Considerations

MVP Scale (10K users):

  • Single PostgreSQL instance (db.t3.medium)
  • Single Redis instance (cache.t3.micro)
  • 2 ECS tasks behind ALB
  • Estimated cost: ~$150/month

Growth Path (100K+ users):

  • PostgreSQL read replicas for challenge library queries
  • Redis cluster for streak calculations
  • Horizontal scaling of stateless API services
  • Consider Aurora Serverless for variable load

Social Features Preparation:

  • User ID foreign keys in place for future friend relationships
  • Redis sorted sets ready for leaderboard implementation
  • Progress writes can later be extended to emit events, allowing notification triggers to be added without changing the write path

Deployment Architecture

```mermaid
graph TB
    subgraph "Production"
        ALB[Application Load Balancer]
        ECS1[ECS Task 1]
        ECS2[ECS Task 2]
        RDS[(RDS PostgreSQL)]
        REDIS[(ElastiCache Redis)]
    end

    subgraph "CI/CD"
        GH[GitHub Actions]
        ECR[ECR Registry]
    end

    subgraph "Monitoring"
        CW[CloudWatch]
        SENTRY[Sentry]
    end

    GH --> ECR
    ECR --> ECS1
    ECR --> ECS2
    ALB --> ECS1
    ALB --> ECS2
    ECS1 --> RDS
    ECS2 --> RDS
    ECS1 --> REDIS
    ECS2 --> REDIS
    ECS1 --> CW
    ECS2 --> CW
    ECS1 --> SENTRY
    ECS2 --> SENTRY
```

Deployment Strategy:

  • Blue/green deployments via ECS
  • Database migrations run as pre-deployment task
  • Feature flags for gradual rollouts
  • Automated rollback on health check failures
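
Gradual rollouts usually rely on deterministic bucketing, so a user stays in or out of a flag across sessions and deploys. The sketch below hashes `flag:userId` with 32-bit FNV-1a into 100 buckets. The hash choice and function names are assumptions; this is not a selected feature-flag service.

```typescript
// 32-bit FNV-1a over UTF-16 code units; stable across processes and restarts.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied modulo 2^32
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Enabled for roughly `rolloutPercent`% of users; the same user always gets the same answer.
function isFlagEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  if (rolloutPercent <= 0) return false;
  if (rolloutPercent >= 100) return true;
  const bucket = fnv1a32(`${flag}:${userId}`) % 100; // 0..99
  return bucket < rolloutPercent;
}
```

Each user's bucket is fixed per flag, so raising the percentage only adds users and never disables anyone already enabled.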

Recommendations

  1. Invest in Local-First Infrastructure: The offline-first pattern is critical for the daily check-in experience. Allocate adequate time for sync logic and conflict handling.
  2. Implement Comprehensive Analytics Early: As noted in the Feature Definition, event tracking from day one informs Phase 2 social features. Instrument all user interactions.
  3. Design APIs for Mobile Efficiency: Combine related data in single responses (today's tasks + streak + progress) to minimize round trips.
  4. Plan for Streak Edge Cases: Timezone handling, daylight saving transitions, and missed-day scenarios need careful consideration in both client and server logic.
  5. Prepare Social Foundation Without Building It: Include user_id relationships and Redis structures that support leaderboards, but don't implement social features until validated.
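
One streak edge case worth pinning down is which calendar day a check-in belongs to. The sketch below uses the standard `Intl.DateTimeFormat` API and assumes the user's IANA timezone (e.g. `America/New_York`) is known, for instance as reported by the device. The `users` table has no timezone column, so where that value is stored is still open.

```typescript
// Calendar date (YYYY-MM-DD) of an instant as seen in the given IANA timezone.
function localDateInZone(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  // formatToParts avoids depending on a locale's separator or field order.
  const get = (type: "year" | "month" | "day"): string => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}
```

For example, the instant `2024-01-15T03:00:00Z` is still January 14 in New York but already January 15 in Tokyo. Resolving the day in the user's zone first and then comparing plain dates keeps DST transitions from shifting a check-in into the wrong day.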

Technical Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Offline sync conflicts | Data loss, user frustration | Comprehensive conflict resolution, sync status UI |
| Streak calculation errors | Core feature broken | Server-side validation, reconciliation jobs, audit logs |
| Firebase Auth dependency | Authentication outage | Graceful degradation, cached sessions |
| React Native performance | Poor animation experience | Native driver animations, performance profiling |

Next Steps

  1. Set up infrastructure-as-code (Terraform/CDK) for reproducible environments
  2. Implement authentication flow and user service
  3. Build challenge service with seed data for 5 launch challenges
  4. Develop progress service with offline-first client integration
  5. Establish CI/CD pipeline with staging environment