# Architecture High-Level Design: Surge

## Executive Summary

This Architecture High-Level Design establishes the technical foundation for Surge, a mobile application enabling users to discover and complete structured self-improvement challenges. Building upon the Feature Definition's prioritization of the daily check-in experience and streak psychology, this architecture emphasizes responsive local-first interactions, reliable data synchronization, and a foundation that supports future social features without over-engineering the MVP.

The design balances immediate delivery needs with strategic positioning for Phase 2 social capabilities, ensuring the core tracking experience remains fast and satisfying even under poor network conditions.

***

## System Architecture Overview

```mermaid
graph TB
    subgraph "Client Layer"
        MA[Mobile App<br/>React Native]
        LS[(Local Storage<br/>WatermelonDB/SQLite)]
    end
    subgraph "API Layer"
        AG[API Gateway<br/>AWS API Gateway]
        AUTH[Auth Service<br/>Firebase Auth]
    end
    subgraph "Application Layer"
        US[User Service]
        CS[Challenge Service]
        PS[Progress Service]
    end
    subgraph "Data Layer"
        PG[(PostgreSQL<br/>Primary DB)]
        RC[(Redis<br/>Cache/Sessions)]
    end
    subgraph "Supporting Services"
        PN[Push Notifications<br/>Firebase FCM]
        AN[Analytics<br/>Mixpanel/Amplitude]
    end
    MA <--> LS
    MA <--> AG
    AG <--> AUTH
    AG <--> US
    AG <--> CS
    AG <--> PS
    US <--> PG
    CS <--> PG
    PS <--> PG
    PS <--> RC
    US <--> PN
    MA --> AN
```

***

## Technology Stack

### Mobile Application

| Layer | Technology | Rationale |
| ----- | ---------- | --------- |
| Framework | React Native | Cross-platform efficiency, strong ecosystem, team familiarity |
| State Management | Zustand | Lightweight, minimal boilerplate, excellent for offline-first patterns |
| Local Database | WatermelonDB | Optimized for React Native, built-in sync capabilities, lazy loading |
| Navigation | React Navigation | Industry standard, deep linking support |
| UI Components | Custom + React Native Reanimated | Bold, high-energy design requires custom animations |

### Backend Services

| Component | Technology | Rationale |
| --------- | ---------- | --------- |
| Runtime | Node.js with TypeScript | Type safety, shared models with frontend, async performance |
| Framework | Fastify | High performance, schema validation, lower overhead than Express |
| Database | PostgreSQL 15 | ACID compliance, JSON support, proven reliability for user data |
| Cache | Redis | Session management, streak calculations, leaderboard preparation |
| Authentication | Firebase Auth | Rapid implementation, social login support, secure token management |

### Infrastructure

| Component | Technology | Rationale |
| --------- | ---------- | --------- |
| Cloud Provider | AWS | Comprehensive services, reliable, cost-effective at scale |
| Container Orchestration | AWS ECS Fargate | Serverless containers, reduced operational overhead |
| API Management | AWS API Gateway | Rate limiting, request validation, easy Lambda integration if needed |
| CDN | CloudFront | Challenge asset delivery, global edge caching |
| CI/CD | GitHub Actions | Integrated with codebase, cost-effective, extensive marketplace |

***

## Core Component Design

### Challenge Service

Manages the challenge library and challenge definitions.
As noted in Feature Definition, launching with 5 well-documented challenges is prioritized over quantity.

```mermaid
classDiagram
    class Challenge {
        +uuid id
        +string name
        +string description
        +int duration_days
        +DailyRequirement[] requirements
        +DifficultyLevel difficulty
        +string[] tags
        +boolean is_active
    }
    class DailyRequirement {
        +uuid id
        +string title
        +string description
        +RequirementType type
        +json validation_rules
        +int sort_order
    }
    class RequirementType {
        <<enumeration>>
        BOOLEAN
        NUMERIC
        DURATION
        PHOTO_PROOF
    }
    Challenge "1" --> "*" DailyRequirement
    DailyRequirement --> RequirementType
```

**Design Decisions:**

* Challenge definitions are admin-managed, cached aggressively on device
* Requirement types support future extensibility (photo proof for social features)
* Validation rules stored as JSON for flexible challenge-specific logic

### Progress Service

The heart of the user experience. Following Feature Definition's emphasis on making check-ins "fast, satisfying, and visually rewarding," this service prioritizes write performance and immediate feedback.
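As a minimal sketch of that write path, the streak can be updated at check-in time rather than recomputed on read, so the UI can react immediately; the function and field names below are illustrative assumptions, not the actual service code.

```typescript
// Hypothetical write-time streak update (names are assumptions).
// Dates are ISO "YYYY-MM-DD" strings compared in UTC; real code must
// also handle timezones and DST, as flagged in the recommendations.

interface StreakState {
  current: number;
  longest: number;
  lastDate: string | null; // last completed day, e.g. "2024-01-15"
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function applyCheckIn(state: StreakState, today: string): StreakState {
  if (state.lastDate === today) return state; // idempotent re-check of same day

  const consecutive =
    state.lastDate !== null &&
    Date.parse(today) - Date.parse(state.lastDate) === ONE_DAY_MS;

  // A gap resets the streak to 1 (today counts); a consecutive day extends it.
  const current = consecutive ? state.current + 1 : 1;
  return { current, longest: Math.max(state.longest, current), lastDate: today };
}
```

Because the result is computed locally at write time, the new streak can animate on screen instantly while the same value syncs to Redis and PostgreSQL in the background.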
```mermaid
classDiagram
    class UserChallenge {
        +uuid id
        +uuid user_id
        +uuid challenge_id
        +date start_date
        +ChallengeStatus status
        +int current_streak
        +int longest_streak
        +int attempt_number
    }
    class DailyProgress {
        +uuid id
        +uuid user_challenge_id
        +date progress_date
        +int day_number
        +boolean is_complete
        +timestamp completed_at
    }
    class TaskCompletion {
        +uuid id
        +uuid daily_progress_id
        +uuid requirement_id
        +json completion_data
        +timestamp completed_at
    }
    class ChallengeStatus {
        <<enumeration>>
        ACTIVE
        COMPLETED
        FAILED
        PAUSED
    }
    UserChallenge "1" --> "*" DailyProgress
    DailyProgress "1" --> "*" TaskCompletion
    UserChallenge --> ChallengeStatus
```

**Streak Calculation Strategy:**

* Current streak calculated on write (not read) for instant UI updates
* Redis maintains hot streak data for active users
* Nightly batch job reconciles any sync discrepancies
* `attempt_number` tracks restarts, supporting Feature Definition's "encouraging restart experience"

### User Service

Handles authentication, profile management, and notification preferences.

```mermaid
sequenceDiagram
    participant App
    participant Firebase
    participant API
    participant DB
    App->>Firebase: Social Login (Google/Apple)
    Firebase-->>App: ID Token
    App->>API: POST /auth/verify
    API->>Firebase: Verify Token
    Firebase-->>API: User Claims
    API->>DB: Upsert User
    DB-->>API: User Record
    API-->>App: JWT + User Profile
    App->>App: Store JWT Securely
```

***

## Offline-First Architecture

Given that daily check-ins are the core interaction, the app must function reliably regardless of network conditions.
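The queue-and-retry behavior at the heart of this can be sketched as follows; the queue class and names are simplified assumptions (WatermelonDB's built-in sync would replace much of this in practice), with the retry intervals taken from the sync strategy.

```typescript
// Simplified sketch of the retry queue behind local-first writes.
// Illustrative names only; not the production sync implementation.

const BACKOFF_MS = [5_000, 15_000, 45_000, 120_000]; // 5s, 15s, 45s, 2min cap

function backoffDelay(attempt: number): number {
  // attempt 0 -> 5s, attempt 1 -> 15s, ... capped at 2 minutes
  return BACKOFF_MS[Math.min(attempt, BACKOFF_MS.length - 1)];
}

interface SyncOp {
  id: string;
  payload: unknown;
  attempts: number;
}

class SyncQueue {
  private ops: SyncOp[] = []; // persisted in real code to survive app kill

  enqueue(op: SyncOp): void {
    this.ops.push(op);
  }

  get pending(): number {
    return this.ops.length;
  }

  // Try to push every queued op; failed ops stay queued with attempts + 1
  // so the caller can schedule the next drain after backoffDelay(attempts).
  async drain(push: (op: SyncOp) => Promise<boolean>): Promise<void> {
    const remaining: SyncOp[] = [];
    for (const op of this.ops) {
      if (await push(op)) continue; // acknowledged by server
      op.attempts += 1;
      remaining.push(op);
    }
    this.ops = remaining;
  }
}
```

The local write and UI update happen before `enqueue`, so the check-in always feels instant; only the server push is deferred.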
```mermaid
graph LR
    subgraph "User Action"
        A[Complete Task]
    end
    subgraph "Local First"
        B[Write to Local DB]
        C[Update UI Immediately]
        D[Queue Sync Operation]
    end
    subgraph "Background Sync"
        E{Network Available?}
        F[Sync to Server]
        G[Retry with Backoff]
        H[Conflict Resolution]
    end
    A --> B
    B --> C
    B --> D
    D --> E
    E -->|Yes| F
    E -->|No| G
    F --> H
    G -.->|Retry| E
```

**Sync Strategy:**

* All progress writes happen locally first, providing instant feedback
* Background sync with exponential backoff (5s, 15s, 45s, 2min max)
* Last-write-wins conflict resolution (acceptable for single-user MVP)
* Server timestamp used as source of truth for streak calculations
* Sync queue persisted to survive app termination

***

## Data Architecture

### PostgreSQL Schema (Simplified)

```sql
-- Core tables with indexes optimized for common queries

users (id, firebase_uid, email, display_name, created_at, updated_at)
  INDEX: firebase_uid (unique), email

challenges (id, name, slug, duration_days, difficulty, is_active, metadata)
  INDEX: slug (unique), is_active

challenge_requirements (id, challenge_id, title, type, validation_rules, sort_order)
  INDEX: challenge_id

user_challenges (id, user_id, challenge_id, start_date, status, current_streak, attempt_number)
  INDEX: (user_id, status), (user_id, challenge_id)

daily_progress (id, user_challenge_id, progress_date, day_number, is_complete, completed_at)
  INDEX: (user_challenge_id, progress_date) UNIQUE

task_completions (id, daily_progress_id, requirement_id, completion_data, completed_at)
  INDEX: daily_progress_id
```

### Redis Data Structures

```
# Active user streaks (hot data)
streak:{user_id}:{challenge_id} -> { current: 45, longest: 45, last_date: "2024-01-15" }
TTL: 7 days (refreshed on activity)

# Session management
session:{token} -> { user_id, expires_at, device_id }
TTL: 30 days

# Future: Leaderboard preparation
leaderboard:{challenge_id}:daily -> Sorted Set (user_id -> streak)
```

***

## API Design

RESTful API with consistent patterns.
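One of those patterns is combining related data into a single payload so the check-in screen needs only one round trip. The response shape below is an illustrative assumption, not the final contract; field names are invented for the sketch.

```typescript
// Hypothetical combined payload for the "today" endpoint.
// Field names are assumptions for illustration, not the final API contract.

interface TodayResponse {
  day_number: number;
  tasks: { requirement_id: string; title: string; is_complete: boolean }[];
  streak: { current: number; longest: number };
  progress_pct: number; // completed days as a percentage of duration_days
}

function buildTodayResponse(
  dayNumber: number,
  durationDays: number,
  completedDays: number,
  tasks: TodayResponse["tasks"],
  streak: TodayResponse["streak"],
): TodayResponse {
  return {
    day_number: dayNumber,
    tasks,
    streak,
    progress_pct: Math.round((completedDays / durationDays) * 100),
  };
}
```

Assembling tasks, streak, and progress server-side keeps the client to one request per screen, which supports the response-time targets listed after the endpoint table.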
Key endpoints:

| Endpoint | Method | Purpose |
| -------- | ------ | ------- |
| `/challenges` | GET | List active challenges (cached) |
| `/challenges/{id}` | GET | Challenge details with requirements |
| `/me/challenges` | GET | User's active and past challenges |
| `/me/challenges` | POST | Start a new challenge |
| `/me/challenges/{id}/progress` | GET | Full progress for a challenge |
| `/me/challenges/{id}/today` | GET | Today's tasks and completion status |
| `/me/challenges/{id}/today` | PATCH | Update task completions |
| `/sync` | POST | Batch sync for offline changes |

**Response Time Targets:**

* Challenge library: <100ms (CDN cached)
* Today's progress: <150ms (Redis + DB)
* Task completion: <200ms (write path)

***

## Security Architecture

```mermaid
graph TB
    subgraph "Client Security"
        A[Secure Token Storage<br/>iOS Keychain / Android Keystore]
        B[Certificate Pinning]
        C[Biometric Lock Option]
    end
    subgraph "Transport Security"
        D[TLS 1.3]
        E[API Gateway Rate Limiting]
    end
    subgraph "Backend Security"
        F[JWT Validation]
        G[Row-Level Security]
        H[Input Validation<br/>Fastify Schemas]
    end
    A --> D
    B --> D
    D --> E
    E --> F
    F --> G
    F --> H
```

**Key Security Measures:**

* Firebase Auth handles credential security
* Short-lived JWTs (1 hour) with refresh token rotation
* All user data queries filtered by authenticated user\_id
* Rate limiting: 100 requests/minute per user
* Input validation at API gateway and service layers

***

## Scalability Considerations

**MVP Scale (10K users):**

* Single PostgreSQL instance (db.t3.medium)
* Single Redis instance (cache.t3.micro)
* 2 ECS tasks behind ALB
* Estimated cost: \~$150/month

**Growth Path (100K+ users):**

* PostgreSQL read replicas for challenge library queries
* Redis cluster for streak calculations
* Horizontal scaling of stateless API services
* Consider Aurora Serverless for variable load

**Social Features Preparation:**

* User ID foreign keys in place for future friend relationships
* Redis sorted sets ready for leaderboard implementation
* Event-driven architecture allows adding notification triggers

***

## Deployment Architecture

```mermaid
graph TB
    subgraph "Production"
        ALB[Application Load Balancer]
        ECS1[ECS Task 1]
        ECS2[ECS Task 2]
        RDS[(RDS PostgreSQL)]
        REDIS[(ElastiCache Redis)]
    end
    subgraph "CI/CD"
        GH[GitHub Actions]
        ECR[ECR Registry]
    end
    subgraph "Monitoring"
        CW[CloudWatch]
        SENTRY[Sentry]
    end
    GH --> ECR
    ECR --> ECS1
    ECR --> ECS2
    ALB --> ECS1
    ALB --> ECS2
    ECS1 --> RDS
    ECS2 --> RDS
    ECS1 --> REDIS
    ECS2 --> REDIS
    ECS1 --> CW
    ECS1 --> SENTRY
```

**Deployment Strategy:**

* Blue/green deployments via ECS
* Database migrations run as pre-deployment task
* Feature flags for gradual rollouts
* Automated rollback on health check failures

***

## Recommendations

1. **Invest in Local-First Infrastructure**: The offline-first pattern is critical for the daily check-in experience. Allocate adequate time for sync logic and conflict handling.

2. **Implement Comprehensive Analytics Early**: As noted in Feature Definition, event tracking from day one informs Phase 2 social features.
   Instrument all user interactions.

3. **Design APIs for Mobile Efficiency**: Combine related data in single responses (today's tasks + streak + progress) to minimize round trips.

4. **Plan for Streak Edge Cases**: Timezone handling, daylight saving transitions, and missed-day scenarios need careful consideration in both client and server logic.

5. **Prepare Social Foundation Without Building It**: Include user\_id relationships and Redis structures that support leaderboards, but don't implement social features until validated.

***

## Technical Risks & Mitigations

| Risk | Impact | Mitigation |
| ---- | ------ | ---------- |
| Offline sync conflicts | Data loss, user frustration | Comprehensive conflict resolution, sync status UI |
| Streak calculation errors | Core feature broken | Server-side validation, reconciliation jobs, audit logs |
| Firebase Auth dependency | Authentication outage | Graceful degradation, cached sessions |
| React Native performance | Poor animation experience | Native driver animations, performance profiling |

***

## Next Steps

1. Set up infrastructure-as-code (Terraform/CDK) for reproducible environments
2. Implement authentication flow and user service
3. Build challenge service with seed data for 5 launch challenges
4. Develop progress service with offline-first client integration
5. Establish CI/CD pipeline with staging environment