Architecture High-Level Design: Surge
Executive Summary
This Architecture High-Level Design establishes the technical foundation for Surge, a mobile application enabling users to discover and complete structured self-improvement challenges. Building upon the Feature Definition's prioritization of the daily check-in experience and streak psychology, this architecture emphasizes responsive local-first interactions, reliable data synchronization, and a foundation that supports future social features without over-engineering the MVP.
The design balances immediate delivery needs with strategic positioning for Phase 2 social capabilities, ensuring the core tracking experience remains fast and satisfying even under poor network conditions.
System Architecture Overview
```mermaid
graph TB
subgraph "Client Layer"
MA[Mobile App<br/>React Native]
LS[(Local Storage<br/>SQLite/Realm)]
end
subgraph "API Layer"
AG[API Gateway<br/>AWS API Gateway]
AUTH[Auth Service<br/>Firebase Auth]
end
subgraph "Application Layer"
US[User Service]
CS[Challenge Service]
PS[Progress Service]
end
subgraph "Data Layer"
PG[(PostgreSQL<br/>Primary DB)]
RC[(Redis<br/>Cache/Sessions)]
end
subgraph "Supporting Services"
PN[Push Notifications<br/>Firebase FCM]
AN[Analytics<br/>Mixpanel/Amplitude]
end
MA <--> LS
MA <--> AG
AG <--> AUTH
AG <--> US
AG <--> CS
AG <--> PS
US <--> PG
CS <--> PG
PS <--> PG
PS <--> RC
US <--> PN
MA --> AN
```
Technology Stack
Mobile Application
| Layer | Technology | Rationale |
|---|---|---|
| Framework | React Native | Cross-platform efficiency, strong ecosystem, team familiarity |
| State Management | Zustand | Lightweight, minimal boilerplate, excellent for offline-first patterns |
| Local Database | WatermelonDB | Optimized for React Native, built-in sync capabilities, lazy loading |
| Navigation | React Navigation | Industry standard, deep linking support |
| UI Components | Custom + React Native Reanimated | Bold, high-energy design requires custom animations |
Backend Services
| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js with TypeScript | Type safety, shared models with frontend, async performance |
| Framework | Fastify | High performance, schema validation, lower overhead than Express |
| Database | PostgreSQL 15 | ACID compliance, JSON support, proven reliability for user data |
| Cache | Redis | Session management, streak calculations, leaderboard preparation |
| Authentication | Firebase Auth | Rapid implementation, social login support, secure token management |
Infrastructure
| Component | Technology | Rationale |
|---|---|---|
| Cloud Provider | AWS | Comprehensive services, reliable, cost-effective at scale |
| Container Orchestration | AWS ECS Fargate | Serverless containers, reduced operational overhead |
| API Management | AWS API Gateway | Rate limiting, request validation, easy Lambda integration if needed |
| CDN | CloudFront | Challenge asset delivery, global edge caching |
| CI/CD | GitHub Actions | Integrated with codebase, cost-effective, extensive marketplace |
Core Component Design
Challenge Service
Manages the challenge library and challenge definitions. As noted in the Feature Definition, launching with five well-documented challenges is prioritized over quantity.
```mermaid
classDiagram
class Challenge {
+uuid id
+string name
+string description
+int duration_days
+DailyRequirement[] requirements
+DifficultyLevel difficulty
+string[] tags
+boolean is_active
}
class DailyRequirement {
+uuid id
+string title
+string description
+RequirementType type
+json validation_rules
+int sort_order
}
class RequirementType {
<<enumeration>>
BOOLEAN
NUMERIC
DURATION
PHOTO_PROOF
}
Challenge "1" --> "*" DailyRequirement
DailyRequirement --> RequirementType
```
Design Decisions:
- Challenge definitions are admin-managed, cached aggressively on device
- Requirement types support future extensibility (photo proof for social features)
- Validation rules stored as JSON for flexible challenge-specific logic (an illustrative shape follows below)
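As an illustration, the `validation_rules` JSON could vary by requirement type roughly as in this TypeScript sketch; the field names here are assumptions, not a confirmed schema:

```typescript
// Hypothetical validation_rules shapes, one per RequirementType.
// Field names are illustrative, not a confirmed schema.
type ValidationRules =
  | { type: "BOOLEAN" }                             // simple done / not done
  | { type: "NUMERIC"; min: number; unit: string }  // e.g. "drink 3 liters of water"
  | { type: "DURATION"; minMinutes: number }        // e.g. "45-minute workout"
  | { type: "PHOTO_PROOF"; maxSizeMb: number };     // reserved for Phase 2 social proof

// Example: a NUMERIC requirement for a hypothetical hydration challenge.
const waterIntake: ValidationRules = { type: "NUMERIC", min: 3, unit: "liters" };
```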
Progress Service
The heart of the user experience. Following the Feature Definition's emphasis on making check-ins "fast, satisfying, and visually rewarding," this service prioritizes write performance and immediate feedback.
```mermaid
classDiagram
class UserChallenge {
+uuid id
+uuid user_id
+uuid challenge_id
+date start_date
+ChallengeStatus status
+int current_streak
+int longest_streak
+int attempt_number
}
class DailyProgress {
+uuid id
+uuid user_challenge_id
+date progress_date
+int day_number
+boolean is_complete
+timestamp completed_at
}
class TaskCompletion {
+uuid id
+uuid daily_progress_id
+uuid requirement_id
+json completion_data
+timestamp completed_at
}
class ChallengeStatus {
<<enumeration>>
ACTIVE
COMPLETED
FAILED
PAUSED
}
UserChallenge "1" --> "*" DailyProgress
DailyProgress "1" --> "*" TaskCompletion
UserChallenge --> ChallengeStatus
```
Streak Calculation Strategy:
- Current streak calculated on write (not read) for instant UI updates
- Redis maintains hot streak data for active users
- Nightly batch job reconciles any sync discrepancies
- `attempt_number` tracks restarts, supporting the Feature Definition's "encouraging restart experience" (see the write-path sketch below)
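A minimal sketch of the write-path streak update, assuming streak state is kept per `UserChallenge` and dates are already resolved to the user's timezone (all names here are illustrative):

```typescript
// Illustrative write-path streak update: runs in the same transaction that
// marks a day complete, so the client sees the new streak immediately.
interface StreakState {
  currentStreak: number;
  longestStreak: number;
  lastCompletedDate: string | null; // "YYYY-MM-DD" in the user's timezone
}

function applyCompletion(state: StreakState, completedDate: string): StreakState {
  const current =
    state.lastCompletedDate !== null &&
    isConsecutiveDay(state.lastCompletedDate, completedDate)
      ? state.currentStreak + 1
      : 1; // first check-in or a missed day: streak restarts at 1
  return {
    currentStreak: current,
    longestStreak: Math.max(state.longestStreak, current),
    lastCompletedDate: completedDate,
  };
}

// Assumed helper: true when `next` is exactly one calendar day after `prev`.
function isConsecutiveDay(prev: string, next: string): boolean {
  const ONE_DAY_MS = 24 * 60 * 60 * 1000;
  return (
    new Date(next + "T00:00:00Z").getTime() -
      new Date(prev + "T00:00:00Z").getTime() === ONE_DAY_MS
  );
}
```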
User Service
Handles authentication, profile management, and notification preferences.
```mermaid
sequenceDiagram
participant App
participant Firebase
participant API
participant DB
App->>Firebase: Social Login (Google/Apple)
Firebase-->>App: ID Token
App->>API: POST /auth/verify
API->>Firebase: Verify Token
Firebase-->>API: User Claims
API->>DB: Upsert User
DB-->>API: User Record
API-->>App: JWT + User Profile
App->>App: Store JWT Securely
```
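A sketch of the verification step on the API side, using firebase-admin's `verifyIdToken`; `upsertUser` and `issueJwt` are illustrative stand-ins for the user service, not confirmed internals:

```typescript
import Fastify from "fastify";
import { initializeApp } from "firebase-admin/app";
import { getAuth } from "firebase-admin/auth";

// Illustrative stubs; real implementations live in the user service.
declare function upsertUser(u: { firebaseUid: string; email?: string }): Promise<{ id: string }>;
declare function issueJwt(userId: string): string;

initializeApp(); // reads GOOGLE_APPLICATION_CREDENTIALS from the environment

const app = Fastify();

app.post("/auth/verify", async (request, reply) => {
  const { idToken } = request.body as { idToken: string };
  try {
    // Throws if the Firebase ID token is invalid or expired.
    const claims = await getAuth().verifyIdToken(idToken);
    const user = await upsertUser({ firebaseUid: claims.uid, email: claims.email });
    return { token: issueJwt(user.id), user };
  } catch {
    return reply.code(401).send({ error: "invalid_token" });
  }
});
```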
Offline-First Architecture
Given that daily check-ins are the core interaction, the app must function reliably regardless of network conditions.
```mermaid
graph LR
subgraph "User Action"
A[Complete Task]
end
subgraph "Local First"
B[Write to Local DB]
C[Update UI Immediately]
D[Queue Sync Operation]
end
subgraph "Background Sync"
E{Network Available?}
F[Sync to Server]
G[Retry with Backoff]
H[Conflict Resolution]
end
A --> B
B --> C
B --> D
D --> E
E -->|Yes| F
E -->|No| G
F --> H
G -.->|Retry| E
```
Sync Strategy:
- All progress writes happen locally first, providing instant feedback
- Background sync with exponential backoff (5s, 15s, 45s, 2min max)
- Last-write-wins conflict resolution (acceptable for single-user MVP)
- Server timestamp used as source of truth for streak calculations
- Sync queue persisted to survive app termination (a minimal flush loop is sketched below)
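A minimal client-side sketch of the flush loop under these assumptions; `loadQueue`, `postBatch`, `clearOps`, and `bumpAttempts` are illustrative stand-ins for the persisted queue:

```typescript
// Backoff schedule from the strategy above: 5s, 15s, 45s, capped at 2 minutes.
const BACKOFF_MS = [5_000, 15_000, 45_000, 120_000];

interface PendingOp {
  id: string;
  payload: unknown; // a serialized local write, e.g. one task completion
  attempt: number;  // failed sync attempts so far
}

// Illustrative stand-ins: the queue itself is persisted (e.g. alongside the
// WatermelonDB store) so it survives app termination; postBatch calls POST /sync.
declare function loadQueue(): Promise<PendingOp[]>;
declare function postBatch(ops: PendingOp[]): Promise<void>;
declare function clearOps(ids: string[]): Promise<void>;
declare function bumpAttempts(ids: string[]): Promise<void>;

async function flushQueue(): Promise<void> {
  const ops = await loadQueue();
  if (ops.length === 0) return;
  try {
    await postBatch(ops); // server resolves conflicts with last-write-wins
    await clearOps(ops.map((o) => o.id));
  } catch {
    await bumpAttempts(ops.map((o) => o.id));
    // Wait longer after each failure, never more than the 2-minute cap.
    const delay = BACKOFF_MS[Math.min(ops[0].attempt, BACKOFF_MS.length - 1)];
    setTimeout(() => void flushQueue(), delay);
  }
}
```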
Data Architecture
PostgreSQL Schema (Simplified)
```
-- Core tables with indexes optimized for common queries
users (id, firebase_uid, email, display_name, created_at, updated_at)
INDEX: firebase_uid (unique), email
challenges (id, name, slug, duration_days, difficulty, is_active, metadata)
INDEX: slug (unique), is_active
challenge_requirements (id, challenge_id, title, type, validation_rules, sort_order)
INDEX: challenge_id
user_challenges (id, user_id, challenge_id, start_date, status, current_streak, attempt_number)
INDEX: (user_id, status), (user_id, challenge_id)
daily_progress (id, user_challenge_id, progress_date, day_number, is_complete, completed_at)
INDEX: (user_challenge_id, progress_date) UNIQUE
task_completions (id, daily_progress_id, requirement_id, completion_data, completed_at)
INDEX: daily_progress_id
```
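One payoff of the unique `(user_challenge_id, progress_date)` index is idempotent check-in writes: a replayed offline sync becomes a no-op instead of an error. A sketch using node-postgres, with table and column names as above:

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* environment variables

// Upsert today's progress row. The UNIQUE (user_challenge_id, progress_date)
// index means a replayed sync of the same day simply does nothing.
async function upsertDailyProgress(
  userChallengeId: string,
  progressDate: string, // "YYYY-MM-DD"
  dayNumber: number
): Promise<void> {
  await pool.query(
    `INSERT INTO daily_progress (id, user_challenge_id, progress_date, day_number, is_complete)
     VALUES (gen_random_uuid(), $1, $2, $3, false)
     ON CONFLICT (user_challenge_id, progress_date) DO NOTHING`,
    [userChallengeId, progressDate, dayNumber]
  );
}
```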
Redis Data Structures
```
# Active user streaks (hot data)
streak:{user_id}:{challenge_id} -> { current: 45, longest: 45, last_date: "2024-01-15" }
TTL: 7 days (refreshed on activity)
# Session management
session:{token} -> { user_id, expires_at, device_id }
TTL: 30 days
# Future: Leaderboard preparation
leaderboard:{challenge_id}:daily -> Sorted Set (user_id -> streak)
```
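A sketch of the hot-streak read/write path with ioredis, matching the key shape and 7-day TTL above (function names are illustrative):

```typescript
import Redis from "ioredis";

const redis = new Redis(); // localhost by default; the ElastiCache endpoint in production

const STREAK_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days, refreshed on activity

// Write the hot streak snapshot after a successful check-in.
async function cacheStreak(
  userId: string,
  challengeId: string,
  current: number,
  longest: number,
  lastDate: string
): Promise<void> {
  const key = `streak:${userId}:${challengeId}`;
  await redis.hset(key, { current, longest, last_date: lastDate });
  await redis.expire(key, STREAK_TTL_SECONDS); // refresh the TTL on activity
}

// Read path: an empty hash means the key expired, so fall back to PostgreSQL.
async function getStreak(userId: string, challengeId: string) {
  return redis.hgetall(`streak:${userId}:${challengeId}`);
}
```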
API Design
RESTful API with consistent patterns. Key endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| `/challenges` | GET | List active challenges (cached) |
| `/challenges/{id}` | GET | Challenge details with requirements |
| `/me/challenges` | GET | User's active and past challenges |
| `/me/challenges` | POST | Start a new challenge |
| `/me/challenges/{id}/progress` | GET | Full progress for a challenge |
| `/me/challenges/{id}/today` | GET | Today's tasks and completion status |
| `/me/challenges/{id}/today` | PATCH | Update task completions |
| `/sync` | POST | Batch sync for offline changes |
Response Time Targets:
- Challenge library: <100ms (CDN cached)
- Today's progress: <150ms (Redis + DB)
- Task completion: <200ms (write path)
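To keep the check-in screen to a single round trip, the today endpoint can bundle tasks, streak, and progress in one payload. An illustrative response shape; the field names are assumptions, not a fixed contract:

```typescript
// Illustrative response for GET /me/challenges/{id}/today: one round trip
// returns everything the check-in screen needs to render.
interface TodayResponse {
  dayNumber: number; // e.g. day 12 of a 30-day challenge
  tasks: Array<{
    requirementId: string;
    title: string;
    type: "BOOLEAN" | "NUMERIC" | "DURATION" | "PHOTO_PROOF";
    completed: boolean;
  }>;
  streak: { current: number; longest: number };
  progress: { daysComplete: number; totalDays: number };
}
```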
Security Architecture
```mermaid
graph TB
subgraph "Client Security"
A[Secure Token Storage<br/>iOS Keychain / Android Keystore]
B[Certificate Pinning]
C[Biometric Lock Option]
end
subgraph "Transport Security"
D[TLS 1.3]
E[API Gateway Rate Limiting]
end
subgraph "Backend Security"
F[JWT Validation]
G[Row-Level Security]
H[Input Validation<br/>Fastify Schemas]
end
A --> D
B --> D
D --> E
E --> F
F --> G
F --> H
```
Key Security Measures:
- Firebase Auth handles credential security
- Short-lived JWTs (1 hour) with refresh token rotation
- All user data queries filtered by authenticated user_id
- Rate limiting: 100 requests/minute per user
- Input validation at API gateway and service layers
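A sketch of service-layer input validation using Fastify's built-in JSON-schema support, applied to the today endpoint; the body shape is an assumption consistent with the API table above:

```typescript
import Fastify from "fastify";

const app = Fastify();

// Fastify validates the request body against this JSON schema before the
// handler runs; malformed input is rejected with a 400 automatically.
app.patch("/me/challenges/:id/today", {
  schema: {
    body: {
      type: "object",
      required: ["completions"],
      additionalProperties: false,
      properties: {
        completions: {
          type: "array",
          maxItems: 20,
          items: {
            type: "object",
            required: ["requirementId", "completed"],
            properties: {
              requirementId: { type: "string" },
              completed: { type: "boolean" },
            },
          },
        },
      },
    },
  },
}, async () => {
  // Handler only ever sees validated input; completion logic omitted here.
  return { ok: true };
});
```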
Scalability Considerations
MVP Scale (10K users):
- Single PostgreSQL instance (db.t3.medium)
- Single Redis instance (cache.t3.micro)
- 2 ECS tasks behind ALB
- Estimated cost: ~$150/month
Growth Path (100K+ users):
- PostgreSQL read replicas for challenge library queries
- Redis cluster for streak calculations
- Horizontal scaling of stateless API services
- Consider Aurora Serverless for variable load
Social Features Preparation:
- User ID foreign keys in place for future friend relationships
- Redis sorted sets ready for leaderboard implementation
- Event-driven architecture allows adding notification triggers
Deployment Architecture
```mermaid
graph TB
subgraph "Production"
ALB[Application Load Balancer]
ECS1[ECS Task 1]
ECS2[ECS Task 2]
RDS[(RDS PostgreSQL)]
REDIS[(ElastiCache Redis)]
end
subgraph "CI/CD"
GH[GitHub Actions]
ECR[ECR Registry]
end
subgraph "Monitoring"
CW[CloudWatch]
SENTRY[Sentry]
end
GH --> ECR
ECR --> ECS1
ECR --> ECS2
ALB --> ECS1
ALB --> ECS2
ECS1 --> RDS
ECS2 --> RDS
ECS1 --> REDIS
ECS2 --> REDIS
ECS1 --> CW
ECS1 --> SENTRY
```
Deployment Strategy:
- Blue/green deployments via ECS
- Database migrations run as pre-deployment task
- Feature flags for gradual rollouts
- Automated rollback on health check failures
Recommendations
- Invest in Local-First Infrastructure: The offline-first pattern is critical for the daily check-in experience. Allocate adequate time for sync logic and conflict handling.
- Implement Comprehensive Analytics Early: As noted in the Feature Definition, event tracking from day one informs Phase 2 social features. Instrument all user interactions.
- Design APIs for Mobile Efficiency: Combine related data in single responses (today's tasks + streak + progress) to minimize round trips.
- Plan for Streak Edge Cases: Timezone handling, daylight saving transitions, and missed-day scenarios need careful consideration in both client and server logic (see the day-key sketch after this list).
- Prepare Social Foundation Without Building It: Include user_id relationships and Redis structures that support leaderboards, but don't implement social features until validated.
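For the streak edge cases above, a minimal sketch of computing a user-local day key so streaks roll over at the user's midnight rather than the server's; it assumes user profiles store an IANA timezone string:

```typescript
// Compute the "YYYY-MM-DD" day key in the user's own timezone, so a check-in
// at 11pm in New York isn't counted as "tomorrow" by a UTC-based server.
function localDayKey(at: Date, timeZone: string): string {
  // The en-CA locale formats dates as YYYY-MM-DD.
  return new Intl.DateTimeFormat("en-CA", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(at);
}

// Example: the same instant falls on different calendar days in different zones.
const instant = new Date("2024-01-16T03:30:00Z");
localDayKey(instant, "America/New_York"); // "2024-01-15"
localDayKey(instant, "Asia/Tokyo");       // "2024-01-16"
```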
Technical Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Offline sync conflicts | Data loss, user frustration | Comprehensive conflict resolution, sync status UI |
| Streak calculation errors | Core feature broken | Server-side validation, reconciliation jobs, audit logs |
| Firebase Auth dependency | Authentication outage | Graceful degradation, cached sessions |
| React Native performance | Poor animation experience | Native driver animations, performance profiling |
Next Steps
- Set up infrastructure-as-code (Terraform/CDK) for reproducible environments
- Implement authentication flow and user service
- Build challenge service with seed data for 5 launch challenges
- Develop progress service with offline-first client integration
- Establish CI/CD pipeline with staging environment