# Architecture High-Level Design: Surge

## Executive Summary

This Architecture High-Level Design establishes the technical foundation for Surge, a mobile application enabling users to discover and complete structured self-improvement challenges. Building upon the Feature Definition's prioritization of the daily check-in experience and streak psychology, this architecture emphasizes responsive local-first interactions, reliable data synchronization, and a foundation that supports future social features without over-engineering the MVP.

The design balances immediate delivery needs with strategic positioning for Phase 2 social capabilities, ensuring the core tracking experience remains fast and satisfying even under poor network conditions.

***

## System Architecture Overview

```mermaid
graph TB
    subgraph "Client Layer"
        MA[Mobile App<br/>React Native]
        LS[(Local Storage<br/>WatermelonDB / SQLite)]
    end

    subgraph "API Layer"
        AG[API Gateway<br/>AWS API Gateway]
        AUTH[Auth Service<br/>Firebase Auth]
    end

    subgraph "Application Layer"
        US[User Service]
        CS[Challenge Service]
        PS[Progress Service]
    end

    subgraph "Data Layer"
        PG[(PostgreSQL<br/>Primary DB)]
        RC[(Redis<br/>Cache/Sessions)]
    end

    subgraph "Supporting Services"
        PN[Push Notifications<br/>Firebase FCM]
        AN[Analytics<br/>Mixpanel/Amplitude]
    end

    MA <--> LS
    MA <--> AG
    AG <--> AUTH
    AG <--> US
    AG <--> CS
    AG <--> PS
    US <--> PG
    CS <--> PG
    PS <--> PG
    PS <--> RC
    US <--> PN
    MA --> AN
```

***

## Technology Stack

### Mobile Application

| Layer | Technology | Rationale |
| ----- | ---------- | --------- |
| Framework | React Native | Cross-platform efficiency, strong ecosystem, team familiarity |
| State Management | Zustand | Lightweight, minimal boilerplate, excellent for offline-first patterns |
| Local Database | WatermelonDB | Optimized for React Native, built-in sync capabilities, lazy loading |
| Navigation | React Navigation | Industry standard, deep linking support |
| UI Components | Custom + React Native Reanimated | Bold, high-energy design requires custom animations |

### Backend Services

| Component | Technology | Rationale |
| --------- | ---------- | --------- |
| Runtime | Node.js with TypeScript | Type safety, shared models with frontend, async performance |
| Framework | Fastify | High performance, schema validation, lower overhead than Express |
| Database | PostgreSQL 15 | ACID compliance, JSON support, proven reliability for user data |
| Cache | Redis | Session management, streak calculations, leaderboard preparation |
| Authentication | Firebase Auth | Rapid implementation, social login support, secure token management |

### Infrastructure

| Component | Technology | Rationale |
| --------- | ---------- | --------- |
| Cloud Provider | AWS | Comprehensive services, reliable, cost-effective at scale |
| Container Orchestration | AWS ECS Fargate | Serverless containers, reduced operational overhead |
| API Management | AWS API Gateway | Rate limiting, request validation, easy Lambda integration if needed |
| CDN | CloudFront | Challenge asset delivery, global edge caching |
| CI/CD | GitHub Actions | Integrated with codebase, cost-effective, extensive marketplace |

***

## Core Component Design

### Challenge Service

Manages the challenge library and challenge definitions. As noted in the Feature Definition, launching with 5 well-documented challenges is prioritized over quantity.

```mermaid
classDiagram
    class Challenge {
        +uuid id
        +string name
        +string description
        +int duration_days
        +DailyRequirement[] requirements
        +DifficultyLevel difficulty
        +string[] tags
        +boolean is_active
    }

    class DailyRequirement {
        +uuid id
        +string title
        +string description
        +RequirementType type
        +json validation_rules
        +int sort_order
    }

    class RequirementType {
        <<enumeration>>
        BOOLEAN
        NUMERIC
        DURATION
        PHOTO_PROOF
    }

    Challenge "1" --> "*" DailyRequirement
    DailyRequirement --> RequirementType
```

**Design Decisions:**

* Challenge definitions are admin-managed and cached aggressively on device
* Requirement types support future extensibility (photo proof for social features)
* Validation rules are stored as JSON for flexible challenge-specific logic

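As an illustration of how JSON validation rules could drive per-type checks, here is a minimal sketch. The rule fields (`min`, `minMinutes`), the `CompletionData` shape, and the `validateCompletion` helper are all hypothetical; the design only states that rules are stored as JSON.

```typescript
// Mirrors the RequirementType enumeration from the class diagram.
type RequirementType = "BOOLEAN" | "NUMERIC" | "DURATION" | "PHOTO_PROOF";

// Assumed rule shape; the actual JSON schema is not defined in this document.
interface ValidationRules {
  min?: number;        // assumed: minimum numeric value, e.g. glasses of water
  minMinutes?: number; // assumed: minimum duration in minutes
}

// Assumed shape of the data a client submits for one task.
interface CompletionData {
  checked?: boolean;
  value?: number;
  minutes?: number;
  photoUrl?: string;
}

// Returns true when the submitted data satisfies the requirement's rules.
function validateCompletion(
  type: RequirementType,
  rules: ValidationRules,
  data: CompletionData,
): boolean {
  switch (type) {
    case "BOOLEAN":
      return data.checked === true;
    case "NUMERIC":
      return typeof data.value === "number" && data.value >= (rules.min ?? 0);
    case "DURATION":
      return typeof data.minutes === "number" && data.minutes >= (rules.minMinutes ?? 0);
    case "PHOTO_PROOF":
      return typeof data.photoUrl === "string" && data.photoUrl.length > 0;
  }
}
```

Because the same function can run on the device and on the server, a shared TypeScript module of this kind would fit the "shared models with frontend" rationale in the technology stack.
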
### Progress Service

The heart of the user experience. Following the Feature Definition's emphasis on making check-ins "fast, satisfying, and visually rewarding," this service prioritizes write performance and immediate feedback.

```mermaid
classDiagram
    class UserChallenge {
        +uuid id
        +uuid user_id
        +uuid challenge_id
        +date start_date
        +ChallengeStatus status
        +int current_streak
        +int longest_streak
        +int attempt_number
    }

    class DailyProgress {
        +uuid id
        +uuid user_challenge_id
        +date progress_date
        +int day_number
        +boolean is_complete
        +timestamp completed_at
    }

    class TaskCompletion {
        +uuid id
        +uuid daily_progress_id
        +uuid requirement_id
        +json completion_data
        +timestamp completed_at
    }

    class ChallengeStatus {
        <<enumeration>>
        ACTIVE
        COMPLETED
        FAILED
        PAUSED
    }

    UserChallenge "1" --> "*" DailyProgress
    DailyProgress "1" --> "*" TaskCompletion
    UserChallenge --> ChallengeStatus
```

**Streak Calculation Strategy:**

* The current streak is calculated on write (not on read) for instant UI updates
* Redis maintains hot streak data for active users
* A nightly batch job reconciles any sync discrepancies
* `attempt_number` tracks restarts, supporting the Feature Definition's "encouraging restart experience"

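The calculate-on-write approach can be sketched as a pure function that updates the streak whenever a day is marked complete. This is a minimal sketch under stated assumptions: the `StreakState` shape and the helper names are hypothetical, and resetting the streak to 1 after a gap is only one possible policy (the design does not say whether a missed day resets the streak or fails the challenge).

```typescript
// Assumed in-memory shape, loosely mirroring current_streak / longest_streak.
interface StreakState {
  current: number;
  longest: number;
  lastDate: string | null; // YYYY-MM-DD of the last completed day
}

// Whole days between two calendar dates, computed in UTC to avoid DST drift.
function daysBetween(from: string, to: string): number {
  const ms = Date.parse(`${to}T00:00:00Z`) - Date.parse(`${from}T00:00:00Z`);
  return Math.round(ms / 86400000);
}

// Returns the new streak state after `date` is marked complete.
function applyCompletedDay(state: StreakState, date: string): StreakState {
  if (state.lastDate === null) {
    return { current: 1, longest: Math.max(state.longest, 1), lastDate: date };
  }
  const gap = daysBetween(state.lastDate, date);
  if (gap <= 0) {
    // Same day again, or an older day arriving late via sync: leave unchanged
    // and let the reconciliation job handle any backfill.
    return state;
  }
  // Consecutive day extends the streak; any gap restarts it (assumed policy).
  const current = gap === 1 ? state.current + 1 : 1;
  return { current, longest: Math.max(state.longest, current), lastDate: date };
}
```

Keeping the function pure makes it straightforward to run the same logic on the device for instant feedback and again on the server when the write is synced.
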
### User Service

Handles authentication, profile management, and notification preferences.

```mermaid
sequenceDiagram
    participant App
    participant Firebase
    participant API
    participant DB

    App->>Firebase: Social Login (Google/Apple)
    Firebase-->>App: ID Token
    App->>API: POST /auth/verify
    API->>Firebase: Verify Token
    Firebase-->>API: User Claims
    API->>DB: Upsert User
    DB-->>API: User Record
    API-->>App: JWT + User Profile
    App->>App: Store JWT Securely
```

***

## Offline-First Architecture

Given that daily check-ins are the core interaction, the app must function reliably regardless of network conditions.

```mermaid
graph LR
    subgraph "User Action"
        A[Complete Task]
    end

    subgraph "Local First"
        B[Write to Local DB]
        C[Update UI Immediately]
        D[Queue Sync Operation]
    end

    subgraph "Background Sync"
        E{Network Available?}
        F[Sync to Server]
        G[Retry with Backoff]
        H[Conflict Resolution]
    end

    A --> B
    B --> C
    B --> D
    D --> E
    E -->|Yes| F
    E -->|No| G
    F --> H
    G -.->|Retry| E
```

**Sync Strategy:**

* All progress writes happen locally first, providing instant feedback
* Background sync retries with exponential backoff (5 s, 15 s, 45 s, capped at 2 min)
* Last-write-wins conflict resolution (acceptable for a single-user MVP)
* The server timestamp is the source of truth for streak calculations
* The sync queue is persisted so it survives app termination

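The retry schedule and the conflict rule above can be sketched as follows. The schedule 5 s, 15 s, 45 s corresponds to a base delay of 5 s tripling on each attempt, capped at 2 minutes. The `SyncRecord` shape and function names are hypothetical, and the sketch assumes `updatedAt` is assigned by the server so that device clock skew does not decide the winner.

```typescript
const BASE_DELAY_MS = 5000;  // first retry after 5 s
const MAX_DELAY_MS = 120000; // never wait longer than 2 min

// Delay before retry number `attempt` (0-based): 5 s, 15 s, 45 s, then capped.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(3, attempt), MAX_DELAY_MS);
}

// Assumed record shape for anything that goes through the sync queue.
interface SyncRecord {
  id: string;
  updatedAt: number; // epoch milliseconds, assumed to be server-assigned
  payload: unknown;
}

// Last-write-wins: the record with the newer timestamp is kept.
// On a tie the remote (server) copy wins, matching "server as source of truth".
function resolveLastWriteWins(local: SyncRecord, remote: SyncRecord): SyncRecord {
  return local.updatedAt > remote.updatedAt ? local : remote;
}
```

For example, `backoffDelayMs(3)` would compute 135 s and is therefore clamped to the 120 s ceiling. Adding random jitter to each delay is a common refinement but is not part of the stated schedule.
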
***

## Data Architecture

### PostgreSQL Schema (Simplified)

```sql
-- Core tables with indexes optimized for common queries
-- (shorthand notation: table (columns) / INDEX, not executable DDL)
users (id, firebase_uid, email, display_name, created_at, updated_at)
  INDEX: firebase_uid (unique), email

challenges (id, name, slug, duration_days, difficulty, is_active, metadata)
  INDEX: slug (unique), is_active

challenge_requirements (id, challenge_id, title, type, validation_rules, sort_order)
  INDEX: challenge_id

user_challenges (id, user_id, challenge_id, start_date, status, current_streak, longest_streak, attempt_number)
  INDEX: (user_id, status), (user_id, challenge_id)

daily_progress (id, user_challenge_id, progress_date, day_number, is_complete, completed_at)
  INDEX: (user_challenge_id, progress_date) UNIQUE

task_completions (id, daily_progress_id, requirement_id, completion_data, completed_at)
  INDEX: daily_progress_id
```

### Redis Data Structures

```
# Active user streaks (hot data)
streak:{user_id}:{challenge_id} -> { current: 45, longest: 45, last_date: "2024-01-15" }
TTL: 7 days (refreshed on activity)

# Session management
session:{token} -> { user_id, expires_at, device_id }
TTL: 30 days

# Future: Leaderboard preparation
leaderboard:{challenge_id}:daily -> Sorted Set (user_id -> streak)
```

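To keep key formats and TTLs consistent across services, the Redis layout above could be centralized in one small module. The following is a sketch only: the helper names are hypothetical, and storing the streak as a JSON string (rather than a Redis hash) is an assumption.

```typescript
// TTLs from the Redis layout, expressed in seconds.
const STREAK_TTL_SECONDS = 7 * 24 * 60 * 60;   // 7 days, refreshed on activity
const SESSION_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

// Key builders matching the documented key patterns.
const streakKey = (userId: string, challengeId: string): string =>
  `streak:${userId}:${challengeId}`;
const sessionKey = (token: string): string => `session:${token}`;
const leaderboardKey = (challengeId: string): string =>
  `leaderboard:${challengeId}:daily`;

// Cached streak value, using the field names shown in the layout.
interface CachedStreak {
  current: number;
  longest: number;
  last_date: string; // YYYY-MM-DD
}

// Assumed encoding: the value is stored as a JSON string.
function encodeStreak(streak: CachedStreak): string {
  return JSON.stringify(streak);
}

// A cache miss (null) falls through to PostgreSQL in the calling code.
function decodeStreak(raw: string | null): CachedStreak | null {
  return raw === null ? null : (JSON.parse(raw) as CachedStreak);
}
```
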
***

## API Design

RESTful API with consistent patterns. Key endpoints:

| Endpoint | Method | Purpose |
| -------- | ------ | ------- |
| `/challenges` | GET | List active challenges (cached) |
| `/challenges/{id}` | GET | Challenge details with requirements |
| `/me/challenges` | GET | User's active and past challenges |
| `/me/challenges` | POST | Start a new challenge |
| `/me/challenges/{id}/progress` | GET | Full progress for a challenge |
| `/me/challenges/{id}/today` | GET | Today's tasks and completion status |
| `/me/challenges/{id}/today` | PATCH | Update task completions |
| `/sync` | POST | Batch sync for offline changes |

**Response Time Targets:**

* Challenge library: <100ms (CDN cached)
* Today's progress: <150ms (Redis + DB)
* Task completion: <200ms (write path)

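The request body for `/sync` is not specified here. As one possible shape, the client could compact its offline queue before posting so that each task is sent at most once per batch. In this sketch the `SyncOperation` fields and the `compactQueue` helper are hypothetical.

```typescript
// Assumed shape of one queued offline change.
interface SyncOperation {
  opId: string;            // client-generated, lets the server ignore replays
  userChallengeId: string;
  requirementId: string;
  progressDate: string;    // YYYY-MM-DD
  completionData: unknown;
  clientTimestamp: number; // epoch milliseconds when the change was queued
}

// Collapse the queue so each (challenge, date, requirement) appears once,
// keeping the most recent operation for that task.
function compactQueue(ops: SyncOperation[]): SyncOperation[] {
  const latest = new Map<string, SyncOperation>();
  for (const op of ops) {
    const key = `${op.userChallengeId}|${op.progressDate}|${op.requirementId}`;
    const seen = latest.get(key);
    if (seen === undefined || op.clientTimestamp >= seen.clientTimestamp) {
      latest.set(key, op);
    }
  }
  return Array.from(latest.values());
}
```

Compacting on the client keeps batches small after long offline periods, which helps the write path stay within its response time target.
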
***

## Security Architecture

```mermaid
graph TB
    subgraph "Client Security"
        A[Secure Token Storage<br/>iOS Keychain / Android Keystore]
        B[Certificate Pinning]
        C[Biometric Lock Option]
    end

    subgraph "Transport Security"
        D[TLS 1.3]
        E[API Gateway Rate Limiting]
    end

    subgraph "Backend Security"
        F[JWT Validation]
        G[Row-Level Security]
        H[Input Validation<br/>Fastify Schemas]
    end

    A --> D
    B --> D
    D --> E
    E --> F
    F --> G
    F --> H
```

**Key Security Measures:**

* Firebase Auth handles credential security
* Short-lived JWTs (1 hour) with refresh token rotation
* All user data queries are filtered by the authenticated `user_id`
* Rate limiting: 100 requests/minute per user
* Input validation at both the API Gateway and service layers

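How the per-user limit is enforced (gateway configuration, a Redis counter, or similar) is not stated. The in-memory fixed-window sketch below only illustrates the intended behavior of "100 requests/minute per user"; the class name and its API are hypothetical, and a multi-task deployment would need shared state rather than process memory.

```typescript
// Fixed-window counter: each user gets `limit` requests per `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number = 100, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new one.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false; // over the limit for this window
    }
    current.count += 1;
    return true;
  }
}
```

A fixed window permits short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of slightly more state.
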
***

## Scalability Considerations

**MVP Scale (10K users):**

* Single PostgreSQL instance (db.t3.medium)
* Single Redis instance (cache.t3.micro)
* 2 ECS tasks behind ALB
* Estimated cost: approximately $150/month

**Growth Path (100K+ users):**

* PostgreSQL read replicas for challenge library queries
* Redis cluster for streak calculations
* Horizontal scaling of stateless API services
* Consider Aurora Serverless for variable load

**Social Features Preparation:**

* User ID foreign keys in place for future friend relationships
* Redis sorted sets ready for leaderboard implementation
* Event-driven architecture allows adding notification triggers

***

## Deployment Architecture

```mermaid
graph TB
    subgraph "Production"
        ALB[Application Load Balancer]
        ECS1[ECS Task 1]
        ECS2[ECS Task 2]
        RDS[(RDS PostgreSQL)]
        REDIS[(ElastiCache Redis)]
    end

    subgraph "CI/CD"
        GH[GitHub Actions]
        ECR[ECR Registry]
    end

    subgraph "Monitoring"
        CW[CloudWatch]
        SENTRY[Sentry]
    end

    GH --> ECR
    ECR --> ECS1
    ECR --> ECS2
    ALB --> ECS1
    ALB --> ECS2
    ECS1 --> RDS
    ECS2 --> RDS
    ECS1 --> REDIS
    ECS2 --> REDIS
    ECS1 --> CW
    ECS2 --> CW
    ECS1 --> SENTRY
    ECS2 --> SENTRY
```

**Deployment Strategy:**

* Blue/green deployments via ECS
* Database migrations run as a pre-deployment task
* Feature flags for gradual rollouts
* Automated rollback on health check failures

***

## Recommendations

1. **Invest in Local-First Infrastructure**: The offline-first pattern is critical for the daily check-in experience. Allocate adequate time for sync logic and conflict handling.
2. **Implement Comprehensive Analytics Early**: As noted in the Feature Definition, event tracking from day one informs Phase 2 social features. Instrument all user interactions.
3. **Design APIs for Mobile Efficiency**: Combine related data in single responses (today's tasks + streak + progress) to minimize round trips.
4. **Plan for Streak Edge Cases**: Timezone handling, daylight saving transitions, and missed-day scenarios need careful consideration in both client and server logic.
5. **Prepare Social Foundation Without Building It**: Include `user_id` relationships and Redis structures that support leaderboards, but don't implement social features until validated.

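For the timezone edge cases in recommendation 4, one common approach is to derive the user's calendar day from an absolute instant plus an IANA time zone, rather than from UTC. The sketch below uses the standard `Intl.DateTimeFormat` API; the function name is hypothetical, and where the user's zone is stored is left open.

```typescript
// Returns the calendar date (YYYY-MM-DD) that `instant` falls on in `timeZone`.
// `timeZone` is an IANA zone name such as "America/New_York".
function localDateInZone(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  // Pick fields by type instead of parsing a locale-formatted string.
  const get = (type: string): string =>
    parts.find((part) => part.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}
```

A check-in at 04:30 UTC on 16 January still belongs to 15 January for a user in New York, so crediting it to the UTC date would break that user's streak. Because the platform's time zone database handles daylight saving transitions, the same function covers those days as well.
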
***

## Technical Risks & Mitigations

| Risk | Impact | Mitigation |
| ---- | ------ | ---------- |
| Offline sync conflicts | Data loss, user frustration | Comprehensive conflict resolution, sync status UI |
| Streak calculation errors | Core feature broken | Server-side validation, reconciliation jobs, audit logs |
| Firebase Auth dependency | Authentication outage | Graceful degradation, cached sessions |
| React Native performance | Poor animation experience | Native driver animations, performance profiling |

***

## Next Steps

1. Set up infrastructure-as-code (Terraform/CDK) for reproducible environments
2. Implement authentication flow and user service
3. Build challenge service with seed data for 5 launch challenges
4. Develop progress service with offline-first client integration
5. Establish CI/CD pipeline with staging environment