This document provides a high-level overview of our system architecture, request flows, and key design decisions.

Components Overview

API Server (FastAPI)

Purpose: Handle HTTP requests, validate input, and return responses quickly (<500ms at p95)

Responsibilities:

  • Receive and validate HTTP requests
  • Authenticate and authorise users
  • Execute business logic via service layer
  • Return JSON responses
  • Queue background jobs for long-running tasks

Technology: FastAPI with Uvicorn (ASGI server)

Scaling: Horizontal - run multiple instances behind a load balancer

Database (PostgreSQL)

Purpose: Persistent data storage

Responsibilities:

  • Store all application data
  • Enforce data integrity via constraints
  • Provide transactional guarantees
  • Execute queries with proper indexing

Technology: PostgreSQL 15+

Scaling: Vertical initially, read replicas if needed

Cache/Queue (Redis)

Purpose: Caching and message broker for background jobs

Responsibilities:

  • Cache frequently accessed data
  • Store session data
  • Message queue for Celery tasks
  • Rate limiting storage

Technology: Redis 7+

Scaling: Single instance with persistence for small teams, cluster for scale

Background Workers (Celery)

Purpose: Process long-running tasks asynchronously

Responsibilities:

  • Process webhook data
  • Send emails
  • Generate reports
  • Call external APIs
  • Sync data with third parties

Technology: Celery with Redis broker

Scaling: Horizontal - add more workers based on queue depth

Frontend (React)

Purpose: User interface

Responsibilities:

  • Render UI components
  • Handle user interactions
  • Make API calls
  • Manage client-side state
  • Display loading/error states

Technology: React 18 with TypeScript, Vite

Scaling: Static files served via CDN

Request Flow Patterns

1. Typical API Request (User Action)

sequenceDiagram
    participant C as Client
    participant A as API Server
    participant S as Service Layer
    participant D as Database
    participant R as Redis

    C->>A: POST /api/v1/orders
    A->>A: Validate JWT token
    A->>A: Validate request schema
    A->>S: create_order(data)
    S->>D: INSERT INTO orders
    D->>S: order_id
    S->>R: cache order data
    S->>A: Order object
    A->>C: 201 Created + order JSON

    Note over C,R: Total time: <500ms

Key Points:

  • Authentication happens at router level via dependency injection
  • Validation uses Pydantic schemas automatically
  • Business logic lives in service layer
  • Response returned quickly (<500ms)

2. Webhook Handler (Capture and Process Async)

sequenceDiagram
    participant SH as Shopify
    participant A as API Server
    participant D as Database
    participant Q as Redis Queue
    participant W as Celery Worker
    participant SA as Shopify API

    SH->>A: POST /webhooks/orders/create
    A->>A: Verify webhook signature
    A->>D: INSERT INTO webhook_events
    D->>A: event_id
    A->>Q: queue process_order_webhook.delay(event_id)
    A->>SH: 200 OK

    Note over SH,A: Response in <100ms

    Q->>W: Dequeue task
    W->>D: SELECT * FROM webhook_events
    W->>W: Process order data
    W->>D: UPDATE orders, inventory
    W->>SA: GET /orders/{id}/fulfillments
    SA->>W: Fulfillment data
    W->>D: UPDATE fulfillment_status

    Note over W,D: Processing time: 2-10 seconds

Key Points:

  • Webhook received and acknowledged immediately (<100ms)
  • Raw payload stored in database for replay/debugging
  • Processing happens asynchronously
  • Failures can be retried without losing data
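
The capture-and-process flow above can be sketched with the standard library alone. This is a minimal illustration, not the project's handler: the dictionary and deque are hypothetical stand-ins for the webhook_events table and the Redis queue, the function names are invented for the example, and the 401 response for a bad signature is an assumption. The signature check assumes the sender provides a base64-encoded HMAC-SHA256 of the raw body, which is the scheme Shopify documents for its webhooks.

```python
import base64
import hashlib
import hmac
import json
from collections import deque

# Hypothetical in-memory stand-ins for the webhook_events table and the Redis queue.
webhook_events: dict[int, bytes] = {}
task_queue: deque[int] = deque()


def verify_signature(secret: bytes, body: bytes, header_hmac: str) -> bool:
    """Compare a base64-encoded HMAC-SHA256 of the raw body with the header value."""
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header_hmac.encode())


def handle_webhook(secret: bytes, body: bytes, header_hmac: str) -> int:
    """Return an HTTP status code; no order processing happens inline."""
    if not verify_signature(secret, body, header_hmac):
        return 401  # assumed status for a rejected signature
    event_id = len(webhook_events) + 1
    webhook_events[event_id] = body  # store the raw payload first
    task_queue.append(event_id)      # stand-in for process_order_webhook.delay(event_id)
    return 200


def process_next_event() -> dict:
    """What a worker would do later: load the stored payload and parse it."""
    event_id = task_queue.popleft()
    return json.loads(webhook_events[event_id])
```

Because only the event ID is queued, a failed worker run can be retried from the stored payload without asking the sender to redeliver.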

3. Authentication Flow

sequenceDiagram
    participant C as Client
    participant A as API Server
    participant D as Database
    participant R as Redis

    C->>A: POST /auth/login {email, password}
    A->>D: SELECT * FROM users WHERE email=?
    D->>A: User record
    A->>A: Verify password hash
    A->>A: Generate JWT access + refresh tokens
    A->>R: Store refresh token (with TTL)
    A->>C: {access_token, refresh_token}

    Note over C,A: Subsequent requests

    C->>A: GET /api/v1/profile (Authorization: Bearer token)
    A->>A: Decode and verify JWT
    A->>A: Extract user_id from token
    A->>D: SELECT * FROM users WHERE id=?
    D->>A: User data
    A->>C: User profile JSON

Key Points:

  • Access tokens are stateless JWTs, verified without a server-side lookup
  • Refresh tokens stored in Redis for revocation
  • Access tokens short-lived (30 min), refresh tokens long-lived (7 days)
  • User object injected via dependency injection
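
To make the revocation point concrete, here is a small sketch of a refresh-token store with a TTL. It assumes an in-memory dictionary in place of Redis, and the class and method names are hypothetical; with Redis, the expiry would normally be handled by a key TTL rather than checked by hand.

```python
import secrets
import time

REFRESH_TOKEN_TTL = 7 * 24 * 60 * 60  # 7 days, in seconds


class RefreshTokenStore:
    """In-memory stand-in for Redis keys with a TTL (hypothetical helper)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # token -> (user_id, expires_at)
        self._tokens: dict[str, tuple[int, float]] = {}

    def issue(self, user_id: int) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user_id, self._clock() + REFRESH_TOKEN_TTL)
        return token

    def lookup(self, token: str):
        """Return the user_id for a live token, or None if unknown, expired, or revoked."""
        entry = self._tokens.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._tokens[token]
            return None
        return user_id

    def revoke(self, token: str) -> None:
        """Deleting the entry is what makes a refresh token unusable before it expires."""
        self._tokens.pop(token, None)
```

Access tokens need no such store: they expire on their own after 30 minutes, which is why only refresh tokens are tracked.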

4. Background Job Processing

sequenceDiagram
    participant A as API Server
    participant Q as Redis Queue
    participant W as Celery Worker
    participant D as Database
    participant E as External API

    A->>Q: send_welcome_email.delay(user_id)
    A->>A: Continue processing

    Q->>W: Dequeue task
    W->>D: SELECT * FROM users WHERE id=?
    D->>W: User data
    W->>E: POST /send-email (AWS SES)
    E->>W: Message ID
    W->>D: INSERT INTO email_log
    W->>Q: Task complete

    Note over W: Retries on failure (max 3 attempts)

Key Points:

  • Tasks are queued with .delay() or .apply_async()
  • Workers pull tasks from the Redis queue
  • Automatic retries with exponential backoff
  • All state persisted to database
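
The retry behaviour can be illustrated without Celery. The sketch below is an assumption about the schedule (a base delay doubled after each failure, three attempts in total) and uses a hypothetical helper name; Celery provides its own retry options, and this only shows the logic they implement.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); after a failure wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable: a test can record the delays instead of actually waiting.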

Environment Differences

Local Development

graph LR
    Dev[Developer Machine] --> Docker[Docker Compose]
    Docker --> DB[(PostgreSQL)]
    Docker --> Redis[(Redis)]
    Docker --> Worker[Celery Worker]

Characteristics:

  • Everything runs in Docker containers
  • Single server (your laptop)
  • Hot reload enabled
  • Debug mode on
  • Simplified authentication
  • Test data seeded

Staging

graph LR
    Internet --> LB[Railway Load Balancer]
    LB --> App[Single Server]
    App --> DB[(PostgreSQL)]
    App --> Redis[(Redis)]
    App --> Worker[Celery Worker]

Characteristics:

  • All services on one server (cost optimisation)
  • Production-like configuration
  • Real domain with HTTPS
  • Connected to staging Shopify app
  • Test payment gateway
  • Automated deployments from staging branch

Production

graph TB
    Internet --> LB[Load Balancer]
    LB --> App1[API Server 1]
    LB --> App2[API Server 2]
    App1 --> DB[(PostgreSQL Primary)]
    App2 --> DB
    DB --> RR[(Read Replica)]
    App1 --> Redis[(Redis)]
    App2 --> Redis
    Redis --> W1[Worker 1]
    Redis --> W2[Worker 2]
    W1 --> DB
    W2 --> DB

Characteristics:

  • Multiple API server instances
  • Multiple Celery workers
  • Database read replicas (if needed)
  • Redis persistence enabled
  • Automated deployments from main branch
  • Real payment processing
  • Comprehensive monitoring and alerts

Key Design Decisions

1. Why FastAPI?

Rationale:

  • Automatic OpenAPI documentation
  • Built-in request/response validation with Pydantic
  • Excellent performance (ASGI-based)
  • Modern Python with type hints
  • Easy dependency injection
  • Great async support

Trade-off: Smaller ecosystem than Django, but we don’t need Django’s ORM or admin

2. Why PostgreSQL?

Rationale:

  • ACID compliance for data integrity
  • Rich data types (JSON, arrays, etc.)
  • Excellent performance with proper indexing
  • Mature, battle-tested
  • Great tooling ecosystem

Trade-off: Writes scale vertically (read replicas only offload reads), but a single primary is sufficient for our scale

3. Why Celery for Background Jobs?

Rationale:

  • Mature, widely adopted
  • Excellent retry mechanisms
  • Support for task chains and workflows
  • Built-in monitoring with Flower
  • Works well with Redis

Trade-off: Can be complex, but essential for async processing

4. Why Redis?

Rationale:

  • Fast in-memory storage
  • Works as both cache and message broker
  • Simple to operate
  • Excellent client libraries

Trade-off: Data must fit in memory, but works for our use case

5. Capture and Process Async Pattern

Rationale:

  • Keeps webhook acknowledgements fast (<100ms target, well inside the <500ms API budget)
  • Prevents webhook timeouts
  • Allows retries on failures
  • Maintains audit trail of raw webhooks

Implementation: Store immediately, return 200, queue processing

See ../02-standards/api-patterns.md for code examples.

6. Service Layer Pattern

Rationale:

  • Separates business logic from HTTP layer
  • Makes code testable (mock services easily)
  • Allows business logic reuse
  • Keeps controllers thin

Structure:

  Controller (route) → Service (business logic) → Model (database)
  

See ../02-standards/code-standards.md for examples.
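
As a rough illustration of the layering, the sketch below uses plain Python in place of FastAPI and the real models. All class and function names are hypothetical, and the in-memory repository stands in for the database; the point is only that the route function contains no business rules.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    sku: str
    quantity: int


class InMemoryOrderRepository:
    """Stand-in for the model/database layer."""

    def __init__(self):
        self._rows: dict[int, Order] = {}

    def insert(self, sku: str, quantity: int) -> Order:
        order = Order(id=len(self._rows) + 1, sku=sku, quantity=quantity)
        self._rows[order.id] = order
        return order


class OrderService:
    """Business rules live here, with no knowledge of HTTP."""

    def __init__(self, repo):
        self._repo = repo

    def create_order(self, sku: str, quantity: int) -> Order:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        return self._repo.insert(sku, quantity)


def create_order_route(service: OrderService, payload: dict):
    """Thin controller: translate HTTP-shaped input and output, nothing else."""
    try:
        order = service.create_order(payload["sku"], payload["quantity"])
    except ValueError as exc:
        return 422, {"detail": str(exc)}
    return 201, {"id": order.id, "sku": order.sku, "quantity": order.quantity}
```

Since `OrderService` takes its repository as a constructor argument, a test can pass a fake repository and exercise the business rules without a database or an HTTP client.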

7. API Versioning (/api/v1/)

Rationale:

  • Allows breaking changes without breaking existing clients
  • Clear migration path for clients
  • Explicitly communicates API stability

Trade-off: Some code duplication, but necessary for public APIs

8. Minimal Frontend State

Rationale:

  • Use React Query for server state
  • Keep UI state in components
  • Avoid complex global state managers
  • Simpler to reason about

Trade-off: Some prop drilling, but acceptable for our app size

Data Flow Overview

Read Operation

  Client → API Server → Service → Database → Service → API Server → Client
                        ↓
                      Redis Cache (if configured)
  

Write Operation

  Client → API Server → Service → Database → Service → API Server → Client
                        ↓
                   Clear Cache (if cached)
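
The read and write flows above amount to a cache-aside pattern: populate the cache on a read miss, clear the entry on a write. A minimal sketch, assuming plain dictionaries in place of PostgreSQL and Redis and a hypothetical `UserService` with a `user:{id}` key scheme:

```python
class UserService:
    """Populates the cache on reads and invalidates it on writes (cache-aside)."""

    def __init__(self, db: dict, cache: dict):
        self.db = db        # stand-in for PostgreSQL
        self.cache = cache  # stand-in for Redis

    def get_user(self, user_id: int) -> dict:
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]  # cache hit: no database round trip
        row = self.db[user_id]
        self.cache[key] = row       # cache miss: store for the next read
        return row

    def update_email(self, user_id: int, email: str) -> None:
        self.db[user_id] = {**self.db[user_id], "email": email}
        # Clear rather than update, so the next read reloads from the database.
        self.cache.pop(f"user:{user_id}", None)
```

Clearing the entry instead of rewriting it keeps the database as the single source of truth, at the cost of one extra read after each write.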
  

Webhook Processing

  External → API Server → Database (raw payload) → Redis Queue
                            ↓
                       Return 200 OK

  Redis Queue → Celery Worker → Process → Update Database
                                ↓
                         Call External APIs (if needed)
  

Documentation Locations

  • This folder (docs/01-getting-started/) - Architecture and setup
  • Standards (docs/02-standards/) - How to write code
  • Workflows (docs/03-workflows/) - Development processes
  • API Contracts (docs/api/endpoints.md) - API documentation
  • Runbooks (docs/05-runbooks/) - Troubleshooting and operations
  • Code - Inline docstrings and comments

Performance Targets

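| Metric                    | Target  | Monitoring         |
|---------------------------|---------|--------------------|
| API Response Time (p95)   | <500ms  | CloudWatch/Datadog |
| API Response Time (p99)   | <1000ms | CloudWatch/Datadog |
| Database Query Time (p95) | <100ms  | Slow query log     |
| Background Job Processing | <10s    | Celery monitoring  |
| Uptime                    | >99.9%  | Health checks      |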
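
For readers unfamiliar with p95 and p99, the sketch below shows one common way to compute them, the nearest-rank method, and to check a batch of latencies against the API targets. The function names are hypothetical, and monitoring tools such as CloudWatch or Datadog may use a different interpolation, so treat the exact values as approximate.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_api_targets(latencies_ms):
    """True when p95 is under 500ms and p99 is under 1000ms."""
    return percentile(latencies_ms, 95) < 500 and percentile(latencies_ms, 99) < 1000
```

A p95 target of 500ms therefore tolerates up to 5% of requests being slower, which is why the p99 target is tracked separately.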

Security Overview

Authentication

  • JWT tokens with RS256 signing
  • Access tokens (short-lived): 30 minutes
  • Refresh tokens (long-lived): 7 days
  • Refresh tokens stored in Redis for revocation

Authorisation

  • Role-based access control (RBAC)
  • Permissions checked at endpoint level
  • Resource ownership validated in service layer
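
The two levels of checking can be sketched as follows. The role names, permission strings, and the rule that admins bypass the ownership check are all assumptions made for illustration; the actual roles and rules are not specified here.

```python
# Hypothetical roles and permission names, for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"orders:read", "orders:write"},
    "viewer": {"orders:read"},
}


class Forbidden(Exception):
    """Raised when a caller is not allowed to perform an action."""


def require_permission(role: str, permission: str) -> None:
    """Endpoint-level check: does the caller's role grant this permission?"""
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        raise Forbidden(f"role {role!r} lacks {permission!r}")


def get_order(orders: dict, order_id: int, user_id: int, role: str) -> dict:
    """Service-level check: non-admins may only read orders they own (assumed rule)."""
    order = orders[order_id]
    if role != "admin" and order["owner_id"] != user_id:
        raise Forbidden("not the owner of this order")
    return order
```

Keeping the ownership check in the service layer means it also applies when the same logic is called from a background job rather than an HTTP endpoint.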

Data Protection

  • All passwords hashed with bcrypt
  • HTTPS enforced in production
  • API keys stored in environment variables
  • Database credentials rotated quarterly

Rate Limiting

  • Per-user rate limits enforced
  • Per-IP limits for unauthenticated endpoints
  • Webhook signature verification
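
One simple way to implement per-user and per-IP limits is a fixed-window counter. The sketch below is an assumption about the approach, not the project's implementation: it keeps counts in memory where production would keep them in Redis, and the class name and key format are hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per key in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str) -> bool:
        window_index = int(self._clock() // self.window)
        bucket = (key, window_index)
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True
```

Calling `allow("user:42")` for authenticated requests and `allow("ip:203.0.113.5")` for anonymous ones covers both cases with one class. Old buckets are never purged in this sketch; in Redis a key expiry would normally take care of that.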

Monitoring and Observability

Logging

  • Structured JSON logging
  • Request/response logging
  • Error tracking with stack traces
  • Background job status logging
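
Structured JSON logging can be done with the standard library's logging module. The formatter below is a minimal sketch; the field names and the `build_logger` helper are assumptions, and the project may use a dedicated logging library instead.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the stack trace so errors can be tracked from the log line alone.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With this in place, `build_logger("api").info("order %s created", 42)` emits one JSON line that log aggregators can parse without custom patterns.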

Metrics

  • API response times
  • Database query performance
  • Background job queue depth
  • Error rates
  • Cache hit rates

Alerts

  • API error rate > 5%
  • Response time p95 > 500ms
  • Database connection pool exhausted
  • Worker queue depth > 1000
  • Free disk space < 20%
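
The numeric thresholds above can be expressed as data and evaluated in a few lines. This sketch uses hypothetical metric names, reads the disk rule as free space below 20%, and leaves out the connection-pool alert because it is an event rather than a threshold; real alerting would be configured in the monitoring tool itself.

```python
import operator

# (metric name, comparison, threshold); the metric names are hypothetical.
ALERT_RULES = [
    ("api_error_rate_pct", operator.gt, 5),
    ("response_time_p95_ms", operator.gt, 500),
    ("worker_queue_depth", operator.gt, 1000),
    ("disk_free_pct", operator.lt, 20),
]


def firing_alerts(metrics: dict) -> list:
    """Return the names of rules whose metric is present and breaches its threshold."""
    return [
        name
        for name, breaches, threshold in ALERT_RULES
        if name in metrics and breaches(metrics[name], threshold)
    ]
```

Note that the comparisons are strict, so a queue depth of exactly 1000 does not fire.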

See ../05-runbooks/debugging.md for monitoring details.

Deployment Architecture

CI/CD Pipeline

graph LR
    Commit[Git Push] --> GH[GitHub Actions]
    GH --> Test[Run Tests]
    Test --> Lint[Linting]
    Lint --> Build[Build Docker Image]
    Build --> Push[Push to Registry]
    Push --> Deploy[Deploy to Railway]
    Deploy --> Health[Health Check]
    Health --> Notify[Slack Notification]

Deployment Process

  1. Developer pushes to staging or main branch
  2. GitHub Actions runs tests and linting
  3. Docker image built and pushed to registry
  4. Railway pulls new image
  5. Rolling deployment (zero downtime)
  6. Health checks verify deployment
  7. Slack notification sent to #deployments

See ../03-workflows/deployment.md for details.

Next Steps

Now that you understand the architecture:

  1. Read ../02-standards/code-standards.md to learn our coding patterns
  2. Review ../02-standards/api-patterns.md for API design
  3. Study ../03-workflows/feature-development.md for how to build features
  4. Check ../05-runbooks/debugging.md to learn how to debug issues

Questions? Ask in #dev-team on Slack!