# Architecture Overview

This document provides a high-level overview of our system architecture, request flows, and key design decisions.
## Components Overview

### API Server (FastAPI)

**Purpose:** Handle HTTP requests, validate input, return responses quickly (<500ms)

**Responsibilities:**
- Receive and validate HTTP requests
- Authenticate and authorise users
- Execute business logic via service layer
- Return JSON responses
- Queue background jobs for long-running tasks

**Technology:** FastAPI with Uvicorn (ASGI server)

**Scaling:** Horizontal - run multiple instances behind a load balancer
### Database (PostgreSQL)

**Purpose:** Persistent data storage

**Responsibilities:**
- Store all application data
- Enforce data integrity via constraints
- Provide transactional guarantees
- Execute queries with proper indexing

**Technology:** PostgreSQL 15+

**Scaling:** Vertical initially, read replicas if needed
### Cache/Queue (Redis)

**Purpose:** Caching and message broker for background jobs

**Responsibilities:**
- Cache frequently accessed data
- Store session data
- Message queue for Celery tasks
- Rate limiting storage

**Technology:** Redis 7+

**Scaling:** Single instance with persistence for small teams, cluster for scale
### Background Workers (Celery)

**Purpose:** Process long-running tasks asynchronously

**Responsibilities:**
- Process webhook data
- Send emails
- Generate reports
- Call external APIs
- Sync data with third parties

**Technology:** Celery with Redis broker

**Scaling:** Horizontal - add more workers based on queue depth
### Frontend (React)

**Purpose:** User interface

**Responsibilities:**
- Render UI components
- Handle user interactions
- Make API calls
- Manage client-side state
- Display loading/error states

**Technology:** React 18 with TypeScript, Vite

**Scaling:** Static files served via CDN
## Request Flow Patterns

### 1. Typical API Request (User Action)

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API Server
    participant S as Service Layer
    participant D as Database
    participant R as Redis
    C->>A: POST /api/v1/orders
    A->>A: Validate JWT token
    A->>A: Validate request schema
    A->>S: create_order(data)
    S->>D: INSERT INTO orders
    D->>S: order_id
    S->>R: cache order data
    S->>A: Order object
    A->>C: 201 Created + order JSON
    Note over C,R: Total time: <500ms
```
**Key Points:**
- Authentication happens at router level via dependency injection
- Validation uses Pydantic schemas automatically
- Business logic lives in service layer
- Response returned quickly (<500ms)
### 2. Webhook Handler (Capture and Process Async)

```mermaid
sequenceDiagram
    participant SH as Shopify
    participant A as API Server
    participant D as Database
    participant Q as Redis Queue
    participant W as Celery Worker
    participant SA as Shopify API
    SH->>A: POST /webhooks/orders/create
    A->>A: Verify webhook signature
    A->>D: INSERT INTO webhook_events
    D->>A: event_id
    A->>Q: queue process_order_webhook.delay(event_id)
    A->>SH: 200 OK
    Note over SH,A: Response in <100ms
    Q->>W: Dequeue task
    W->>D: SELECT * FROM webhook_events
    W->>W: Process order data
    W->>D: UPDATE orders, inventory
    W->>SA: GET /orders/{id}/fulfillments
    SA->>W: Fulfillment data
    W->>D: UPDATE fulfillment_status
    Note over W,D: Processing time: 2-10 seconds
```
**Key Points:**
- Webhook received and acknowledged immediately (<100ms)
- Raw payload stored in database for replay/debugging
- Processing happens asynchronously
- Failures can be retried without losing data
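The signature-verification step can be sketched with the standard library. Shopify sends a base64-encoded HMAC-SHA256 of the raw request body in the `X-Shopify-Hmac-Sha256` header, which the handler recomputes with the app's shared secret:

```python
import base64
import hashlib
import hmac

def verify_shopify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Recompute the HMAC over the raw (unparsed) body and compare in constant time."""
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, header_value)
```

Two details matter: verification must run on the raw bytes before any JSON parsing, and the comparison uses `hmac.compare_digest` to avoid timing side channels.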
### 3. Authentication Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API Server
    participant D as Database
    participant R as Redis
    C->>A: POST /auth/login {email, password}
    A->>D: SELECT * FROM users WHERE email=?
    D->>A: User record
    A->>A: Verify password hash
    A->>A: Generate JWT access + refresh tokens
    A->>R: Store refresh token (with TTL)
    A->>C: {access_token, refresh_token}
    Note over C,A: Subsequent requests
    C->>A: GET /api/v1/profile (Authorization: Bearer token)
    A->>A: Decode and verify JWT
    A->>A: Extract user_id from token
    A->>D: SELECT * FROM users WHERE id=?
    D->>A: User data
    A->>C: User profile JSON
```
**Key Points:**
- Access tokens are stateless JWTs
- Refresh tokens stored in Redis for revocation
- Access tokens short-lived (30 min), refresh tokens long-lived (7 days)
- User object injected via dependency injection
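The token lifecycle can be sketched in miniature. The real system signs JWTs with RS256 via a library; this stdlib sketch uses HMAC with a hypothetical secret purely to show the claims/expiry/verification shape:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"dev-only-secret"  # assumption: production uses an RS256 key pair, not a shared secret

def issue_token(user_id: int, ttl_seconds: int) -> str:
    """Encode claims (subject + expiry) and sign them."""
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()

def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    body, _, sig = token.encode().partition(b".")
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(expected, sig):
        return None  # tampered or wrongly signed
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["exp"] < time.time():
        return None  # expired
    return payload

# Access token: 30 minutes; refresh token: 7 days
access = issue_token(user_id=42, ttl_seconds=30 * 60)
refresh = issue_token(user_id=42, ttl_seconds=7 * 24 * 3600)
```

The refresh token additionally gets stored in Redis keyed by user, so revoking a session is a delete rather than waiting out the expiry.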
### 4. Background Job Processing

```mermaid
sequenceDiagram
    participant A as API Server
    participant Q as Redis Queue
    participant W as Celery Worker
    participant D as Database
    participant E as External API
    A->>Q: send_welcome_email.delay(user_id)
    A->>A: Continue processing
    Q->>W: Dequeue task
    W->>D: SELECT * FROM users WHERE id=?
    D->>W: User data
    W->>E: POST /send-email (AWS SES)
    E->>W: Message ID
    W->>D: INSERT INTO email_log
    W->>Q: Task complete
    Note over W: Retries on failure (max 3 attempts)
```
**Key Points:**
- Tasks are queued with `.delay()` or `.apply_async()`
- Workers poll Redis queue
- Automatic retries with exponential backoff
- All state persisted to database
## Environment Differences

### Local Development

```mermaid
graph LR
    Dev[Developer Machine] --> Docker[Docker Compose]
    Docker --> DB[(PostgreSQL)]
    Docker --> Redis[(Redis)]
    Docker --> Worker[Celery Worker]
```
**Characteristics:**
- Everything runs in Docker containers
- Single server (your laptop)
- Hot reload enabled
- Debug mode on
- Simplified authentication
- Test data seeded
### Staging

```mermaid
graph LR
    Internet --> LB[Railway Load Balancer]
    LB --> App[Single Server]
    App --> DB[(PostgreSQL)]
    App --> Redis[(Redis)]
    App --> Worker[Celery Worker]
```
**Characteristics:**
- All services on one server (cost optimisation)
- Production-like configuration
- Real domain with HTTPS
- Connected to staging Shopify app
- Test payment gateway
- Automated deployments from `staging` branch
### Production

```mermaid
graph TB
    Internet --> LB[Load Balancer]
    LB --> App1[API Server 1]
    LB --> App2[API Server 2]
    App1 --> DB[(PostgreSQL Primary)]
    App2 --> DB
    DB --> RR[(Read Replica)]
    App1 --> Redis[(Redis)]
    App2 --> Redis
    Redis --> W1[Worker 1]
    Redis --> W2[Worker 2]
    W1 --> DB
    W2 --> DB
```
**Characteristics:**
- Multiple API server instances
- Multiple Celery workers
- Database read replicas (if needed)
- Redis persistence enabled
- Automated deployments from `main` branch
- Real payment processing
- Comprehensive monitoring and alerts
## Key Design Decisions

### 1. Why FastAPI?

**Rationale:**
- Automatic OpenAPI documentation
- Built-in request/response validation with Pydantic
- Excellent performance (ASGI-based)
- Modern Python with type hints
- Easy dependency injection
- Great async support

**Trade-off:** Smaller ecosystem than Django, but we don't need Django's ORM or admin
### 2. Why PostgreSQL?

**Rationale:**
- ACID compliance for data integrity
- Rich data types (JSON, arrays, etc.)
- Excellent performance with proper indexing
- Mature, battle-tested
- Great tooling ecosystem

**Trade-off:** Vertical scaling required eventually, but sufficient for our scale
### 3. Why Celery for Background Jobs?

**Rationale:**
- Mature, widely adopted
- Excellent retry mechanisms
- Support for task chains and workflows
- Built-in monitoring with Flower
- Works well with Redis

**Trade-off:** Can be complex, but essential for async processing
### 4. Why Redis?

**Rationale:**
- Fast in-memory storage
- Works as both cache and message broker
- Simple to operate
- Excellent client libraries

**Trade-off:** Data must fit in memory, but works for our use case
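The caching half of that dual role typically follows a cache-aside pattern: check Redis, fall back to the database on a miss, then populate the cache with a TTL. A sketch using a plain dict as a stand-in for the Redis client (real code would use `redis-py` GET/SETEX):

```python
import time
from typing import Callable

_store: dict = {}  # {key: (expires_at, value)} (stand-in for Redis)

def get_or_set(key: str, loader: Callable[[], dict], ttl: int = 300) -> dict:
    """Return the cached value if fresh; otherwise load it, cache it, return it."""
    hit = _store.get(key)
    if hit is not None and hit[0] > time.time():
        return hit[1]                          # cache hit
    value = loader()                           # cache miss: hit the database
    _store[key] = (time.time() + ttl, value)   # populate with a TTL
    return value
```

Writes then invalidate the key (delete it) rather than updating it in place, which keeps cache and database from drifting apart.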
### 5. Capture and Process Async Pattern

**Rationale:**
- Ensures <500ms response times
- Prevents webhook timeouts
- Allows retries on failures
- Maintains audit trail of raw webhooks

**Implementation:** Store immediately, return 200, queue processing

See `../02-standards/api-patterns.md` for code examples.
### 6. Service Layer Pattern

**Rationale:**
- Separates business logic from HTTP layer
- Makes code testable (mock services easily)
- Allows business logic reuse
- Keeps controllers thin

**Structure:**

```
Controller (route) → Service (business logic) → Model (database)
```

See `../02-standards/code-standards.md` for examples.
### 7. API Versioning (`/api/v1/`)

**Rationale:**
- Allows breaking changes without breaking existing clients
- Clear migration path for clients
- Explicitly communicates API stability

**Trade-off:** Some code duplication, but necessary for public APIs
### 8. Minimal Frontend State

**Rationale:**
- Use React Query for server state
- Keep UI state in components
- Avoid complex global state managers
- Simpler to reason about

**Trade-off:** Some prop drilling, but acceptable for our app size
## Data Flow Overview

### Read Operation

```
Client → API Server → Service → Database → Service → API Server → Client
                         ↓
               Redis Cache (if configured)
```
### Write Operation

```
Client → API Server → Service → Database → API Server → Client
                         ↓
                Clear Cache (if cached)
```
### Webhook Processing

```
External → API Server → Database (raw payload) → Redis Queue
               ↓
         Return 200 OK

Redis Queue → Celery Worker → Process → Update Database
                                 ↓
                  Call External APIs (if needed)
```
## Documentation Locations

- This folder (`docs/01-getting-started/`) - Architecture and setup
- Standards (`docs/02-standards/`) - How to write code
- Workflows (`docs/03-workflows/`) - Development processes
- API Contracts (`docs/api/endpoints.md`) - API documentation
- Runbooks (`docs/05-runbooks/`) - Troubleshooting and operations
- Code - Inline docstrings and comments
## Performance Targets
| Metric | Target | Monitoring |
|---|---|---|
| API Response Time (p95) | <500ms | CloudWatch/Datadog |
| API Response Time (p99) | <1000ms | CloudWatch/Datadog |
| Database Query Time (p95) | <100ms | Slow query log |
| Background Job Processing | <10s | Celery monitoring |
| Uptime | >99.9% | Health checks |
## Security Overview

### Authentication
- JWT tokens with RS256 signing
- Access tokens (short-lived): 30 minutes
- Refresh tokens (long-lived): 7 days
- Refresh tokens stored in Redis for revocation
### Authorisation
- Role-based access control (RBAC)
- Permissions checked at endpoint level
- Resource ownership validated in service layer
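Those two layers of checks can be sketched as pure functions. The roles and permission strings here are illustrative, not the real permission matrix:

```python
# Role -> permission mapping (hypothetical roles and permission strings)
ROLE_PERMISSIONS = {
    "admin": {"orders:read", "orders:write", "users:manage"},
    "member": {"orders:read", "orders:write"},
    "viewer": {"orders:read"},
}

def has_permission(role: str, permission: str) -> bool:
    """Endpoint-level check: does the user's role grant this permission?"""
    return permission in ROLE_PERMISSIONS.get(role, set())

def can_access_order(role: str, user_id: int, order_owner_id: int) -> bool:
    """Service-layer check: admins, or the owner of the resource."""
    return role == "admin" or user_id == order_owner_id
```

Keeping the role check at the endpoint and the ownership check in the service means a route can never accidentally skip the per-resource test.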
### Data Protection
- All passwords hashed with bcrypt
- HTTPS enforced in production
- API keys stored in environment variables
- Database credentials rotated quarterly
### Rate Limiting
- Per-user rate limits enforced
- Per-IP limits for unauthenticated endpoints
- Webhook signature verification
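A common shape for the per-user and per-IP limits is a fixed-window counter; in production the counter lives in Redis (INCR plus EXPIRE on the key), sketched here with an in-memory dict:

```python
import time

_counters: dict = {}  # {(key, window_index): request_count} (Redis INCR/EXPIRE in production)

def allow_request(key: str, limit: int = 60, window_seconds: int = 60) -> bool:
    """Allow up to `limit` requests per key per window; deny the rest."""
    bucket = (key, int(time.time() // window_seconds))
    _counters[bucket] = _counters.get(bucket, 0) + 1
    return _counters[bucket] <= limit
```

Fixed windows permit brief bursts at window boundaries; a sliding-window or token-bucket variant smooths that out if it ever matters.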
## Monitoring and Observability

### Logging
- Structured JSON logging
- Request/response logging
- Error tracking with stack traces
- Background job status logging
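A minimal stdlib version of structured JSON logging is a custom `logging.Formatter`; the field names here are illustrative, not a fixed schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:  # include the stack trace for errors
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

One-object-per-line output is what log aggregators expect, and it keeps request IDs and user IDs queryable instead of buried in free text.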
### Metrics
- API response times
- Database query performance
- Background job queue depth
- Error rates
- Cache hit rates
### Alerts
- API error rate > 5%
- Response time p95 > 500ms
- Database connection pool exhausted
- Worker queue depth > 1000
- Disk space < 20%
See `../05-runbooks/debugging.md` for monitoring details.
## Deployment Architecture

### CI/CD Pipeline

```mermaid
graph LR
    Commit[Git Push] --> GH[GitHub Actions]
    GH --> Test[Run Tests]
    Test --> Lint[Linting]
    Lint --> Build[Build Docker Image]
    Build --> Push[Push to Registry]
    Push --> Deploy[Deploy to Railway]
    Deploy --> Health[Health Check]
    Health --> Notify[Slack Notification]
```
### Deployment Process

- Developer pushes to `staging` or `main` branch
- GitHub Actions runs tests and linting
- Docker image built and pushed to registry
- Railway pulls new image
- Rolling deployment (zero downtime)
- Health checks verify deployment
- Slack notification sent to `#deployments`
See `../03-workflows/deployment.md` for details.
## Next Steps

Now that you understand the architecture:

- Read `../02-standards/code-standards.md` to learn our coding patterns
- Review `../02-standards/api-patterns.md` for API design
- Study `../03-workflows/feature-development.md` for how to build features
- Check `../05-runbooks/debugging.md` to learn how to debug issues
Questions? Ask in #dev-team on Slack!