This runbook provides solutions to frequently encountered issues during development.

Table of Contents

  1. Database Issues
  2. Docker Issues
  3. Authentication & API Issues
  4. Performance Issues
  5. Background Job Issues
  6. Build & Deployment Issues
  7. Development Environment Issues

Database Issues

Issue: Database Connection Failed

Symptom:

  sqlalchemy.exc.OperationalError: could not connect to server: Connection refused
  

Cause: PostgreSQL service not running or wrong connection settings

Solution:

  # Check if PostgreSQL is running
docker-compose ps db

# If not running, start it
docker-compose up -d db

# Check logs
docker-compose logs db

# Verify connection settings in .env
cat .env | grep DATABASE_URL

# Test connection
docker-compose exec db psql -U postgres -c "SELECT 1"
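
When `psql` is not at hand, the same reachability check can be sketched in Python with only the standard library. This is a minimal sketch: the helper names `db_host_port` and `can_connect` are hypothetical, and it assumes `DATABASE_URL` is a standard URL such as `postgresql://user:password@db:5432/app`.

```python
import socket
from urllib.parse import urlsplit


def db_host_port(database_url: str, default_port: int = 5432) -> tuple:
    """Extract (host, port) from a URL such as postgresql://user:pw@db:5432/app."""
    parts = urlsplit(database_url)
    return parts.hostname or "localhost", parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", DNS failures, and timeouts.
        return False
```

Calling `can_connect(*db_host_port(os.environ["DATABASE_URL"]))` and getting `False` matches the "Connection refused" symptom (service down, or wrong host or port). `True` only shows that the port is open; it says nothing about credentials.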
  

Issue: Migration Conflicts

Symptom:

  alembic.util.exc.CommandError: Target database is not up to date.
  

Cause: The database is behind the latest revision, typically after migrations were created on different branches (which can also leave multiple heads)

Solution:

  # Check current version
alembic current

# Check migration history and look for more than one head
alembic history
alembic heads

# If migrations conflict, you may need to:

# Option 1: Merge migrations (if in development)
alembic merge heads -m "merge migrations"
alembic upgrade head

# Option 2: Reset database (DEVELOPMENT ONLY)
docker-compose down -v
docker-compose up -d db
alembic upgrade head

# Option 3: Manually resolve (production)
# Contact team lead for guidance
  

Issue: Slow Query Performance

Symptom:

  • API endpoints taking >1 second
  • Database CPU usage high

Cause: Missing indexes or N+1 queries

Solution:

  # 1. Enable query logging
# At application startup (not inside an endpoint), add:
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

# 2. Check for N+1 queries
# Look for multiple SELECT queries in logs

# 3. Use eager loading
from sqlalchemy.orm import joinedload

# BAD - N+1 query
orders = db.query(Order).all()
for order in orders:
    print(order.user.email)  # Each iteration triggers a query

# GOOD - Eager loading
orders = db.query(Order).options(joinedload(Order.user)).all()
for order in orders:
    print(order.user.email)  # No additional queries

# 4. Add indexes
# See docs/02-standards/database-patterns.md
# Add index in migration:
op.create_index('idx_orders_user_id', 'orders', ['user_id'])
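
To confirm that an index is actually used, compare the query plan before and after creating it. The sketch below uses an in-memory SQLite database as a stand-in so it runs anywhere; on PostgreSQL the equivalent check is `EXPLAIN ANALYZE`, and the plan text will look different. The table layout and the `query_plan` helper are assumptions made for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE user_id = ?"
plan_before = query_plan(conn, sql, (42,))  # no index yet: full table scan

conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))  # planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan after the migration still shows a full scan, the index does not match the filter columns of the slow query.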
  

Issue: Database Locked (SQLite Development)

Symptom:

  sqlite3.OperationalError: database is locked
  

Cause: Multiple processes accessing SQLite simultaneously

Solution:

  # Use PostgreSQL instead of SQLite for development
# Update docker-compose.yml and use PostgreSQL service

# If you must use SQLite:
# 1. Close all database connections
# 2. Delete the database file
# 3. Run migrations again
rm test.db
alembic upgrade head
  

Docker Issues

Issue: Port Already in Use

Symptom:

  Error: Bind for 0.0.0.0:5432 failed: port is already allocated
  

Cause: Another service using the same port

Solution:

  # Find process using the port
lsof -i :5432  # On macOS/Linux
netstat -ano | findstr :5432  # On Windows

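# Stop the process (try a plain kill first; use kill -9 only if it ignores SIGTERM)
kill <PID>  # On macOS/Linux
taskkill /PID <PID> /F  # On Windows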

# Or change port in docker-compose.yml
ports:
  - "5433:5432"  # Use different host port

# Restart services
docker-compose down
docker-compose up -d
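
A cross-platform way to check a port before starting the stack is to attempt a TCP connection to it. This is a minimal sketch; `port_in_use` is a hypothetical helper, and it only detects listeners reachable on the given host (127.0.0.1 by default).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(5432)` returning `True` before `docker-compose up` suggests a local PostgreSQL (or another container) already holds the port.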
  

Issue: Docker Container Won’t Start

Symptom: Container exits immediately after starting

Cause: Configuration error, missing dependencies, or resource limits

Solution:

  # Check container logs
docker-compose logs app

# Check container status
docker-compose ps

# Common fixes:

# 1. Environment variable missing
docker-compose config  # Validate docker-compose.yml

# 2. Build issues
docker-compose build --no-cache app
docker-compose up -d app

# 3. Resource limits
# Increase Docker Desktop memory/CPU limits

# 4. Permission issues
chmod +x entrypoint.sh

# 5. Start container interactively for debugging
docker-compose run --rm app /bin/bash
  

Issue: Out of Disk Space

Symptom:

  Error: no space left on device
  

Cause: Docker images and volumes filling up disk

Solution:

  # Check Docker disk usage
docker system df

# Clean up unused resources
docker system prune -a  # Remove all unused images, containers, networks

# Remove unused volumes (WARNING: permanently deletes their data, including local databases)
docker volume prune

# Remove specific volume
docker volume rm yourapp_postgres_data

# Check system disk space
df -h
  

Authentication & API Issues

Issue: 401 Unauthorized on All Requests

Symptom: All API requests return 401 even with valid token

Cause: JWT secret mismatch or token expiration

Solution:

  # 1. Check JWT_SECRET_KEY in .env matches across services
cat .env | grep JWT_SECRET_KEY

# 2. Verify token hasn't expired
# Decode the JWT (e.g. at https://jwt.io; do not paste production tokens into third-party sites)
# Check the exp (expiration) claim, a Unix timestamp in seconds

# 3. Generate new token
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "user@example.com", "password": "password"}'

# 4. Clear old tokens from localStorage (run in the browser console)
localStorage.clear()

# 5. Restart services
docker-compose restart app
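
The expiration check in step 2 can also be done locally, without sending the token anywhere. The sketch below decodes the payload segment only and does not verify the signature, so it is a debugging aid rather than a validation step. The helper names `jwt_claims` and `is_expired` are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's exp claim (seconds since epoch) is in the past."""
    claims = jwt_claims(token)
    current = time.time() if now is None else now
    return "exp" in claims and claims["exp"] <= current
```

If `is_expired(token)` is `False` and requests still return 401, a secret mismatch between services is the more likely cause.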
  

Issue: CORS Errors

Symptom:

  Access to fetch at 'http://localhost:8000' from origin 'http://localhost:3000'
has been blocked by CORS policy
  

Cause: Frontend origin not in CORS_ORIGINS

Solution:

  # 1. Check CORS configuration in .env
CORS_ORIGINS=["http://localhost:3000", "http://localhost:5173"]

# 2. Update app/main.py
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000", "http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# 3. Restart API server
docker-compose restart app

# 4. Hard-reload the page to bypass the browser cache
# Chrome: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows/Linux)
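
A quick way to rule out a configuration typo is to parse the `CORS_ORIGINS` value and test the exact origin the browser reports. This sketch assumes the value is a JSON list as shown in step 1; the comma-separated fallback and both helper names are assumptions, not part of the project.

```python
import json


def parse_cors_origins(raw: str) -> list:
    """Parse a CORS_ORIGINS value given as a JSON list or a comma-separated string."""
    raw = raw.strip()
    if raw.startswith("["):
        origins = json.loads(raw)
    else:
        origins = raw.split(",")
    # Browsers send the Origin header without a trailing slash, so normalize.
    return [o.strip().rstrip("/") for o in origins if o.strip()]


def origin_allowed(origin: str, raw: str) -> bool:
    """Return True if the browser-reported origin is in the configured list."""
    return origin.rstrip("/") in parse_cors_origins(raw)
```

Note that `http://localhost:3000` and `http://127.0.0.1:3000` are different origins, which is a common reason the check fails.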
  

Issue: API Endpoint Returns 500 Internal Server Error

Symptom: Request succeeds in Postman but returns 500 from frontend

Cause: Usually a Python exception in the endpoint

Solution:

  # 1. Check API logs
docker-compose logs -f app

# 2. Look for stack trace
# Error will show which line caused the exception

# 3. Common causes:
# - Missing database record (.one() raises when no row matches; use .first() and handle None)
# - Type mismatch (string passed where int expected)
# - Missing environment variable
# - External API call failed

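# 4. Add try/except with logging
import logging
from fastapi import HTTPException

logger = logging.getLogger(__name__)

try:
    result = some_operation()
except Exception:
    logger.exception("Operation failed")  # logs the full stack trace
    # Return a generic message so internal details are not leaked to clients
    raise HTTPException(status_code=500, detail="Internal server error")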
  

Performance Issues

Issue: Slow API Response Times (>500ms)

Symptom: API endpoints taking longer than 500ms to respond

Cause: Multiple possible causes

Solution:

  # 1. Check which part is slow

# Add timing logs to your endpoint:
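import logging
import time

logger = logging.getLogger(__name__)

start = time.perf_counter()  # monotonic clock, suited to measuring durations
# Your code here
logger.info(f"Operation took {time.perf_counter() - start:.2f}s")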

# 2. Common slow operations:

# Database queries
# Solution: Add indexes, use eager loading, check for N+1

# External API calls
# Solution: Use async/await or move to background job

# File processing
# Solution: Move to background job

# 3. Use profiling
# Install:
pip install py-spy

# Profile running process:
py-spy top --pid <PID>

# 4. Check database slow query log
# See debugging.md for EXPLAIN ANALYZE
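
To time several parts of one endpoint without repeating the start/stop boilerplate, the timing log from step 1 can be wrapped in a context manager. This is a minimal sketch; the `timed` helper and its optional `timings` dictionary are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timed(label, timings=None):
    """Log how long the wrapped block took; optionally store it in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failures are timed too.
        elapsed = time.perf_counter() - start
        if timings is not None:
            timings[label] = elapsed
        logger.info("%s took %.3fs", label, elapsed)
```

Wrapping the database call, the external API call, and the serialization step in separate `with timed("..."):` blocks usually shows which of the causes in step 2 applies.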
  

Issue: Memory Leak

Symptom: Container memory usage grows over time, eventually crashes

Cause: Objects not being garbage collected

Solution:

  # 1. Monitor memory usage
docker stats yourapp_api

# 2. Check for common causes:

# - Database connections not closed
# Solution: Use context managers or dependencies

# - Large objects cached in memory
# Solution: Clear cache periodically, use Redis

# - Circular references
# Solution: Use weakref where appropriate

# 3. Profile memory usage
pip install memory_profiler

# Add to your code:
from memory_profiler import profile

@profile
def your_function():
    ...  # Your code

# Run the script and check the line-by-line report:
python -m memory_profiler your_script.py

# 4. Restart container as temporary fix
docker-compose restart app
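
If installing a profiler is not an option, the standard library's `tracemalloc` gives a rough view of where memory is being allocated. The sketch below is an alternative to memory_profiler, not what this project prescribes; `top_allocations` is a hypothetical helper.

```python
import tracemalloc


def top_allocations(func, limit: int = 5):
    """Run func() and return the source lines that allocated the most new memory."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()  # keep a reference so the allocations are still alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each entry reports the size difference per source line between snapshots.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]
```

Printing each returned entry shows the file, line number, and size difference. Lines whose totals keep growing across repeated calls are the first candidates for a leak.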
  

Background Job Issues

Issue: Celery Worker Not Processing Jobs

Symptom: Jobs queued but not executing

Cause: Worker not running or Redis connection failed

Solution:

  # 1. Check if worker is running
docker-compose ps worker

# 2. Check worker logs
docker-compose logs -f worker

# 3. Check Redis connection
docker-compose exec redis redis-cli ping
# Should return: PONG

# 4. Check queue depth
docker-compose exec redis redis-cli LLEN celery

# 5. Restart worker
docker-compose restart worker

# 6. Purge queue if needed (DEVELOPMENT ONLY; discards all queued jobs)
docker-compose exec worker celery -A app.worker purge
  

Issue: Background Jobs Failing

Symptom: Jobs start but fail with errors

Cause: Exception in task code

Solution:

  # 1. Check worker logs for stack trace
docker-compose logs -f worker

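# 2. Test task manually (runs synchronously in this process, no worker needed)
python
>>> from app.tasks.your_task import your_task
>>> result = your_task.apply(args=(arg,))
>>> result.get()  # re-raises the task's exception, if any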

# 3. Common causes:
# - Database connection not available in worker
# - Environment variables not set for worker
# - External API unreachable

# 4. Add retry logic
@celery_app.task(bind=True, max_retries=3)
def your_task(self, arg):
    try:
        ...  # Your code
    except Exception as exc:
        # Re-queue the task after 60 seconds, up to max_retries times
        raise self.retry(exc=exc, countdown=60)
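
A fixed 60-second countdown retries at the same pace no matter how long the dependency has been down. One common alternative is exponential backoff, sketched below as a plain function so it can be tested without Celery. The name `backoff_countdown` and its default values are assumptions.

```python
def backoff_countdown(retries: int, base: int = 60, cap: int = 900) -> int:
    """Seconds to wait before the next attempt: base, 2x, 4x, ... up to cap."""
    return min(base * (2 ** retries), cap)


# Delays for the first five attempts with the defaults above.
delays = [backoff_countdown(n) for n in range(5)]
print(delays)  # [60, 120, 240, 480, 900]
```

Inside a bound task, Celery exposes the number of retries so far as `self.request.retries`, so the call would become `raise self.retry(exc=exc, countdown=backoff_countdown(self.request.retries))`.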
  

Issue: Redis Connection Refused

Symptom:

  redis.exceptions.ConnectionError: Error connecting to Redis
  

Cause: Redis service not running

Solution:

  # Check Redis status
docker-compose ps redis

# Start Redis
docker-compose up -d redis

# Test connection
docker-compose exec redis redis-cli ping

# Check REDIS_URL in .env
cat .env | grep REDIS_URL
# Should be: redis://redis:6379/0 (in Docker)
# or: redis://localhost:6379/0 (local)
  

Build & Deployment Issues

Issue: Pre-commit Hooks Failing

Symptom: Commit blocked by pre-commit hooks

Cause: Code doesn’t meet formatting/linting standards

Solution:

  # 1. See what failed
# Output shows which hook failed and why

# 2. Run hooks manually to see details
pre-commit run --all-files

# 3. Common failures:

# Black (formatting)
black app/

# isort (imports)
isort app/

# flake8 (linting)
flake8 app/
# Fix issues shown

# mypy (type checking)
mypy app/
# Add type hints

# 4. Try commit again
git add .
git commit -m "your message"

# 5. If urgent, skip hooks (NOT RECOMMENDED)
git commit --no-verify
  

Issue: Tests Passing Locally but Failing in CI

Symptom: pytest passes on your machine but fails in GitHub Actions

Cause: Environment differences

Solution:

  # Common causes:

# 1. Different Python/Node version
# Solution: Match versions in CI config

# 2. Missing environment variables
# Solution: Add to GitHub Secrets

# 3. Time zone differences
# Solution: Use UTC in tests

# 4. File path differences (Windows vs Linux)
# Solution: Use pathlib instead of string paths

# 5. Test depends on local data
# Solution: Use fixtures, don't rely on local files

# 6. Race condition in async tests
# Solution: Await every coroutine and wait on explicit conditions; avoid fixed sleeps

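# Replicate CI environment locally (use the Python version from your CI config):
docker run -it --rm -v "$PWD":/app -w /app python:3.11 /bin/bash
# Install dependencies and run the tests inside the container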
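
Causes 3 and 4 above (time zones and file paths) can often be removed with two small habits, sketched here. The `tests/data` fixture directory and both helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical fixture directory, relative to the repository root.
FIXTURE_DIR = Path("tests") / "data"


def utc_now() -> datetime:
    """Timezone-aware current time; its offset does not depend on the machine's TZ."""
    return datetime.now(timezone.utc)


def fixture_path(name: str) -> Path:
    """Build a fixture path with pathlib so separators are correct on every OS."""
    return FIXTURE_DIR / name
```

Tests that compare against `utc_now()` and open files through `fixture_path(...)` behave the same on a developer laptop and on a Linux CI runner.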
  

Development Environment Issues

Issue: Hot Reload Not Working

Symptom: Changes to code don’t trigger automatic restart

Cause: Volume mounting issue or file watcher not detecting changes

Solution:

  # For FastAPI (uvicorn):
# 1. Ensure --reload flag is set
uvicorn app.main:app --reload

# 2. Check volume mounts in docker-compose.yml
volumes:
  - .:/app  # Should mount current directory

# 3. Restart container
docker-compose restart app

# For React (Vite):
# 1. Check vite.config.ts
server: {
  watch: {
    usePolling: true  # For Docker
  }
}

# 2. Restart frontend
npm run dev

# WSL2 specific:
# File changes on Windows might not trigger in WSL
# Solution: Move project inside WSL filesystem
  

Issue: Import Errors After Adding New Module

Symptom:

  ModuleNotFoundError: No module named 'your_module'
  

Cause: Virtual environment not activated or package not installed

Solution:

  # 1. Activate virtual environment
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate  # Windows

# 2. Install package
pip install your-package

# 3. For development dependencies
pip install -r requirements-dev.txt

# 4. In Docker
docker-compose build app
docker-compose up -d app

# 5. Check Python path
python -c "import sys; print(sys.path)"
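
The checks above can be combined into a short diagnostic run with the same interpreter that raises the error. This is a minimal sketch; `in_virtualenv` and `module_available` are hypothetical helper names.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_available(name: str) -> bool:
    """True if `import name` would find the module on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing, or bad names.
        return False
```

If `in_virtualenv()` is `False`, the wrong interpreter is probably in use. If it is `True` but `module_available("your_module")` is `False`, the package is not installed in that environment.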
  

Getting Help

If these solutions don’t resolve your issue:

  1. Check logs - Most issues show clear errors in logs
  2. Search Slack - #dev-team channel history
  3. Ask in Slack - #dev-team with:
    • What you’re trying to do
    • What error you’re seeing
    • What you’ve already tried
  4. Create GitHub issue - For bugs that need tracking
  5. Pair with senior dev - For complex issues

Prevention

Many issues can be prevented by:

  • ✅ Running tests before committing
  • ✅ Using pre-commit hooks
  • ✅ Keeping dependencies updated
  • ✅ Reading error messages carefully
  • ✅ Following our standards (see docs/02-standards/)

Next Steps