Introduction
FastAPI has become my go-to framework for building Python backends. Its async-first design, automatic OpenAPI docs, and type safety make it perfect for high-performance APIs.
This article shares how we built FastAPI microservices that handle 10K+ concurrent requests, cut response times by 40%, and fail in predictable ways.
Why FastAPI?
Before FastAPI, we used Flask. The migration was driven by:
| Aspect | Flask | FastAPI |
|---|---|---|
| Async support | Bolted on | Native |
| Type checking | Optional | Built-in |
| API docs | Manual | Automatic |
| Performance | ~1000 RPS | ~3000 RPS |
| Validation | External | Pydantic |
The performance difference alone justified the migration.
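Most of that throughput gap comes from async concurrency: while one request waits on I/O, the event loop serves others. Here is a stdlib-only sketch of the effect, where `asyncio.sleep` stands in for a database or network call (this is a toy illustration, not a benchmark of FastAPI itself):

```python
import asyncio
import time

async def fake_io_call() -> str:
    # Stands in for a database query or downstream HTTP call
    await asyncio.sleep(0.1)
    return "done"

async def main() -> None:
    # Sequential: total time is the sum of the waits (~0.5s)
    start = time.perf_counter()
    for _ in range(5):
        await fake_io_call()
    sequential = time.perf_counter() - start

    # Concurrent: the waits overlap, so total is ~0.1s
    start = time.perf_counter()
    await asyncio.gather(*(fake_io_call() for _ in range(5)))
    concurrent = time.perf_counter() - start

    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")

asyncio.run(main())
```

An async framework applies the same trick to request handlers: one worker process overlaps many in-flight requests instead of blocking on each.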
Architecture: From Monolith to Microservices
Before: The Monolith
```
┌────────────────────────────────────────┐
│             Flask Monolith             │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐   │
│  │ Auth │ │ User │ │ Task │ │ Data │   │
│  └──────┘ └──────┘ └──────┘ └──────┘   │
└────────────────────────────────────────┘
```
Problems:
- Single point of failure
- Can't scale components independently
- Deployments affect everything
After: Microservices
```
              ┌─────────────┐
              │ API Gateway │
              └──────┬──────┘
                     │
┌──────────┬─────────┴────┬──────────────┐
│          │              │              │
▼          ▼              ▼              ▼
┌─────┐  ┌──────┐     ┌──────┐      ┌──────┐
│Auth │  │ User │     │ Task │      │ Data │
│ API │  │ API  │     │ API  │      │ API  │
└─────┘  └──────┘     └──────┘      └──────┘
```
Each service:
- Scales independently
- Has its own database
- Can be deployed separately
- Fails in isolation
Building High-Performance FastAPI Services
Async All The Way
The key to FastAPI performance is embracing async:
```python
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Session

app = FastAPI()

# User, get_db, and get_async_db come from the app's models and
# session modules; the two handlers are shown together for contrast.

# Bad: blocking database call ties up the event loop
@app.get("/users/{user_id}")
def get_user_blocking(user_id: int, db: Session = Depends(get_db)):
    return db.query(User).filter(User.id == user_id).first()

# Good: async database call yields the event loop while waiting on I/O
@app.get("/users/{user_id}")
async def get_user(user_id: int, db: AsyncSession = Depends(get_async_db)):
    result = await db.execute(select(User).where(User.id == user_id))
    return result.scalar_one_or_none()
```
Connection Pooling
Database connections are expensive, so pool them instead of opening one per request.
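What the pool parameters buy you can be sketched with a plain asyncio semaphore. This is a toy illustration of the `pool_size`/`max_overflow`/`pool_timeout` semantics, not SQLAlchemy's actual `QueuePool` implementation:

```python
import asyncio

class ToyPool:
    """Toy pool: at most pool_size + max_overflow connections out at once."""

    def __init__(self, pool_size: int, max_overflow: int, timeout: float):
        self._slots = asyncio.Semaphore(pool_size + max_overflow)
        self._timeout = timeout
        self.created = 0

    async def acquire(self) -> int:
        # Wait up to `timeout` seconds for a free slot, like pool_timeout
        await asyncio.wait_for(self._slots.acquire(), self._timeout)
        self.created += 1
        return self.created  # stand-in for a connection object

    def release(self) -> None:
        self._slots.release()

async def demo() -> int:
    pool = ToyPool(pool_size=2, max_overflow=1, timeout=0.1)
    conns = [await pool.acquire() for _ in range(3)]  # 3 = 2 + 1 allowed
    try:
        await pool.acquire()  # the 4th request exceeds the cap
    except asyncio.TimeoutError:
        pass  # this is what pool_timeout surfaces as
    for _ in conns:
        pool.release()
    return pool.created

print(asyncio.run(demo()))  # 3
```

The real configuration we run with looks like this: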
```python
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,       # connections kept open permanently
    max_overflow=30,    # extra connections allowed under burst load
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=1800,  # recycle connections every 30 minutes
)

AsyncSessionLocal = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)
```
Response Caching
Not everything needs to hit the database:
```python
from redis import asyncio as aioredis

from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache

@app.on_event("startup")
async def startup():
    redis = aioredis.from_url("redis://localhost")
    FastAPICache.init(RedisBackend(redis), prefix="api-cache")

@app.get("/products/{product_id}")
@cache(expire=300)  # cache for 5 minutes
async def get_product(product_id: int):
    # This result is served from Redis until the TTL expires
    return await fetch_product(product_id)
```
Handling 10K Concurrent Requests
Load Testing Results
Using Locust for load testing:
```python
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task(3)
    def get_tasks(self):
        self.client.get("/api/v1/tasks")

    @task(1)
    def create_task(self):
        self.client.post("/api/v1/tasks", json={
            "title": "Test task",
            "priority": "high",
        })
```
Results at 10K concurrent users:
| Metric | Before Optimization | After Optimization |
|---|---|---|
| RPS | 2,500 | 4,200 |
| P50 Latency | 180ms | 95ms |
| P95 Latency | 850ms | 280ms |
| P99 Latency | 2.1s | 520ms |
| Error Rate | 2.3% | 0.1% |
Key Optimizations
- Async database driver (asyncpg instead of psycopg2)
- Connection pooling (20 base, 30 overflow)
- Redis caching for read-heavy endpoints
- Pagination for list endpoints
- Query optimization (proper indexes, eager loading)
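Pagination is worth spelling out, since unbounded list endpoints were among our worst offenders. A minimal sketch of translating page parameters into OFFSET/LIMIT — the endpoint shape in the comment is illustrative, and the parameter names are a convention rather than anything FastAPI mandates:

```python
def paginate(page: int, page_size: int, max_page_size: int = 100) -> tuple[int, int]:
    """Translate page/page_size query params into (offset, limit),
    clamping page_size so a client can't request unbounded results."""
    page = max(page, 1)
    page_size = min(max(page_size, 1), max_page_size)
    offset = (page - 1) * page_size
    return offset, page_size

# In an endpoint this becomes (sketch):
#   @app.get("/api/v1/tasks")
#   async def list_tasks(page: int = 1, page_size: int = 50):
#       offset, limit = paginate(page, page_size)
#       query = select(Task).offset(offset).limit(limit)

print(paginate(3, 50))      # (100, 50)
print(paginate(1, 10_000))  # (0, 100) — clamped to max_page_size
```

Clamping server-side matters: without the cap, a single `page_size=100000` request can undo everything the connection pool and cache bought you.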
Predictable Failure Modes
Systems will fail. The goal is predictable, graceful failure.
Structured Error Responses
```python
from fastapi import HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

class ErrorResponse(BaseModel):
    error_code: str
    message: str
    details: dict | None = None

@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    return JSONResponse(
        status_code=exc.status_code,
        content=ErrorResponse(
            error_code=f"ERR_{exc.status_code}",
            message=exc.detail,
        ).dict(),
    )
```
Circuit Breakers
```python
import httpx
from circuitbreaker import circuit

# Open the circuit after 5 consecutive failures; probe again after 30s
@circuit(failure_threshold=5, recovery_timeout=30)
async def call_external_service(data: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(EXTERNAL_URL, json=data)
        response.raise_for_status()
        return response.json()
```
Health Checks
```python
@app.get("/health")
async def health_check():
    checks = {
        "database": await check_database(),
        "redis": await check_redis(),
        "external_api": await check_external_api(),
    }

    status = "healthy" if all(checks.values()) else "degraded"
    return {"status": status, "checks": checks}
```
Graceful Degradation
```python
@app.get("/recommendations/{user_id}")
async def get_recommendations(user_id: int):
    try:
        # Try personalized recommendations first
        return await ml_service.get_personalized(user_id)
    except ServiceUnavailable:
        # Fall back to popular items
        return await get_popular_items()
    except Exception:
        # Ultimate fallback: an empty but well-formed response
        return {"recommendations": [], "fallback": True}
```
Observability
Structured Logging
```python
import time
import uuid

import structlog
from fastapi import Request

logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    request_id = str(uuid.uuid4())

    with structlog.contextvars.bound_contextvars(
        request_id=request_id,
        path=request.url.path,
        method=request.method,
    ):
        logger.info("request_started")

        start = time.perf_counter()
        response = await call_next(request)
        duration = time.perf_counter() - start

        logger.info(
            "request_completed",
            status_code=response.status_code,
            duration_ms=round(duration * 1000, 2),
        )

        return response
```
Metrics
```python
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app)
```
This gives you automatic metrics for:
- Request count by endpoint
- Request latency histograms
- Response status codes
- In-flight requests
Results
After the migration and optimizations:
- 40% reduction in average response time
- 10K+ concurrent requests handled reliably
- 99.5% deployment success rate with CI/CD
- Zero-downtime deployments with rolling updates
- Predictable failure modes with circuit breakers
Key Takeaways
- Go async: FastAPI's async support is its superpower—use it everywhere
- Pool connections: Database connections are expensive; pool aggressively
- Cache strategically: Redis caching can eliminate most database load
- Design for failure: Circuit breakers and graceful degradation are essential
- Observe everything: You can't optimize what you can't measure
FastAPI makes building high-performance Python APIs accessible. The key is understanding async patterns and designing for scale from the start.
Questions about FastAPI or microservices? Connect with me on LinkedIn or GitHub.