Rate Limiting in .NET 10: The Complete Deep Dive
Rate limiting in .NET has always been an afterthought. You either bolted on AspNetCoreRateLimit (which hasn't aged gracefully), hacked together a custom ActionFilter with a ConcurrentDictionary, or shipped without any limiting and hoped nobody would hammer your endpoints. We've all been there.
The built-in Microsoft.AspNetCore.RateLimiting middleware changes this. It has shipped with ASP.NET Core since .NET 7, and it's what you'll be using on .NET 10. Four algorithms, partitioned keys, per-endpoint policies, clean rejection handling. All built in. No extra NuGet packages for the 90% case. I've been running this in production for months now, and it replaces what used to take a third-party library plus custom Redis logic. Under 50 lines of configuration gets you proper multi-policy rate limiting.
The Four Algorithms
The framework ships four rate limiting algorithms. Each solves a different problem. Pick wrong and you'll either block legitimate users or let abuse through.
Fixed Window
The simplest option. You define a time window (say, 1 minute) and a permit limit (say, 5 requests). Counter resets at the window boundary.
options.AddFixedWindowLimiter("fixed", opt =>
{
    opt.PermitLimit = 5;
    opt.Window = TimeSpan.FromMinutes(1);
    opt.QueueLimit = 0;
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});
Fixed window is perfect for hard caps: login attempts, password resets, anything where you want a firm "no more than X per minute." The downside: a user can fire 5 requests at 0:59 and another 5 at 1:01, effectively getting 10 in 2 seconds. That's the burst-at-boundaries problem.
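To make the boundary burst concrete, here is a minimal sketch of the counting rule. It is a toy model, not the framework's implementation: the helper name CountAllowedFixedWindow is made up for illustration, and it assumes windows aligned to multiples of the window length, whereas the real limiter's window starts when the limiter is created.

```csharp
using System;
using System.Collections.Generic;

// Toy fixed-window counter (hypothetical helper, not the framework's code).
// Windows are aligned to multiples of windowSeconds; the count resets at each boundary.
static int CountAllowedFixedWindow(IEnumerable<double> requestTimes, int permitLimit, double windowSeconds)
{
    long currentWindow = -1;
    int usedInWindow = 0;
    int allowed = 0;
    foreach (double t in requestTimes)
    {
        long window = (long)Math.Floor(t / windowSeconds);
        if (window != currentWindow)
        {
            currentWindow = window;   // crossed a boundary: hard reset
            usedInWindow = 0;
        }
        if (usedInWindow < permitLimit)
        {
            usedInWindow++;
            allowed++;
        }
    }
    return allowed;
}

// Five requests at 0:59 and five more at 1:01, PermitLimit = 5, 1-minute window.
double[] burst = { 59, 59, 59, 59, 59, 61, 61, 61, 61, 61 };
int allowedFixed = CountAllowedFixedWindow(burst, permitLimit: 5, windowSeconds: 60);
Console.WriteLine($"Fixed window allowed {allowedFixed} of {burst.Length} requests");
```

All ten requests get through, even though they arrive within two seconds of each other.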
Sliding Window
Solves the burst problem by dividing the window into segments and sliding forward. Instead of a hard reset, old segments expire gradually.
options.AddSlidingWindowLimiter("sliding", opt =>
{
    opt.PermitLimit = 100;
    opt.Window = TimeSpan.FromMinutes(1);
    opt.SegmentsPerWindow = 6; // 10-second segments
    opt.QueueLimit = 0;
});
Sliding window is my go-to for general API endpoints. The SegmentsPerWindow setting controls granularity. More segments means smoother distribution but slightly more memory. Six segments for a 1-minute window gives you 10-second resolution, which is plenty for most APIs.
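A rough sketch of why segments close the boundary loophole, using the same 0:59 / 1:01 burst as the fixed-window case. This is a simplified model of segmented counting, not the framework's code. CountAllowedSlidingWindow is a hypothetical helper, and it assumes segments aligned to multiples of the segment length.

```csharp
using System;
using System.Collections.Generic;

// Toy segmented sliding window (hypothetical helper, not the framework's code).
// A request is allowed only if the permits used in the current segment plus the
// previous (segmentsPerWindow - 1) segments are still below the limit.
static int CountAllowedSlidingWindow(
    IEnumerable<double> requestTimes, int permitLimit, double windowSeconds, int segmentsPerWindow)
{
    double segmentSeconds = windowSeconds / segmentsPerWindow;
    var usedPerSegment = new Dictionary<long, int>();
    int allowed = 0;
    foreach (double t in requestTimes)
    {
        long segment = (long)Math.Floor(t / segmentSeconds);

        // Sum usage over the segments that are still inside the window.
        int usedInWindow = 0;
        for (long s = segment - segmentsPerWindow + 1; s <= segment; s++)
        {
            if (usedPerSegment.TryGetValue(s, out int used)) usedInWindow += used;
        }

        if (usedInWindow < permitLimit)
        {
            usedPerSegment.TryGetValue(segment, out int current);
            usedPerSegment[segment] = current + 1;
            allowed++;
        }
    }
    return allowed;
}

// Five requests at 0:59 and five at 1:01, limit 5, 1-minute window, 6 segments.
double[] boundaryBurst = { 59, 59, 59, 59, 59, 61, 61, 61, 61, 61 };
int allowedSliding = CountAllowedSlidingWindow(boundaryBurst, 5, 60, 6);
Console.WriteLine($"Sliding window allowed {allowedSliding} of {boundaryBurst.Length} requests");
```

The requests at 1:01 land in a new segment, but the five permits spent at 0:59 are still inside the window, so only five of the ten are allowed.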
Token Bucket
The classic algorithm for public-facing APIs. You have a bucket with a maximum number of tokens. Each request costs one token. Tokens replenish at a steady rate.
options.AddTokenBucketLimiter("token", opt =>
{
    opt.TokenLimit = 100; // max burst size
    opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
    opt.TokensPerPeriod = 10; // steady rate: 1/sec
    opt.QueueLimit = 0;
    opt.AutoReplenishment = true;
});
Token bucket is my default recommendation for public-facing REST APIs. It naturally allows short bursts (a client loading a dashboard that fires 20 parallel requests) while enforcing a sustained rate. TokenLimit controls burst size; TokensPerPeriod divided by ReplenishmentPeriod controls sustained throughput. You tune these independently, which is the whole point.
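The burst-versus-sustained split can be checked directly against System.Threading.RateLimiting, outside the middleware. A small sketch with the same numbers as above; AutoReplenishment is switched off here purely so that nothing refills while the loop runs, which is an assumption of the demo rather than a recommended setting.

```csharp
using System;
using System.Threading.RateLimiting;

// Same numbers as the policy above. AutoReplenishment is off for the demo,
// so the bucket only ever holds its initial 100 tokens.
using var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 100,
    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
    TokensPerPeriod = 10,
    QueueLimit = 0,
    AutoReplenishment = false
});

// Fire a burst of 120 requests as fast as possible.
int burstAllowed = 0;
for (int i = 0; i < 120; i++)
{
    using RateLimitLease lease = bucket.AttemptAcquire();
    if (lease.IsAcquired) burstAllowed++;
}

// Sustained rate once the burst allowance is spent: tokens per period / period length.
double tokensPerSecond = 10 / TimeSpan.FromSeconds(10).TotalSeconds;
Console.WriteLine($"Burst allowed: {burstAllowed} of 120, sustained rate: {tokensPerSecond} per second");
```

The first 100 calls succeed and the remaining 20 are rejected. After that, a client is held to one request per second, and a fully drained bucket needs 100 seconds of silence to refill.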
Concurrency Limiter
Different beast entirely. Not time-based. Limits how many requests can be in-flight simultaneously.
options.AddConcurrencyLimiter("concurrent", opt =>
{
    opt.PermitLimit = 3;
    opt.QueueLimit = 5;
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});
Concurrency limiter gets overlooked but it's critical for expensive operations: report generation, file processing, AI inference calls, anything CPU or memory bound. Three simultaneous heavy operations won't bring your server down. Five more can queue. Everything else gets rejected immediately.
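The key difference from the time-based limiters is that a permit comes back when the work finishes, not when a clock ticks. A minimal sketch using ConcurrencyLimiter directly. QueueLimit is 0 here to keep the demo synchronous: as far as I know, the synchronous AttemptAcquire never queues, and queueing only applies to AcquireAsync.

```csharp
using System;
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 3,
    QueueLimit = 0   // no queue in this demo; rejected callers fail immediately
});

// Three "in-flight" operations hold all the permits.
RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
RateLimitLease third = limiter.AttemptAcquire();

// A fourth caller is rejected while the first three are still running.
RateLimitLease fourth = limiter.AttemptAcquire();
bool fourthAcquired = fourth.IsAcquired;

// Finishing one operation (disposing its lease) frees a permit straight away.
first.Dispose();
RateLimitLease retry = limiter.AttemptAcquire();
bool retryAcquired = retry.IsAcquired;

Console.WriteLine($"Fourth acquired: {fourthAcquired}, retry after release acquired: {retryAcquired}");

second.Dispose();
third.Dispose();
fourth.Dispose();
retry.Dispose();
```

In the middleware the lease is released for you when the request completes, so the permit count tracks requests that are actually in flight.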
Decision Matrix
| Scenario | Algorithm | Why |
|---|---|---|
| Login/auth brute-force protection | Fixed Window | Hard cap, simple mental model |
| General API endpoints | Sliding Window | Smooth, no burst-at-boundary |
| Public REST API (external consumers) | Token Bucket | Allows bursts, enforces sustained rate |
| Expensive operations (reports, AI, uploads) | Concurrency | Protect server resources, not time-based |
| WebSocket connection limits | Concurrency | Limit simultaneous connections |
Basic Setup and Per-Endpoint Policies
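Registration lives in Program.cs. Middleware placement matters. Put UseRateLimiter after authentication (so you have access to user claims) and after routing (so the middleware knows which endpoint, and therefore which policy, was selected), but before the endpoints themselves run. With WebApplication, routing is added at the start of the pipeline for you unless you call UseRouting explicitly.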
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// UseAuthentication/UseAuthorization need these services registered.
// Plug in your real scheme here (JWT bearer, cookies, ...).
builder.Services.AddAuthentication();
builder.Services.AddAuthorization();

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("auth", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(1);
    });
    options.AddTokenBucketLimiter("api", opt =>
    {
        opt.TokenLimit = 100;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
    });
    options.AddConcurrencyLimiter("heavy", opt =>
    {
        opt.PermitLimit = 3;
        opt.QueueLimit = 5;
    });
});

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter(); // After auth, before the endpoints run

// HandleLogin, HandlePasswordReset, GetProducts and GenerateReport are your own handlers.

// Minimal API: per route group
var authGroup = app.MapGroup("/auth").RequireRateLimiting("auth");
authGroup.MapPost("/login", HandleLogin);
authGroup.MapPost("/reset-password", HandlePasswordReset);

// Per individual endpoint
app.MapGet("/api/products", GetProducts).RequireRateLimiting("api");
app.MapPost("/api/reports", GenerateReport).RequireRateLimiting("heavy");

// Health checks: no limiting
app.MapGet("/health", () => Results.Ok()).DisableRateLimiting();

app.Run();
For controller-based APIs, use attributes:
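using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.RateLimiting;

[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("api")]
public class ProductsController : ControllerBase
{
    // Placeholder data so the example compiles; swap in your own service.
    private static readonly string[] _products = { "Widget", "Gadget" };

    [HttpGet]
    public IActionResult GetAll() => Ok(_products);

    [HttpPost("export")]
    [EnableRateLimiting("heavy")] // Override at action level
    public IActionResult Export() => Ok(GenerateExport());

    [HttpGet("health")]
    [DisableRateLimiting] // Opt out entirely
    public IActionResult Health() => Ok();

    // Placeholder for the expensive export.
    private static string GenerateExport() => string.Join(",", _products);
}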
The pattern: apply a relaxed policy at the group/controller level, then override with stricter policies on sensitive endpoints. [DisableRateLimiting] opts out completely. Use it for health checks and readiness probes.
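For the controller version, the attributes only take effect if policies with matching names are registered and the middleware runs after routing. A minimal Program.cs sketch under those assumptions; the parameterless AddAuthentication() call is a placeholder for whatever scheme you actually use, and UseRouting is written out only to make the ordering visible.

```csharp
using System;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddAuthentication();   // placeholder: register your real scheme here
builder.Services.AddAuthorization();

builder.Services.AddRateLimiter(options =>
{
    // Names must match the strings passed to [EnableRateLimiting("...")].
    options.AddTokenBucketLimiter("api", opt =>
    {
        opt.TokenLimit = 100;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
    });
    options.AddConcurrencyLimiter("heavy", opt =>
    {
        opt.PermitLimit = 3;
        opt.QueueLimit = 5;
    });
});

var app = builder.Build();

app.UseRouting();          // explicit only to make the ordering visible
app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter();      // after routing, so the selected endpoint's attributes are known
app.MapControllers();

app.Run();
```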
Partitioned Rate Limiting: Per-User, Per-Tenant, Per-Key
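A single shared limit is a blunt instrument, and that's what the named policies above are: AddFixedWindowLimiter("auth", ...) creates one bucket shared by every caller, so 5 logins per minute means 5 across all users combined. One abusive user exhausts the quota and every legitimate user gets rejected. Partitioned rate limiting gives each user (or tenant, or API key) their own independent bucket.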
The AddPolicy method with a partition factory is where this gets interesting:
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-user", context =>
    {
        var userId = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";

        return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 100,
            ReplenishmentPeriod = TimeSpan.FromSeconds(10),
            TokensPerPeriod = 10,
            AutoReplenishment = true
        });
    });
});
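Each unique partition key gets its own independent limiter. Authenticated users are partitioned by user ID. Anonymous traffic falls back to IP address. One user burning through their quota doesn't affect anyone else. One caveat on the IP fallback: behind a reverse proxy or load balancer, RemoteIpAddress is the proxy's address unless the forwarded-headers middleware is configured, in which case all anonymous traffic lands in a single bucket.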
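The isolation is easy to see outside the middleware with PartitionedRateLimiter.Create, which is the same mechanism the policies sit on. In this sketch the resource is just a user ID string standing in for HttpContext, and the limits are deliberately tiny; both are assumptions of the demo.

```csharp
using System;
using System.Threading.RateLimiting;

// One token bucket per user ID. The resource here is a plain string;
// in the middleware it would be the HttpContext.
using var perUser = PartitionedRateLimiter.Create<string, string>(userId =>
    RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
    {
        TokenLimit = 2,                                  // tiny on purpose
        TokensPerPeriod = 1,
        ReplenishmentPeriod = TimeSpan.FromMinutes(1)    // nothing refills during the demo
    }));

bool TryUse(string userId)
{
    using RateLimitLease lease = perUser.AttemptAcquire(userId);
    return lease.IsAcquired;
}

bool alice1 = TryUse("alice");
bool alice2 = TryUse("alice");
bool alice3 = TryUse("alice");   // alice's bucket is now empty
bool bob1 = TryUse("bob");       // bob has his own bucket

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

Alice's third request is rejected while Bob's first goes through, because the two keys never share tokens.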
Multi-Tenant with Tiered Limits
This is where it gets really useful for SaaS. Different tiers, different limits, resolved at request time from user claims:
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("tiered", context =>
    {
        var tenantId = context.User?.FindFirst("tenant_id")?.Value ?? "unknown";
        var tier = context.User?.FindFirst("subscription_tier")?.Value ?? "free";

        return tier switch
        {
            "premium" => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 1000,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 100,
                    AutoReplenishment = true
                }),
            "business" => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 500,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 50,
                    AutoReplenishment = true
                }),
            _ => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 100,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 10,
                    AutoReplenishment = true
                })
        };
    });
});
Premium users get 10x the limit of free-tier users. Each tenant is isolated. The partition key is the tenant ID, so even within the same tier, tenants don't compete with each other. This used to take custom middleware, a Redis sorted set, and about 200 lines of code. Now it's declarative config.
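One caveat worth testing in your own setup: the options factory runs only when a limiter is first created for a partition key, and that limiter is then reused. If the key is the tenant ID alone, a tenant that upgrades mid-session may keep its old limits until the idle limiter is evicted. A sketch of the effect and one possible workaround (putting the tier into the key). BuildTiered is a hypothetical helper, and the (Tenant, Tier) tuple stands in for the claims on HttpContext.

```csharp
using System;
using System.Threading.RateLimiting;

// Hypothetical helper: builds a limiter keyed by tenant only, or by tier + tenant.
static PartitionedRateLimiter<(string Tenant, string Tier)> BuildTiered(bool keyIncludesTier) =>
    PartitionedRateLimiter.Create<(string Tenant, string Tier), string>(request =>
    {
        int tokenLimit = request.Tier == "premium" ? 3 : 1;   // tiny limits for the demo
        string key = keyIncludesTier ? $"{request.Tier}:{request.Tenant}" : request.Tenant;
        return RateLimitPartition.GetTokenBucketLimiter(key, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = tokenLimit,
            TokensPerPeriod = 1,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1)
        });
    });

// Tenant-only key: the limiter created while "acme" was on the free tier is reused.
using var tenantOnly = BuildTiered(keyIncludesTier: false);
tenantOnly.AttemptAcquire(("acme", "free")).Dispose();          // spends the single free token
bool upgradedStillBlocked = !tenantOnly.AttemptAcquire(("acme", "premium")).IsAcquired;

// Tier-qualified key: the upgrade maps to a new partition built with premium options.
using var tierAndTenant = BuildTiered(keyIncludesTier: true);
tierAndTenant.AttemptAcquire(("acme", "free")).Dispose();
bool upgradedAllowed = tierAndTenant.AttemptAcquire(("acme", "premium")).IsAcquired;

Console.WriteLine($"Tenant-only key blocks the upgraded tenant: {upgradedStillBlocked}");
Console.WriteLine($"Tier-qualified key allows the upgraded tenant: {upgradedAllowed}");
```

The trade-off of the tier-qualified key is that a tier change starts the tenant on a fresh, full bucket, which is usually acceptable for an upgrade.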
Custom Rejection Handling and RFC 9457 ProblemDetails
The default rejection behaviour returns a 503 Service Unavailable. That's semantically wrong. A rate-limited request should return 429 Too Many Requests. Fix the status code with RejectionStatusCode, and use the OnRejected callback to shape the response body and headers:
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Extract Retry-After from the lease metadata
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        var problem = new
        {
            type = "https://httpstatuses.io/429",
            title = "Too Many Requests",
            status = 429,
            detail = "Rate limit exceeded. Check the Retry-After header for when to retry.",
            instance = context.HttpContext.Request.Path.ToString()
        };

        // WriteAsJsonAsync sets Content-Type itself, so pass problem+json explicitly;
        // a value assigned to Response.ContentType beforehand would be overwritten.
        await context.HttpContext.Response.WriteAsJsonAsync(
            problem,
            options: null,
            contentType: "application/problem+json",
            cancellationToken: cancellationToken);

        // Log for observability
        var logger = context.HttpContext.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");
        logger.LogWarning(
            "Rate limit rejected: {Method} {Path} from {IP}",
            context.HttpContext.Request.Method,
            context.HttpContext.Request.Path,
            context.HttpContext.Connection.RemoteIpAddress);
    };
});
This gives your clients a proper RFC 9457 ProblemDetails response with a Retry-After header. They know exactly when to retry. Your observability pipeline picks up the rejections. Clean, professional API behaviour.
The response looks like:
{
  "type": "https://httpstatuses.io/429",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit exceeded. Check the Retry-After header for when to retry.",
  "instance": "/api/products"
}
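On the consuming side, a client can turn that response into a wait time. A minimal sketch, assuming a .NET client using HttpClient. GetRetryDelay is a hypothetical helper, and the fallback delay covers rejections that arrive without a Retry-After header (for example when the limiter supplied no RetryAfter metadata).

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: how long to wait before retrying,
// or null if the response was not a rate-limit rejection.
static TimeSpan? GetRetryDelay(HttpResponseMessage response, TimeSpan fallback)
{
    if (response.StatusCode != HttpStatusCode.TooManyRequests) return null;

    RetryConditionHeaderValue? retryAfter = response.Headers.RetryAfter;
    if (retryAfter?.Delta is TimeSpan delta) return delta;            // "Retry-After: 7"
    if (retryAfter?.Date is DateTimeOffset date)                      // HTTP-date form
    {
        TimeSpan untilDate = date - DateTimeOffset.UtcNow;
        return untilDate > TimeSpan.Zero ? untilDate : TimeSpan.Zero;
    }
    return fallback;                                                  // header missing
}

// Simulated 429 so the sketch runs without a server.
using var rejected = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
rejected.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(7));
TimeSpan? delay = GetRetryDelay(rejected, fallback: TimeSpan.FromSeconds(1));

// A normal response yields no delay at all.
using var ok = new HttpResponseMessage(HttpStatusCode.OK);
TimeSpan? noDelay = GetRetryDelay(ok, fallback: TimeSpan.FromSeconds(1));

Console.WriteLine($"Retry in {delay?.TotalSeconds} seconds");
```

In a real client you would await Task.Delay on the returned value before resending, ideally with a cap on the number of retries.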
Distributed Rate Limiting with Redis
I need to be upfront here: the built-in rate limiter is in-memory and per-instance. If you're running three replicas behind a load balancer, each instance tracks limits independently. A user can get up to 3x the configured limit.
For many teams, this is fine. If your limit is 100 requests per minute and you're running 3 instances, the effective limit is anywhere up to 300, depending on how the load balancer spreads that user's requests. If that's acceptable headroom, don't add complexity.
When you do need accurate distributed limits (public APIs with strict quotas, billing-tied usage) you need Redis. The RedisRateLimiting package by Cristi Pufu wraps StackExchange.Redis with atomic Lua scripts:
// Install: dotnet add package RedisRateLimiting
// Method and option names follow the RedisRateLimiting package as I know it;
// check them against the version you install.
builder.Services.AddRateLimiter(options =>
{
    var redisConnection = ConnectionMultiplexer.Connect("localhost:6379");

    options.AddPolicy("distributed-api", context =>
    {
        var userId = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";

        return RedisRateLimitPartition.GetTokenBucketRateLimiter(userId, _ =>
            new RedisTokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 10,
                ConnectionMultiplexerFactory = () => redisConnection
            });
    });
});
The trade-off is real: every request now hits Redis for an atomic check-and-decrement. That's a network round trip per request, typically a millisecond or two of added latency when Redis sits close to the app. For high-throughput internal APIs, that might not be worth it. For external-facing APIs with contractual rate limits, it absolutely is.
My rule of thumb: use in-memory until you're running 3+ instances AND your rate limits are customer-facing commitments. Otherwise, the slight inaccuracy of per-instance limits is a feature, not a bug. It's free, fast, and zero-dependency.
Complete Multi-Tier API Example
Here's a complete Program.cs showing everything working together. This is close to what I actually ship: a multi-tier API with endpoint-specific policies, per-user partitioning, and proper rejection handling.
using System.Security.Claims;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// UseAuthentication/UseAuthorization need these services registered.
// Plug in your real scheme here (JWT bearer, cookies, ...).
builder.Services.AddAuthentication();
builder.Services.AddAuthorization();

builder.Services.AddRateLimiter(options =>
{
    // Global rejection handling
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = async (context, token) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        // Pass the content type here: WriteAsJsonAsync would otherwise
        // overwrite Response.ContentType with application/json.
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            type = "https://httpstatuses.io/429",
            title = "Too Many Requests",
            status = 429,
            detail = "Rate limit exceeded. Retry after the indicated duration.",
            instance = context.HttpContext.Request.Path.ToString()
        }, options: null, contentType: "application/problem+json", cancellationToken: token);
    };

    // Policy 1: Auth endpoints, brute-force protection (per IP)
    options.AddPolicy("auth", context =>
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter(ip, _ =>
            new FixedWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1)
            });
    });

    // Policy 2: Public API, token bucket (per user/IP)
    options.AddPolicy("api", context =>
    {
        var key = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";
        var tier = context.User?.FindFirst("subscription_tier")?.Value ?? "free";
        var (tokens, perPeriod) = tier switch
        {
            "premium" => (1000, 100),
            "business" => (500, 50),
            _ => (100, 10)
        };
        return RateLimitPartition.GetTokenBucketLimiter(key, _ =>
            new TokenBucketRateLimiterOptions
            {
                TokenLimit = tokens,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = perPeriod,
                AutoReplenishment = true
            });
    });

    // Policy 3: Heavy operations, concurrency limit (global)
    options.AddConcurrencyLimiter("heavy", opt =>
    {
        opt.PermitLimit = 3;
        opt.QueueLimit = 10;
        opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter();

// Auth routes: strict per-IP limiting
var auth = app.MapGroup("/auth").RequireRateLimiting("auth");
auth.MapPost("/login", () => Results.Ok(new { token = "..." }));
auth.MapPost("/reset-password", () => Results.Ok());

// API routes: tiered per-user limiting
var api = app.MapGroup("/api").RequireRateLimiting("api");
api.MapGet("/products", () => Results.Ok(new[] { "Widget", "Gadget" }));
api.MapGet("/products/{id}", (int id) => Results.Ok(new { id, name = "Widget" }));
api.MapGet("/orders", () => Results.Ok(Array.Empty<object>()));

// Heavy operations: concurrency limited
api.MapPost("/reports/generate", async () =>
{
    await Task.Delay(5000); // Simulate expensive work without blocking a thread
    return Results.Ok(new { report = "generated" });
}).RequireRateLimiting("heavy");

api.MapPost("/ai/summarize", async () =>
{
    await Task.Delay(3000); // Simulate AI call
    return Results.Ok(new { summary = "..." });
}).RequireRateLimiting("heavy");

// Health: no limiting
app.MapGet("/health", () => Results.Ok()).DisableRateLimiting();

app.Run();
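That's roughly 100 lines including the rejection handler. Three distinct protection strategies, per-user partitioning with tier awareness, proper 429 responses with Retry-After, and zero external dependencies. One detail to keep in mind: RequireRateLimiting("heavy") on an endpoint replaces the group's "api" policy rather than stacking on top of it, so the two heavy endpoints are concurrency-limited only.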
What This Replaces
Let me spell out what you can delete:
- AspNetCoreRateLimit NuGet package. The built-in middleware covers the same algorithms with better integration.
- Custom Redis Lua scripts for rate counting. Not needed for single-instance deployments.
- Custom ActionFilter-based limiting. You get first-class middleware with proper pipeline placement now.
- Manual 429 response formatting. Handled once in OnRejected, applies to all policies.
- Per-user tracking with ConcurrentDictionary. Replaced by partition keys.
That's the payoff. What used to be a multi-file, multi-dependency concern is now declarative configuration in one place.
Start Today
You don't need to implement all of this at once. Start with a single global policy:
builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
    {
        return RateLimitPartition.GetTokenBucketLimiter(
            context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 10,
                AutoReplenishment = true
            });
    });
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});
That's about 15 lines. Per-IP token bucket limiting with proper 429 responses. Add it to any .NET 10 API right now, then refine with per-endpoint policies when you need them.
Rate limiting isn't optional anymore. It's a security baseline. And with .NET 10, there's no excuse not to have it. The framework did the hard work. You just configure it.