Rate Limiting in .NET 10: The Complete Deep Dive


Rate limiting in .NET has always been an afterthought. You either bolted on AspNetCoreRateLimit (which hasn't aged gracefully), hacked together a custom ActionFilter with a ConcurrentDictionary, or shipped without any limiting and hoped nobody would hammer your endpoints. We've all been there.

The Microsoft.AspNetCore.RateLimiting middleware changes this. It has been part of ASP.NET Core since .NET 7, and it's what you get out of the box in .NET 10. Four algorithms, partitioned keys, per-endpoint policies, clean rejection handling. All built in. No extra NuGet packages for the 90% case. I've been running this in production for months now, and it replaces what used to take a third-party library plus custom Redis logic. Under 50 lines of configuration gets you proper multi-policy rate limiting.

The Four Algorithms

The framework ships four rate limiting algorithms. Each solves a different problem. Pick wrong and you'll either block legitimate users or let abuse through.

Fixed Window

The simplest option. You define a time window (say, 1 minute) and a permit limit (say, 5 requests). Counter resets at the window boundary.

options.AddFixedWindowLimiter("fixed", opt =>
{
    opt.PermitLimit = 5;
    opt.Window = TimeSpan.FromMinutes(1);
    opt.QueueLimit = 0;
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});

Fixed window is perfect for hard caps: login attempts, password resets, anything where you want a firm "no more than X per minute." The downside: a user can fire 5 requests at 0:59 and another 5 at 1:01, effectively getting 10 in 2 seconds. That's the burst-at-boundaries problem.
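To put numbers on the boundary burst, here is a minimal sketch. FixedWindowModel is a hypothetical stand-in written for illustration, not the framework's FixedWindowRateLimiter. It takes explicit timestamps so it runs without waiting on a real clock, and it assumes windows are aligned to the moment the limiter starts.

```csharp
using System;

var limiter = new FixedWindowModel(permitLimit: 5, window: TimeSpan.FromMinutes(1));

// Five requests at 0:59, then five more at 1:01.
int allowed = 0;
foreach (int second in new[] { 59, 59, 59, 59, 59, 61, 61, 61, 61, 61 })
{
    if (limiter.TryAcquire(TimeSpan.FromSeconds(second))) allowed++;
}

Console.WriteLine($"Allowed {allowed} requests within 2 seconds"); // prints "Allowed 10 requests within 2 seconds"

// Hypothetical model of a fixed-window counter (illustration only).
sealed class FixedWindowModel
{
    private readonly int _permitLimit;
    private readonly TimeSpan _window;
    private long _windowIndex = -1;
    private int _used;

    public FixedWindowModel(int permitLimit, TimeSpan window)
    {
        _permitLimit = permitLimit;
        _window = window;
    }

    // 'now' is the time elapsed since the limiter started.
    public bool TryAcquire(TimeSpan now)
    {
        long index = now.Ticks / _window.Ticks;
        if (index != _windowIndex)
        {
            _windowIndex = index; // new window: the counter resets
            _used = 0;
        }
        if (_used >= _permitLimit) return false;
        _used++;
        return true;
    }
}
```

All ten requests pass even though the configured limit is 5 per minute, because the two bursts land on opposite sides of the reset.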

Sliding Window

Solves the burst problem by dividing the window into segments and sliding forward. Instead of a hard reset, old segments expire gradually.

options.AddSlidingWindowLimiter("sliding", opt =>
{
    opt.PermitLimit = 100;
    opt.Window = TimeSpan.FromMinutes(1);
    opt.SegmentsPerWindow = 6; // 10-second segments
    opt.QueueLimit = 0;
});

Sliding window is my go-to for general API endpoints. The SegmentsPerWindow setting controls granularity. More segments means smoother distribution but slightly more memory. Six segments for a 1-minute window gives you 10-second resolution, which is plenty for most APIs.
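The segment mechanics are easier to follow in a toy model. SlidingWindowModel below is a hypothetical, simplified sketch (a ring of per-segment counters advanced by hand), not the framework's SlidingWindowRateLimiter. It uses a limit of 5 instead of 100 to keep the numbers small.

```csharp
using System;
using System.Linq;

var limiter = new SlidingWindowModel(permitLimit: 5, segmentsPerWindow: 6);

int first = Burst(5);                                     // fills the current segment
limiter.AdvanceOneSegment();                              // 10 seconds later
int second = Burst(5);                                    // first burst is still inside the window
for (int i = 0; i < 5; i++) limiter.AdvanceOneSegment();  // a full minute after the first burst
int third = Burst(5);                                     // the old segment has expired

Console.WriteLine($"{first}, {second}, {third}"); // prints "5, 0, 5"

int Burst(int count)
{
    int allowed = 0;
    for (int n = 0; n < count; n++)
        if (limiter.TryAcquire()) allowed++;
    return allowed;
}

// Hypothetical model of a segmented sliding window (illustration only).
sealed class SlidingWindowModel
{
    private readonly int _permitLimit;
    private readonly int[] _usedPerSegment;
    private int _current; // index of the segment currently being filled

    public SlidingWindowModel(int permitLimit, int segmentsPerWindow)
    {
        _permitLimit = permitLimit;
        _usedPerSegment = new int[segmentsPerWindow];
    }

    public bool TryAcquire()
    {
        if (_usedPerSegment.Sum() >= _permitLimit) return false;
        _usedPerSegment[_current]++;
        return true;
    }

    // Move forward one segment: the oldest segment falls out of the window
    // and the permits it consumed become available again.
    public void AdvanceOneSegment()
    {
        _current = (_current + 1) % _usedPerSegment.Length;
        _usedPerSegment[_current] = 0;
    }
}
```

Unlike the fixed window, a burst just before a boundary keeps counting against the limit until its segment has aged out of the window.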

Token Bucket

The classic algorithm for public-facing APIs. You have a bucket with a maximum number of tokens. Each request costs one token. Tokens replenish at a steady rate.

options.AddTokenBucketLimiter("token", opt =>
{
    opt.TokenLimit = 100;           // max burst size
    opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
    opt.TokensPerPeriod = 10;      // steady rate: 1/sec
    opt.QueueLimit = 0;
    opt.AutoReplenishment = true;
});

Token bucket is my default recommendation for most REST APIs. It naturally allows short bursts (a client loading a dashboard that fires 20 parallel requests) while enforcing a sustained rate. TokenLimit controls burst size, TokensPerPeriod controls sustained throughput. You tune these independently, which is the whole point.
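Here is how the two knobs interact, as a minimal sketch. TokenBucketModel is a hypothetical model that advances in whole replenishment periods, not the framework's TokenBucketRateLimiter, but it uses the same numbers as the configuration above and, like the real limiter, starts with a full bucket.

```csharp
using System;

var bucket = new TokenBucketModel(tokenLimit: 100, tokensPerPeriod: 10);

int burst = Take(120);        // bucket starts full: 100 pass, 20 are rejected
bucket.AdvancePeriods(1);     // one 10-second period later: +10 tokens
int afterOne = Take(120);     // only the 10 replenished tokens are available
bucket.AdvancePeriods(60);    // long idle: the refill is capped at TokenLimit
int afterIdle = Take(120);

Console.WriteLine($"{burst}, {afterOne}, {afterIdle}"); // prints "100, 10, 100"

int Take(int requests)
{
    int allowed = 0;
    for (int n = 0; n < requests; n++)
        if (bucket.TryTake()) allowed++;
    return allowed;
}

// Hypothetical model of a token bucket (illustration only).
sealed class TokenBucketModel
{
    private readonly int _tokenLimit;
    private readonly int _tokensPerPeriod;
    private int _tokens;

    public TokenBucketModel(int tokenLimit, int tokensPerPeriod)
    {
        _tokenLimit = tokenLimit;
        _tokensPerPeriod = tokensPerPeriod;
        _tokens = tokenLimit; // start full
    }

    public bool TryTake()
    {
        if (_tokens == 0) return false;
        _tokens--;
        return true;
    }

    // Add tokens for the elapsed periods, never exceeding the bucket size.
    public void AdvancePeriods(int periods)
    {
        long refilled = (long)_tokens + (long)periods * _tokensPerPeriod;
        _tokens = (int)Math.Min(_tokenLimit, refilled);
    }
}
```

The first number is the burst size, the second is the sustained rate per period, and the third shows that idle time never banks more than TokenLimit.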

Concurrency Limiter

Different beast entirely. Not time-based. Limits how many requests can be in-flight simultaneously.

options.AddConcurrencyLimiter("concurrent", opt =>
{
    opt.PermitLimit = 3;
    opt.QueueLimit = 5;
    opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
});

Concurrency limiter gets overlooked but it's critical for expensive operations: report generation, file processing, AI inference calls, anything CPU or memory bound. Three simultaneous heavy operations won't bring your server down. Five more can queue. Everything else gets rejected immediately.
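The run/queue/reject behaviour can be observed directly on the ConcurrencyLimiter class from System.Threading.RateLimiting, outside the middleware. This is a sketch with the same numbers as above. It assumes a project where that namespace is available: an ASP.NET Core app has it already, while a plain console app needs the System.Threading.RateLimiting package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 3,
    QueueLimit = 5,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

// Three requests start immediately.
var running = new List<RateLimitLease>();
for (int i = 0; i < 3; i++) running.Add(limiter.AttemptAcquire());

// The next five wait in the queue: their tasks stay incomplete for now.
var queued = new List<Task<RateLimitLease>>();
for (int i = 0; i < 5; i++) queued.Add(limiter.AcquireAsync().AsTask());

// The ninth request finds the queue full and is rejected right away.
using RateLimitLease ninth = await limiter.AcquireAsync();
Console.WriteLine($"Ninth acquired: {ninth.IsAcquired}"); // prints "Ninth acquired: False"

// Finishing one running request hands its permit to the oldest waiter.
running[0].Dispose();
using RateLimitLease promoted = await queued[0];
Console.WriteLine($"Oldest queued acquired: {promoted.IsAcquired}"); // prints "Oldest queued acquired: True"
```

In the middleware, the lease is disposed for you when the request finishes, which is what frees the slot for the next queued request.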

Decision Matrix

Scenario | Algorithm | Why
Login/auth brute-force protection | Fixed Window | Hard cap, simple mental model
General API endpoints | Sliding Window | Smooth, no burst-at-boundary
Public REST API (external consumers) | Token Bucket | Allows bursts, enforces sustained rate
Expensive operations (reports, AI, uploads) | Concurrency | Protect server resources, not time-based
WebSocket connection limits | Concurrency | Limit simultaneous connections

Basic Setup and Per-Endpoint Policies

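Registration lives in Program.cs. Middleware placement matters. Put UseRateLimiter after authentication (so you have access to user claims) and, if you call UseRouting explicitly, after that too: endpoint-specific policies can only be resolved once routing has selected an endpoint. It still has to run before the endpoints themselves execute.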

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("auth", opt =>
    {
        opt.PermitLimit = 5;
        opt.Window = TimeSpan.FromMinutes(1);
    });

    options.AddTokenBucketLimiter("api", opt =>
    {
        opt.TokenLimit = 100;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
    });

    options.AddConcurrencyLimiter("heavy", opt =>
    {
        opt.PermitLimit = 3;
        opt.QueueLimit = 5;
    });
});

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter(); // After auth, before endpoints

// Minimal API — per route group
var authGroup = app.MapGroup("/auth").RequireRateLimiting("auth");
authGroup.MapPost("/login", HandleLogin);
authGroup.MapPost("/reset-password", HandlePasswordReset);

// Per individual endpoint
app.MapGet("/api/products", GetProducts).RequireRateLimiting("api");
app.MapPost("/api/reports", GenerateReport).RequireRateLimiting("heavy");

// Health checks — no limiting
app.MapGet("/health", () => Results.Ok()).DisableRateLimiting();

app.Run();

For controller-based APIs, use attributes:

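using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.RateLimiting;

[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("api")]
public class ProductsController : ControllerBase
{
    // Placeholder data so the sample compiles on its own
    private static readonly string[] _products = { "Widget", "Gadget" };

    [HttpGet]
    public IActionResult GetAll() => Ok(_products);

    [HttpPost("export")]
    [EnableRateLimiting("heavy")] // Override at action level
    public IActionResult Export() => Ok(GenerateExport());

    [HttpGet("health")]
    [DisableRateLimiting] // Opt out entirely
    public IActionResult Health() => Ok();

    // Placeholder for the real (expensive) export logic
    private static string GenerateExport() => string.Join(",", _products);
}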

The pattern: apply a relaxed policy at the group/controller level, then override with stricter policies on sensitive endpoints. [DisableRateLimiting] opts out completely. Use it for health checks and readiness probes.

Partitioned Rate Limiting: Per-User, Per-Tenant, Per-Key

A global rate limit is a blunt instrument. One abusive user exhausts the quota and every legitimate user gets rejected. Partitioned rate limiting gives each user (or tenant, or API key) their own independent bucket.

The AddPolicy method with a partition factory is where this gets interesting:

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-user", context =>
    {
        var userId = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";

        return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 100,
            ReplenishmentPeriod = TimeSpan.FromSeconds(10),
            TokensPerPeriod = 10,
            AutoReplenishment = true
        });
    });
});

Each unique partition key gets its own independent limiter. Authenticated users are partitioned by user ID. Anonymous traffic falls back to IP address. One user burning through their quota doesn't affect anyone else.
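Partition isolation can be checked without spinning up a web app. The sketch below uses PartitionedRateLimiter.Create, the same primitive used for GlobalLimiter, keyed by a plain string instead of an HttpContext. The tiny limit of 2 and the one-hour replenishment period are assumptions chosen so that no refill happens during the run. As with any direct use of System.Threading.RateLimiting, a plain console app needs that package referenced.

```csharp
using System;
using System.Threading.RateLimiting;

// Same shape as the "per-user" policy, but the resource is just the user ID.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    userId => RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
    {
        TokenLimit = 2,
        TokensPerPeriod = 1,
        ReplenishmentPeriod = TimeSpan.FromHours(1),
        QueueLimit = 0
    }));

bool alice1 = limiter.AttemptAcquire("alice").IsAcquired;
bool alice2 = limiter.AttemptAcquire("alice").IsAcquired;
bool alice3 = limiter.AttemptAcquire("alice").IsAcquired; // alice's bucket is now empty
bool bob1 = limiter.AttemptAcquire("bob").IsAcquired;     // bob has his own bucket

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
// prints "alice: True, True, False; bob: True"
```

The options factory runs once per partition key, when that key is first seen, so each key ends up with its own limiter instance.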

Multi-Tenant with Tiered Limits

This is where it gets really useful for SaaS. Different tiers, different limits, resolved at request time from user claims:

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("tiered", context =>
    {
        var tenantId = context.User?.FindFirst("tenant_id")?.Value ?? "unknown";
        var tier = context.User?.FindFirst("subscription_tier")?.Value ?? "free";

        return tier switch
        {
            "premium" => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 1000,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 100,
                    AutoReplenishment = true
                }),
            "business" => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 500,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 50,
                    AutoReplenishment = true
                }),
            _ => RateLimitPartition.GetTokenBucketLimiter(tenantId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 100,
                    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                    TokensPerPeriod = 10,
                    AutoReplenishment = true
                })
        };
    });
});

Premium users get 10x the limit of free-tier users. Each tenant is isolated. The partition key is the tenant ID, so even within the same tier, tenants don't compete with each other. This used to take custom middleware, a Redis sorted set, and about 200 lines of code. Now it's declarative config.

Custom Rejection Handling and RFC 9457 ProblemDetails

The default rejection behaviour returns a 503 Service Unavailable. That's semantically wrong. A rate-limited request should return 429 Too Many Requests. Fix the status code with RejectionStatusCode, and use the OnRejected callback to shape the response:

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.ContentType = "application/problem+json";

        // Extract Retry-After from the lease metadata
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        var problem = new
        {
            type = "https://httpstatuses.io/429",
            title = "Too Many Requests",
            status = 429,
            detail = "Rate limit exceeded. Check the Retry-After header for when to retry.",
            instance = context.HttpContext.Request.Path.ToString()
        };

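        // WriteAsJsonAsync resets Content-Type to application/json unless one is passed explicitly
        await context.HttpContext.Response.WriteAsJsonAsync(
            problem,
            options: null,
            contentType: "application/problem+json",
            cancellationToken: cancellationToken);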

        // Log for observability
        var logger = context.HttpContext.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");

        logger.LogWarning(
            "Rate limit rejected: {Method} {Path} from {IP}",
            context.HttpContext.Request.Method,
            context.HttpContext.Request.Path,
            context.HttpContext.Connection.RemoteIpAddress);
    };
});

This gives your clients a proper RFC 9457 ProblemDetails response with a Retry-After header. They know exactly when to retry. Your observability pipeline picks up the rejections. Clean, professional API behaviour.

The response looks like:

{
  "type": "https://httpstatuses.io/429",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit exceeded. Check the Retry-After header for when to retry.",
  "instance": "/api/products"
}
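On the consuming side, a client can read that header through HttpClient's typed headers. A minimal sketch follows. RetryHelper and its one-second fallback are hypothetical names and values picked for illustration, and the 429 response is built by hand so the example runs offline.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Stand-in for a real 429 from the API.
using var response = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
response.Headers.TryAddWithoutValidation("Retry-After", "7");

TimeSpan? delay = RetryHelper.GetRetryDelay(response);
Console.WriteLine($"Retry in {delay?.TotalSeconds} s"); // prints "Retry in 7 s"

// Hypothetical helper (illustration only).
static class RetryHelper
{
    // Returns null when the response is not a rate-limit rejection.
    public static TimeSpan? GetRetryDelay(HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.TooManyRequests) return null;

        // The handler above writes whole seconds, which HttpClient exposes as Delta.
        // Fall back to an assumed 1 second when the header is missing.
        return response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(1);
    }
}
```

A real client would wait for that delay before retrying, ideally with a cap on the number of attempts.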

Distributed Rate Limiting with Redis

I need to be upfront here: the built-in rate limiter is in-memory and per-instance. If you're running three replicas behind a load balancer, each instance tracks limits independently. A user whose requests are spread across all three can get up to 3x the configured limit.

For many teams, this is fine. If your limit is 100 requests per minute and you're running 3 instances, the effective limit is around 300. If that's acceptable headroom, don't add complexity.

When you do need accurate distributed limits (public APIs with strict quotas, billing-tied usage) you need Redis. The RedisRateLimiting package by Cristi Pufu wraps StackExchange.Redis with atomic Lua scripts:

// Install: dotnet add package RedisRateLimiting
// Install: dotnet add package RedisRateLimiting.AspNetCore (provides RedisRateLimitPartition)

builder.Services.AddRateLimiter(options =>
{
    var redisConnection = ConnectionMultiplexer.Connect("localhost:6379");

    options.AddPolicy("distributed-api", context =>
    {
        var userId = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";

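        // Method name as published in RedisRateLimiting.AspNetCore; verify against your installed version
        return RedisRateLimitPartition.GetTokenBucketRateLimiter(userId, _ =>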
            new RedisTokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 10,
                ConnectionMultiplexerFactory = () => redisConnection
            });
    });
});

The trade-off is real: every request now hits Redis for an atomic check-and-decrement. That's roughly 1-2ms of added latency. For high-throughput internal APIs, that might not be worth it. For external-facing APIs with contractual rate limits, it absolutely is.

My rule of thumb: use in-memory until you're running 3+ instances AND your rate limits are customer-facing commitments. Otherwise, the slight inaccuracy of per-instance limits is a feature, not a bug. It's free, fast, and zero-dependency.

Complete Multi-Tier API Example

Here's a complete Program.cs showing everything working together. This is close to what I actually ship: a multi-tier API with endpoint-specific policies, per-user partitioning, and proper rejection handling.

using System.Security.Claims;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Global rejection handling
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.OnRejected = async (context, token) =>
    {
        context.HttpContext.Response.ContentType = "application/problem+json";
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            type = "https://httpstatuses.io/429",
            title = "Too Many Requests",
            status = 429,
            detail = "Rate limit exceeded. Retry after the indicated duration.",
            instance = context.HttpContext.Request.Path.ToString()
        }, options: null, contentType: "application/problem+json", cancellationToken: token);
    };

    // Policy 1: Auth endpoints — brute-force protection (per IP)
    options.AddPolicy("auth", context =>
    {
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        return RateLimitPartition.GetFixedWindowLimiter(ip, _ =>
            new FixedWindowRateLimiterOptions
            {
                PermitLimit = 5,
                Window = TimeSpan.FromMinutes(1)
            });
    });

    // Policy 2: Public API — token bucket (per user/IP)
    options.AddPolicy("api", context =>
    {
        var key = context.User?.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";
        var tier = context.User?.FindFirst("subscription_tier")?.Value ?? "free";

        var (tokens, perPeriod) = tier switch
        {
            "premium" => (1000, 100),
            "business" => (500, 50),
            _ => (100, 10)
        };

        return RateLimitPartition.GetTokenBucketLimiter(key, _ =>
            new TokenBucketRateLimiterOptions
            {
                TokenLimit = tokens,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = perPeriod,
                AutoReplenishment = true
            });
    });

    // Policy 3: Heavy operations — concurrency limit (global)
    options.AddConcurrencyLimiter("heavy", opt =>
    {
        opt.PermitLimit = 3;
        opt.QueueLimit = 10;
        opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter();

// Auth routes — strict per-IP limiting
var auth = app.MapGroup("/auth").RequireRateLimiting("auth");
auth.MapPost("/login", () => Results.Ok(new { token = "..." }));
auth.MapPost("/reset-password", () => Results.Ok());

// API routes — tiered per-user limiting
var api = app.MapGroup("/api").RequireRateLimiting("api");
api.MapGet("/products", () => Results.Ok(new[] { "Widget", "Gadget" }));
api.MapGet("/products/{id}", (int id) => Results.Ok(new { id, name = "Widget" }));
api.MapGet("/orders", () => Results.Ok(Array.Empty<object>()));

// Heavy operations — concurrency limited
api.MapPost("/reports/generate", async () =>
{
    await Task.Delay(5000); // Simulate expensive work without blocking a thread
    return Results.Ok(new { report = "generated" });
}).RequireRateLimiting("heavy");

api.MapPost("/ai/summarize", async () =>
{
    await Task.Delay(3000); // Simulate AI call without blocking a thread
    return Results.Ok(new { summary = "..." });
}).RequireRateLimiting("heavy");

// Health — no limiting
app.MapGet("/health", () => Results.Ok()).DisableRateLimiting();

app.Run();

That's roughly 100 lines including the rejection handler. Three distinct protection strategies, per-user partitioning with tier awareness, proper 429 responses with Retry-After, and zero external dependencies.

What This Replaces

Let me spell out what you can delete:

  • AspNetCoreRateLimit NuGet package. The built-in middleware covers the same algorithms with better integration.
  • Custom Redis Lua scripts for rate counting. Not needed for single-instance deployments.
  • Custom ActionFilter hacks. You get first-class middleware with proper pipeline placement now.
  • Manual 429 response formatting. Handled once in OnRejected, applies to all policies.
  • Per-user tracking with ConcurrentDictionary. Replaced by partition keys.

That's the payoff. What used to be a multi-file, multi-dependency concern is now declarative configuration in one place.

Start Today

You don't need to implement all of this at once. Start with a single global policy:

builder.Services.AddRateLimiter(options =>
{
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext>(context =>
    {
        return RateLimitPartition.GetTokenBucketLimiter(
            context.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            _ => new TokenBucketRateLimiterOptions
            {
                TokenLimit = 100,
                ReplenishmentPeriod = TimeSpan.FromSeconds(10),
                TokensPerPeriod = 10,
                AutoReplenishment = true
            });
    });
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});

That's 16 lines. Per-IP token bucket limiting with proper 429 responses. Add it to any .NET 10 API right now, then refine with per-endpoint policies when you need them.

Rate limiting isn't optional anymore. It's a security baseline. And with .NET 10, there's no excuse not to have it. The framework did the hard work. You just configure it.
