Designing a URL Shortener

Table of Contents

Introduction
#

As part of this post, we’ll be covering the design of a modern, highly available, and scalable URL shortening service like TinyURL or Bit.ly.

1. Requirements
#

Functional Requirements
#

Shorten - Given a long URL, generate a unique, short alias (e.g. https://tny.li/abc123).
Redirect - Given a short URL, redirect the user to the original long URL with minimal latency.
Custom Aliases - Allow users to pick a custom human-readable alias (e.g. tny.li/my-resume).
Link Expiration - Support optional TTL (Time to Live) on links.
Analytics - Track per-link click counts, device types, referrers, and geographic distribution.

Non-Functional Requirements
#

Low Latency - Redirects must be served in < 10ms at the P99 level. Reads far outnumber writes.
High Availability - The redirect path is the core business operation. Any downtime directly impacts every shortened link on the internet that points to us. Target 99.99% availability.
Scalability - The system should sustain 100K+ redirects/s and 1K new URLs/s with headroom to scale horizontally.
Durability - Once a short URL is created, the mapping must never be lost.
Uniqueness Guarantee - Two different long URLs must never produce the same short URL.

Back-of-the-Envelope Calculation
#

These numbers allow us to make concrete decisions about storage, caching and partitioning in later sections.

Metric	Estimate
New URLs per day	~100 million (≈1K/s)
Read:Write ratio	~100:1
Redirects per day	~10 billion (≈100K/s)
Average URL length	~200 bytes
Metadata per URL	~500 bytes (URL + timestamps + user info)
Storage per year	~100M × 365 × 500B ≈ 18 TB/year
Links retained	5 years → ~90 TB total

With a short code of 7 characters using Base62 (a-z A-Z 0-9), we get 62^7 ≈ 3.5 trillion unique codes.

2. High-Level Design (30,000 Feet View)
#

At the highest level, the system has two data paths:

Write Path — A client POSTs a long URL; the system generates a short code, stores the mapping, and returns the short URL.
Read Path — A user clicks a short URL; the system looks up the mapping and issues an HTTP 301/302 redirect to the original URL.

Core APIs
#

POST /api/v1/urls — Shorten a URL

Request:
{
  "long_url":    "https://example.com/very/long/path?q=1",
  "custom_alias": "my-link",          // optional
  "expires_at":  "2027-01-01T00:00Z"  // optional
}

Response (201 Created):
{
  "short_url":   "https://tny.li/abc123",
  "long_url":    "https://example.com/very/long/path?q=1",
  "expires_at":  "2027-01-01T00:00Z",
  "created_at":  "2026-04-29T17:00:00Z"
}

GET /{shortCode} — Redirect

Response
HTTP/1.1 301 Moved Permanently
Location: https://example.com/very/long/path?q=1

GET /api/v1/urls/{shortCode}/stats — Analytics

Response:
{
  "total_clicks": 42318,
  "clicks_24h":   870,
  "top_referrers": ["twitter.com", "linkedin.com"],
  "top_countries": ["US", "IN", "DE"]
}

301 vs 302 — Which Redirect?
#

	301 Moved Permanently	302 Found (Temporary)
Browser caching	Cached — browser won’t hit us again	Not cached — browser always comes back
SEO	Link equity passes to target	Link equity stays with short URL
Analytics accuracy	Under-counts (cached clicks are invisible)	Accurate (every click hits server)
When to Use	Use for permanent, non-expiring links	Use for links with TTL or analytics-heavy use

In practice, most services like Bit.ly issue 302 redirects so they can count every click accurately.

3. Short Code Generation — The Core Algorithm
#

This is the heart of the system. We need a short code that is:

Unique — no collisions
Compact — 7 characters (Base62) gives us ~3.5T codes // depends on how many URLs we want to store
Fast to generate — must not bottleneck write throughput

Let’s explore three strategies and compare them.

Option A: Random Generation + Collision Check
#

Generate a random 7-character Base62 string and check if it already exists in the database.

import java.security.SecureRandom;

public class RandomCodeGenerator {

    private static final String BASE62 =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int CODE_LENGTH = 7;
    private final SecureRandom random = new SecureRandom();

    public String generate() {
        StringBuilder sb = new StringBuilder(CODE_LENGTH);
        for (int i = 0; i < CODE_LENGTH; i++) {
            sb.append(BASE62.charAt(random.nextInt(BASE62.length())));
        }
        return sb.toString();
    }
}

Pros: Simple, no coordination needed.

Cons: As the keyspace fills, collision probability grows. We need a retry loop:

public String generateUnique(Db db) {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        String code = generate();
        if (!db.exists(code)) {
            return code;
        }
    }
    throw new CodeGenerationException("Failed after " + MAX_RETRIES + " retries");
}

Verdict: Works for small scale, but collision checks add latency and DB load under high write rates.

Option B: Base62 Encoding of a Unique Counter (Snowflake / DB Sequence)
#

Maintain a globally unique counter and encode the counter value in Base62.

public class Base62Encoder {

    private static final String BASE62 =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    public static String encode(long num) {
        if (num == 0) return String.valueOf(BASE62.charAt(0));
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(BASE62.charAt((int) (num % 62)));
            num /= 62;
        }
        return sb.reverse().toString();
    }

    public static long decode(String code) {
        long num = 0;
        for (char c : code.toCharArray()) {
            num = num * 62 + BASE62.indexOf(c);
        }
        return num;
    }
}

The counter itself can come from:

Database auto-increment — simple but a single point of failure.
Distributed ID generator (X, Snowflake): Embeds timestamp | machine_id | sequence into a 64-bit long, then Base62-encode it.

/**
 * Simplified Snowflake-style ID generator.
 *
 * Layout of the 64-bit ID:
 *   [1 bit unused] [41 bits timestamp] [10 bits machine_id] [12 bits sequence]
 *
 * - Supports 1024 machines
 * - 4096 IDs per millisecond per machine
 * - ~69 years of timestamps from a custom epoch
 */
public class SnowflakeIdGenerator {

    // Jan 1, 2026 in millis — starting here instead of Unix epoch (1970)
    // gives us the full ~69 years of the 41-bit timestamp field
    private static final long CUSTOM_EPOCH = 1_767_225_600_000L;
    private static final int MACHINE_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_ID_BITS)) {
            throw new IllegalArgumentException("Invalid machine ID");
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long timestamp = System.currentTimeMillis() - CUSTOM_EPOCH;

        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted — wait for next millisecond
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis() - CUSTOM_EPOCH;
                }
            }
        } else {
            sequence = 0;
        }

        lastTimestamp = timestamp;

        return (timestamp << (MACHINE_ID_BITS + SEQUENCE_BITS))
             | (machineId << SEQUENCE_BITS)
             | sequence;
    }
}

Then convert to a short code:

long id = snowflake.nextId();
String shortCode = Base62Encoder.encode(id);
// e.g. id = 7255988608614400 → shortCode = "bW3eKp2D"

Pros: Guaranteed uniqueness. No DB lookup on write. Sortable by time.

Cons: Codes are not random-looking (sequential-ish). Leaks creation order. Requires machine ID management.

Option C: Pre-generated Key Service (KGS)
#

A dedicated Key Generation Service pre-generates and stores millions of unused short codes in a database. When the URL Service needs a code, it grabs one from the pool.

┌──────────────┐       ┌──────────────┐
│ KGS Worker   │──────►│   Key Pool   │
│ (background) │       │   (DB table) │
└──────────────┘       └──────┬───────┘
                              │ batch fetch
                              ▼
                       ┌──────────────┐
                       │ URL Service  │
                       │ (in-memory   │
                       │    buffer)   │
                       └──────────────┘

public class KeyGenerationService {

    private final BlockingQueue<String> keyBuffer;
    private final KeyPoolRepository keyPool;

    public KeyGenerationService(KeyPoolRepository keyPool, int bufferSize) {
        this.keyPool = keyPool;
        this.keyBuffer = new LinkedBlockingQueue<>(bufferSize);
        startBackgroundRefill();
    }

    /** Called by URL Service to get a pre-generated key. */
    public String getKey() throws InterruptedException {
        return keyBuffer.take();
    }

    private void startBackgroundRefill() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            if (keyBuffer.size() < keyBuffer.remainingCapacity()) {
                List<String> batch = keyPool.fetchAndMarkUsed(1000);
                keyBuffer.addAll(batch);
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}

The Key Pool DB has two tables:

Table	Schema
`unused_keys`	`short_code VARCHAR(7) PRIMARY KEY`
`used_keys`	`short_code VARCHAR(7) PRIMARY KEY`

When a key is fetched, it moves atomically from unused_keys to used_keys.

Pros: Zero collision risk. Write path is fast (no hash + no collision check). Easy to control keyspace.

Cons: Extra infrastructure. If a KGS instance crashes, the in-memory batch is “leaked” (acceptable — small loss in a 3.5T keyspace).

Comparison
#

Criteria	Random + Check	Counter (Snowflake)	Pre-generated KGS
Collision risk	Grows over time	None	None
Write latency	Higher (DB check)	Lowest	Low
Coordination	Retry loop	Machine ID mgmt	KGS cluster
Code predictability	Random	Sequential	Random
Best fit	Prototyping	High-throughput writes	Production at scale

Use the KGS approach for production. It decouples key generation from the write path, gives us random codes, and eliminates collision risk. You can also use the Snowflake approach as the ID generator inside the KGS itself to keep the key generation process fast.

4. Database Design
#

Schema
#

CREATE TABLE url_mappings (
    short_code   VARCHAR(7)   PRIMARY KEY,
    long_url     TEXT         NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at   TIMESTAMP,
    click_count  BIGINT       DEFAULT 0
);

CREATE INDEX idx_url_mappings_user ON url_mappings (user_id);
CREATE INDEX idx_url_mappings_expiry ON url_mappings (expires_at)
    WHERE expires_at IS NOT NULL;

SQL vs NoSQL — Choosing the Right Store
#

Factor	SQL (PostgreSQL)	NoSQL (DynamoDB / Cassandra)
Data model	Simple key → value; no joins	Simple key → value; no joins
Read pattern	Point lookups by `short_code`	Point lookups by partition key
Write pattern	Sequential inserts	Distributed writes
Transactions	Full ACID	Limited (single-partition)
Scaling	Vertical / read replicas	Horizontal (auto-sharding)
Verdict	Good at moderate scale	Better at massive scale

Let’s break each factor down:

Data Model: A URL shortener’s core table is a single key-value mapping (short_code → long_url + metadata). There are no foreign keys, no joins, no multi-table transactions. This flat structure is a natural fit for both SQL and NoSQL — neither has a structural advantage here.
Read Pattern: Every redirect is a point lookup: SELECT * FROM url_mappings WHERE short_code = ?. In PostgreSQL, this hits a B-tree index — fast, but the query still goes through the SQL parser, planner, and executor. In DynamoDB, a GetItem on the partition key skips all of that overhead and goes straight to the storage node that owns that partition. At 100K reads/s, that overhead difference matters.
Write Pattern: PostgreSQL writes go through a single-leader instance. Even with connection pooling, you’re ultimately serialising writes through one WAL (Write-Ahead Log). NoSQL stores like DynamoDB and Cassandra distribute writes across partitions automatically — each partition is an independent write path, so throughput scales linearly with partition count.
Transactions: We don’t need multi-row transactions. Creating a short URL is a single-row insert. The only “transaction” we care about is idempotency (don’t insert the same short_code twice), which both stores handle via conditional writes — INSERT ... ON CONFLICT in PostgreSQL, PutItem with attribute_not_exists in DynamoDB. Full ACID is available in PostgreSQL but we simply don’t need it.
Scaling: This is where the choice becomes clear at higher scale. PostgreSQL scales vertically (bigger machine) and horizontally via read replicas for read-heavy workloads. But adding write capacity requires manual sharding (Citus, Vitess, or application-level sharding), which is operationally complex. DynamoDB and Cassandra auto-shard by partition key — you add capacity by increasing throughput settings or adding nodes, with zero application changes.

Bottom line: At moderate scale (< 10K reads/s), PostgreSQL with a couple of read replicas is simpler to operate and reason about. At the scale we’re targeting (100K+ reads/s, 1K+ writes/s, multi-region), a NoSQL store like DynamoDB is the better fit — single-digit ms reads on partition key lookups, automatic sharding, on-demand capacity scaling, and built-in global replication via Global Tables.

DynamoDB Table Design
#

Table: url_mappings
  Partition Key: short_code (S)

Attributes:
  long_url    (S)
  user_id     (N)
  created_at  (N)  — epoch millis
  expires_at  (N)  — epoch millis, with TTL enabled
  click_count (N)

GSI: user-urls-index
  Partition Key: user_id (N)
  Sort Key: created_at (N)

DynamoDB’s built-in TTL automatically deletes expired items — no cron job needed.

Java Data Access Layer
#

@DynamoDbBean
public class UrlMapping {

    private String shortCode;
    private String longUrl;
    private Long userId;
    private Instant createdAt;
    private Instant expiresAt;
    private long clickCount;

    @DynamoDbPartitionKey
    @DynamoDbAttribute("short_code")
    public String getShortCode() { return shortCode; }
    public void setShortCode(String shortCode) { this.shortCode = shortCode; }

    @DynamoDbAttribute("long_url")
    public String getLongUrl() { return longUrl; }
    public void setLongUrl(String longUrl) { this.longUrl = longUrl; }

    @DynamoDbAttribute("created_at")
    public Instant getCreatedAt() { return createdAt; }
    public void setCreatedAt(Instant createdAt) { this.createdAt = createdAt; }

    @DynamoDbAttribute("expires_at")
    public Instant getExpiresAt() { return expiresAt; }
    public void setExpiresAt(Instant expiresAt) { this.expiresAt = expiresAt; }

    // ... other getters/setters
}

public class UrlRepository {

    private final DynamoDbTable<UrlMapping> table;

    public UrlRepository(DynamoDbEnhancedClient client) {
        this.table = client.table("url_mappings",
            TableSchema.fromBean(UrlMapping.class));
    }

    public void save(UrlMapping mapping) {
        table.putItem(mapping);
    }

    public Optional<UrlMapping> findByShortCode(String shortCode) {
        UrlMapping item = table.getItem(
            Key.builder().partitionValue(shortCode).build());
        return Optional.ofNullable(item);
    }

    public boolean exists(String shortCode) {
        return findByShortCode(shortCode).isPresent();
    }
}

5. The Write Path — Creating a Short URL
#

Now let’s tie together the short code generation, validation, and persistence.

@Service
public class UrlShorteningService {

    private final KeyGenerationService kgs;
    private final UrlRepository urlRepository;
    private final CacheService cacheService;

    public UrlShorteningService(KeyGenerationService kgs,
                                 UrlRepository urlRepository,
                                 CacheService cacheService) {
        this.kgs = kgs;
        this.urlRepository = urlRepository;
        this.cacheService = cacheService;
    }

    public UrlMapping shorten(ShortenRequest request) {
        // 1. Validate the long URL
        validateUrl(request.getLongUrl());

        // 2. Get a short code (custom or auto-generated)
        String shortCode;
        if (request.getCustomAlias() != null) {
            shortCode = request.getCustomAlias();
            if (urlRepository.exists(shortCode)) {
                throw new AliasAlreadyExistsException(shortCode);
            }
        } else {
            shortCode = kgs.getKey();
        }

        // 3. Build and persist the mapping
        UrlMapping mapping = new UrlMapping();
        mapping.setShortCode(shortCode);
        mapping.setLongUrl(request.getLongUrl());
        mapping.setUserId(request.getUserId());
        mapping.setCreatedAt(Instant.now());
        mapping.setExpiresAt(request.getExpiresAt());
        mapping.setClickCount(0);

        urlRepository.save(mapping);

        // 4. Populate cache proactively
        cacheService.put(shortCode, mapping.getLongUrl());

        return mapping;
    }

    private void validateUrl(String url) {
        try {
            new java.net.URI(url).toURL();
        } catch (Exception e) {
            throw new InvalidUrlException("Malformed URL: " + url);
        }
    }
}

6. The Read Path — Redirect Flow
#

The read path is the most latency-sensitive and highest-throughput part of the system.

@RestController
public class RedirectController {

    private final CacheService cache;
    private final UrlRepository urlRepository;
    private final AnalyticsPublisher analyticsPublisher;

    @GetMapping("/{shortCode}")
    public ResponseEntity<Void> redirect(
            @PathVariable String shortCode,
            HttpServletRequest request) {

        // 1. Try cache first
        String longUrl = cache.get(shortCode);

        // 2. Cache miss → hit database
        if (longUrl == null) {
            UrlMapping mapping = urlRepository.findByShortCode(shortCode)
                .orElseThrow(() -> new ShortCodeNotFoundException(shortCode));

            // Check expiration
            if (mapping.getExpiresAt() != null
                    && Instant.now().isAfter(mapping.getExpiresAt())) {
                throw new LinkExpiredException(shortCode);
            }

            longUrl = mapping.getLongUrl();
            cache.put(shortCode, longUrl);
        }

        // 3. Fire analytics event asynchronously
        analyticsPublisher.publish(new ClickEvent(
            shortCode, request.getRemoteAddr(),
            request.getHeader("User-Agent"),
            request.getHeader("Referer"),
            Instant.now()
        ));

        // 4. Redirect
        return ResponseEntity.status(HttpStatus.FOUND)
            .header(HttpHeaders.LOCATION, longUrl)
            .build();
    }
}

Key decisions:

We use 302 (Found) so every click is counted.
Analytics are published to Kafka asynchronously — the redirect response does not wait for analytics to complete.
The cache shields the database from the full 100K reads/s load.

7. Caching Strategy
#

With a 100:1 read-write ratio, caching is non-negotiable. We use Redis as a distributed cache.

Cache Architecture
#

┌──────────┐      ┌───────────┐      ┌──────────┐
│ Redirect │─────►│   Redis   │─────►│ Database │
│ Service  │ miss │  Cluster  │ miss │          │
└──────────┘      └───────────┘      └──────────┘
                       │
              Cache-aside pattern

Java Cache Service
#

@Service
public class CacheService {

    private final RedisTemplate<String, String> redis;
    private static final Duration DEFAULT_TTL = Duration.ofHours(24);

    public CacheService(RedisTemplate<String, String> redis) {
        this.redis = redis;
    }

    public String get(String shortCode) {
        return redis.opsForValue().get("url:" + shortCode);
    }

    public void put(String shortCode, String longUrl) {
        redis.opsForValue().set("url:" + shortCode, longUrl, DEFAULT_TTL);
    }

    public void evict(String shortCode) {
        redis.delete("url:" + shortCode);
    }
}

Cache Sizing
#

Assume the top 20% of URLs receive 80% of traffic (Pareto / Zipf distribution).
If we have 1 billion total URLs, caching the hot 200 million is overkill. In reality, the top 10 million URLs likely cover 90%+ of traffic.
Each cache entry: key (~20 bytes) + value (~200 bytes) ≈ 220 bytes
10M entries × 220 bytes ≈ 2.2 GB — fits comfortably in a single Redis instance. A cluster of 3 nodes with replication gives us HA.

Cache Invalidation
#

On Expiry: DynamoDB TTL deletes expired rows. The Redis TTL on the cache key should roughly match the link’s expires_at.
On Update / Delete: When a user deletes a link, we evict the cache key and remove the DB row.
Stampede Prevention: If popular keys expire simultaneously, many requests will hit the database. Use a probabilistic early expiration (also known as XFetch) or a distributed lock so only one thread rebuilds the cache.

/**
 * Probabilistic Early Recomputation to prevent cache stampedes.
 *
 * Instead of waiting for the key to fully expire, each request has
 * a small probability of refreshing the cache BEFORE it expires.
 * This spreads the refresh load over time.
 */
public String getWithEarlyRefresh(String shortCode) {
    String key = "url:" + shortCode;
    String value = redis.opsForValue().get(key);
    Long ttl = redis.getExpire(key, TimeUnit.SECONDS);

    if (value != null && ttl != null && ttl > 0) {
        // Probabilistic early refresh: higher chance as TTL approaches 0
        double probability = Math.exp(-ttl / 60.0);  // ramp up in last ~60s
        if (ThreadLocalRandom.current().nextDouble() < probability) {
            // Refresh cache from DB in the background
            refreshCacheAsync(shortCode);
        }
        return value;
    }

    // True miss — synchronous fetch from DB
    return null;
}

8. Rate Limiting
#

Short URL creation is an abuse vector (spam, phishing). We need to rate-limit both creation and redirect APIs.

Token Bucket in Redis
#

We implement a distributed token bucket using Redis Lua scripts for atomicity.

@Service
public class RateLimiter {

    private final StringRedisTemplate redis;

    // Lua script: atomic token bucket implementation
    private static final String LUA_SCRIPT = """
        local key = KEYS[1]
        local max_tokens = tonumber(ARGV[1])
        local refill_rate = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
        local tokens = tonumber(bucket[1])
        local last_refill = tonumber(bucket[2])

        if tokens == nil then
            tokens = max_tokens
            last_refill = now
        end

        local elapsed = now - last_refill
        tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

        if tokens >= 1 then
            tokens = tokens - 1
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return 1
        else
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)
            return 0
        end
    """;

    public boolean allowRequest(String clientId, int maxTokens, double refillRate) {
        Long result = redis.execute(
            RedisScript.of(LUA_SCRIPT, Long.class),
            List.of("rate:" + clientId),
            String.valueOf(maxTokens),
            String.valueOf(refillRate),
            String.valueOf(System.currentTimeMillis() / 1000.0)
        );
        return result != null && result == 1;
    }
}

9. Scaling the System
#

So far, we have a working system. Now let’s look at what breaks at scale and how to fix it.

9.1 Database Scaling
#

Problem: A single database instance can’t handle 100K reads/s and 1K writes/s simultaneously.

Solution: Horizontal Sharding by short_code.

With DynamoDB, this is automatic — the partition key (short_code) naturally distributes data across partitions. For SQL databases, we shard manually:

Shard 0: short_codes starting with [a-j]
Shard 1: short_codes starting with [k-t]
Shard 2: short_codes starting with [u-z, A-Z, 0-9]

Or use consistent hashing to map hash(short_code) → shard:

public class ConsistentHashRouter {

    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRouter(List<String> shards, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String shard : shards) {
            for (int i = 0; i < virtualNodes; i++) {
                long hash = hash(shard + "#" + i);
                ring.put(hash, shard);
            }
        }
    }

    public String getShardFor(String shortCode) {
        long hash = hash(shortCode);
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
        return (entry != null) ? entry.getValue() : ring.firstEntry().getValue();
    }

    private long hash(String key) {
        return Hashing.murmur3_128().hashString(key, StandardCharsets.UTF_8)
                      .asLong();
    }
}

9.2 Cache Scaling
#

Problem: A single Redis instance becomes a bottleneck.

Solution:

Redis Cluster — Partition keys across multiple Redis nodes. The Redis protocol handles this transparently.
Local In-Process Cache (L1) + Redis (L2): Add a Caffeine cache on each application server. Hot keys are served from L1 (nanosecond latency), falling back to L2 (Redis).

@Service
public class TwoLevelCacheService {

    private final Cache<String, String> localCache;
    private final CacheService redisCache;

    public TwoLevelCacheService(CacheService redisCache) {
        this.redisCache = redisCache;
        this.localCache = Caffeine.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(Duration.ofMinutes(5))
            .build();
    }

    public String get(String shortCode) {
        // L1: local cache
        String value = localCache.getIfPresent(shortCode);
        if (value != null) return value;

        // L2: Redis
        value = redisCache.get(shortCode);
        if (value != null) {
            localCache.put(shortCode, value);
        }
        return value;
    }

    public void put(String shortCode, String longUrl) {
        localCache.put(shortCode, longUrl);
        redisCache.put(shortCode, longUrl);
    }
}

9.3 Multi-Region Deployment
#

For a global service, we deploy across regions to reduce redirect latency for users worldwide.

          ┌──────────────────────────────────────────┐
          │              Global DNS (Route53)         │
          │    Latency-based routing to nearest PoP   │
          └────────────┬────────────┬─────────────────┘
                       │            │
               ┌───────▼──┐   ┌────▼──────┐
               │ US-East   │   │ EU-West   │
               │ ┌───────┐ │   │ ┌───────┐ │
               │ │Redirect│ │   │ │Redirect│ │
               │ │Service │ │   │ │Service │ │
               │ └───┬───┘ │   │ └───┬───┘ │
               │ ┌───▼───┐ │   │ ┌───▼───┐ │
               │ │ Redis  │ │   │ │ Redis  │ │
               │ │Replica │ │   │ │Replica │ │
               │ └───────┘ │   │ └───────┘ │
               └───────────┘   └───────────┘
                       │            │
                       └─────┬──────┘  cross-region
                         ┌───▼───┐     replication
                         │DynamoDB│
                         │Global  │
                         │Tables  │
                         └───────┘

Key decisions:

DynamoDB Global Tables replicate data across regions with eventual consistency. Since URLs are write-once / read-many, this is perfectly acceptable.
Each region has its own Redis replica for local cache hits.
DNS-based routing (Route53 latency routing) sends users to the closest region.

9.4 KGS Scaling
#

The Key Generation Service must not become a bottleneck:

Run multiple KGS instances, each assigned a non-overlapping range of the keyspace or pulling from a partitioned key pool table.
Each URL Service instance maintains an in-memory buffer of ~10K pre-fetched keys. A buffer of 10K keys at 1K writes/s gives 10 seconds of runway — enough to survive a brief KGS outage.
If a KGS instance dies, its in-memory keys are lost. Given a keyspace of 3.5T, losing a few thousand keys is negligible.

10. Handling Edge Cases
#

10.1 Duplicate Long URLs
#

Should shorten("https://example.com") always return the same short code?

Option A: Always generate a new code. Simpler. Different users get different short URLs, enabling per-user analytics.

Option B: Dedup with a hash lookup. Store hash(long_url) → short_code in a secondary index. If the same user shortens the same URL, return the existing code.

For most production systems, Option A is preferred — it keeps the write path simple and supports per-user link management. Users can always see their own previously shortened URLs via the dashboard.

10.2 Abuse & Phishing Prevention
#

URL Blacklisting: Check the target URL against Google Safe Browsing API before creating the short link.
Redirect Interstitial: For flagged URLs, show a warning page instead of a direct redirect.
Link Reporting: Provide a mechanism for users to report malicious links.

public class UrlSafetyChecker {

    private final SafeBrowsingClient safeBrowsingClient;

    public boolean isSafe(String url) {
        ThreatMatch match = safeBrowsingClient.lookup(url);
        return match == null; // null = no threats found
    }
}

10.3 Link Expiration Cleanup
#

DynamoDB TTL: Items with expires_at in the past are deleted automatically (within ~48 hours).
Redis TTL: Cache entries have TTL aligned with link expiration.
Proactive Cleanup: A background job periodically scans and purges expired entries that DynamoDB TTL hasn’t cleaned up yet. This is a low-priority batch process.

12. Putting It All Together — The Complete Architecture
#

┌────────────────────────────────────────────────────────────────────────┐
│                         GLOBAL LAYER                                   │
│                                                                        │
│  DNS                                   │
│  CDN (CloudFront — caches 301 redirects at edge)                       │
│  WAF (Web Application Firewall — DDoS, bot protection)                 │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
┌────────────────────────────▼───────────────────────────────────────────┐
│                     PER-REGION                                         │
│                                                                        │
│  ┌──────────────┐   ┌───────────────┐   ┌───────────────┐             │
│  │  API Gateway  │──►│  URL Service   │──►│     KGS       │             │
│  │  (Rate Limit) │   │  (Write Path)  │   │ (Key Gen Svc) │             │
│  └──────┬───────┘   └───────┬───────┘   └───────────────┘             │
│         │                   │                                          │
│  ┌──────▼───────┐   ┌───────▼───────┐                                  │
│  │  Redirect    │──►│  L1: Caffeine  │                                  │
│  │  Service     │   │  L2: Redis     │                                  │
│  │  (Read Path) │   │  L3: DynamoDB  │                                  │
│  └──────┬───────┘   └───────────────┘                                  │
│         │                                                              │
│  ┌──────▼───────┐   ┌───────────────┐   ┌───────────────┐             │
│  │    Kafka     │──►│ Click Consumer │──►│  ClickHouse   │             │
│  │(click events)│   │ (Enrichment)   │   │  (Analytics)  │             │
│  └──────────────┘   └───────────────┘   └───────────────┘             │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                             │
               Cross-Region Replication
                             │
┌────────────────────────────▼───────────────────────────────────────────┐
│                     DATA LAYER                                         │
│                                                                        │
│  DynamoDB Global Tables (url_mappings)                                 │
│  Redis Cluster (per-region cache)                                      │
│  ClickHouse Cluster (analytics, replicated)                            │
│  S3 (click event archive, long-term retention)                         │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Introduction #

1. Requirements #

Functional Requirements #

Non-Functional Requirements #

Back-of-the-Envelope Calculation #

2. High-Level Design (30,000 Feet View) #

Core APIs #

301 vs 302 — Which Redirect? #

3. Short Code Generation — The Core Algorithm #

Option A: Random Generation + Collision Check #

Option B: Base62 Encoding of a Unique Counter (Snowflake / DB Sequence) #

Option C: Pre-generated Key Service (KGS) #

Comparison #

4. Database Design #

Schema #

SQL vs NoSQL — Choosing the Right Store #

DynamoDB Table Design #

Java Data Access Layer #

5. The Write Path — Creating a Short URL #

6. The Read Path — Redirect Flow #

7. Caching Strategy #

Cache Architecture #

Java Cache Service #

Cache Sizing #

Cache Invalidation #

8. Rate Limiting #

Token Bucket in Redis #

9. Scaling the System #

9.1 Database Scaling #

9.2 Cache Scaling #

9.3 Multi-Region Deployment #

9.4 KGS Scaling #

10. Handling Edge Cases #

10.1 Duplicate Long URLs #

10.2 Abuse & Phishing Prevention #

10.3 Link Expiration Cleanup #

12. Putting It All Together — The Complete Architecture #

Related