kan01234 - Software Engineer Notes

A backend engineer's journey of learning and growth.

23 June 2025

System Design: How to Build a URL Shortening Service at Scale

by kan01234

🔧 Goal: Design a URL Shortening Service

Like Bitly: Given a long URL, generate a short unique alias (e.g., bit.ly/abc123) and redirect users when they visit that short link.

🧱 Functional Requirements

โŒ Non-Functional Requirements

🧩 High-Level Components

+-------------+       +------------------+       +--------------------+
|   Clients   | <---> |   API Gateway    | <---> |   URL Shorten API  |
+-------------+       +------------------+       +---------+----------+
                                                            |
                                                            v
                                              +-------------+-------------+
                                              |     Storage System        |
                                              |  (DB + Cache + Indexes)   |
                                              +---------------------------+

🔑 Core Design Decision: How to Generate the Short URL?

Option A: Base62 Encoding of ID

Option B: Random String

Option C: Hash (MD5/SHA1)
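Option C could look roughly like the sketch below. It assumes MD5 and a 7-character Base62 code. `hash_code` and `shorten_by_hash` are hypothetical names, and rehashing with a numeric salt is only one way to resolve a collision, not a prescribed method. The dict `store` stands in for the `urls` table.

```python
import hashlib
from typing import Dict

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def hash_code(text: str, length: int = 7) -> str:
    """Hash the input with MD5 and keep `length` Base62 digits of the result."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    # Only the first 8 bytes are used and only `length` digits survive, so
    # distinct URLs can map to the same code (the "long to short" loss).
    n = int.from_bytes(digest[:8], "big")
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(chars)


def shorten_by_hash(long_url: str, store: Dict[str, str]) -> str:
    """Return a code for long_url, rehashing with a salt on collisions."""
    salt = 0
    while True:
        key = long_url if salt == 0 else f"{long_url}#{salt}"
        code = hash_code(key)
        owner = store.get(code)
        if owner is None:
            store[code] = long_url  # free slot: claim it
            return code
        if owner == long_url:
            return code  # same URL again: same code (deterministic)
        salt += 1  # taken by a different URL: perturb the input and retry
```

The determinism is what makes hashing attractive, because shortening the same URL twice yields the same code. The collision loop is the price paid for truncating the digest.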

👉 Most systems use Base62 encoding of an integer ID, or a random string plus a uniqueness check.
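Base62 encoding of an ID is a plain change of radix. Below is a minimal sketch. The digit order in `ALPHABET` is an assumption: any fixed permutation of the 62 characters works, as long as the encoder and decoder agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode(code: str) -> int:
    """Invert encode(): turn a Base62 string back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]  # KeyError for characters outside the alphabet
    return n


print(encode(125))   # 21  (2 * 62 + 1)
print(decode("21"))  # 125
```

Seven Base62 characters cover 62^7 = 3,521,614,606,208 IDs (about 3.5 trillion) before codes grow to eight characters.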

🗃️ Database Schema

Table: urls
- id (BIGINT PK)
- short_code (VARCHAR UNIQUE)
- long_url (TEXT)
- created_at
- expiration_time (optional)
- owner_id (optional)
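As a concrete sketch, here is the schema above as DDL. It runs against an in-memory SQLite database so that it is self-contained. Column types and the `VARCHAR(16)` length are assumptions:

- SQLite's `INTEGER PRIMARY KEY` stands in for `BIGINT PK`. In PostgreSQL you would more likely write `BIGINT GENERATED ALWAYS AS IDENTITY`.
- `short_code` is left nullable so an insert-then-encode flow can fill it in after the row gets its ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE urls (
        id              INTEGER PRIMARY KEY,     -- 64-bit auto-assigned row ID
        short_code      VARCHAR(16) UNIQUE,      -- UNIQUE also creates the lookup index
        long_url        TEXT NOT NULL,
        created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expiration_time TIMESTAMP,               -- optional
        owner_id        INTEGER                  -- optional
    )
    """
)

# Insert a mapping and read it back by short code (the redirect query).
conn.execute(
    "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
    ("abc123", "https://example.com/some/very/long/path"),
)
row = conn.execute(
    "SELECT long_url FROM urls WHERE short_code = ?", ("abc123",)
).fetchone()
print(row[0])  # https://example.com/some/very/long/path
```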

⚙️ Flow: Shortening a URL (POST)

  1. Client calls POST /shorten with the long URL
  2. Server generates a short code, either:
     2.1 by inserting the row to get an auto-increment ID, then Base62-encoding that ID, or
     2.2 by generating a random string and checking that it is not already taken
  3. Store (short_code, long_url) in DB
  4. Return short_url = domain/short_code
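Step 2.2 can avoid a check-then-insert race by letting the database's UNIQUE constraint act as the uniqueness check. The code tries the insert and picks a new code if it fails. This is a minimal sketch under several assumptions:

- The store is SQLite in memory.
- Codes are 7 characters long.
- `https://short.example` is a hypothetical domain.
- `shorten` and `random_code` are illustrative names.

```python
import secrets
import sqlite3
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars
DOMAIN = "https://short.example"  # hypothetical short-link domain

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, "
    "short_code TEXT UNIQUE NOT NULL, long_url TEXT NOT NULL)"
)


def random_code(length: int = 7) -> str:
    """Draw a code from a cryptographically strong RNG (hard to enumerate)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(long_url: str, max_attempts: int = 5) -> str:
    for _ in range(max_attempts):
        code = random_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return f"{DOMAIN}/{code}"
        except sqlite3.IntegrityError:
            continue  # code already taken: try another one
    raise RuntimeError("could not allocate a unique short code")
```

A separate SELECT followed by an INSERT leaves a window in which two requests can pick the same code. Relying on the constraint closes that window.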

⚡ Flow: Redirect (GET /abc123)

  1. Client requests GET /abc123
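  2. Look up the short code:
     2.1 First check the cache (Redis)
     2.2 If not found there, query the DB
  3. If found, redirect with HTTP 301 or 302; otherwise, return 404. A 301 is permanent, so browsers may cache it and repeat visits can skip the service. A 302 is temporary, so every visit reaches the service, which keeps click analytics complete.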
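The redirect path is a cache-aside lookup. In the sketch below, plain dicts stand in for Redis and the `urls` table, an assumption made to keep it self-contained. `resolve` is a hypothetical name that returns an `(HTTP status, Location)` pair. It uses 302, but 301 works the same way.

```python
from typing import Dict, Optional, Tuple

cache: Dict[str, str] = {}  # stand-in for Redis
db: Dict[str, str] = {"abc123": "https://example.com/some/very/long/path"}  # stand-in for `urls`


def resolve(code: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for GET /<code>."""
    long_url = cache.get(code)  # 1. try the cache first
    if long_url is None:
        long_url = db.get(code)  # 2. cache miss: hit the DB
        if long_url is None:
            return 404, None  # unknown code
        cache[code] = long_url  # populate the cache for later hits
    return 302, long_url  # 3. redirect


print(resolve("abc123"))  # (302, 'https://example.com/some/very/long/path')
print(resolve("nope"))    # (404, None)
```

Misses are not cached here, so a flood of requests for non-existent codes all reach the DB. Caching negative results with a short TTL is a common mitigation.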

🧠 Optimization

Caching

📈 Analytics (Optional)

🧮 Scale Considerations

| Component | Strategy |
|---|---|
| DB Writes | Sharded ID generator, UUID |
| DB Reads | Cache layer, CDN |
| API Layer | Stateless, horizontally scaled |
| Storage | Use consistent hashing or sharding |
| Backup | Replicate data, versioning |
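For the Storage row, a consistent-hash ring routes each short code to a shard, so adding a shard moves only a fraction of the keys. This is a minimal sketch. It assumes SHA-1 for placement and 100 virtual nodes per shard, and it uses hypothetical shard names such as `db-0`.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-1 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Several virtual points per node smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node point."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the end


ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
shard = ring.node_for("abc123")  # the same code always lands on the same shard
```

When `db-3` joins, only the keys whose nearest clockwise point now belongs to `db-3` move, and all of them move to `db-3`. With plain `hash(key) % N` sharding, most keys would move instead.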

🚨 Failure & Recovery

Trade-offs

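| Choice | Pros | Cons |
|---|---|---|
| Base62 ID | Compact, ordered | A single global ID generator can become a bottleneck |
| Random codes | No central ID dependency | Needs collision detection |
| Hashing | Deterministic (same URL, same code) | Truncating the hash to a short code loses information, so collisions occur |
| Cache layer | Super fast reads | Risk of serving stale data |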
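The Base62 row's main drawback is that a single global ID generator can become a bottleneck. The "Sharded ID generator" strategy listed under Scale Considerations addresses this. One simple scheme, assumed here and not the only option, interleaves IDs: with N generators, generator i issues i, i + N, i + 2N, and so on, so the generators never need to coordinate. `ShardedIdGenerator` is a hypothetical name.

```python
import itertools
import threading


class ShardedIdGenerator:
    """Generator `shard_index` of `num_shards` issues shard_index + k * num_shards."""

    def __init__(self, shard_index: int, num_shards: int) -> None:
        if not 0 <= shard_index < num_shards:
            raise ValueError("shard_index must be in [0, num_shards)")
        self._counter = itertools.count(shard_index, num_shards)
        self._lock = threading.Lock()  # one generator may serve many threads

    def next_id(self) -> int:
        with self._lock:
            return next(self._counter)


# Four independent generators never hand out the same ID.
gens = [ShardedIdGenerator(i, 4) for i in range(4)]
print([g.next_id() for g in gens])  # [0, 1, 2, 3]
print(gens[0].next_id())            # 4
```

This scheme has several trade-offs:

- IDs are only roughly ordered across generators, which weakens the "ordered" pro.
- Each generator must persist its counter to survive restarts, which this in-memory sketch does not do.
- Changing N later needs care so that sequences do not overlap.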

🔨 What Datastores to Use?

🧠 What Are We Storing?

  1. 🔗 URL Mappings
  2. 📈 Analytics (Optional)
  3. ⚡ Caching

Datastores to Use

🧮 Primary: Relational Database (RDBMS)

✅ Best for: URL mappings

Pros:

Cons:


💨 Cache Layer: KVS (Redis)

✅ Best for: Low-latency reads

Why KVS (Redis)?
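The storage summary credits Redis with sub-ms lookups and TTL support. The sketch below is an in-process stand-in that mimics two of those properties, per-key expiry and bounded size with least-recently-used eviction. It is not the Redis client API. In Redis itself this would roughly correspond to `SET key value EX <seconds>` combined with an eviction policy such as `allkeys-lru`. `TTLCache` and the injectable `clock` are assumptions made so the behavior is easy to test.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """Tiny Redis-like cache: entries expire after ttl_seconds; LRU eviction when full."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used key


cache = TTLCache(max_entries=100_000, ttl_seconds=3600)
cache.set("abc123", "https://example.com/some/very/long/path")
print(cache.get("abc123"))  # https://example.com/some/very/long/path
```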


📊 Analytics Store: Append-Only Event System + Columnar DB

✅ Best for: Tracking clicks and usage

Why?
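The append-only design keeps the redirect path cheap: the request only appends an event, and counting happens later in batch. This is a minimal sketch. A Python list stands in for the append-only log (for example, a Kafka topic), and a `Counter` stands in for a GROUP BY in the columnar store. `ClickEvent`, `record_click`, and `daily_clicks` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    ts: datetime  # must be timezone-aware
    referrer: str = ""


event_log: List[ClickEvent] = []  # stand-in for an append-only topic


def record_click(short_code: str, ts: datetime, referrer: str = "") -> None:
    """Called on the redirect path: append only, no aggregation."""
    event_log.append(ClickEvent(short_code, ts, referrer))


def daily_clicks(events: Iterable[ClickEvent]) -> "Counter[Tuple[str, date]]":
    """Batch job: clicks per (short_code, UTC day), like GROUP BY code, day."""
    return Counter((e.short_code, e.ts.astimezone(timezone.utc).date()) for e in events)


t = datetime(2025, 6, 23, 10, 0, tzinfo=timezone.utc)
record_click("abc123", t)
record_click("abc123", t.replace(hour=11))
record_click("xyz789", t)
print(daily_clicks(event_log)[("abc123", date(2025, 6, 23))])  # 2
```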


Optional: NoSQL (Only If Needed)

For extreme scale, consider NoSQL alternatives:

A. Cassandra / DynamoDB

💡 Only introduce NoSQL when:

🧠 Storage Summary Table

| Data | Store | Why |
|---|---|---|
| Short URL mappings | PostgreSQL | Relational integrity, easy indexing |
| Hot URL cache | Redis | Sub-ms lookup, TTL support |
| Click analytics | Kafka + BigQuery | Async, scalable, columnar analytics |
| Extreme key-value scale | DynamoDB/Cassandra | Only if millions of writes/sec are required |

For 99% of teams, this is enough:

tags: system-design