A backend engineer's journey of learning and growth.
by kan01234
In any financial system β especially one handling wallet balances and real-time payments β we must ensure:
Idempotency
Every client request must be processed once and only once.
Retried requests (due to network issues or timeouts) must not result in double charges.
Recoverability
The system must be resilient to partial failures (e.g., crash before commit).
State must be reconstructable from durable records like transactions and ledgers.
Robustness
No inconsistent state even under high concurrency.
Must guard against race conditions, double-spends, phantom reads, etc.
[Client]
β
[API Gateway]
β
[Payment Service]
ββββ Idempotency Store
ββββ Transaction Store
ββββ Ledger Store
ββββ Wallet Store (row-level locks)
ββββ Notification Queue (optional)
Property | Why It Matters in Finance |
---|---|
Atomicity | Ensures either all updates (wallet, ledger, tx) happen, or none β no partial writes. |
Consistency | Enforces rules (e.g., non-negative balance, foreign keys). |
Isolation | Prevents race conditions under concurrent transactions (important for wallet updates). |
Durability | Once committed, all writes are safe β even after power failure or crash. |
π§ In a payment system, you canβt afford half-complete operations β ACID is non-negotiable.
You need to update:
β all in one coordinated transaction.
π οΈ RDBs allow this out-of-the-box using BEGIN β¦ COMMIT.
You have:
These are highly interrelated, and RDBs let you model and enforce:
π§© Perfect match for structured, interdependent financial data.
π These are essential for financial observability and recovery.
Companies like Stripe, PayPal, Square, and Mercari itself use RDBs (e.g., PostgreSQL or MySQL) for core financial systems.
Proven at millions of transactions per day, when properly partitioned and scaled.
π Alternatives β and Why RDB Still Wins
Option | Why Itβs Risky Alone |
---|---|
NoSQL | Weak consistency; canβt do ACID across documents |
KV Stores | No schema, no joins, harder for audits |
Event Store | Great for logging, but complex for balance logic |
In-memory DB | Not durable enough for financial systems |
For a core payment ledger, an RDB gives you:
β ACID β Reliability β Auditability β Schema enforcement β Recovery options β Fast joins & transactional updates
π§ Itβs still the most trustable, explainable, and recoverable option for moving money safely.
POST /payments
{
"payerId": "u123",
"payeeId": "m456",
"amount": 100,
"requestId": "REQ-abc123"
}
SELECT * FROM idempotency_keys WHERE request_id = 'REQ-abc123';
β Step 2a: Insert idempotency key (must be first for deduplication) INSERT INTO idempotency_keys (request_id, status) VALUES (βREQ-abc123β, βPENDINGβ);
β Step 2b: Insert transaction record INSERT INTO transactions (txn_id, payer, payee, amount, status) VALUES (βtxn-001β, βu123β, βm456β, 100, βPENDINGβ);
#### π‘ Why Insert Both Early?
| Reason | Benefit |
| ----------------------------- | ----------------------------------------------------------------------- |
| **Crash safety** | Partial progress is visible; no hidden side effects |
| **Deduplication** | Prevents duplicate request from executing again |
| **Debugging & observability** | Operators can observe in-flight or failed transactions |
| **Retry support** | Allows clean recovery: continue where it left off or return cached data |
| **Auditability** | You have a durable footprint of what was *about to happen* |
### π 3. Begin transaction
All the following actions are wrapped in a single ACID DB transaction.
### π³ 4. Insert initial transaction record
INSERT INTO transactions (txn_id, payer, payee, amount, status) VALUES (βtxn-001β, βu123β, βm456β, 100, βPENDINGβ);
### π 5. Lock payer wallet row
SELECT balance FROM wallets WHERE user_id = βu123β FOR UPDATE;
### β 6. Check and deduct balance
UPDATE wallets SET balance = balance - 100 WHERE user_id = βu123β;
### π 7. Insert ledger records
INSERT INTO ledger (β¦) VALUES (βu123β, 100, βtxn-001β, βDEBITβ), (βm456β, 100, βtxn-001β, βCREDITβ);
### β
8. Mark transaction as SUCCESS
UPDATE transactions SET status = βSUCCESSβ WHERE txn_id = βtxn-001β;
### ποΈ 9. Update idempotency record
UPDATE idempotency_keys SET status = βSUCCESSβ, response_json = β{β¦}β WHERE request_id = βREQ-abc123β;
### π¦ 10. Commit transaction
After commit, response to client the transaction is success
### π£ 11. Optional: push notifications to payer/payee
## π οΈ Unhappy Path - Example Scenario: Failure During Wallet Deduction
### Suppose this flow:
1. Request comes in β requestId = REQ-abc123
2. Inserted into:
2.1 idempotency_keys (PENDING)
2.2 transactions (PENDING)
3. Locked payer wallet
π₯ App crashes before updating balance or inserting ledger
### π§° How System Handles It
| Concern | Mitigation |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **No partial side effects** | Since all actions were inside a single DB transaction and it was never committed, **everything rolls back automatically**. |
| **Client retry** | The next request will hit `idempotency_keys` β status = `PENDING`.<br>You can return "in-progress" or retry the full flow. |
| **No double spend** | Since nothing was committed, no wallet deduction or ledger entry happened. |
| **Safe to resume** | The retry can safely **reuse the existing transaction record**, continue from known safe point, and commit the full flow. |
### πΎ Recovery Principles
#### Use Idempotency Key as the Anchor
Any retry uses the same requestId, which either:
- Returns existing response (if SUCCESS)
- Detects PENDING and waits or resumes
#### Store durable intent early
Ensures that either:
- Wallet + ledger + transaction + idempotency key are all updated
- Or none of them are
#### Use atomic DB transactions
- Insert transactions and idempotency_keys first.
- These act like a crash-safe bookmark for recovery tools or retries.
### Summary
| Failure Point | Outcome | Why It's Safe |
| ------------------- | ----------------------------- | --------------------------- |
| Before transaction | Nothing persisted | No action needed |
| During transaction | Rolled back on crash | ACID rollback |
| After commit | Success is visible to retries | No side effects repeated |
| Retry after failure | Detected via idempotency key | Ensures exactly-once effect |
## β
How to Use the Ledger for Auditing
1. Transaction Verification
- Every txn_id must result in a net-zero effect.
- For every DEBIT entry, there should be a corresponding CREDIT entry of the same amount.
- You can run integrity checks like:
SELECT txn_id, SUM(CASE type WHEN βDEBITβ THEN -amount ELSE amount END) as net FROM ledger GROUP BY txn_id HAVING net != 0;
2. User History / Statement Generation
- The ledger is an append-only journal.
- You can reconstruct a userβs balance by replaying entries over time.
SELECT * FROM ledger WHERE user_id = βu123β ORDER BY created_at;
3. Balance Verification (Snapshot Check)
- Each ledger entry can store a balance_snapshot at the moment of mutation.
- Periodically, you can validate:
SELECT MAX(balance_snapshot) FROM ledger WHERE user_id = βu123β;
vs.
SELECT balance FROM wallets WHERE user_id = βu123β; ```
π‘οΈ In summary:
Use Case | Ledger Role |
---|---|
Recover from crash | Replay entries to rebuild balances |
Detect inconsistencies | Compare ledger-derived balance vs. wallet table |
Prove transactions | Immutable record of all changes |
Generate statements | Use ordered ledger events |
Ensure correctness | Check that DEBIT + CREDIT are always paired |
Requirement | How Itβs Handled |
---|---|
Idempotency | idempotency_keys table with unique requestId ensures safe retries. |
Recoverability | All state changes are durable and atomic. PENDING status helps detect failures. |
Robustness | Pessimistic locking (FOR UPDATE ) + ACID transactions avoid race conditions. |
Signal Type | Examples |
---|---|
Logs | Request/response traces, txn state transitions |
Metrics | wallet_lock_timeout_total , payments_success_total , duplicate_request_total |
Traces | End-to-end spans from API β Wallet β Ledger β Commit |
Dashboards | Wallet latency, txn throughput, balance errors |
Alerts | Long lock waits, transaction failures, inconsistent ledger balance |
Bottleneck | Description |
---|---|
Pessimistic locking | SELECT ... FOR UPDATE serializes access β can lead to contention on hot wallets |
Single DB transaction | All logic in one DB β risks long transactions & lock contention under load |
Idempotency table growth | Needs periodic pruning or TTL management |
Retry behavior | Poor retry strategies can cause thundering herd effects |
Use logical sharding by user_id to reduce lock contention.
Add timeouts and monitoring on long-lived locks.
Consider outbox pattern or saga pattern for decoupling ledger, notifications.
Consider event-driven PSP integration for multi-provider support.
tags: system-design