flows/status-matrix/07-edge-cases.md

07 · Edge Cases & Recovery

Race, stuck-state, and timeout scenarios, along with their recovery paths. Ties directly to Flows 7–16 in ../payment-flow.md.


1. Race conditions

1a. PaymentAuthorized listener vs concurrent admin cancel

Time     Actor         Action
─────    ──────        ─────────────────────────────────
T0       Customer      Completes the card form on Bambora
T1       Bambora       POST webhook /webhooks/payments/bambora/:tenantId
T2       Admin UI      STAFF clicks "Cancel" on a PENDING booking
T3       Webhook       ProcessWebhookInboxService → Payment.authorize → PaymentAuthorized event → outbox
T4       API           BookingService.updateStatus(CANCELLED) → tx commit → BookingCancelled event
T5       OutboxPub     deliver PaymentAuthorized → listener
T6       Listener      fetch booking → status = CANCELLED → skip (idempotent check)

Winner: admin (the cancel commits before the listener runs).
Handling: the listener skips because status !== PENDING.
The Payment is already AUTHORIZED, so onBookingCancelled must VOID the hold. The outcome depends on timing:

  • If the BookingCancelled event reaches PaymentIntegrationService.onBookingCancelled before PaymentAuthorized is processed (the order depends on the outbox) → Payment is still INITIATED → decideCancellationRefund(INITIATED) → VOID (no transactionId yet) → possibly a no-op if the Payment has no providerRef yet.
  • If PaymentAuthorized is processed first → Payment is AUTHORIZED → onBookingCancelled(AUTHORIZED) → VOID command → hold released.

Gap to verify: does the outbox guarantee ordering within a single aggregate? DomainEventOutbox currently has occurredAt, but the publisher polls for published_at IS NULL and sorts by id (uuidV7, monotonic). Events for the same booking are therefore ordered. PaymentAuthorized and BookingCancelled, however, belong to two different aggregates, so the race still exists. See P2-11.
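
The listener-side guard described at T6 can be sketched as below. This is a minimal illustration, not the actual listener: BookingSnapshot and onPaymentAuthorized are hypothetical names, and the status list only contains values mentioned in this document.

```typescript
// Hypothetical shapes: the real booking entity and listener signature may differ.
type BookingStatus = "PENDING" | "CONFIRMED" | "CANCELLED" | "NO_SHOW";

interface BookingSnapshot {
  id: string;
  status: BookingStatus;
}

type ListenerOutcome = "CONFIRMED" | "SKIPPED";

// Runs after the listener has re-fetched the booking (T6 in the timeline).
// Only a PENDING booking is confirmed; any other status means another actor
// already moved it, so the event is dropped without raising an error.
function onPaymentAuthorized(booking: BookingSnapshot): ListenerOutcome {
  if (booking.status !== "PENDING") {
    return "SKIPPED";
  }
  booking.status = "CONFIRMED";
  return "CONFIRMED";
}
```

Because the guard reads the current status rather than trusting the event, a redelivered PaymentAuthorized is also skipped. It does not release the hold; that remains the job of onBookingCancelled.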

1b. Webhook arrives before createSession returns (Flow 9)

T0  API    provider.createSession() → sent
T1  Bambora processes + webhook POST → webhook-inbox row created
T2  WebhookInboxProcessor → findByProviderTransactionId → NOT FOUND (Payment not saved yet)
T3  Processor marks webhook 'pending_payment_record', retries in 5s
T4  API provider.createSession() returns → Payment.authorize + save
T5  Next tick: processor finds Payment → applies event

Strategy: retry + backoff, 5 min timeout.
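
A rough sketch of that retry schedule follows. The 5s first delay and the 5 min cut-off come from the timeline above; the doubling factor, the 60s cap, and the names nextRetry and RetryDecision are assumptions for illustration.

```typescript
const BASE_DELAY_MS = 5000;             // first retry after 5s, as in T3 above
const MAX_DELAY_MS = 60000;             // assumed cap on a single wait
const GIVE_UP_AFTER_MS = 5 * 60 * 1000; // 5 min timeout from the strategy line

type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "give_up" };

// attempt is 0 for the first retry; elapsedMs is the time since the webhook
// was first stored in the inbox.
function nextRetry(attempt: number, elapsedMs: number): RetryDecision {
  if (elapsedMs >= GIVE_UP_AFTER_MS) {
    return { kind: "give_up" };
  }
  // Exponential backoff (assumed): 5s, 10s, 20s, 40s, then capped.
  const delayMs = Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
  return { kind: "retry", delayMs: delayMs };
}
```

What happens on give_up (dead-letter the inbox row, alert, or hand over to reconciliation) is not specified here and is left open.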

1c. Double-click "Confirm" button (FE)

  • FE disables the button on click (the SubmitButton component handles isLoading).
  • API: race if two requests arrive at the same time:
    • First → updates PENDING → CONFIRMED
    • Second → fetch → status = CONFIRMED → isValidTransition(CONFIRMED, CONFIRMED) = false → throws 422
  • Idempotent by error: FE shows the toast "Already confirmed".
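
The "idempotent by error" behaviour can be illustrated with a small transition table. Only PENDING → CONFIRMED and the rejection of CONFIRMED → CONFIRMED are taken from the text; the remaining rows, ConfirmResult, and confirmBooking are assumptions.

```typescript
type BookingStatus = "PENDING" | "CONFIRMED" | "CANCELLED" | "NO_SHOW";

// Assumed transition table; rows other than PENDING -> CONFIRMED are guesses.
const ALLOWED_TRANSITIONS: Record<BookingStatus, BookingStatus[]> = {
  PENDING: ["CONFIRMED", "CANCELLED"],
  CONFIRMED: ["CANCELLED", "NO_SHOW"],
  CANCELLED: [],
  NO_SHOW: [],
};

function isValidTransition(from: BookingStatus, to: BookingStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

type ConfirmResult =
  | { ok: true; status: BookingStatus }
  | { ok: false; httpStatus: 422; code: "INVALID_STATUS_TRANSITION" };

// The second of two back-to-back confirm requests sees CONFIRMED and is
// rejected with 422 instead of silently re-applying the change.
function confirmBooking(current: BookingStatus): ConfirmResult {
  if (!isValidTransition(current, "CONFIRMED")) {
    return { ok: false, httpStatus: 422, code: "INVALID_STATUS_TRANSITION" };
  }
  return { ok: true, status: "CONFIRMED" };
}
```

This only holds if the second request reads the status after the first one commits; two truly simultaneous reads would both see PENDING unless the update itself is conditional on the current status.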

1d. PaymentFailed followed by admin manual confirm

T0  Payment FAILED event → outbox
T1  Admin sees the PENDING booking, clicks Confirm
T2  updateStatus(CONFIRMED) succeeds (no deposit check) → booking CONFIRMED
T3  Outbox publisher → PaymentFailed → OnPaymentSettledNegativeListener
T4  Listener fetch → status = CONFIRMED → skip

→ Booking CONFIRMED without deposit. Gap P1-5 + P0-2 combined.


2. Stuck states

2a. Booking PENDING + Payment INITIATED forever

Customer clicks book, lands on Bambora, then closes the tab before entering card details.

  • Payment INITIATED, expiresAt = createdAt + 7d (default hold window, see payment-flow.md Flow 11).
  • Booking PENDING.
  • Cron payment-expiry:sweep runs every 15 minutes → markExpired once expiresAt has passed.
  • PaymentExpired event → OnPaymentSettledNegativeListener → booking CANCELLED.

→ Recovery via cron already exists. OK. [x]
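
One tick of the sweep might look roughly like this. PaymentRow and sweepExpired are hypothetical names, timestamps are plain epoch milliseconds, and treating INITIATED and AUTHORIZED as the "still open" statuses is an assumption.

```typescript
const HOLD_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7d default hold window

type PaymentStatus = "INITIATED" | "AUTHORIZED" | "CAPTURED" | "EXPIRED";

interface PaymentRow {
  id: string;
  status: PaymentStatus;
  createdAtMs: number;
}

function expiresAtMs(p: PaymentRow): number {
  return p.createdAtMs + HOLD_WINDOW_MS;
}

// Expires every still-open payment whose hold window has passed and returns
// the ids that were marked, so the caller can emit PaymentExpired for each.
function sweepExpired(payments: PaymentRow[], nowMs: number): string[] {
  const expiredIds: string[] = [];
  payments.forEach((p) => {
    const open = p.status === "INITIATED" || p.status === "AUTHORIZED";
    if (open && nowMs >= expiresAtMs(p)) {
      p.status = "EXPIRED";
      expiredIds.push(p.id);
    }
  });
  return expiredIds;
}
```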

2b. Booking CONFIRMED + Payment AUTHORIZED held longer than 7 days

Booking startTime > now + 7d → the Bambora auth hold lapses → markExpired runs before the customer arrives.

  • Cron markExpired → PaymentExpired event.
  • Negative listener checks booking.status = CONFIRMED → skip.
  • Booking is CONFIRMED but Payment is EXPIRED. The salon believes the customer has paid a deposit, but the hold is actually gone.

Gap P1-11: when PaymentExpired arrives for a CONFIRMED booking, some action is needed:

  • Option A: Notify admin "Deposit hold expired, ask the customer to re-authorize"
  • Option B: Auto cancel + re-book request
  • Option C: Enforce maxBookingDaysInAdvance ≤ 7 when depositEnabled (section 20 of payment-flow already shows a warning in the settings form, but it is only a warning and does not block)
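
If Option C were chosen, the settings validation could turn the existing warning into a blocking error. A minimal sketch, assuming a BookingSettings shape and a validateBookingSettings helper that do not necessarily exist under these names:

```typescript
const HOLD_WINDOW_DAYS = 7; // matches the 7d auth hold window above

// Hypothetical settings shape; only the two field names come from the text.
interface BookingSettings {
  depositEnabled: boolean;
  maxBookingDaysInAdvance: number;
}

// Returns a list of blocking errors; an empty list means the settings can be saved.
function validateBookingSettings(s: BookingSettings): string[] {
  const errors: string[] = [];
  if (s.depositEnabled && s.maxBookingDaysInAdvance > HOLD_WINDOW_DAYS) {
    errors.push(
      "maxBookingDaysInAdvance must be <= " + HOLD_WINDOW_DAYS + " while depositEnabled is on"
    );
  }
  return errors;
}
```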

2c. Loyalty RESERVED forever

  • Booking reached a terminal state (CANCELLED / NO_SHOW) before L4 shipped.
  • LoyaltyRedemption.status = RESERVED is never updated.
  • Customer lists their redemptions → sees RESERVED for a booking that is already CANCELLED (confusing).

→ Gap P1-6: L4 listener + migration to clean up orphaned data.

2d. Outbox dead letter

  • OutboxPublisher retries 10 times → abandons the event.
  • The event is dead; no business consequence is triggered.
  • An admin ops UI is needed to list dead letters and manually retrigger them.

→ Gap P2-12: Admin ops UI cho outbox dead letter.
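
The bookkeeping such a UI would sit on can be sketched as follows. The 10-attempt limit is from the text; the OutboxRow fields attempts and deadLetter, and the three helpers, are assumptions rather than the current schema.

```typescript
const MAX_ATTEMPTS = 10; // OutboxPublisher gives up after 10 tries

// Hypothetical row shape; only publishedAt mirrors a column named in this doc.
interface OutboxRow {
  id: string;
  attempts: number;
  publishedAt: number | null;
  deadLetter: boolean;
}

// Called by the publisher after a failed delivery.
function recordFailure(row: OutboxRow): void {
  row.attempts += 1;
  if (row.attempts >= MAX_ATTEMPTS) {
    row.deadLetter = true;
  }
}

// What the ops UI would list.
function listDeadLetters(rows: OutboxRow[]): OutboxRow[] {
  return rows.filter((r) => r.deadLetter && r.publishedAt === null);
}

// Manual retrigger: put the row back into the publisher's normal poll.
function retrigger(row: OutboxRow): void {
  row.attempts = 0;
  row.deadLetter = false;
}
```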


3. Timeout scenarios

3a. Prolonged provider outage

BamboraAdapter retries 3× with 500ms/1s/2s backoff → throws ProviderUnavailableError.

  • Payment markFailed('PROVIDER_UNAVAILABLE') → event.
  • Negative listener → booking CANCELLED.

Problem: a 30s provider outage should not destroy a booking. Gap P1-4:

  • Distinguish transient (5xx, timeout) from permanent (400 invalid card) failures
  • Transient → Payment moves to a dedicated RETRY_PENDING status → background retry
  • Permanent → FAILED as today
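
The proposed split can be sketched as a small classifier. ProviderFailure, classifyFailure, and nextPaymentStatus are illustrative names; treating every non-5xx HTTP response as permanent is a simplifying assumption.

```typescript
// Hypothetical failure shape reported by the provider adapter.
type ProviderFailure =
  | { kind: "timeout" }
  | { kind: "http"; statusCode: number };

type FailureClass = "TRANSIENT" | "PERMANENT";

// Timeouts and 5xx responses are worth retrying; anything else
// (e.g. 400 invalid card) will not succeed on retry.
function classifyFailure(f: ProviderFailure): FailureClass {
  if (f.kind === "timeout") {
    return "TRANSIENT";
  }
  return f.statusCode >= 500 ? "TRANSIENT" : "PERMANENT";
}

// Maps the class onto the Payment status proposed in P1-4.
function nextPaymentStatus(f: ProviderFailure): "RETRY_PENDING" | "FAILED" {
  return classifyFailure(f) === "TRANSIENT" ? "RETRY_PENDING" : "FAILED";
}
```

With this split, only FAILED would reach the negative listener, so a short outage no longer cancels the booking.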

3b. Webhook from Bambora delayed by 2h

  • Customer pays, waits on FE polling ... FE fallback times out.
  • Booking stays PENDING, Payment INITIATED.
  • Webhook finally arrives → authorize → listener confirms booking.
  • Customer has already closed the browser and receives the confirmation email 2h later.

→ [~] UX is acceptable given the email notification. But if the webhook never arrives (lost) → stuck in INITIATED until the cron runs (2a).

3c. Reconciliation job (Flow 13)

Daily cron fetches the provider's transaction list and compares it with the DB. It detects:

  • Provider CAPTURED, local INITIATED → apply event
  • Provider REFUNDED, local CAPTURED → apply refund

→ Gap: whether the reconciliation job is actually implemented has not been verified. P2-13.
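
The comparison step could look roughly like this. The two detected cases are the ones listed above; the PaymentRecord shape, the MANUAL_REVIEW fallback, and skipping provider records with no local row are assumptions.

```typescript
type PayStatus = "INITIATED" | "AUTHORIZED" | "CAPTURED" | "REFUNDED";

interface PaymentRecord {
  providerRef: string;
  status: PayStatus;
}

type ReconcileAction = "APPLY_CAPTURE" | "APPLY_REFUND" | "MANUAL_REVIEW";

interface Mismatch {
  providerRef: string;
  local: PayStatus;
  provider: PayStatus;
  action: ReconcileAction;
}

function reconcile(local: PaymentRecord[], provider: PaymentRecord[]): Mismatch[] {
  // Index local rows by providerRef for constant-time lookup.
  const localByRef: { [ref: string]: PayStatus | undefined } = {};
  local.forEach((p) => {
    localByRef[p.providerRef] = p.status;
  });

  const mismatches: Mismatch[] = [];
  provider.forEach((remote) => {
    const localStatus = localByRef[remote.providerRef];
    // Unknown locally (assumed out of scope here) or already in sync.
    if (localStatus === undefined || localStatus === remote.status) {
      return;
    }
    let action: ReconcileAction = "MANUAL_REVIEW";
    if (remote.status === "CAPTURED" && localStatus === "INITIATED") {
      action = "APPLY_CAPTURE";
    } else if (remote.status === "REFUNDED" && localStatus === "CAPTURED") {
      action = "APPLY_REFUND";
    }
    mismatches.push({
      providerRef: remote.providerRef,
      local: localStatus,
      provider: remote.status,
      action: action,
    });
  });
  return mismatches;
}
```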


4. Tenant-level changes mid-flight

4a. OWNER turns off depositEnabled mid-flight

State before the toggle: many PENDING bookings with Payment INITIATED waiting.

  • Toggle off → depositEnabled=false.
  • Old bookings still have Payment INITIATED; nothing cancels the Payment.
  • Cron will expire it → negative listener cancels the booking.
  • → The bookings are all lost without the customer knowing.

Expected: OWNER toggles off → a dialog warns that N bookings awaiting deposit will either be cancelled or skip the deposit.

→ Gap P2-14.

4b. OWNER changes cancellationHours from 24 to 12

  • Customer booked before the change, expecting to cancel under the 24h rule.
  • After the change, the customer tries to cancel at the 18h mark → blocked by the 12h rule.
  • Expected: grandfather existing bookings with their original window.

→ Gap P2-15 (or accept the trade-off).

4c. OWNER changes Bambora credentials

  • Covered by Flow 12 in payment-flow.md: rotate credentials + health check.
  • Existing payments remain OK (providerRef already exists; capture/refund use the new credentials).

5. Data integrity edge cases

5a. payableTotal = 0 (full loyalty discount)

  • depositAmount = payableTotal × depositPct = 0.
  • onBookingCreated: depositAmount <= 0 → return → no Payment is created.
  • Booking PENDING (autoConfirm=false) → stuck, with no event to flip it to CONFIRMED.
  • Or autoConfirm=true → CONFIRMED immediately, OK.

→ Gap P2-9: when payableTotal = 0 and autoConfirm=false → the booking should auto-confirm because there is no payment barrier. Alternatively, let the admin click confirm (the P0-2 guard should exempt this case).
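
A sketch of the auto-confirm variant of P2-9. It assumes amounts are integer minor units and depositPct is a fraction between 0 and 1, neither of which is confirmed here; decideOnBookingCreated and CreationDecision are hypothetical names.

```typescript
// Assumed units: payableTotal in minor units (e.g. cents), depositPct in [0, 1].
function computeDeposit(payableTotal: number, depositPct: number): number {
  return Math.round(payableTotal * depositPct);
}

interface CreationDecision {
  createPayment: boolean;
  bookingStatus: "PENDING" | "CONFIRMED";
}

function decideOnBookingCreated(payableTotal: number, depositPct: number): CreationDecision {
  const deposit = computeDeposit(payableTotal, depositPct);
  if (deposit > 0) {
    // Assumed: a booking with a deposit waits in PENDING for PaymentAuthorized.
    return { createPayment: true, bookingStatus: "PENDING" };
  }
  // P2-9 proposal: no deposit means no payment barrier, so confirm right away
  // regardless of the tenant's autoConfirm setting.
  return { createPayment: false, bookingStatus: "CONFIRMED" };
}
```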

5b. Booking items removed after Payment captured

  • Admin edits the booking (removes a service) → totalAmount decreases.
  • Payment was already CAPTURED at the higher amount.
  • → Negative outstanding balance? Refund the difference manually?

→ Gap P2-16: updating a booking after capture.

5c. Resource deleted while a booking is active

  • Booking has a resourceId pointing to a deleted resource.
  • findById join → null resource.
  • UI breaks.

→ Can resources be soft-deleted? Check the schema. [~] Verify.


6. Multi-concurrency scenarios

6a. Two admins edit the same booking concurrently

  • No pessimistic lock.
  • Last write wins.
  • The updatedAt timestamp helps detect conflicts → FE can show "booking has been updated, reload".

→ Gap P2-17: optimistic locking (nice-to-have).
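
Optimistic locking on updatedAt can be sketched in memory as below. BookingRow, updateIfUnchanged, and the STALE_BOOKING code are hypothetical; in a real database the comparison and the write would need to be one conditional UPDATE, not two steps.

```typescript
interface BookingRow {
  id: string;
  notes: string;
  updatedAtMs: number;
}

type UpdateResult =
  | { ok: true; row: BookingRow }
  | { ok: false; code: "STALE_BOOKING"; currentUpdatedAtMs: number };

// The caller sends back the updatedAt it originally loaded. If the stored row
// has moved on since then, the write is rejected instead of overwriting.
function updateIfUnchanged(
  stored: BookingRow,
  expectedUpdatedAtMs: number,
  notes: string,
  nowMs: number
): UpdateResult {
  if (stored.updatedAtMs !== expectedUpdatedAtMs) {
    return { ok: false, code: "STALE_BOOKING", currentUpdatedAtMs: stored.updatedAtMs };
  }
  stored.notes = notes;
  stored.updatedAtMs = nowMs;
  return { ok: true, row: stored };
}
```

On STALE_BOOKING the FE can show the "booking has been updated, reload" message mentioned above.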

6b. Customer booking + staff walk-in for the same slot

  • checkConflict runs on the same resource × time.
  • The second call fails with RESOURCE_CONFLICT.
  • allowDoubleBooking=true skips the conflict check, which is fine for tenants that allow it.

→ [x] OK.
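
For reference, the resource × time check reduces to an interval-overlap test. A minimal sketch, assuming half-open [start, end) slots and a hypothetical Slot shape; the real checkConflict signature may differ.

```typescript
interface Slot {
  resourceId: string;
  startMs: number;
  endMs: number; // exclusive: back-to-back slots do not overlap
}

type ConflictResult = "OK" | "RESOURCE_CONFLICT";

function overlaps(a: Slot, b: Slot): boolean {
  return a.resourceId === b.resourceId && a.startMs < b.endMs && b.startMs < a.endMs;
}

function checkConflict(
  existing: Slot[],
  candidate: Slot,
  allowDoubleBooking: boolean
): ConflictResult {
  if (allowDoubleBooking) {
    return "OK"; // tenant opted out of the check
  }
  return existing.some((s) => overlaps(s, candidate)) ? "RESOURCE_CONFLICT" : "OK";
}
```

An in-memory check like this only protects against the second call if both calls are serialized; two requests that check at the same moment still need a database-level guard.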


7. Recovery paths (for production incidents)

7a. Webhooks lost for many payments

  • Reconciliation job (Flow 13) will sync from the provider.
  • Manual: admin goes to /admin/payments/webhooks → retries dead letters.

→ [ ] Admin ops UI does not exist yet (P2-12).

7b. Outbox table bloat

  • DomainEventOutbox has no retention policy.
  • Hundreds of thousands of published rows → disk bloat.

→ Gap P2-18: retention job that deletes rows with published_at < now() - 30d.
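
The selection rule for such a job, sketched in memory. PublishedRow and purgeableIds are hypothetical names; a real job would express the same predicate as a batched DELETE.

```typescript
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // 30d retention

interface PublishedRow {
  id: string;
  publishedAtMs: number | null; // null = not yet published, must be kept
}

// Returns ids of rows that were published more than 30 days before nowMs.
function purgeableIds(rows: PublishedRow[], nowMs: number): string[] {
  const cutoffMs = nowMs - RETENTION_MS;
  return rows
    .filter((r) => r.publishedAtMs !== null && r.publishedAtMs < cutoffMs)
    .map((r) => r.id);
}
```

Unpublished rows, including dead letters, are never selected, so the job cannot drop an event that still needs delivery.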

7c. Listener bug publishes duplicates

  • E.g. the capture command throws and the listener re-runs → double capture?
  • The idempotency key ${booking.idempotencyKey}-cap prevents the provider from double-charging.
  • Local Payment aggregate: dual-write with the outbox; a DB unique constraint prevents duplicate rows.

→ [x] OK.
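
How the -cap key protects against a re-run can be shown with a toy in-memory provider. FakeProvider stands in for the real provider and is purely illustrative; only the key format comes from the text.

```typescript
// Derives the capture idempotency key in the format quoted above.
function captureKey(bookingIdempotencyKey: string): string {
  return `${bookingIdempotencyKey}-cap`;
}

// Toy provider: remembers the result of each idempotency key and replays it
// instead of charging again. A stand-in for the real provider's behaviour.
class FakeProvider {
  chargeCount = 0;
  private results: { [key: string]: string | undefined } = {};

  capture(idempotencyKey: string, amount: number): string {
    const previous = this.results[idempotencyKey];
    if (previous !== undefined) {
      return previous; // replayed request, no new charge
    }
    this.chargeCount += 1;
    const transactionId = "tx-" + this.chargeCount + "-" + amount;
    this.results[idempotencyKey] = transactionId;
    return transactionId;
  }
}
```

A listener that re-runs after a thrown capture command sends the same key, so the second call returns the first transaction instead of charging again.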

7d. Payment aggregate corrupted (provider says CAPTURED, local INITIATED)

  • Reconciliation detects the mismatch → applies the fetchStatus event.
  • Admin could fix it manually via PATCH /admin/payments/:id/reconcile (does not exist yet; add if needed).

→ Gap P2-19.


8. Checklist edge cases

Race + ordering

  • Authorized listener idempotent
  • Authorized vs cancel race swallows INVALID_STATUS_TRANSITION
  • [~] Cross-aggregate outbox order (P2-11)
  • Webhook-before-session Flow 9 retry
  • Double-click FE

Stuck recovery

  • INITIATED expire cron → cancel PENDING
  • [!] AUTHORIZED expired with booking CONFIRMED (>7d lead) (P1-11)
  • Loyalty RESERVED orphan migration (L4 + data cleanup)
  • Outbox dead-letter UI (P2-12)

Timeout

  • [!] Provider outage destroy booking — P1-4
  • [~] Webhook delayed — tolerate with cron fallback
  • Reconciliation cron verify shipped (P2-13)

Tenant changes

  • Toggle depositEnabled mid-flight (P2-14)
  • Change cancellationHours grandfather (P2-15 / accept)
  • Rotate credentials OK

Data integrity

  • [!] payableTotal = 0 stuck — P2-9
  • Update booking after capture (P2-16)
  • Soft-delete resource (P2-?)

Multi-concurrency

  • Optimistic locking (P2-17)
  • Resource conflict enforced

Ops tooling

  • Admin UI webhook retry (P2-12)
  • Outbox retention (P2-18)
  • Manual reconcile payment (P2-19)

→ Consolidated: gaps-and-plan.md.