07 · Edge Cases & Recovery
Race, stuck-state, and timeout scenarios, plus their recovery paths. Maps directly to Flows 7–16 in ../payment-flow.md.
1. Race conditions
1a. PaymentAuthorized listener vs admin cancel concurrent
Time   Actor      Action
─────  ─────────  ─────────────────────────────────
T0     Customer   Completes the card form on Bambora
T1     Bambora    GET webhook /webhooks/payments/bambora/:tenantId
T2     Admin UI   STAFF clicks "Cancel" on a PENDING booking
T3     Webhook    ProcessWebhookInboxService → Payment.authorize → PaymentAuthorized event → outbox
T4     API        BookingService.updateStatus(CANCELLED) → tx commit → BookingCancelled event
T5     OutboxPub  deliver PaymentAuthorized → listener
T6     Listener   fetch booking → status = CANCELLED → skip (idempotent check)
Winner: admin (the cancel commits before the listener runs).
Handling: the listener skips because status !== PENDING.
The Payment is already AUTHORIZED, so onBookingCancelled must VOID the hold. The outcome depends on timing:
- If the BookingCancelled event reaches PaymentIntegrationService.onBookingCancelled before PaymentAuthorized (the order depends on the outbox) → Payment is still INITIATED → decideCancellationRefund(INITIATED) → VOID (no transactionId yet) → possibly a no-op if the Payment has no providerRef yet.
- If PaymentAuthorized is processed first → Payment AUTHORIZED → onBookingCancelled(AUTHORIZED) → VOID command → release hold.
→ Gap to verify: is outbox ordering guaranteed within a single aggregate? DomainEventOutbox has occurredAt, but the publisher polls rows where published_at IS NULL and sorts by id (UUIDv7, monotonic). Events for the same booking are therefore OK. PaymentAuthorized and BookingCancelled belong to two different aggregates, however, so the race still exists. See P2-11.
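The skip rule at T6 can be sketched as a pure decision function. This is a minimal illustration, not the actual listener: the type names and decideOnPaymentAuthorized are hypothetical, and the status list is assumed from the statuses mentioned in this document.

```typescript
// Hypothetical types; the real booking model is not shown in this document.
type BookingStatus = "PENDING" | "CONFIRMED" | "CANCELLED" | "NO_SHOW";
type ListenerAction = "confirm" | "skip";

// Decide what the PaymentAuthorized listener does after re-fetching the booking.
// Only a PENDING booking is confirmed; any other status is skipped, so a
// redelivered or late event cannot override an admin cancel.
function decideOnPaymentAuthorized(current: BookingStatus): ListenerAction {
  return current === "PENDING" ? "confirm" : "skip";
}

// The T6 case: the admin cancel committed before the event was delivered.
const actionAfterCancel = decideOnPaymentAuthorized("CANCELLED");
```

Skipping only protects the booking state. Releasing the hold still depends on onBookingCancelled observing the Payment as AUTHORIZED, which is the ordering gap tracked as P2-11.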
1b. Webhook arrives before createSession returns (Flow 9)
T0 API provider.createSession() → sent
T1 Bambora processes + webhook POST → webhook-inbox row created
T2 WebhookInboxProcessor → findByProviderTransactionId → NOT FOUND (Payment not saved yet)
T3 Processor marks webhook 'pending_payment_record', retries in 5s
T4 API provider.createSession() returns → Payment.authorize + save
T5 Next tick processor → find Payment → apply event
Strategy: retry + backoff, 5 min timeout.
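A minimal sketch of that retry decision follows. The 5s first retry and the 5-minute timeout come from the text above; the exponential doubling, the 60s cap, and all names (decideInboxRetry, InboxDecision) are assumptions for illustration only.

```typescript
const RETRY_BASE_MS = 5_000;         // first retry after 5s (from the timeline)
const GIVE_UP_AFTER_MS = 5 * 60_000; // 5 min timeout (from the strategy line)
const MAX_DELAY_MS = 60_000;         // assumption: cap on a single backoff delay

type InboxDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "give_up" };

// attempt: number of retries already performed (0 for the first retry).
// elapsedMs: time since the webhook row was first seen without a Payment.
function decideInboxRetry(attempt: number, elapsedMs: number): InboxDecision {
  if (elapsedMs >= GIVE_UP_AFTER_MS) {
    return { kind: "give_up" };
  }
  // Assumed exponential backoff: 5s, 10s, 20s, 40s, then capped at 60s.
  const delayMs = Math.min(RETRY_BASE_MS * 2 ** attempt, MAX_DELAY_MS);
  return { kind: "retry", delayMs };
}
```

What happens after give_up (dead letter, alert, reconciliation) is not specified in this section.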
1c. Double-click "Confirm" button (FE)
- FE disables the button on click (the SubmitButton component handles isLoading).
- API: a race remains if two requests arrive at the same time:
  - First → updates PENDING → CONFIRMED
  - Second → fetches → status = CONFIRMED → isValidTransition(CONFIRMED, CONFIRMED) = false → throws 422
- Idempotent by error: the FE shows an "Already confirmed" toast.
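The second request's rejection can be sketched with a small transition table. Only two facts are taken from the text above: PENDING → CONFIRMED is allowed and CONFIRMED → CONFIRMED is rejected with a 422. The rest of the table, the error class, and confirmBooking are assumptions.

```typescript
type BookingStatus = "PENDING" | "CONFIRMED" | "CANCELLED" | "NO_SHOW";

// Assumed transition table; the real one may differ.
const ALLOWED_TRANSITIONS: Record<BookingStatus, readonly BookingStatus[]> = {
  PENDING: ["CONFIRMED", "CANCELLED"],
  CONFIRMED: ["CANCELLED", "NO_SHOW"],
  CANCELLED: [],
  NO_SHOW: [],
};

function isValidTransition(from: BookingStatus, to: BookingStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Hypothetical error type carrying the HTTP status the API would return.
class InvalidStatusTransitionError extends Error {
  readonly httpStatus = 422;
  constructor(from: BookingStatus, to: BookingStatus) {
    super(`INVALID_STATUS_TRANSITION: ${from} -> ${to}`);
  }
}

// Returns the new status, or throws if the transition is not allowed.
function confirmBooking(current: BookingStatus): BookingStatus {
  if (!isValidTransition(current, "CONFIRMED")) {
    throw new InvalidStatusTransitionError(current, "CONFIRMED");
  }
  return "CONFIRMED";
}
```

This only covers the case where the second request reads the booking after the first has committed. If both requests can read PENDING at the same time, the check alone does not serialize them.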
1d. PaymentFailed followed by admin manual confirm
T0 Payment FAILED event → outbox
T1 Admin sees PENDING booking, clicks Confirm
T2 updateStatus(CONFIRMED) succeeds (no deposit check) → booking CONFIRMED
T3 Outbox publisher → PaymentFailed → OnPaymentSettledNegativeListener
T4 Listener fetch → status = CONFIRMED → skip
→ Booking CONFIRMED without deposit. Gap P1-5 + P0-2 combined.
2. Stuck states
2a. Booking PENDING + Payment INITIATED forever
The customer starts a booking, lands on Bambora, and closes the tab before entering card details.
- Payment INITIATED, expiresAt = createdAt + 7d (default hold window, see payment-flow.md Flow 11).
- Booking PENDING.
- Cron payment-expiry:sweep runs every 15 minutes → markExpired after expiresAt.
- → PaymentExpired event → OnPaymentSettledNegativeListener → booking CANCELLED.
→ Recovery already exists via the cron. OK. [x]
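One sweep tick can be sketched as a selection over payment rows. The 7-day window and the markExpired step come from the text above. The PaymentRow shape, the function names, and the choice to sweep both INITIATED and AUTHORIZED payments (the latter based on case 2b) are assumptions.

```typescript
const HOLD_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7d default hold window

type PaymentStatus = "INITIATED" | "AUTHORIZED" | "CAPTURED" | "EXPIRED" | "FAILED";

// Hypothetical row shape; the real Payment aggregate is not shown here.
interface PaymentRow {
  id: string;
  status: PaymentStatus;
  createdAt: Date;
}

function expiresAt(payment: PaymentRow): Date {
  return new Date(payment.createdAt.getTime() + HOLD_WINDOW_MS);
}

// Returns the ids a sweep tick would pass to markExpired.
function selectExpired(payments: PaymentRow[], now: Date): string[] {
  return payments
    .filter((p) => p.status === "INITIATED" || p.status === "AUTHORIZED")
    .filter((p) => expiresAt(p).getTime() < now.getTime())
    .map((p) => p.id);
}
```

Because the cron runs every 15 minutes, a payment can stay in its old state for up to 15 minutes past expiresAt before the sweep picks it up.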
2b. Booking CONFIRMED + Payment AUTHORIZED held longer than 7 days
Booking startTime > now + 7d → Bambora auth hold lapses → markExpired before the customer arrives.
- Cron markExpired → PaymentExpired event.
- Negative listener checks booking.status = CONFIRMED → skip.
- → Booking CONFIRMED but Payment EXPIRED. The salon believes the customer has paid a deposit, but the hold is actually gone.
→ Gap P1-11: when PaymentExpired arrives for a CONFIRMED booking, an action is needed:
- Option A: notify the admin: "Deposit hold expired, ask the customer to re-authorize"
- Option B: auto-cancel + re-book request
- Option C: enforce maxBookingDaysInAdvance ≤ 7 when depositEnabled (section 20 of payment-flow already shows a warning on the settings form, but it is only a warning and does not block)
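Option C could be enforced as a blocking validation on the settings form instead of a warning. The sketch below is one possible shape: the two setting names come from the text, while the interface, the function name, and the error message are hypothetical.

```typescript
const MAX_HOLD_DAYS = 7; // auth hold window described above

// Hypothetical subset of the tenant settings.
interface TenantBookingSettings {
  depositEnabled: boolean;
  maxBookingDaysInAdvance: number;
}

// Returns a list of blocking errors; an empty list means the settings are valid.
function validateDepositSettings(settings: TenantBookingSettings): string[] {
  const errors: string[] = [];
  if (settings.depositEnabled && settings.maxBookingDaysInAdvance > MAX_HOLD_DAYS) {
    errors.push(
      `maxBookingDaysInAdvance must be <= ${MAX_HOLD_DAYS} while depositEnabled is true`,
    );
  }
  return errors;
}
```

This only prevents new bookings beyond the hold window. Bookings created before the rule is enforced would still need Option A or B.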
2c. Loyalty RESERVED forever
- Booking reached a terminal state (CANCELLED / NO_SHOW) before L4 shipped.
- LoyaltyRedemption.status = RESERVED is never updated.
- Customer queries the redemption list → sees RESERVED for a booking that is already CANCELLED (confusing).
→ Gap P1-6: L4 listener + migration to clean up orphaned data.
2d. Outbox dead letter
- OutboxPublisher retries 10 times → abandons.
- The event is dead; no business consequence is triggered.
- An admin ops UI is needed to list dead letters and re-trigger them manually.
→ Gap P2-12: admin ops UI for outbox dead letters.
3. Timeout scenarios
3a. Long provider outage
- BamboraAdapter retries 3× with 500ms/1s/2s backoff → throws ProviderUnavailableError.
- Payment markFailed('PROVIDER_UNAVAILABLE') → event.
- Negative listener → booking CANCELLED.
→ Problematic: a 30s provider outage should not destroy a booking. Gap P1-4:
- Distinguish transient (5xx, timeout) from permanent (400 invalid card) failures
- Transient → Payment moves to a separate RETRY_PENDING state → background retry
- Permanent → FAILED, as today
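The retry schedule and the proposed transient/permanent split can be sketched together. The backoff values and the ProviderUnavailableError name come from the text above. The classification by HTTP status, the httpStatus field on errors, and the helper names are assumptions, as is the reading of "3×" as one initial attempt plus three retries.

```typescript
// Thrown once all retries are exhausted.
class ProviderUnavailableError extends Error {}

type FailureKind = "transient" | "permanent";

// Assumption: undefined models a timeout or a network error without a response.
function classifyFailure(httpStatus: number | undefined): FailureKind {
  if (httpStatus === undefined || httpStatus >= 500) return "transient";
  return "permanent"; // e.g. 400 invalid card
}

// Reads an optional numeric httpStatus from an unknown error value.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "httpStatus" in err) {
    const status = (err as { httpStatus?: unknown }).httpStatus;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

const BACKOFF_MS: readonly number[] = [500, 1000, 2000];

// sleep is injected so the schedule can be exercised without real delays.
async function callWithRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await call();
    } catch (err) {
      // Permanent failures are surfaced immediately, without retrying.
      if (classifyFailure(statusOf(err)) === "permanent") throw err;
      if (retry >= BACKOFF_MS.length) {
        throw new ProviderUnavailableError("PROVIDER_UNAVAILABLE");
      }
      await sleep(BACKOFF_MS[retry]);
    }
  }
}
```

Under P1-4, the caller would map ProviderUnavailableError to RETRY_PENDING instead of markFailed, so that only permanent failures reach the negative listener.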
3b. Webhook from Bambora delayed by 2h
- Customer pays and waits while the FE polls ... the FE fallback times out.
- Booking stays PENDING, Payment INITIATED.
- The webhook finally arrives → authorize → listener confirms the booking.
- The customer has already closed the browser and receives the confirmation email 2h later.
→ [~] Acceptable UX given the email notification. If the webhook never arrives (lost), however, the Payment stays stuck in INITIATED until the cron runs (2a).
3c. Reconciliation job (Flow 13)
A daily cron fetches the list from the provider and compares it with the DB. It detects:
- Provider CAPTURED, local INITIATED → apply event
- Provider REFUNDED, local CAPTURED → apply refund
→ Gap: whether the reconciliation job is actually implemented has not been verified. P2-13.
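The comparison step can be sketched as a pure function over the two statuses. The two detected mismatches come from the list above. The action names and the needs_review fallback for any other mismatch are assumptions, since the document does not say how other combinations are handled.

```typescript
type PaymentStatus = "INITIATED" | "AUTHORIZED" | "CAPTURED" | "REFUNDED";
type ReconcileAction = "none" | "apply_capture" | "apply_refund" | "needs_review";

// Compare the provider's view with the local Payment and pick an action.
function reconcile(provider: PaymentStatus, local: PaymentStatus): ReconcileAction {
  if (provider === local) return "none";
  if (provider === "CAPTURED" && local === "INITIATED") return "apply_capture";
  if (provider === "REFUNDED" && local === "CAPTURED") return "apply_refund";
  // Assumption: anything else is flagged for a human instead of auto-fixed.
  return "needs_review";
}
```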
4. Tenant-level changes mid-flight
4a. OWNER turns off depositEnabled mid-flight
State before the toggle: many PENDING bookings are waiting with an INITIATED Payment.
- Toggle off → depositEnabled=false.
- Existing bookings still have an INITIATED Payment; nobody cancels the Payment.
- The cron will expire them → the negative listener cancels the bookings.
- → All of those bookings are lost without the customer knowing.
Expected: when the OWNER toggles off → a dialog warns that N bookings awaiting a deposit will either be cancelled or skip the deposit.
→ Gap P2-14.
4b. OWNER changes cancellationHours from 24 to 12
- The customer booked before the change and expects to cancel under the 24h setting.
- After the change, the customer tries to cancel 18h before → blocked under the 12h setting.
- Expected: grandfather existing bookings with their original window.
→ Gap P2-15 (or accept the trade-off).
4c. OWNER changes Bambora credentials
- Covered by Flow 12 in payment-flow.md: rotate credentials + health check.
- Existing payments remain OK (providerRef already exists; capture/refund use the new credentials).
5. Data integrity edge cases
5a. payableTotal = 0 (full loyalty discount)
- depositAmount = payableTotal × depositPct = 0.
- onBookingCreated: depositAmount <= 0 → return → no Payment is created.
- Booking PENDING (autoConfirm=false) → stuck; no event exists to flip it to CONFIRMED.
- Or autoConfirm=true → CONFIRMED immediately, OK.
→ Gap P2-9: when payableTotal = 0 and autoConfirm=false → the booking should auto-confirm because there is no payment barrier. Alternatively, let the admin click Confirm (the P0-2 guard should exempt this case).
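The proposed P2-9 behaviour can be sketched as follows. The deposit formula and the depositAmount <= 0 early return come from the text above. Everything else is an assumption: the input shape, depositPct expressed as a fraction between 0 and 1, and the rule that a booking with a positive deposit stays PENDING until payment.

```typescript
// Hypothetical input shape; amounts are assumed to be in minor units.
interface NewBooking {
  payableTotal: number;
  depositPct: number; // assumed fraction, e.g. 0.3 for 30%
}

interface CreationOutcome {
  createPayment: boolean;
  status: "PENDING" | "CONFIRMED";
}

// Proposed behaviour, not the current one: a zero deposit confirms the booking
// directly, even when autoConfirm is false, because nothing else will.
function outcomeOnBookingCreated(booking: NewBooking): CreationOutcome {
  const depositAmount = booking.payableTotal * booking.depositPct;
  if (depositAmount <= 0) {
    return { createPayment: false, status: "CONFIRMED" };
  }
  // Assumption: with a deposit due, the booking waits for the payment event.
  return { createPayment: true, status: "PENDING" };
}
```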
5b. Booking items removed after Payment captured
- Admin edits the booking (removes a service) → totalAmount decreases.
- Payment was already CAPTURED at the higher amount.
- → Negative outstanding balance? Refund the difference manually?
→ Gap P2-16: updating a booking after capture.
5c. Resource deleted while a booking is active
- Booking has a resourceId pointing to a deleted resource.
- findById join → null resource.
- UI broken.
→ Can resources be soft-deleted? Check the schema. [~] Verify.
6. Multi-concurrency scenarios
6a. Two admins edit the same booking concurrently
- No pessimistic lock.
- Last write wins.
- The updatedAt timestamp helps detect conflicts → the FE can show "booking was updated, please reload".
→ Gap P2-17: optimistic locking (nice-to-have).
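Optimistic locking on updatedAt can be sketched with an in-memory store standing in for a conditional update (roughly UPDATE ... WHERE id = ? AND updated_at = ?). The record shape, saveIfUnchanged, and the STALE reason are hypothetical.

```typescript
// Hypothetical record; updatedAt is epoch milliseconds.
interface BookingRecord {
  id: string;
  note: string;
  updatedAt: number;
}

type SaveResult =
  | { ok: true; record: BookingRecord }
  | { ok: false; reason: "STALE" };

// Writes only if the caller saw the latest version of the record.
function saveIfUnchanged(
  store: Map<string, BookingRecord>,
  id: string,
  expectedUpdatedAt: number,
  note: string,
  now: number,
): SaveResult {
  const current = store.get(id);
  if (current === undefined || current.updatedAt !== expectedUpdatedAt) {
    return { ok: false, reason: "STALE" };
  }
  const record: BookingRecord = { ...current, note, updatedAt: now };
  store.set(id, record);
  return { ok: true, record };
}
```

A STALE result is what the FE would turn into the reload prompt. A dedicated version counter is a common alternative to comparing timestamps, since two writes in the same millisecond would otherwise look identical.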
6b. Customer booking + staff walk-in on the same slot
- checkConflict runs on the same resource × time.
- The second call fails with RESOURCE_CONFLICT.
- allowDoubleBooking=true skips the conflict check: OK for tenants that allow it.
→ [x] OK.
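The resource × time check can be sketched as an interval overlap test. checkConflict, RESOURCE_CONFLICT, and allowDoubleBooking are named in the text above; the Slot shape and the half-open interval convention (a slot ending at 20 does not clash with one starting at 20) are assumptions.

```typescript
// Hypothetical slot shape; start and end are epoch values, end exclusive.
interface Slot {
  resourceId: string;
  start: number;
  end: number;
}

function overlaps(a: Slot, b: Slot): boolean {
  return a.resourceId === b.resourceId && a.start < b.end && b.start < a.end;
}

function checkConflict(
  existing: Slot[],
  candidate: Slot,
  allowDoubleBooking: boolean,
): "OK" | "RESOURCE_CONFLICT" {
  if (allowDoubleBooking) return "OK"; // tenant opted out of the check
  return existing.some((slot) => overlaps(slot, candidate)) ? "RESOURCE_CONFLICT" : "OK";
}
```

A check like this only rejects the second request if it runs after the first booking is visible, for example inside the same transaction or behind a database constraint.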
7. Recovery paths (for production incidents)
7a. Webhooks lost for many payments
- The reconciliation job (Flow 13) will sync from the provider.
- Manual: an admin opens /admin/payments/webhooks → retries dead letters.
→ [ ] Admin ops UI does not exist yet (P2-12).
7b. Outbox table bloat
- DomainEventOutbox has no retention policy.
- Hundreds of thousands of published rows → disk bloat.
→ Gap P2-18: retention job that deletes rows where published_at < now() - 30d.
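The selection rule for such a retention job can be sketched in a few lines. The 30-day cutoff on published_at comes from the gap note above; the row shape and function name are hypothetical, and the real job would most likely run as a single SQL DELETE.

```typescript
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // 30d

// Hypothetical row shape; publishedAt is null until the event is delivered.
interface OutboxRow {
  id: string;
  publishedAt: Date | null;
}

// Returns ids that are safe to purge. Unpublished rows are never selected,
// so pending and dead-letter events survive the cleanup.
function selectPurgeable(rows: OutboxRow[], now: Date): string[] {
  const cutoff = now.getTime() - RETENTION_MS;
  return rows
    .filter((row) => row.publishedAt !== null && row.publishedAt.getTime() < cutoff)
    .map((row) => row.id);
}
```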
7c. Listener bug publishes duplicates
- E.g. the capture command throws and the listener re-runs → double capture?
- The idempotency key ${booking.idempotencyKey}-cap prevents a provider double-charge.
- Local Payment aggregate: dual-write with the outbox; a DB unique constraint prevents duplicate rows.
→ [x] OK.
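The effect of the capture idempotency key can be illustrated with a fake provider that deduplicates on the key. The key format comes from the text above. FakeProvider and its methods are stand-ins for illustration and do not describe the Bambora API.

```typescript
// Key format taken from the text: the booking key with a "-cap" suffix.
function captureIdempotencyKey(bookingIdempotencyKey: string): string {
  return `${bookingIdempotencyKey}-cap`;
}

// In-memory stand-in for a provider that honours idempotency keys.
class FakeProvider {
  private readonly seen = new Map<string, string>();
  private charges = 0;

  // Returns the transaction id; a repeated key returns the original id
  // without charging again.
  capture(idempotencyKey: string): string {
    const prior = this.seen.get(idempotencyKey);
    if (prior !== undefined) return prior;
    this.charges += 1;
    const transactionId = `txn-${this.charges}`;
    this.seen.set(idempotencyKey, transactionId);
    return transactionId;
  }

  chargeCount(): number {
    return this.charges;
  }
}

// A listener re-run sends the same key, so the second capture is a no-op.
const provider = new FakeProvider();
const key = captureIdempotencyKey("booking-123");
const firstTxn = provider.capture(key);
const retriedTxn = provider.capture(key);
```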
7d. Payment aggregate corrupt (provider says CAPTURED, local INITIATED)
- Reconciliation detects the mismatch → applies the fetchStatus event.
- An admin can fix it manually via PATCH /admin/payments/:id/reconcile (not available yet; add if needed).
→ Gap P2-19.
8. Checklist edge cases
Race + ordering
- Authorized listener idempotent
- Authorized vs cancel race swallow INVALID_STATUS_TRANSITION
- [~] Cross-aggregate outbox order (P2-11)
- Webhook-before-session Flow 9 retry
- Double-click FE
Stuck recovery
- INITIATED expire cron → cancel PENDING
- [!] AUTHORIZED expired with booking CONFIRMED (>7d lead) (P1-11)
- Loyalty RESERVED orphan migration (L4 + data cleanup)
- Outbox dead-letter UI (P2-12)
Timeout
- [!] Provider outage destroys booking (P1-4)
- [~] Webhook delayed: tolerated with cron fallback
- Reconciliation cron verify shipped (P2-13)
Tenant changes
- Toggle depositEnabled mid-flight (P2-14)
- Change cancellationHours grandfather (P2-15 / accept)
- Rotate credentials OK
Data integrity
- [!] payableTotal = 0 stuck (P2-9)
- Update booking after capture (P2-16)
- Soft-delete resource (P2-?)
Multi-concurrency
- Optimistic locking (P2-17)
- Resource conflict enforced
Ops tooling
- Admin UI webhook retry (P2-12)
- Outbox retention (P2-18)
- Manual reconcile payment (P2-19)
→ Consolidated: gaps-and-plan.md.