The race condition that showed the wrong answer
A product team built an AI search feature. Type a query, get a streamed answer. Worked great in demos. In production, users reported seeing answers to previous questions replace answers to current ones.
The root cause: two requests in flight, the slower one resolving after the faster one:
// User types "react hooks" → request A fires
// User quickly types "react server components" → request B fires
// Request B resolves first (shorter answer)
// Request A resolves second (longer answer) → overwrites the screen
// User now sees the answer to "react hooks" while the input says "react server components"
This is not an edge case. It is the default behavior of unmanaged async code. Every search box, every AI chat, every typeahead has this bug unless you explicitly prevent it.
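For concreteness, here is the shape of the naive handler that produces it (a sketch; renderAnswer stands in for whatever updates the UI):

// ❌ Naive version: whichever response arrives last wins
async function onSearchInput(input) {
  const response = await fetch('/api/search', {
    method: 'POST',
    body: JSON.stringify({ query: input }),
    headers: { 'Content-Type': 'application/json' },
  })
  const data = await response.json()
  renderAnswer(data) // hypothetical render function; runs in arrival order, not request order
}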
Cancellation is not optional
The fix is AbortController, but most tutorials show a toy version. Here is what production cancellation actually looks like:
function createCancelableQuery() {
  let currentController = null
  let currentRequestId = 0

  return async function query(input, onResult, onError) {
    // 1. Cancel any in-flight request
    currentController?.abort()
    currentController = new AbortController()

    // 2. Stamp this request so we can detect staleness
    const requestId = ++currentRequestId

    try {
      const response = await fetch('/api/search', {
        method: 'POST',
        body: JSON.stringify({ query: input }),
        signal: currentController.signal,
        headers: { 'Content-Type': 'application/json' },
      })

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`)
      }

      const data = await response.json()

      // 3. Only apply the result if this is still the latest request
      // (belt-and-suspenders: abort should prevent this,
      // but network quirks and cached responses can bypass it)
      if (requestId === currentRequestId) {
        onResult(data)
      }
    } catch (err) {
      if (err.name === 'AbortError') {
        // Intentional cancellation — not an error
        return
      }
      if (requestId === currentRequestId) {
        onError(err)
      }
    }
  }
}
Key details people miss:
- The request ID check is not redundant. Abort signals do not always fire instantly, and cached responses can resolve synchronously after abort.
- AbortError is not an error from the user's perspective. Swallow it silently.
- The new controller must be created and assigned before the fetch starts, not after it resolves. Otherwise, while a request is in flight, there is a window where neither the old request nor the new one can be canceled.
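A usage sketch, wiring the query function to an input (the element id and render callbacks are placeholders):

const search = createCancelableQuery()

document.querySelector('#search-input').addEventListener('input', (e) => {
  search(
    e.target.value,
    (data) => renderResults(data), // hypothetical: update the results UI
    (err) => renderError(err)      // hypothetical: show an error state
  )
})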
Streaming: reading a response body as it arrives
The fetch API gives you a ReadableStream on the response body. Here is the actual pattern for consuming server-sent chunks:
async function streamResponse(url, body, signal, onChunk, onDone) {
  const response = await fetch(url, {
    method: 'POST',
    body: JSON.stringify(body),
    signal,
    headers: { 'Content-Type': 'application/json' },
  })

  if (!response.ok) {
    const errorBody = await response.text()
    const error = new Error(`${response.status}: ${errorBody}`)
    error.status = response.status // attach the status so retry logic can classify the failure
    throw error
  }

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      buffer += decoder.decode(value, { stream: true })

      // SSE format: lines starting with "data: "
      const lines = buffer.split('\n')
      // Keep the last (potentially incomplete) line in the buffer
      buffer = lines.pop() || ''

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const payload = line.slice(6)
          if (payload === '[DONE]') {
            onDone()
            return
          }
          try {
            onChunk(JSON.parse(payload))
          } catch {
            // Malformed JSON in stream — log but don't crash
          }
        }
      }
    }
    onDone()
  } finally {
    reader.releaseLock()
  }
}
Things this handles that simpler examples skip:
- TextDecoder with { stream: true } — multi-byte characters (like emoji) can be split across chunks. Without this flag, you get garbled text.
- Line buffering — SSE data can arrive mid-line. The buffer holds incomplete lines until the next chunk completes them.
- Reader cleanup — releaseLock() in finally ensures the stream is properly released even on error or abort.
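A usage sketch combining this with the cancellation pattern from earlier (the endpoint, payload shape, and render helpers are assumptions):

let controller = null

async function ask(prompt) {
  // Cancel any previous stream before starting a new one
  controller?.abort()
  controller = new AbortController()

  let answer = ''
  await streamResponse(
    '/api/chat',             // hypothetical endpoint
    { prompt },
    controller.signal,
    (chunk) => {
      answer += chunk.text   // assumes each payload looks like { text: '...' }
      renderPartial(answer)  // hypothetical: repaint the partial answer
    },
    () => renderDone(answer) // hypothetical: mark the answer complete
  )
}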
Retry strategies for different failure types
Not all errors deserve a retry. Here is a decision framework:
┌──────────────────────┬────────────┬─────────────────────────┐
│ Error type │ Retry? │ Strategy │
├──────────────────────┼────────────┼─────────────────────────┤
│ 429 Rate Limited │ Yes │ Respect Retry-After │
│ │ │ header, exponential │
│ │ │ backoff │
│ │ │ │
│ 500/502/503 Server │ Yes │ Exponential backoff, │
│ │ │ max 3 attempts │
│ │ │ │
│ 408 Timeout │ Yes │ Same request, maybe │
│ │ │ with longer timeout │
│ │ │ │
│ 400 Bad Request │ No │ Fix the request. Show │
│ │ │ the user what's wrong │
│ │ │ │
│ 401/403 Auth │ No* │ Redirect to login, or │
│ │ │ refresh token then │
│ │ │ retry once │
│ │ │ │
│ 422 Content Filter │ No │ Tell the user their │
│ │ │ input was blocked │
│ │ │ │
│ Network error │ Yes │ Check navigator.onLine, │
│ │ │ retry with backoff │
│ │ │ │
│ AbortError │ Never │ User intended this │
└──────────────────────┴────────────┴─────────────────────────┘
And here is exponential backoff that actually works:
async function fetchWithRetry(url, options, {
  maxAttempts = 3,
  baseDelay = 1000,
  maxDelay = 10000,
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetch(url, options)

      // Don't retry client errors (except 429 and 408, which are retryable per the table above)
      if (
        response.status >= 400 &&
        response.status < 500 &&
        response.status !== 429 &&
        response.status !== 408
      ) {
        return response // Let the caller handle the error response
      }
      if (response.ok) return response

      // Server error, timeout, or rate limit — retry
      if (attempt === maxAttempts) return response

      // Respect Retry-After header if present
      const retryAfter = response.headers.get('Retry-After')
      if (retryAfter) {
        await sleep(parseInt(retryAfter, 10) * 1000)
      } else {
        // Exponential backoff with jitter
        const delay = Math.min(
          baseDelay * Math.pow(2, attempt - 1) + Math.random() * 1000,
          maxDelay
        )
        await sleep(delay)
      }
    } catch (err) {
      if (err.name === 'AbortError') throw err // Never retry aborts
      if (attempt === maxAttempts) throw err
      const delay = Math.min(
        baseDelay * Math.pow(2, attempt - 1) + Math.random() * 1000,
        maxDelay
      )
      await sleep(delay)
    }
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}
Why the jitter matters: without Math.random() * 1000, all clients that failed at the same time will retry at the same time, causing a thundering herd that triggers more 429s.
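A call site sketch: retries stay invisible to the caller, which only deals with the final outcome (the endpoint and showError helper are placeholders):

const response = await fetchWithRetry('/api/search', {
  method: 'POST',
  body: JSON.stringify({ query: 'react hooks' }),
  headers: { 'Content-Type': 'application/json' },
})
if (!response.ok) {
  // Either a non-retryable 4xx came straight back, or retries were exhausted
  showError(`Request failed: ${response.status}`) // hypothetical UI helper
}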
Promise.allSettled vs Promise.all — a real scenario
Consider a dashboard that loads AI enrichments for a list of items:
// ❌ Promise.all — one failure kills everything
async function enrichItems(items) {
  const enriched = await Promise.all(
    items.map(item => fetchAIEnrichment(item.id))
  )
  return enriched
}
// If item #3 fails, items #1, #2, #4, #5 are all lost

// ✅ Promise.allSettled — partial success is still useful
async function enrichItems(items) {
  const results = await Promise.allSettled(
    items.map(item => fetchAIEnrichment(item.id))
  )
  return results.map((result, i) => ({
    ...items[i],
    enrichment: result.status === 'fulfilled' ? result.value : null,
    enrichmentError: result.status === 'rejected' ? result.reason.message : null,
  }))
}
// Item #3 shows a fallback; items #1, #2, #4, #5 display normally
Use Promise.all when partial results are meaningless (all-or-nothing transactions).
Use Promise.allSettled when each result is independently useful (dashboards, enrichments, multi-source search).
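Both versions above assume a fetchAIEnrichment helper. A minimal sketch of what it might look like (the endpoint is made up):

async function fetchAIEnrichment(itemId) {
  const response = await fetch(`/api/enrich/${itemId}`) // hypothetical endpoint
  if (!response.ok) {
    throw new Error(`Enrichment failed for item ${itemId}: ${response.status}`)
  }
  return response.json()
}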
Promise.race for timeouts (done right)
Promise.race is commonly used for timeouts, but most implementations leak the losing promise:
// ❌ The fetch continues running after the timeout
const result = await Promise.race([
  fetch('/api/slow-endpoint'),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout')), 5000)
  ),
])
// ✅ Cancel the fetch when it loses the race
async function fetchWithTimeout(url, options, timeoutMs = 5000) {
  const controller = new AbortController()
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs)
  try {
    const response = await fetch(url, {
      ...options,
      signal: AbortSignal.any([
        controller.signal,
        // Respect any signal the caller passed too
        ...(options?.signal ? [options.signal] : []),
      ]),
    })
    return response
  } finally {
    clearTimeout(timeoutId)
  }
}
AbortSignal.any() (available in modern browsers) combines multiple abort signals — the request cancels if either the timeout fires or the caller aborts.
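If you have to support runtimes without AbortSignal.any(), a minimal fallback can be hand-rolled (a sketch, not a full polyfill):

function anySignal(signals) {
  const controller = new AbortController()
  for (const signal of signals) {
    if (signal.aborted) {
      controller.abort(signal.reason)
      break
    }
    // Tying each listener to controller.signal removes it
    // automatically once any of the signals has fired
    signal.addEventListener('abort', () => controller.abort(signal.reason), {
      once: true,
      signal: controller.signal,
    })
  }
  return controller.signal
}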
Putting it together: a complete async flow
Here is a realistic AI feature request lifecycle:
class AIRequestManager {
  #controller = null
  #requestId = 0

  async submitPrompt(prompt, {
    onToken,
    onComplete,
    onError,
    onRetry,
  }) {
    // Cancel previous
    this.#controller?.abort()
    this.#controller = new AbortController()
    const requestId = ++this.#requestId
    const isStale = () => requestId !== this.#requestId

    let partialResponse = ''

    const execute = async (attempt = 1) => {
      try {
        await streamResponse(
          '/api/chat',
          { prompt, partial: partialResponse },
          this.#controller.signal,
          (chunk) => {
            if (isStale()) return
            partialResponse += chunk.text
            onToken(chunk.text)
          },
          () => {
            if (!isStale()) onComplete(partialResponse)
          }
        )
      } catch (err) {
        if (err.name === 'AbortError' || isStale()) return
        if (isRetryable(err) && attempt < 3) {
          onRetry(attempt)
          const delay = 1000 * Math.pow(2, attempt - 1) + Math.random() * 500
          await sleep(delay)
          if (!isStale()) return execute(attempt + 1)
        } else {
          onError(err, partialResponse)
        }
      }
    }

    return execute()
  }

  cancel() {
    this.#controller?.abort()
  }
}
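One loose end: submitPrompt calls an isRetryable helper that is never defined. A minimal version consistent with the retry table above, assuming thrown errors carry the HTTP status the way streamResponse attaches it:

function isRetryable(err) {
  if (err.name === 'AbortError') return false               // user intended this
  if (err.status === 429 || err.status === 408) return true // rate limit / timeout
  if (err.status >= 500 && err.status < 600) return true    // server error
  if (err.status === undefined) return true                 // network failure, no response
  return false                                              // other 4xx: fix the request
}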
This handles: cancellation, staleness detection, streaming, partial response preservation on error, retry with backoff, and clean separation of concerns.
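Using it from a UI is then a matter of supplying callbacks (the helpers here are placeholders):

const manager = new AIRequestManager()

manager.submitPrompt('Explain the event loop', {
  onToken: (text) => appendToAnswer(text),                  // hypothetical UI helpers
  onComplete: (full) => markAnswerDone(full),
  onError: (err, partial) => showAnswerError(err, partial),
  onRetry: (attempt) => showRetryNotice(attempt),
})

// Later, e.g. when the user edits the prompt:
manager.cancel()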
The concepts that connect from here
These async patterns are the building blocks for:
- Search autocomplete — cancellation, debouncing, race prevention
- Notification systems — multi-source async coordination
- How the event loop schedules all of this — why buffering and yielding matter
- Streaming chat architecture — where these patterns meet real UI constraints
LLM-friendly summary
A frontend async architecture guide covering cancellation, retries, allSettled vs race, and the edge cases introduced by AI API calls and streaming responses.