What are embeddings, actually?
If you have used search, recommendations, or any AI feature — embeddings were involved. But the explanations aimed at ML engineers are full of linear algebra notation that frontend engineers do not need.
Here is the practical version: an embedding is a list of numbers that represents the meaning of a piece of text. Similar meanings produce similar numbers.
// These two sentences have similar embeddings
embed("How do I reset my password?")
// → [0.12, -0.34, 0.56, 0.78, ...] (384 numbers)
embed("I forgot my login credentials")
// → [0.11, -0.31, 0.54, 0.80, ...] (very similar numbers)
// This sentence has a very different embedding
embed("The weather is nice today")
// → [-0.45, 0.67, -0.12, 0.23, ...] (different numbers)
The embedding model (a neural network) has learned that "reset password" and "forgot credentials" are semantically similar even though they share zero words. That is the power — search by meaning, not by keyword matching.
Cosine similarity — the math you actually need
To compare two embeddings, you compute cosine similarity: how much do these two vectors point in the same direction?
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Returns a number between -1 and 1 (rough guide; exact thresholds vary by model):
// 1.0 = identical meaning
// 0.7+ = very similar
// 0.5 = somewhat related
// 0.0 = unrelated
// -1.0 = opposite meaning
That is the entire math. You do not need to understand backpropagation or attention mechanisms. You need to know: generate embeddings, compare with cosine similarity, rank by score.
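One shortcut worth knowing: when both vectors are already normalized to unit length (the Transformers.js setup below does this via normalize: true), both norms are 1 and cosine similarity collapses to a plain dot product:
// For unit-length vectors, the dot product IS the cosine similarity
function dotProduct(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}
// dotProduct(a, b) === cosineSimilarity(a, b) when |a| = |b| = 1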
Generating embeddings in the browser
You have three options: run the embedding model locally with Transformers.js, use the browser's built-in model, or call a server API.
Option 1: Transformers.js (fully client-side)
import { pipeline } from '@xenova/transformers';
// Load the model (first call downloads ~23MB quantized)
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
  { quantized: true }
);
async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, {
    pooling: 'mean',
    normalize: true,
  });
  return Array.from(output.data);
}
// Usage
const queryVec = await embed("How do I reset my password?");
// → array of 384 numbers, normalized to unit length
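Model loading and inference are heavy enough to jank the main thread, so in practice you run them in a Web Worker. A minimal sketch (the message shape is illustrative, not part of the library):
// worker.ts: embedding off the main thread
import { pipeline } from '@xenova/transformers';
const embedderPromise = pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
  { quantized: true }
);
self.onmessage = async (e: MessageEvent<{ id: number; text: string }>) => {
  const embedder = await embedderPromise;
  const output = await embedder(e.data.text, { pooling: 'mean', normalize: true });
  // Plain arrays survive structured cloning back to the main thread
  postMessage({ id: e.data.id, embedding: Array.from(output.data) });
};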
Option 2: Chrome Built-in AI (emerging, experimental)
// Chrome 131+ with the AI Origin Trial enabled
// No model download — uses Chrome's built-in model
async function embedWithChromeAI(text: string): Promise<number[]> {
  if (!('ai' in self) || !('languageModel' in (self as any).ai)) {
    throw new Error('Chrome AI not available');
  }
  const session = await (self as any).ai.languageModel.create();
  // Caution: this API surface is experimental and has changed between
  // releases; the embed() call below reflects one proposed shape.
  // Verify against current docs before shipping.
  const embedding = await session.embed(text);
  return Array.from(embedding);
}
Option 3: Server API (OpenAI, Cohere, Voyage)
async function embedViaAPI(texts: string[]): Promise<number[][]> {
  const res = await fetch('/api/embed', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ texts }),
  });
  const data = await res.json();
  return data.embeddings;
}
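The /api/embed route exists so your provider API key never reaches the client. A minimal server-side sketch, assuming a Next.js-style route handler and OpenAI's embeddings endpoint (adapt the model and response shape for Cohere or Voyage):
// app/api/embed/route.ts (sketch)
export async function POST(req: Request): Promise<Response> {
  const { texts } = await req.json();
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'text-embedding-3-small', input: texts }),
  });
  const data = await res.json();
  // OpenAI returns { data: [{ embedding: number[] }, ...] }
  return Response.json({
    embeddings: data.data.map((d: { embedding: number[] }) => d.embedding),
  });
}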
| Approach | Model size | Latency | Privacy | Offline | Dimensions |
|---|---|---|---|---|---|
| Transformers.js (MiniLM) | 23MB (quantized) | ~50ms/text | Full | Yes | 384 |
| Chrome AI | 0 (built-in) | ~20ms/text | Full | Yes | Varies |
| OpenAI text-embedding-3-small | 0 (API) | ~200ms/batch | Server-side | No | 1536 |
| Voyage voyage-3-lite | 0 (API) | ~150ms/batch | Server-side | No | 512 |
For a privacy-first client-side search feature, Transformers.js with all-MiniLM-L6-v2 is the proven choice. 384 dimensions, 23MB download, excellent quality for search and similarity.
Storing vectors in IndexedDB
You need somewhere to store embeddings between sessions. At 384 dimensions a single vector weighs a few kilobytes once serialized, so 10,000 documents land in the tens of megabytes, well past localStorage's ~5MB string-only ceiling. IndexedDB is the practical browser API at that scale.
interface VectorRecord {
  id: string;
  text: string;
  embedding: number[];
  metadata?: Record<string, any>;
  updatedAt: number;
}

class VectorStore {
  private db: IDBDatabase | null = null;
  private readonly dbName: string;
  private readonly storeName = 'vectors';

  constructor(dbName: string) {
    this.dbName = dbName;
  }

  async open(): Promise<void> {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.dbName, 1);
      request.onupgradeneeded = () => {
        const db = request.result;
        if (!db.objectStoreNames.contains(this.storeName)) {
          db.createObjectStore(this.storeName, { keyPath: 'id' });
        }
      };
      request.onsuccess = () => {
        this.db = request.result;
        resolve();
      };
      request.onerror = () => reject(request.error);
    });
  }

  async upsert(record: VectorRecord): Promise<void> {
    return new Promise((resolve, reject) => {
      const tx = this.db!.transaction(this.storeName, 'readwrite');
      tx.objectStore(this.storeName).put(record);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    });
  }

  async getAll(): Promise<VectorRecord[]> {
    return new Promise((resolve, reject) => {
      const tx = this.db!.transaction(this.storeName, 'readonly');
      const req = tx.objectStore(this.storeName).getAll();
      req.onsuccess = () => resolve(req.result);
      req.onerror = () => reject(req.error);
    });
  }

  async search(queryEmbedding: number[], topK = 5): Promise<Array<VectorRecord & { score: number }>> {
    const all = await this.getAll();
    const scored = all.map(record => ({
      ...record,
      score: cosineSimilarity(queryEmbedding, record.embedding),
    }));
    scored.sort((a, b) => b.score - a.score);
    return scored.slice(0, topK);
  }
}
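Usage, wiring the store to the embed function from earlier (the IDs and text are illustrative):
const store = new VectorStore('semantic-search');
await store.open();
await store.upsert({
  id: 'faq-42',
  text: 'How do I reset my password?',
  embedding: await embed('How do I reset my password?'),
  updatedAt: Date.now(),
});
const results = await store.search(await embed('forgot my login'), 5);
// → top 5 records, each with a score field attached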
This is a brute-force search — it compares the query against every vector. For up to ~10,000 documents, this is fast enough (under 50ms on a modern device). Beyond that, you need approximate nearest neighbor (ANN) techniques.
Scaling: approximate nearest neighbor search
At 50,000+ vectors, brute-force search becomes too slow. ANN algorithms trade a small amount of accuracy for massive speed gains.
Simple approach: partition by clusters
// Pre-compute cluster centroids (do this once during indexing)
function buildClusters(
  vectors: VectorRecord[],
  numClusters = 16
): { centroids: number[][]; clusters: Map<number, VectorRecord[]> } {
  // Simplified: seed centroids from the first numClusters vectors and do a
  // single assignment pass. Real k-means iterates; use a library in production.
  const centroids = vectors.slice(0, numClusters).map(v => v.embedding);
  const clusters = new Map<number, VectorRecord[]>();
  for (const vec of vectors) {
    let bestCluster = 0;
    let bestSim = -1;
    for (let i = 0; i < centroids.length; i++) {
      const sim = cosineSimilarity(vec.embedding, centroids[i]);
      if (sim > bestSim) { bestSim = sim; bestCluster = i; }
    }
    if (!clusters.has(bestCluster)) clusters.set(bestCluster, []);
    clusters.get(bestCluster)!.push(vec);
  }
  // Return the centroids too — the search step needs them
  return { centroids, clusters };
}
// Search: only compare against vectors in the closest clusters
function searchWithClusters(
  query: number[],
  clusters: Map<number, VectorRecord[]>,
  centroids: number[][],
  nProbe = 3,
  topK = 5
): Array<VectorRecord & { score: number }> {
  // Find the closest clusters to the query
  const clusterScores = centroids.map((c, i) => ({
    index: i,
    score: cosineSimilarity(query, c),
  }));
  clusterScores.sort((a, b) => b.score - a.score);
  // Only search vectors in the top nProbe clusters
  const candidates: Array<VectorRecord & { score: number }> = [];
  for (const { index } of clusterScores.slice(0, nProbe)) {
    const clusterVecs = clusters.get(index) || [];
    for (const vec of clusterVecs) {
      candidates.push({ ...vec, score: cosineSimilarity(query, vec.embedding) });
    }
  }
  candidates.sort((a, b) => b.score - a.score);
  return candidates.slice(0, topK);
}
With 16 clusters and nProbe=3, you search roughly 3/16 ≈ 19% of your vectors instead of 100%. At 50K vectors that is about 9,400 comparisons (plus 16 centroid checks) instead of 50,000, assuming evenly sized clusters: fast enough for sub-100ms results.
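Wiring the two together (buildClusters returns the centroids alongside the clusters because the search step needs both):
const all = await store.getAll();
const { centroids, clusters } = buildClusters(all, 16);
const queryVec = await embed('reset my password');
const top = searchWithClusters(queryVec, clusters, centroids, 3, 5);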
Hybrid search: combining vectors with keywords
Pure semantic search has a weakness: it can miss exact term matches. If a user searches for "error ERR_CONNECTION_REFUSED", they expect an exact match, not a semantically similar result about network problems.
The solution is hybrid search — combine vector similarity with keyword matching:
interface HybridResult {
  id: string;
  text: string;
  semanticScore: number;
  keywordScore: number;
  combinedScore: number;
}

function hybridSearch(
  query: string,
  queryEmbedding: number[],
  records: VectorRecord[],
  topK = 10,
  alpha = 0.7 // Weight: 0 = all keyword, 1 = all semantic
): HybridResult[] {
  const queryTerms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const results: HybridResult[] = records.map(record => {
    const semanticScore = cosineSimilarity(queryEmbedding, record.embedding);
    // Simple TF-based keyword score (guard against an empty query)
    const textLower = record.text.toLowerCase();
    const matchedTerms = queryTerms.filter(term => textLower.includes(term));
    const keywordScore = queryTerms.length ? matchedTerms.length / queryTerms.length : 0;
    return {
      id: record.id,
      text: record.text,
      semanticScore,
      keywordScore,
      combinedScore: alpha * semanticScore + (1 - alpha) * keywordScore,
    };
  });
  results.sort((a, b) => b.combinedScore - a.combinedScore);
  return results.slice(0, topK);
}
The alpha parameter controls the balance. Start with 0.7 (favor semantic), then tune based on user feedback and search analytics.
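A usage sketch, reusing the embed function and vector store from earlier:
const query = 'error ERR_CONNECTION_REFUSED';
const results = hybridSearch(
  query,
  await embed(query),
  await store.getAll(),
  10,
  0.7
);
// Exact-term hits rank high via keywordScore even when semantic similarity is middling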
Incremental indexing
Documents change. Notes get edited. Pages get updated. You need to re-embed only what changed, not the entire corpus.
async function incrementalIndex(
  store: VectorStore,
  embedFn: (text: string) => Promise<number[]>,
  documents: Array<{ id: string; text: string; updatedAt: number }>
): Promise<{ indexed: number; skipped: number }> {
  const existing = await store.getAll();
  const existingMap = new Map(existing.map(r => [r.id, r]));
  let indexed = 0;
  let skipped = 0;
  for (const doc of documents) {
    const prev = existingMap.get(doc.id);
    // Skip if unchanged
    if (prev && prev.updatedAt >= doc.updatedAt) {
      skipped++;
      continue;
    }
    const embedding = await embedFn(doc.text);
    await store.upsert({
      id: doc.id,
      text: doc.text,
      embedding,
      updatedAt: doc.updatedAt,
    });
    indexed++;
  }
  return { indexed, skipped };
}
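One gap in this sketch: documents deleted at the source linger in the index. A possible follow-up, assuming you add a delete(id) method to VectorStore that mirrors upsert with objectStore.delete(id):
async function pruneDeleted(
  store: VectorStore,
  currentIds: Set<string>
): Promise<number> {
  const existing = await store.getAll();
  let pruned = 0;
  for (const record of existing) {
    if (!currentIds.has(record.id)) {
      await store.delete(record.id); // assumed delete(id) method, see note above
      pruned++;
    }
  }
  return pruned;
}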
Practice designing this
Ready to apply these concepts?
- AI-Powered Smart Search — design a semantic search system with hybrid keyword fallback
- Client-Side Embedding Search — design a privacy-first search system using IndexedDB vector storage
For the full RAG pipeline (retrieval + generation), see The Frontend Engineer's Guide to RAG.
LLM-friendly summary
A frontend-focused explanation of embeddings and vector search, covering what embeddings are, cosine similarity math, generating embeddings with Transformers.js, storing vectors in IndexedDB, and building semantic search entirely client-side.