How I achieved a 17x-400x performance improvement by switching from offset to cursor-based pagination in my mass notification server
When your Firebase push notification server needs to process millions of user tokens, every query matters. I learned this the hard way when my offset-based pagination started taking 30+ seconds to process deep pages. Here's how I achieved up to a 400x performance improvement with one architectural change—and the critical bug I almost shipped to production.
The Problem: Offset Pagination at Scale
My original implementation used TypeORM's skip() and take() methods—the standard offset-based pagination approach that works beautifully for small datasets but becomes a performance nightmare at scale.
Before Refactoring
// notification.service.ts - Original Implementation
async getUserTokensForNotification(offset: number, limit: number) {
  const users = await this.userRepository
    .createQueryBuilder('user')
    .select(['user.push_token', 'user.cell_phone', 'user.logindate'])
    .orderBy('user.id', 'ASC')
    .skip(offset) // OFFSET in SQL
    .take(limit) // LIMIT in SQL
    .getMany();
  return users;
}

// Processing loop
for (let page = 0; page < totalPages; page++) {
  const offset = page * BATCH_SIZE;
  const tokens = await getUserTokensForNotification(offset, BATCH_SIZE);
  await sendNotifications(tokens);
}
This generated SQL like:
SELECT push_token, cell_phone, logindate
FROM users
ORDER BY id ASC
LIMIT 1000 OFFSET 50000;
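One detail the loop above elides is where totalPages comes from. Presumably it came from a separate COUNT query, something along these lines (illustrative, not the original code):

```typescript
// Illustrative only: the original snippet doesn't show how totalPages was computed.
// A COUNT query is the usual approach (and is itself another full-table cost).
const totalUsers = await this.userRepository.count();
const totalPages = Math.ceil(totalUsers / BATCH_SIZE);
```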
Why This Hurts Performance
The fundamental issue with OFFSET is what database engineers call the row skipping problem. Here's what happens under the hood:
- Database scans rows from the beginning
- Counts and discards all rows up to the OFFSET value
- Returns only the next LIMIT rows
- For OFFSET 100000, LIMIT 1000, the database processes 101,000 rows but returns only 1,000
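To put a number on that waste: the work grows with offset + limit, not with limit. A quick sanity check in code (illustrative back-of-the-envelope, real plans vary):

```typescript
// Rough cost model: rows the database must walk to satisfy OFFSET/LIMIT.
function rowsTouchedByOffset(offset: number, limit: number): number {
  return offset + limit; // rows skipped and discarded + rows actually returned
}

console.log(rowsTouchedByOffset(100_000, 1_000)); // 101000 rows of work for 1,000 results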
Real performance data from my PostgreSQL instance with 1M user records:
| Scenario | Query Time | Slowdown Factor |
|---|---|---|
| OFFSET 0 | 0.025ms | baseline |
| OFFSET 10,000 | 3.138ms | 125x slower |
| OFFSET 50,000 | 16.933ms | 677x slower |
| OFFSET 100,000 | 30.166ms | 1,207x slower |
As you paginate deeper, performance degrades linearly. When processing millions of tokens for mass notifications, pages 100+ were taking 30-80 seconds each.
Additional Problems I Encountered
Beyond performance, offset pagination caused operational issues:
- Duplicate notifications: When new users registered during processing, existing users shifted between pages, causing some to receive duplicate notifications
- Missing users: Deleted accounts caused page shifts, skipping some users entirely
- Non-deterministic ordering: without a unique, stable sort key, rows that tie on the sort column can come back in a different order between requests (a cheap mitigation is sketched below)
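That last issue has a cheap partial mitigation even before switching strategies: make the ORDER BY deterministic by adding a unique tie-breaker column. A minimal sketch, assuming a logindate business sort key and the column names used elsewhere in this post (not the exact production schema):

```typescript
import { ObjectLiteral, Repository } from 'typeorm';

// Sketch: deterministic offset pagination via a unique tie-breaker in ORDER BY.
async function getPageDeterministically(
  userRepository: Repository<ObjectLiteral>,
  offset: number,
  limit: number
): Promise<ObjectLiteral[]> {
  return userRepository
    .createQueryBuilder('user')
    .select(['user.push_token', 'user.cell_phone', 'user.logindate'])
    .orderBy('user.logindate', 'ASC') // business sort key (can have ties)
    .addOrderBy('user.id', 'ASC')     // unique tie-breaker keeps page boundaries stable
    .skip(offset)
    .take(limit)
    .getMany();
}
```

This doesn't fix the performance problem, but it does keep page boundaries stable between requests.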
The Solution: Cursor-Based Pagination
Cursor pagination (also called "keyset pagination" or "seek method") eliminates row scanning by using indexed WHERE clauses instead of OFFSET.
After Refactoring
// notification.service.ts - Refactored Implementation
async getUserTokensForNotification(
  lastSeq?: number,
  limit: number = 1000
): Promise<{ users: User[]; nextCursor: number | null }> {
  const queryBuilder = this.userRepository
    .createQueryBuilder('user')
    .select(['user.seq', 'user.push_token', 'user.cell_phone', 'user.logindate'])
    .orderBy('user.seq', 'ASC')
    .limit(limit + 1); // Fetch one extra to detect hasMore

  // Cursor-based filtering: seek directly to position
  if (lastSeq) {
    queryBuilder.where('user.seq > :lastSeq', { lastSeq });
  }

  const users = await queryBuilder.getMany();
  const hasMore = users.length > limit;
  if (hasMore) {
    users.pop(); // Remove the extra item
  }

  return {
    users,
    nextCursor: hasMore ? users[users.length - 1].seq : null
  };
}
// Processing loop - now cursor-based
let cursor: number | null = null;
do {
  const result = await getUserTokensForNotification(cursor, BATCH_SIZE);
  await sendNotifications(result.users);
  cursor = result.nextCursor;
} while (cursor !== null);
The generated SQL now uses an indexed seek:
-- First page
SELECT seq, push_token, cell_phone, logindate
FROM users
ORDER BY seq ASC
LIMIT 1001;
-- Subsequent pages (lastSeq = 1000)
SELECT seq, push_token, cell_phone, logindate
FROM users
WHERE seq > 1000
ORDER BY seq ASC
LIMIT 1001;
Why This Works So Much Better
The magic is in the WHERE seq > :lastSeq clause. With a proper index on the seq column, the database can:
- Jump directly to the position using the index (O(log n) operation)
- Read forward from that position
- No row scanning or counting required
With an index on seq in place:
CREATE INDEX idx_users_seq ON users(seq ASC);
The database execution plan changed from a sequential scan to an index seek:
Before (OFFSET 100,000):
Execution Time: 30.166ms
- Sequential Scan on users
- Rows Scanned: 101,000
- Rows Returned: 1,000
After (Cursor with Index):
Execution Time: 0.027ms (1,117x faster!)
- Index Scan using idx_users_seq
- Index Cond: (seq > 100000)
- Rows Scanned: 1,001
- Rows Returned: 1,000
Performance became constant regardless of position in the dataset.
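If you want to verify the plan on your own data, one way is to grab the exact SQL and parameters TypeORM emits and run EXPLAIN ANALYZE on it in psql. A sketch, assuming a repository for a user entity with the columns used in this post:

```typescript
import { ObjectLiteral, Repository } from 'typeorm';

// Sketch: print the generated cursor query plus its bound parameters so you can
// run EXPLAIN ANALYZE on it yourself.
function printCursorQuery(userRepository: Repository<ObjectLiteral>, lastSeq: number): void {
  const qb = userRepository
    .createQueryBuilder('user')
    .select(['user.seq', 'user.push_token'])
    .where('user.seq > :lastSeq', { lastSeq })
    .orderBy('user.seq', 'ASC')
    .limit(1001);

  // getQueryAndParameters() returns [sql, parameters]; prefix the SQL with
  // EXPLAIN ANALYZE in psql to confirm an Index Scan instead of a Seq Scan.
  const [sql, params] = qb.getQueryAndParameters();
  console.log(sql, params);
}
```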
Implementation Pitfall: The Infinite Loop Bug
Just when I thought the refactoring was complete, I discovered a critical bug that could have caused an infinite loop in production. This happened when I combined cursor pagination with in-memory filtering for complex business logic.
The Bug: Post-Query Filtering Breaks Cursor Logic
My notification system needed to filter users based on their last login date—but the date format in the database was inconsistent (mix of Korean locale strings like "2025-10-30오후 5:28:28" and standard datetime formats). Parsing this in SQL would be complex and slow, so I decided to filter in-memory after fetching.
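That in-memory filter (filterByLoginDate below) isn't shown in the original code; here is a minimal sketch of what it could look like, assuming both date formats start with a YYYY-MM-DD prefix as in the example above ("오후" means PM):

```typescript
interface LoginFilterOptions {
  minDaysSinceLogin: number;
  maxDaysSinceLogin: number;
}

// Sketch of an in-memory logindate filter; the real helper may differ.
function filterByLoginDate<T extends { logindate: string | null }>(
  users: T[],
  { minDaysSinceLogin, maxDaysSinceLogin }: LoginFilterOptions
): T[] {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return users.filter((user) => {
    if (!user.logindate) return false;
    // Both formats in the example start with the date, so take the first 10 chars.
    const parsed = new Date(user.logindate.slice(0, 10));
    if (Number.isNaN(parsed.getTime())) return false;
    const daysSinceLogin = Math.floor((Date.now() - parsed.getTime()) / MS_PER_DAY);
    return daysSinceLogin >= minDaysSinceLogin && daysSinceLogin <= maxDaysSinceLogin;
  });
}
```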
The Broken Implementation:
while (true) {
  // Fetch 10,000 records from database
  let usersInPage = await queryBuilder
    .where('user.seq > :lastSeq', { lastSeq })
    .orderBy('user.seq', 'ASC')
    .take(10000)
    .getMany();

  if (usersInPage.length === 0) break;

  // Apply in-memory filter (removes most records!)
  usersInPage = filterByLoginDate(usersInPage, {
    minDaysSinceLogin: 10,
    maxDaysSinceLogin: 30
  });
  // Only ~1,875 records remain after filtering

  // Collect tokens...
  for (const user of usersInPage) {
    tokens.add(user.push_token);
  }

  // 🔴 BUG: Using filtered array to update cursor!
  if (usersInPage.length > 0) {
    lastSeq = usersInPage[usersInPage.length - 1].seq;
  }
}
What Went Wrong: The Infinite Loop Mechanism
Here's the deadly sequence that caused infinite pagination:
1. First Query: WHERE seq > 0 → returns seq 1-10000 (10,000 records)
2. In-Memory Filter: apply the logindate conditions → only 1,875 records remain (sparse distribution: seq values like 1, 50, 100, 250, ..., 2000)
3. Cursor Update: lastSeq = 2000 ← using the filtered array's last seq!
4. Second Query: WHERE seq > 2000 → returns seq 2001-12000 (10,000 records)
5. ❌ Problem: seqs 2001-10000 were already processed in the first query but are being fetched again!
The cursor jumped backward in the actual dataset, causing overlapping queries:
Query 1: seq 1 ─────────────────> seq 10000    (fetch)
                 ↓ filter ↓
         seq 1, 50, 100, ..., 2000             (keep)
                              ↑
                        lastSeq = 2000 ❌

Query 2:      seq 2001 ──────────────> seq 12000   (fetch)
              ├──────── OVERLAP! ────────┤
              2001-10000 processed twice!

Query 3:                      seq 12001 ──────> ...
                              ├──── OVERLAP! ─────┤
...continues forever...
Real Production Logs:
[Job abc123] 1,875 records matched (Total: 1,068,158)
[Job abc123] 1,875 records matched (Total: 1,070,033)
[Job abc123] 1,875 records matched (Total: 1,071,908)
[Job abc123] 1,875 records matched (Total: 1,073,783)
...
[Job abc123] 1,875 records matched (Total: 1,096,283)
Notice how we're accumulating 1,875 records per iteration infinitely—the filtered count, not the fetched count. The cursor was stuck in a loop, re-processing the same dataset range over and over. We had processed over 1 million records when the actual target was only 65,679 unique users!
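In hindsight, a cheap invariant check would have surfaced this much sooner: the cursor should strictly advance on every iteration. A possible guard, with illustrative names (not code from the original system):

```typescript
// Sketch: assert that the pagination cursor strictly advances each iteration.
function makeCursorGuard() {
  let previousCursor = Number.NEGATIVE_INFINITY;
  return (nextCursor: number): void => {
    if (nextCursor <= previousCursor) {
      throw new Error(
        `Cursor did not advance (previous=${previousCursor}, next=${nextCursor}); ` +
          'possible overlap or infinite pagination loop'
      );
    }
    previousCursor = nextCursor;
  };
}

// Usage inside the batch loop (illustrative):
// const assertCursorAdvanced = makeCursorGuard();
// assertCursorAdvanced(lastSeq);
```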
The Fix: Capture Cursor Before Filtering
The solution is simple but critical: save the original cursor position before applying any in-memory filtering.
Corrected Implementation:
while (true) {
  // Fetch from database
  let usersInPage = await queryBuilder
    .where('user.seq > :lastSeq', { lastSeq })
    .orderBy('user.seq', 'ASC')
    .take(10000)
    .getMany();

  if (usersInPage.length === 0) break;

  // ✅ FIX: Capture the last seq BEFORE filtering
  const lastSeqInOriginalBatch = usersInPage[usersInPage.length - 1].seq;

  // Now safe to filter in-memory
  usersInPage = filterByLoginDate(usersInPage, {
    minDaysSinceLogin: 10,
    maxDaysSinceLogin: 30
  });

  // Collect tokens from filtered results
  for (const user of usersInPage) {
    tokens.add(user.push_token);
  }

  // ✅ Use the original batch's last seq for cursor
  lastSeq = lastSeqInOriginalBatch;

  await delay(100); // Brief delay to reduce DB load
}
Why This Works:
Now the cursor correctly advances through the entire dataset:
Query 1: seq 1 ───────────> seq 10000   (fetch)
                                  ↑
                         lastSeq = 10000 ✅

Query 2: seq 10001 ───────────> seq 20000
                                      ↑
                             lastSeq = 20000 ✅

Query 3: seq 20001 ──────> ...
No overlaps, no infinite loops. Each query picks up exactly where the previous one left off, regardless of how many records were filtered out.
Results After the Fix:
[Job abc123] 10,000 fetched → 1,875 matched (Total: 1,875)
[Job abc123] 10,000 fetched → 2,134 matched (Total: 4,009)
[Job abc123] 10,000 fetched → 1,956 matched (Total: 5,965)
...
[Job abc123] 6,234 fetched → 1,103 matched (Total: 65,679)
[Job abc123] 0 fetched → pagination complete ✅
Final: 65,679 unique users processed (matching SQL query result)
Key Lessons Learned
1. Cursor Must Always Reflect Database State, Not Filtered State
Your cursor should advance through the actual dataset, not your application's filtered view of it. The cursor is a database concept, not an application concept.
// ❌ Wrong: Cursor based on application logic
lastSeq = filteredResults[filteredResults.length - 1].seq;
// ✅ Correct: Cursor based on database scan position
lastSeq = originalResults[originalResults.length - 1].seq;
2. If You Must Filter In-Memory, Capture Cursor First
When combining cursor pagination with post-query filtering:
const originalBatch = await query.getMany();
// Assumes originalBatch is non-empty; break out of the loop before this point if it isn't.
const cursorValue = originalBatch[originalBatch.length - 1].cursorField;

// Now safe to mutate or filter
const filtered = applyBusinessLogic(originalBatch);

// Use original cursor, not filtered cursor
nextCursor = cursorValue;
3. Better: Push Filtering Into SQL Whenever Possible
In-memory filtering was a code smell. I later refactored to handle date parsing in SQL:
queryBuilder
  .where('user.seq > :lastSeq', { lastSeq })
  .andWhere('user.logindate IS NOT NULL')
  .andWhere(`
    DATEDIFF(DAY,
      CONVERT(DATE, LEFT(user.logindate, 10)),
      GETDATE()
    ) BETWEEN :minDays AND :maxDays
  `, { minDays: 10, maxDays: 30 });
This eliminated in-memory filtering entirely, improved performance further, and made the code more maintainable.
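As an aside, the snippet above uses SQL Server date functions (DATEDIFF, GETDATE, CONVERT). If you're on PostgreSQL (the earlier benchmark numbers came from a PostgreSQL instance), a roughly equivalent filter might look like this, assuming logindate always starts with an ISO YYYY-MM-DD prefix:

```typescript
// Sketch: PostgreSQL-flavored equivalent of the date filter above.
queryBuilder
  .where('user.seq > :lastSeq', { lastSeq })
  .andWhere('user.logindate IS NOT NULL')
  .andWhere(
    `(CURRENT_DATE - LEFT(user.logindate, 10)::date) BETWEEN :minDays AND :maxDays`,
    { minDays: 10, maxDays: 30 }
  );
```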
4. Add Defensive Pagination Limits
To catch infinite loop bugs early in development:
const MAX_ITERATIONS = 1000;
let iterations = 0;

while (true) {
  if (++iterations > MAX_ITERATIONS) {
    throw new Error(
      `Pagination exceeded ${MAX_ITERATIONS} iterations. Possible infinite loop!`
    );
  }
  // ... pagination logic
}
5. Monitor Cumulative Metrics, Not Just Per-Page Metrics
Track total records processed vs. expected:
console.log(`Processed: ${totalProcessed} | Expected: ~${expectedTotal}`);
if (totalProcessed > expectedTotal * 1.5) {
  logger.warn('Processing significantly more records than expected - investigate for duplicates');
}
Implementation Best Practices
1. Always Use a Unique, Sequential Column
I used seq (a sequential integer column) rather than id (UUID) because:
- Sequential integers are index-friendly
- Natural ordering enables efficient range scans
- UUIDs lack sequential properties
import { Column, Entity, Index, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
export class User {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column()
  @Index() // Critical for performance!
  seq: number; // Auto-incrementing sequence

  @Column()
  push_token: string;
}
2. Handle the Edge Case: First Page
Notice my code handles the optional cursor:
if (lastSeq) {
  queryBuilder.where('user.seq > :lastSeq', { lastSeq });
}
When lastSeq is null or undefined, skip the WHERE clause to fetch from the beginning.
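One small caveat: a plain truthiness check also skips the WHERE clause when lastSeq is 0. If 0 can be a legitimate seq value in your schema, an explicit null check is safer (a minor variation, not the original code):

```typescript
// Variation: treat only null/undefined as "first page", so a seq of 0 still works.
if (lastSeq != null) {
  queryBuilder.where('user.seq > :lastSeq', { lastSeq });
}
```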
3. Fetch N+1 Items to Detect hasMore
Request limit + 1 items:
.limit(limit + 1); // Request one extra
Then check if you got more than requested:
const hasMore = users.length > limit;
if (hasMore) {
  users.pop(); // Remove the extra item
}
This tells us whether there's a next page without a separate COUNT query.
4. Return Structured Results
Our refactored method returns both data and pagination state:
return {
  users, // The actual data
  nextCursor: hasMore ? users[users.length - 1].seq : null // Next page cursor
};
This makes the consuming code clean and the pagination state explicit.
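To make that pagination state explicit at the type level too, the result shape can be given a name. A small sketch (the interface name is mine, not from the original code; User is the entity shown above):

```typescript
// Sketch: an explicit type for the structured page result.
interface TokenPage {
  users: User[];              // the actual data for this batch
  nextCursor: number | null;  // seq to resume from, or null when done
}
```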
5. Process in a Simple Loop
The cursor-based loop is elegantly simple:
let cursor: number | null = null;
let processedCount = 0;

do {
  const result = await getUserTokensForNotification(cursor, BATCH_SIZE);

  // Process this batch
  await sendNotifications(result.users);
  processedCount += result.users.length;

  // Move cursor forward
  cursor = result.nextCursor;

  this.logger.log(`Processed ${processedCount} users so far...`);
} while (cursor !== null);

this.logger.log(`Completed: ${processedCount} total users processed`);
Performance Results
After deploying cursor pagination to production:
| Metric | Before (Offset) | After (Cursor) | Improvement |
|---|---|---|---|
| First page | 0.025ms | 0.025ms | Same |
| Page 50 (50K offset) | 16.933ms | 0.027ms | 627x faster |
| Page 100 (100K offset) | 30.166ms | 0.027ms | 1,117x faster |
| End-to-end (1M users) | ~45 minutes | ~2 minutes | 22x faster |
| Database CPU usage | 85% peak | 12% peak | 7x reduction |
| Infinite loop bug caught | N/A | Prevented | Avoided disaster |
The real-world impact was dramatic. Our daily batch notification job that previously took 45 minutes now completes in under 2 minutes. And by catching the infinite loop bug before production, we avoided what could have been a severe incident.
Trade-offs and When to Use Each Approach
Use Cursor Pagination When:
- ✅ Dataset is large (>100K records)
- ✅ Processing sequentially (batch jobs, exports)
- ✅ Performance is critical
- ✅ Real-time feeds or infinite scroll
- ✅ High write volume (avoids duplicate/skip issues)
- ✅ Need stable iteration (no duplicates/skips from concurrent writes)
Use Offset Pagination When:
- ✅ Small datasets (<10K records)
- ✅ Need page numbers or jump-to-page functionality
- ✅ Admin interfaces where UX expectations include page numbers
- ✅ Read-only or rarely-changing data
- ✅ Sorting by non-unique, non-indexed columns
Conclusion
Switching from offset to cursor pagination in my Firebase push notification server was one of the highest-impact refactorings I've made. The 1,000x+ performance improvement on deep pages and 22x faster end-to-end processing came from understanding a fundamental database principle: indexed seeks are orders of magnitude faster than sequential scans.
But the journey also taught me a critical lesson about cursor integrity: your cursor must always reflect the database's position, not your application's filtered view. Mixing cursor pagination with in-memory filtering nearly shipped an infinite loop to production that would have cost us thousands in unnecessary database load and failed notification deliveries.
For any system processing large datasets—whether sending mass notifications, exporting data, or building infinite scroll feeds—cursor-based pagination isn't just an optimization; it's a necessity for scalability. Just remember: capture your cursor before any filtering, push logic into SQL when possible, and always monitor for anomalous iteration counts.
In Part 2 of this series, I'll explore moving from cell-phone-based to push-token-based deduplication, and why understanding Firebase token lifecycle is critical for notification systems.
Key Takeaways
- Offset pagination degrades linearly with dataset size (O(n) where n = offset)
- Cursor pagination maintains constant performance via indexed seeks (O(log n))
- Always index your cursor column and use sequential/unique values
- Capture cursor position BEFORE any in-memory filtering to avoid infinite loops
- Monitor cumulative metrics to detect pagination bugs early
- Fetch N+1 items to detect hasMore without extra COUNT queries
- For batch processing at scale, cursor pagination is essential