A year ago, an early version of Nudge would send a flurry of notifications a day to a busy user. Today it sends a small handful, timed to moments you're likely to act. Nothing about the product got less ambitious. The task system got larger, the AI features got better, the user base grew. What changed is that we stopped measuring the wrong thing.
The wrong thing is how many useful prompts an AI can generate. The right thing is how many prompts a real person can absorb before they start ignoring the app. Those numbers are not close.
The temptation
Any AI that can reason about your tasks can generate an impressive number of "helpful" things to tell you. It can remind you of deadlines. Suggest subtasks. Flag conflicts. Offer to reschedule. Propose better wording. Point out that you've been snoozing this one task for three weeks. Celebrate streaks.
Each of these individually is defensible. Stacked together they make a phone that buzzes every 40 minutes. That is the thing users uninstall.
A naive implementation of an AI productivity assistant looks something like: every time a new task is added, the AI evaluates it, proposes a plan, schedules reminders for relevant checkpoints, and sends a nudge when each checkpoint arrives. Add 10 tasks in a week and you have 40 to 60 nudges queued up, spread across the week. The first week feels smart. The second feels oppressive. The third week the notification badge stops being read.
We know this because we built roughly that version first.
The scoring problem
The hard part was figuring out how to decide what not to send. An LLM is excellent at producing candidate nudges and mediocre at ranking them against each other. Everything looks plausible in isolation, which is the same problem humans have when they write their own to-do lists.
The approach we landed on is a two-stage filter. Stage one is a fast scorer that runs on every candidate nudge. It asks four questions:
Is this nudge redundant? If we sent something similar in the last 90 minutes, drop it. If the user has completed a related task recently, drop it.
Does the user have cognitive budget right now? A rolling seven-day model of when the user actually acts on notifications feeds this. If we're in a predicted low-availability window, delay or drop.
Is the task actually near its action moment? A deadline in four days with no fixed time does not need a nudge today. A deadline tomorrow at 5 PM might.
Would another nudge for the same task undermine the first? We track a per-task daily cap. The default is two. Above that the scheduler refuses to add more, no matter how smart the reasoning path.
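The four questions above can be sketched as a single function. This is a minimal illustration, not Nudge's actual scorer: the 90-minute window and the per-task cap of two come from the text, while the `Candidate` and `UserState` shapes, the `topic` similarity key, the 36-hour action horizon, and the 0.3 availability threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REDUNDANCY_WINDOW = timedelta(minutes=90)  # from the text
PER_TASK_DAILY_CAP = 2                     # default from the text
ACTION_HORIZON = timedelta(hours=36)       # assumption: how close "near" is
MIN_AVAILABILITY = 0.3                     # assumption: cutoff for the availability model


@dataclass
class Candidate:
    task_id: str
    topic: str                    # hypothetical key for "something similar"
    deadline: Optional[datetime]  # None means no deadline set


@dataclass
class UserState:
    last_sent_by_topic: dict         # topic -> datetime of last nudge
    sent_today_by_task: dict         # task_id -> nudges sent today
    recently_completed_topics: set   # topics of recently completed tasks
    availability: float              # 0..1, output of the rolling seven-day model


def stage_one(c: Candidate, user: UserState, now: datetime) -> str:
    """Return 'send', 'delay', or 'drop' for one candidate nudge."""
    # Hard drops first: redundancy and the per-task daily cap.
    last = user.last_sent_by_topic.get(c.topic)
    if last is not None and now - last < REDUNDANCY_WINDOW:
        return "drop"
    if c.topic in user.recently_completed_topics:
        return "drop"
    if user.sent_today_by_task.get(c.task_id, 0) >= PER_TASK_DAILY_CAP:
        return "drop"
    # Soft checks: not yet near the action moment, or a low-availability window.
    # Treating a missing deadline as "delay" is an assumption of this sketch.
    if c.deadline is None or c.deadline - now > ACTION_HORIZON:
        return "delay"
    if user.availability < MIN_AVAILABILITY:
        return "delay"
    return "send"
```

Checking the hard drops before the soft delays is a design choice in this sketch: a nudge that is both over the cap and badly timed gets dropped instead of being re-queued for later.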
Stage two is a global cooldown. Even if stage one approves, a nudge does not fire if the user received any other nudge in the previous 15 minutes. That single rule produced the biggest drop in daily nudge counts of anything we shipped. It also forced us to get smarter about which nudges made it through the filter, because we could no longer blast low-signal reminders and rely on volume.
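The cooldown rule is simple enough to sketch in a few lines. The 15-minute window is from the text. The function name, the `(fire_at, nudge_id)` tuple shape, and the choice to drop suppressed nudges outright (as opposed to delaying them) are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

GLOBAL_COOLDOWN = timedelta(minutes=15)  # from the text


def apply_cooldown(
    approved: List[Tuple[datetime, str]],
    last_nudge_at: Optional[datetime] = None,
) -> List[str]:
    """Given stage-one-approved (fire_at, nudge_id) pairs, return the IDs that fire.

    A nudge fires only if no other nudge fired within the cooldown window
    before it. Suppressed nudges are simply skipped in this sketch.
    """
    fired = []
    for fire_at, nudge_id in sorted(approved):
        if last_nudge_at is None or fire_at - last_nudge_at >= GLOBAL_COOLDOWN:
            fired.append(nudge_id)
            last_nudge_at = fire_at  # only nudges that actually fired reset the clock
    return fired
```

Note that a suppressed nudge does not reset the clock here. Only delivered nudges count against the window, which matches the "received any other nudge" wording.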
AI restraint is harder than AI capability
The cultural problem inside any team building with LLMs is that producing output is cheap and restraining output is expensive. A junior engineer can wire up "the LLM suggests three improvements to your task" in an afternoon. Deciding when not to show those suggestions requires telemetry, user research, ranking models, and a willingness to ship fewer features. That work isn't demoable. It doesn't look like progress in a screenshot.
We shipped the restraint work because of a specific metric: the Quiet Score. Every nudge we send has a follow-up signal within 24 hours. Did the user act on it? Did they ignore it? Did they snooze? Did a later nudge for the same task also fail? A high-quality delivery system is one where most nudges produce action, and most of the rest produce nothing (which is fine). A bad system is one where most nudges produce a dismissal, because dismissals cost trust.
When we started plotting Quiet Score against notification volume, the shape was obvious. At 8 notifications a day the score was terrible. At 3 it was reasonable. At 1 it was good, but coverage suffered: some real deadlines slipped through. Three turned out to be the local optimum.
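One way to turn those follow-up signals into a single number is a weighted average over outcomes. To be clear, this is a hypothetical formula, not the actual Quiet Score definition: the outcome labels and the weights below are assumptions chosen only to reflect the idea that action is good, silence is neutral, and dismissals cost trust.

```python
from collections import Counter
from typing import Sequence

# Assumed weights; the real scoring is not specified in this post.
WEIGHTS = {
    "acted": 1.0,       # the user did the thing
    "ignored": 0.0,     # nothing happened, which is fine
    "snoozed": -0.25,   # mild cost: the timing was off
    "dismissed": -1.0,  # costs trust
}


def quiet_score(outcomes: Sequence[str]) -> float:
    """Average outcome weight over the nudges sent in some period (range -1..1)."""
    if not outcomes:
        return 0.0
    counts = Counter(outcomes)
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = sum(WEIGHTS[label] * n for label, n in counts.items())
    return total / len(outcomes)
```

Under these assumed weights, a day with two acted-on nudges, one ignored, and one dismissed scores 0.25. Computing it per user per day makes it easy to plot against daily notification volume.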
What got cut
A few concrete things we pulled out of the system along the way.
Morning summaries. An early version sent a "here's your day" notification every morning. Users liked the idea in interviews and ignored the notifications in practice. We kept the feature but moved it to the in-app home screen instead of a push. Same information. Zero notification cost.
Streak reminders. "You've completed 5 tasks in a row!" is the kind of notification that feels celebratory for about three days and annoying forever. Cut.
Suggestion notifications. "We noticed you snoozed this task twice. Want to break it into subtasks?" Good intent, bad timing. These now live as a gentle card on the task detail screen. The user sees it when they are already looking at the task, which is a context in which the suggestion is welcome.
Automatic re-scheduling pings. If Nudge's planner reshuffled a task to a new day, it used to send a notification. Users found this confusing and mildly alarming. Now the planner just does its job quietly and shows the change on the home screen.
The pattern in all of these is the same. The information is still available to the user, but it has moved to pull instead of push. A push notification is a claim on the user's attention. It should be reserved for things that specifically need the user to act now.
The engineering side
A few implementation notes, if you're building something similar.
Our nudge scheduler uses Celery with ETA-based tasks for precise timing, backed by a Redis queue and a Postgres audit trail. Every nudge has a provenance field (AI_INFERRED vs USER_EXPLICIT) that gates how aggressively we're allowed to cancel or reschedule it. We never auto-cancel a user-explicit reminder for volume reasons. That rule is load-bearing for trust. If you set a reminder for your mom's birthday, we're going to deliver it no matter what the model thinks.
The ranking and cancellation logic lives in a daily planner that runs once per user per day. It's allowed to cancel stale AI-inferred nudges (up to 50 per pass) and re-queue fresh ones. It's forbidden from touching anything the user asked for. That boundary is more important than the ranking model.
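The boundary described above can be sketched as a pure selection function. The `AI_INFERRED` and `USER_EXPLICIT` names and the cap of 50 come from the text. The `QueuedNudge` shape, the `stale` flag, and the function name are assumptions for illustration. The sketch is also stdlib-only: it returns the IDs to cancel and leaves the actual cancellation to the task queue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Provenance(Enum):
    AI_INFERRED = "AI_INFERRED"
    USER_EXPLICIT = "USER_EXPLICIT"


MAX_CANCELS_PER_PASS = 50  # from the text


@dataclass
class QueuedNudge:
    nudge_id: str
    provenance: Provenance
    stale: bool  # hypothetical flag set by whatever ranking logic runs upstream


def plan_cancellations(queue: Iterable[QueuedNudge]) -> List[str]:
    """Pick the nudge IDs the daily planner may cancel in one pass.

    Only stale AI-inferred nudges are eligible. User-explicit reminders
    are never selected, regardless of volume or staleness.
    """
    to_cancel: List[str] = []
    for nudge in queue:
        if len(to_cancel) >= MAX_CANCELS_PER_PASS:
            break  # hard cap per pass
        if nudge.provenance is Provenance.AI_INFERRED and nudge.stale:
            to_cancel.append(nudge.nudge_id)
    return to_cancel
```

Keeping the provenance check inside the selection function, and not in the ranking model, means a ranking bug cannot cause a user-set reminder to be cancelled. In a Celery setup, the returned IDs would presumably then be handed to something like `app.control.revoke`.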
The point
When we describe Nudge to people, the most common response is surprise that the AI doesn't do more. Where's the daily coaching? Why doesn't it analyze my productivity? Can't it send me encouragement?
The short answer is that it could, and it makes the product worse when it does. Attention is a finite budget. Every notification is a withdrawal. A lot of AI productivity tools lean heavily on volume, hoping that the user treats the first few as deposits toward a future where the app is indispensable. In practice users treat them as the reason they're going to silence the app.
Building AI that knows when to shut up is less exciting than building AI that has a lot to say. It is also, in our experience, what gets people to keep using the product past week two.
Nudge is built on the premise that fewer, better-timed nudges outperform more, slightly-off ones. Free on iPhone and web.



