On-Call for Small Engineering Teams: A Practical Guide
Running on-call with a small engineering team is a unique challenge. The standard advice—"have at least 6 people in your rotation"—doesn't help when your entire engineering team is 4 people. Yet your systems still need monitoring, your customers still expect reliability, and incidents still happen at 2 AM.
This guide is for the scrappy teams making it work with limited resources. Here's how to build sustainable on-call when you don't have enterprise headcount.
The Small Team Reality
Let's be honest about the constraints small teams face:
Limited People
With 3-5 engineers, everyone is in the rotation. There's no "let someone else handle it."
Broad Responsibilities
The same person who wrote the feature might also be managing infrastructure, handling customer support, and now responding to incidents.
Budget Constraints
Enterprise on-call tools at $20-30/user/month feel excessive when that money could go toward another service or tool.
No Dedicated SRE
You don't have a Site Reliability team. Reliability is everyone's job.
Despite these constraints, you can absolutely build a sustainable on-call program. It just requires smart choices about what matters.
Essential vs. Nice-to-Have for Small Teams
Essential (Do This First)
- Basic monitoring - Know when things break
- Alert routing - Get notifications to the right person
- Clear escalation - What to do when stuck
- Simple rotation - Who's responsible when
Nice-to-Have (Add Later)
- Automated runbooks
- Multiple escalation tiers
- Sophisticated alert correlation
- Phone/SMS paging (can wait as long as Slack push notifications reliably reach you)
Don't let perfect be the enemy of good. Start simple and iterate.
On-Call Patterns for Small Teams
Pattern 1: Simple Weekly Rotation (3-5 people)
The most straightforward approach:
Week 1: Alice
Week 2: Bob
Week 3: Carol
(repeat)
Make it work:
- Shift starts Monday morning, ends Sunday night
- Primary person handles everything unless truly stuck
- If stuck, message the team Slack channel for help
- No formal secondary—anyone available can jump in
With OnCallManager: Set up a single rotation with all team members. The app handles the scheduling, notifications, and visibility automatically.
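If you want to see the arithmetic behind a weekly rotation, here is a minimal sketch. The roster, the start date, and the function name `on_call_for` are all made up for illustration, and it assumes shifts change over on Monday. It is not how OnCallManager or any other tool implements scheduling.

```python
from datetime import date

# Hypothetical roster and anchor date; replace with your own.
ROSTER = ["Alice", "Bob", "Carol"]
ROTATION_START = date(2024, 1, 1)  # a Monday: first day of Alice's shift


def on_call_for(day: date, roster=ROSTER, start=ROTATION_START) -> str:
    """Return who is on call on `day` in a Monday-to-Sunday weekly rotation."""
    if day < start:
        raise ValueError("day is before the rotation started")
    weeks_elapsed = (day - start).days // 7
    # Wrap around the roster so the schedule repeats indefinitely.
    return roster[weeks_elapsed % len(roster)]


print(on_call_for(date(2024, 1, 10)))  # Bob (second week of the rotation)
```

Adding a fourth engineer is just one more name in the list; the modulo takes care of the repeat.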
Pattern 2: Business Hours + Overnight Split (2-3 people)
When you're really small, consider splitting day and night:
Business Hours (9 AM - 9 PM): On-call rotation
Overnight (9 PM - 9 AM): Critical alerts only → page all engineers
Rationale: Most incidents happen during usage peaks. Overnight, you only page for true emergencies affecting production.
Implementation:
- Configure monitoring to have higher thresholds overnight
- Only P1 (full outage) alerts page after hours
- P2/P3 can wait until morning
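The routing rule above fits in a few lines. This is only a sketch: the function name `route_alert` and the returned labels are hypothetical, and it assumes every alert during business hours goes to whoever is on call. In practice you would express the same rule in your monitoring or paging tool's own configuration.

```python
from datetime import time

BUSINESS_START = time(9, 0)   # 9 AM
BUSINESS_END = time(21, 0)    # 9 PM


def route_alert(severity: str, now: time) -> str:
    """Decide what to do with an alert based on severity and time of day."""
    in_business_hours = BUSINESS_START <= now < BUSINESS_END
    if in_business_hours:
        # Assumption: during the day, the on-call person handles everything.
        return "page_on_call"
    if severity == "P1":
        # Overnight, only a full outage wakes people up, and it wakes everyone.
        return "page_all_engineers"
    # P2/P3 overnight: hold until the next morning.
    return "queue_until_morning"


print(route_alert("P2", time(23, 30)))  # queue_until_morning
```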
Pattern 3: Shared On-Call (2 people)
With only 2 engineers, you're both effectively always on-call. Make it explicit:
Week 1: Primary: Alice, Backup: Bob
Week 2: Primary: Bob, Backup: Alice
Rules:
- Primary gets paged first
- Primary can hand off to backup if unavailable (dinner, gym, etc.)
- Both check Slack regularly during on-call weeks
- Either can acknowledge and handle alerts
Pattern 4: Founder-Included Rotation
At very early-stage startups, founders should be in the on-call rotation:
Week 1: CTO
Week 2: Engineer 1
Week 3: Engineer 2
Week 4: CTO
Why this works:
- Founders feel the pain of incidents directly
- Keeps founding team connected to technical reality
- Signals that on-call is important, not a dumping ground
- Provides backup when team is small
As the team grows, founders can rotate out.
Reducing Alert Volume (Critical for Small Teams)
With limited people, every unnecessary alert is a bigger burden. Ruthlessly optimize:
Delete Useless Alerts
Ask for each alert: "What action do I take when this fires?" If the answer is "nothing" or "check it tomorrow," delete or downgrade the alert.
Increase Thresholds
If 80% CPU pages you but requires no action, raise the threshold to 95%. Only alert on actionable conditions.
Consolidate Alerts
Instead of 10 alerts for related symptoms, create one alert for the actual problem. "Database slow" beats "query 1 slow + query 2 slow + connection slow..."
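As a rough illustration of consolidation, the sketch below folds symptom alerts into the underlying problem before anyone gets notified. The alert names and the `SYMPTOM_TO_CAUSE` mapping are invented for this example. Many alerting tools offer grouping natively (Alertmanager, for instance, has a `group_by` setting on routes), so treat this as a picture of the idea rather than something to deploy.

```python
from collections import defaultdict

# Hypothetical mapping from symptom alerts to the problem they point at.
SYMPTOM_TO_CAUSE = {
    "query_1_slow": "database_slow",
    "query_2_slow": "database_slow",
    "connection_slow": "database_slow",
}


def consolidate(firing_alerts):
    """Group firing alerts under one notification per underlying problem."""
    grouped = defaultdict(list)
    for alert in firing_alerts:
        # Alerts with no known cause stand on their own.
        cause = SYMPTOM_TO_CAUSE.get(alert, alert)
        grouped[cause].append(alert)
    return dict(grouped)


notifications = consolidate(["query_1_slow", "connection_slow", "disk_full"])
print(len(notifications))  # 2 notifications instead of 3 separate pages
```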
Schedule Non-Urgent Alerts
P3/P4 issues don't need to page overnight. Queue them for business hours.
Target: Fewer than one actionable alert per on-call shift (yes, really). Investigate every page and either fix the underlying issue or adjust the alert.
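One lightweight way to run this review is to keep a log of every page and whether it led to any action. The sketch below assumes such a log exists as a list of `(alert_name, action_taken)` pairs; the data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical page log: one (alert_name, action_taken) entry per page.
pages = [
    ("high_cpu", False),
    ("high_cpu", False),
    ("db_slow", True),
    ("disk_full", True),
    ("high_cpu", False),
]


def never_actionable(pages):
    """Return alerts that fired but never led to action (delete or downgrade)."""
    acted = defaultdict(bool)
    for name, action_taken in pages:
        acted[name] = acted[name] or action_taken
    return sorted(name for name, was_acted in acted.items() if not was_acted)


def actionable_per_shift(pages, shifts):
    """Average number of pages per shift that actually required action."""
    return sum(1 for _, action_taken in pages if action_taken) / shifts


print(never_actionable(pages))         # ['high_cpu']
print(actionable_per_shift(pages, 4))  # 0.5
```

Here `high_cpu` paged three times without anyone doing anything, which makes it the first candidate for a higher threshold or deletion.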
Tools for Budget-Conscious Teams
Free and Low-Cost Monitoring
- UptimeRobot - Free basic uptime monitoring
- Prometheus + Alertmanager - Free, self-hosted
- Grafana Cloud Free Tier - Limited but useful
- PagerDuty Free Tier - Up to 5 users with limitations
Affordable On-Call Management
- OnCallManager - $50/month flat, unlimited users
- Perfect for small teams: no per-user cost that grows
- Slack-native, where you already work
- Simple rotation management without enterprise complexity
Slack as Command Center
Your team likely already pays for Slack. Use it as your incident hub:
- Alert notifications to channels
- On-call rotation visibility (with OnCallManager)
- Incident coordination in threads
- Handoff notes in team channel
Managing Burnout with Limited People
Small teams are at higher burnout risk because the same people handle every shift. Protect your team:
Set Boundaries
Even when on-call, define response expectations:
- P1: Respond within 15 minutes
- P2: Respond within 1 hour
- P3: Respond next business day
Not everything requires dropping dinner immediately.
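Writing these expectations down as data makes them easy to share and hard to argue about at 2 AM. The sketch below encodes the three levels above. Two things in it are assumptions, not part of the policy: "next business day" is read as the next weekday (holidays ignored), and the hypothetical `BUSINESS_DAY_END` of 6 PM is used as the cutoff.

```python
from datetime import date, datetime, time, timedelta

RESPONSE_WINDOWS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
}
BUSINESS_DAY_END = time(18, 0)  # assumption: a P3 is due by 6 PM


def next_business_day(d: date) -> date:
    """Next weekday after `d` (holidays are not considered)."""
    nxt = d + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def respond_by(severity: str, fired_at: datetime) -> datetime:
    """Return the latest acceptable response time for an alert."""
    if severity in RESPONSE_WINDOWS:
        return fired_at + RESPONSE_WINDOWS[severity]
    if severity == "P3":
        return datetime.combine(next_business_day(fired_at.date()), BUSINESS_DAY_END)
    raise ValueError(f"unknown severity: {severity}")


# A P3 that fires on a Friday afternoon can wait until Monday.
print(respond_by("P3", datetime(2024, 1, 5, 14, 0)))  # 2024-01-08 18:00:00
```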
Comp Time Matters
If someone is up at 3 AM, don't expect a full workday tomorrow. Have an explicit policy:
- Major overnight incident → Morning off
- Weekend incident work → Comp time during week
Rotate Fairly
With 3 people, being on-call every third week is already a lot. Track the load:
- Are incidents evenly distributed?
- Is one person getting all the 3 AM calls?
- Adjust rotation starting days to redistribute bad timing.
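Answering those questions does not need a dashboard. A short script over your incident log is enough, as in this sketch. The log entries are invented, and "overnight" is assumed to mean 9 PM to 9 AM, matching the split used earlier in this guide.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log: who was paged and when.
pages = [
    ("Alice", datetime(2024, 1, 3, 3, 10)),
    ("Bob", datetime(2024, 1, 10, 14, 0)),
    ("Alice", datetime(2024, 1, 24, 2, 45)),
    ("Carol", datetime(2024, 1, 17, 23, 30)),
    ("Alice", datetime(2024, 1, 25, 11, 0)),
]


def is_overnight(ts: datetime) -> bool:
    """True for pages between 9 PM and 9 AM (assumed overnight window)."""
    return ts.hour >= 21 or ts.hour < 9


def load_report(pages):
    """Count total and overnight pages per person."""
    total = Counter(person for person, _ in pages)
    overnight = Counter(person for person, ts in pages if is_overnight(ts))
    return {p: {"total": total[p], "overnight": overnight[p]} for p in sorted(total)}


for person, counts in load_report(pages).items():
    print(person, counts)
```

In this made-up data Alice takes three of five pages and two of the three overnight ones, which is the kind of imbalance worth raising at the next rotation review.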
Invest in Prevention
The best alert is the one that never fires. Spend engineering time on:
- Fixing flaky systems
- Improving error handling
- Adding graceful degradation
- Building better monitoring
As a rough rule of thumb, an hour spent preventing incidents saves several hours of on-call burden later.
Growing Your On-Call Program
As your team grows, your on-call program should evolve:
3 Engineers → 5 Engineers
- Maintain simple weekly rotation
- On-call frequency improves from every 3 weeks to every 5 weeks
- Consider adding secondary on-call for backup
5 Engineers → 10 Engineers
- May split into two rotations by domain (frontend/backend, or by service)
- Introduce more formal escalation paths
- Start tracking on-call metrics
10+ Engineers
- You're no longer a "small team" for on-call purposes
- Consider dedicated SRE hires
- Evaluate more sophisticated tools if needed
Common Small Team Mistakes
Over-Engineering Too Early
You don't need PagerDuty's full feature set with 3 engineers. Start simple.
Copying Enterprise Playbooks
What works for a 100-person eng org won't work for you. Adapt, don't adopt blindly.
Not Having On-Call at All
"We're too small for on-call" means "we don't know when things break." Bad plan.
Making On-Call Punitive
On-call shouldn't be assigned to whoever last broke prod. It should be a shared responsibility.
Ignoring Alert Fatigue
Small teams can't afford alert fatigue. One burned-out engineer is 33% of your team (if you have 3).
Setting Up On-Call This Week
Here's a practical plan to get started:
Day 1: Choose Your Pattern
- Pick a rotation pattern from above that fits your team size
- Decide on shift length (weekly is usually best)
- Write down your escalation path (who to call when stuck)
Day 2: Set Up Basic Monitoring
- Ensure you have uptime monitoring for critical endpoints
- Configure alerts to go to a Slack channel
- Set up OnCallManager for rotation visibility
Day 3: Create Your First Rotation
- Add team members to OnCallManager
- Set the rotation schedule
- Test that notifications work
Day 4: Document Expectations
- Write a short on-call guide (1 page max)
- Define severity levels and response times
- Share with the team
Day 5: Go Live
- Start the rotation
- Commit to reviewing after 2 weeks
- Iterate based on real experience
The OnCallManager Advantage for Small Teams
We built OnCallManager with small teams in mind:
Flat Pricing: $50/month regardless of team size. No per-user costs that grow as you hire.
Slack-Native: No new tool to learn. Manage rotations where you already work.
Simple Setup: Create a rotation in minutes, not hours. No enterprise configuration required.
Just What You Need: Rotation management without incident lifecycle features you don't need yet.
14-Day Free Trial: Try it without commitment. No credit card required.
For a 5-person team, per-user tools at $20-30/user/month cost $100-150/month, so OnCallManager's $50 flat fee is roughly 50-67% cheaper while providing the core rotation management you need. The gap widens with every hire.
Conclusion
On-call for small teams isn't about having less on-call—it's about having smarter on-call. With limited resources, every decision matters more:
- Choose simple patterns that match your team size
- Ruthlessly reduce alert volume
- Protect your team from burnout
- Use tools that don't punish you for growing
Small teams can absolutely provide reliable service and maintain quality of life. It just takes intentional design and the right tools.
Ready to simplify on-call for your small team? Add OnCallManager to Slack and get started with a 14-day free trial. Flat pricing means it works for your team today and as you grow.