Web application monitoring is the nervous system for our software—continuously listening, collecting signals, and analyzing them to catch problems before they become outages. A one-hour outage can cost millions. Amazon lost an estimated USD 34 million in 2021, while Meta lost close to USD 100 million in a similar incident. Effective monitoring moves our team from reactive firefighting to proactive fire prevention.
The Four Pillars of Telemetry
Modern applications are complex, distributed systems with dozens of moving parts. Without visibility into what’s happening inside, we’re flying blind. Monitoring solves this by collecting four types of telemetry data:
1. Metrics: The Vital Signs
Metrics are numeric measurements taken at regular intervals—response time, error rate, CPU usage, and throughput. They’re cheap to store and fast to query, making them perfect for dashboards and alerts.
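As a concrete sketch, Rails already publishes the raw numbers through ActiveSupport::Notifications; the initializer below turns each completed request into response-time, throughput, and error measurements. It simply logs the values, standing in for whatever metrics client (StatsD, Prometheus, OpenTelemetry) a real pipeline would forward them to:
# config/initializers/request_metrics.rb
# Sketch only: convert Rails' built-in "process_action.action_controller"
# events into numeric measurements. Logging stands in for a metrics client.
ActiveSupport::Notifications.subscribe("process_action.action_controller") do |_name, start, finish, _id, payload|
  Rails.logger.info(
    metric: "http.server.request",
    controller: payload[:controller],
    action: payload[:action],
    status: payload[:status],
    duration_ms: ((finish - start) * 1000).round(1),  # response time in ms
    error: payload[:exception].present?               # true when the action raised
  )
end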
2. Logs: The Detailed Narrative
While metrics tell us what happened, logs tell us why. When a 500 Internal Server Error occurs, the log entry provides the exact error message, stack trace, and crucial context like user ID or request parameters.
3. Traces: The Journey of a Request
A trace maps a request’s entire journey from the user’s browser, through backend APIs, to the database, and back. If a page loads slowly, a trace might reveal that 90% of the request time was spent waiting for a third-party authentication service.
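Once a tracer is configured (we wire up OpenTelemetry for Rails later in this post), wrapping a suspect call in a custom span takes only a few lines. The span, attribute, and client names below are made up for illustration:
# Assumes OpenTelemetry is already configured (see the Rails setup below).
tracer = OpenTelemetry.tracer_provider.tracer("checkout")
tracer.in_span("auth_provider.verify_token") do |span|
  span.set_attribute("auth.provider", "example-sso")  # illustrative attribute
  AuthProviderClient.verify!(token)                   # hypothetical external call
end
If that span dominates the trace timeline, we have found our bottleneck.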
4. Real User Monitoring: The User’s Perspective
Real User Monitoring (RUM) collects performance data directly from actual users’ browsers, giving us ground truth on load times, JavaScript errors, and how performance varies by browser, device, or location.
Monitoring Architecture: From Code to Dashboard
A modern monitoring setup captures telemetry at every step of a user interaction:
- Instrumentation: Add monitoring agents to our React frontend, Rails API, and database servers to capture metrics, logs, and traces.
- Aggregation: Send raw data to a central collector that standardizes and batches it.
- Storage: Index and store data in specialized time-series databases.
- Visualization: Build dashboards with tools like Grafana and set up alerting rules.
Setting Performance Goals with SLOs
Collecting telemetry data is just the start. We need to define what “good” looks like using Service Level Objectives (SLOs).
The SLI/SLO Framework
- Service Level Indicator (SLI): A quantifiable measurement like request latency or error rate.
- Service Level Objective (SLO): The target we set for an SLI over time—our internal quality promise.
Good SLIs measure what users actually care about:
- Availability: Percentage of requests that finish without errors (e.g., successful HTTP 200 responses).
- Latency: Percentage of requests completed faster than a threshold (e.g., 500 milliseconds).
Example: API Endpoint SLO
For a critical endpoint /api/v1/products, we might set:
- Latency SLO: 99% of requests served in under 300ms
- Availability SLO: 99.9% of requests successful
This gives us a clear, measurable definition of “good enough.”
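A useful way to reason about such targets is an error budget: the number of failures the SLO permits over a window. A quick back-of-the-envelope calculation (the traffic figures are hypothetical):
# Error-budget math for the 99.9% availability SLO over a 30-day window.
total_requests   = 10_000_000                                  # hypothetical monthly traffic
slo_target       = 0.999
error_budget     = (total_requests * (1 - slo_target)).round   # => 10,000 allowed failures
failed_requests  = 7_200                                       # observed so far this window
remaining_budget = error_budget - failed_requests              # => 2,800 failures left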
If performance dips below these targets, it's time to pause feature work and focus on reliability. For expert help identifying the indicators that matter most for your application, Ruby on Rails performance services can assist.
Practical Setup for Rails and React
Let’s implement monitoring on a real-world stack: Rails API, React frontend, and PostgreSQL database.
Rails Backend
Use the opentelemetry-sdk gem together with the OpenTelemetry instrumentation gems (opentelemetry-instrumentation-all pulls in all of them) to automatically instrument ActiveRecord, Rack, and Faraday.
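Assuming those two gems are in the Gemfile, a minimal initializer looks roughly like this (the service name is a placeholder; exporter endpoints are typically supplied via the standard OTEL_* environment variables):
# config/initializers/opentelemetry.rb
require "opentelemetry/sdk"
require "opentelemetry/instrumentation/all"
OpenTelemetry::SDK.configure do |c|
  c.service_name = "products-api"  # placeholder service name
  c.use_all                        # enable ActiveRecord, Rack, Faraday, and friends
end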
For structured logging, add a simple initializer:
# config/initializers/logging.rb
Rails.application.configure do
  config.log_formatter = proc do |severity, datetime, progname, msg|
    {
      level: severity,
      timestamp: datetime.iso8601,
      message: msg,
      pid: Process.pid
    }.to_json + "\n"
  end
end
This formats logs as JSON, making them machine-readable for any logging platform.
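A request log entry then looks something like this (values are illustrative):
{"level":"INFO","timestamp":"2024-05-14T10:32:07Z","message":"Completed 200 OK in 42ms","pid":7421}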
React Frontend
Use the OpenTelemetry JavaScript SDK to capture performance data and user interactions. This gives us visibility into crucial metrics like Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS), and lets us connect slow frontend interactions to the exact backend traces they triggered.
PostgreSQL Database
The pg_stat_statements extension tracks execution stats for every SQL statement. With a single query (sketched after this list), we can quickly identify:
- Queries with highest total execution time
- Most frequently executed queries
- Queries with high I/O wait times
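For example, pulling the ten most expensive statements from a Rails console might look like the sketch below (it assumes the extension is enabled and PostgreSQL 13+, where the timing columns are total_exec_time and mean_exec_time):
# Surface the ten most expensive statements by total execution time.
rows = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT query,
         calls,
         round(total_exec_time::numeric, 1) AS total_ms,
         round(mean_exec_time::numeric, 1)  AS mean_ms
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10
SQL
rows.each do |r|
  puts format("%12s ms total | %8s calls | %s", r["total_ms"], r["calls"], r["query"].truncate(80))
end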
Combined with Rails traces, we get a complete, end-to-end picture of request life cycles.
From Fighting Fires to Building Fortresses
We’ve covered a lot of ground, from the basic “what is it?” of web application monitoring to a practical roadmap for getting it done. By now, one thing should be crystal clear: good monitoring isn’t just a passive utility running in the background. It’s an active, daily discipline. It’s the bedrock for building high-quality software that doesn’t just work, but actually keeps users happy and coming back for more.
The real goal here is to fundamentally shift how our team operates. It’s about moving away from a frantic, reactive cycle of firefighting—where we’re constantly scrambling to fix things after they break—to a calm, proactive stance.
Imagine a culture where we spot and solve potential problems long before a customer ever feels the slightest hiccup. That’s the superpower a mature monitoring strategy gives us.
Engineering a More Resilient System
When we properly use the four pillars of telemetry—metrics, logs, traces, and real user monitoring—we’re essentially giving ourselves x-ray vision into our application. This deep visibility is what allows us to build genuinely resilient systems.
When we pair that raw data with meaningful SLOs anchored to what users actually care about, we’re no longer guessing. We’re making smart, data-driven decisions about where to invest our time and effort. Stretching this visibility across our entire stack, from the React frontend all the way down to the PostgreSQL database, is what connects all the dots. It transforms a jumble of isolated components into a single, understandable system.
The core idea is simple: monitoring creates a continuous feedback loop. We observe the system’s real-world behavior, pinpoint opportunities for improvement, and roll out changes that make our application stronger and more reliable with every cycle.
This iterative rhythm moves us from basic maintenance to strategic engineering. We start building systems that are not just functional, but fundamentally robust by design. Plenty of teams have used this exact approach to build incredibly dependable platforms.
For a great example of how a well-instrumented Rails foundation can support complex, critical workflows, check out the Activate Care case study.
Start Small, Focus on What Matters
Getting to a sophisticated monitoring setup doesn’t mean we have to boil the ocean on day one. In fact, that’s usually a recipe for failure. The most successful strategies always start small and grow over time. Begin by zeroing in on a handful of metrics that directly reflect our users’ happiness. Things like page load times, error rates, and key transaction speeds are perfect starting points. From that solid foundation, we can continuously refine and expand our monitoring as our application—and our team’s understanding of it—matures.
At Saeloun, we partner with companies to build and implement monitoring strategies that give them a true, deep understanding of their application’s health. If you’re looking to boost your app’s reliability and performance, we’re here to help.
