<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://blog.errsight.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.errsight.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-06-14T19:37:11+05:30</updated><id>https://blog.errsight.com/feed.xml</id><title type="html">ErrSight Blog</title><subtitle>Engineering deep-dives, guides, and product updates from the team behind ErrSight: real-time error tracking and log management for production apps.</subtitle><author><name>ErrSight</name><email>hi@errsight.com</email></author><entry><title type="html">Introducing ErrSight: see every error before your users do</title><link href="https://blog.errsight.com/2026/06/09/introducing-errsight/" rel="alternate" type="text/html" title="Introducing ErrSight: see every error before your users do" /><published>2026-06-09T10:00:00+05:30</published><updated>2026-06-09T10:00:00+05:30</updated><id>https://blog.errsight.com/2026/06/09/introducing-errsight</id><content type="html" xml:base="https://blog.errsight.com/2026/06/09/introducing-errsight/"><![CDATA[<p>Your users should never be the ones who tell you production is broken. Today we’re launching ErrSight: real-time error tracking and log management in a single tool, built to catch problems before a support ticket ever lands.</p>

<!--more-->

<h2 id="the-problem-errors-reach-users-first">The problem: errors reach users <em>first</em></h2>

<p>Most error tracking setups fail in one of two predictable ways.</p>

<p>The first is silence. An exception fires deep in a background job, gets swallowed by a generic rescue, and the only signal you get is a confused user emailing support three days later. By then the stack trace is gone and you’re reverse-engineering a bug from a screenshot.</p>

<p>The second is noise, and cost. The incumbents will happily capture everything, then bill you for the privilege across a tangle of event categories, reserved volumes, and pay-as-you-go overages. You tune sample rates not because it’s good engineering, but because you’re scared of the invoice. The dashboard fills with duplicate alerts for the same root cause until nobody reads them.</p>

<p>We wanted error tracking that is loud about real problems, quiet about duplicates, and honest about the bill. So we built one.</p>

<h2 id="what-errsight-is">What ErrSight <em>is</em></h2>

<p>ErrSight is real-time error tracking <strong>and</strong> log management, together, with a tagline we actually mean: see every error before your users do.</p>

<p>It does two jobs that usually require two vendors:</p>

<ul>
  <li><strong>Exception tracking</strong>: automatic capture, grouped into actionable issues you can triage.</li>
  <li><strong>Log management</strong>: a live, searchable stream of everything your app is saying, in the same place.</li>
</ul>

<p>You install it in under two minutes, ship it where you already ship, and stop bouncing between a logging tool and an error tool to reconstruct a single incident.</p>

<h2 id="the-core-in-plain-terms">The core, in plain terms</h2>

<h3 id="automatic-capture-and-fingerprinting">Automatic capture and fingerprinting</h3>

<p>ErrSight captures exceptions automatically, with no manual <code class="language-plaintext highlighter-rouge">try/catch</code> plumbing around every call site. The important part is what happens next: identical errors are collapsed via <strong>fingerprinting</strong> into one issue. A bug that throws ten thousand times is one line in your dashboard with a count of ten thousand, not ten thousand lines. That’s the difference between a signal and a denial-of-service attack on your own attention.</p>
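<p>To make the idea concrete, here is a minimal sketch of fingerprint-based grouping. It is an illustration under our own assumptions (hash the exception class, a digit-normalized message, and the top stack frames), not a description of ErrSight's actual algorithm; the <code class="language-plaintext highlighter-rouge">fingerprint</code> and <code class="language-plaintext highlighter-rouge">group_events</code> helpers are hypothetical names.</p>

```ruby
require "digest"

# Hypothetical fingerprint: exception class + normalized message + top frames.
# An illustration only, not ErrSight's real grouping logic.
def fingerprint(error_class, message, backtrace, frames: 3)
  # Replace digit runs so "User 42 not found" and "User 97 not found" match.
  normalized = message.gsub(/\d+/, "N")
  # Drop line numbers from the top frames; they shift between deploys.
  top = backtrace.first(frames).map { |frame| frame.sub(/:\d+:in/, ":in") }
  Digest::SHA256.hexdigest([error_class, normalized, *top].join("\n"))[0, 16]
end

# Group raw events into issues keyed by fingerprint, counting occurrences.
def group_events(events)
  events.each_with_object(Hash.new(0)) do |event, issues|
    issues[fingerprint(event[:class], event[:message], event[:backtrace])] += 1
  end
end

trace = ["app/models/user.rb:12:in 'find!'", "app/controllers/users_controller.rb:8:in 'show'"]
events = [
  { class: "ActiveRecord::RecordNotFound", message: "User 42 not found", backtrace: trace },
  { class: "ActiveRecord::RecordNotFound", message: "User 97 not found", backtrace: trace },
  { class: "Timeout::Error", message: "execution expired", backtrace: trace }
]
issues = group_events(events)
puts issues.size   # two issues from three events
```

<p>The two "not found" events collapse into one issue with a count of two, while the timeout stays separate. That is the behaviour that turns ten thousand throws into one dashboard line.</p>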

<h3 id="a-live-log-viewer-that-feels-like-a-terminal">A live log viewer that feels like a terminal</h3>

<p>The <strong>live log viewer</strong> is a terminal-style stream with infinite scroll and keyword search. It’s the view you reach for at 2 a.m.: watch events arrive in real time, search for a request ID, and follow the story across services without grepping ten boxes by hand.</p>

<h3 id="automatic-user-context">Automatic user context</h3>

<p>Every event carries <strong>user context</strong> automatically: id, email, session, and plan. When an error lands, you already know <em>who</em> hit it and <em>what tier</em> they’re on, so you can tell “one flaky test account” apart from “every Business customer at once” instantly.</p>

<h3 id="triage-that-respects-your-time">Triage that respects your time</h3>

<p>Issues aren’t just a feed. You can <strong>mark, assign, and snooze</strong> them, so the on-call rotation stays focused on what’s actually actionable instead of drowning in a wall of red.</p>

<h3 id="sub-millisecond-overhead">Sub-millisecond overhead</h3>

<p>Observability shouldn’t tax the thing it observes. ErrSight batches events off the request path, so per-request overhead is <strong>sub-millisecond</strong> and nothing is dropped on exit. We break down the mechanics in the <a href="/2026/06/06/error-tracking-in-rails-in-2-minutes/">Rails guide</a>.</p>

<h2 id="ship-where-you-already-ship">Ship where you <em>already</em> ship</h2>

<p>ErrSight meets your stack where it lives. Here’s what’s shipping today and what’s on the way.</p>

<table>
  <thead>
    <tr>
      <th>Platform</th>
      <th>Package</th>
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Ruby on Rails</td>
      <td><code class="language-plaintext highlighter-rouge">errsight</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>Python (3.8+)</td>
      <td><code class="language-plaintext highlighter-rouge">pip install errsight</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>Rust (1.85+)</td>
      <td><code class="language-plaintext highlighter-rouge">errsight = "0.1"</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>React / JavaScript</td>
      <td><code class="language-plaintext highlighter-rouge">errsight</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>React Native</td>
      <td><code class="language-plaintext highlighter-rouge">errsight-rn</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>REST API</td>
      <td><code class="language-plaintext highlighter-rouge">POST /api/v1/events</code></td>
      <td>Available</td>
    </tr>
    <tr>
      <td>Node, Go, PHP/Laravel, Elixir/Phoenix</td>
      <td>n/a</td>
      <td>Coming soon</td>
    </tr>
  </tbody>
</table>

<p>A Rails setup is the canonical “two-minute install”, with one gem and one initializer:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/initializers/errsight.rb</span>
<span class="no">Errsight</span><span class="p">.</span><span class="nf">configure</span> <span class="p">{</span> <span class="o">|</span><span class="n">c</span><span class="o">|</span> <span class="n">c</span><span class="p">.</span><span class="nf">api_key</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s2">"ERRSIGHT_KEY"</span><span class="p">]</span> <span class="p">}</span>
</code></pre></div></div>

<p>That auto-captures every Rails exception and routing error, broadcasts from <code class="language-plaintext highlighter-rouge">Rails.logger</code> at all levels, and attaches request, user, controller action, and URL context. It has first-class Devise and ActiveAdmin support, and it respects <code class="language-plaintext highlighter-rouge">config.filter_parameters</code> so sensitive data stays local. We walk through the whole thing in <a href="/2026/06/06/error-tracking-in-rails-in-2-minutes/">error tracking in Rails in 2 minutes</a>.</p>

<p>The Python SDK ships middleware for Django, Flask, FastAPI, and Starlette (sync and async), plus Celery, RQ, and AWS Lambda, with <code class="language-plaintext highlighter-rouge">ContextVar</code> scope isolation and a <code class="language-plaintext highlighter-rouge">logging.Handler</code> drop-in:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>errsight
</code></pre></div></div>

<p>The React/JS package has <strong>zero dependencies</strong>, is ESM-first, and gives you a drop-in <code class="language-plaintext highlighter-rouge">&lt;ErrorBoundary&gt;</code> that captures <code class="language-plaintext highlighter-rouge">window.onerror</code> and <code class="language-plaintext highlighter-rouge">unhandledrejection</code>:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">{</span> <span class="nx">init</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">errsight</span><span class="dl">"</span><span class="p">;</span>
<span class="nf">init</span><span class="p">({</span> <span class="na">apiKey</span><span class="p">:</span> <span class="dl">"</span><span class="s2">elp_live_…</span><span class="dl">"</span><span class="p">,</span> <span class="na">env</span><span class="p">:</span> <span class="dl">"</span><span class="s2">production</span><span class="dl">"</span> <span class="p">});</span>
</code></pre></div></div>

<p>And when you need raw access, the REST API accepts single events or batches of up to 100. Requests are authenticated with an <code class="language-plaintext highlighter-rouge">X-API-Key</code> header, CORS is enabled, and idempotency keys are supported:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-X</span> POST https://errsight.com/api/v1/events <span class="nt">-H</span> <span class="s2">"X-API-Key: elp_…"</span> <span class="nt">-H</span> <span class="s2">"Content-Type: application/json"</span> <span class="se">\</span>
  <span class="nt">-d</span> <span class="s1">'{"level":"error","message":"Payment failed"}'</span>
</code></pre></div></div>
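<p>If you would rather call the API from Ruby than from <code class="language-plaintext highlighter-rouge">curl</code>, a sketch along these lines should work. The endpoint, the <code class="language-plaintext highlighter-rouge">X-API-Key</code> header, and the 100-event limit come from the text above; the batch payload shape (an <code class="language-plaintext highlighter-rouge">events</code> array), the helper names, and the placeholder key are our assumptions, so check the docs before relying on them.</p>

```ruby
require "json"
require "net/http"
require "uri"

# Endpoint and header name come from the curl example above.
EVENTS_URI = URI("https://errsight.com/api/v1/events")
MAX_BATCH = 100

# Build (but do not send) one POST request for up to MAX_BATCH events.
def build_request(api_key, events)
  raise ArgumentError, "batch too large" if events.size > MAX_BATCH

  request = Net::HTTP::Post.new(EVENTS_URI)
  request["X-API-Key"] = api_key
  request["Content-Type"] = "application/json"
  # Assumed shape: one event is sent as-is, several go under an "events" key.
  payload = events.size == 1 ? events.first : { events: events }
  request.body = JSON.generate(payload)
  request
end

# Split any number of events into requests that respect the batch limit.
def build_requests(api_key, events)
  events.each_slice(MAX_BATCH).map { |batch| build_request(api_key, batch) }
end

events = Array.new(250) { |i| { level: "error", message: "Payment failed (#{i})" } }
# "elp_example" is a placeholder fallback, not a real key.
requests = build_requests(ENV.fetch("ERRSIGHT_KEY", "elp_example"), events)
puts requests.size
```

<p>Each request can then be sent with <code class="language-plaintext highlighter-rouge">Net::HTTP.start(EVENTS_URI.host, EVENTS_URI.port, use_ssl: true) { |http| http.request(request) }</code>. Building and sending are kept separate here so the batching logic can be tested without touching the network.</p>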

<p>See the full lineup on the <a href="https://errsight.com/integrations">integrations page</a> and the <a href="https://errsight.com/docs">docs</a>.</p>

<h2 id="our-philosophy-predictable-bills-no-lock-in">Our philosophy: predictable bills, no lock-in</h2>

<p>Two principles shaped ErrSight more than any feature.</p>

<p><strong>Flat, predictable pricing.</strong> Pick a tier, know your bill. No overage tiers, no asterisks, no “contact sales for volume.” When you approach your monthly event limit, ErrSight <em>notifies</em> you rather than silently dropping data, and you can then upgrade or grab an add-on pack. The contrast with usage-and-quota billing is the whole point, and we lay out the full model honestly on our <a href="https://errsight.com/pricing">pricing page</a>.</p>

<p>And there’s a <strong>free tier at $0/month, forever, with no credit card</strong>, genuinely useful for side projects and small apps, not a teaser that expires.</p>

<p><strong>No lock-in by design.</strong> <a href="https://github.com/ErrSight/ErrSight-OSS">ErrSight is open source under AGPLv3</a>. The OSS edition runs the <em>same</em> ingestion, fingerprinting, real-time logs, and alerting engine as the SaaS, just with billing and quotas stripped out. Self-host it on Docker Compose plus Postgres. Your data, your call.</p>

<h2 id="try-it-today">Try it today</h2>

<p>You can be watching live errors in the next two minutes. Spin up the <a href="https://errsight.com">free tier on errsight.com</a> (no credit card, no sales call) and see every error before your users do. When you’re ready to dig in, the <a href="https://errsight.com/#pricing">pricing</a> is right there in the open, exactly where pricing should be.</p>]]></content><author><name>ErrSight</name></author><category term="announcements" /><category term="error tracking" /><category term="logging" /><category term="observability" /><category term="open source" /><category term="launch" /><summary type="html"><![CDATA[ErrSight is real-time error tracking and log management in one tool: auto capture, fingerprinting, live logs, flat pricing, AGPLv3 self-host.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.errsight.com/assets/images/og-card.png" /><media:content medium="image" url="https://blog.errsight.com/assets/images/og-card.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Error tracking in Rails in 2 minutes with ErrSight</title><link href="https://blog.errsight.com/2026/06/06/error-tracking-in-rails-in-2-minutes/" rel="alternate" type="text/html" title="Error tracking in Rails in 2 minutes with ErrSight" /><published>2026-06-06T10:00:00+05:30</published><updated>2026-06-06T10:00:00+05:30</updated><id>https://blog.errsight.com/2026/06/06/error-tracking-in-rails-in-2-minutes</id><content type="html" xml:base="https://blog.errsight.com/2026/06/06/error-tracking-in-rails-in-2-minutes/"><![CDATA[<p>Rails error tracking should not be a weekend project. With the <code class="language-plaintext highlighter-rouge">errsight</code> gem you add one dependency, set one environment variable, and write a short initializer. Then every exception, routing error, and log line shows up in real time.</p>

<!--more-->

<p>This is a practical how-to. By the end you will have production-grade Rails error tracking wired up, a test error confirmed in the live log viewer, and rich user context attached to every event. Total hands-on time: under two minutes.</p>

<h2 id="step-1-add-the-gem">Step 1: Add the <em>gem</em></h2>

<p>Drop the gem into your <code class="language-plaintext highlighter-rouge">Gemfile</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Gemfile</span>
<span class="n">gem</span> <span class="s2">"errsight"</span>
</code></pre></div></div>

<p>Then install it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bundle <span class="nb">install</span>
</code></pre></div></div>

<p>That is the only dependency you add. The gem hooks into Rails through standard middleware and <code class="language-plaintext highlighter-rouge">Rails.logger</code>, so there is nothing to patch and nothing to monkey-fix later.</p>

<h2 id="step-2-grab-an-api-key-and-set-the-environment">Step 2: Grab an API key and set the <em>environment</em></h2>

<p>Create a project in ErrSight and copy its API key. Keys look like <code class="language-plaintext highlighter-rouge">elp_live_…</code> for production traffic (generic keys use the <code class="language-plaintext highlighter-rouge">elp_…</code> prefix). Keep the key out of source control by setting it as an environment variable:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ERRSIGHT_KEY</span><span class="o">=</span><span class="s2">"elp_live_your_key_here"</span>
</code></pre></div></div>

<p>In development you would put this in your <code class="language-plaintext highlighter-rouge">.env</code>, your shell profile, or your secrets manager. In production, set it through your platform’s config (Heroku config vars, a Kubernetes secret, your CI/CD environment, wherever you already manage env). The point of ErrSight is that you ship where you already ship; the key travels with the rest of your environment.</p>
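<p>If you want a missing or mistyped key to fail loudly at boot instead of silently sending nothing, a small guard like the following is one option. The <code class="language-plaintext highlighter-rouge">errsight_key</code> helper is a hypothetical name of ours, not part of the gem; the only facts it relies on are the <code class="language-plaintext highlighter-rouge">ERRSIGHT_KEY</code> variable and the <code class="language-plaintext highlighter-rouge">elp_</code> prefix described above.</p>

```ruby
# Hypothetical helper: read the key from an env-like object and fail fast
# when it is missing or does not look like an ErrSight key.
def errsight_key(env = ENV)
  key = env.fetch("ERRSIGHT_KEY") { raise KeyError, "ERRSIGHT_KEY is not set" }
  raise ArgumentError, "ERRSIGHT_KEY should start with elp_" unless key.start_with?("elp_")

  key
end

puts errsight_key({ "ERRSIGHT_KEY" => "elp_live_your_key_here" })
```

<p>Whether you want boot to fail in development is a judgment call; many teams only enforce this in production.</p>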

<h2 id="step-3-the-initializer">Step 3: The <em>initializer</em></h2>

<p>Create a single initializer. This is the whole configuration:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/initializers/errsight.rb</span>
<span class="no">Errsight</span><span class="p">.</span><span class="nf">configure</span> <span class="p">{</span> <span class="o">|</span><span class="n">c</span><span class="o">|</span> <span class="n">c</span><span class="p">.</span><span class="nf">api_key</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s2">"ERRSIGHT_KEY"</span><span class="p">]</span> <span class="p">}</span>
</code></pre></div></div>

<p>Boot your app. That is the full setup: one gem, one env var, one initializer. You are now tracking errors.</p>

<h2 id="step-4-what-you-get-for-free">Step 4: What you get for <em>free</em></h2>

<p>The default install is deliberately generous. Without writing another line, the gem:</p>

<ul>
  <li><strong>Captures every Rails exception and routing error</strong> automatically, including the 404s and <code class="language-plaintext highlighter-rouge">ActionController::RoutingError</code>s that usually slip past.</li>
  <li><strong>Broadcasts from <code class="language-plaintext highlighter-rouge">Rails.logger</code> at all levels</strong> (<code class="language-plaintext highlighter-rouge">debug</code> through <code class="language-plaintext highlighter-rouge">fatal</code>) into a terminal-style live log viewer with infinite scroll and keyword search.</li>
  <li><strong>Attaches request context on every event</strong>: the request URL, the controller and action, request metadata, and the current user.</li>
  <li><strong>Speaks Devise and ActiveAdmin out of the box</strong>, with first-class support, so the authenticated user and admin context come through without glue code.</li>
  <li><strong>Respects <code class="language-plaintext highlighter-rouge">config.filter_parameters</code></strong>: anything you already mask (passwords, tokens, card numbers) stays local and never leaves your app.</li>
</ul>

<p>That last point matters: ErrSight reuses the parameter filtering you have already configured, so sensitive fields are redacted before they are ever sent.</p>
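<p>For intuition, this is roughly what key-based redaction looks like. The sketch below is a simplified stand-in written in plain Ruby so it runs anywhere; it is not the gem's code. In a real Rails app the equivalent work is done by <code class="language-plaintext highlighter-rouge">ActiveSupport::ParameterFilter</code> fed with your <code class="language-plaintext highlighter-rouge">config.filter_parameters</code>.</p>

```ruby
# A simplified stand-in for Rails parameter filtering, for illustration only.
FILTERED = "[FILTERED]"

# Recursively mask any key whose name matches one of the patterns.
def filter_params(value, patterns)
  case value
  when Hash
    value.to_h do |key, inner|
      masked = patterns.any? { |pattern| key.to_s.match?(pattern) }
      [key, masked ? FILTERED : filter_params(inner, patterns)]
    end
  when Array
    value.map { |item| filter_params(item, patterns) }
  else
    value
  end
end

patterns = [/password/i, /token/i, /card/i]
params = {
  email: "a@example.com",
  password: "hunter2",
  payment: { card_number: "4242", items: [{ api_token: "t" }] }
}
p filter_params(params, patterns)
```

<p>The function returns a new structure and leaves the original untouched, so the app keeps working with real values while only the masked copy is serialized.</p>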

<p>Duplicate exceptions are grouped through fingerprinting, so a failing query that fires on every request in a hot loop becomes one actionable issue you can triage (mark, assign, or snooze) instead of a wall of identical notifications.</p>

<h2 id="step-5-trigger-a-test-error-to-verify">Step 5: Trigger a test error to <em>verify</em></h2>

<p>Let’s confirm it works. Add a throwaway route and action that raises:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/routes.rb</span>
<span class="n">get</span> <span class="s2">"/boom"</span><span class="p">,</span> <span class="ss">to: </span><span class="s2">"diagnostics#boom"</span>
</code></pre></div></div>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/controllers/diagnostics_controller.rb</span>
<span class="k">class</span> <span class="nc">DiagnosticsController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="k">def</span> <span class="nf">boom</span>
    <span class="k">raise</span> <span class="s2">"ErrSight smoke test. If you can read this in the log viewer, it works"</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Start the server and hit the route:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bin/rails server
curl <span class="nt">-i</span> http://localhost:3000/boom
</code></pre></div></div>

<p>Open the live log viewer in your ErrSight dashboard. Within a moment you will see the exception, grouped into an issue, complete with the stack trace, the <code class="language-plaintext highlighter-rouge">/boom</code> URL, the <code class="language-plaintext highlighter-rouge">DiagnosticsController#boom</code> action, and (if a user is signed in) their identity. Search the log viewer for <code class="language-plaintext highlighter-rouge">smoke test</code> to jump straight to the broadcast line.</p>

<p>Once you have confirmed it, delete the route and controller. They were only there to prove the wiring.</p>

<h2 id="step-6-adding-richer-user-context">Step 6: Adding richer <em>user context</em></h2>

<p>You already get the current user automatically (id, email, session, plan). When you want to attach more (a tenant id, a feature-flag cohort, the plan tier at the moment of failure), set it explicitly. A common pattern is a <code class="language-plaintext highlighter-rouge">before_action</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/controllers/application_controller.rb</span>
<span class="k">class</span> <span class="nc">ApplicationController</span> <span class="o">&lt;</span> <span class="no">ActionController</span><span class="o">::</span><span class="no">Base</span>
  <span class="n">before_action</span> <span class="ss">:set_errsight_context</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">set_errsight_context</span>
    <span class="k">return</span> <span class="k">unless</span> <span class="n">current_user</span>

    <span class="no">Errsight</span><span class="p">.</span><span class="nf">set_user</span><span class="p">(</span>
      <span class="ss">id:    </span><span class="n">current_user</span><span class="p">.</span><span class="nf">id</span><span class="p">,</span>
      <span class="ss">email: </span><span class="n">current_user</span><span class="p">.</span><span class="nf">email</span><span class="p">,</span>
      <span class="ss">plan:  </span><span class="n">current_user</span><span class="p">.</span><span class="nf">plan</span><span class="p">,</span>
      <span class="ss">org:   </span><span class="n">current_user</span><span class="p">.</span><span class="nf">organization_id</span>
    <span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now every event captured during that request carries the context you care about. When a 500 lands at 3 a.m., you already know who hit it and what plan they were on. No log spelunking required.</p>

<h2 id="is-this-safe-for-production">Is this safe for <em>production</em>?</h2>

<p>Yes: that is the whole design. Events are batched on a background thread and flushed on a timer, so the work never touches your request cycle. The async overhead per request is sub-millisecond, and nothing blocks the response. On process exit the buffer is flushed, so you do not drop the errors that happen during a deploy or a crash.</p>

<table>
  <thead>
    <tr>
      <th>Concern</th>
      <th>How ErrSight handles it</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Request latency</td>
      <td>Sub-millisecond async overhead; nothing blocks the request</td>
    </tr>
    <tr>
      <td>Throughput</td>
      <td>Events batched on a separate thread, flushed on a timer</td>
    </tr>
    <tr>
      <td>Dropped events</td>
      <td>Buffer flushed on process exit, nothing lost at shutdown</td>
    </tr>
    <tr>
      <td>Sensitive data</td>
      <td><code class="language-plaintext highlighter-rouge">config.filter_parameters</code> respected; filtered fields stay local</td>
    </tr>
    <tr>
      <td>Noise</td>
      <td>Duplicate exceptions grouped via fingerprinting into one issue</td>
    </tr>
  </tbody>
</table>

<p>In other words, you can leave it on at full volume in production without worrying that error tracking is the thing that slows down your app.</p>
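<p>The batching model described above can be sketched in a few dozen lines. This is a hedged illustration, not the gem's internals: the <code class="language-plaintext highlighter-rouge">EventBuffer</code> class, its defaults, and the <code class="language-plaintext highlighter-rouge">sender</code> callable are all assumptions made for the example.</p>

```ruby
# Hypothetical batching buffer. Class name, defaults, and the sender callable
# are assumptions for illustration, not the gem's internals.
class EventBuffer
  def initialize(sender:, interval: 5.0, batch_size: 100)
    @sender = sender          # callable that ships one batch, e.g. an HTTP POST
    @interval = interval      # seconds between timer flushes
    @batch_size = batch_size
    @events = []
    @lock = Mutex.new
    @wake = ConditionVariable.new
    @stopping = false
    @worker = Thread.new { run }
    at_exit { shutdown }      # drain the buffer when the process exits
  end

  # Request path: append under a short lock, no network I/O.
  def push(event)
    @lock.synchronize { @events.push(event) }
  end

  # Ask the worker to drain everything that is left, then wait for it.
  def shutdown
    @lock.synchronize do
      @stopping = true
      @wake.signal
    end
    @worker.join
  end

  private

  def run
    loop do
      batch, stop = @lock.synchronize do
        @wake.wait(@lock, @interval) unless @stopping
        [@events.shift(@batch_size), @stopping]
      end
      if batch.empty?
        break if stop
      else
        @sender.call(batch)   # slow work happens outside the lock
      end
    end
  end
end

sent = []
buffer = EventBuffer.new(sender: ->(batch) { sent.push(batch) }, interval: 60, batch_size: 2)
5.times { |i| buffer.push({ message: "event #{i}" }) }
buffer.shutdown
puts sent.flatten.size
```

<p>The request thread only pays for a mutex-guarded array append; serialization and network I/O happen on the worker. A real implementation would also bound the buffer size and handle sender failures, which this sketch omits.</p>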

<h2 id="where-to-go-next">Where to go next</h2>

<p>That is Rails error tracking in two minutes: one gem, one key, one initializer, and a live view of every exception and log line. From here you can:</p>

<ul>
  <li>Browse the <a href="https://errsight.com/docs">ErrSight documentation</a> for advanced configuration and triage workflows.</li>
  <li>See the other SDKs and platforms on the <a href="https://errsight.com/integrations">integrations page</a>: Python, Rust, React/JavaScript, and React Native are shipping today, with Node, Go, PHP/Laravel, and Elixir on the way.</li>
  <li>Read <a href="/2026/06/09/introducing-errsight/">why we built ErrSight</a> for the philosophy behind native logs plus exception tracking.</li>
  <li>Review the <a href="https://errsight.com/pricing">pricing</a>: flat monthly tiers, no overage fees.</li>
</ul>

<p>Want to self-host instead? The engine is open source under AGPLv3 at <a href="https://github.com/ErrSight/ErrSight-OSS">ErrSight-OSS</a>: same ingestion, fingerprinting, and alerting, no lock-in by design.</p>

<h2 id="start-tracking-today">Start tracking <em>today</em></h2>

<p>Spin up a free project ($0/month, forever, no credit card) and point your Rails app at it. Add the gem, set <code class="language-plaintext highlighter-rouge">ERRSIGHT_KEY</code>, and watch your first error land in the live viewer. Get started at <a href="https://errsight.com">errsight.com</a>.</p>]]></content><author><name>ErrSight</name></author><category term="tutorials" /><category term="rails" /><category term="ruby" /><category term="error-tracking" /><category term="logging" /><category term="devise" /><summary type="html"><![CDATA[A step-by-step guide to Rails error tracking with ErrSight: add the gem, set one key, and capture every exception plus live logs in under 2 minutes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.errsight.com/assets/images/og-card.png" /><media:content medium="image" url="https://blog.errsight.com/assets/images/og-card.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Building an Error Monitoring Tool Without Pricing Overages</title><link href="https://blog.errsight.com/2026/05/26/building-an-error-monitoring-tool-without-pricing-overages/" rel="alternate" type="text/html" title="Building an Error Monitoring Tool Without Pricing Overages" /><published>2026-05-26T10:00:00+05:30</published><updated>2026-05-26T10:00:00+05:30</updated><id>https://blog.errsight.com/2026/05/26/building-an-error-monitoring-tool-without-pricing-overages</id><content type="html" xml:base="https://blog.errsight.com/2026/05/26/building-an-error-monitoring-tool-without-pricing-overages/"><![CDATA[<p>Picture the worst version of a Tuesday. You ship a deploy, a downstream API starts timing out, and your retry logic turns one failure into forty. A single broken code path is now throwing the same exception in a hot loop. By the time you have rolled back, your app has emitted two million error events in ninety minutes.</p>

<!--more-->

<p>Your error monitoring tool ingested every single one of them. It was very good at its job.</p>

<p>Then, a few days later, the second incident arrives: an invoice. The $26/month plan you signed up for has quietly become a $390 bill, because you blew through your included event volume and the meter kept running at some fraction of a cent per event. Nobody asked you. Nobody could ask you, because the events arrived faster than any human could approve them.</p>

<p>This is the part of usage-based monitoring that I find genuinely backwards. <strong>The pricing is anti-correlated with your wellbeing.</strong> The tool charges you the most at the exact moment you are already having your worst day. A traffic spike, a bad release, a noisy dependency, a retry storm: every one of these is both an operational emergency and a billing event. The product that is supposed to help you through the incident is, at the same time, metering you for the privilege.</p>

<p>I am building an error tracker called ErrSight, and early on I decided it would not work this way. No overage charges. Not “low overage charges,” not “overage charges with a generous buffer.” None. The ceiling on your plan is a real ceiling, not the starting line for a surprise invoice.</p>

<p>That turns out to be a more interesting engineering problem than it sounds, so let me walk through how it actually works.</p>

<h2 id="overage-billing-is-a-choice-not-a-law-of-physics">Overage billing is a choice, not a law of physics</h2>

<p>Before the architecture, it is worth being honest about why overage pricing is so common. It is not because it is the only way. It is because it is the easiest and most profitable way.</p>

<p>It is the easiest because the implementation is trivial: count what arrives, multiply by a rate, send the total at the end of the month. You never have to make a decision in the hot path. You never have to tell a customer “no.” You just let everything in and reconcile later.</p>

<p>It is the most profitable because <strong>the meter runs before the bill arrives.</strong> By the time the customer sees the number, the spend already happened. They cannot decline it. The asymmetry is the entire business model.</p>
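<p>To see how little code the overage model needs, and how quickly it adds up, here is a back-of-the-envelope sketch. Every number in it is hypothetical: I picked an included volume and a per-event rate that happen to reproduce the $26 plan and $390 invoice from the opening story, not any real vendor's price list.</p>

```ruby
# All numbers are hypothetical, chosen only to reproduce the $26 to $390 story.
BASE_PRICE_CENTS = 2_600                      # the $26/month plan
INCLUDED_EVENTS = 180_000                     # assumed included volume
OVERAGE_CENTS_PER_EVENT = Rational(2, 100)    # assumed rate: 0.02 cents per event

# Usage-based model: accept everything, reconcile at month end.
def overage_bill_cents(events)
  extra = [events - INCLUDED_EVENTS, 0].max
  BASE_PRICE_CENTS + extra * OVERAGE_CENTS_PER_EVENT
end

# Hard-cap model: the bill is the plan price no matter what the client sends.
def capped_bill_cents(_events)
  BASE_PRICE_CENTS
end

storm = 2_000_000
puts format("overage model: $%.2f", (overage_bill_cents(storm) / 100).to_f)   # overage model: $390.00
puts format("hard cap:      $%.2f", capped_bill_cents(storm) / 100.0)         # hard cap:      $26.00
```

<p>The overage function is three lines and never has to say no. The capped function is even shorter, but only because the hard part has moved elsewhere: into the ingestion path, which now has to enforce the limit in real time.</p>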

<p>Once you see it that way, “no overages” stops being a pricing gimmick and becomes a design constraint. It means moving the decision to the front, into the request path, while the customer can still be protected. Concretely, I wrote down three rules:</p>

<ol>
  <li>When an account is out of quota, <strong>stop ingesting and say so clearly.</strong> Do not silently accept the data and invoice for it.</li>
  <li>Make it <strong>architecturally impossible to overshoot the cap</strong>, even under a burst of concurrent requests, because bursts are exactly when this matters.</li>
  <li>If a customer genuinely needs more capacity, make getting it a <strong>deliberate, opt-in decision</strong> with a known price, not a default that happens to them.</li>
</ol>

<p>Everything below is in service of those three rules.</p>

<h2 id="mechanic-1-stop-do-not-bill">Mechanic 1: stop, do not bill</h2>

<p>The ingestion endpoint is the front door. Before it does any real work, it asks one question: is this project allowed to ingest right now? The answer comes from a single method that collapses every “no” reason into one place.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">drop_reason</span>
  <span class="k">return</span> <span class="s2">"ingestion_paused"</span>        <span class="k">if</span> <span class="n">ingestion_paused?</span>
  <span class="k">return</span> <span class="s2">"events_over_limit"</span>       <span class="k">if</span> <span class="n">organization</span><span class="p">.</span><span class="nf">over_events_limit?</span>
  <span class="k">return</span> <span class="s2">"storage_limit_exceeded"</span>  <span class="k">if</span> <span class="n">storage_limit_exceeded?</span>
  <span class="kp">nil</span>
<span class="k">end</span>
</code></pre></div></div>

<p>If there is a reason to drop, the controller returns an HTTP <code class="language-plaintext highlighter-rouge">429 Too Many Requests</code> with a machine-readable code, and that is the end of it.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">when</span> <span class="s2">"events_over_limit"</span>
  <span class="n">notify_once</span><span class="p">(</span><span class="vi">@project</span><span class="p">.</span><span class="nf">organization_id</span><span class="p">,</span> <span class="s2">"events"</span><span class="p">)</span>   <span class="c1"># one email, debounced</span>
  <span class="n">render</span> <span class="ss">json: </span><span class="p">{</span>
    <span class="ss">error: </span><span class="s2">"Monthly event limit reached"</span><span class="p">,</span>
    <span class="ss">code:  </span><span class="s2">"EVENTS_LIMIT_EXCEEDED"</span>
  <span class="p">},</span> <span class="ss">status: :too_many_requests</span>
</code></pre></div></div>

<p>Notice what is <em>not</em> here. There is no branch that says “over limit, so accept the event and tack it onto the overage counter.” Going over your limit is a <code class="language-plaintext highlighter-rouge">429</code>, not a bigger invoice. The client SDK receives a clear, documented code (<code class="language-plaintext highlighter-rouge">EVENTS_LIMIT_EXCEEDED</code>) and can back off, buffer, or surface a warning in your own dashboards. The signal is honest: your data is being dropped, here is exactly why, and your bill is not moving.</p>
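<p>On the client side, reacting to that response can be as simple as the sketch below. Only the <code class="language-plaintext highlighter-rouge">429</code> status and the <code class="language-plaintext highlighter-rouge">EVENTS_LIMIT_EXCEEDED</code> code come from the server code above; the <code class="language-plaintext highlighter-rouge">next_action</code> helper and its return values are a hypothetical policy, not how any ErrSight SDK is actually written.</p>

```ruby
require "json"

# Hypothetical client-side policy. Only the 429 status and the
# EVENTS_LIMIT_EXCEEDED code are taken from the server code above.
def next_action(status, body)
  return :delivered if status.between?(200, 299)
  return :retry_later unless status == 429

  code = JSON.parse(body)["code"]
  # Quota exhaustion will not clear on a quick retry, so pause and warn instead.
  code == "EVENTS_LIMIT_EXCEEDED" ? :pause_and_warn : :retry_later
rescue JSON::ParserError
  :retry_later
end

body = JSON.generate({ error: "Monthly event limit reached", code: "EVENTS_LIMIT_EXCEEDED" })
puts next_action(429, body)
```

<p>The useful property is that the client can tell "slow down" apart from "you are out of quota" without parsing human-readable text, because the machine-readable code is part of the contract.</p>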

<p>The customer also gets an email when they hit the wall: at most one per hour, not one per rejected request. A client hammering the endpoint while over quota could otherwise enqueue thousands of identical “you are over your limit” notifications per minute, so the notification is debounced with an atomic cache write that only the first caller in a one-hour window wins:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">notify_once</span><span class="p">(</span><span class="n">org_id</span><span class="p">,</span> <span class="n">kind</span><span class="p">)</span>
  <span class="n">key</span> <span class="o">=</span> <span class="s2">"quota_notified:</span><span class="si">#{</span><span class="n">org_id</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">kind</span><span class="si">}</span><span class="s2">"</span>
  <span class="k">if</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">cache</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="kp">true</span><span class="p">,</span> <span class="ss">unless_exist: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">expires_in: </span><span class="mi">1</span><span class="p">.</span><span class="nf">hour</span><span class="p">)</span>
    <span class="no">NotifyQuotaOverageJob</span><span class="p">.</span><span class="nf">perform_later</span><span class="p">(</span><span class="n">org_id</span><span class="p">,</span> <span class="n">kind</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The customer gets told. The customer does not get charged.</p>

<h2 id="mechanic-2-you-literally-cannot-overshoot-the-cap">Mechanic 2: you literally cannot overshoot the cap</h2>

<p>Here is the part that took the most care, and it is the reason “no overages” is harder to build than overage billing.</p>

<p>The naive version of a quota check has a race condition that bursts will find immediately:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># WRONG: two concurrent requests can both pass this check</span>
<span class="k">if</span> <span class="n">organization</span><span class="p">.</span><span class="nf">total_events_this_month</span> <span class="o">+</span> <span class="n">count</span> <span class="o">&lt;=</span> <span class="n">organization</span><span class="p">.</span><span class="nf">events_limit</span>
  <span class="n">accept</span><span class="p">(</span><span class="n">events</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Imagine a project sitting at 49,950 events against a 50,000 limit, so there are 50 events of real headroom left. Two batches of 40 events arrive at the same millisecond, handled by two different Puma workers, possibly on two different replicas. Each batch on its own fits comfortably. But both workers read the same starting count of 49,950, both compute <code class="language-plaintext highlighter-rouge">49,950 + 40 = 49,990</code>, both see that as under the limit, and both commit. The project lands at 50,030. The two batches were 80 events against 50 events of headroom, and the cap leaked by 30. Multiply that by a real burst across many workers and your “hard” limit leaks by thousands of events. Each leaked event is either free (you eat the cost) or billed (the customer eats it). There is no version of the leak that is fair.</p>
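
<p>The arithmetic of that leak can be replayed deterministically in plain Ruby, with no threads or database involved. This is only an illustration: <code class="language-plaintext highlighter-rouge">replay_naive_check</code> is a made-up helper that forces the unlucky interleaving (every worker reads before any worker writes) rather than waiting for it to happen by chance.</p>

```ruby
# Replays the worst-case interleaving of the naive check:
# all workers snapshot the counter first, then all of them decide and commit.
def replay_naive_check(limit:, stored:, batches:)
  # Step 1: each worker reads the same committed value.
  snapshots = batches.map { stored }

  # Step 2: each worker checks its batch against its own stale snapshot.
  accepted = batches.zip(snapshots).select { |count, seen| seen + count <= limit }

  # Step 3: every accepted batch commits its increment.
  stored + accepted.sum { |count, _seen| count }
end

final = replay_naive_check(limit: 50_000, stored: 49_950, batches: [40, 40])
puts "final=#{final} leaked=#{final - 50_000}"   # prints "final=50030 leaked=30"
```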

<p>A guarantee has to actually be a guarantee, so the reservation happens inside a transaction, serialized by a Postgres advisory lock keyed to the organization and billing period:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">reserve_events!</span><span class="p">(</span><span class="n">count</span><span class="p">:)</span>
  <span class="n">org</span>      <span class="o">=</span> <span class="n">organization</span>
  <span class="n">month</span>    <span class="o">=</span> <span class="n">org</span><span class="p">.</span><span class="nf">quota_period_start</span>
  <span class="n">lock_key</span> <span class="o">=</span> <span class="no">Zlib</span><span class="p">.</span><span class="nf">crc32</span><span class="p">(</span><span class="s2">"errsight:quota:</span><span class="si">#{</span><span class="n">org</span><span class="p">.</span><span class="nf">id</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">month</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span><span class="o">**</span><span class="mi">31</span>

  <span class="n">transaction</span> <span class="k">do</span>
    <span class="c1"># Every project in this org/period serializes through this lock,</span>
    <span class="c1"># so two concurrent bursts cannot both read "under limit" and</span>
    <span class="c1"># both commit. The lock releases automatically at transaction end.</span>
    <span class="n">connection</span><span class="p">.</span><span class="nf">execute</span><span class="p">(</span><span class="s2">"SELECT pg_advisory_xact_lock(</span><span class="si">#{</span><span class="n">lock_key</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>

    <span class="n">current</span> <span class="o">=</span> <span class="no">Usage</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">organization_id: </span><span class="n">org</span><span class="p">.</span><span class="nf">id</span><span class="p">,</span> <span class="ss">month: </span><span class="n">month</span><span class="p">).</span><span class="nf">sum</span><span class="p">(</span><span class="ss">:events_count</span><span class="p">)</span>
    <span class="k">return</span> <span class="kp">false</span> <span class="k">if</span> <span class="n">current</span> <span class="o">+</span> <span class="n">count</span> <span class="o">&gt;</span> <span class="n">org</span><span class="p">.</span><span class="nf">events_limit</span>   <span class="c1"># the ceiling holds</span>

    <span class="c1"># reserve `count` against this month's usage, then return true</span>
    <span class="n">bump_usage!</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
    <span class="kp">true</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">pg_advisory_xact_lock</code> gives me a mutex that lives in the database, not in any single Ruby process, which is the only place it can live if the limit is going to hold across many workers and replicas. Two bursts hitting the same account at the same instant now line up behind the lock. The first one reserves its quota and commits. The second one reads the <em>post-commit</em> total, sees there is no room, and gets <code class="language-plaintext highlighter-rouge">false</code>. The controller turns that <code class="language-plaintext highlighter-rouge">false</code> into a <code class="language-plaintext highlighter-rouge">429</code>. The cap is exact, even at the millisecond boundary, even during the spike that an overage model would have cashed in on.</p>
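
<p>The ordering is easier to see in a single-process analogue. In the sketch below a Ruby <code class="language-plaintext highlighter-rouge">Mutex</code> stands in for the advisory lock, and <code class="language-plaintext highlighter-rouge">QuotaLedger</code> is a hypothetical class invented for the example. It shows the check and the increment happening as one unit; it does not protect anything across processes or replicas, which is exactly why the real lock lives in Postgres.</p>

```ruby
# In-process analogue of lock-and-reserve. The Mutex plays the role of
# pg_advisory_xact_lock; it only serializes threads inside ONE process.
class QuotaLedger
  attr_reader :used

  def initialize(limit:, used: 0)
    @limit = limit
    @used  = used
    @lock  = Mutex.new
  end

  # Returns true and records the usage if the whole batch fits, else false.
  def reserve(count)
    @lock.synchronize do
      # Check and increment run under the same lock, so no other thread
      # can read a stale total in between.
      next false if @used + count > @limit

      @used += count
      true
    end
  end
end

ledger  = QuotaLedger.new(limit: 50_000, used: 49_950)
results = Array.new(2) { Thread.new { ledger.reserve(40) } }.map(&:value)

puts results.count(true)   # prints 1: only one of the two batches fits
puts ledger.used           # prints 49990: the ceiling held
```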

<p>This is the trade at the heart of “no overages.” Overage billing never needs this lock, because it never needs to say no. Choosing to say no means choosing to build the machinery that can say it correctly under load.</p>

<h2 id="mechanic-3-more-capacity-is-a-decision-not-an-accident">Mechanic 3: more capacity is a decision, not an accident</h2>

<p>A hard cap with no escape hatch is just a worse product. The point is not to punish growth, it is to make growth a choice the customer makes on purpose, with the price known in advance.</p>

<p>So the limit a project is actually checked against is never just the plan limit. It is the plan limit plus any capacity the customer has deliberately added:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">events_limit</span>
  <span class="n">plan_record</span><span class="p">.</span><span class="nf">events_limit</span> <span class="o">+</span> <span class="n">active_pack_event_credit</span>
<span class="k">end</span>

<span class="k">def</span> <span class="nf">active_pack_event_credit</span><span class="p">(</span><span class="ss">at: </span><span class="no">Time</span><span class="p">.</span><span class="nf">current</span><span class="p">)</span>
  <span class="n">purchased_packs</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">status: </span><span class="s2">"active"</span><span class="p">)</span>
                 <span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"expires_at &gt; ?"</span><span class="p">,</span> <span class="n">at</span><span class="p">)</span>
                 <span class="p">.</span><span class="nf">sum</span><span class="p">(</span><span class="ss">:events_credit</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>There are two ways to add capacity, and both are opt-in:</p>

<ul>
  <li><strong>Upgrade the plan.</strong> The tiers are flat monthly prices with included volume: Free is 5,000 events a month, Pro is $29 for 50,000, Growth is $79 for 200,000, Business is $199 for 750,000. You always know what the next step costs before you take it.</li>
  <li><strong>Buy an add-on pack.</strong> If you are mostly fine but had one heavy month, a $9 pack adds 50,000 events and 2 GB of storage on a 30-day rolling window. It is a one-time purchase, not a recurring commitment, and it stacks if you need a few.</li>
</ul>

<p>The crucial difference from an overage line item is <em>when the decision happens</em>. An overage charge is a decision the system makes for you, after the spend, that you discover on an invoice. A pack or an upgrade is a decision you make for yourself, before the spend, at a price you agreed to. Same outcome of “you needed more and you paid for more,” opposite relationship with the customer.</p>
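
<p>As a worked example of how packs stack on top of a plan, here is a plain-Ruby stand-in for the two methods above. The <code class="language-plaintext highlighter-rouge">Pack</code> struct mirrors the column names used in the post, but the struct itself, the sample packs, and the dates are invented for illustration.</p>

```ruby
# Plain-Ruby stand-in for the ActiveRecord scope shown earlier.
# Pack is a hypothetical value object; the sample data below is made up.
Pack = Struct.new(:status, :expires_at, :events_credit, keyword_init: true)

# Sum the credit of packs that are active and not yet expired at `at`.
def active_pack_event_credit(packs, at: Time.now)
  packs.select { |pack| pack.status == "active" && pack.expires_at > at }
       .sum(&:events_credit)
end

# Effective limit = plan limit + whatever capacity was deliberately added.
def events_limit(plan_limit, packs, at: Time.now)
  plan_limit + active_pack_event_credit(packs, at: at)
end

now   = Time.utc(2026, 6, 14)
day   = 24 * 60 * 60
packs = [
  Pack.new(status: "active", expires_at: now + 10 * day, events_credit: 50_000),
  Pack.new(status: "active", expires_at: now + 25 * day, events_credit: 50_000),
  Pack.new(status: "active", expires_at: now - 1 * day,  events_credit: 50_000) # lapsed
]

puts events_limit(200_000, packs, at: now)   # prints 300000
```

<p>Two live packs on the Growth plan raise the ceiling from 200,000 to 300,000 events, and the lapsed pack contributes nothing once its window has passed.</p>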

<h2 id="a-second-dial-capping-the-burn-rate-not-just-the-total">A second dial: capping the burn rate, not just the total</h2>

<p>A hard monthly ceiling solves the billing problem, but go back to the retry storm from the top of this post: two million events in ninety minutes. Even with overage charges off the table, a spike like that can burn through an entire month of quota before lunch, and then ingestion is capped for the rest of the month and you are flying blind through the part of the incident that matters most. A ceiling on the total is not the same as a ceiling on the rate.</p>

<p>So every project also has a per-minute rate limit, and on paid plans the customer sets it themselves. The plan defines the maximum you are allowed to choose, and you pick any number underneath it. It is enforced by a fixed-window limiter that lives in Postgres rather than in process memory, because a per-worker counter cannot hold a real limit once you are running several Puma workers across replicas:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rate</span> <span class="o">=</span> <span class="no">IngestionRateLimiter</span><span class="p">.</span><span class="nf">check!</span><span class="p">(</span><span class="vi">@project</span><span class="p">,</span> <span class="ss">count: </span><span class="n">events_data</span><span class="p">.</span><span class="nf">length</span><span class="p">)</span>
<span class="k">unless</span> <span class="n">rate</span><span class="p">.</span><span class="nf">allowed</span>
  <span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"Retry-After"</span><span class="p">]</span>       <span class="o">=</span> <span class="n">rate</span><span class="p">.</span><span class="nf">retry_after</span><span class="p">.</span><span class="nf">to_s</span>
  <span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"X-RateLimit-Limit"</span><span class="p">]</span> <span class="o">=</span> <span class="n">rate</span><span class="p">.</span><span class="nf">limit</span><span class="p">.</span><span class="nf">to_s</span>
  <span class="k">return</span> <span class="n">render</span> <span class="ss">json: </span><span class="p">{</span>
    <span class="ss">error:       </span><span class="s2">"Rate limit exceeded, retry in </span><span class="si">#{</span><span class="n">rate</span><span class="p">.</span><span class="nf">retry_after</span><span class="si">}</span><span class="s2">s"</span><span class="p">,</span>
    <span class="ss">code:        </span><span class="s2">"RATE_LIMIT_EXCEEDED"</span><span class="p">,</span>
    <span class="ss">retry_after: </span><span class="n">rate</span><span class="p">.</span><span class="nf">retry_after</span>
  <span class="p">},</span> <span class="ss">status: :too_many_requests</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now a runaway loop can spend at most the configured number of events per minute. The bad deploy still hurts, but it cannot vaporize your whole month in the first ninety minutes, and the <code class="language-plaintext highlighter-rouge">Retry-After</code> header tells a well-behaved SDK exactly how long to back off. The customer ends up inside two ceilings at once: the monthly total they are billed against, and the per-minute rate they chose. As a side effect, it also shields my ingestion path from a single misbehaving client, which is the first thing standing between a customer’s spike and my own infrastructure bill. Which brings me to the cost side.</p>
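
<p>For readers who have not built one, this is roughly what a fixed-window limiter computes. The sketch keeps its counters in a Ruby <code class="language-plaintext highlighter-rouge">Hash</code> purely to show the window arithmetic; the real limiter described above keeps that state in Postgres. The class name <code class="language-plaintext highlighter-rouge">FixedWindowLimiter</code> and the <code class="language-plaintext highlighter-rouge">RateDecision</code> struct are assumptions, chosen to echo the <code class="language-plaintext highlighter-rouge">allowed</code>, <code class="language-plaintext highlighter-rouge">limit</code>, and <code class="language-plaintext highlighter-rouge">retry_after</code> fields the controller reads.</p>

```ruby
# In-memory fixed-window counter. Illustration only: state held in a Hash
# is per-process, which is the very limitation the post avoids.
RateDecision = Struct.new(:allowed, :limit, :retry_after, keyword_init: true)

class FixedWindowLimiter
  WINDOW = 60 # seconds per window

  def initialize(limit_per_minute)
    @limit  = limit_per_minute
    @counts = Hash.new(0) # window start (epoch seconds) => events accepted
  end

  def check(count, now: Time.now)
    epoch        = now.to_i
    window_start = epoch - (epoch % WINDOW)

    if @counts[window_start] + count > @limit
      # Rejected: tell the caller how many seconds remain in this window.
      seconds_left = window_start + WINDOW - epoch
      return RateDecision.new(allowed: false, limit: @limit, retry_after: seconds_left)
    end

    @counts[window_start] += count
    RateDecision.new(allowed: true, limit: @limit, retry_after: 0)
  end
end

limiter = FixedWindowLimiter.new(100)
at      = Time.at(1_000_035)            # 15 seconds into a window
puts limiter.check(80, now: at).allowed # prints true
blocked = limiter.check(30, now: at)
puts blocked.allowed                    # prints false
puts blocked.retry_after                # prints 45
```

<p>A fixed window is the simplest option and allows a short double-rate burst across a window boundary; a sliding window or token bucket would smooth that at the cost of more state.</p>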

<h2 id="the-economics-that-let-me-say-yes-to-this">The economics that let me say yes to this</h2>

<p>There is a reason a lot of founders would call “no overages” financially reckless, and they would be right if you ignore the cost side. If your own costs scale linearly and without bound, then capping the customer’s bill while your infrastructure bill runs free is a great way to go broke on your most successful day. “No overages” only works if you have first made <em>your</em> costs predictable.</p>

<p>For an error tracker, the dominant cost driver is storage. Error events are write-heavy, append-mostly, time-ordered, and they pile up fast. So the events table is a <a href="https://www.timescale.com/">TimescaleDB</a> hypertable partitioned on time, with columnar compression that kicks in automatically after a week:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">create_hypertable</span><span class="p">(</span><span class="s1">'events'</span><span class="p">,</span> <span class="s1">'occurred_at'</span><span class="p">,</span> <span class="n">migrate_data</span> <span class="o">=&gt;</span> <span class="k">true</span><span class="p">);</span>

<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">events</span> <span class="k">SET</span> <span class="p">(</span>
  <span class="n">timescaledb</span><span class="p">.</span><span class="n">compress</span><span class="p">,</span>
  <span class="n">timescaledb</span><span class="p">.</span><span class="n">compress_segmentby</span> <span class="o">=</span> <span class="s1">'project_id'</span><span class="p">,</span>
  <span class="n">timescaledb</span><span class="p">.</span><span class="n">compress_orderby</span>   <span class="o">=</span> <span class="s1">'occurred_at DESC, id'</span>
<span class="p">);</span>

<span class="c1">-- Compress any chunk older than 7 days.</span>
<span class="k">SELECT</span> <span class="n">add_compression_policy</span><span class="p">(</span><span class="s1">'events'</span><span class="p">,</span> <span class="n">INTERVAL</span> <span class="s1">'7 days'</span><span class="p">);</span>
</code></pre></div></div>

<p>The newest chunks stay in ordinary row storage, so the recent, hot data stays fast to query for the dashboard, while everything older than a week gets squeezed into compressed columnar chunks. Segmenting those chunks by <code class="language-plaintext highlighter-rouge">project_id</code> and ordering them by time keeps a per-project lookup into old data cheap. Error events compress extremely well, because they are full of repeating values: the same fingerprints, the same stack frames, the same environment strings, over and over. That repetition is exactly what columnar compression eats for breakfast.</p>

<p>The second lever is retention. Every plan has a retention window (7 days on Free, up to 90 on the higher tiers), and a background job prunes anything past it and re-derives usage so the numbers stay honest:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cutoff</span> <span class="o">=</span> <span class="n">org</span><span class="p">.</span><span class="nf">retention_days</span><span class="p">.</span><span class="nf">days</span><span class="p">.</span><span class="nf">ago</span>
<span class="n">count</span><span class="p">,</span> <span class="n">bytes</span> <span class="o">=</span> <span class="no">EventRepository</span><span class="p">.</span><span class="nf">prune_older_than!</span><span class="p">(</span><span class="ss">project_id: </span><span class="nb">id</span><span class="p">,</span> <span class="ss">cutoff: </span><span class="n">cutoff</span><span class="p">)</span>
</code></pre></div></div>

<p>Compression bounds the cost of the data you keep. Retention bounds how much data you keep at all. Together they turn storage from an unbounded liability into a known, modeled number per plan. Once I can predict my cost per account, I can confidently promise a fixed price to the account.</p>
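
<p>A back-of-the-envelope model shows how the two levers combine. Everything in this sketch is an assumption made for illustration: the helper name, the idea of spreading a plan's monthly events evenly across days, and above all the bytes-per-event and compression-ratio inputs, which are placeholders and not measured ErrSight figures. It is a steady-state estimate, not a strict upper bound, since a real month can arrive in bursts.</p>

```ruby
# Rough steady-state storage estimate for one account.
# bytes_per_event and compression_ratio are caller-supplied placeholders.
def steady_state_storage_bytes(events_per_month:, retention_days:, bytes_per_event:,
                               compression_ratio:, hot_days: 7)
  events_per_day = events_per_month / 30.0

  # Data newer than `hot_days` is still uncompressed.
  hot_days_kept  = [retention_days, hot_days].min
  # Anything older, up to the retention cutoff, is stored compressed.
  cold_days_kept = retention_days - hot_days_kept

  hot  = events_per_day * hot_days_kept * bytes_per_event
  cold = events_per_day * cold_days_kept * bytes_per_event / compression_ratio
  (hot + cold).ceil
end

# Placeholder inputs: 1,000 bytes per event and a 10x compression ratio.
puts steady_state_storage_bytes(
  events_per_month:  300_000,
  retention_days:    37,
  bytes_per_event:   1_000,
  compression_ratio: 10
) # prints 100000000
```

<p>Whatever the real inputs are, the shape is what matters: with a plan limit and a retention window fixed, the estimate is a finite number rather than a curve that grows with every month of history.</p>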

<p>I extended the same logic to hosting. The app runs on a platform with a hard spending cap and per-second billing, which suits a workload that is quiet most of the time and spiky during incidents. I am not going to ask customers to live with a predictable bill while I refuse to give myself one. The predictability has to go all the way down, or the promise on the pricing page is just optimism.</p>

<h2 id="what-this-costs-me-honestly">What this costs me, honestly</h2>

<p>I want to be straight about the trade-offs, because “no overages” is not free for the person offering it.</p>

<p><strong>I leave money on the table.</strong> Every overage charge I do not send is revenue I did not collect. The spiky months that would have been the most lucrative under metered billing are exactly the months I am choosing to cap. That is real money, and pretending otherwise would be dishonest.</p>

<p><strong>Hitting the wall is a worse short-term outcome for a customer than being billed silently.</strong> Dropping events during an incident is a genuinely bad moment. I mitigate it with clear <code class="language-plaintext highlighter-rouge">429</code> codes, an immediate email, and one-click add-on packs, but the honest version is that a hard cap can bite. The bet is that being told “you are out of room, here is the button” is more respectful than being billed for data you never agreed to pay for, and that developers, of all customers, would rather have the explicit signal.</p>

<p><strong>I had to build the hard version.</strong> The advisory lock, the atomic reservation, the debounced notifications, the usage reconciliation after pruning: none of that exists in a system that just counts and multiplies at month end. Saying “no” correctly is more code than never saying it at all.</p>

<p>I think it is worth every bit of that, because of one rule of thumb I keep coming back to:</p>

<blockquote>
  <p>Your billing model should never make its best money on your customer’s worst day.</p>
</blockquote>

<p>If the only way your pricing makes its best money is by charging customers more during their outages, their spikes, and their emergencies, then your incentives are quietly pointed away from theirs. I would rather have a model where my best day and my customer’s calm month are the same thing, and where their disaster does not show up as a line item on my invoice to them.</p>

<h2 id="what-a-bad-deploy-actually-costs">What a bad deploy actually <em>costs</em></h2>

<p>It helps to put real numbers on this. Take the retry storm from the top of this post: a bad deploy turns one failing code path into a flood, and your app emits roughly two million extra error events in ninety minutes. Here is how that single incident lands on the bill under two pricing models.</p>

<p>First, the plans. I am comparing ErrSight’s Growth tier against Sentry because the base prices line up closely. Sentry is a broader platform than ErrSight, with performance tracing, session replay, cron monitoring, and more under one roof, so treat this strictly as a comparison of error-event overage economics, not of everything the two products do.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>ErrSight Growth</th>
      <th>Sentry Team</th>
      <th>Sentry Business</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Base price</td>
      <td>$79 / month</td>
      <td>$26 / month</td>
      <td>$80 / month</td>
    </tr>
    <tr>
      <td>Included errors</td>
      <td>200,000 / month</td>
      <td>50,000 / month</td>
      <td>50,000 / month</td>
    </tr>
    <tr>
      <td>Past the quota (default)</td>
      <td>Ingestion stops with an HTTP <code class="language-plaintext highlighter-rouge">429</code>; the bill does not move</td>
      <td>Pay-as-you-go keeps ingesting</td>
      <td>Pay-as-you-go keeps ingesting</td>
    </tr>
    <tr>
      <td>Extra capacity</td>
      <td>Opt-in $9 pack adds 50,000 events (30-day rolling)</td>
      <td>Raise reserved volume or on-demand budget</td>
      <td>Raise reserved volume or on-demand budget</td>
    </tr>
    <tr>
      <td>On-demand overage rate</td>
      <td>none</td>
      <td>about $0.00025 per error event*</td>
      <td>about $0.00025 per error event*</td>
    </tr>
    <tr>
      <td>Hard spend cap available?</td>
      <td>Always; it is the only mode</td>
      <td>Yes, set a pay-as-you-go budget (events then drop at the cap)</td>
      <td>Yes, set a pay-as-you-go budget (events then drop at the cap)</td>
    </tr>
  </tbody>
</table>

<p>*Representative published on-demand rate; recent reporting puts it around $0.00025 to $0.00029 per error event. Pricing changes, so check <a href="https://sentry.io/pricing/">sentry.io/pricing</a> for current numbers.</p>

<p>Now the incident itself. Assume normal traffic has already consumed the included quota, and the storm adds two million events on top of that.</p>

<table>
  <thead>
    <tr>
      <th>The bad deploy (about 2,000,000 extra events)</th>
      <th>ErrSight Growth</th>
      <th>Sentry, pay-as-you-go left on (the default)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Events billed beyond quota</td>
      <td>0 (capped)</td>
      <td>about 2,000,000</td>
    </tr>
    <tr>
      <td>Extra charge for the incident</td>
      <td>$0</td>
      <td>2,000,000 × $0.00025 ≈ <strong>$500</strong></td>
    </tr>
    <tr>
      <td>The month’s total</td>
      <td>$79</td>
      <td>about $580 (Business base plus on-demand)</td>
    </tr>
    <tr>
      <td>What you captured during the storm</td>
      <td>the month’s first 200,000 events, then drops</td>
      <td>all 2,000,000</td>
    </tr>
  </tbody>
</table>

<p>On ErrSight the storm costs nothing extra: the Growth cap holds at $79, and if you want to keep capturing through the incident you opt into a $9 pack or two on purpose. On Sentry with on-demand left on, the same storm adds roughly <strong>$500</strong> to the month. <strong>That is about a $500 swing from one bad afternoon</strong>, and it is the kind of swing you discover after the fact, on an invoice, for traffic you did not choose.</p>
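
<p>The numbers in the table reduce to one line of arithmetic, reproduced here so the assumptions are easy to change. The helper <code class="language-plaintext highlighter-rouge">incident_bill</code> is made up for this example, and the rate is the representative $0.00025 figure from the footnote above, not a quote. A <code class="language-plaintext highlighter-rouge">Rational</code> is used so the result is exact rather than subject to floating-point rounding.</p>

```ruby
# Month total = flat base price + (events beyond quota x per-event rate).
# A rate of 0 models a hard cap: extra events are dropped, never billed.
def incident_bill(base_price:, extra_events:, overage_rate:)
  base_price + extra_events * overage_rate
end

rate    = Rational(25, 100_000) # $0.00025 per event, the representative rate
metered = incident_bill(base_price: 80, extra_events: 2_000_000, overage_rate: rate)
capped  = incident_bill(base_price: 79, extra_events: 2_000_000, overage_rate: 0)

puts metered.to_i        # prints 580
puts capped              # prints 79
puts (metered - 80).to_i # prints 500, the cost of the incident alone
```

<p>Swapping in the upper end of the reported range, $0.00029, moves the incident from $500 to $580 on its own, which is why the rate is worth checking against current pricing before relying on the exact figure.</p>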

<p>Two honest caveats, because the comparison only means anything if it is fair:</p>

<ul>
  <li><strong>Sentry can be capped too.</strong> If you set Sentry’s pay-as-you-go budget to $0, Sentry also stops at your reserved limit and you pay no overage, exactly like ErrSight. The real difference is the default and the failure mode it chooses: Sentry defaults to protecting your data and billing for it, ErrSight defaults to protecting your bill and shedding the surplus. ErrSight just makes the wallet-safe behavior the only mode, so there is nothing to remember to configure before the storm hits.</li>
  <li><strong>The dropped events are mostly duplicates.</strong> During a retry storm those two million events are overwhelmingly the same exception, which both tools fingerprint down to a single issue. So what ErrSight sheds at the cap is mostly redundant copies of something you have already seen, not two million distinct bugs. It is still a genuine trade-off, since a brand-new error raised mid-storm could be among the dropped events, and that is the cost I accept in exchange for a bill that never surprises you.</li>
</ul>

<p>The point is not that one model is universally correct. It is that under usage-based on-demand, one bad deploy can quietly turn an $80 month into a roughly $580 one, and under a hard cap it cannot. If a predictable bill matters to you more than capturing every duplicate during an outage, the savings on your worst day are real, and they are roughly the price of the incident itself.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>“No overages” sounded like a marketing decision when I started. It turned into an architecture: a hard quota ceiling enforced by a database-level lock so it cannot leak under load, an honest <code class="language-plaintext highlighter-rouge">429</code> instead of a silent meter, opt-in capacity for the people who genuinely need it, and TimescaleDB compression plus retention to keep my own costs bounded enough that I can afford the promise.</p>

<p>If you have built quota or billing systems that try to stay on the customer’s side, I would love to hear how you handled the boundary cases. The lock-and-reserve pattern is the cleanest answer I found, but I doubt it is the only one.</p>

<p>If you would rather see the result than the plumbing, the tool is live at <a href="https://errsight.com">errsight.com</a>.</p>]]></content><author><name>Jijo Bose</name></author><category term="backend" /><category term="rails" /><category term="monitoring" /><category term="postgres" /><category term="saas" /><summary type="html"><![CDATA[How ErrSight enforces a hard event quota with a Postgres advisory lock instead of billing overages, and the TimescaleDB economics behind flat pricing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.errsight.com/assets/images/og-card.png" /><media:content medium="image" url="https://blog.errsight.com/assets/images/og-card.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>