Twelve months ago we set up abuse@smartordercapture.com as a single email address with no form, no triage UI, and no SLA published anywhere. The reasoning at the time was simple: we wanted to know what people would tell us when we made it as low-friction as possible, before we decided what the workflow should look like.
We just hit the one-year mark. This post is what we've learned from triaging those reports: which categories turned out to be real, which were noise, the four denylist additions that came directly from this channel, and one report that changed how we think about review SLAs.
The numbers
Over the twelve months we received 312 reports. That's roughly one every 28 hours. Some were single-line messages and some were multi-page writeups with screenshots and packet captures. We responded to every one, usually within three business days, and we kept a private spreadsheet with category, severity, action taken, and a one-line resolution note. The spreadsheet is the dataset behind this post.
The headline breakdown:
- 114 (37%) were unrelated to our product. Most were reports about other Android apps the sender mistakenly thought we made. A surprising number were about phishing SMS messages.
- 87 (28%) were product bug reports submitted to the wrong address. We forwarded these to support and added a line to our /contact page explaining the difference.
- 61 (20%) were legitimate abuse reports about smartordercapture itself. Of these, 49 led to a workflow being removed from the marketplace, a user being suspended, or a denylist addition.
- 34 (11%) were from competitors or pretextual reports designed to get a specific marketplace template removed. We could usually tell within thirty seconds.
- 16 (5%) were genuinely hard to classify. We'll come back to these.
So the actionable-abuse rate is about 1 in 5. Lower than we expected; we'd budgeted for maybe 1 in 3. The friction of typing an email rather than clicking a "report" button in the app appears to filter out the lowest-effort reports, which is fine — those would have been almost all noise anyway. We have no plans to add an in-app button.
The categories that were real
Of the 61 actionable reports, four categories dominated. We didn't predict the relative weights.
Gig-delivery offer-acceptance bots (24 reports). By far the largest category. The pattern was always the same: a marketplace template that targeted a specific delivery app's package, used action.waitForElement to detect the "new offer" banner, and then dispatched action.tap against the accept button as fast as possible. The relevant package names were already on the denylist, so the templates failed validation at server-side save — but the reporters were noticing that the templates had been shared as exported JSON and were being imported by users who'd patched their local app to skip validation. The fix wasn't to the denylist; the fix was to make exported JSON unimportable if any node references a denylisted package, with that check enforced as a second, independent layer in the import path rather than relying on the client's save-time validation alone.
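The shape of that second-layer check is roughly the following sketch. The node schema, field names, and denylisted package are all hypothetical here — the real export format isn't public — but the idea is that the importer walks every node in the exported JSON itself, independently of whatever validation the exporting client ran:

```python
# Hypothetical sketch of a second-layer denylist check at import time.
# The node schema ("nodes", "targetPackage") and the package name are
# illustrative, not the real export format.

DENYLIST = {"com.example.deliveryapp"}  # hypothetical denylisted package


def referenced_packages(node: dict):
    """Recursively yield every package a workflow node targets,
    including nested child nodes (e.g. chained sub-workflows)."""
    pkg = node.get("targetPackage")
    if pkg:
        yield pkg
    for child in node.get("nodes", []):
        yield from referenced_packages(child)


def import_allowed(workflow: dict) -> bool:
    """Reject the import if any node references a denylisted package,
    regardless of what the exporting client's own validation said."""
    return all(pkg not in DENYLIST for pkg in referenced_packages(workflow))
```

The point of running this in the importer, not just at save time, is that a patched client can skip its own checks but can't make another user's unpatched importer accept the file.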
Ticket-resale workflows (14 reports). Targeting Ticketmaster, StubHub, and one regional event platform. These were lower-volume than gig-delivery but more sophisticated: they frequently chained multiple workflows together with manual handoffs, and used legitimate CAPTCHA-solving services on the side. We extended the denylist to cover the regional platform after the first three reports about it; the major platforms were already covered.
Ad-network click farms (7 reports). Smaller volume, much higher per-incident severity. These workflows would open ad-network apps, simulate the watch-an-ad reward flow, and collect points or in-app currency programmatically. Some of them were tied to coordinated rings across multiple devices. Two of these reports came from the ad networks themselves, which was a useful working relationship; we set up a back-channel contact with one of them after that.
Stalker and partner-monitoring workflows (4 reports). The category we hoped not to see and the one that bothered us most. The reports described workflows that targeted messaging apps (WhatsApp, Telegram, Signal), opened them on a schedule, captured the contents of recent threads, and either exported them or forwarded them to another account. The reporters were the victims, in two cases tipped off because the workflow had failed visibly on their partner's phone. We added the major messaging apps to a new sensitive-target denylist tier and shipped a feature in the next Android release that requires explicit per-workflow user confirmation before any workflow can target a package in that tier, even for legitimate use cases. The tradeoff is friction for accessibility users; we accepted it.
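The per-workflow confirmation gate for the sensitive-target tier works roughly like this sketch. The tier membership set and the confirmation store are illustrative (the actual Android implementation persists confirmations differently); the invariant is that each (workflow, package) pair needs its own explicit user confirmation before the workflow can touch a sensitive package:

```python
# Hypothetical sketch of the sensitive-target confirmation gate.
# Tier contents and the in-memory confirmation set are illustrative.

SENSITIVE_TIER = {
    "com.whatsapp",
    "org.telegram.messenger",
    "org.thoughtcrime.securesms",  # Signal
}


def can_target(workflow_id: str, package: str, confirmations: set) -> bool:
    """A workflow may target a sensitive package only after the user has
    explicitly confirmed that exact (workflow, package) pair — once per
    workflow, even for legitimate (e.g. accessibility) use cases."""
    if package not in SENSITIVE_TIER:
        return True
    return (workflow_id, package) in confirmations


def record_confirmation(workflow_id: str, package: str, confirmations: set) -> None:
    """Called only from the explicit in-app confirmation dialog."""
    confirmations.add((workflow_id, package))
```

Keying the confirmation on the pair, rather than on the package alone, is what prevents a confirmation granted to one workflow from silently authorizing a different imported workflow against the same app.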
The noise
The 114 misdirected reports were not all useless. They taught us something we didn't realize: a meaningful fraction of consumers cannot tell our app apart from any other automation-adjacent Android app, including some that look nothing like ours. When someone gets a phishing SMS, they search "automation phone scam" and email the first credible-looking address they find. That used to feel like a problem with the reporters; we now think it's a problem with the broader category, and there's not much we can do about it except respond politely and forward where possible.
The 34 competitor or pretextual reports are a different shape. They were almost always extremely well-written, would cite specific marketplace templates by URL, and would frame the report in language that mirrored our own published positions. The tell, usually, was that the cited template didn't actually do what the report claimed. We adopted a rule that any report involving marketplace removal gets the template reviewed by two people independently, and a third only if they disagree. False-positive rate dropped to zero after that. The competitors went quiet within four months once it became clear we didn't unpublish on first complaint.
The four denylist additions
Each came from this channel and is documented in our public denylist log. Briefly:
- The regional ticket platform mentioned above. Added in May, after three independent reports across two months.
- Two ad-reward apps we hadn't heard of before the reports came in. Added in July as part of a single batch.
- A messaging-app companion that's commonly used as a relay in stalker workflows. Added in November under the sensitive-target tier.
- One social platform's video-creator suite, added in February, after a report about workflows being sold as "growth tools" that automated follow / unfollow cycles to manipulate engagement metrics.
None of these would have been on our radar without the abuse channel. The denylist additions we make on our own initiative tend to come from product reading the news; the additions that come from abuse@ tend to be apps we'd never heard of, which is exactly why the channel is worth running.
The one report that changed our SLA thinking
In month four, we got a report about a workflow that was in the marketplace and targeted a domestic-violence-support app. The intent appeared to be to detect when a victim opened the support app and silently dismiss or close it. The reporter was a counselor at the corresponding nonprofit; they had been told about the workflow by a client.
We removed the workflow within forty minutes of the report, added the app to the sensitive-target tier, and contacted the user who had published it. Their account was suspended pending review and ultimately terminated.
We had no published SLA at the time. The internal target was "respond within three business days, action within five." For the categories of report we'd been seeing, that was reasonable. For this one, it would have been a moral failure if a reporter had gotten that response.
We now have a two-tier internal SLA: anything tagged as a sensitive-target report gets a thirty-minute first-touch and a four-hour action commitment, with on-call coverage. Everything else stays on the three-business-day cadence. We haven't published the SLAs because we don't want to anchor expectations to numbers we can't always hit, but the on-call rotation is real and it's funded out of engineering time, not marketing time.
The 16 hard cases
The category we marked "hard to classify" deserves a paragraph of its own. They were almost all variants on one shape: a workflow that did something legal and consensual when used by the person who built it, but that could be repurposed by someone else for surveillance, harassment, or unauthorized access if the device were handed over or coerced.
We don't have a satisfying answer for that shape. The denylist is a blunt instrument; it works because the bad use cases for a delivery-driver app are unambiguously bad and the legitimate use cases are negligible. The same isn't true for, say, a workflow that automates message backup, which is useful when you do it for yourself and dangerous when someone does it to you. We've leaned on the sensitive-target tier (which requires per-workflow user confirmation) as a partial mitigation, and we've leaned on a UI affordance that warns the user any time a workflow they didn't author wants to run for the first time. Neither is a clean fix. We expect this category to keep growing as the product gets more capable.
What we'd publish if we had to
The full spreadsheet, anonymized, is the version we'd publish if a regulator asked. We wouldn't volunteer it. Two reasons.
First, publishing the category breakdown without redaction reveals to bad-faith reporters which categories we triage fastest. The category that gets the four-hour SLA would, predictably, see a spike in low-quality reports designed to occupy our on-call attention. Security-by-obscurity is weak in general; for triage prioritization specifically, we think it's defensible.
Second, several of the reports include personal information about the reporter — the stalker-workflow victims, in particular — that we don't have the legal or ethical authority to publish in any form, even paraphrased.
What we will publish — and started publishing this quarter — is the count of denylist additions per quarter, with one-sentence categorical descriptions of each. That's enough for the community to see we're acting on what we receive without giving away the parts that should stay private. It's also a quiet accountability mechanism for us: when a quarter goes by without additions, we get to ask ourselves whether the channel got quieter or our triage got slower.
What we'd tell another company starting this channel
Make the address an email, not a form. Forms select for users who are willing to deal with a form, which is not the population you want feedback from. Respond to every report, even the misdirected ones, even the obviously pretextual ones, because the response is the signal to the sender that they were heard, and the population that has been heard once is the population that comes back to you with the serious report next time.
Keep the spreadsheet. The shape of the data only becomes clear in retrospect; we wouldn't have known to weight gig-delivery bots so heavily in our denylist work without seeing them account for nearly 40% of all actionable reports.
Plan for the four-hour-SLA category before you encounter it. You will encounter it.
