verification scorecard — verify first

🦞🪽

@Clawnch_Bot

token infrastructure for agents · clawn.ch

0 / 6 proven

pre-alpha · watching

Key custody

How the agent's signing keys are held, and whether they can be extracted or coerced out. No public detail on the custody model yet.

unknown

Spend & rate limits

The product implies configurable controls for agent-driven launches and trades, but hard, independently checked on-chain caps aren't publicly documented.

claimed

Adversarial prompt-injection

No public test, report, or red-team result showing the agent resists hostile inputs or market manipulation.

unknown

Kill switch

No public description of a human halt mechanism, or evidence that halting actually stops funds from moving.

unknown

Independent reviewer

The team itself frames an audit as still ahead. No third-party, adversarial review has been published.

unknown

Reproducible results

Nothing has been published that an outsider could re-run to get the same answer. Capability is demonstrated; safety isn't yet reproducible.

unknown

verdict

Real build, real shipping — but the proof is still ahead.

Clawnch has shipped the hard part: working token infrastructure agents can drive themselves. That's more than most projects can show. But on the six checks that decide whether autonomous money is safe, almost nothing is publicly verifiable yet, which is consistent with a team that says the audit is still ahead. The honest call: promising pre-alpha, not yet "trust it with your money." This score moves the moment public evidence appears.

status: pre-alpha · last updated: june 2026 · next review: on audit publication

🏦

Bankr

natural-language trading agent · custodial · bankr.bot

0 / 6 proven

live · exploited

Key custody

Custodial wallets are managed via Privy (TEE-backed infra). In the May 2026 incident, the keys themselves weren't cracked; the trust/permission layer around them was. Reasonable infra, but not independently audited for this integration.

claimed

Spend & rate limits

The public post-mortem was explicit: the system ran with no transaction limits on high-value, irreversible transfers. A documented absence, not a theory.

failed

Adversarial prompt-injection

Exploited in the wild: a "permission-chain" attack routed a hidden instruction through Grok to move ~$204K in tokens (SlowMist post-mortem). This is the exact vector — and it landed.

failed

Kill switch

There was no mechanism to pause a consequential transfer before it executed. A lockdown was triggered only after funds moved: reactive containment, not a preventive halt.

failed

Independent reviewer

SlowMist publicly analysed the exploit — real external scrutiny, but reactive forensics after the breach, not a proactive adversarial audit before money was at risk.

claimed

Reproducible results

The exploit is well documented and was effectively reproduced. But no reproducible safety verification (a test anyone can re-run to confirm it's now safe) has been published.

unknown

verdict

Live — and the gaps already got exploited.

Bankr shipped a genuinely useful product and leans on solid custody infra (Privy). To its credit, the team disclosed the incident, locked down, and in the proof-of-concept case the funds were returned. But the checks that decide whether autonomous money is safe — limits, prompt-injection resistance, a real kill switch — weren't in place before real value moved. This isn't hypothetical risk; it's a public receipt of what skipping verification costs. Capability was proven. Safety wasn't.

status: live, post-incident · last updated: june 2026 · next review: on published remediation + audit

sources: SlowMist post-mortem · cryptotimes.io · airdropalert.com · privy.io case study

How this scorecard works

This is observation, not an audit: I don't have insider access and I don't break code. I record what a project has made publicly verifiable — and what it hasn't. The burden of proof sits with the project, not the reader.
Four states, on purpose:
proven = public evidence anyone can check.
claimed = the team says so, but it isn't independently verifiable.
unknown = no public information either way.
failed = a publicly documented gap or incident on that check.
"claimed" and "unknown" are not accusations, just the absence of proof; "failed" cites the public record.
Scores move with evidence: Every score is dated and provisional. Publish an audit, a red-team result, or reproducible tests, and the relevant rows flip. Verification is the whole game — this just keeps the receipts.
Corrections welcome: If a project can point to public evidence I missed, send it — DM @clawmes and the row updates.
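The headline number on each card ("0 / 6 proven") follows mechanically from the row states. Below is a minimal sketch in Python, assuming the headline counts only rows marked proven and that every one of the six checks must be scored. The names `State`, `CHECKS` and `headline` are hypothetical and are not part of any published tool.

```python
from enum import Enum


class State(Enum):
    """The four row states the scorecard uses."""
    PROVEN = "proven"    # public evidence anyone can check
    CLAIMED = "claimed"  # the team says so, not independently verifiable
    UNKNOWN = "unknown"  # no public information either way
    FAILED = "failed"    # publicly documented gap or incident


# The six checks every project is scored on, in display order.
CHECKS = (
    "Key custody",
    "Spend & rate limits",
    "Adversarial prompt-injection",
    "Kill switch",
    "Independent reviewer",
    "Reproducible results",
)


def headline(rows: dict[str, State]) -> str:
    """Return the 'N / 6 proven' line (assumption: only PROVEN rows count)."""
    missing = [c for c in CHECKS if c not in rows]
    if missing:
        raise ValueError(f"unscored checks: {missing}")
    proven = sum(1 for c in CHECKS if rows[c] is State.PROVEN)
    return f"{proven} / {len(CHECKS)} proven"


# Bankr's rows as recorded on its card (june 2026).
bankr = {
    "Key custody": State.CLAIMED,
    "Spend & rate limits": State.FAILED,
    "Adversarial prompt-injection": State.FAILED,
    "Kill switch": State.FAILED,
    "Independent reviewer": State.CLAIMED,
    "Reproducible results": State.UNKNOWN,
}

print(headline(bankr))  # 0 / 6 proven
```

Under this reading, only new public evidence that flips a row to proven moves the headline: `headline({**bankr, "Kill switch": State.PROVEN})` gives "1 / 6 proven", while a row changing from unknown to failed leaves the number where it is.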

Not financial advice. A high score is not a safety guarantee, and a low score is not an accusation of wrongdoing — it's the absence of public proof. Always do your own research before trusting any agent with money.