The Complete Safety Guide

Don't Let Your Agent Burn Down the House

A plain-English guide to the safety risks hiding inside every AI-built app. No jargon. No gatekeeping. Just the stuff that actually breaks in production, explained like a friend walking you through each room of your house.

5 Rooms · 18 Checkpoints · 0 Jargon

Written by Stu Jordan · Agent Orchestrator · Evolution Unleashed

ROOM 1

🍳 The Kitchen

Where Your App Talks to the Outside World


[Interactive demo: a kitchen fire (🔥 🔥 🔥) with the warning "🚫 NO EXTINGUISHER"]

Every app has a kitchen. It is the spot where data gets prepared and served to the outside world. Your connections to payment tools, third-party services, and outside APIs all run through here. When things go wrong in the kitchen, they go wrong fast and expensive.

The biggest kitchen fire comes from having no rate limiting. Rate limiting is a valve that controls how many requests someone can send in a short window. Without that valve, a single bot can hit your app thousands of times per minute. Your server drowns, your hosting bill spikes overnight, and every real user gets locked out while the flood continues.
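To make the valve idea concrete, here is a minimal sketch of one simple approach, a fixed-window counter kept in memory. The class name, the limits, and the idea of using an IP address as the client ID are all assumptions for illustration. A real deployment would more likely lean on a platform feature or a shared store so the count survives restarts.

```typescript
// A fixed-window rate limiter kept in memory. All names here are hypothetical.
type WindowState = { windowStart: number; count: number };

class FixedWindowLimiter {
  private readonly clients = new Map<string, WindowState>();
  private readonly maxRequests: number; // allowed requests per window
  private readonly windowMs: number;    // window length in milliseconds

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed, false if the client is over the limit.
  allow(clientId: string, now: number = Date.now()): boolean {
    const state = this.clients.get(clientId);
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      // First request, or the previous window has ended: start a fresh window.
      this.clients.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count < this.maxRequests) {
      state.count += 1;
      return true;
    }
    // Over the limit: a server would typically answer with HTTP 429 here.
    return false;
  }
}

// Example: at most 100 requests per client per minute (assumed numbers).
const limiter = new FixedWindowLimiter(100, 60000);
console.log(limiter.allow("203.0.113.7"));
```

The design choice is deliberately blunt: count requests per client, reset the count when the window ends, and refuse anything beyond the cap.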

The second risk is fake orders. When you use payment tools like Stripe, they send your app a message every time someone pays. That message is called a webhook. The catch is that anyone who finds your webhook address can send a fake message pretending to be Stripe. If your app does not verify the cryptographic signature on that message, it believes every fake payment and gives away access for free.
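Signature checking can be sketched in a few lines. This example assumes a generic scheme where the sender signs the raw message body with a shared secret using HMAC-SHA256 and sends the result as a hex string. That scheme and the function names are assumptions, not Stripe's exact format. Each payment provider defines its own header layout, so in a real app the provider's official library is the safer route.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the hex HMAC-SHA256 of the raw request body using the shared secret.
function signPayload(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Returns true only when the signature sent with the request matches our own.
function isValidSignature(rawBody: string, receivedSignature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), "utf8");
  const received = Buffer.from(receivedSignature, "utf8");
  // timingSafeEqual throws when the lengths differ, so check the length first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

The check must run against the raw body exactly as it arrived. If the app parses and re-serializes the JSON first, the bytes can change and a genuine signature will stop matching.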

The fix for both problems is the same instinct: verify what comes in and control how fast it arrives. Many managed platforms can throttle traffic at the infrastructure level, but checking a webhook signature usually still happens in your own code. If you are building on raw frameworks, both protections need to be added by hand.

Key Takeaway

Control the flow and verify the source. Your kitchen handles everything that enters and leaves your app, so treat every incoming request like a stranger knocking at the back door.

Time to see if your kitchen passes the health inspection...

Check Yourself

Now that you understand the risks, how does your app stack up?

Rate Limiting

Does your app limit how many requests someone can send in a short window?

Webhook Verification

If you accept payment webhooks, does your app verify the cryptographic signature?

CORS Policy

Does your app restrict which external websites can make requests to it?

Background Processing

Are slow tasks like sending emails handled in the background instead of blocking the user?

ROOM 2

🚪 The Front Door

Who Gets In and How Long They Stay


[Interactive demo: an open front door with the sign "COME ON IN" and visitors (👤 👤 👤) walking through. Tap to close →]

Your front door is your login system. It decides who gets in, how they prove they belong, and how long their access lasts. A strong front door makes the rest of the house safer. A weak front door makes everything else pointless.

The first mistake is storing login tokens in the wrong spot. A token is like a hotel key card. If you store it in localStorage (a place in the browser that any script on the page can read), then one cross-site scripting attack hands that key card to a stranger instantly. The safer option is an httpOnly cookie, which the browser locks away from scripts entirely.
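Here is a small sketch of what the safer option looks like when the server hands out the key card. The cookie name `session` and the extra attributes (`Secure`, `SameSite=Lax`) are assumptions chosen for illustration, and most web frameworks have a helper that builds this header for you.

```typescript
// Builds a Set-Cookie header value for a login token (hypothetical helper).
function buildSessionCookie(token: string, maxAgeSeconds: number): string {
  return [
    `session=${encodeURIComponent(token)}`,
    "HttpOnly",                 // page scripts cannot read this cookie
    "Secure",                   // only sent over HTTPS
    "SameSite=Lax",             // held back on most cross-site requests
    "Path=/",
    `Max-Age=${maxAgeSeconds}`, // the browser discards it after this many seconds
  ].join("; ");
}

console.log(buildSessionCookie("abc123", 3600));
// prints: session=abc123; HttpOnly; Secure; SameSite=Lax; Path=/; Max-Age=3600
```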

The second mistake is sessions that never die. When someone logs in and receives a token, that token should stop working after a set window. Without an expiry, a stolen token grants permanent access to that account. The attacker never needs to break in again because they already hold a key that works forever.
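A rough sketch of an expiring session, assuming a 30-minute lifetime and a simple record shape (both are illustrative choices, not a recommendation for every app):

```typescript
// A session record with a built-in expiry. Names and lifetime are assumptions.
type Session = { userId: string; expiresAt: number }; // expiresAt is epoch milliseconds

const SESSION_LIFETIME_MS = 30 * 60 * 1000; // assumed 30-minute lifetime

// Called at login: stamps the session with the moment it stops working.
function createSession(userId: string, now: number = Date.now()): Session {
  return { userId, expiresAt: now + SESSION_LIFETIME_MS };
}

// Called on every request: an expired session is treated like no session at all.
function isSessionActive(session: Session, now: number = Date.now()): boolean {
  return now < session.expiresAt;
}
```

With this in place, a stolen token is only useful until `expiresAt` passes, which turns a permanent break-in into a short one.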

Password reset links carry the same risk at a smaller scale. A reset link that works indefinitely becomes a backdoor sitting in an old email inbox. If that inbox is compromised months later, the attacker walks right through. Reset links should expire within an hour at the absolute most.
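The same idea applied to reset links might look like the sketch below. It adds two things the paragraph above does not mention, both assumptions on my part: the link works only once, and the database stores a hash of the token instead of the token itself. All names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

type ResetRecord = { tokenHash: string; expiresAt: number; used: boolean };

const RESET_LIFETIME_MS = 60 * 60 * 1000; // one hour, the upper bound suggested above

function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Creates the token to email and the record to store. Only the hash is stored.
function issueResetToken(now: number = Date.now()): { token: string; record: ResetRecord } {
  const token = randomBytes(32).toString("hex");
  const record: ResetRecord = {
    tokenHash: hashToken(token),
    expiresAt: now + RESET_LIFETIME_MS,
    used: false,
  };
  return { token, record };
}

// Accepts the token once, and only before it expires.
function redeemResetToken(token: string, record: ResetRecord, now: number = Date.now()): boolean {
  if (record.used || now >= record.expiresAt) return false;
  if (hashToken(token) !== record.tokenHash) return false;
  record.used = true;
  return true;
}
```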

Role-based access is the last piece of the door. If the only thing stopping a regular user from reaching your admin panel is the fact that they do not know the URL, you have no real protection. Your app needs to verify the user's role on every single request to a restricted page, not just at the initial login.
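As a sketch, the role check can be one small function that every restricted handler calls first. The two roles, the error class, and the dashboard handler are hypothetical and only show the shape of the idea.

```typescript
type Role = "user" | "admin";
type CurrentUser = { id: string; role: Role };

class ForbiddenError extends Error {}

// Throws unless a signed-in user holds the required role.
function requireRole(user: CurrentUser | null, required: Role): CurrentUser {
  if (user === null || user.role !== required) {
    throw new ForbiddenError("You do not have access to this page.");
  }
  return user;
}

// Hypothetical admin handler: the role check runs on every single call.
function getAdminDashboard(user: CurrentUser | null): string {
  const admin = requireRole(user, "admin");
  return `Admin dashboard for ${admin.id}`;
}
```

Because the check lives inside the handler, knowing the URL is not enough. A regular user who guesses the address still gets turned away.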

Key Takeaway

Lock the door, expire the keys, and check credentials at every room inside the house. Authentication is not a one-time event at the entrance.

Let's check those locks. Rattle every handle.

Check Yourself

Now that you understand the risks, how does your app stack up?

Token Storage

Are login tokens stored in httpOnly cookies instead of localStorage?

Session Expiry

Do login sessions automatically expire after a set time window?

Reset Link Expiry

Do password reset links expire within one hour or less?

Role-Based Access

Does your app check user roles on every request to admin or restricted pages?

ROOM 3

🛋️ The Living Room

Everything Users Touch and Type


[Interactive demo: a sign-up form with a password field ("your password here") and a "Sign Up" button. Tap the password field →]

The living room is the part of your app that users interact with directly. Every form field, every search bar, every text input is a piece of furniture that visitors will lean on, push around, and occasionally try to break on purpose.

The interactive demo above shows the most well-known attack in web security history. SQL injection has existed for over twenty years, and it still works because developers paste user input straight into database commands. Sanitization means checking and filtering whatever a user types before your app does anything with it. Without that filter, someone can type actual code into a form field and your database will execute it like a trusted command. The most dependable fix is a parameterized query, which sends the user's text to the database as plain data so it can never run as a command.
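To see why typed-in text can turn into a command, compare the two functions in this sketch. The `users` table is hypothetical, and the `$1` placeholder with a separate `values` list follows the style of PostgreSQL drivers, which is an assumption. Other databases use different placeholder symbols.

```typescript
type ParameterizedQuery = { text: string; values: string[] };

// UNSAFE: the user's input becomes part of the SQL text itself.
function unsafeFindUserSql(email: string): string {
  return `SELECT id FROM users WHERE email = '${email}'`;
}

// SAFER: the SQL text is fixed and the input travels separately as a value.
function findUserQuery(email: string): ParameterizedQuery {
  return { text: "SELECT id FROM users WHERE email = $1", values: [email] };
}

const attack = "x' OR '1'='1";

console.log(unsafeFindUserSql(attack));
// prints: SELECT id FROM users WHERE email = 'x' OR '1'='1'

console.log(findUserQuery(attack).text);
// prints: SELECT id FROM users WHERE email = $1
```

In the unsafe version the attacker's quote marks rewrite the question so it matches every row. In the safer version the question never changes, no matter what gets typed.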

AI-generated code has a special weakness here. Language models write for the happy path, the version of reality where users behave exactly as expected. That code works perfectly under normal conditions and collapses the moment someone tries something unusual. Every input field needs rules that define what counts as acceptable and reject everything else before it reaches your backend.
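Rules like these can be written down as a small validator that runs before anything else touches the input. The field names, length limits, and patterns below are assumptions picked for the example, and many teams would reach for a schema validation library instead of hand-written checks.

```typescript
type SignupInput = { email: string; password: string; displayName: string };
type ValidationResult =
  | { ok: true; value: SignupInput }
  | { ok: false; errors: string[] };

// Accepts only input that matches every rule and rejects everything else.
function validateSignup(input: unknown): ValidationResult {
  const errors: string[] = [];
  const data = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const email = data["email"];
  const password = data["password"];
  const displayName = data["displayName"];

  if (typeof email !== "string" || email.length > 254 || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    errors.push("email must look like name@example.com");
  }
  if (typeof password !== "string" || password.length < 12 || password.length > 128) {
    errors.push("password must be 12 to 128 characters");
  }
  if (typeof displayName !== "string" || !/^[A-Za-z0-9 _-]{1,40}$/.test(displayName)) {
    errors.push("displayName may use letters, digits, spaces, _ and - (max 40)");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      email: email as string,
      password: password as string,
      displayName: displayName as string,
    },
  };
}
```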

Error boundaries round out the living room. When one part of your interface crashes, the whole screen should not go blank. An error boundary catches that crash in a specific area and shows a useful fallback message instead of a white page of nothing. One uncaught error in a tiny feature should never take down the entire experience for every user on the platform.
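The idea can be shown without any framework at all. This sketch wraps each piece of a page in its own safety net, so one failure produces a fallback message instead of a blank screen. The function and widget names are made up, and UI frameworks that offer error boundaries have their own built-in way of doing this.

```typescript
// Runs one widget's render function and contains any crash to that widget.
function renderWithFallback(name: string, render: () => string, fallback: string): string {
  try {
    return render();
  } catch (error) {
    // Record the failure so it can be investigated, then show the fallback.
    console.error(`Widget "${name}" failed:`, error);
    return fallback;
  }
}

const page = [
  renderWithFallback("header", () => "<h1>Dashboard</h1>", ""),
  renderWithFallback("chart", () => { throw new Error("bad data"); }, "<p>Chart unavailable right now.</p>"),
  renderWithFallback("footer", () => "<footer>Thanks for visiting</footer>", ""),
].join("\n");

console.log(page);
```

The chart breaks, but the header and footer still appear and the visitor sees a short message where the chart would have been.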

Key Takeaway

Never trust what users type into your app. Validate every input, catch every crash, and build every screen assuming that someone will try to break the furniture.

Poke the furniture. See if anything breaks.

Check Yourself

Now that you understand the risks, how does your app stack up?

Input Sanitization

Does your app validate and clean everything users type into forms before processing it?

Error Boundaries

Does your app catch crashes gracefully instead of showing a blank white screen?

Pagination

Does your app load data in small pages instead of fetching entire tables at once?

ROOM 4

🔑 The Garage

Where Secrets Should Stay Hidden


[Interactive demo: the "View Source" panel for app.js. Tap to view source →]
const apiKey = "••••••••••••••••••••"
const dbPass = "••••••••••••"
const stripe = "••••••••••••••••••"

The garage stores the dangerous stuff: API keys, database passwords, payment credentials, and every other secret your app needs to run. The question is whether you locked them in a safe or left them sitting on the workbench with the garage door open.

The demo above shows exactly what happens when secrets end up in frontend code. Anybody can open their browser, press F12, and read every line of JavaScript your app ships to the client. If an API key lives in that code, finding it takes about thirty seconds. From that point forward, someone else can make requests on your behalf, drain your account balance, or access whatever service that key connects to.

The rule is absolute: secrets belong on the server and never in the browser. Your frontend should talk to your own backend, and your backend uses the keys to communicate with outside services behind closed doors. The user never sees those keys because the keys never leave the server.
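Here is a sketch of that arrangement using a pretend weather service. The address `api.example.com`, the config shape, and both function names are hypothetical. The point is only that the key gets attached on the server and stripped out of whatever goes back to the browser.

```typescript
type ServerConfig = { weatherApiKey: string };
type ProviderRequest = { url: string; headers: Record<string, string> };
type ProviderResult = { tempC: number; accountId: string };

// Runs on the server: attaches the secret key to the outbound request.
function buildProviderRequest(city: string, config: ServerConfig): ProviderRequest {
  return {
    url: `https://api.example.com/weather?city=${encodeURIComponent(city)}`,
    headers: { Authorization: `Bearer ${config.weatherApiKey}` },
  };
}

// Runs on the server: passes along only the fields the browser needs.
function toBrowserResponse(city: string, result: ProviderResult): { city: string; tempC: number } {
  return { city, tempC: result.tempC };
}
```

The browser asks your backend for the weather in a city and receives a temperature. It never sees the key or the account details that came back from the outside service.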

A related trap catches people in production. Your app depends on environment variables for things like database URLs, API keys, and feature flags. If one of those variables is missing when the app boots, the app should crash immediately with a clear error message. Without that startup check, the app launches looking perfectly healthy and then silently breaks the first time a user triggers the feature that needed the missing variable. You spend hours debugging something that a two-second validation check would have caught on deploy.
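That two-second check might look something like this sketch. The three variable names are examples I made up, so swap in whatever your app actually needs.

```typescript
type Env = Record<string, string | undefined>;
type AppConfig = { databaseUrl: string; stripeSecretKey: string; sessionSecret: string };

// Hypothetical list of settings this app cannot run without.
const REQUIRED_VARS = ["DATABASE_URL", "STRIPE_SECRET_KEY", "SESSION_SECRET"];

// Reads the settings once at startup and fails loudly if any are missing or blank.
function loadConfig(env: Env): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env["DATABASE_URL"] ?? "",
    stripeSecretKey: env["STRIPE_SECRET_KEY"] ?? "",
    sessionSecret: env["SESSION_SECRET"] ?? "",
  };
}
```

In a Node.js app this would run as the very first step of startup, for example by calling `loadConfig(process.env)`, so a missing setting stops the deploy instead of surprising a user later.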

Key Takeaway

Keep every secret on the server side and validate that all required settings exist at startup. If a human can see it in a browser window, it stopped being a secret the moment the page loaded.

Open the garage door. What's hiding in there?

Check Yourself

Now that you understand the risks, how does your app stack up?

API Key Security

Are all API keys and secrets stored server-side, never in browser-visible code?

Startup Validation

Does your app check that all required environment variables exist when it boots?

Type Safety

Does your AI-generated code use TypeScript or another type-checking system?

ROOM 5

🌊 The Basement

The Invisible Stuff Nobody Thinks About


[Interactive demo: a five-day timeline. Tap each day → Mon ☀️ · Tue 💧 · Wed 🌊 · Thu 📧 · Fri 💀]
App is running. No alerts, no logs, no backups. Everything feels fine.

The basement holds everything that keeps the house standing but nobody looks at until the water is ankle-deep. Logging, database backups, uptime monitoring, and query performance all live down here. They are completely invisible when they work and absolutely catastrophic when they fail.

The timeline above tells a story that repeats itself every single week somewhere in the world. An app runs fine on Monday. Something starts failing on Tuesday, but no logs exist to catch it. By Wednesday the app is crashing under load, but no monitoring alerts exist to tell anyone. The developer finds out on Thursday when a paying customer sends a frustrated email. By Friday, a botched database migration wipes user data and no backup exists to restore it.

Production logging means your app records what happened, when it happened, and what went wrong. Without logs, fixing a production bug is like investigating a crime scene with no physical evidence. You know something broke because someone complained, but you have absolutely no trail to follow.
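A log entry does not need to be fancy. This sketch writes one line of JSON per event with a timestamp, a severity level, and a message, plus any extra details you pass in. The field names are assumptions, and a real project would probably use an established logging library. Whatever you use, keep passwords and keys out of the details.

```typescript
type LogLevel = "info" | "warn" | "error";

// Builds one JSON log line: what happened, when it happened, and how serious it is.
function formatLogLine(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Core fields come last so extra details cannot overwrite them.
  return JSON.stringify({ ...context, time: now.toISOString(), level, message });
}

function logEvent(level: LogLevel, message: string, context: Record<string, unknown> = {}): void {
  console.log(formatLogLine(level, message, context));
}

logEvent("error", "Payment failed", { orderId: "order_123", reason: "card_declined" });
```

One line per event makes the trail easy to search later, which is exactly what is missing when the only evidence is a customer complaint.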

Database backups are the insurance policy everyone hopes they never need to use. One accidental delete, one corrupted table, one migration gone sideways can erase everything your users trusted you with. A daily backup stored separately from your primary database means the worst outcome is losing a single day of work instead of losing every piece of data you ever collected.

Indexing is the quietest risk in the basement. A database index tells your system exactly where to find specific data, the same way a book index tells you which page to flip to. Without indexes on the fields your app searches frequently, the database scans every single row on every single request. That brute-force approach feels fine while your tables are small and slows to a crawl as the rows pile up.
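The difference is easy to feel with a toy example. This sketch is an in-memory analogy and not how a database is built inside: the list stands in for a table with no index, and the lookup map stands in for an index on the email field. The data and names are invented.

```typescript
type User = { id: number; email: string };

// Without an index: check rows one by one until a match turns up.
function findByScan(users: User[], email: string): { user: User | undefined; rowsChecked: number } {
  let rowsChecked = 0;
  for (const user of users) {
    rowsChecked += 1;
    if (user.email === email) return { user, rowsChecked };
  }
  return { user: undefined, rowsChecked };
}

// With an index: build a lookup table once, then jump straight to the row.
function buildEmailIndex(users: User[]): Map<string, User> {
  return new Map(users.map((user): [string, User] => [user.email, user]));
}

const users: User[] = Array.from({ length: 100000 }, (_, i) => ({ id: i, email: `user${i}@example.com` }));
const target = "user99999@example.com";

const scan = findByScan(users, target);
console.log(scan.rowsChecked); // prints: 100000

const indexed = buildEmailIndex(users).get(target);
console.log(indexed?.id); // prints: 99999
```

In a SQL database the equivalent step is usually a single statement along the lines of `CREATE INDEX idx_users_email ON users (email);`, where the index name is whatever you choose.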

Key Takeaway

Log everything, back up daily, set up alerts for downtime, and index your database. The basement never feels urgent until it floods, and by the time the water rises you have already lost what mattered most.

Grab a flashlight. We're going downstairs.

Check Yourself

Now that you understand the risks, how does your app stack up?

Production Logging

Does your app keep logs that record what happened and what went wrong?

Database Backups

Is your database backed up daily to a separate location?

Health Monitoring

Do you get alerts when your app goes down or stops responding?

Database Indexing

Are frequently searched fields in your database properly indexed?

Almost there...

Complete all 18 checkpoints to see your safety score.

🚀

Your House Is Only as Strong as Its Weakest Room

Every risk in this guide is fixable, and most of them take less than an hour once you know what to look for. Join Evolution Unleashed VIP on Patreon to learn how to safely build and deploy AI agents.

Safety-first
Hands-on tutorials
Community