What Happens When You Let AI Click Things
The agent was supposed to update one field on one Salesforce record. The instructions in the prompt were specific — a single contact's title needed to change. Something in how the language was parsed turned the singular into a plural. Ninety seconds later, the title field on 4,200 records had been overwritten.
The team had logs. They had a well-resourced vendor with a real platform behind it. What they didn't have was a path back. The previous values weren't preserved. There was no rollback. The Salesforce admin spent the rest of the day reconstructing what those 4,200 titles had been from a six-week-old export.
This isn't a story about that vendor. Their product worked the way the docs said it would. It's a story about what changes the moment AI gets write access to something — and how operationally similar that is to giving a brand-new employee admin rights on day one.
What "Agentic AI" Actually Means
Most AI tools you've used until recently generate text. You ask, the model answers, you decide what to do with the answer. The model has no hands. It can only describe.
Agentic AI has hands. It can click buttons, send emails, update records, run scripts, file tickets, approve invoices, and — depending on what you give it — spend money. The boundary between "it suggested something" and "it did something" moves to the model's side of the fence. You set the goal. The agent picks the steps.
In the MSP and IT space, this shows up in a few common shapes:
- Helpdesk triage and resolution. Agents that read a ticket, decide what category it falls into, run diagnostic commands, and either resolve it or escalate with a summary.
- Sales and CRM automation. Agents that update opportunities, log calls, send follow-ups, and schedule meetings based on inbox traffic.
- Finance workflow automation. Agents that read invoices, match them to POs, post entries, and route exceptions.
- Operational automation in dev tools. Agents that file pull requests, run tests, deploy to staging, and respond to incident alerts.
The capability is real. The productivity gains, when this works, are real. The risk profile is different from anything most SMBs have managed before.
The Permissions Problem
Agents act with the permissions you give them. Hand an agent the Salesforce admin's credentials and it has admin power over Salesforce. Give a helpdesk agent local admin on endpoints so it can run diagnostics, and it has local admin — including for actions it wasn't explicitly told to take.
The temptation is to grant broad permissions so the agent can do its job without coming back to ask. The result is that a single prompt misinterpretation, one indirect injection through a malicious ticket body, or one off-by-one in the agent's reasoning lets it act with the full breadth of what you handed it.
The principle isn't new. It's the same one that governs service accounts and shared credentials anywhere: least privilege, scoped credentials, time-limited tokens, audit logging. What's new is how much harder least privilege is when the actor isn't human and isn't a deterministic script. The agent's decisions are probabilistic — the same prompt might take three different paths across three runs.
The Blast Radius Question
Before you let an agent do anything, the operational question is: if this agent does the worst plausible thing, what stops working?
- A helpdesk agent with Active Directory admin rights. If a prompt injection convinces it to disable accounts instead of resetting passwords, what's the recovery path?
- A CRM agent that can send email on behalf of a sales rep. If it sends an inappropriate message to a top client, what's the audit trail?
- An invoicing agent that approves payments under $5,000. If it approves the same invoice twice, what's the reconciliation?
- A code agent that opens PRs against production. If it merges a change that compiles but breaks billing, what's the rollback?
The answer has to exist before you let the agent run, not after the first incident.
What Good Looks Like
Scoped credentials, not borrowed ones
The agent doesn't sign in as the admin. It has its own service account, with permissions narrowed to exactly what it needs. The helpdesk agent that resets Tier-1 passwords can't disable accounts, can't change MFA, and can't elevate itself. The Salesforce agent that updates contact titles can update titles — and can't run bulk operations above a defined size without human approval.
This is the cheapest control you can put in place, and the one that closes off the worst-case scenarios.
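In practice, the narrowing often lives in a deny-by-default authorization layer that sits in front of the agent's tool calls. Here is a minimal Python sketch; the agent names and action names are hypothetical stand-ins for whatever your platform actually exposes:

```python
# Hypothetical deny-by-default allowlist: each service account gets exactly
# the actions its job requires, and nothing else. Any action not listed for
# an agent is refused, including actions that exist for other agents.
ALLOWED_ACTIONS = {
    "helpdesk-agent": {"reset_tier1_password", "unlock_account"},
    "crm-agent": {"update_contact_title"},
}

def authorize(agent: str, action: str) -> None:
    """Raise unless the action appears on the agent's own allowlist."""
    if action not in ALLOWED_ACTIONS.get(agent, set()):
        raise PermissionError(f"{agent} is not permitted to {action}")

authorize("crm-agent", "update_contact_title")   # allowed, returns quietly
```

The important property is the default: an unknown agent or an unlisted action is denied without anyone having to anticipate it.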
Action limits, not just permission limits
Permissions tell you what's allowed. Action limits tell you how much. An agent allowed to update Salesforce records should be allowed twenty per session, not five thousand. An agent allowed to send emails should be capped at ten per hour, not five hundred. An agent allowed to approve invoices should require human review above a defined threshold.
Most modern agentic platforms support these limits. The defaults rarely set them; the work is to configure each one explicitly.
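One way to enforce per-session caps is a small budget object that every tool call must pass through before executing. A sketch, with illustrative action names and limits:

```python
class ActionBudget:
    """Per-session counters that cap how many times an agent may act.

    The action names and limits here are illustrative; set one per tool.
    """

    def __init__(self, limits):
        self.limits = dict(limits)   # e.g. {"update_record": 20}
        self.counts = {}

    def spend(self, action):
        """Record one use of an action, refusing once its cap is reached."""
        used = self.counts.get(action, 0)
        if used >= self.limits.get(action, 0):
            raise RuntimeError(f"session limit reached for {action!r}")
        self.counts[action] = used + 1

budget = ActionBudget({"update_record": 20, "send_email": 10})
budget.spend("update_record")   # the first of at most twenty updates
```

A cap like this would have turned the opening story's 4,200 overwrites into 20 overwrites and an error in the log.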
Audit logs you can read after the fact
Every action the agent takes is logged — the prompt that triggered it, the reasoning trace, the action executed, the result returned. The log is retained outside the agent's control, and the IT team can read it without asking the vendor for an export.
If you can't reconstruct what an agent did and why three days later, you don't have an audit log. You have a usage report.
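A minimal version of that record is an append-only JSON-lines file, retained somewhere the agent can't modify after the fact. The field names here are an assumption about what your platform surfaces:

```python
import json
from datetime import datetime, timezone

def log_agent_action(path, prompt, reasoning, action, result):
    """Append one agent action to a JSON-lines audit log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,          # what triggered the action
        "reasoning": reasoning,    # the trace the agent produced
        "action": action,          # what was actually executed
        "result": result,          # what came back
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

One line per action, readable with nothing more than a text editor, is the test: if your team can grep it three days later, it's an audit log.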
A rollback path that takes minutes, not hours
For each kind of action the agent can take, there's a documented way to undo it that doesn't require reconstructing data from backups. Writes preserve the previous values. Approvals can be reversed with a single action. Password resets retain the previous credential state for a defined window.
The team in the opening story had every other discipline. What they didn't have was a rollback. That's what made the recovery a day instead of a minute.
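The missing discipline can be as simple as journaling the old value before every write. A sketch against an in-memory store; the record ID is illustrative:

```python
def update_with_undo(store, journal, record_id, field, new_value):
    """Write one field, journaling the previous value first so the
    write can be reversed without going back to a backup."""
    old_value = store[record_id].get(field)
    journal.append({"record_id": record_id, "field": field, "old": old_value})
    store[record_id][field] = new_value

def rollback(store, journal):
    """Reverse every journaled write, most recent first."""
    while journal:
        entry = journal.pop()
        store[entry["record_id"]][entry["field"]] = entry["old"]

contacts = {"003xx01": {"title": "VP Sales"}}
journal = []
update_with_undo(contacts, journal, "003xx01", "title", "CRO")
rollback(contacts, journal)      # title is "VP Sales" again
```

The journal is cheap to write at update time and priceless at incident time: undoing 4,200 writes becomes a loop, not a day of reconstruction.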
A human in the loop for the high-stakes actions
Not every action needs human approval. Most don't. But the ones above a threshold — large refunds, bulk record updates, anything affecting more than N users or records, any change to security-relevant config — should pause for a person. The threshold is a configuration decision, not a philosophical one. Set it where you can absorb the worst case of an unsupervised action.
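The gate itself is a few lines: execute below the line, block on a person above it. A sketch, where `request_approval` stands in for whatever approval channel you use (a ticket, a chat prompt, a queue), and the default threshold is a placeholder:

```python
def run_action(action, execute, request_approval, threshold=50):
    """Run low-impact actions autonomously; pause for a human above threshold.

    `threshold` is illustrative -- set it where you can absorb the worst
    case of an unsupervised action.
    """
    if action["affected_records"] <= threshold:
        return execute(action)       # low impact: run autonomously
    if request_approval(action):     # high impact: block on a person
        return execute(action)
    return None                      # declined; nothing was executed
```

The design choice worth noticing is that the high-impact path fails closed: if no approval arrives, the action simply doesn't happen.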
Where Agents Are Earning Their Keep, and Where They Aren't
Despite the risk profile, agents are doing real work in MSP and IT-adjacent environments:
- Tier-1 ticket triage and resolution — password resets, simple account questions, software installs from approved catalogs.
- Log analysis and correlation across SIEM data, where the agent reads alerts and either dismisses them with a documented reason or escalates with a summary.
- Documentation maintenance — keeping runbooks current against the actual state of the environment.
- Onboarding and offboarding orchestration, where the agent runs the standard sequence with a human approving each step.
The pattern is the same in each: well-defined inputs, narrow scope, low blast radius, human-readable audit trail.
Where we don't recommend autonomous agentic AI yet: bulk write operations against systems of record without per-action limits; any action that costs money without a human-approved threshold; customer-facing communications without review; and security-sensitive configuration changes — firewall rules, conditional access policies, identity changes. These are the actions where probabilistic reasoning is exactly the wrong tool. Use deterministic automation for them, and reserve the agentic tools for the work where their judgment is the asset.
What This Connects To
Agentic AI is operationally similar to safer software rollouts: you're introducing a new actor with the power to change your environment. The same disciplines apply — rings, rollback paths, validation, a clear owner. It's also adjacent to endpoint management, because the agent's credentials and the devices it runs on need the same governance as a human employee's.
If you're being pitched an agentic AI tool and the conversation hasn't included scoped credentials, action limits, audit logs, and a rollback path, the conversation isn't done yet. Our IT consulting practice does AI tool evaluations as part of vendor reviews. If you want a read on what an agent would actually be able to do in your environment, that's a conversation we're happy to have.