What Procurement AI Can and Can't Do, and Why the Difference Actually Matters

Sofia Mendes | June 11, 2025 | 7 min read

There's a version of the procurement software pitch that sounds like this: connect your ERP, wait 48 hours, and automation handles everything from spend visibility to supplier selection. If you've been in procurement longer than six months, you already know that's wrong. The harder question is where, precisely, automation is genuinely useful versus where it produces confident-sounding output that a category manager then has to manually re-validate anyway.

We spend a lot of time at Proculr thinking about this boundary — partly because our product lives directly on it. We do classification, benchmarking, and negotiation brief generation. We don't do strategic sourcing decisions, supplier relationship management, or contract negotiation itself. That's not a feature gap. It's a deliberate architectural choice based on what we've seen automation actually deliver in practice, and where the abstraction breaks down.

Where Automation Genuinely Delivers

Spend Classification at Scale

Classifying invoice line items against a taxonomy — whether you're mapping to UNSPSC, a custom category tree, or a hybrid — is exactly the kind of repetitive, pattern-matching work that benefits from automation. A mid-market company running $80M through indirect spend might have 15,000 to 40,000 invoice lines per year across hundreds of suppliers. A category manager doing that by hand would spend months and still produce inconsistent results because humans applying a taxonomy get tired, and taxonomies have edge cases that reasonable people classify differently.

Automated classification, trained on enough procurement-specific data, gets to 85–92% accuracy on clean invoice data without much tuning. The remaining 8–15% — usually ambiguous line descriptions, new supplier types, or split categories — still needs a human review queue. That's fine. The point isn't 100% automation; it's reducing the manual burden to manageable exceptions rather than the entire dataset.
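
A minimal sketch of that review-queue split, assuming the classifier emits a confidence score for each invoice line. The `Prediction` type, the `route` function, the sample lines, and the 0.85 threshold are all illustrative assumptions, not a description of Proculr's implementation:

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    line_id: str
    description: str
    category: str      # predicted taxonomy node (hypothetical labels)
    confidence: float  # model score in [0, 1]


def route(predictions, threshold=0.85):
    """Split predictions into auto-accepted lines and a human review queue."""
    accepted, review = [], []
    for p in predictions:
        # Anything below the (assumed) threshold goes to a person.
        (accepted if p.confidence >= threshold else review).append(p)
    return accepted, review


# Made-up invoice lines for illustration only.
predictions = [
    Prediction("inv-001", "A4 copy paper, 80gsm, 5 reams", "Office supplies", 0.97),
    Prediction("inv-002", "Misc services Q3", "Professional services", 0.41),
    Prediction("inv-003", "Forklift rental + operator", "Equipment rental", 0.62),
    Prediction("inv-004", "Laptop docking station", "IT hardware", 0.93),
]

accepted, review = route(predictions)
print(f"auto-accepted: {len(accepted)}, needs review: {len(review)}")
```

In practice the threshold would be tuned against measured accuracy, so the review queue stays small without letting low-confidence calls through unreviewed.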

Benchmarking and Price Drift Detection

Price comparison across suppliers and time periods is a data processing problem at its core. Given a set of category-matched transaction records, identifying that your unit price for a given UNSPSC category has drifted 18% above the P50 benchmark over 24 months — that's arithmetic. It doesn't require human judgment; it requires clean data and a reliable benchmark dataset.

What automation adds here that's genuinely hard to replicate manually is continuous monitoring. A procurement analyst checking prices quarterly will catch major drift; they won't catch a supplier who raised rates 3% every six months for two years. Those four increases compound to roughly 12.6%, and by the time cumulative variance on that scale is visible, the contract has already renewed twice.
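
The compounding behind slow drift is easy to check. The sketch below is illustrative only: the function names and benchmark prices are made up, and the P50 is taken as a simple median of the benchmark set.

```python
from statistics import median


def cumulative_increase(rate, steps):
    """Total increase after `steps` compounding raises of `rate` each."""
    return (1 + rate) ** steps - 1


def drift_vs_p50(unit_price, benchmark_prices):
    """Fractional gap between a unit price and the benchmark median (P50)."""
    p50 = median(benchmark_prices)
    return (unit_price - p50) / p50


# A 3% raise every six months for two years is four compounding steps.
print(f"cumulative increase: {cumulative_increase(0.03, 4):.1%}")  # 12.6%

# Hypothetical benchmark unit prices for one category; the median is 10.0.
benchmark = [9.0, 10.0, 11.0]
print(f"drift vs P50: {drift_vs_p50(11.8, benchmark):.0%}")  # 18%
```

No single 3% step looks alarming on its own, which is why a check that runs on every new invoice catches what a quarterly spot check misses.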

Negotiation Brief Generation

Pulling together the relevant data points for a supplier negotiation — spend volume, price trend, benchmark position, contract expiry date, alternative supplier count — is prep work. It's important prep work, and doing it well takes two to four hours per category when done manually. Automating the data assembly means the negotiation brief is ready before the category manager even opens the calendar invite. The manager still writes the negotiation strategy. They still run the meeting. They still make the judgment call on which levers to pull. But they go in prepared rather than spending the first hour of prep time pulling numbers from three different spreadsheets.
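
As a rough sketch, the brief is little more than a structured record of the data points listed above. The `NegotiationBrief` class, its field names, and the sample values are hypothetical, chosen only to show the shape of the assembly step:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NegotiationBrief:
    category: str
    annual_spend: float          # spend volume
    price_trend_pct: float       # unit price change over the lookback window
    benchmark_percentile: int    # position against the benchmark set
    contract_expiry: date
    alternative_suppliers: int

    def days_to_expiry(self, today: date) -> int:
        return (self.contract_expiry - today).days

    def render(self, today: date) -> str:
        """Plain-text summary a category manager can scan before the meeting."""
        return "\n".join([
            f"Category: {self.category}",
            f"Annual spend: ${self.annual_spend:,.0f}",
            f"Price trend: {self.price_trend_pct:+.1f}%",
            f"Benchmark position: P{self.benchmark_percentile}",
            f"Contract expires in {self.days_to_expiry(today)} days",
            f"Alternative suppliers: {self.alternative_suppliers}",
        ])


# Made-up values for illustration only.
brief = NegotiationBrief(
    category="Professional cleaning services",
    annual_spend=420_000.0,
    price_trend_pct=6.5,
    benchmark_percentile=75,
    contract_expiry=date(2025, 12, 31),
    alternative_suppliers=4,
)
print(brief.render(today=date(2025, 6, 11)))
```

Note what the record leaves out: there is no field for strategy or for which levers to pull. That part stays with the category manager.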

Where Automation Breaks Down

Strategic Sourcing Decisions

We're not saying automation can't support strategic sourcing. It clearly can, by surfacing market data, generating supplier shortlists, and processing RFQ responses. But the actual sourcing decision involves context that doesn't live in your ERP. Why did you choose Supplier A three years ago even though it was 12% more expensive? Because Supplier B had a quality failure that cost you $400K and you quietly blacklisted them. That's institutional knowledge sitting in someone's head, maybe in an email thread, definitely not tagged in a spend cube.

Automation that doesn't have access to that context will recommend the cheaper option every time. Sometimes that's right. Sometimes it's how you end up repeating an expensive mistake.

A 450-person specialty distributor we worked with had exactly this problem: their automated spend analysis kept surfacing a consolidation opportunity with a lower-cost logistics provider. What the data didn't show was that their primary customer — a manufacturer with strict on-time delivery SLAs — had contractually prohibited use of that specific carrier after a failed pilot. The automation was technically correct. The recommendation was unusable.

Supplier Relationship Management

The ongoing health of a supplier relationship — trust calibration, communication cadence, managing through delivery failures without torching a partnership — is not a data problem. It's a human judgment problem. Automation can flag that a supplier's on-time delivery rate dropped from 94% to 87% in Q3. It can't tell you whether that's because of a staffing shortage you already know about and gave them a waiver on, or a systemic capacity problem that signals you need a backup supplier immediately.

The distinction matters because the wrong response is expensive in both directions. Escalating aggressively on a supplier who had a temporary, explainable issue damages a relationship you've spent years building. Holding back when the problem is structural means you're absorbing avoidable supply chain risk.

Contract Negotiation Itself

Automation can tell you that you're at the 75th percentile on price for professional cleaning services and that your current contract has no volume-tiered discount structure despite a 30% spend increase over three years. It cannot decide how aggressive to be in the next negotiation, what concessions to trade, or how to read the room when the supplier's account rep comes in with a competing offer they're clearly bluffing on. That's what experienced category managers are paid for.

The Overclaim Problem

The procurement technology market has a consistent overclaim problem, and it creates real damage. When companies buy tools that promise end-to-end automation and find out six months in that the "automated sourcing" feature still requires a procurement analyst to validate every output, two things happen. First, they've spent budget on a tool that doesn't deliver the headcount reduction they planned. Second, and worse, they've lost trust in procurement analytics broadly — so when genuinely useful automation is available, the team is skeptical of it.

We've been deliberate about building Proculr around the tasks where the automation claim is defensible: classification accuracy you can measure, benchmark data you can audit, briefs that surface specific data points with traceable methodology. We're not saying broader automation is impossible or that the category will never get there — we're saying that for a procurement team operating today, the value is in the specific tasks where the automation is reliable, not in believing the pitch that it covers everything.

A Practical Framework for Evaluating Automation Claims

When you're evaluating any procurement tool that promises automation, three questions tend to sort the signal from the noise:

Can you audit the output? If the system classifies a $50K spend category, can you see the logic — which attributes drove the classification, what alternative categories were considered? If it's a black box, you're trusting without verifying. That's fine for low-stakes categorization; it's a problem when the output drives a sourcing decision.

What does the exception queue look like? Every automation system has cases it can't handle confidently. The question is whether those exceptions are surfaced to a human for review or quietly auto-resolved. Tools that claim zero exceptions aren't being honest about the edge cases; they're just hiding the exception resolution inside the model.

What's the feedback loop? When a category manager overrides an automated classification or rejects a benchmark comparison as inapplicable, does that feedback improve the model? Or does it disappear into a void? The quality gap between tools that learn from practitioner corrections and tools that don't compounds significantly over 12 to 18 months of use.
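
One way to picture what a tool needs to store in order to pass all three questions is sketched below. Every name here (`ClassificationRecord`, `exception_queue`, `record_override`, the 0.85 threshold) is an assumption for illustration, not any vendor's actual schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationRecord:
    line_id: str
    chosen: str
    confidence: float
    drivers: list        # attributes that drove the classification (auditability)
    alternatives: list   # (category, score) pairs the model also considered
    override: Optional[str] = None

    @property
    def final_category(self):
        # A human correction, when present, wins over the model's choice.
        return self.override or self.chosen


def exception_queue(records, threshold=0.85):
    """Low-confidence records that no human has resolved yet."""
    return [r for r in records if r.confidence < threshold and r.override is None]


def record_override(record, corrected, feedback_log):
    """Apply a correction and keep it, so it can feed later retraining."""
    feedback_log.append(
        {"line_id": record.line_id, "predicted": record.chosen, "corrected": corrected}
    )
    record.override = corrected


# Made-up records for illustration only.
records = [
    ClassificationRecord("inv-101", "IT hardware", 0.94,
                         ["keyword: docking station"], [("Office supplies", 0.04)]),
    ClassificationRecord("inv-102", "Professional services", 0.48,
                         ["supplier history"], [("Facilities", 0.39)]),
]
feedback_log = []

queue = exception_queue(records)
record_override(queue[0], "Facilities", feedback_log)
print(len(exception_queue(records)), feedback_log)
```

The `drivers` and `alternatives` fields answer the audit question, `exception_queue` makes the unresolved cases visible, and `feedback_log` is the piece that decides whether a correction improves anything downstream or disappears.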

Why This Matters for Mid-Market Procurement Teams Specifically

Enterprise procurement organizations with 50+ FTEs can afford to absorb unreliable automation — they have enough staff to validate outputs and enough political capital to keep a skeptical internal client base patient while a tool matures. A five-person procurement team at a $200M revenue company doesn't have that buffer. When the tool misfires, it's the category manager who takes the call from the CFO asking why the spend report looks wrong.

Mid-market procurement teams need automation they can trust and explain, not automation that sounds impressive in a demo. The bar isn't "does this tool have an AI feature." The bar is "can I stand behind this output in a conversation with my CFO, and does using this tool make me better at the parts of my job that actually require human judgment?"

That's the question we try to keep in front of us when we decide what Proculr should and shouldn't automate. The answer changes as the underlying capabilities improve. But right now, in mid-2025, the honest version of the answer is narrower than most procurement software vendors would like you to believe.