How Proculr Works

From Invoice Data to Savings Signal: What Happens Between ERP Export and Negotiation Brief

David Park 9 min read

When a customer exports their AP data from SAP Ariba, Oracle NetSuite, or Sage Intacct and uploads it to Proculr, a lot has to happen before a negotiation brief appears on the other end. Most of that work is invisible by design — the output is a clean, actionable brief, not a data processing report. But the pipeline between raw invoice data and savings signal is worth understanding if you care about where the numbers come from and why you should trust them.

This is a walkthrough of what we actually do, in the order we do it, with honest commentary on where the hard problems are. It's not a marketing pitch — some of what I'm about to describe is messy, and pretending otherwise would be misleading.

Step 1: ERP Export Ingestion and Normalization

Most mid-market ERP systems export AP data in structured formats — CSV or Excel from NetSuite, flat file exports from Ariba, report downloads from Sage Intacct. The core fields we need are: supplier name, invoice date, invoice amount, line item description (when present), cost center or department, and GL account code.
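To make the core field list concrete, here is a minimal sketch of parsing a CSV export into typed rows. The column headers ("Vendor", "Invoice Date", and so on), the InvoiceRow type, and the sample data are assumptions for illustration; real exports from each ERP use their own headers and date formats.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceRow:
    supplier_name: str
    invoice_date: date
    amount: Decimal
    description: str  # line item description; empty when the export has none
    cost_center: str
    gl_code: str


# Hypothetical export; header names are assumed, not taken from any real ERP.
SAMPLE = (
    "Vendor,Invoice Date,Amount,Description,Cost Center,GL Account\n"
    '"Acme Cleaning Services Inc",2024-03-01,"1,250.00",Monthly janitorial,HQ,6100\n'
    "ACME CLEANING SVCS,2023-11-15,200.00,,HQ,6100\n"
)


def parse_export(text: str) -> list[InvoiceRow]:
    """Parse CSV text into InvoiceRow records, one per invoice line."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append(
            InvoiceRow(
                supplier_name=rec["Vendor"].strip(),
                invoice_date=date.fromisoformat(rec["Invoice Date"]),
                # Strip thousands separators before converting to Decimal.
                amount=Decimal(rec["Amount"].replace(",", "")),
                description=(rec.get("Description") or "").strip(),
                cost_center=rec["Cost Center"].strip(),
                gl_code=rec["GL Account"].strip(),
            )
        )
    return rows
```

Amounts are kept as Decimal rather than float so that later sums do not pick up rounding noise.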

The first problem we hit every time is supplier name normalization. A company paying "Acme Cleaning Services Inc" via one department and "Acme Cleaning" via another, and "ACME CLEANING SVCS" in the older invoices, has three separate supplier records in their ERP that are actually one supplier. Before we can calculate total spend with that vendor, we need to collapse those variants into a single canonical supplier record.

We use a combination of fuzzy string matching, address normalization (when vendor address data is available), and tax ID matching (when it's in the export) to identify supplier duplicates. The matching isn't perfect: we miss some duplicates (a recall problem), and we occasionally over-merge suppliers that have similar names but are genuinely distinct entities (a precision problem). When we're uncertain, we flag the pair for human review rather than force a merge. A wrong merge is worse than a missed merge because it creates phantom spend concentrations that lead to bad negotiations.
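The name-matching part of this can be sketched as follows. This is an illustration only: it uses the standard library's difflib rather than whatever matcher the production pipeline uses, the suffix and abbreviation tables are tiny hypothetical stand-ins, the two thresholds are assumed values, and address and tax ID matching are left out.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tables; a real list would be much longer.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}
ABBREVIATIONS = {"svcs": "services", "svc": "service"}


def normalize_name(raw: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_decision(a: str, b: str, merge_at: float = 0.90, review_at: float = 0.70) -> str:
    """Return 'merge', 'review', or 'distinct' for a pair of supplier names.

    The thresholds are assumptions chosen for this example.
    """
    score = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "review"  # uncertain: route to a human instead of force-merging
    return "distinct"
```

With these assumed thresholds, "Acme Cleaning Services Inc" and "ACME CLEANING SVCS" normalize to the same string and merge, while "Acme Cleaning Services Inc" and "Acme Cleaning" score about 0.74 and land in the review queue.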

GL code cleaning happens in parallel. GL codes tell us which budget bucket the spend came from, but they're a poor proxy for what was actually purchased. A company with a "Professional Services" GL code might be putting consulting, legal, IT contractors, and staffing agency spend all in one bucket. We use GL codes as one classification signal, not the authoritative classification.

Step 2: Spend Classification

Classification is the most consequential step in the pipeline, and also the most error-prone. We need to assign every supplier — and ideally every line item — to a category that's meaningful for procurement decision-making.

Our classification runs in two passes. The first pass uses the normalized supplier name to look up the supplier's primary business description. For most suppliers, the name plus basic firmographic data (industry code, website category if crawlable) gives a strong signal for the top-level category. "Johnson Controls" maps to facilities and HVAC services. "Zoom Video Communications" maps to collaboration software. For most major suppliers, the classification is effectively deterministic.

The hard cases are mid-market suppliers with generic names and no strong industry signal. "Allied Business Solutions" could be IT staffing, office services, consulting, or a dozen other things. For these, we lean on line item descriptions when available, and fall back to GL codes as a secondary signal. When neither resolves the ambiguity, we assign a provisional category with a confidence flag and surface it for customer review during the onboarding step.
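The fallback order described above (supplier lookup, then line item descriptions, then GL code, then a provisional category) can be sketched like this. The lookup tables, keyword rules, GL code "6400", and the Classification type are all hypothetical; real classification draws on firmographic data rather than small dictionaries.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for firmographic lookups and keyword rules.
KNOWN_SUPPLIERS = {
    "johnson controls": "Facilities / HVAC",
    "zoom video communications": "Collaboration Software",
}
LINE_ITEM_KEYWORDS = {"contractor": "IT Staffing", "janitorial": "Janitorial"}
GL_HINTS = {"6400": "Professional Services"}


@dataclass
class Classification:
    category: str
    source: str        # which signal decided: supplier, line_item, gl_code, none
    provisional: bool  # True means surface for customer review


def classify(supplier: str, line_items: list[str], gl_code: Optional[str]) -> Classification:
    """Try each signal in order of trust and stop at the first that resolves."""
    name = supplier.lower()
    if name in KNOWN_SUPPLIERS:
        return Classification(KNOWN_SUPPLIERS[name], "supplier", False)
    for text in line_items:
        for keyword, category in LINE_ITEM_KEYWORDS.items():
            if keyword in text.lower():
                return Classification(category, "line_item", False)
    if gl_code in GL_HINTS:
        return Classification(GL_HINTS[gl_code], "gl_code", False)
    # Nothing resolved the ambiguity: assign a provisional category and flag it.
    return Classification("Unclassified", "none", True)
```

Recording the source of each decision alongside the category makes it easy to show reviewers why a supplier landed where it did.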

The second pass applies the customer's category overlay — the custom taxonomy they've defined for how their procurement function is organized. A customer who manages "Facilities" as a single category inclusive of janitorial, security, HVAC, and waste management gets all those sub-categories rolled up to their category definition. This is what makes the brief actionable rather than just analytically correct.
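As a rough illustration of the overlay, the roll-up amounts to a mapping from base sub-categories to the customer's own categories. The mapping contents and dollar figures below are made up for the example; unmapped categories are assumed to pass through unchanged.

```python
# Hypothetical customer overlay: base sub-category -> customer-defined category.
CUSTOMER_OVERLAY = {
    "Janitorial": "Facilities",
    "Security": "Facilities",
    "HVAC": "Facilities",
    "Waste Management": "Facilities",
}


def apply_overlay(base_category: str, overlay: dict[str, str]) -> str:
    """Map a base category to the customer's category; pass through if unmapped."""
    return overlay.get(base_category, base_category)


def rollup(spend_by_base: dict[str, float], overlay: dict[str, str]) -> dict[str, float]:
    """Re-aggregate spend totals under the customer's category definitions."""
    totals: dict[str, float] = {}
    for base, amount in spend_by_base.items():
        key = apply_overlay(base, overlay)
        totals[key] = totals.get(key, 0.0) + amount
    return totals


example = rollup(
    {"Janitorial": 40000.0, "HVAC": 25000.0, "Collaboration Software": 12000.0},
    CUSTOMER_OVERLAY,
)
# example == {"Facilities": 65000.0, "Collaboration Software": 12000.0}
```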

Classification accuracy on a clean data set is in the high 80s (percent) for supplier-level classification. On line-item-level classification it's lower, because line item descriptions in AP systems are often whatever the accounts payable team typed when they entered the invoice, which ranges from precise to completely uninformative. We're honest with customers about this: the brief reflects what the data supports, not what we wish it said.

Step 3: Spend Cube Construction

With normalized suppliers and classified transactions, we build the spend cube — the multi-dimensional aggregation that shows spend by supplier, by category, by time period, by cost center, and by GL code in any combination. The spend cube is the foundation for every analysis that follows.
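A minimal way to picture "any combination" of dimensions is a single aggregation function that takes the grouping keys as an argument. The row layout, dimension names, and sample rows below are assumptions for illustration; a production cube would more likely sit in a database or dataframe.

```python
from collections import defaultdict
from typing import Iterable

DIMENSIONS = ("supplier", "category", "period", "cost_center", "gl_code")

# Hypothetical classified transactions.
SAMPLE_ROWS = [
    {"supplier": "Acme Cleaning", "category": "Facilities", "period": "2024-Q1",
     "cost_center": "HQ", "gl_code": "6100", "amount": 200.0},
    {"supplier": "Acme Cleaning", "category": "Facilities", "period": "2024-Q2",
     "cost_center": "HQ", "gl_code": "6100", "amount": 250.0},
    {"supplier": "Zoom Video Communications", "category": "Collaboration Software",
     "period": "2024-Q1", "cost_center": "IT", "gl_code": "6300", "amount": 1200.0},
]


def aggregate(rows: Iterable[dict], by: tuple[str, ...]) -> dict[tuple, float]:
    """Sum invoice amounts grouped by any combination of dimensions."""
    unknown = set(by) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in by)] += row["amount"]
    return dict(totals)
```

Calling aggregate(SAMPLE_ROWS, ("category",)) gives spend by category, and aggregate(SAMPLE_ROWS, ("supplier", "period")) gives the same data sliced by supplier and quarter.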

A few things we derive at this stage that feed directly into brief generation:

Invoice frequency and payment patterns. How many invoices per supplier per year, and what's the average invoice size? A supplier with 180 invoices at $200 each is a different procurement problem than a supplier with 2 invoices at $18,000 each. High invoice frequency with low per-invoice value signals a consolidation opportunity or inefficient catalog buying.

Spend trend over time. Supplier spend trending up 15-20% year-over-year without a corresponding change in service scope is a price drift signal. We flag these explicitly — they're often the highest-value targets in tail spend because the drift has been accumulating through passive renewals.

Supplier concentration within categories. If 85% of facilities maintenance spend is with one vendor and there are three alternatives in the same geography, that's a different negotiating position than if spend is spread across eight vendors with no clear primary. Concentration ratio by category shapes both the negotiation strategy and the benchmark comparison group.
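The three derived signals above reduce to small calculations. The sketch below is illustrative: the function names are invented, the 15% drift threshold is an assumed default taken from the low end of the range mentioned above, and the scope-change flag is assumed to come from somewhere outside the invoice data.

```python
def invoice_profile(amounts: list[float]) -> tuple[int, float]:
    """Invoice count and average invoice size for one supplier-year."""
    if not amounts:
        raise ValueError("no invoices for this supplier-year")
    return len(amounts), sum(amounts) / len(amounts)


def yoy_change(prior_year: float, current_year: float) -> float:
    """Year-over-year spend change as a fraction of prior-year spend."""
    if prior_year <= 0:
        raise ValueError("prior-year spend must be positive")
    return (current_year - prior_year) / prior_year


def is_price_drift(prior_year: float, current_year: float,
                   scope_changed: bool, threshold: float = 0.15) -> bool:
    """Flag growth at or above the threshold when service scope did not change."""
    return not scope_changed and yoy_change(prior_year, current_year) >= threshold


def concentration_ratio(spend_by_supplier: dict[str, float]) -> float:
    """Share of category spend held by the largest supplier."""
    total = sum(spend_by_supplier.values())
    if total <= 0:
        raise ValueError("category has no spend")
    return max(spend_by_supplier.values()) / total
```

For the two suppliers in the invoice frequency example, invoice_profile returns (180, 200.0) and (2, 18000.0): nearly identical annual spend, very different purchasing patterns.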

Step 4: Benchmarking

This is where we answer the question "is what you're paying reasonable?" The benchmark step compares each supplier's effective rate against market-clearing rates for comparable organizations in the same geography and size tier.

What "market-clearing rate" means varies by category. For services categories, it's the distribution of rate structures and total cost per unit of output across comparable buyers. For product categories, it's the distribution of unit prices. We express benchmarks as percentile ranges — P25-P75 gives the interquartile range of what comparable organizations pay. P50 is the median. If your effective rate is above P75, you're above market by a meaningful amount and there's a factual basis for a rate renegotiation conversation.

We're transparent about the uncertainty in these benchmarks. For common SaaS categories, facilities maintenance, office supplies, and similar commoditized categories, our benchmark data is reasonably dense and the percentile estimates are reliable within a range. For specialized or highly customized services, the benchmark is less precise — we'll tell you "this appears to be above market but we have limited comparable data" rather than manufacture false precision.

The benchmark output for each supplier includes the current effective rate, the P25/P50/P75 range for comparable organizations, a percentile rank, and a confidence rating on the benchmark quality. These feed directly into the brief.
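The benchmark output described above can be sketched as a small record plus a function that fills it in. This is a simplified illustration: the BenchmarkResult type is hypothetical, quartiles come from the standard library's statistics.quantiles with the inclusive method, percentile rank is taken as the share of comparables strictly below your rate, and the sample-size cutoffs for the confidence rating are assumed values.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class BenchmarkResult:
    effective_rate: float
    p25: float
    p50: float
    p75: float
    percentile_rank: float  # 0-100: share of comparables paying less
    confidence: str         # "high", "medium", or "low"


def benchmark(effective_rate: float, comparable_rates: list[float]) -> BenchmarkResult:
    """Compare one supplier's effective rate against comparable buyers' rates."""
    n = len(comparable_rates)
    if n < 2:
        raise ValueError("need at least two comparable rates")
    p25, p50, p75 = quantiles(comparable_rates, n=4, method="inclusive")
    below = sum(1 for r in comparable_rates if r < effective_rate)
    # Assumed cutoffs: denser comparable data earns a higher confidence rating.
    confidence = "high" if n >= 30 else "medium" if n >= 10 else "low"
    return BenchmarkResult(effective_rate, p25, p50, p75, 100.0 * below / n, confidence)


def above_market(result: BenchmarkResult) -> bool:
    """True when the effective rate sits above the P75 of comparables."""
    return result.effective_rate > result.p75
```

With nine hypothetical comparable rates from 20.0 to 36.0 in steps of 2.0, the quartiles come out to 24.0, 28.0, and 32.0, so an effective rate of 35.0 is above market, and the small sample earns a "low" confidence rating.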

Step 5: Opportunity Prioritization

Not every above-market supplier is worth pursuing. Prioritization ranks the savings opportunities by expected value, taking into account spend magnitude, benchmark gap, and procurement effort estimate.

A supplier at the 80th percentile on a $15K annual spend generates a smaller dollar opportunity than a supplier at the 65th percentile on a $120K annual spend. The prioritization scoring reflects this: it's dollar-value-weighted, not percentile-gap-weighted. Chasing percentage savings in small categories while ignoring large categories with modest percentage gaps is a common procurement resource allocation mistake.

We also factor in sole-source risk. When a supplier appears to have no viable alternatives in the relevant geography or specialty, we flag it explicitly. The brief will still include benchmark data — knowing you're above market is useful even if you can't easily switch — but the recommended negotiation approach differs. A sole-source supplier requires a different conversation than one where you have three credible alternatives ready to quote.
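One way to picture dollar-weighted prioritization with a sole-source flag is sketched below. The scoring formula (expected savings minus an effort cost), the hourly cost, and every number in the example are assumptions for illustration; the actual scoring is not specified here beyond the inputs it considers.

```python
from dataclasses import dataclass

HOURLY_COST = 75.0  # hypothetical loaded cost of procurement time, in dollars


@dataclass
class Opportunity:
    supplier: str
    annual_spend: float
    percentile_rank: float  # informational; not used directly in the score
    gap_to_target: float    # assumed fraction of spend above the target rate
    effort_hours: float     # estimated procurement effort
    alternatives: int       # credible alternative suppliers identified

    @property
    def expected_savings(self) -> float:
        return self.annual_spend * self.gap_to_target

    @property
    def sole_source(self) -> bool:
        return self.alternatives == 0


def net_value(o: Opportunity) -> float:
    """Expected dollar savings minus the cost of the effort to pursue them."""
    return o.expected_savings - o.effort_hours * HOURLY_COST


def prioritize(opps: list[Opportunity]) -> list[Opportunity]:
    """Rank by net dollar value, highest first. Sole-source stays in the list."""
    return sorted(opps, key=net_value, reverse=True)


small = Opportunity("Small-spend supplier", 15_000.0, 80.0, 0.20, 10.0, 3)
large = Opportunity("Large-spend supplier", 120_000.0, 65.0, 0.08, 20.0, 0)
ranked = prioritize([small, large])  # the large-spend supplier ranks first
```

The sole_source flag does not change the ranking in this sketch; it is carried along so the brief can recommend a different negotiation approach for that supplier.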

Step 6: Brief Generation

The negotiation brief is the output layer. For each prioritized supplier, it assembles: the spend history, the benchmark comparison, the recommended negotiation position, the key talking points, and any available market context for the category.

We're not generating scripts or telling procurement professionals what to say. We're generating the factual foundation for a negotiation they're going to conduct. The brief tells you where you stand and what a reasonable ask looks like. What the category manager does with that — how they read the supplier relationship, what additional leverage they have, when in the quarter they choose to approach — is judgment that lives with the human.

The full pipeline from ERP export to final brief typically runs in under four hours for a mid-size data set. The biggest variable is classification review time — if a customer has a lot of ambiguous supplier records that need human review, that step can take a day or two. For clean data with clear GL coding and good supplier name consistency, the turnaround is faster.

The goal was always to get the analytical work out of the way so procurement can spend their time on the conversations that matter. The pipeline is a means to that end, not the product itself.