Data & Methodology

How We Built a Tail Spend Benchmark Dataset Without Asking Anyone to Share Sensitive Data

Rachel Goldstein 8 min read

The fundamental problem with procurement benchmarking is that the data that would make it useful is the data that nobody will share. What are your competitors paying for office cleaning? What did other companies your size negotiate for their CRM renewal? The answers are in someone's ERP — and they're staying there.

Every procurement benchmark report you've ever read is built on some combination of survey responses (subject to response bias and social-desirability bias), published pricing data (which reflects list prices, not actual deal prices), and proprietary consulting-firm databases (which are expensive, opaque, and often built from the same survey methodology you could run yourself).

When we started building Proculr, we knew we needed benchmark data to make negotiation briefs useful. The question was how to build a dataset that was actually representative of what mid-market companies pay — without asking anyone to share contract terms they'd never willingly disclose. Here's how we approached it, what worked, what didn't, and what the dataset's honest limitations are.

The Core Insight: Price Signals Are Everywhere Except in Contracts

Contract prices are private. But the information that determines what a market will bear is largely public — you just have to look in the right places.

For any given spend category, there are multiple sources of pricing signal that don't require anyone to reveal their actual invoices: published rate cards and list pricing (which establish ceiling values); RFP responses that vendors have submitted to public-sector organizations (which are often subject to public records requirements); G2 and similar review platforms that publish pricing ranges by tier; vendor pricing pages for self-serve and SMB tiers (which anchor the bottom of the enterprise pricing range); and job postings and salary surveys (for services categories where labor cost is the primary driver).

None of these sources tells you what a $200M revenue manufacturing company with 150 seats paid to renew their HR software. But together, they let you construct a plausible range for what market-clearing prices look like for buyers of that size in that category. The range is wider than you'd get from actual contract data, but it's anchored in real market signals rather than survey responses.

Building the Category Signal Model

For each spend category in our benchmark coverage, we built what we call a signal model — a structured approach to estimating the P25, P50, and P75 price points for buyers in a given size tier and geography.

The signal model for a SaaS category looks different from the signal model for a facilities services category. For SaaS, the anchors are the published self-serve price (floor), the published enterprise pricing page when available (ceiling), vendor-disclosed discount ranges from analyst reports and investor filings, and public-sector contract data where available. For facilities services, the anchors are labor market rates from the Bureau of Labor Statistics for the relevant service category and geography, regional service pricing from franchise disclosure documents (which are public), and industry association rate surveys.
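Here is a minimal Python sketch of how the SaaS anchors might combine into percentile estimates. It makes four assumptions, for illustration only:

- The analyst-reported discount range applies to the enterprise list price.
- The deepest discount marks P25.
- The shallowest discount marks P75.
- P50 sits at the midpoint of that band.

`SaasSignals` and `estimate_percentiles` are hypothetical names, not Proculr's actual model. A facilities version would swap in labor-rate anchors instead of discounts.

```python
from dataclasses import dataclass


@dataclass
class SaasSignals:
    """Hypothetical bundle of public price signals for one SaaS category,
    in annual USD per seat."""
    self_serve_price: float  # published self-serve/SMB price (floor anchor)
    list_price: float        # published enterprise list price (ceiling anchor)
    discount_low: float      # shallowest typical discount off list, e.g. 0.10
    discount_high: float     # deepest typical discount off list, e.g. 0.35


def estimate_percentiles(s: SaasSignals) -> dict:
    """Estimate P25/P50/P75 by applying the reported discount range to the
    list price and clamping the results into the [floor, ceiling] band."""
    if not 0 <= s.discount_low <= s.discount_high < 1:
        raise ValueError("expected 0 <= discount_low <= discount_high < 1")
    floor, ceiling = s.self_serve_price, s.list_price
    if not 0 < floor <= ceiling:
        raise ValueError("expected 0 < self_serve_price <= list_price")

    def clamp(price: float) -> float:
        return min(max(price, floor), ceiling)

    p25 = clamp(ceiling * (1 - s.discount_high))  # deepest discount -> low end
    p75 = clamp(ceiling * (1 - s.discount_low))   # shallowest discount -> high end
    p50 = (p25 + p75) / 2  # assumption: the median sits mid-band
    return {"P25": p25, "P50": p50, "P75": p75}
```

Consider a $120 self-serve floor, a $400 list price and a 10-35% discount range. These inputs give estimates of about $260, $310 and $360 per seat. The clamp matters when reported discounts are deep enough to push an estimate below what a self-serve buyer already pays.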

The resulting percentile estimates are expressed with explicit confidence ratings. A category with dense public pricing data (commodity SaaS, office supplies, staffing) gets a high confidence rating — the P25-P75 range is reasonably tight and we're confident it reflects where actual market transactions clear. A category with sparse pricing signals (specialized consulting, niche technical services) gets a low confidence rating — we can give you a directional indication, but the uncertainty band is wide.
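One way to turn "dense" versus "sparse" signals into a rating is to look at two things:

- how many independent public sources support the estimate;
- how wide the P25-P75 band is relative to the median.

The sketch below combines both. The thresholds (four sources, 30% and 60% relative spread) are illustrative assumptions, not calibrated values.

```python
def confidence_rating(n_signals: int, p25: float, p50: float, p75: float) -> str:
    """Rate confidence from the number of independent public signals and the
    width of the P25-P75 band relative to P50. Thresholds are illustrative."""
    if p50 <= 0 or not p25 <= p50 <= p75:
        raise ValueError("expected P50 > 0 and P25 <= P50 <= P75")
    relative_spread = (p75 - p25) / p50
    # Many sources and a tight band: commodity-style categories.
    if n_signals >= 4 and relative_spread <= 0.30:
        return "high"
    # Some corroboration and a moderate band.
    if n_signals >= 2 and relative_spread <= 0.60:
        return "medium"
    # Sparse signals or a wide band: directional only.
    return "low"
```

A commodity category backed by many rate cards and a tight band lands at "high". A niche consulting category with one or two sources and a wide band lands at "low".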

We're not claiming precision we don't have. A negotiation brief that tells you "your current rate appears to be above the P75 benchmark range for this category (medium confidence)" is more useful than one that either claims false precision or refuses to benchmark at all.
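Brief language like this can be generated mechanically, as long as the confidence label always travels with the finding. Here is a minimal sketch with a hypothetical `benchmark_position` helper:

```python
def benchmark_position(rate: float, p25: float, p50: float, p75: float,
                       confidence: str) -> str:
    """Describe where a current rate sits relative to the benchmark band,
    always attaching the confidence label to the finding."""
    if not p25 <= p50 <= p75:
        raise ValueError("expected P25 <= P50 <= P75")
    if rate > p75:
        position = "above the P75 benchmark range"
    elif rate > p50:
        position = "between the P50 and P75 benchmarks"
    elif rate >= p25:
        position = "between the P25 and P50 benchmarks"
    else:
        position = "below the P25 benchmark range"
    return (f"your current rate appears to be {position} "
            f"for this category ({confidence} confidence)")
```

Call it with a rate of 400, a 260/310/360 band and a medium rating. It reproduces the sentence quoted above word for word.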

Where Customer Data Fits — and Where It Doesn't

As Proculr customers process their spend data through our platform, we see effective rates by category — what companies actually paid, not what vendors posted publicly. This is, in principle, exactly the kind of closed-loop pricing data that would make benchmarks more accurate.

We want to be direct about how we handle this. We do not use customer spend data to build or update our benchmark datasets without explicit permission. The signal model benchmarks are built entirely from the public data sources described above. When customers agree to contribute anonymized rate data to the benchmark pool, we use it to validate and refine the signal model estimates; we don't feed one customer's rates directly into another customer's benchmark comparison.
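The consent boundary can be enforced in code as well as in policy. The sketch below uses hypothetical names (`ContributedRate`, `validate_signal_model`). It assumes a minimum of five distinct consenting contributors before anything is reported.

The function keeps only explicitly consented rates. It returns only an aggregate: the drift between the pooled median and the public-data P50. No individual rate leaves the function, so one customer's figures cannot surface in another customer's comparison.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContributedRate:
    """One anonymized contributed rate (hypothetical schema)."""
    customer_id: str
    category: str
    rate: float
    consented: bool  # True only with explicit opt-in to the benchmark pool


def validate_signal_model(
    category: str,
    model_p50: float,
    contributions: Iterable[ContributedRate],
    min_contributors: int = 5,  # assumed floor on distinct consenting customers
) -> Optional[float]:
    """Return the relative drift of the consented pool's median from the
    public-data P50, or None if too few customers have consented.
    Only this aggregate is returned; individual rates never are."""
    if model_p50 <= 0:
        raise ValueError("model_p50 must be positive")
    pool = [c for c in contributions
            if c.consented and c.category == category]
    if len({c.customer_id for c in pool}) < min_contributors:
        return None
    return median(c.rate for c in pool) / model_p50 - 1.0
```

A positive result means the consented pool clears above the public-data estimate. That is a cue to revisit the signal model, not a number to show anyone.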

The reason for this design choice is straightforward: our customers' contract rates are competitively sensitive information that they've trusted us with for the purpose of generating their own negotiation briefs. Using that data to benefit other customers without clear consent would be a fundamental breach of that trust, regardless of whether it was technically anonymized. We chose to build a useful benchmark on public data rather than a more accurate benchmark on borrowed data.

The Size-Tier Segmentation Problem

Procurement benchmarks are only meaningful within comparable peer groups. A rate that's below market for a 500-person company might be above market for a 50-person company, because vendor pricing tiers are based on buyer scale. Using undifferentiated benchmarks — "the average company pays X" — produces misleading comparisons when your company size is far from average.

We segment our benchmark data by annual revenue tier and employee count tier. The current revenue tiers are roughly: under $50M, $50-150M, $150-400M, and over $400M. Within each tier, we further segment by geography for categories where labor costs drive significant regional variation.

The honest limitation here is that our benchmark coverage is best for the $50-400M revenue range — what we think of as the mid-market core. Below $50M, many spend categories start to look more like consumer purchasing than enterprise procurement, and our signal models are less calibrated for that range. Above $400M, enterprise-specific pricing dynamics (custom SLAs, dedicated account teams, volume commitments) create a market that our signal models don't capture well.
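Here is a minimal sketch of the peer-group lookup, covering revenue tiers only. Employee-count tiers would work the same way. Two details are assumptions for illustration:

- the exact boundary handling, for instance whether a $150M company falls in the lower or upper tier;
- the set of labor-driven categories.

```python
# Hypothetical set of categories where labor costs drive regional variation.
LABOR_DRIVEN_CATEGORIES = {"janitorial", "facilities maintenance", "staffing"}
# Tiers where the signal models are best calibrated (the mid-market core).
CORE_TIERS = {"$50-150M", "$150-400M"}


def revenue_tier(annual_revenue_usd: float) -> str:
    """Map annual revenue to the approximate revenue tiers. Which side of a
    boundary an exact $50M, $150M, or $400M company lands on is assumed."""
    if annual_revenue_usd < 0:
        raise ValueError("revenue must be non-negative")
    millions = annual_revenue_usd / 1_000_000
    if millions < 50:
        return "under $50M"
    if millions < 150:
        return "$50-150M"
    if millions <= 400:
        return "$150-400M"
    return "over $400M"


def peer_segment(annual_revenue_usd: float, category: str, region: str) -> dict:
    """Build the peer-group key: revenue tier, plus region only for
    labor-driven categories, and a flag for best-calibrated coverage."""
    tier = revenue_tier(annual_revenue_usd)
    geography = region if category in LABOR_DRIVEN_CATEGORIES else "all regions"
    return {"tier": tier, "geography": geography, "core_coverage": tier in CORE_TIERS}
```

The `core_coverage` flag is one way to carry the caveat above into a brief. A company outside the $50-400M range could then get an explicit lower-confidence warning instead of silently reused mid-market estimates.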

We built for the gap in the market. Large enterprises have consultants and GPO relationships for benchmarking. Small companies have consumer pricing. The mid-market — companies with real indirect spend and no dedicated benchmarking resources — is where we focus.

Validation: How We Know the Benchmarks Are Useful

A benchmark dataset is only valuable if it actually predicts where negotiations can go. We track negotiation outcomes against our pre-negotiation benchmark assessments to see whether our "above market" flags translate into successful renegotiations at rates closer to the P50 benchmark.

The validation is still early: we don't have large enough sample sizes across all categories to say anything statistically robust. What we can say is that in categories where we have high-confidence benchmarks, the briefs generate negotiation outcomes that move rates meaningfully toward the benchmark midpoint in the majority of cases. In categories where we have low-confidence benchmarks, outcomes are more variable, which is what you'd expect if the benchmark is less accurate.
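One simple way to score this outcome tracking is to measure how much of the gap each negotiation closed, grouped by confidence level. The gap runs from the pre-negotiation rate to the P50 benchmark. This sketch is an illustrative metric under stated assumptions, not a description of Proculr's internal reporting.

It also returns the sample size per group. With small samples, the count matters as much as the median.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, Tuple


def gap_closed(before: float, after: float, p50: float) -> float:
    """Fraction of the gap between the pre-negotiation rate and the P50
    benchmark closed by the negotiated rate: 1.0 means the rate reached P50,
    0.0 means no change, and a negative value means the rate went up."""
    gap = before - p50
    if gap <= 0:
        raise ValueError("only defined for rates that started above P50")
    return (before - after) / gap


def summarize(outcomes: Iterable[Tuple[str, float, float, float]]) -> dict:
    """Group (confidence, before, after, p50) records by confidence level and
    report sample size, median gap closed, and share that moved toward P50."""
    by_conf = defaultdict(list)
    for confidence, before, after, p50 in outcomes:
        by_conf[confidence].append(gap_closed(before, after, p50))
    return {
        confidence: {
            "n": len(scores),
            "median_gap_closed": median(scores),
            "share_moved_toward_p50": sum(s > 0 for s in scores) / len(scores),
        }
        for confidence, scores in by_conf.items()
    }
```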

We're not satisfied with the current benchmark coverage depth for services categories. Facilities maintenance, professional services, and specialized consulting have the most room for improvement. These are also the categories where good benchmarks would be most valuable, because the pricing is most opaque. That's the gap we're working to close as our customer data sharing program grows.

What Good Benchmarking Actually Requires

The point of this post isn't to convince you that our benchmarks are perfect — they're not. The point is to make the methodology legible so you can assess where to trust the signal and where to be skeptical.

A benchmark that says your janitorial spend is at the 78th percentile for your geography and company size, with high confidence, is a factual foundation for a commercial conversation. A benchmark that says your specialized IT consulting rates appear above median for your region, with low confidence, is an indication worth investigating but not a number to cite in a negotiation.
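Putting a figure like "the 78th percentile" on a rate takes a distributional assumption when all you have is P25, P50 and P75. The sketch below assumes prices are lognormally distributed. It fixes the median at P50 and infers the spread from the P25-P75 band, so it is only approximate when the band is lopsided.

The `suggested_use` helper is also an assumption. The text above covers only high and low confidence, so the helper treats medium like low: something to investigate rather than cite.

```python
import math
from statistics import NormalDist

# z-score of the 75th percentile of a standard normal (about 0.674)
_Z75 = NormalDist().inv_cdf(0.75)


def estimated_percentile(rate: float, p25: float, p50: float, p75: float) -> float:
    """Estimate the percentile (0-100) of `rate`, assuming prices follow a
    lognormal distribution with median P50 and a log-spread implied by the
    P25-P75 band."""
    if rate <= 0 or not 0 < p25 < p50 < p75:
        raise ValueError("need rate > 0 and 0 < P25 < P50 < P75")
    mu = math.log(p50)
    sigma = (math.log(p75) - math.log(p25)) / (2 * _Z75)
    return 100 * NormalDist(mu, sigma).cdf(math.log(rate))


def suggested_use(confidence: str) -> str:
    """High-confidence findings can anchor a negotiation; anything else is
    treated as a lead to investigate (the medium case is an assumption)."""
    return "cite" if confidence == "high" else "investigate"
```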

The distinction matters. Procurement professionals who use benchmark data credibly — acknowledging uncertainty ranges and confidence levels — are more effective in negotiations than those who cite numbers as if they're precise. Vendors know when you're bluffing with false precision. A position anchored in real market data, presented with appropriate uncertainty, is harder to dismiss than a made-up number presented with false confidence.

This is the standard we're trying to build toward: benchmark data transparent enough that you know when to rely on it and when to treat it as directional guidance rather than fact.