Operations 13 min read April 14, 2026

Why Your ERP Is the Biggest Bottleneck to AI — And What to Do About It

The ERP is the beating heart of most manufacturers. It's also the system most likely to block AI adoption. Here are the specific ERP-related blockers we see repeatedly and practical ways around them.

Alex Ryan
CEO & Co-Founder

“Just connect to the ERP.”

We hear this in almost every initial scoping conversation. A manufacturer wants to build an AI system — demand forecasting, quality prediction, scheduling optimization — and the assumption is that the data lives in the ERP and the hard part is the AI. So let’s just connect to the ERP and get started.

Then we actually look at the ERP.

The system was implemented 12 years ago. It’s been customized by three different consulting firms. The vendor’s standard API covers about 60% of the data the AI needs — the other 40% lives in custom tables that nobody documented. There are 47 custom reports, each querying the database differently, and no two of them agree on the definition of “on-time delivery.” The production module is used for scheduling, but the planners override it daily using a spreadsheet that has its own logic. And the system holds 12 years of historical data — but the first 8 years were entered inconsistently because the company was still figuring out the software.

“Just connect to the ERP” is like saying “just climb that mountain” when you’re pointing at Everest wearing sneakers.

This is a core part of the technical debt tax manufacturers pay on every AI initiative. The ERP isn’t the problem. It does what it was designed to do — manage transactions, enforce business processes, and serve as the system of record. The problem is that ERPs were never designed to be data platforms for AI. And the gap between “transactional system” and “AI-ready data source” is where most AI projects go to die.


The 5 Most Common ERP Data Problems That Kill AI Projects

1. The Definition Problem

Ask your ERP what “on-time delivery” means. Actually, don’t ask the ERP — ask five people who use the ERP. You’ll get five different answers.

Does “on-time” mean the original requested date? The confirmed date? The most recent promised date after the third revision? Does “delivery” mean shipped from the warehouse or received at the customer site? Does it count if the order was 95% complete?

Why this kills AI: A demand forecasting model trained on “on-time delivery” data where the definition is inconsistent will produce forecasts that are precisely wrong. The model learns patterns in the noise and treats definitional inconsistency as signal.

The scope of the problem: In our assessments, we typically find 15-30 key business terms that have inconsistent definitions across the organization. This is a major contributor to the cost of bad data. Each one is a potential data integrity issue for any AI system that relies on it.

2. The Customization Problem

No mid-market manufacturer runs a vanilla ERP. The system has been customized to match your processes — which means the data model has been customized too. Custom fields, custom tables, custom workflows, custom validation rules.

Why this kills AI: Custom data structures aren’t covered by standard connectors, standard APIs, or standard documentation. Every integration is a custom engineering effort. When the ERP vendor releases an upgrade, custom elements may break. When the AI team needs a new data field, they’re modifying a customized system that nobody fully understands.

A precision machining company we worked with had 340 custom fields across their ERP. Of those, 87 were actively used. Of those 87, only 23 were documented. The remaining 64 required forensic investigation — interviewing users, reverse-engineering reports, and testing assumptions against actual data — before the AI team could use them.

3. The Historical Data Problem

AI models love historical data. ERPs have lots of it. But the historical data in most ERPs is unreliable for AI training, for reasons nobody mentions in the vendor pitch:

  • Schema changes over time. Fields were added, removed, or redefined. A field that means “product category” today might have meant “product line” three years ago — and the values weren’t migrated.
  • Process changes not reflected in data. When the company changed its inspection process in 2021, the data format changed too — but nobody flagged the pre-2021 data as following a different process.
  • Data entry inconsistency. Free-text fields that should have been dropdowns. Different conventions for part numbers, customer names, and addresses. Abbreviations that mean different things to different people.
  • Missing data. Fields that were “optional” during implementation but turn out to be critical for AI. Timestamps that weren’t captured. Status changes that weren’t logged.

Why this kills AI: You can’t train a model on 10 years of data if the first 7 years use different definitions, different schemas, and different data quality standards than the last 3. Most teams discover this months into the project, after they’ve already built the pipeline and started training.
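One lightweight defense is to tag every historical record with the "data regime" it was entered under, so training can filter regimes out or model them separately. A sketch, with hypothetical cutover dates and labels standing in for your own process and schema changes:

```python
from datetime import date

# Hypothetical cutovers, newest first. Each marks a point where the
# meaning or quality of the data changed (process change, schema change).
REGIME_CUTOVERS = [
    (date(2021, 6, 1), "post_inspection_change"),  # new inspection process
    (date(2018, 1, 1), "stable_schema"),           # schema stopped churning
]

def data_regime(entered_on: date) -> str:
    """Label a record with the regime it was entered under."""
    for cutover, label in REGIME_CUTOVERS:
        if entered_on >= cutover:
            return label
    return "legacy"  # older data: different definitions, treat with caution
```

A column like this costs almost nothing to add during extraction and saves the months-in discovery that the first seven years of training data mean something different.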

4. The Integration Problem

The ERP doesn’t exist in isolation. It connects to (or should connect to) the MES, CMMS, QMS, WMS, CRM, and various other systems. In most mid-market manufacturers, these connections range from “fragile but functional” to “nonexistent.”

Why this kills AI: Most valuable AI use cases require data from multiple systems. Predictive quality needs ERP data (material, supplier, order) plus MES data (process parameters) plus QMS data (inspection results). If these systems don’t talk to each other reliably, the AI project becomes a data integration project with an AI component — and the integration costs dominate.

Common integration patterns we see:

  • Nightly batch exports: Data is 24 hours stale by the time the AI sees it. Fine for monthly reporting. Useless for real-time decision support.
  • Point-to-point custom scripts: Fragile, undocumented, and usually maintained by one person.
  • Manual re-entry: Someone types data from one system into another. The AI inherits the typos and delays.
  • No integration at all: Systems are islands, and the only “integration” is a person walking between screens.

5. The Access Problem

Even when the data exists and is reasonably clean, getting it out of the ERP in a format the AI can use is often harder than expected.

Direct database access is the fastest approach but introduces risk — poorly written queries can slow the production system, and direct access bypasses the business logic layer, which means you might get raw data that doesn’t reflect the actual state (e.g., soft-deleted records, pending transactions).

Standard APIs are safer but often limited. They expose the standard data model, not your customizations. They have rate limits. They may not support the bulk data extraction that model training requires.

Custom reports and extracts are available but add a maintenance burden. Every time the ERP changes, the extract needs updating. And the person who built the extract needs to still be around.


The Integration Layer Approach: Building AI on Top of Your ERP

Here’s the good news: you don’t need to replace your ERP to deploy AI. You need to build an intelligence layer between your ERP and your AI applications.

What the Integration Layer Does

The integration layer sits between your source systems (ERP, MES, QMS, etc.) and your AI applications. It:

  • Extracts data from each source system using the most appropriate method (API, database replication, file export)
  • Transforms data into a consistent, documented format — standardizing definitions, resolving conflicts, and applying business rules
  • Stores the transformed data in a purpose-built analytics/AI data store (data warehouse, lakehouse, or feature store)
  • Serves data to AI applications and analytics tools through clean, documented APIs
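The transform step is where most of the value lives, because definitions are applied exactly once there. A minimal sketch, assuming hypothetical raw ERP column names and status codes (`ORD_NO`, `STAT_CD`, `QTY`) — the real ones come from your schema:

```python
# Assumed mapping from raw ERP status codes to documented values.
RAW_STATUS_MAP = {
    "SHP": "shipped",
    "DLV": "delivered",
    "CAN": "cancelled",
}

def transform(raw_rows):
    """Standardize raw extracted rows into the documented schema that
    every AI application and report consumes. Field names, codes, and
    types are normalized here and nowhere else."""
    for row in raw_rows:
        yield {
            "order_id": row["ORD_NO"].strip(),
            "status": RAW_STATUS_MAP.get(row["STAT_CD"], "unknown"),
            "qty": int(row["QTY"]),
        }
```

Downstream consumers never see `STAT_CD` or its codes — if the ERP changes them in an upgrade, only this one function changes.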

Why This Works Better Than Direct Connection

Decouples AI from the ERP. Changes to the ERP don’t break AI applications, because the integration layer absorbs the change. The AI sees a consistent interface regardless of what happens underneath.

Standardizes definitions. “On-time delivery” is defined once, in the transformation layer, using explicitly documented business logic. Every AI application and report uses the same definition.

Handles historical data issues. The transformation layer can apply consistent business rules to historical data, flagging or correcting inconsistencies that would otherwise corrupt model training.

Enables multi-system joins. The integration layer can combine data from the ERP, MES, QMS, and other systems into unified datasets that no single system can provide.

Creates reusability. Once you’ve built the integration for one AI project, subsequent projects leverage the same clean data. The investment compounds.

What the Architecture Looks Like

For most mid-market manufacturers on the Microsoft stack — and we’ve written extensively about Microsoft Fabric for manufacturing — the architecture looks like this:

Source systems → Azure Data Factory (extraction and orchestration) → Microsoft Fabric or Azure SQL (transformation and storage) → API layer (serving data to AI applications)

The AI applications — whether they’re custom models, Copilot Studio agents, or Power Platform automations — connect to the API layer, not to the ERP. They get clean, consistent, documented data without knowing or caring about the complexity underneath.


A Real Example: AI Without an ERP Replacement

A contract electronics manufacturer had been told by two different consultants that they needed to replace their 15-year-old ERP before they could do anything with AI. The ERP replacement would take 18-24 months and cost $1.2M+. They’d been putting off AI for three years waiting for the “right time” to tackle the ERP.

We took a different approach.

The goal: Build a yield prediction model that could identify PCB assembly defects earlier in the process, reducing rework costs.

The ERP reality: Epicor, heavily customized, with 12 years of production data. Standard APIs covered about half the needed data. No integration with the AOI (automated optical inspection) system or the test stations.

What we built:

  1. Database replication from Epicor to Azure SQL — not the entire database, just the 14 tables relevant to production, quality, and material data. Replicated every 15 minutes.

  2. AOI integration — built a simple data collector that captured inspection results from the AOI machines and pushed them to Azure SQL. This was the data that had been completely siloed in a standalone system.

  3. Transformation layer — SQL-based transformations that standardized part numbers (the ERP and AOI systems used different formats), resolved date/time discrepancies, and created the unified dataset the model needed.

  4. Feature store — a curated set of features derived from the combined ERP and AOI data, documented and versioned, ready for model training.

  5. Yield prediction model — trained on the feature store, deployed as an API, consumed by a simple dashboard that the production team used at shift start.
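The part-number standardization in step 3 is a good example of how small these transformations can be. A sketch, assuming the ERP used dashed part numbers ("PCB-1042-A") while the AOI system used compact lowercase ones ("pcb1042a") — the actual formats in this engagement differed, but the pattern is the same:

```python
import re

def normalize_part_number(raw: str) -> str:
    """Canonical part number: uppercase, with separators and
    whitespace stripped, so ERP and AOI records join on one key."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())
```

One canonical key, applied in the transformation layer, is what made the ERP-to-AOI join possible at all.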

Timeline: 14 weeks from kickoff to production deployment. Cost: $165K — about 14% of the proposed ERP replacement. Result: Rework costs dropped 23% in the first quarter. The model caught defect patterns that correlated with specific material lots and supplier changes — patterns that were invisible when the data was siloed.

The 15-year-old ERP is still running. It still does its job. But now there’s an intelligence layer on top of it that makes the data accessible, consistent, and useful for AI.

You don’t need a new ERP to deploy AI. You need a way to get clean, reliable data out of your existing ERP. That’s a data engineering problem, not an ERP replacement problem. And it’s a dramatically cheaper one.


The Direct Access vs. Integration Layer Debate

Some teams try to shortcut the integration layer by connecting the AI directly to the ERP database. “Why add a layer? Let’s just query the data we need.”

When direct access works:

  • One-off analysis or model prototyping (not production)
  • Simple queries against standard tables
  • Read-only access with no risk to the production system
  • A DBA who understands the schema and can write safe queries

When direct access fails (which is most production scenarios):

  • When you need data from multiple systems joined together
  • When business definitions require transformation logic
  • When multiple AI applications need the same data
  • When the ERP’s schema changes and every AI application breaks simultaneously
  • When query performance affects the production ERP
  • When you need historical data that’s been consistently transformed

The integration layer is more work upfront. But it pays off on the second AI project, and the third, and every one after that. Direct access is a shortcut that creates technical debt — the kind we wrote about in the hidden tax of technical debt. A proper data governance framework prevents these shortcuts from becoming permanent architecture.


What to Do Monday Morning

If you’re a manufacturer sitting on a heavily customized ERP and wondering how to start with AI, here’s the practical next step:

1. Inventory Your Data Needs

For your highest-priority AI use case, list every data element the model would need. Map each one to a source system. Identify whether it’s accessible via API, database, or not at all.
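This inventory can be as simple as a table you keep in version control. A hypothetical sketch for a demand forecasting use case — the elements, systems, and access methods here are illustrative, not prescriptive:

```python
# (data element, source system, access method) — hypothetical example.
# "none" as the source system marks a gap to close before modeling.
DATA_INVENTORY = [
    ("order history",      "ERP",  "standard API"),
    ("confirmed dates",    "ERP",  "custom table, direct DB only"),
    ("inspection results", "QMS",  "nightly CSV export"),
    ("machine schedule",   "none", "lives in a planner spreadsheet"),
]

gaps = [element for element, source, _ in DATA_INVENTORY if source == "none"]
```

The gaps list is usually the most valuable output: it tells you what the integration layer has to capture that no system currently holds.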

2. Assess Your ERP’s AI Readiness

How customized is the system? What APIs are available? How reliable is the historical data? What documentation exists? This doesn’t need to be a massive audit — a focused two-week assessment of the specific data domains relevant to your AI use case is enough.

3. Design the Integration Layer

Start small. Build the integration for your first AI use case. Design it to be extensible — use standardized patterns and documented interfaces so the next project can build on the same foundation.

4. Build the First Use Case

With clean data flowing through the integration layer, build the AI application. This should be dramatically faster and cheaper than building directly on top of the ERP.

5. Expand Incrementally

Each subsequent AI project adds to the integration layer — more data sources, more transformations, more features. The platform grows with your AI ambitions.


The Bottom Line

Your ERP is not the enemy of AI. It’s the source of some of the most valuable data your AI will ever use. But it wasn’t designed to serve AI workloads, and trying to force-fit AI directly on top of a transactional system built for a different purpose is a recipe for expensive failure.

The answer isn’t an ERP replacement. It’s an integration layer that extracts, transforms, and serves your ERP data in a format that AI applications can use. This approach is faster, cheaper, and less risky than replacing the ERP — and it delivers AI capabilities in months, not years.

Stop waiting for the ERP upgrade. Build the intelligence layer now, with what you have.


Want to understand how to get AI-ready data out of your existing ERP? Talk to our team about a data foundations assessment, or take our AI Readiness Assessment to see where you stand.

Data Foundations · Manufacturing · Operations · AI Strategy

If this is the kind of thinking you want in your inbox, The Logit covers AI strategy for industrial operators every two weeks. No vendor content. No hype. Just honest takes from practitioners.

Subscribe to The Logit
About the author
Alex Ryan
CEO & Co-Founder at Ryshe

Alex Ryan is CEO of Ryshe, where he helps engineering and manufacturing companies build the data foundations that make AI projects actually deliver. He's spent over a decade in the gap between what vendors promise and what ships to production. He's learned to tell clients what they need to hear, not what they want to hear.

Want to Discuss This Topic?

Let's talk about how these insights apply to your organization.