Migrating product data from a legacy database to a modern PIM: a checklist
By Philipp Kant
Most “PIM rollout” projects don’t actually fail at the PIM. They fail at the data: the years of accumulated product information sitting in a legacy database that nobody has wanted to look at closely since the person who built it left.
This is a practical checklist for the data layer of a PIM migration. Not the vendor selection, not the rollout plan, not the change management. Just the part where you have to move complex product data from a system that was designed twelve years ago for a different business into a PIM that has its own opinions about how product data should be shaped.
Before you touch the new PIM
Inventory what’s in the legacy system, including the things nobody documented. Every long-running database has an “official” schema and a parallel folk schema. The second one lives in column comments, naming conventions, and the heads of three people. Find both. The migration breaks at the folk schema, not the documented one.
Decide what is authoritative. For each attribute: where does the truth live right now? Catalog system, spreadsheet, ERP, the warehouse? Migrating data without resolving this just moves the contradiction into the new PIM.
Document the implicit rules. “Products that start with K- are always vehicles, except the ones from 2014.” That sentence will exist somewhere in the data. Write it down before the migration, not after.
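One way to write such a rule down is as a small, testable function rather than a sentence in a wiki. The sketch below is illustrative only: the K- prefix and the 2014 exception come from the example above, while the field names `sku` and `year` and the function name `is_vehicle` are assumptions, not part of any real schema.

```python
def is_vehicle(product: dict) -> bool:
    """Implicit legacy rule, made explicit.

    SKUs starting with 'K-' are vehicles, except records from 2014.
    The field names 'sku' and 'year' are hypothetical.
    """
    sku = product.get("sku") or ""
    if not sku.startswith("K-"):
        return False
    # The documented exception: K- products from 2014 are not vehicles.
    return product.get("year") != 2014
```

Once the rule is code, the exception is visible, reviewable, and can be checked against the full export before anything is loaded.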
The data layer
Map the schemas slowly. The seductive move is one big mapping table: old column → new field. The correct move is per-attribute, with explicit semantics: what the attribute meant in the source, what it means in the target, and what to do when those disagree.
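A per-attribute mapping with explicit semantics could be sketched as a small record type. Everything named here is an assumption for illustration: the class `AttributeMapping`, the `on_conflict` policies, and the example column `wt` are hypothetical and not tied to any particular PIM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class AttributeMapping:
    source_column: str
    target_field: str
    source_meaning: str   # what the attribute meant in the legacy system
    target_meaning: str   # what it means in the PIM
    transform: Callable[[Any], Any]
    on_conflict: str      # "reject", "carry_forward", or "default"
    default: Any = None


def apply_mapping(mapping: AttributeMapping, row: dict) -> Tuple[str, Any]:
    """Return (target_field, value), applying the declared conflict policy."""
    raw = row.get(mapping.source_column)
    try:
        return mapping.target_field, mapping.transform(raw)
    except (TypeError, ValueError):
        if mapping.on_conflict == "carry_forward":
            return mapping.target_field, raw
        if mapping.on_conflict == "default":
            return mapping.target_field, mapping.default
        raise  # "reject": surface the bad value instead of hiding it


# Hypothetical example: grams stored as free text become kilograms.
WEIGHT = AttributeMapping(
    source_column="wt",
    target_field="net_weight_kg",
    source_meaning="Net weight in grams, entered as free text",
    target_meaning="Net weight in kilograms, decimal",
    transform=lambda v: float(v) / 1000,
    on_conflict="default",
)
```

With this hypothetical mapping, a row `{"wt": "1500"}` yields `("net_weight_kg", 1.5)`, and an unparseable value such as `"n/a"` falls back to the declared default instead of silently passing through.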
Handle inconsistent historical data explicitly. Product names change spelling, units of measure change conventions, and attributes get silently repurposed over the years. For each attribute, decide: clean during migration, clean after, or carry forward as-is. Then apply that decision to every record of the attribute instead of drifting between the three options. One attribute, one rule.
Model the relationships, not just the fields. Especially in domains where the data is fundamentally relational: automotive parts to vehicle compatibility, accessories to base products, variants to masters. The schema mapping is easy; the relationship semantics are where the legacy system has been quietly drifting from reality.
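A cheap first check on relationship data is to look for links that point at products which no longer exist. The sketch below assumes relationships are exported as simple pairs of product IDs; the function name `dangling_links` and the pair shape are illustrative assumptions.

```python
from typing import Iterable, List, Set, Tuple


def dangling_links(
    product_ids: Set[str],
    links: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return every (from_id, to_id) pair where either end is not a known product."""
    return [
        (src, dst)
        for src, dst in links
        if src not in product_ids or dst not in product_ids
    ]
```

This only catches broken references. Whether a link that resolves is still semantically true (does this part really fit that vehicle?) needs domain review.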
Disambiguate overloaded attributes. A single column in the legacy DB often means three different things depending on product category. Splitting these into proper, distinct PIM attributes is unglamorous but essential. Otherwise the new PIM inherits the same overloading and the migration was theatrical.
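Splitting an overloaded column can be sketched as a lookup from product category to the distinct target attribute. The column name `size`, the categories, and the target attribute names below are all hypothetical.

```python
# Hypothetical: the legacy 'size' column means something different per category.
SIZE_TARGET_BY_CATEGORY = {
    "tire": "rim_diameter_in",
    "apparel": "clothing_size",
    "fluid": "volume_l",
}


def split_size(row: dict) -> dict:
    """Route the overloaded 'size' value to a category-specific PIM attribute."""
    category = row["category"]
    target = SIZE_TARGET_BY_CATEGORY.get(category)
    if target is None:
        # Refuse to guess: an unmapped category is a decision someone must make.
        raise ValueError(f"no 'size' mapping for category {category!r}")
    return {target: row["size"]}
```

Raising on an unmapped category is a deliberate choice in this sketch: a silent default would carry the overloading into the new system.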
The migration mechanics
Use structured exports, not direct DB-to-DB pipes. Export the legacy data to a structured intermediate format (CSV, JSON, an XML profile, or whatever the target PIM ingests cleanly). The intermediate buys you two things: an inspectable record of what was actually migrated, and a safe place to apply transformations without touching either source or target.
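As a minimal sketch of the export step, the function below writes one table to JSON Lines, one record per line. It uses SQLite from the Python standard library purely as a stand-in for the legacy database; the function name `export_table` and its parameters are assumptions, and the table and key names are expected to come from trusted configuration, since they are interpolated into the query.

```python
import json
import sqlite3
from typing import TextIO


def export_table(conn: sqlite3.Connection, table: str, key: str, out: TextIO) -> int:
    """Write every row of `table` to `out` as JSON Lines, ordered by `key`.

    Returns the number of rows written. Ordering by a stable key keeps
    repeated exports of unchanged data byte-identical.
    """
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    count = 0
    # table/key are trusted identifiers here, not user input.
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"'):
        out.write(json.dumps(dict(row), sort_keys=True, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Because the output is plain text with a stable order, two exports can be compared with an ordinary diff, and transformations can run against the file without touching either database.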
Automate the imports, but version them. Every successful import should be reproducible from the same export. When something breaks in production three weeks after cutover, you want to be able to re-run the exact migration step that loaded the affected products.
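One way to make an import reproducible is to record, for each run, which export file and which version of the transformation code produced it. The manifest shape below is an assumption for illustration, as are the names `build_manifest` and `matches_manifest`; only the SHA-256 hashing via `hashlib` is standard.

```python
import hashlib


def build_manifest(step: str, transform_version: str, export_bytes: bytes) -> dict:
    """Describe one import run so it can be repeated later from the same input."""
    return {
        "step": step,
        "transform_version": transform_version,  # e.g. a git tag or commit id
        "export_sha256": hashlib.sha256(export_bytes).hexdigest(),
        "export_size": len(export_bytes),
    }


def matches_manifest(manifest: dict, export_bytes: bytes) -> bool:
    """True if `export_bytes` is exactly the export the manifest was built from."""
    return (
        manifest["export_size"] == len(export_bytes)
        and manifest["export_sha256"] == hashlib.sha256(export_bytes).hexdigest()
    )
```

Stored next to the export, a manifest like this lets you confirm three weeks later that you are re-running the same step against the same bytes.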
Build for idempotency. Re-running the migration must not create duplicates, overwrite data that cannot be recovered, or leave a run partially applied. Idempotency is what lets you iterate during the rollout. Without it, every migration run is a one-way door and people stop running them.
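The core of an idempotent import is an upsert keyed on a stable identifier, applied all-or-nothing. The sketch below uses a plain dictionary as a stand-in for the target PIM and assumes a hypothetical `sku` field as the stable key; a real PIM would need its own transaction or staging mechanism.

```python
from typing import Dict, Iterable


def upsert(target: Dict[str, dict], record: dict) -> str:
    """Insert or update one record by its stable key; never duplicate it."""
    key = record["sku"]  # 'sku' as the stable key is an assumption
    existing = target.get(key)
    if existing == record:
        return "unchanged"
    target[key] = dict(record)
    return "created" if existing is None else "updated"


def run_import(target: Dict[str, dict], records: Iterable[dict]) -> Dict[str, int]:
    """Apply all records to a staged copy, then swap it in as one step.

    If any record fails (for example, a missing key), the exception
    propagates before the swap and `target` is left untouched.
    """
    staged = dict(target)
    summary = {"created": 0, "updated": 0, "unchanged": 0}
    for record in records:
        summary[upsert(staged, record)] += 1
    target.clear()
    target.update(staged)
    return summary
```

Running the same import twice against the same export reports every record as unchanged on the second pass and leaves the target identical, which is the property that makes repeated runs safe.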
Reconcile, don’t just count. Successful row counts are not successful migrations. Reconcile field values for a meaningful sample, including the products you know are weird. The known-weird products are the canary for everything else.
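Field-level reconciliation over a sample that always includes the known-weird products might look like the sketch below. Source and target are modelled as dictionaries keyed by product ID, and the function names `sample_keys` and `reconcile` are assumptions for illustration.

```python
import random
from typing import Dict, Iterable, List, Tuple


def sample_keys(all_keys: Iterable[str], known_weird: Iterable[str],
                n: int, seed: int = 0) -> List[str]:
    """Always include the known-weird products, plus n random others."""
    weird = sorted(set(known_weird))
    pool = sorted(set(all_keys) - set(weird))
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    return weird + rng.sample(pool, min(n, len(pool)))


def reconcile(source: Dict[str, dict], target: Dict[str, dict],
              keys: Iterable[str], fields: Iterable[str]) -> List[Tuple]:
    """Return (key, field, source_value, target_value) for every mismatch."""
    fields = list(fields)
    mismatches = []
    for key in keys:
        src, dst = source.get(key), target.get(key)
        if src is None or dst is None:
            # The whole record is missing on one side.
            mismatches.append((key, "<record>", src, dst))
            continue
        for field in fields:
            if src.get(field) != dst.get(field):
                mismatches.append((key, field, src.get(field), dst.get(field)))
    return mismatches
```

An empty result for the sample is the signal you want; matching row counts alone would not have distinguished a correct load from one that dropped or mangled a field.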
After cutover
Validate against the operational use cases, not just the data. Open the new PIM and walk through the workflows that the team will actually use: search this attribute, filter by that category, export to the format the catalog team needs. Data that’s correct in storage but wrong for the workflow is still a failed migration.
Stop the old system from being authoritative. As long as the legacy database is still in any operational loop, drift starts the moment cutover finishes. Decommission the write paths first, even if the read paths stay for a quarter.
Land the process changes at the same time. Product data migrations fail late when the team keeps doing things the old way against the new system. Workflow training, attribute ownership, and update processes should ship with the migration, not after.
If you’re staring down a PIM migration and the data layer is the part that’s making it look scary, that’s the part we’d usually pick up end-to-end. The PIM vendor will sell you the platform; the migration is its own engineering problem.