Data Preprocessing Pipeline

When a Loader Process runs, your CSV data goes through a multi-step pipeline before entities are created or updated. Understanding this pipeline helps you debug issues and write effective loader configurations.

High-Level Flow

graph TD
    A[CSV File Upload] --> B[Parse CSV]
    B --> C[Validate Data]
    C --> D[Preprocessing Pipeline]
    D --> E[Entity Construction]
    E --> F[API Upsert]
    subgraph "Preprocessing Pipeline (6 Steps)"
        D1["1. Convert Booleans"] --> D2["2. Apply Federated Mappings"]
        D2 --> D3["3. Remove Unwanted Properties"]
        D3 --> D4["4. Transform by Types (no warnings)"]
        D4 --> D5["5. Evaluate Conditional Columns"]
        D5 --> D6["6. Transform by Types (with warnings)"]
    end
    D --> D1
    D6 --> E

Phase 1: CSV Parsing

The CSV file is downloaded and parsed with the following options:

  • Column headers from the first row are used as property keys
  • Empty lines are skipped
  • Quotes are relaxed (tolerant of mismatched quotes)
  • BOM (byte order mark) characters are handled automatically
  • Leading/trailing whitespace is trimmed from values

If the CSV cannot be parsed (malformed syntax, empty file), the load fails immediately with a PARSING_CSV failure area.

Phase 2: Validation

Before any transformations, the Loader validates the data against entity-type-specific rules. This catches configuration errors early, before any processing occurs.

See Validation Rules & Error Handling for the full list of validation checks per entity type.

If validation fails, the load stops with a VALIDATING_DATA failure area.

Phase 3: Preprocessing Pipeline

The core transformation pipeline runs 6 sequential steps on your data. Each step receives the output of the previous step and produces a new set of rows, along with any errors or warnings.
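
As a rough illustration of this chaining, the sketch below threads rows through a list of step functions and keeps each step's errors and warnings. The names runPipeline and demoSteps, and the step signature, are hypothetical and are not the Loader's actual API.

```javascript
// Hypothetical sketch: every step takes rows and returns { rows, errors, warnings }.
function runPipeline(rows, steps) {
  const results = [];
  let current = rows;
  for (const step of steps) {
    const result = step.run(current);
    results.push({ name: step.name, ...result });
    current = result.rows; // the next step receives this step's output
  }
  return { rows: current, results };
}

// Two toy steps standing in for the real preprocessing steps.
const demoSteps = [
  {
    name: "upper",
    run: (rows) => ({
      rows: rows.map((r) => ({ ...r, sku: r.sku.toUpperCase() })),
      errors: [],
      warnings: [],
    }),
  },
  {
    name: "flag-empty",
    run: (rows) => ({
      rows,
      errors: [],
      warnings: rows.flatMap((r, i) => (r.sku === "" ? [`row ${i}: empty sku`] : [])),
    }),
  },
];

const demo = runPipeline([{ sku: "ab-1" }, { sku: "" }], demoSteps);
console.log(demo.rows[0].sku);           // AB-1
console.log(demo.results[1].warnings);   // [ 'row 1: empty sku' ]
```

Keeping one result per step mirrors the per-step result files described under Step Results and Logging.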

Phase 4: Entity Construction

After preprocessing, the transformed rows are built into entity objects. Properties are matched to the entity type's schema, values are coerced to the correct types, and entities are prepared for upsert. See Property Type Handling for details on type coercion.

Phase 5: API Upsert

Entities are looked up by their federated IDs. Existing entities are updated; new entities are created. For assortment loads, an entity comparison determines adds, updates, deletes, and unchanged items.
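
The assortment comparison can be pictured as a diff keyed by federated ID. This is only a sketch under assumptions: the property name federatedId, the function compareAssortment, and the use of JSON equality to detect changes are all illustrative, and the real comparison may work differently.

```javascript
// Hypothetical sketch: classify incoming items against existing ones by federated ID.
function compareAssortment(existing, incoming) {
  const existingById = new Map(existing.map((e) => [e.federatedId, e]));
  const incomingIds = new Set(incoming.map((e) => e.federatedId));
  const diff = { adds: [], updates: [], deletes: [], unchanged: [] };

  for (const item of incoming) {
    const current = existingById.get(item.federatedId);
    if (!current) diff.adds.push(item.federatedId);
    // Assumption: JSON equality (key-order sensitive) is a good enough change check here.
    else if (JSON.stringify(current) !== JSON.stringify(item)) diff.updates.push(item.federatedId);
    else diff.unchanged.push(item.federatedId);
  }
  for (const e of existing) {
    if (!incomingIds.has(e.federatedId)) diff.deletes.push(e.federatedId);
  }
  return diff;
}

const assortmentDiff = compareAssortment(
  [
    { federatedId: "A", name: "Jacket" },
    { federatedId: "B", name: "Vest" },
    { federatedId: "C", name: "Cap" },
  ],
  [
    { federatedId: "A", name: "Jacket" },
    { federatedId: "B", name: "Gilet" },
    { federatedId: "D", name: "Scarf" },
  ]
);
console.log(assortmentDiff);
// { adds: [ 'D' ], updates: [ 'B' ], deletes: [ 'C' ], unchanged: [ 'A' ] }
```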


Preprocessing Steps in Detail

Step 1: Convert CSV Boolean Values

Converts string representations of booleans to actual boolean values. This runs first because CSV parsers always produce strings, but downstream logic needs real booleans.

| CSV Value | Converted To |
|---|---|
| "true" (case-insensitive) | true |
| "false" (case-insensitive) | false |
| Any other value | Unchanged |

This applies to every column in every row, regardless of the property's actual type. The boolean conversion is intentionally broad -- the entity builder will later handle type-specific coercion.
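
A minimal sketch of this conversion, assuming rows are plain objects keyed by column header (convertBooleans is a hypothetical name, not the Loader's function):

```javascript
// Sketch of step 1: turn "true"/"false" strings (any casing) into real booleans.
function convertBooleans(rows) {
  return rows.map((row) => {
    const out = {};
    for (const [key, value] of Object.entries(row)) {
      const lowered = typeof value === "string" ? value.toLowerCase() : null;
      if (lowered === "true") out[key] = true;
      else if (lowered === "false") out[key] = false;
      else out[key] = value; // any other value passes through unchanged
    }
    return out;
  });
}

const boolDemo = convertBooleans([{ active: "TRUE", name: "False", size: "yes" }]);
console.log(boolDemo[0]); // { active: true, name: false, size: 'yes' }
```

Note how the name column is converted too: because the step ignores property types, a text value that happens to read "False" becomes a boolean at this point.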

Step 2: Apply Federated Mappings

Applies the federatedMappings configuration to create new columns by copying values from existing columns, or by setting static values.

For each mapping entry { targetProperty: sourceColumn }:

  1. If sourceColumn matches a column name in the row, the value from that column is copied to targetProperty.
  2. If sourceColumn does not match any column, the literal string sourceColumn is set as the value for targetProperty on all rows.

The original source columns are preserved -- this step only adds or overwrites target columns.

Example:

Given CSV columns Style Name, SKU and mapping { "name": "Style Name", "origin": "PLM" }:

  • name is set to the value of Style Name (column copy)
  • origin is set to "PLM" for all rows (static value)
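
The same example as a sketch. applyFederatedMappings is a hypothetical name, and the choice to look up sourceColumn in the original row (rather than in columns added by earlier mapping entries) is an assumption.

```javascript
// Sketch of step 2: copy from a source column if it exists, otherwise use the literal string.
function applyFederatedMappings(rows, federatedMappings) {
  return rows.map((row) => {
    const out = { ...row }; // original source columns are preserved
    for (const [targetProperty, sourceColumn] of Object.entries(federatedMappings)) {
      const isColumn = Object.prototype.hasOwnProperty.call(row, sourceColumn);
      out[targetProperty] = isColumn ? row[sourceColumn] : sourceColumn;
    }
    return out;
  });
}

const mapped = applyFederatedMappings(
  [{ "Style Name": "Trail Jacket", SKU: "TJ-001" }],
  { name: "Style Name", origin: "PLM" }
);
console.log(mapped[0]);
// { 'Style Name': 'Trail Jacket', SKU: 'TJ-001', name: 'Trail Jacket', origin: 'PLM' }
```

One consequence worth keeping in mind: a typo in a source column name does not fail, it silently becomes a static value on every row.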

Step 3: Remove Unwanted Properties

Deletes columns listed in the propertiesToRemove configuration array from every row. This runs after federated mappings, so you can map a column first and then remove the original.

Example:

With "propertiesToRemove": ["tempCalc", "internalNotes"], those two columns are stripped from every row before further processing.
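
In code, the step amounts to filtering keys out of each row. A minimal sketch, with removeProperties as a hypothetical name:

```javascript
// Sketch of step 3: drop every column named in propertiesToRemove.
function removeProperties(rows, propertiesToRemove) {
  const drop = new Set(propertiesToRemove);
  return rows.map((row) =>
    Object.fromEntries(Object.entries(row).filter(([key]) => !drop.has(key)))
  );
}

const stripped = removeProperties(
  [{ name: "Trail Jacket", tempCalc: "42", internalNotes: "draft" }],
  ["tempCalc", "internalNotes"]
);
console.log(stripped[0]); // { name: 'Trail Jacket' }
```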

Step 4: Transform Data Based on Types (Without Warnings)

This step performs two critical transformations using the entity type's property definitions:

Label-to-Slug Conversion

CSV column headers can use either property slugs (keys) or display labels. This step converts label-based headers to their corresponding slugs.

For example, if your entity type has a property with slug fabricWeight and label Fabric Weight, a CSV header of Fabric Weight is renamed to fabricWeight.

  • Columns that match a property slug are kept as-is.
  • Columns that match a property label are renamed to the slug.
  • Columns that match neither are left unchanged (they will be ignored during entity construction).
  • If a column matches a disabled/archived property, a warning is recorded (in step 6).
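
The renaming rules above can be sketched as follows. The property definition shape ({ slug, label }) and the name renameLabelsToSlugs are assumptions; slugs are matched exactly and labels case-insensitively, as described under Troubleshooting.

```javascript
// Sketch of label-to-slug conversion for a single row.
function renameLabelsToSlugs(row, properties) {
  const slugs = new Set(properties.map((p) => p.slug));
  const slugByLabel = new Map(properties.map((p) => [p.label.toLowerCase(), p.slug]));
  const out = {};
  for (const [header, value] of Object.entries(row)) {
    const labelMatch = slugByLabel.get(header.toLowerCase());
    if (slugs.has(header)) out[header] = value;       // already a slug: keep as-is
    else if (labelMatch) out[labelMatch] = value;     // label: rename to the slug
    else out[header] = value;                         // unknown: left unchanged
  }
  return out;
}

const renamed = renameLabelsToSlugs(
  { "Fabric Weight": "180", sku: "TJ-001", Notes: "n/a" },
  [
    { slug: "fabricWeight", label: "Fabric Weight" },
    { slug: "sku", label: "SKU" },
  ]
);
console.log(renamed); // { fabricWeight: '180', sku: 'TJ-001', Notes: 'n/a' }
```

The sketch does not decide what happens when a row contains both the slug and the label of the same property; the source does not say which one wins.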

Display-to-Value Conversion for Option Sets

For Single Select and Multi Select properties backed by option sets, display labels in the CSV are converted to the option set's internal values (keys).

For example, if an option set has option { key: "na", label: "North America" }, a CSV value of North America is converted to na.
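
A sketch of that lookup for a single value. displayToValue is a hypothetical name, exact-match comparison is an assumption, and leaving unmatched values unchanged is also an assumption (the source only says a warning is recorded for them in step 6). How multiple values are delimited for Multi Select properties is not covered here.

```javascript
// Sketch: map an option label (or an already-valid key) to the option key.
function displayToValue(raw, options) {
  const match = options.find((o) => o.key === raw || o.label === raw);
  return match ? match.key : raw; // assumption: unmatched values pass through
}

const regionOptions = [
  { key: "na", label: "North America" },
  { key: "eu", label: "Europe" },
];
console.log(displayToValue("North America", regionOptions)); // na
console.log(displayToValue("eu", regionOptions));            // eu
console.log(displayToValue("Mars", regionOptions));          // Mars
```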

Why this step runs without warnings

The type transformation runs twice: once here, before conditional columns (step 5), and again afterwards as step 6. This first pass suppresses warnings because conditional columns may still create or modify values that affect slug and value resolution. Running the pass early ensures that any columns referenced by conditional columns are already converted to slugs and option values. Warnings are emitted only on the second pass, after all transformations are complete.

Step 5: Evaluate Conditional Columns

Applies the conditionalColumns configuration to create or overwrite column values based on conditions and expressions.

For each ColumnDefinition:

  1. If fromProperty is set and no conditions are provided, the value is copied from the source column.
  2. If conditions are provided, each condition's conditional expression is evaluated with row values interpolated. The first condition that returns true has its value applied.
  3. If no condition matches, or none are provided and fromProperty is not set, the default value is used.
  4. Template expressions in values (strings containing {) are evaluated with row values interpolated.

Variable interpolation: {columnName} in expressions is replaced with the actual value from the current row. This works in both conditional and value/default fields.

Expression evaluation: Conditions are evaluated as JavaScript expressions. This means you can use standard JavaScript operators and string methods:

{
  "conditional": "{gender} === 'M' && {age} > 18",
  "value": "Adult Mens"
}

{
  "default": "{styleNumber}.split('-')[0]"
}
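
The evaluation rules can be sketched roughly as below. Several details are assumptions made for illustration: placeholders are replaced with JSON-quoted literals (which is what makes {gender} === 'M' and {styleNumber}.split('-') valid JavaScript), the list of conditions lives in a field named conditions, and new Function is used to wrap the expression in return (expression). The function names are hypothetical.

```javascript
// Assumption: {column} becomes a JSON literal of the row value, e.g. {gender} -> "M".
function interpolate(expression, row) {
  return expression.replace(/\{([^{}]+)\}/g, (_, name) => JSON.stringify(row[name] ?? null));
}

// Wrap the interpolated text in `return (expression)` and run it.
function evaluateExpression(expression, row) {
  return new Function(`return (${interpolate(expression, row)});`)();
}

// Strings containing "{" are treated as template expressions; everything else is literal.
function resolveValue(template, row) {
  return typeof template === "string" && template.includes("{")
    ? evaluateExpression(template, row)
    : template;
}

function evaluateColumn(definition, row) {
  const conditions = definition.conditions ?? [];
  if (definition.fromProperty && conditions.length === 0) {
    return row[definition.fromProperty]; // plain copy from the source column
  }
  for (const condition of conditions) {
    if (evaluateExpression(condition.conditional, row)) {
      return resolveValue(condition.value, row); // first matching condition wins
    }
  }
  return resolveValue(definition.default, row);
}

const sampleRow = { gender: "M", age: "21", styleNumber: "TJ-001" };
const segment = evaluateColumn(
  {
    conditions: [{ conditional: "{gender} === 'M' && {age} > 18", value: "Adult Mens" }],
    default: "Other",
  },
  sampleRow
);
const prefix = evaluateColumn({ default: "{styleNumber}.split('-')[0]" }, sampleRow);
console.log(segment, prefix); // Adult Mens TJ
```

Because expressions run as code, a sketch like this is only appropriate for configuration from a trusted source.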

Step 6: Transform Data Based on Types (With Warnings)

This is the same transformation as Step 4 (label-to-slug conversion and display-to-value conversion), but this time warnings are recorded for:

  • CSV columns that don't match any property slug or label on the entity type
  • CSV columns that match a disabled or archived property
  • Option set values that don't match any option in the set
  • Columns that appear to be property labels but couldn't be matched

This second pass catches issues that arose from conditional column evaluation in step 5. For example, if a conditional column sets a typePath value, the entity type may change, which affects which properties are valid.
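
To make the difference between the two passes concrete, here is a sketch of a column check that stays silent on the first pass and reports on the second. The emitWarnings flag, the disabled field on property definitions, and the warning messages are all hypothetical.

```javascript
// Sketch: the same check runs in steps 4 and 6; only step 6 reports warnings.
function collectColumnWarnings(row, properties, emitWarnings) {
  if (!emitWarnings) return []; // step 4: warnings suppressed
  const warnings = [];
  for (const header of Object.keys(row)) {
    const prop = properties.find(
      (p) => p.slug === header || p.label.toLowerCase() === header.toLowerCase()
    );
    if (!prop) warnings.push(`Column "${header}" does not match any property`);
    else if (prop.disabled) warnings.push(`Column "${header}" matches a disabled property`);
  }
  return warnings;
}

const checkedRow = { fabricWeight: "180", Legacy: "x", oldCode: "1" };
const checkedProps = [
  { slug: "fabricWeight", label: "Fabric Weight" },
  { slug: "oldCode", label: "Old Code", disabled: true },
];
const firstPass = collectColumnWarnings(checkedRow, checkedProps, false);
const secondPass = collectColumnWarnings(checkedRow, checkedProps, true);
console.log(firstPass.length, secondPass.length); // 0 2
```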


Step Results and Logging

Each preprocessing step produces a result object containing:

  • rows: The transformed data after the step
  • errors: Row-level issues with specific column details
  • warnings: Row-level warnings
  • generalErrors: File or config-level errors
  • generalWarnings: File or config-level warnings

These results are uploaded as JSON files attached to the LoaderProcess entity. You can access them through the API or Admin Console to debug transformation issues.

The step result files are named loader-process-step-<Step Name>.json and are owned by the loader-process entity.
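
When debugging, a quick summary of a downloaded step result file can show which step introduced a problem. The sketch below assumes only the five top-level fields listed above; summarizeStepResult is a hypothetical helper, and the shape of the individual error and warning entries in the sample is invented for illustration.

```javascript
// Sketch: count the entries in each section of a parsed step result file.
function summarizeStepResult(result) {
  const count = (list) => (Array.isArray(list) ? list.length : 0);
  return {
    rows: count(result.rows),
    errors: count(result.errors),
    warnings: count(result.warnings),
    generalErrors: count(result.generalErrors),
    generalWarnings: count(result.generalWarnings),
  };
}

// Sample content standing in for a parsed loader-process-step-<Step Name>.json file.
const sampleResult = {
  rows: [{ name: "Trail Jacket" }, { name: "Rain Shell" }],
  errors: [],
  warnings: [{ row: 1, column: "Legacy", message: "unrecognized column" }],
  generalErrors: [],
  generalWarnings: [],
};
const summary = summarizeStepResult(sampleResult);
console.log(summary);
// { rows: 2, errors: 0, warnings: 1, generalErrors: 0, generalWarnings: 0 }
```

Comparing the summaries of consecutive steps is a simple way to spot where a warning first appears or where a column disappears.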


Troubleshooting

Column not being mapped

If a CSV column is not being picked up as a property:

  1. Check that the column header matches either the property slug or label exactly (case-sensitive for slugs, case-insensitive for labels).
  2. If the column name differs, use federatedMappings to map it.
  3. Check that the property is not disabled/archived on the entity type.

Conditional column not evaluating correctly

  1. Verify that the columns referenced in {columnName} exist in your CSV or were created by federatedMappings.
  2. Remember that conditional columns are evaluated after federated mappings and after the first type transformation pass.
  3. Check that JavaScript expressions use correct syntax -- the expression is wrapped in return (expression) internally.

Unexpected warnings about unrecognized columns

If you see warnings about columns that you don't intend to load, use propertiesToRemove to strip them before processing. This is cleaner than ignoring the warnings.