Data Preprocessing Pipeline¶
When a Loader Process runs, your CSV data goes through a multi-step pipeline before entities are created or updated. Understanding this pipeline helps you debug issues and write effective loader configurations.
High-Level Flow¶
Phase 1: CSV Parsing¶
The CSV file is downloaded and parsed with the following options:
- Column headers from the first row are used as property keys
- Empty lines are skipped
- Quotes are relaxed (tolerant of mismatched quotes)
- BOM (byte order mark) characters are handled automatically
- Leading/trailing whitespace is trimmed from values
If the CSV cannot be parsed (malformed syntax, empty file), the load fails immediately with a `PARSING_CSV` failure area.
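As an illustration (the header and values here are hypothetical), a CSV line such as ` "Classic Tee" , SKU-001 , TRUE ` under the header `Style Name,SKU,Active` is parsed into the row below: headers become keys, quotes are resolved, and whitespace is trimmed. Note that every value is still a string at this point:

```json
{
  "Style Name": "Classic Tee",
  "SKU": "SKU-001",
  "Active": "TRUE"
}
```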
Phase 2: Validation¶
Before any transformations, the Loader validates the data against entity-type-specific rules. This catches configuration errors early, before any processing occurs.
See Validation Rules & Error Handling for the full list of validation checks per entity type.
If validation fails, the load stops with a `VALIDATING_DATA` failure area.
Phase 3: Preprocessing Pipeline¶
The core transformation pipeline runs 6 sequential steps on your data. Each step receives the output of the previous step and produces a new set of rows, along with any errors or warnings.
Phase 4: Entity Construction¶
After preprocessing, the transformed rows are built into entity objects. Properties are matched to the entity type's schema, values are coerced to the correct types, and entities are prepared for upsert. See Property Type Handling for details on type coercion.
Phase 5: API Upsert¶
Entities are looked up by their federated IDs. Existing entities are updated; new entities are created. For assortment loads, an entity comparison determines adds, updates, deletes, and unchanged items.
Preprocessing Steps in Detail¶
Step 1: Convert CSV Boolean Values¶
Converts string representations of booleans to actual boolean values. This runs first because CSV parsers always produce strings, but downstream logic needs real booleans.
| CSV Value | Converted To |
|---|---|
"true" (case-insensitive) |
true |
"false" (case-insensitive) |
false |
| Any other value | Unchanged |
This applies to every column in every row, regardless of the property's actual type. The boolean conversion is intentionally broad -- the entity builder will later handle type-specific coercion.
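As a sketch (column names and values are hypothetical), a parsed row is transformed by this step as follows -- only values that case-insensitively equal "true" or "false" change; everything else passes through untouched:

```json
{
  "before": { "Style Name": "Classic Tee", "Active": "TRUE", "Clearance": "false", "Season": "FW24" },
  "after": { "Style Name": "Classic Tee", "Active": true, "Clearance": false, "Season": "FW24" }
}
```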
Step 2: Apply Federated Mappings¶
Applies the `federatedMappings` configuration to create new columns by copying values from existing columns, or by setting static values.
For each mapping entry `{ targetProperty: sourceColumn }`:
- If `sourceColumn` matches a column name in the row, the value from that column is copied to `targetProperty`.
- If `sourceColumn` does not match any column, the literal string `sourceColumn` is set as the value for `targetProperty` on all rows.
The original source columns are preserved -- this step only adds or overwrites target columns.
Example:
Given CSV columns `Style Name`, `SKU` and mapping `{ "name": "Style Name", "origin": "PLM" }`:
- `name` is set to the value of `Style Name` (column copy)
- `origin` is set to `"PLM"` for all rows (static value)
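Expressed as a configuration snippet, that mapping looks like the sketch below (surrounding loader configuration is omitted). A row `{ "Style Name": "Classic Tee", "SKU": "SKU-001" }` would gain `name: "Classic Tee"` and `origin: "PLM"` while keeping its original columns:

```json
{
  "federatedMappings": {
    "name": "Style Name",
    "origin": "PLM"
  }
}
```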
Step 3: Remove Unwanted Properties¶
Deletes columns listed in the `propertiesToRemove` configuration array from every row. This runs after federated mappings, so you can map a column first and then remove the original.
Example:
With "propertiesToRemove": ["tempCalc", "internalNotes"], those two columns are stripped from every row before further processing.
Step 4: Transform Data Based on Types (Without Warnings)¶
This step performs two critical transformations using the entity type's property definitions:
Label-to-Slug Conversion¶
CSV column headers can use either property slugs (keys) or display labels. This step converts label-based headers to their corresponding slugs.
For example, if your entity type has a property with slug `fabricWeight` and label `Fabric Weight`, a CSV header of `Fabric Weight` is renamed to `fabricWeight`.
- Columns that match a property slug are kept as-is.
- Columns that match a property label are renamed to the slug.
- Columns that match neither are left unchanged (they will be ignored during entity construction).
- If a column matches a disabled/archived property, a warning is recorded (in step 6).
Display-to-Value Conversion for Option Sets¶
For Single Select and Multi Select properties backed by option sets, display labels in the CSV are converted to the option set's internal values (keys).
For example, if an option set has the option `{ key: "na", label: "North America" }`, a CSV value of `North America` is converted to `na`.
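Combining the two examples above -- and assuming, for illustration, a Single Select property with slug `region` and label `Region` backed by that option set -- a row would be rewritten roughly like this:

```json
{
  "before": { "Fabric Weight": "180", "Region": "North America" },
  "after": { "fabricWeight": "180", "region": "na" }
}
```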
Why this step runs without warnings
This step runs once before conditional columns (step 5) and once after (step 6). The first pass (this step) suppresses warnings because conditional columns may create or modify values that affect slug/value resolution. Running it first ensures that any columns referenced by conditional columns are properly transformed. Warnings are only emitted on the second pass, after all transformations are complete.
Step 5: Evaluate Conditional Columns¶
Applies the `conditionalColumns` configuration to create or overwrite column values based on conditions and expressions.
For each `ColumnDefinition`:
- If `fromProperty` is set and no `conditions` are provided, the value is copied from the source column.
- If `conditions` are provided, each condition's `conditional` expression is evaluated with row values interpolated. The first condition that returns `true` has its `value` applied.
- If no conditions match (or none are provided), the `default` value is used.
- Template expressions in values (strings containing `{`) are evaluated with row values interpolated.
Variable interpolation: `{columnName}` in expressions is replaced with the actual value from the current row. This works in both `conditional` and `value`/`default` fields.
Expression evaluation: Conditions are evaluated as JavaScript expressions. This means you can use standard JavaScript operators and string methods:
```json
{
  "conditional": "{gender} === 'M' && {age} > 18",
  "value": "Adult Mens"
}
```

```json
{
  "default": "{styleNumber}.split('-')[0]"
}
```
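Putting these pieces together, here is a hedged sketch of a `conditionalColumns` configuration combining a plain column copy with a condition-driven column. The key used to name the target column (`targetProperty` below) and the exact shape of a `ColumnDefinition` are assumptions for illustration -- consult the loader configuration reference for the precise schema:

```json
{
  "conditionalColumns": [
    {
      "targetProperty": "displayName",
      "fromProperty": "Style Name"
    },
    {
      "targetProperty": "segment",
      "conditions": [
        { "conditional": "{gender} === 'M' && {age} > 18", "value": "Adult Mens" },
        { "conditional": "{gender} === 'F' && {age} > 18", "value": "Adult Womens" }
      ],
      "default": "Kids"
    }
  ]
}
```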
Step 6: Transform Data Based on Types (With Warnings)¶
This is the same transformation as Step 4 (label-to-slug conversion and display-to-value conversion), but this time warnings are recorded for:
- CSV columns that don't match any property slug or label on the entity type
- CSV columns that match a disabled or archived property
- Option set values that don't match any option in the set
- Columns that appear to be property labels but couldn't be matched
This second pass catches issues that arose from conditional column evaluation in step 5. For example, if a conditional column sets a `typePath` value, the entity type may change, which affects which properties are valid.
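For instance, a conditional column along the lines of the sketch below (the type paths and column names are hypothetical, and the same assumed `targetProperty` key is used as in the earlier sketch) could route rows to different entity types -- exactly the kind of change this second pass re-validates:

```json
{
  "conditionalColumns": [
    {
      "targetProperty": "typePath",
      "conditions": [
        { "conditional": "{category} === 'Footwear'", "value": "product/footwear" }
      ],
      "default": "product/apparel"
    }
  ]
}
```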
Step Results and Logging¶
Each preprocessing step produces a result object containing:
- `rows`: The transformed data after the step
- `errors`: Row-level issues with specific column details
- `warnings`: Row-level warnings
- `generalErrors`: File- or config-level errors
- `generalWarnings`: File- or config-level warnings
These results are uploaded as JSON files attached to the LoaderProcess entity. You can access them through the API or Admin Console to debug transformation issues.
The step result files are named `loader-process-step-<Step Name>.json` and are owned by the loader-process entity.
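As a rough illustration of what one of these files might contain -- the top-level fields come from the list above, but the shape of the individual error and warning entries shown here is an assumption, not a documented contract:

```json
{
  "rows": [
    { "name": "Classic Tee", "fabricWeight": "180", "origin": "PLM" }
  ],
  "errors": [
    { "row": 3, "column": "fabricWeight", "message": "Value could not be interpreted" }
  ],
  "warnings": [],
  "generalErrors": [],
  "generalWarnings": []
}
```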
Troubleshooting¶
Column not being mapped¶
If a CSV column is not being picked up as a property:
- Check that the column header matches the property slug (case-sensitive) or the property label (case-insensitive).
- If the column name differs, use `federatedMappings` to map it.
- Check that the property is not disabled/archived on the entity type.
Conditional column not evaluating correctly¶
- Verify that the columns referenced in `{columnName}` exist in your CSV or were created by `federatedMappings`.
- Remember that conditional columns are evaluated after federated mappings and after the first type transformation pass.
- Check that JavaScript expressions use correct syntax -- the expression is wrapped in `return (expression)` internally.
Unexpected warnings about unrecognized columns¶
If you see warnings about columns that you don't intend to load, use `propertiesToRemove` to strip them before processing. This is cleaner than ignoring the warnings.