Overview

The Data Pipeline is a three-stage processing system that takes raw product data and progressively transforms it into high-quality, feed-ready content. Each stage adds a layer of structure, validation, and enrichment — inspired by the medallion architecture used in data engineering.

Stage 1 — Bronze (Raw Ingest)

When products enter the system — whether via CSV upload, URL import, API, or MCP feed — they land in the Bronze layer.

Characteristics

  • Permissive — all data is accepted as-is, with minimal validation
  • Idempotent — re-importing the same product does not create duplicates; existing records are updated
  • Schema-free fields — non-standard columns are stored as flexible attributes
  • No transformation — titles, descriptions, and prices are preserved exactly as provided

What happens at Bronze

  1. File is parsed (CSV, XLSX, JSON feed, or scraped HTML)
  2. Each row is mapped to the internal product schema
  3. Required fields (title, sku) are checked for presence
  4. All records are written to the repository with pipeline_stage: "bronze"
  5. Import summary is returned: total, created, updated, errors
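
The steps above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the in-memory dict standing in for the repository, the `ingest_bronze` name, and the choice to skip rows that fail the presence check are all assumptions.

```python
import csv
import io

REQUIRED_FIELDS = ("title", "sku")


def ingest_bronze(csv_text, repository):
    """Upsert parsed rows into `repository` (a dict keyed by SKU) and return a summary."""
    summary = {"total": 0, "created": 0, "updated": 0, "errors": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        summary["total"] += 1
        # Presence check only: values are never rewritten at Bronze.
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            summary["errors"] += 1  # assumption: rows missing a required field are skipped
            continue
        key = row["sku"]
        summary["updated" if key in repository else "created"] += 1
        repository[key] = {**row, "pipeline_stage": "bronze"}
    return summary


repo = {}
feed = "title,sku,price\nBlue Mug,MUG-1,9.99\n,MUG-2,4.50\n"
first = ingest_bronze(feed, repo)
second = ingest_bronze(feed, repo)  # re-import updates the record instead of duplicating it
print(first)
print(second)
```

Keying the store by SKU is what makes the re-import idempotent in this sketch: the second run reports the same product as updated rather than created.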

Supported entity types

| Entity | Description |
| --- | --- |
| product | Base product record with title, SKU, price, brand, category |
| variant | Size/color/configuration variants linked to a product |
| media | Images and video URLs attached to products |

Stage 2 — Silver (Normalize)

Silver is the first active transformation stage. It normalizes, deduplicates, and validates product data to ensure consistency across the catalog.

Triggering Silver

Silver does not run automatically by default. Trigger it in one of three ways:
  • Per product — via the product detail page → “Normalize” button
  • Batch — via the batch actions panel → select products → “Normalize”
  • API: POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/silver
Auto-trigger for Silver can be enabled in Pipeline Settings. When enabled, Silver runs automatically after every Bronze ingest.

What Silver does

| Operation | Detail |
| --- | --- |
| Field normalization | Title case, trim whitespace, standardize currency codes |
| Category mapping | Maps raw category strings to your workspace taxonomy |
| Duplicate detection | Finds products with matching SKU, GTIN, or title similarity |
| URL validation | Checks that image and media URLs are reachable (HTTP 200) |
| Brand matching | Links raw brand names to workspace brand records |
| Attribute schema | Promotes common attributes to structured fields |
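
To make two of these operations concrete, here is a minimal sketch of field normalization and duplicate detection. The currency alias table, the 0.9 similarity threshold, and both function names are assumptions chosen for illustration; the rules Silver actually applies are not documented here.

```python
from difflib import SequenceMatcher

# Hypothetical alias table; a real catalog would need many more entries.
CURRENCY_ALIASES = {"$": "USD", "us$": "USD", "€": "EUR", "£": "GBP"}


def normalize_fields(product):
    """Return a copy with whitespace collapsed, a title-cased title, and an upper-case currency code."""
    cleaned = dict(product)
    # Note: str.title() is naive about apostrophes ("men's" becomes "Men'S").
    cleaned["title"] = " ".join(product["title"].split()).title()
    raw_currency = product.get("currency", "").strip()
    cleaned["currency"] = CURRENCY_ALIASES.get(raw_currency.lower(), raw_currency.upper())
    return cleaned


def is_probable_duplicate(a, b, threshold=0.9):
    """Match on SKU or GTIN first, then fall back to title similarity."""
    for key in ("sku", "gtin"):
        if a.get(key) and a.get(key) == b.get(key):
            return True
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold


raw = {"sku": "MUG-1", "title": "  blue   ceramic MUG ", "currency": "$"}
clean = normalize_fields(raw)
print(clean["title"], clean["currency"])
```

Exact identifier matches are checked before the fuzzy title comparison because they are cheaper and less likely to produce false positives.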

Custom Silver mappings

You can define custom field mapping rules in Pipeline Settings. For example:
  • Map "product_name" → title
  • Map "item_code" → sku
  • Map "cat" → categoryPath with prefix "Apparel > "
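
As a rough illustration of how such rules could behave, the sketch below applies the three example mappings to a supplier row. The rule format (`source`, `target`, `prefix`) and the `apply_mappings` helper are hypothetical; Pipeline Settings may represent mappings differently.

```python
# Rules mirroring the three examples above; the rule format itself is hypothetical.
MAPPING_RULES = [
    {"source": "product_name", "target": "title"},
    {"source": "item_code", "target": "sku"},
    {"source": "cat", "target": "categoryPath", "prefix": "Apparel > "},
]


def apply_mappings(row, rules=MAPPING_RULES):
    """Rename mapped source columns; pass every other column through unchanged."""
    mapped_sources = {rule["source"] for rule in rules}
    result = {key: value for key, value in row.items() if key not in mapped_sources}
    for rule in rules:
        if rule["source"] in row:
            result[rule["target"]] = rule.get("prefix", "") + row[rule["source"]]
    return result


supplier_row = {"product_name": "Linen Shirt", "item_code": "LS-01", "cat": "Shirts", "color": "white"}
mapped = apply_mappings(supplier_row)
print(mapped)
```

Unmapped columns such as `color` are kept as-is, which matches the way non-standard columns are stored as flexible attributes at Bronze.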

Stage 3 — Gold (Score & Analyze)

Gold is the optional quality scoring stage. It analyzes each product against a configurable rubric and produces an optimization score: a single number from 0 to 100 that reflects content completeness and quality.

Triggering Gold

Gold runs only when you trigger it manually or through the API, unless auto-trigger for Gold is explicitly enabled in Pipeline Settings.
  • Per product — “Analyze” button on product detail
  • Batch — select products → “Analyze”
  • API: POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/gold
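
The batch endpoints for Silver and Gold share the same path shape, so a single helper can build either request. The sketch below only constructs the request and does not send it. The path comes from this page, but the JSON body (`productIds`), the bearer-token header, and the placeholder host are assumptions; check the API reference for the real contract.

```python
import json
import urllib.request


def build_batch_request(base_url, workspace_id, catalog_id, stage, product_ids, token):
    """Build (but do not send) a POST request for a Silver or Gold batch run."""
    if stage not in ("silver", "gold"):
        raise ValueError(f"unsupported stage: {stage!r}")
    url = f"{base_url}/api/workspace/{workspace_id}/catalogs/{catalog_id}/batch/{stage}"
    # Hypothetical body shape: the real endpoint may expect different fields.
    body = json.dumps({"productIds": product_ids}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # assumed auth scheme
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_batch_request("https://example.invalid", "ws_1", "cat_1", "gold", ["MUG-1"], "TOKEN")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted because the host is a placeholder.
```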

Optimization score

The score is a weighted sum across seven scoring stages (these are separate from the Bronze, Silver, and Gold pipeline stages):
| Stage | Weight | What’s evaluated |
| --- | --- | --- |
| Identity | 20% | SKU, GTIN, brand presence |
| Taxonomy | 15% | Category depth, subcategory |
| Content | 25% | Title length, description quality, bullet points |
| Media | 20% | Image count, video presence, minimum resolution |
| Pricing | 10% | Price present, currency, original price for discounts |
| Attributes | 5% | Key attributes for the product’s category |
| SEO | 5% | Slug, meta description, keyword density |
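
One way to read the table is as a weighted average. The sketch below assumes each stage yields a sub-score between 0 and 1, which is an assumption: this page lists the weights but not how each stage is graded internally.

```python
# Default weights from the table above, in percent.
WEIGHTS = {
    "identity": 20, "taxonomy": 15, "content": 25, "media": 20,
    "pricing": 10, "attributes": 5, "seo": 5,
}


def optimization_score(stage_scores, weights=WEIGHTS):
    """Combine per-stage scores in [0, 1] into a single 0-100 score."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * stage_scores.get(name, 0.0) for name in weights)
    return round(100 * weighted / total_weight)


example = {
    "identity": 1.0, "taxonomy": 1.0, "content": 0.4,
    "media": 0.5, "pricing": 1.0, "attributes": 0.0, "seo": 0.0,
}
print(optimization_score(example))  # 20 + 15 + 10 + 10 + 10 = 65
```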

Score thresholds

| Range | Label | Meaning |
| --- | --- | --- |
| 85–100 | Excellent | Ready for all channels |
| 65–84 | Good | Minor improvements needed |
| 40–64 | Warning | Important fields missing |
| 0–39 | Poor | Critical gaps, not feed-ready |
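
If you consume scores programmatically, the thresholds translate directly into a lookup. The function name below is illustrative only.

```python
def score_label(score):
    """Map a 0-100 optimization score to its threshold label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 65:
        return "Good"
    if score >= 40:
        return "Warning"
    return "Poor"


print(score_label(58))
```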

Gap detection

Gold produces a gaps list — fields that, if filled, would most increase the score. Example:
{
  "score": 58,
  "gaps": ["description", "gtin", "secondaryImages"],
  "missingFields": ["description", "gtin"]
}
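
In the example, `secondaryImages` appears in `gaps` but not in `missingFields`. One plausible reading, and it is only an assumption, is that `missingFields` lists empty fields while the remaining gaps are improvements to content that already exists. Under that reading, a client could split the list like this:

```python
import json

payload = json.loads("""
{
  "score": 58,
  "gaps": ["description", "gtin", "secondaryImages"],
  "missingFields": ["description", "gtin"]
}
""")


def split_gaps(result):
    """Separate gaps that are empty fields from gaps that are improvements (assumed semantics)."""
    missing = set(result["missingFields"])
    must_fill = [gap for gap in result["gaps"] if gap in missing]
    improvements = [gap for gap in result["gaps"] if gap not in missing]
    return must_fill, improvements


must_fill, improvements = split_gaps(payload)
print(must_fill, improvements)
```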

Custom Gold weights

Scoring weights are configurable at the team or organization level via Pipeline Settings. For example, a media-heavy catalog might increase the Media weight to 35%.
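
Raising one weight means the others have to give way if the total is to stay at 100%. This page does not say how the remaining weights are adjusted, so the sketch below makes an assumption: it scales them down proportionally. The `override_weight` helper is hypothetical.

```python
DEFAULT_WEIGHTS = {
    "identity": 20, "taxonomy": 15, "content": 25, "media": 20,
    "pricing": 10, "attributes": 5, "seo": 5,
}


def override_weight(weights, stage, new_weight):
    """Set one stage's weight and rescale the rest proportionally so the total stays 100."""
    if stage not in weights or not 0 <= new_weight <= 100:
        raise ValueError("unknown stage or weight outside 0-100")
    remaining = sum(value for name, value in weights.items() if name != stage)
    if remaining == 0:
        raise ValueError("no other weights to rescale")
    scale = (100 - new_weight) / remaining
    return {
        name: (new_weight if name == stage else value * scale)
        for name, value in weights.items()
    }


media_heavy = override_weight(DEFAULT_WEIGHTS, "media", 35)
print(media_heavy)
```

With Media at 35%, the other six stages share the remaining 65% in their original proportions, so Identity drops from 20% to 16.25%.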

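Full pipeline flow

Import (CSV upload, URL import, API, or MCP feed) → Bronze (raw ingest) → Silver (normalize; manual or auto-triggered) → Gold (score and analyze; optional)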

Pipeline settings

Pipeline behavior is configurable at workspace level:
  • Custom Silver mappings — map any source field to the Alana schema
  • Custom Gold weights — adjust scoring weights per workspace
  • Auto-trigger Silver — run Silver automatically after Bronze
  • Auto-trigger Gold — run Gold automatically after Silver (not recommended for large catalogs)
  • Preview mode — simulate pipeline changes without writing to products
See Pipeline Settings for the full configuration reference.
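
The two auto-trigger options chain together: Gold's auto-trigger fires after Silver, so it has no effect on ingest unless Silver also runs. The sketch below models that interaction. The `PipelineSettings` class and its field names are hypothetical and do not reflect the actual settings schema.

```python
from dataclasses import dataclass


@dataclass
class PipelineSettings:
    # Hypothetical field names; both triggers are off by default, as described above.
    auto_trigger_silver: bool = False
    auto_trigger_gold: bool = False


def stages_after_ingest(settings):
    """Return the stages that run, in order, when a product is imported."""
    stages = ["bronze"]
    if settings.auto_trigger_silver:
        stages.append("silver")
        # Gold is chained to Silver, so it only follows when Silver has run.
        if settings.auto_trigger_gold:
            stages.append("gold")
    return stages


print(stages_after_ingest(PipelineSettings(auto_trigger_silver=True)))
```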

Best practices

  • Run Silver before you review imported data. Raw Bronze data can have inconsistent casing, missing brand links, and broken image URLs; running Silver first gives you clean data to review.
  • Treat Gold as optional. You don’t need a Gold score to publish or distribute. Use Gold to identify which products need the most work before a major campaign or feed submission.
  • Prefer batch actions. Instead of running Normalize or Analyze product by product, use the batch actions panel to process thousands of products in a single operation.
  • Set up Silver field mappings before importing. If your supplier files use non-standard column names, defining the mappings first ensures your first import lands in the right shape.
Last modified on March 18, 2026