Data Pipeline

Overview

The Data Pipeline is a three-stage processing system that takes raw product data and progressively transforms it into high-quality, feed-ready content. Each stage adds a layer of structure, validation, and enrichment — inspired by the medallion architecture used in data engineering.

Stage 1 — Bronze (Raw Ingest)

When products enter the system — whether via CSV upload, URL import, API, or MCP feed — they land in the Bronze layer.

Characteristics

Permissive — all data is accepted as-is, with minimal validation
Idempotent — re-importing the same product does not create duplicates; existing records are updated
Schema-free fields — non-standard columns are stored as flexible attributes
No transformation — titles, descriptions, and prices are preserved exactly as provided

What happens at Bronze

File is parsed (CSV, XLSX, JSON feed, or scraped HTML)
Each row is mapped to the internal product schema
Required fields (title, sku) are checked for presence
All records are written to the repository with pipeline_stage: "bronze"
Import summary is returned: total, created, updated, errors
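
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the in-memory dict standing in for the repository, the function name `ingest_bronze`, and the choice to count rows with a missing required field as errors and skip them are all assumptions. Only CSV input is handled here.

```python
import csv
import io

REQUIRED_FIELDS = ("title", "sku")  # required fields named in the text


def ingest_bronze(csv_text, repository):
    """Upsert CSV rows into `repository` (a dict keyed by SKU) and return a summary."""
    summary = {"total": 0, "created": 0, "updated": 0, "errors": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        summary["total"] += 1
        # Presence check only: values are otherwise stored exactly as provided.
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            summary["errors"] += 1  # assumption: such rows are reported and skipped
            continue
        record = dict(row, pipeline_stage="bronze")
        # Keying on SKU makes a re-import update in place instead of duplicating.
        if row["sku"] in repository:
            summary["updated"] += 1
        else:
            summary["created"] += 1
        repository[row["sku"]] = record
    return summary
```

Calling `ingest_bronze` twice with the same file reports the rows as created on the first run and as updated on the second, which is the idempotent behavior described under Characteristics.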

Supported entity types

Entity	Description
`product`	Base product record with title, SKU, price, brand, category
`variant`	Size/color/configuration variants linked to a product
`media`	Images and video URLs attached to products

Stage 2 — Silver (Normalize)

Silver is the first active transformation stage. It normalizes, deduplicates, and validates product data to ensure consistency across the catalog.

Triggering Silver

Silver does not run automatically by default. You can trigger it in three ways:

Per product — via the product detail page → “Normalize” button
Batch — via the batch actions panel → select products → “Normalize”
API — POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/silver

Auto-trigger for Silver can be enabled in Pipeline Settings. When enabled, Silver runs automatically after every Bronze ingest.
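
As a rough sketch, the API trigger could be called from Python as shown here. Only the endpoint path comes from this page. The host (`example.invalid`), the `productIds` request body, and the bearer-token header are placeholders, since the payload and authentication scheme are not specified here.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not from the docs


def build_silver_batch_request(workspace_id, catalog_id, product_ids, token):
    """Build (but do not send) a POST request for the batch Silver endpoint."""
    url = f"{BASE_URL}/api/workspace/{workspace_id}/catalogs/{catalog_id}/batch/silver"
    # Assumed body shape: the payload format is not documented on this page.
    body = json.dumps({"productIds": product_ids}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes the URL and body easy to inspect first.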

What Silver does

Operation	Detail
Field normalization	Title case, trim whitespace, standardize currency codes
Category mapping	Maps raw category strings to your workspace taxonomy
Duplicate detection	Finds products with matching SKU, GTIN, or title similarity
URL validation	Checks that image and media URLs are reachable (HTTP 200)
Brand matching	Links raw brand names to workspace brand records
Attribute schema	Promotes common attributes to structured fields
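
To make two of these operations concrete, here is a simplified sketch of field normalization and duplicate detection. The currency alias table, the 0.9 similarity threshold, and both function names are illustrative assumptions. The title-casing is deliberately naive and would, for example, lower-case acronyms after their first letter.

```python
from difflib import SequenceMatcher

# Illustrative alias table; the accepted currency inputs are not documented here.
CURRENCY_ALIASES = {"$": "USD", "us$": "USD", "usd": "USD", "€": "EUR", "eur": "EUR"}


def normalize_product(product):
    """Return a copy with a trimmed, title-cased title and a standard currency code."""
    cleaned = dict(product)
    # split() with no argument trims the ends and collapses repeated whitespace.
    cleaned["title"] = " ".join(word.capitalize() for word in product["title"].split())
    raw_currency = product.get("currency", "").strip()
    cleaned["currency"] = CURRENCY_ALIASES.get(raw_currency.lower(), raw_currency.upper())
    return cleaned


def is_probable_duplicate(a, b, threshold=0.9):
    """Match on SKU or GTIN first, then fall back to title similarity."""
    for key in ("sku", "gtin"):
        if a.get(key) and a.get(key) == b.get(key):
            return True
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold  # threshold is an assumed value
```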

Custom Silver mappings

You can define custom field mapping rules in Pipeline Settings. For example:

Map "product_name" → title
Map "item_code" → sku
Map "cat" → categoryPath with prefix "Apparel > "
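
The three example rules can be modeled as a small lookup table. This is only a sketch of the idea: the rule format and the `apply_mappings` helper are hypothetical and do not reflect how Pipeline Settings stores mappings.

```python
# Each rule: source column -> (target field, prefix), mirroring the examples above.
MAPPING_RULES = {
    "product_name": ("title", ""),
    "item_code": ("sku", ""),
    "cat": ("categoryPath", "Apparel > "),
}


def apply_mappings(row, rules=MAPPING_RULES):
    """Rename mapped columns, apply prefixes, and pass unmapped columns through."""
    mapped = {}
    for column, value in row.items():
        target, prefix = rules.get(column, (column, ""))
        # Only prefixed fields are rewritten; everything else keeps its value.
        mapped[target] = prefix + value if prefix else value
    return mapped
```

For instance, a source row with `cat` set to `Shirts` ends up with `categoryPath` set to `Apparel > Shirts`, while a column with no rule, such as `color`, is carried over unchanged.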

Stage 3 — Gold (Score & Analyze)

Gold is the optional quality scoring stage. It analyzes each product against a configurable rubric and produces an optimization score: a single number from 0 to 100 that reflects content completeness and quality.

Triggering Gold

Gold is manual (or API-triggered) by default. It runs automatically only when auto-trigger for Gold is explicitly enabled in Pipeline Settings.

Per product — “Analyze” button on product detail
Batch — select products → “Analyze”
API — POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/gold

Optimization score

The score is computed across seven weighted dimensions:

Dimension	Weight	What’s evaluated
Identity	20%	SKU, GTIN, brand presence
Taxonomy	15%	Category depth, subcategory
Content	25%	Title length, description quality, bullet points
Media	20%	Image count, video presence, minimum resolution
Pricing	10%	Price present, currency, original price for discounts
Attributes	5%	Key attributes for the product’s category
SEO	5%	Slug, meta description, keyword density
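
One plausible way to combine the seven weighted areas in the table above is a weighted average. The sketch below assumes each area yields a sub-score between 0.0 and 1.0. How those sub-scores are actually derived is not described on this page, so only the weights are taken from the table.

```python
# Default weights from the table above (percent).
WEIGHTS = {
    "identity": 20, "taxonomy": 15, "content": 25, "media": 20,
    "pricing": 10, "attributes": 5, "seo": 5,
}


def optimization_score(sub_scores, weights=WEIGHTS):
    """Combine per-area sub-scores (each 0.0 to 1.0) into a 0-100 score."""
    total_weight = sum(weights.values())
    # Areas with no sub-score contribute nothing.
    weighted = sum(weights[name] * sub_scores.get(name, 0.0) for name in weights)
    return round(100 * weighted / total_weight)
```

With sub-scores of 0.5 for identity, 1.0 for taxonomy, 0.4 for content, 0.5 for media, and 1.0 for pricing, the weighted sum is 10 + 15 + 10 + 10 + 10 = 55.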

Score thresholds

Range	Label	Meaning
85–100	Excellent	Ready for all channels
65–84	Good	Minor improvements needed
40–64	Warning	Important fields missing
0–39	Poor	Critical gaps, not feed-ready
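
The thresholds translate directly into a lookup. The function name `score_label` is illustrative, and the range check is an added safeguard rather than documented behavior.

```python
def score_label(score):
    """Map a 0-100 optimization score to its threshold label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 65:
        return "Good"
    if score >= 40:
        return "Warning"
    return "Poor"
```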

Gap detection

Gold produces a gaps list — fields that, if filled, would most increase the score. Example:

{
  "score": 58,
  "gaps": ["description", "gtin", "secondaryImages"],
  "missingFields": ["description", "gtin"]
}
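
A gaps list of this kind could be produced by ranking empty fields by how many points each would add. The per-field point values below are invented for illustration, and the sketch only considers empty fields. In the example above `secondaryImages` appears in `gaps` but not in `missingFields`, which suggests the real rubric also flags fields that are present but insufficient.

```python
# Hypothetical per-field point values; the real rubric is not documented here.
FIELD_POINTS = {"description": 12, "gtin": 8, "secondaryImages": 6, "metaDescription": 2}


def detect_gaps(product, field_points=FIELD_POINTS, limit=3):
    """List empty fields, highest potential score gain first."""
    empty = [name for name in field_points if not product.get(name)]
    return sorted(empty, key=lambda name: field_points[name], reverse=True)[:limit]
```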

Custom Gold weights

Scoring weights are configurable per workspace via Pipeline Settings. For example, a media-heavy catalog might increase the Media weight to 35%.
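
Raising one weight means the others have to shrink if the total is to stay at 100%. This page does not say how that is handled, so the following is just one reasonable approach: scale the remaining weights proportionally. The helper name `rebalance_weights` is hypothetical.

```python
def rebalance_weights(weights, name, new_value):
    """Set one weight and scale the others proportionally so the total stays 100."""
    if not 0 <= new_value <= 100:
        raise ValueError("weight must be between 0 and 100")
    others_total = sum(value for key, value in weights.items() if key != name)
    scale = (100 - new_value) / others_total
    return {
        key: (new_value if key == name else value * scale)
        for key, value in weights.items()
    }
```

Starting from the default weights and setting Media to 35, the other six weights (80 in total) are scaled by 65/80, so Content drops from 25 to 20.3125 and Identity from 20 to 16.25.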

Full pipeline flow

Pipeline settings

Pipeline behavior is configurable at workspace level:

Custom Silver mappings — map any source field to the Alana schema
Custom Gold weights — adjust scoring weights per workspace
Auto-trigger Silver — run Silver automatically after Bronze
Auto-trigger Gold — run Gold automatically after Silver (not recommended for large catalogs)
Preview mode — simulate pipeline changes without writing to products

See Pipeline Settings for the full configuration reference.
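
The interaction between the two auto-trigger settings can be summarized with a small sketch. It assumes that Gold only chains automatically when Silver has itself run automatically, which follows from Gold being triggered "after Silver" but is not stated outright. The function and parameter names are illustrative.

```python
def stages_after_ingest(auto_silver=False, auto_gold=False):
    """Return the stages that run without manual action after an import."""
    stages = ["bronze"]  # every import lands in Bronze
    if auto_silver:
        stages.append("silver")
        # Gold auto-trigger fires after Silver, so it only chains from here.
        if auto_gold:
            stages.append("gold")
    return stages
```

With both settings off, which is the default for Silver, an import stops at Bronze until someone triggers Normalize.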

Best practices

Run Silver before reviewing products

Raw Bronze data can have inconsistent casing, missing brand links, and broken image URLs. Running Silver first gives you clean data to review.

Gold is optional — use it for prioritization

You don’t need a Gold score to publish or distribute. Use Gold to identify which products need the most work before a major campaign or feed submission.

Use batch actions for large catalogs

Instead of running Normalize or Analyze product-by-product, use the batch actions panel to process thousands of products in a single operation.

Configure custom mappings before the first import

If your supplier files use non-standard column names, set up Silver field mappings before importing. This ensures your first import lands in the right shape.
