Overview
The Data Pipeline is a three-stage processing system that takes raw product data and progressively transforms it into high-quality, feed-ready content. Each stage adds a layer of structure, validation, and enrichment, inspired by the medallion architecture used in data engineering.
Stage 1 — Bronze (Raw Ingest)
When products enter the system, whether via CSV upload, URL import, API, or MCP feed, they land in the Bronze layer.
Characteristics
- Permissive — all data is accepted as-is, with minimal validation
- Idempotent — re-importing the same product does not create duplicates; existing records are updated
- Schema-free fields — non-standard columns are stored as flexible attributes
- No transformation — titles, descriptions, and prices are preserved exactly as provided
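
The "schema-free" and "no transformation" characteristics can be illustrated with a minimal sketch. The set of standard fields and the `attributes` key are assumptions made for illustration; the real internal schema is not documented here.

```python
# Hypothetical set of standard fields; the real internal schema may differ.
STANDARD_FIELDS = {"title", "sku", "price", "brand", "category"}

def to_bronze_record(row: dict[str, str]) -> dict:
    """Split a raw row into standard fields and flexible attributes, unchanged."""
    record: dict = {"pipeline_stage": "bronze", "attributes": {}}
    for column, value in row.items():
        if column in STANDARD_FIELDS:
            record[column] = value                # preserved exactly as provided
        else:
            record["attributes"][column] = value  # non-standard column kept as-is
    return record

row = {"title": "  blue SHIRT ", "sku": "A-1", "price": "19.90", "fabric": "cotton"}
record = to_bronze_record(row)
```

Here the untrimmed title survives intact and the unknown `fabric` column ends up under `attributes` instead of being rejected.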
What happens at Bronze
- File is parsed (CSV, XLSX, JSON feed, or scraped HTML)
- Each row is mapped to the internal product schema
- Required fields (`title`, `sku`) are checked for presence
- All records are written to the repository with `pipeline_stage: "bronze"`
- Import summary is returned: total, created, updated, errors
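
The steps above can be sketched roughly as follows. This is an illustration, not the actual implementation: the in-memory `repository` dict, the use of SKU as the upsert key, and the choice to count rows with a missing required field as errors and skip them are all assumptions.

```python
import csv
import io

REQUIRED_FIELDS = ("title", "sku")

def ingest_csv(text: str, repository: dict[str, dict]) -> dict[str, int]:
    """Parse CSV text, upsert rows by SKU, and return an import summary."""
    summary = {"total": 0, "created": 0, "updated": 0, "errors": 0}
    for row in csv.DictReader(io.StringIO(text)):
        summary["total"] += 1
        # Presence check only: values are not cleaned or transformed here.
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            summary["errors"] += 1  # assumption: such rows are skipped
            continue
        sku = row["sku"]
        summary["updated" if sku in repository else "created"] += 1
        repository[sku] = {**row, "pipeline_stage": "bronze"}
    return summary

repo: dict[str, dict] = {}
feed = "title,sku,price\nBlue Shirt,A-1,19.90\n,A-2,5.00\n"
first = ingest_csv(feed, repo)   # one row created, one row missing its title
second = ingest_csv(feed, repo)  # same file again: the row is updated, not duplicated
```

Importing the same file twice leaves a single record in `repo`, which is the idempotent behavior described under Characteristics.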
Supported entity types
| Entity | Description |
|---|---|
| `product` | Base product record with title, SKU, price, brand, category |
| `variant` | Size/color/configuration variants linked to a product |
| `media` | Images and video URLs attached to products |
Stage 2 — Silver (Normalize)
Silver is the first active transformation stage. It normalizes, deduplicates, and validates product data to ensure consistency across the catalog.
Triggering Silver
Silver is not automatic by default. You trigger it:
- Per product — via the product detail page → “Normalize” button
- Batch — via the batch actions panel → select products → “Normalize”
- API — `POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/silver`
Auto-trigger for Silver can be enabled in Pipeline Settings. When enabled, Silver runs automatically after every Bronze ingest.
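
As a rough sketch, the API trigger could be called like this. Only the endpoint path comes from this page. The base URL, the `productIds` payload shape, and the bearer-token header are assumptions, so check the API reference before relying on them. The request is built but not sent, which keeps the example runnable offline.

```python
import json
import urllib.request
from urllib.parse import quote

def build_silver_batch_request(base_url: str, workspace_id: str, catalog_id: str,
                               product_ids: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Silver batch endpoint."""
    path = (f"/api/workspace/{quote(workspace_id, safe='')}"
            f"/catalogs/{quote(catalog_id, safe='')}/batch/silver")
    body = json.dumps({"productIds": product_ids}).encode("utf-8")  # assumed payload
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
    )

request = build_silver_batch_request(
    "https://example.invalid", "ws_1", "cat_1", ["A-1", "A-2"], "TOKEN")
# urllib.request.urlopen(request) would send it.
```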
What Silver does
| Operation | Detail |
|---|---|
| Field normalization | Title case, trim whitespace, standardize currency codes |
| Category mapping | Maps raw category strings to your workspace taxonomy |
| Duplicate detection | Finds products with matching SKU, GTIN, or title similarity |
| URL validation | Checks that image and media URLs are reachable (HTTP 200) |
| Brand matching | Links raw brand names to workspace brand records |
| Attribute schema | Promotes common attributes to structured fields |
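
Two of the operations in the table, field normalization and duplicate detection, can be sketched as follows. The currency alias table and the 0.9 similarity threshold are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure Silver actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical alias table; the real currency rules are not documented here.
CURRENCY_ALIASES = {"$": "USD", "US$": "USD", "€": "EUR", "£": "GBP"}

def normalize(product: dict) -> dict:
    """Return a copy with a trimmed, title-cased title and an upper-case currency code."""
    cleaned = dict(product)
    # str.title() is a simple stand-in; it mishandles apostrophes ("men's" -> "Men'S").
    cleaned["title"] = " ".join(product["title"].split()).title()
    currency = product.get("currency", "").strip()
    cleaned["currency"] = CURRENCY_ALIASES.get(currency, currency.upper())
    return cleaned

def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Match on identical SKU or GTIN, otherwise on title similarity."""
    for key in ("sku", "gtin"):
        if a.get(key) and a.get(key) == b.get(key):
            return True
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold

clean = normalize({"title": "  blue   cotton SHIRT ", "currency": "usd"})
```

A threshold this high flags near-identical titles such as singular and plural forms while leaving unrelated products alone. A real catalog would need tuning against known duplicates.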
Custom Silver mappings
You can define custom field mapping rules in Pipeline Settings. For example:
- Map `"product_name"` → `title`
- Map `"item_code"` → `sku`
- Map `"cat"` → `categoryPath` with prefix `"Apparel > "`
Stage 3 — Gold (Score & Analyze)
Gold is the optional quality scoring stage. It analyzes each product against a configurable rubric and produces an optimization score: a single number from 0 to 100 that reflects content completeness and quality.
Triggering Gold
Gold is manual (or API-triggered) by default. It runs automatically only when Auto-trigger Gold is enabled in Pipeline Settings.
- Per product — “Analyze” button on product detail
- Batch — select products → “Analyze”
- API — `POST /api/workspace/{workspaceId}/catalogs/{catalogId}/batch/gold`
Optimization score
The score is computed across 7 scoring stages, each with its own weight. The default weights sum to 100%:
| Stage | Weight | What’s evaluated |
|---|---|---|
| Identity | 20% | SKU, GTIN, brand presence |
| Taxonomy | 15% | Category depth, subcategory |
| Content | 25% | Title length, description quality, bullet points |
| Media | 20% | Image count, video presence, minimum resolution |
| Pricing | 10% | Price present, currency, original price for discounts |
| Attributes | 5% | Key attributes for the product’s category |
| SEO | 5% | Slug, meta description, keyword density |
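
One plausible reading of the table is a weighted average of per-stage sub-scores. The sketch below assumes each stage yields its own 0 to 100 sub-score and that custom weights are renormalized by their total. Neither detail is confirmed by this page.

```python
DEFAULT_WEIGHTS = {
    "identity": 20, "taxonomy": 15, "content": 25, "media": 20,
    "pricing": 10, "attributes": 5, "seo": 5,
}

def optimization_score(stage_scores: dict[str, float],
                       weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-stage scores (each 0-100), rounded to one decimal."""
    total_weight = sum(weights.values())
    # A stage with no sub-score contributes zero.
    weighted = sum(weights[name] * stage_scores.get(name, 0.0) for name in weights)
    return round(weighted / total_weight, 1)

scores = {"identity": 100, "taxonomy": 80, "content": 60, "media": 50,
          "pricing": 100, "attributes": 0, "seo": 40}
default_score = optimization_score(scores)  # 69.0 with the default weights
```

Passing a different `weights` mapping changes how much each stage contributes. Dividing by the total weight keeps the result on the 0 to 100 scale even when the weights no longer sum to 100.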
Score thresholds
| Range | Label | Meaning |
|---|---|---|
| 85–100 | Excellent | Ready for all channels |
| 65–84 | Good | Minor improvements needed |
| 40–64 | Warning | Important fields missing |
| 0–39 | Poor | Critical gaps, not feed-ready |
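
The thresholds translate directly into a lookup. This sketch assumes fractional scores fall into the band below the next integer boundary (84.5 counts as Good), which the table does not specify.

```python
def score_label(score: float) -> str:
    """Map a 0-100 optimization score to its threshold label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Excellent"
    if score >= 65:
        return "Good"
    if score >= 40:
        return "Warning"
    return "Poor"
```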
Gap detection
Gold produces a gaps list: the fields that, if filled, would most increase the score.
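
To make the idea concrete, here is a hypothetical sketch of how such a list could be derived. The field names, the per-field gains, and the output shape are invented for illustration and do not reflect the actual Gold response format.

```python
# Hypothetical points each field would add if filled; real values depend on the rubric.
FIELD_GAINS = {"gtin": 8.0, "description": 10.0, "images": 12.0, "meta_description": 2.5}

def find_gaps(product: dict, field_gains: dict[str, float], limit: int = 3) -> list[dict]:
    """List empty or missing fields, ordered by the score gain they would unlock."""
    missing = [
        {"field": name, "potential_gain": gain}
        for name, gain in field_gains.items()
        if not product.get(name)  # absent, empty string, or empty list
    ]
    missing.sort(key=lambda gap: gap["potential_gain"], reverse=True)
    return missing[:limit]

product = {"title": "Blue Shirt", "sku": "A-1",
           "description": "Soft cotton shirt", "images": []}
gaps = find_gaps(product, FIELD_GAINS)
```

With this input the list starts with `images`, followed by `gtin` and `meta_description`, because the description is already filled.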
Custom Gold weights
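Scoring weights are configurable at the team or organization level via Pipeline Settings. For example, a media-heavy catalog might increase the Media weight to 35%.
Full pipeline flow
Products enter at Bronze from any ingest source, move to Silver when normalization is triggered (manually, via the API, or by auto-trigger), and optionally move to Gold for scoring.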
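
The flow can be summarized in a short sketch of which stages run automatically after an import, given the two auto-trigger options. `PipelineSettings` and its field names are hypothetical. The sketch also assumes that auto-triggered Gold only fires once Silver has run.

```python
from dataclasses import dataclass

@dataclass
class PipelineSettings:
    auto_trigger_silver: bool = False  # Silver is not automatic by default
    auto_trigger_gold: bool = False    # Gold is manual unless enabled

def stages_after_ingest(settings: PipelineSettings) -> list[str]:
    """Return the stages that run automatically for one import."""
    stages = ["bronze"]                 # every import lands in Bronze
    if settings.auto_trigger_silver:
        stages.append("silver")
        if settings.auto_trigger_gold:  # Gold auto-runs only after Silver
            stages.append("gold")
    return stages

default_flow = stages_after_ingest(PipelineSettings())
full_flow = stages_after_ingest(PipelineSettings(True, True))
```

With default settings only Bronze runs, and Silver and Gold wait for a manual or API trigger.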
Pipeline settings
Pipeline behavior is configurable at the workspace level:
- Custom Silver mappings — map any source field to the Alana schema
- Custom Gold weights — adjust scoring weights per workspace
- Auto-trigger Silver — run Silver automatically after Bronze
- Auto-trigger Gold — run Gold automatically after Silver (not recommended for large catalogs)
- Preview mode — simulate pipeline changes without writing to products
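
Preview mode can be thought of as running a stage against copies and reporting what would change. The sketch below shows that idea with a hypothetical `run_stage` helper. It is not the product's actual preview mechanism.

```python
import copy

def run_stage(products: list[dict], transform, preview: bool = False) -> list[dict]:
    """Apply a transform; in preview mode, report changes without modifying products."""
    changes = []
    for product in products:
        updated = transform(copy.deepcopy(product))  # never touch the original here
        diff = {key: (product.get(key), value)
                for key, value in updated.items() if product.get(key) != value}
        if diff:
            changes.append({"sku": product.get("sku"), "changes": diff})
        if not preview:
            product.clear()
            product.update(updated)                  # write back only outside preview
    return changes

def trim_title(product: dict) -> dict:
    product["title"] = product["title"].strip()
    return product

catalog = [{"sku": "A-1", "title": "  Blue Shirt "}]
report = run_stage(catalog, trim_title, preview=True)
```

After the preview call, `report` describes the pending title change while `catalog` is untouched. Calling `run_stage` again without `preview=True` applies it.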
Best practices
Run Silver before reviewing products
Raw Bronze data can have inconsistent casing, missing brand links, and broken image URLs. Running Silver first gives you clean data to review.
Gold is optional — use it for prioritization
You don’t need a Gold score to publish or distribute. Use Gold to identify which products need the most work before a major campaign or feed submission.
Use batch actions for large catalogs
Instead of running Normalize or Analyze product-by-product, use the batch actions panel to process thousands of products in a single operation.
Configure custom mappings before the first import
If your supplier files use non-standard column names, set up Silver field mappings before importing. This ensures your first import lands in the right shape.