How to map UN/EDIFACT B/L fields to Python dicts

Maritime terminal operators, port authorities, and shipping line IT teams routinely run into friction when ingesting UN/EDIFACT Bill of Lading (B/L) messages. The raw interchange format is notoriously brittle. Segment ordering varies by carrier. Element and component delimiters shift across legacy systems, and a UNA header can redefine them per interchange. Mandatory compliance fields drift between directory releases and across related message types (BAPLIE, CUSCAR, or custom carrier variants). Mapping UN/EDIFACT B/L fields to Python dicts is therefore not merely a parsing exercise; it is a foundational requirement for deterministic port operations automation. When implemented correctly, the mapping layer becomes the single source of truth that bridges carrier EDI gateways, terminal operating systems (TOS), and customs clearance workflows.

Architecture Alignment & Taxonomy Grounding

Before writing parsing logic, align the mapping strategy with the broader Core Maritime Architecture & Taxonomy. In production port environments, B/L data does not exist in isolation: it intersects with vessel stowage plans, yard allocation matrices, and customs risk scoring engines. A robust Python mapping layer must enforce strict schema boundaries, normalize carrier-specific deviations into a canonical dictionary structure, and expose explicit failure states for downstream compliance gating. This discipline prevents silent data corruption. It also ensures that terminal gate automation, crane scheduling, and customs declarations operate on validated payloads with a predictable shape.
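Below is a minimal sketch of a canonical structure with explicit failure states. It assumes Python 3.8+ `TypedDict`. The field set, `REQUIRED_FIELDS`, and `failure_states` are hypothetical names chosen for illustration; they do not come from any EDIFACT standard or library. Validation returns machine-readable failure codes instead of raising, so a downstream gate can decide whether to reject, quarantine, or accept.

```python
from typing import List, Optional, TypedDict


class ContainerRecord(TypedDict):
    container_number: Optional[str]
    seal_numbers: List[str]
    packages: int
    weight_kg: float
    vgm_verified: bool


class CanonicalBL(TypedDict):
    bl_number: Optional[str]
    shipper: Optional[str]
    consignee: Optional[str]
    load_port: Optional[str]
    discharge_port: Optional[str]
    containers: List[ContainerRecord]


# Hypothetical: fields this sketch treats as mandatory for compliance gating.
REQUIRED_FIELDS = ("bl_number", "consignee", "load_port", "discharge_port")


def failure_states(record: CanonicalBL) -> List[str]:
    """Return explicit failure codes; an empty list means the record passed."""
    failures = [f"missing:{name}" for name in REQUIRED_FIELDS if not record.get(name)]
    if not record["containers"]:
        failures.append("missing:containers")
    for box in record["containers"]:
        # SOLAS VGM is per container, so each one is checked individually.
        if not box["vgm_verified"]:
            failures.append(f"vgm_unverified:{box['container_number']}")
    return failures


record: CanonicalBL = {
    "bl_number": "BL123", "shipper": None, "consignee": None,
    "load_port": "NLRTM", "discharge_port": "SGSIN",
    "containers": [{"container_number": "MSCU1234567", "seal_numbers": [],
                    "packages": 10, "weight_kg": 14200.0, "vgm_verified": False}],
}
print(failure_states(record))  # ['missing:consignee', 'vgm_unverified:MSCU1234567']
```

Returning a list rather than raising on the first problem lets a single pass report every defect. That is usually what an operator triaging a quarantined record needs.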

Deterministic Segment Parsing & Real-World Quirks

flowchart TD
  A["Raw EDIFACT interchange"] --> B["Apply UNA service string<br/>set delimiters"]
  B --> C["Split on segment terminator<br/>respecting release char"]
  C --> D{"Segment tag"}
  D -->|BGM| E["B/L number · message type"]
  D -->|NAD| F["Shipper · consignee · notify"]
  D -->|LOC| G["Load / discharge ports"]
  D -->|GID · FTX · MEA · DGS · SGP| H["Goods · weight · hazard · placement"]
  D -->|EQD · SEL · MEA| I["Container · seals · VGM"]
  E --> R[("Normalised dict")]
  F --> R
  G --> R
  H --> R
  I --> R

UN/EDIFACT relies on a rigid but fragile delimiter hierarchy. The optional UNA service string advice defines the delimiter characters, UNB opens the interchange, and UNH opens each message. Data segments such as BGM, NAD, LOC, CNI, GID, EQD, and MEA carry the B/L payload. Format drift occurs when carriers omit optional segments, reorder composites, or inject non-standard qualifiers. The parser must therefore be state-aware, tolerant of omitted optional elements, and explicitly defensive against malformed composites. The implementation below uses a generator that yields segments one at a time and applies structured logging for auditability. It maps fields to a normalized Python dict structure aligned with Bill of Lading Schema Mapping.
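The delimiter hierarchy can be illustrated in isolation with a small standalone sketch. The helper names are hypothetical. The sketch honours only a UNA header or the UN/EDIFACT defaults, and it ignores the decimal mark and repetition separator. It splits segments, then elements, then components. Release characters are removed only at the innermost level, so an escaped separator survives every split.

```python
from typing import List, Tuple


def read_una(interchange: str) -> Tuple[str, str, str, str, str]:
    """Return (component, element, release, terminator, body), defaulting when no UNA."""
    if interchange.startswith("UNA") and len(interchange) >= 9:
        c = interchange[3:9]  # six service characters follow the tag
        return c[0], c[1], c[3], c[5], interchange[9:]
    return ":", "+", "?", "'", interchange


def split_keep_escapes(raw: str, sep: str, release: str) -> List[str]:
    """Split on sep, skipping escaped separators; escape pairs are kept for later levels."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(raw):
        if raw[i] == release and i + 1 < len(raw):
            current.append(raw[i:i + 2])
            i += 2
        elif raw[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(raw[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(value: str, release: str) -> str:
    """Remove release characters, keeping the character each one protects."""
    out: List[str] = []
    i = 0
    while i < len(value):
        if value[i] == release and i + 1 < len(value):
            out.append(value[i + 1])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def parse(interchange: str) -> List[List[List[str]]]:
    """Segments -> elements -> components, unescaping only at the innermost level."""
    comp, elem, rel, term, body = read_una(interchange.strip())
    result: List[List[List[str]]] = []
    for seg in split_keep_escapes(body, term, rel):
        seg = seg.strip()
        if seg:
            result.append([[unescape(c, rel) for c in split_keep_escapes(e, comp, rel)]
                           for e in split_keep_escapes(seg, elem, rel)])
    return result


sample = "UNA:+.? 'NAD+CN+++GLOBAL IMPORTS?+CO'MEA+AAE+VGM+KGM:14200'"
for segment in parse(sample):
    print(segment)
# [['NAD'], ['CN'], [''], [''], ['GLOBAL IMPORTS+CO']]
# [['MEA'], ['AAE'], ['VGM'], ['KGM', '14200']]
```

The `?+` in the consignee name is the case that breaks naive `str.split`. A split that unescapes at the element level would reintroduce a bare `+` or `:` and then mis-split it one level down.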

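import logging
from dataclasses import dataclass, replace
from typing import Any, Dict, Generator, List, Optional, Tuple

# Structured logging configuration for audit trails.
# Note: this format string does not render fields passed via `extra=`, and a
# message containing a double quote will break the JSON shape.
logging.basicConfig(
    level=logging.INFO,
    format='{"ts":"%(asctime)s","lvl":"%(levelname)s","mod":"%(module)s","msg":"%(message)s"}'
)
logger = logging.getLogger(__name__)

# MEA weight units handled here (UN/ECE Recommendation 20 codes) and their factor to kg.
WEIGHT_UNITS_TO_KG = {"KGM": 1.0, "TNE": 1000.0}

@dataclass
class EDIConfig:
    segment_terminator: str = "'"
    element_separator: str = "+"
    component_separator: str = ":"
    release_character: str = "?"
    max_segments: int = 5000

class EDIParseError(Exception):
    """Raised when a mandatory UN/EDIFACT segment or composite fails validation."""

class BLFieldMapper:
    """Maps one B/L interchange to a dict. Not thread-safe: a UNA header mutates self.cfg."""

    def __init__(self, config: Optional[EDIConfig] = None):
        # Keep the caller's config untouched; every interchange starts from a fresh copy
        # because a UNA header only applies to the interchange it precedes.
        self._base_cfg = config or EDIConfig()
        self.cfg = replace(self._base_cfg)

    def _apply_una(self, header: str) -> None:
        """Apply a UNA service string advice: 'UNA' followed by exactly six characters."""
        chars = header[3:9]
        self.cfg.component_separator = chars[0]
        self.cfg.element_separator = chars[1]
        # chars[2] is the decimal mark; chars[4] is reserved (repetition separator in syntax v4).
        self.cfg.release_character = chars[3]
        self.cfg.segment_terminator = chars[5]
        logger.info("UNA override applied", extra={"delimiters": vars(self.cfg)})

    def _split(self, raw: str, sep: str) -> List[str]:
        """Split on `sep`, skipping separators escaped by the release character.
        Escape pairs are kept intact so a later, finer-grained split still sees them."""
        rel = self.cfg.release_character
        parts: List[str] = []
        current: List[str] = []
        i = 0
        while i < len(raw):
            if raw[i] == rel and i + 1 < len(raw):
                current.append(raw[i:i + 2])
                i += 2
            elif raw[i] == sep:
                parts.append("".join(current))
                current = []
                i += 1
            else:
                current.append(raw[i])
                i += 1
        parts.append("".join(current))
        return parts

    def _unescape(self, value: str) -> str:
        """Drop release characters, keeping the character each one protects."""
        rel = self.cfg.release_character
        out: List[str] = []
        i = 0
        while i < len(value):
            if value[i] == rel and i + 1 < len(value):
                out.append(value[i + 1])
                i += 2
            else:
                out.append(value[i])
                i += 1
        return "".join(out)

    def _split_elements(self, raw: str) -> List[str]:
        """Split a segment payload into data elements (escapes preserved)."""
        return self._split(raw, self.cfg.element_separator)

    def _split_composite(self, raw: str) -> List[str]:
        """Split a composite element into unescaped components."""
        if not raw:
            return []
        return [self._unescape(c) for c in self._split(raw, self.cfg.component_separator)]

    def _component(self, elements: List[str], idx: int, comp: int = 0) -> Optional[str]:
        """Return component `comp` of element `idx`, or None when absent or empty."""
        if idx >= len(elements):
            return None
        parts = self._split_composite(elements[idx])
        return parts[comp] if comp < len(parts) and parts[comp] else None

    def stream_segments(self, raw_edi: str) -> Generator[str, None, None]:
        cleaned = raw_edi.replace("\r\n", "\n").replace("\r", "\n").strip()
        # UNA is fixed-length (9 chars) and may redefine the segment terminator itself,
        # so it is consumed before splitting rather than treated as an ordinary segment.
        if cleaned.startswith("UNA") and len(cleaned) >= 9:
            self._apply_una(cleaned[:9])
            cleaned = cleaned[9:]
        # A released terminator (e.g. "?'") does not end a segment.
        for seg in self._split(cleaned, self.cfg.segment_terminator):
            seg = seg.strip()
            if seg:
                yield seg

    def _parse_weight(self, elements: List[str], segment: str) -> Tuple[Optional[float], bool]:
        """Return (weight_kg, is_vgm) from an MEA segment, or (None, False) for non-mass data."""
        qualifier, dimension = self._component(elements, 0), self._component(elements, 1)
        # C174 (3rd element) is "unit:value", e.g. KGM:14200.
        unit, value = self._component(elements, 2, 0), self._component(elements, 2, 1)
        factor = WEIGHT_UNITS_TO_KG.get(unit or "")
        if qualifier not in ("WT", "AAE", "VGM") or factor is None or value is None:
            return None, False
        try:
            weight = float(value.replace(",", ".")) * factor
        except ValueError:
            logger.warning("Invalid weight format in MEA segment", extra={"segment": segment})
            return None, False
        # Assumption: VGM is signalled by a "VGM" qualifier or measured-attribute code.
        return weight, "VGM" in (qualifier, dimension)

    @staticmethod
    def _new_container(number: Optional[str]) -> Dict[str, Any]:
        return {"container_number": number, "seal_numbers": [], "packages": 0,
                "weight_kg": 0.0, "vgm_verified": False, "commodity": None, "hazard_class": None}

    def map_to_dict(self, raw_edi: str) -> Dict[str, Any]:
        self.cfg = replace(self._base_cfg)  # discard any UNA override from a previous call
        bl_data: Dict[str, Any] = {
            "interchange_id": None,
            "bl_number": None,
            "message_type": None,
            "shipper": None,
            "consignee": None,
            "notify_party": None,
            "vessel_voyage": None,  # TDT is not mapped in this sketch
            "load_port": None,
            "discharge_port": None,
            "containers": [],
            # customs_ready is left for a downstream customs validator to set.
            "compliance_flags": {"vgm_present": False, "imdg_hazard": False, "customs_ready": False}
        }
        # Goods items (GID groups) and equipment (EQD groups) arrive in separate segment
        # groups; they are linked after parsing via SGP (split goods placement).
        goods: List[Dict[str, Any]] = []
        containers: Dict[Optional[str], Dict[str, Any]] = {}
        current_goods: Optional[Dict[str, Any]] = None
        current_eqd: Optional[Dict[str, Any]] = None

        for segment_count, segment in enumerate(self.stream_segments(raw_edi), start=1):
            if segment_count > self.cfg.max_segments:
                raise EDIParseError(f"Exceeded max segment limit ({self.cfg.max_segments})")

            tag, payload = segment[:3], segment[3:]
            # After the 3-char tag the payload begins with the element separator;
            # strip it so elements[0] is the first data element.
            if payload.startswith(self.cfg.element_separator):
                payload = payload[1:]
            elements = self._split_elements(payload)

            if tag == "UNB":
                # DE 0020 (interchange control reference) is the 5th data element.
                bl_data["interchange_id"] = self._component(elements, 4)
            elif tag == "BGM":
                bl_data["message_type"] = self._component(elements, 0)  # C002, e.g. 705 = B/L
                bl_data["bl_number"] = self._component(elements, 1)     # C106 document number
            elif tag == "NAD":
                # DE 3035 qualifiers: CZ = consignor/shipper, CN = consignee, NI = notify party.
                key = {"CZ": "shipper", "CN": "consignee", "NI": "notify_party"}.get(
                    self._component(elements, 0) or "")
                if key:
                    # Prefer C080 party name, then C058 name/address, then the C082 party id.
                    bl_data[key] = (self._component(elements, 3) or self._component(elements, 2)
                                    or self._component(elements, 1))
            elif tag == "LOC":
                # DE 3227 qualifiers: 9 = place/port of loading, 11 = place/port of discharge.
                qualifier = self._component(elements, 0)
                if qualifier == "9":
                    bl_data["load_port"] = self._component(elements, 1)
                elif qualifier == "11":
                    bl_data["discharge_port"] = self._component(elements, 1)
            elif tag == "GID":
                count = self._component(elements, 1)  # C213 "count:package type", e.g. 10:PK
                current_eqd = None
                current_goods = {"packages": int(count) if count and count.isdigit() else 0,
                                 "weight_kg": 0.0, "commodity": None, "hazard_class": None,
                                 "placements": []}
                goods.append(current_goods)
            elif tag == "EQD":
                # CNI carries consignment (document) info; containers come from EQD+CN.
                current_goods = None
                if self._component(elements, 0) == "CN":
                    number = self._component(elements, 1)
                    current_eqd = containers.setdefault(number, self._new_container(number))
                else:
                    current_eqd = None  # non-container equipment is ignored
            elif tag == "SGP" and current_goods is not None:
                count = self._component(elements, 1)  # packages of this item in that container
                current_goods["placements"].append(
                    (self._component(elements, 0), int(count) if count and count.isdigit() else None))
            elif tag == "FTX" and current_goods is not None:
                # FTX+AAA = goods description; free-text lines sit in C108 (4th element).
                if self._component(elements, 0) == "AAA" and len(elements) > 3:
                    current_goods["commodity"] = " ".join(
                        p for p in self._split_composite(elements[3]) if p) or None
            elif tag == "DGS" and current_goods is not None:
                hazard = self._component(elements, 1)  # C205 hazard code, e.g. IMDG class 3
                if hazard:
                    current_goods["hazard_class"] = hazard
                    bl_data["compliance_flags"]["imdg_hazard"] = True
            elif tag == "SEL" and current_eqd is not None:
                seal = self._component(elements, 0)
                if seal:
                    current_eqd["seal_numbers"].append(seal)
            elif tag == "MEA":
                target = current_goods if current_goods is not None else current_eqd
                weight, is_vgm = self._parse_weight(elements, segment)
                if target is not None and weight is not None:
                    target["weight_kg"] = weight
                    if is_vgm and target is current_eqd:
                        target["vgm_verified"] = True

        # Attach goods details to containers. Without SGP, a single-container B/L is
        # assumed to hold every goods item; otherwise unplaced goods are logged.
        goods_weight: Dict[Optional[str], float] = {}
        for item in goods:
            links = item["placements"]
            if not links and len(containers) == 1:
                links = [(next(iter(containers)), None)]
            if not links:
                logger.warning("Goods item has no container placement", extra={"bl": bl_data["bl_number"]})
            for number, count in links:
                box = containers.get(number)
                if box is None:
                    logger.warning("SGP references unknown container", extra={"container": number})
                    continue
                box["packages"] += count if count is not None else item["packages"]
                box["commodity"] = box["commodity"] or item["commodity"]
                box["hazard_class"] = box["hazard_class"] or item["hazard_class"]
                if len(links) == 1:  # a split goods item's gross weight cannot be apportioned
                    goods_weight[number] = goods_weight.get(number, 0.0) + item["weight_kg"]
        for number, box in containers.items():
            if box["weight_kg"] == 0.0:  # no equipment-level MEA: fall back to goods weight
                box["weight_kg"] = goods_weight.get(number, 0.0)
        bl_data["containers"] = list(containers.values())

        # Regulatory gating: SOLAS VGM applies per container, so every one must be verified.
        flags = bl_data["compliance_flags"]
        flags["vgm_present"] = bool(bl_data["containers"]) and all(
            c["vgm_verified"] for c in bl_data["containers"])
        if not flags["vgm_present"]:
            logger.warning("SOLAS VGM weight missing; stowage planning blocked", extra={"bl": bl_data["bl_number"]})
        if flags["imdg_hazard"]:
            logger.info("IMDG hazardous cargo detected; segregation rules apply", extra={"bl": bl_data["bl_number"]})

        logger.info("BL mapping complete", extra={"bl_number": bl_data["bl_number"], "container_count": len(bl_data["containers"])})
        return bl_data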

Regulatory Gating & Compliance Enforcement

Maritime data pipelines must enforce regulatory constraints at the mapping layer. SOLAS requires a verified gross mass (VGM) for every packed container before it is loaded. The IMDG Code requires correct hazard classification, which in EDIFACT arrives in DGS segments (hazard class and UN number). Customs filing regimes such as US AMS and EU ENS, often fed by CUSCAR messages, require strict HS code and consignee validation. The mapping dict surfaces compliance flags explicitly, so downstream systems can reject or quarantine non-conforming records before they cause operational bottlenecks. For environments requiring Maritime Security Boundary Setup, store a cryptographic hash of the raw payload alongside the mapped dict. This ties every record to the exact bytes received. A hash establishes integrity only; non-repudiation during port call handoffs additionally requires a digital signature over that hash.
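A minimal sketch of that pairing follows. It assumes the raw interchange is retained as the exact bytes received and uses only the standard library. `seal_record` and `verify_record` are hypothetical names. The sketch covers integrity only; signing the digest is out of scope here.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def seal_record(raw_edi: bytes, mapped: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle the mapped dict with SHA-256 digests of the raw bytes and of the mapping."""
    return {
        "raw_sha256": hashlib.sha256(raw_edi).hexdigest(),
        # Canonical JSON (sorted keys, no whitespace) gives a stable digest of the mapping.
        "mapped_sha256": hashlib.sha256(
            json.dumps(mapped, sort_keys=True, separators=(",", ":")).encode("utf-8")
        ).hexdigest(),
        "received_at": datetime.now(timezone.utc).isoformat(),
        "mapped": mapped,
    }


def verify_record(raw_edi: bytes, record: Dict[str, Any]) -> bool:
    """Re-hash the archived raw payload and compare it with the recorded digest."""
    return hashlib.sha256(raw_edi).hexdigest() == record["raw_sha256"]


raw = b"UNB+UNOC:3+SENDER+RECEIVER+240101:1200+ICR0001'"
rec = seal_record(raw, {"bl_number": "BL123"})
print(verify_record(raw, rec))         # True
print(verify_record(raw + b" ", rec))  # False: any byte change alters the digest
```

Hashing the bytes rather than a decoded string avoids false mismatches caused by newline normalisation or character-set conversion further down the pipeline.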

Operational Deployment Notes

In production, carrier EDI streams rarely conform to textbook specifications. Route malformed or partially mapped records to a quarantine queue rather than failing the entire interchange. The logging configuration above emits one JSON-shaped line per event, which ELK stacks or Splunk can ingest for real-time anomaly detection. Note, however, that its format string does not render fields passed through `extra=`, and a message containing a double quote produces invalid JSON; a dedicated JSON formatter is safer in production. When scaling across multiple terminals, decouple the parsing step from the validation step to keep ingestion latency low while preserving strict audit trails. Align the implementation with the official UN/EDIFACT syntax rules (ISO 9735) and Python's logging documentation.
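Quarantine routing with parsing and validation kept as separate steps can be sketched as follows. `Router`, `toy_parse`, and `toy_validate` are hypothetical. In-memory lists stand in for real queues, and the toy parser stands in for a full mapper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Router:
    """Send each interchange to 'accepted' or 'quarantine' instead of failing the batch."""
    parse: Callable[[str], Dict[str, Any]]
    validate: Callable[[Dict[str, Any]], List[str]]
    accepted: List[Dict[str, Any]] = field(default_factory=list)
    quarantine: List[Dict[str, Any]] = field(default_factory=list)

    def ingest(self, raw_edi: str) -> str:
        try:
            record = self.parse(raw_edi)
        except Exception as exc:  # deliberate catch-all: keep the raw payload for repair
            self.quarantine.append({"raw": raw_edi, "errors": [f"parse:{exc}"]})
            return "quarantine"
        errors = self.validate(record)  # validation runs separately from parsing
        if errors:
            self.quarantine.append({"raw": raw_edi, "record": record, "errors": errors})
            return "quarantine"
        self.accepted.append(record)
        return "accepted"


def toy_parse(raw: str) -> Dict[str, Any]:
    # Stand-in mapper: reads the document number from a lone BGM segment.
    if not raw.startswith("BGM+"):
        raise ValueError("expected BGM segment")
    return {"bl_number": raw.rstrip("'").split("+")[2]}


def toy_validate(record: Dict[str, Any]) -> List[str]:
    return [] if record.get("bl_number") else ["missing:bl_number"]


router = Router(parse=toy_parse, validate=toy_validate)
for raw in ["BGM+705+BL123'", "BGM+705+'", "XYZ'"]:
    print(router.ingest(raw))  # accepted, quarantine, quarantine (one per line)
```

Because `parse` and `validate` are injected callables, the two stages can later run in separate workers or services without changing the routing contract.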