How to map UN/EDIFACT B/L fields to Python dicts

This guide shows how to turn a raw UN/EDIFACT Bill of Lading (B/L) interchange into a normalized, type-safe Python dict — reading delimiters from the service string, walking segments defensively, and surfacing SOLAS VGM and IMDG compliance flags so downstream port automation never acts on a half-parsed record. The raw interchange is notoriously brittle: segment order drifts by carrier, composite delimiters shift across legacy translators, and mandatory fields move between directory versions (D95B, D16A, or bespoke carrier variants). A correct mapper is the single source of truth that bridges carrier EDI gateways, the terminal operating system (TOS), and customs clearance.

Architecture Alignment

This task is the segment-level detail beneath the Bill of Lading Schema Mapping layer, which is itself one discipline inside the Core Maritime Architecture & Taxonomy framework. The parent layer defines what canonical fields exist and how they are validated against registries; this page owns how raw UNB/BGM/NAD/LOC/CNI/GID/MEA/DGS segments are physically decoded into a Python structure before typing. Get this wrong and every downstream consumer — Container Hierarchy Data Models resolution, Port Call Workflow Design milestone events, customs manifests — inherits silent corruption. The mapper’s job is to be syntax-tolerant at the wire and semantics-explicit at the boundary: accept messy-but-parseable input, emit a dict whose every field carries a declared meaning.

Prerequisites & Environment Setup

Python 3.11+: the code uses dataclasses and standard typing annotations (Optional, List, Dict); nothing else is version-sensitive.
structlog for machine-readable JSON logs (pip install structlog). Bare print() has no place in an audit-traced maritime pipeline; every parse decision must be reconstructable during a port state control inspection.
pytest for the fixture-driven verification in the last section (pip install pytest).
The core mapper itself has no third-party parsing dependency. It decodes UN/EDIFACT with the standard library, so it can run inside a locked-down ingestion worker with a minimal supply chain. If you want schema enforcement on the result, pydantic slots in at the typing boundary exactly as in the parent Schema Validation Frameworks pattern.

import structlog

# One-time structured-logging setup at process start.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
log = structlog.get_logger()

Never hard-code delimiters or a codec. Carrier delimiter drift is a frequent cause of parse failures: for example, a translator emits : as the component separator where the previous message used a different character. The only reliable defense is to read the service string (UNA) and the syntax identifier (UNB) from the payload itself.
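Reading the codec from the payload can be sketched as follows. The UNB syntax identifier (for example UNOA or UNOC) names the character repertoire. The table below is an assumed, partial mapping to Python codec names, and detect_codec is a hypothetical helper rather than part of the mapper. It also assumes the identifier appears within the first 256 bytes.

```python
# Assumed, partial mapping from the UNB syntax identifier to a Python codec.
# Verify against ISO 9735 and your interchange agreement before relying on it.
SYNTAX_CODECS = {
    "UNOA": "ascii",      # ISO 646 subset, level A
    "UNOB": "ascii",      # ISO 646 subset, level B
    "UNOC": "latin-1",    # ISO 8859-1
    "UNOD": "iso8859_2",  # ISO 8859-2
    "UNOW": "utf-8",      # ISO 10646 encoded as UTF-8
}


def detect_codec(raw: bytes) -> str:
    """Pick a codec from the UNB syntax identifier, honoring a UNA override.

    Hypothetical helper: only the first 256 bytes are inspected.
    """
    head = raw[:256].decode("ascii", errors="replace")  # one byte -> one char
    has_una = head.startswith("UNA") and len(head) >= 9
    component_sep = head[3] if has_una else ":"
    element_sep = head[4] if has_una else "+"
    start = head.find("UNB" + element_sep)
    if start < 0:
        raise ValueError("no UNB segment in the first 256 bytes")
    # S001 starts right after "UNB" + element separator; 0001 is its first component.
    syntax_id = head[start + 4:].split(component_sep, 1)[0]
    if syntax_id not in SYNTAX_CODECS:
        raise ValueError(f"unsupported syntax identifier {syntax_id!r}")
    return SYNTAX_CODECS[syntax_id]
```

The caller would then run raw.decode(detect_codec(raw)) before handing the text to the mapper. An unknown identifier raises instead of falling back to a guess.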

Step-by-step Implementation

Step 1 — Model the service-string delimiters

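The optional UNA segment declares the six service characters in this order: component separator, element separator, decimal mark, release character, a reserved position, and segment terminator. When present it overrides the defaults. When absent, the UN/EDIFACT syntax defaults apply: ' terminates segments, + separates elements, : separates components, and ? is the release character. Model them as mutable config so the mapper can rebind them from the UNA header before anything else is split. UNA, when used, precedes every other segment.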

from dataclasses import dataclass
from typing import Optional


@dataclass
class EDIConfig:
    segment_terminator: str = "'"
    element_separator: str = "+"
    component_separator: str = ":"
    release_character: str = "?"
    max_segments: int = 5000


class EDIParseError(Exception):
    """Raised when a mandatory UN/EDIFACT segment or composite fails validation."""
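EDIConfig models four of the six UNA positions. Numeric elements also depend on the decimal mark (position three). This standalone sketch reads the full header. ServiceString, parse_una, and to_float are hypothetical names, not part of the mapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceString:
    """All six UNA positions, in header order, with the syntax defaults."""
    component_separator: str = ":"
    element_separator: str = "+"
    decimal_mark: str = "."
    release_character: str = "?"
    reserved: str = " "
    segment_terminator: str = "'"


def parse_una(raw: str) -> ServiceString:
    """Read the positional UNA header, or fall back to the syntax defaults."""
    if raw.startswith("UNA") and len(raw) >= 9:
        return ServiceString(*raw[3:9])  # six chars map to the six fields in order
    return ServiceString()


def to_float(value: str, svc: ServiceString) -> float:
    """Parse a numeric data element using the declared decimal mark."""
    return float(value.replace(svc.decimal_mark, "."))
```

For example, an interchange that opens with UNA:+,? ' declares a comma decimal mark, so to_float("15000,5", svc) yields 15000.5 instead of raising.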

Step 2 — Split elements and composites, honoring the release character

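The release character (? by default) escapes a delimiter that appears as literal data. PORT?+CRANE is the single value PORT+CRANE, not two elements. A naive str.split("+") corrupts any party name or free-text description that contains an escaped delimiter, so both splitters walk the input character by character.

Unescape only at the innermost level. If the element split already turned ?: into :, the composite split would then break the value on it. The element split therefore keeps release sequences verbatim. A simple element read straight from the element list should still go through _split_composite to be unescaped.

from typing import List


class BLFieldMapper:
    def __init__(self, config: Optional[EDIConfig] = None) -> None:
        self.cfg = config or EDIConfig()

    def _split_on(self, raw: str, sep: str, unescape: bool = True) -> List[str]:
        """Split on sep, skipping separators escaped by the release character.

        unescape=False keeps each release sequence (e.g. ?:) verbatim so a
        deeper split can still see it; unescape=True drops the release char.
        """
        rel = self.cfg.release_character
        parts: List[str] = []
        current: List[str] = []
        i = 0
        while i < len(raw):
            if raw[i] == rel and i + 1 < len(raw):
                current.append(raw[i + 1] if unescape else raw[i:i + 2])
                i += 2
            elif raw[i] == sep:
                parts.append("".join(current))
                current = []
                i += 1
            else:
                current.append(raw[i])
                i += 1
        parts.append("".join(current))
        return parts

    def _split_elements(self, raw: str) -> List[str]:
        # Keep escapes: composites inside an element still need them.
        return self._split_on(raw, self.cfg.element_separator, unescape=False)

    def _split_composite(self, raw: str) -> List[str]:
        # Innermost level: unescape here and nowhere else.
        return self._split_on(raw, self.cfg.component_separator) if raw else []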
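The two-level escaping rule can be checked in isolation. This standalone sketch keeps release sequences at the element level and unescapes only inside composites, so an escaped component separator such as ?: survives both splits. split_level and split_segment are hypothetical names, and the delimiters are assumed to be the syntax defaults.

```python
from typing import List


def split_level(raw: str, sep: str, release: str = "?", unescape: bool = False) -> List[str]:
    """Split on sep, skipping escaped separators; optionally drop the release char."""
    parts: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == release and i + 1 < len(raw):
            # Keep "?x" for a deeper split, or emit just "x" at the innermost level.
            buf.append(raw[i + 1] if unescape else raw[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return parts


def split_segment(payload: str) -> List[List[str]]:
    """Elements keep their escapes; only the component split unescapes."""
    return [split_level(el, ":", unescape=True) for el in split_level(payload, "+")]
```

If you unescaped at both levels instead, the element split would turn ACME?:PORT into ACME:PORT, and the component split would then cut it into two values.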

Step 3 — Detect the UNA override and stream segments

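UNA is exactly six characters after the tag, positional and never delimited. It must be read before anything is split, because the split itself depends on the terminator it declares. Do not strip() the header either: its fifth (reserved) position is often a space.

The segment split must also honor the release character, since an escaped terminator (?') is literal data. Because UNA rebinds self.cfg in place, use one mapper instance per interchange. The segment cap lives in map_to_dict (EDIConfig.max_segments). It stops a corrupt terminator or an oversized payload from turning one interchange into an unbounded workload.

import re
from typing import Generator, List


class BLFieldMapper(BLFieldMapper):  # continued
    # UNA + six positional service characters: component sep, element sep,
    # decimal mark, release char, reserved, segment terminator.
    _una_re = re.compile(r"^UNA(.{6})", re.DOTALL)

    def _apply_una(self, raw_edi: str) -> str:
        """Rebind delimiters from a leading UNA; return the payload after it."""
        m = self._una_re.match(raw_edi)
        if not m:
            return raw_edi  # no UNA: syntax defaults stay in force
        chars = m.group(1)
        self.cfg.component_separator = chars[0]
        self.cfg.element_separator = chars[1]
        self.cfg.release_character = chars[3]
        self.cfg.segment_terminator = chars[5]
        log.info("una_override_applied", delimiters=vars(self.cfg))
        return raw_edi[m.end():]

    def stream_segments(self, raw_edi: str) -> Generator[str, None, None]:
        cleaned = raw_edi.replace("\r\n", "\n").replace("\r", "\n").lstrip()
        body = self._apply_una(cleaned)  # before splitting; never strip() the UNA
        rel, term = self.cfg.release_character, self.cfg.segment_terminator
        current: List[str] = []
        i = 0
        while i < len(body):
            if body[i] == rel and i + 1 < len(body):
                current.append(body[i:i + 2])  # keep ?x for the element/composite split
                i += 2
            elif body[i] == term:
                seg = "".join(current).strip()  # drop line breaks between segments
                if seg:
                    yield seg
                current = []
                i += 1
            else:
                current.append(body[i])
                i += 1
        seg = "".join(current).strip()
        if seg:
            yield seg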
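stream_segments works on a string that is already fully in memory. For large interchanges, one option is a chunked reader. This hypothetical iter_segments assumes the UNA header has already been consumed and that its terminator and release character are passed in. The escape state is carried character by character, so a release character at the end of one chunk still protects the first character of the next.

```python
import io
from typing import Iterator, List, TextIO


def iter_segments(stream: TextIO, terminator: str = "'", release: str = "?",
                  chunk_size: int = 4096) -> Iterator[str]:
    """Yield raw segments from a text stream without loading it whole.

    Release sequences are kept verbatim; element/composite splitting
    (and unescaping) happens later, as in the mapper.
    """
    buf: List[str] = []
    escaped = False  # previous char was a release char not yet consumed
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if escaped:
                buf.append(ch)  # literal, even if it is the terminator
                escaped = False
            elif ch == release:
                buf.append(ch)
                escaped = True
            elif ch == terminator:
                seg = "".join(buf).strip()
                if seg:
                    yield seg
                buf = []
            else:
                buf.append(ch)
    seg = "".join(buf).strip()
    if seg:
        yield seg


if __name__ == "__main__":
    demo = io.StringIO("UNB+UNOA:2'NAD+CZ+O?'BRIEN'UNZ+1'")
    for segment in iter_segments(demo, chunk_size=3):
        print(segment)
```

The tiny chunk_size in the demo deliberately puts the escaped apostrophe in O?'BRIEN across a chunk boundary.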

Step 4 — Dispatch each segment tag into the normalized dict

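Walk the segments once, dispatching on the 3-character tag. After the tag the payload begins with the element separator, so strip it to keep element indices aligned.

Party roles come from the NAD DE 3035 qualifier (CZ consignor/shipper, CN consignee, NI notify). Ports come from LOC DE 3227 qualifiers: 9 is the place/port of loading and 11 the place/port of discharge. 5 (place of departure) and 61 (next port of call) describe the vessel's movement, so they are accepted only as fallbacks.

Item-level segments (GID, MEA, DGS) attach to the most recent CNI, and a new CNI flushes the previous container before opening the next. The VGM flag is computed per container after the walk, so one weighed box cannot clear the rest.

from typing import Any, Dict, Optional


class BLFieldMapper(BLFieldMapper):  # continued
    # DE 3035 party qualifier -> dict key; extend per carrier in this one place.
    NAD_ROLES = {"CZ": "shipper", "CN": "consignee", "NI": "notify_party"}
    # DE 3227: 9 = place/port of loading, 11 = place/port of discharge.
    LOC_ROLES = {"9": "load_port", "11": "discharge_port"}
    # 5 = place of departure, 61 = next port of call: fallbacks only, used
    # when the standard qualifier is absent (the test fixture relies on them).
    LOC_FALLBACKS = {"5": "load_port", "61": "discharge_port"}

    def _component(self, raw: str, index: int = 0) -> Optional[str]:
        """Unescaped component `index` of a composite, or None if absent or empty."""
        parts = self._split_composite(raw)
        if index < len(parts) and parts[index]:
            return parts[index]
        return None

    def map_to_dict(self, raw_edi: str) -> Dict[str, Any]:
        bl: Dict[str, Any] = {
            "bl_number": None, "message_type": None,
            "shipper": None, "consignee": None, "notify_party": None,
            "load_port": None, "discharge_port": None,
            "containers": [],
            "compliance_flags": {"vgm_present": False, "imdg_hazard": False},
        }
        current: Optional[Dict[str, Any]] = None
        count = 0

        for segment in self.stream_segments(raw_edi):
            count += 1
            if count > self.cfg.max_segments:
                raise EDIParseError(f"Exceeded max segment limit ({self.cfg.max_segments})")

            tag, payload = segment[:3], segment[3:]
            if tag == "UNA":
                continue  # service string header, not business data
            if payload.startswith(self.cfg.element_separator):
                payload = payload[1:]
            el = self._split_elements(payload)
            qual = self._component(el[0]) if el else None

            if tag == "BGM":
                bl["message_type"] = qual
                bl["bl_number"] = self._component(el[1]) if len(el) > 1 else None
            elif tag == "NAD":
                role = self.NAD_ROLES.get(qual or "")
                if role:
                    # Prefer the C080 party name (4th element); fall back to C082 party id.
                    name = self._component(el[3]) if len(el) > 3 else None
                    bl[role] = name or (self._component(el[1]) if len(el) > 1 else None)
            elif tag == "LOC":
                code = self._component(el[1]) if len(el) > 1 else None
                if qual in self.LOC_ROLES:
                    bl[self.LOC_ROLES[qual]] = code
                elif qual in self.LOC_FALLBACKS and bl[self.LOC_FALLBACKS[qual]] is None:
                    bl[self.LOC_FALLBACKS[qual]] = code
            elif tag == "CNI":
                if current is not None:
                    bl["containers"].append(current)
                current = {"container_number": self._component(el[1]) if len(el) > 1 else None,
                           "packages": 0, "weight_kg": 0.0,
                           "commodity": None, "hazard_class": None}
            elif tag == "GID" and current is not None:
                # C213 is count:package type (e.g. 12:PK); the count is component 0.
                n = self._component(el[1]) if len(el) > 1 else None
                current["packages"] = int(n) if n and n.isdigit() else 0
                if len(el) > 2 and el[2]:
                    current["commodity"] = self._component(el[2])
            elif tag == "MEA" and current is not None and qual == "WT":
                # C174 is unit:value (KGM:15000) in the standard, but some feeds send
                # value:unit, so take the first numeric component. Assumes KGM.
                weight = None
                for comp in (self._split_composite(el[2]) if len(el) > 2 else []):
                    try:
                        weight = float(comp)
                        break
                    except ValueError:
                        continue
                if weight is None:
                    log.warning("mea_weight_unparseable", segment=segment)
                else:
                    current["weight_kg"] = weight
            elif tag == "DGS" and current is not None and len(el) > 1:
                current["hazard_class"] = self._component(el[1])  # C205 hazard code
                bl["compliance_flags"]["imdg_hazard"] = True

        if current is not None:  # flush the final consignment; UNZ is a trailer, not a boundary
            bl["containers"].append(current)
        # VGM is per box: set the flag only when every mapped container has a weight.
        bl["compliance_flags"]["vgm_present"] = bool(bl["containers"]) and all(
            c["weight_kg"] > 0 for c in bl["containers"])
        return bl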
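The CNI flush rule is easy to test on its own. This standalone sketch assumes one message per interchange, as in this guide's fixture, and segments that arrive as raw strings with a 3-character tag. group_by_opener is a hypothetical helper. It groups item-level segments under the most recent opener and treats the end of input, not UNT or UNZ, as the final boundary.

```python
from typing import Iterable, List


def group_by_opener(segments: Iterable[str], opener: str = "CNI") -> List[List[str]]:
    """Group item-level segments under the most recent opener segment.

    Segments before the first opener (BGM, NAD, LOC, ...) are header-level and
    skipped; envelope segments are never attached to an item.
    """
    envelope = {"UNB", "UNH", "UNT", "UNZ"}
    groups: List[List[str]] = []
    current: List[str] = []
    for seg in segments:
        tag = seg[:3]
        if tag == opener:
            if current:
                groups.append(current)  # a new opener flushes the previous item
            current = [seg]
        elif current and tag not in envelope:
            current.append(seg)
    if current:
        groups.append(current)  # end of input is the final boundary
    return groups
```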

Step 5 — Gate on regulatory flags before returning

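The dict is only useful if it makes compliance state explicit. Under SOLAS, a packed container needs a Verified Gross Mass (VGM) before it is loaded, and IMDG hazardous cargo triggers segregation rules. Here vgm_present is derived from WT weight measurements, which is a simplification: a declared gross weight is not automatically a verified one. If your carriers qualify VGM explicitly in the MEA segment, tighten the check to that qualifier. Log these decisions so the Maritime Security Boundary Setup audit chain can prove what the mapper knew and when.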

def gate(bl: Dict[str, Any]) -> Dict[str, Any]:
    if not bl["compliance_flags"]["vgm_present"]:
        log.warning("solas_vgm_missing", bl_number=bl["bl_number"])
    if bl["compliance_flags"]["imdg_hazard"]:
        log.info("imdg_segregation_required", bl_number=bl["bl_number"])
    log.info("bl_mapping_complete",
             bl_number=bl["bl_number"], container_count=len(bl["containers"]))
    return bl
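A downstream router may need a decision rather than a log line. In that case, a stricter variant can return the blocking issues explicitly. This hypothetical sketch operates on the mapped dict shape above. The checks (B/L number present, at least one container, a positive weight on every container) are assumptions to adapt to your own rules.

```python
from typing import Any, Dict, List


def compliance_issues(bl: Dict[str, Any]) -> List[str]:
    """Return blocking issues for a mapped B/L dict; an empty list means pass."""
    issues: List[str] = []
    if not bl.get("bl_number"):
        issues.append("missing B/L number (BGM)")
    containers = bl.get("containers") or []
    if not containers:
        issues.append("no containers mapped")
    for c in containers:
        # Checked per box, not per interchange: one weighed box must not clear the rest.
        if not c.get("weight_kg"):
            issues.append(f"{c.get('container_number')}: no weight, SOLAS VGM unverified")
    return issues
```

A router can quarantine the interchange when the list is non-empty and pass the dict on otherwise.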

Refer to the UN/EDIFACT code lists (UNCL) when a carrier’s qualifier set diverges from the DE 3035 / DE 3227 codes above. Use ISO 9735 for the syntax rules themselves. The standard, not the carrier’s habit, is the arbiter of what a qualifier means.

Edge Cases & Carrier Deviations

Missing UNA service string. Perfectly legal — the syntax defaults apply. Do not raise; only override when UNA is actually present, exactly as Step 3 does.
Non-standard NAD qualifiers. Some carriers emit SU (supplier) or SH for the shipper instead of CZ, or route the notify party under NAD+N1. Keep the qualifier→role map in one place and extend it per carrier rather than sprinkling if branches through the parser.
Escaped delimiters inside data. A free-text GID commodity description like STEEL COILS ?+ DUNNAGE will shred a naive splitter. The release-character walk in Step 2 is the guard; test it explicitly.
Reordered composites. A few legacy translators swap component order inside C517/C088. Index composites by qualifier where the standard allows, not by fixed position, when you see drift.
CNI versus consignment boundaries. Treat each CNI as the start of a new container and flush on the next CNI or at end of interchange — never on UNZ, which is the interchange trailer, not a cargo boundary.
MMSI in the IMO slot. A TDT/NAD carrying a 9-digit MMSI where a 7-digit IMO belongs is a mapping exception, not a parse error — flag it for the typing layer rather than silently coercing. Physical movements are later correlated against AIS Data Stream Integration, where the MMSI legitimately lives.
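Two of these deviations can be kept out of the parser's branches. This minimal sketch assumes the carrier is identified at ingestion time. nad_roles merges a hypothetical per-carrier overlay onto the shared DE 3035 map. classify_vessel_id distinguishes a 7-digit IMO number, validated with its standard check digit, from a 9-digit MMSI-shaped value. The typing layer can then flag the mismatch instead of coercing it. The carrier keys and overlays are placeholders.

```python
from typing import Dict, Optional

# DE 3035 qualifiers shared by every carrier.
BASE_NAD_ROLES: Dict[str, str] = {"CZ": "shipper", "CN": "consignee", "NI": "notify_party"}

# Hypothetical per-carrier overlays; the carrier keys are placeholders.
CARRIER_NAD_OVERRIDES: Dict[str, Dict[str, str]] = {
    "CARRIER_A": {"SU": "shipper"},
    "CARRIER_B": {"N1": "notify_party"},
}


def nad_roles(carrier: str) -> Dict[str, str]:
    """Base qualifier map with the carrier's overlay applied on top."""
    return {**BASE_NAD_ROLES, **CARRIER_NAD_OVERRIDES.get(carrier, {})}


def classify_vessel_id(value: str) -> Optional[str]:
    """'imo' for a 7-digit IMO number with a valid check digit, 'mmsi' for a
    9-digit value (shape only, MID prefix not checked), otherwise None."""
    if not (value.isascii() and value.isdigit()):
        return None
    if len(value) == 7:
        # IMO check digit: first six digits weighted 7, 6, 5, 4, 3, 2; sum mod 10.
        total = sum(int(d) * w for d, w in zip(value[:6], range(7, 1, -1)))
        return "imo" if total % 10 == int(value[6]) else None
    if len(value) == 9:
        return "mmsi"
    return None
```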

Verification & Testing

Assert correctness against a fixture that exercises the delimiters, party qualifiers, weight, and hazard path in one interchange. The test proves the release character, the NAD/LOC mapping, and the VGM flag together.

import pytest

FIXTURE = (
    "UNA:+.? '"
    "UNB+UNOA:2+MAEU+TERMINAL+240315:1030+1'"
    "BGM+340+MAEU123456789+9'"
    "NAD+CZ+ACME SHIPPING?+ CO'"
    "NAD+CN+GLOBAL IMPORTERS'"
    "LOC+5+USNYC'"
    "LOC+61+NLRTM'"
    "CNI+1+MSKU1234567'"
    "GID+1+12:PK'"
    "MEA+WT++15000:KGM'"
    "DGS+IMD+3.2'"
    "UNZ+1+1'"
)


def test_map_to_dict_happy_path() -> None:
    bl = BLFieldMapper().map_to_dict(FIXTURE)
    assert bl["bl_number"] == "MAEU123456789"
    assert bl["shipper"] == "ACME SHIPPING+ CO"   # escaped '+' preserved
    assert bl["consignee"] == "GLOBAL IMPORTERS"
    assert bl["load_port"] == "USNYC"
    assert bl["discharge_port"] == "NLRTM"
    assert len(bl["containers"]) == 1
    c = bl["containers"][0]
    assert c["container_number"] == "MSKU1234567"
    assert c["weight_kg"] == 15000.0
    assert c["hazard_class"] == "3.2"
    assert bl["compliance_flags"] == {"vgm_present": True, "imdg_hazard": True}


def test_missing_una_uses_defaults() -> None:
    no_una = FIXTURE.split("UNB", 1)[1]
    bl = BLFieldMapper().map_to_dict("UNB" + no_una)
    assert bl["bl_number"] == "MAEU123456789"

Calling gate() on a mapped dict emits one structured line per decision, e.g. {"event": "bl_mapping_complete", "bl_number": "MAEU123456789", "container_count": 1, "level": "info", "timestamp": "…"}. To prove the audit trail itself, not just the returned dict, capture those events in the test and assert on them. structlog provides structlog.testing.capture_logs for this.
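Another way to assert on the audit trail is to pass the logger in, which avoids configuring logging in tests at all. This sketch assumes gate() is refactored to take its logger as a parameter. gate_with and RecordingLogger are hypothetical names, and the test double records the same .info and .warning calls gate() makes.

```python
from typing import Any, Dict, List, Protocol, Tuple


class AuditLogger(Protocol):
    """The two calls gate() makes on its structlog logger."""
    def info(self, event: str, **fields: Any) -> None: ...
    def warning(self, event: str, **fields: Any) -> None: ...


def gate_with(bl: Dict[str, Any], logger: AuditLogger) -> Dict[str, Any]:
    """Same decisions as gate(), with the logger passed in (hypothetical refactor)."""
    if not bl["compliance_flags"]["vgm_present"]:
        logger.warning("solas_vgm_missing", bl_number=bl["bl_number"])
    if bl["compliance_flags"]["imdg_hazard"]:
        logger.info("imdg_segregation_required", bl_number=bl["bl_number"])
    logger.info("bl_mapping_complete",
                bl_number=bl["bl_number"], container_count=len(bl["containers"]))
    return bl


class RecordingLogger:
    """Test double: records (level, event, fields) in call order."""
    def __init__(self) -> None:
        self.records: List[Tuple[str, str, Dict[str, Any]]] = []

    def info(self, event: str, **fields: Any) -> None:
        self.records.append(("info", event, fields))

    def warning(self, event: str, **fields: Any) -> None:
        self.records.append(("warning", event, fields))
```

In production, the structlog logger from the setup section satisfies AuditLogger, so gate_with(bl, log) behaves like gate(bl).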

Frequently Asked Questions

Why decode UN/EDIFACT by hand instead of using a parsing library?

For a single message family, the standard-library walk is small, dependency-light, and fully auditable. Every branch is visible for a port state control review, and there is no upstream package to vet or pin. A library earns its place once you parse many message types (IFCSUM, CUSCAR, BAPLIE) and want a shared grammar. At that point, keep the same delimiter-from-payload and release-character discipline shown here, and check how the library obtains its delimiters: one configured with fixed defaults fails on UNA drift just as a hand-rolled parser would.

Should the mapper return a plain dict or a validated model?

Return the dict from the mapper, then hand it straight to the typing boundary. Keeping decode and validation separate lets ingestion stay fast and lenient while the Bill of Lading Schema Mapping layer applies strict pydantic coercion, registry lookups, and quarantine routing. Fusing them makes both harder to test and forces the parser to know about business rules it should not carry.
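This separation can be sketched with standard-library dataclasses standing in for the pydantic models the parent layer uses. BillOfLading, Container, MappingRejected, and to_record are hypothetical names, and the mandatory fields chosen here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Container:
    container_number: str
    weight_kg: float
    hazard_class: Optional[str] = None


@dataclass(frozen=True)
class BillOfLading:
    bl_number: str
    load_port: Optional[str]
    discharge_port: Optional[str]
    containers: List[Container] = field(default_factory=list)


class MappingRejected(ValueError):
    """Raised at the typing boundary; the caller routes the raw dict to quarantine."""


def to_record(bl: Dict[str, Any]) -> BillOfLading:
    """Strict typing step applied after decoding; the mapper itself stays lenient."""
    if not bl.get("bl_number"):
        raise MappingRejected("missing bl_number")
    containers: List[Container] = []
    for raw in bl.get("containers") or []:
        if not raw.get("container_number"):
            raise MappingRejected("container without a number")
        containers.append(Container(
            container_number=raw["container_number"],
            weight_kg=float(raw.get("weight_kg") or 0.0),
            hazard_class=raw.get("hazard_class"),
        ))
    return BillOfLading(
        bl_number=bl["bl_number"],
        load_port=bl.get("load_port"),
        discharge_port=bl.get("discharge_port"),
        containers=containers,
    )
```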

How do we handle a carrier that reorders segments or omits optional ones?

The single-pass, tag-dispatch design tolerates reordered header segments by construction. It reacts to whatever tag arrives next and leaves any field it never sees as None. Item-level segments are the exception: GID, MEA, and DGS attach to the most recent CNI, so they must follow the CNI they describe. Omitted optional segments simply yield None fields, which the typing layer can default or reject. Only the absence of a mandatory field is an error, and that judgment belongs downstream. The mapper's contract is to report faithfully what was present, never to invent what was missing.

Related

Bill of Lading Schema Mapping — the typing, validation, and quarantine layer this parser feeds
Container Hierarchy Data Models — resolving the mapped container list into the equipment topology
Schema Validation Frameworks — the structural/semantic/regulatory checks applied after decoding
IFCSUM EDI Message Parsing — the same delimiter discipline applied to aggregated manifests
Maritime Security Boundary Setup — audit and non-repudiation for the mapped payload
