How to map UN/EDIFACT B/L fields to Python dicts
Maritime terminal operators, port authorities, and shipping line IT teams routinely face operational friction when ingesting UN/EDIFACT Bill of Lading (B/L) messages. The raw interchange format is brittle in practice: segment ordering varies by carrier, delimiters differ across legacy systems, and mandatory compliance fields drift between message types and versions (BAPLIE, CUSCAR, or custom carrier variants). Mapping UN/EDIFACT B/L fields to Python dicts is therefore more than a parsing exercise; it is a foundational requirement for deterministic port operations automation. When implemented correctly, the mapping layer becomes the single source of truth that bridges carrier EDI gateways, terminal operating systems (TOS), and customs clearance workflows.
Architecture Alignment & Taxonomy Grounding
Before writing parsing logic, the mapping strategy must align with the broader Core Maritime Architecture & Taxonomy. In production port environments, B/L data does not exist in isolation. It intersects with vessel stowage plans, yard allocation matrices, and customs risk scoring engines. A robust Python mapping layer must enforce strict schema boundaries, normalize carrier-specific deviations into a canonical dictionary structure, and expose explicit failure states for downstream compliance gating. This architectural discipline prevents silent data corruption and ensures that terminal gate automation, crane scheduling, and customs declarations operate against validated, type-safe payloads.
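The idea of a strict schema boundary with explicit failure states can be sketched as a small validator. This is a minimal illustration, not a definitive rule set: the `REQUIRED_FIELDS` table and the `validate_bl` function are hypothetical names, and the key names simply mirror the dictionary built later in this article. The only external fact relied on is that UN/LOCODE values are five characters long.

```python
from typing import Any, Dict, List

# Hypothetical canonical schema: key -> expected type.
# The selection of mandatory keys is an assumption for illustration.
REQUIRED_FIELDS: Dict[str, type] = {
    "bl_number": str,
    "shipper": str,
    "consignee": str,
    "load_port": str,
    "discharge_port": str,
}


def validate_bl(record: Dict[str, Any]) -> List[str]:
    """Return a list of failure codes; an empty list means the record passed."""
    failures: List[str] = []
    for key, expected in REQUIRED_FIELDS.items():
        value = record.get(key)
        if value is None or value == "":
            failures.append(f"missing:{key}")
        elif not isinstance(value, expected):
            failures.append(f"type:{key}")
    # UN/LOCODE: two-letter country code plus three-character location code.
    for key in ("load_port", "discharge_port"):
        value = record.get(key)
        if isinstance(value, str) and value and len(value) != 5:
            failures.append(f"format:{key}")
    return failures


sample = {
    "bl_number": "BL123456",
    "shipper": "ACME EXPORTS",
    "consignee": None,
    "load_port": "NLRTM",
    "discharge_port": "SIN",
}
print(validate_bl(sample))  # ['missing:consignee', 'format:discharge_port']
```

Returning failure codes instead of raising lets a downstream gate decide whether a record is rejected, quarantined, or passed with warnings.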
Deterministic Segment Parsing & Real-World Quirks
flowchart TD
    A["Raw EDIFACT interchange"] --> B["Apply UNA service string<br/>set delimiters"]
    B --> C["Split on segment terminator<br/>respecting release char"]
    C --> D{"Segment tag"}
    D -->|BGM| E["B/L number · message type"]
    D -->|NAD| F["Shipper · consignee · notify"]
    D -->|LOC| G["Load / discharge ports"]
    D -->|CNI · GID · MEA · DGS| H["Container · goods · weight · hazard"]
    E --> R[("Normalised dict")]
    F --> R
    G --> R
    H --> R
UN/EDIFACT relies on a rigid but fragile delimiter hierarchy: the optional UNA segment defines the service string characters, UNB opens the interchange, and message segments (BGM, NAD, LOC, GID, CNI, MEA) carry the B/L payload. Format drift occurs when carriers omit optional segments, reorder composites, or inject non-standard qualifiers. The parser must be state-aware, tolerant of missing delimiters, and explicitly defensive against malformed composites. We implement a generator that processes segments sequentially, applies structured logging for auditability, and maps fields to a normalized Python dict structure aligned with Bill of Lading Schema Mapping.
import re
import logging
from typing import Dict, List, Optional, Generator, Any
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Structured logging configuration for audit trails
logging.basicConfig(
    level=logging.INFO,
    format='{"ts":"%(asctime)s","lvl":"%(levelname)s","mod":"%(module)s","msg":"%(message)s"}'
)
logger = logging.getLogger(__name__)


@dataclass
class EDIConfig:
    segment_terminator: str = "'"
    element_separator: str = "+"
    component_separator: str = ":"
    release_character: str = "?"
    max_segments: int = 5000


class EDIParseError(Exception):
    """Raised when a mandatory UN/EDIFACT segment or composite fails validation."""
    pass

class BLFieldMapper:
    def __init__(self, config: Optional[EDIConfig] = None):
        self.cfg = config or EDIConfig()
        # UNA is the literal "UNA" followed by exactly six service characters.
        self._una_re = re.compile(r"^UNA.{6}")

    def _apply_una(self, segment: str) -> None:
        """Override the configured delimiters from a UNA service string advice."""
        if not self._una_re.match(segment):
            return
        chars = segment[3:9]
        # Positions: 0 component, 1 element, 2 decimal mark, 3 release,
        # 4 reserved, 5 segment terminator.
        self.cfg.component_separator = chars[0]
        self.cfg.element_separator = chars[1]
        self.cfg.release_character = chars[3]
        self.cfg.segment_terminator = chars[5]
        logger.info("UNA override applied", extra={"delimiters": self.cfg.__dict__})

    def _split(self, raw: str, separator: str, unescape: bool) -> List[str]:
        """Split on separator, treating release-character pairs as literal data."""
        parts: List[str] = []
        current: List[str] = []
        i = 0
        while i < len(raw):
            ch = raw[i]
            if ch == self.cfg.release_character and i + 1 < len(raw):
                # Keep the escape pair until the innermost split so that an
                # escaped ":" is not mistaken for a component separator later.
                current.append(raw[i + 1] if unescape else raw[i:i + 2])
                i += 2
            elif ch == separator:
                parts.append("".join(current))
                current = []
                i += 1
            else:
                current.append(ch)
                i += 1
        parts.append("".join(current))
        return parts

    def _split_elements(self, raw: str) -> List[str]:
        """Split a segment payload into data elements (escapes preserved)."""
        return self._split(raw, self.cfg.element_separator, unescape=False)

    def _split_composite(self, raw: str) -> List[str]:
        """Split a data element into components and resolve release characters."""
        if not raw:
            return []
        return self._split(raw, self.cfg.component_separator, unescape=True)

    def _component(self, elements: List[str], index: int, position: int = 0) -> Optional[str]:
        """Return one component of elements[index], or None when absent or empty."""
        if index >= len(elements):
            return None
        parts = self._split_composite(elements[index])
        return parts[position] if position < len(parts) and parts[position] else None

    def stream_segments(self, raw_edi: str) -> Generator[str, None, None]:
        """Yield non-empty segments, applying UNA before any splitting."""
        cleaned = raw_edi.replace("\r\n", "\n").replace("\r", "\n").strip()
        # UNA may redefine the terminator, and its own service characters must
        # not be treated as delimiters, so consume it first.
        if self._una_re.match(cleaned):
            self._apply_una(cleaned)
            cleaned = cleaned[9:]
        for seg in self._split(cleaned, self.cfg.segment_terminator, unescape=False):
            seg = seg.strip()
            if seg:
                yield seg

    def map_to_dict(self, raw_edi: str) -> Dict[str, Any]:
        bl_data: Dict[str, Any] = {
            "interchange_id": None,
            "bl_number": None,
            "message_type": None,
            "shipper": None,
            "consignee": None,
            "notify_party": None,
            "vessel_voyage": None,
            "load_port": None,
            "discharge_port": None,
            "containers": [],
            "compliance_flags": {"vgm_present": False, "imdg_hazard": False, "customs_ready": False},
        }
        flags = bl_data["compliance_flags"]
        current_container: Optional[Dict[str, Any]] = None

        for segment_count, segment in enumerate(self.stream_segments(raw_edi), start=1):
            if segment_count > self.cfg.max_segments:
                raise EDIParseError(f"Exceeded max segment limit ({self.cfg.max_segments})")
            # The tag is everything before the first element separator, so
            # elements[0] is the first data element.
            tag, _, payload = segment.partition(self.cfg.element_separator)
            elements = self._split_elements(payload)

            if tag == "UNB":
                # Element 4 is the interchange control reference (DE 0020);
                # element 1 is the sender identification.
                bl_data["interchange_id"] = self._component(elements, 4)
            elif tag == "BGM":
                # C002 document name code (705 = bill of lading), then C106 document number.
                bl_data["message_type"] = self._component(elements, 0)
                bl_data["bl_number"] = self._component(elements, 1)
            elif tag == "NAD":
                # DE 3035 qualifiers: CZ=consignor/shipper, CN=consignee, NI=notify party.
                roles = {"CZ": "shipper", "CN": "consignee", "NI": "notify_party"}
                role = roles.get(self._component(elements, 0) or "")
                # Prefer the party name (C080, element 3); fall back to the
                # coded party identifier (C082, element 1).
                if role:
                    bl_data[role] = self._component(elements, 3) or self._component(elements, 1)
            elif tag == "LOC":
                loc_qualifier = self._component(elements, 0)
                loc_code = self._component(elements, 1)
                # DE 3227: 9 = place/port of loading, 11 = place/port of discharge.
                # 5 (place of departure) and 61 (next port of call) are kept for
                # carriers that send voyage-style qualifiers.
                if loc_qualifier in ("9", "5"):
                    bl_data["load_port"] = loc_code
                elif loc_qualifier in ("11", "61"):
                    bl_data["discharge_port"] = loc_code
            elif tag == "CNI":
                # This mapping treats CNI as the record boundary. In the standard
                # directories CNI carries consignment information and equipment
                # IDs travel in EQD, so confirm against the carrier's guide.
                if current_container:
                    bl_data["containers"].append(current_container)
                current_container = {
                    "container_number": self._component(elements, 1),
                    "seal_numbers": [],
                    "packages": 0,
                    "weight_kg": 0.0,
                    "commodity": None,
                    "hazard_class": None,
                }
            elif tag == "GID" and current_container:
                # C213 is "number of packages:package type", e.g. "10:CT".
                count_text = self._component(elements, 1) or ""
                current_container["packages"] = int(count_text) if count_text.isdigit() else 0
                current_container["commodity"] = self._component(elements, 2)
            elif tag == "MEA" and current_container:
                purpose = self._component(elements, 0)
                attribute = self._component(elements, 1)
                # C174 is "unit:value", e.g. "KGM:12500"; the value is component 1.
                value = self._component(elements, 2, 1)
                # Assumption: a verified gross mass is marked with the measured
                # attribute code VGM; a plain WT weight is not treated as VGM.
                is_vgm = attribute == "VGM"
                if (purpose == "WT" or is_vgm) and value is not None:
                    try:
                        current_container["weight_kg"] = float(value)  # unit assumed to be KGM
                    except ValueError:
                        logger.warning("Invalid weight format in MEA segment", extra={"segment": segment})
                    else:
                        if is_vgm:
                            flags["vgm_present"] = True
            elif tag == "DGS" and current_container:
                hazard = self._component(elements, 1)
                if hazard:
                    current_container["hazard_class"] = hazard
                    flags["imdg_hazard"] = True

        # Flush the final container after the interchange ends (UNZ is the
        # interchange trailer, not a consignment boundary).
        if current_container:
            bl_data["containers"].append(current_container)

        # Regulatory gating
        if not flags["vgm_present"]:
            logger.warning("SOLAS VGM weight missing; stowage planning blocked", extra={"bl": bl_data["bl_number"]})
        if flags["imdg_hazard"]:
            logger.info("IMDG hazardous cargo detected; segregation rules apply", extra={"bl": bl_data["bl_number"]})
        logger.info("BL mapping complete", extra={"bl_number": bl_data["bl_number"], "container_count": len(bl_data["containers"])})
        return bl_data
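The format drift described earlier, where carriers inject non-standard qualifiers, can be absorbed by a small lookup step before the role is assigned. The sketch below is one possible approach under stated assumptions: the `CARRIER_ALIASES` entries, the carrier names, and the `normalise_party_qualifier` helper are invented for illustration and would need to be filled in from each carrier's implementation guide. Only the CZ, CN, and NI codes come from the mapping above.

```python
from typing import Dict, Optional, Tuple

# Canonical party roles keyed by the qualifiers used in this article.
CANONICAL_PARTY_ROLES: Dict[str, str] = {
    "CZ": "shipper",
    "CN": "consignee",
    "NI": "notify_party",
}

# Hypothetical carrier-specific deviations, keyed by (carrier, qualifier).
# These entries are made up for the example, not taken from any real guide.
CARRIER_ALIASES: Dict[Tuple[str, str], str] = {
    ("CARRIER_A", "SH"): "CZ",
    ("CARRIER_B", "N1"): "NI",
}


def normalise_party_qualifier(carrier: str, qualifier: str) -> Optional[str]:
    """Map a raw NAD qualifier to a canonical dict key, or None if unknown."""
    code = qualifier.strip().upper()
    # Translate a known carrier deviation to the standard code first.
    code = CARRIER_ALIASES.get((carrier, code), code)
    return CANONICAL_PARTY_ROLES.get(code)


print(normalise_party_qualifier("CARRIER_A", "SH"))  # shipper
print(normalise_party_qualifier("CARRIER_B", "XX"))  # None
```

Returning None for an unknown qualifier keeps the deviation visible, so the caller can log or quarantine the segment instead of guessing a role.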
Regulatory Gating & Compliance Enforcement
Maritime data pipelines must enforce regulatory constraints at the mapping layer. SOLAS requires a verified gross mass (VGM) for each packed container before it is loaded, while the IMDG Code requires proper hazard class mapping from DGS and any associated MEA segments. Customs filing regimes (for example CUSCAR messages, AMS, and ENS) require strict HS code and consignee validation. The mapping dict surfaces compliance flags explicitly, allowing downstream systems to reject or quarantine non-conforming records before they trigger operational bottlenecks. For environments requiring Maritime Security Boundary Setup, the parser should store a cryptographic hash of the raw payload alongside the mapped dict so that tampering is detectable during port call handoffs; non-repudiation additionally requires a digital signature over that hash.
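Hashing the raw payload alongside the mapped dict might look like the following. This is a minimal sketch using only the standard library: the `seal_record` name and the envelope keys are hypothetical, and the raw interchange is assumed to be available as a decoded string. A bare SHA-256 digest gives tamper evidence only; signing the digest is out of scope here.

```python
import hashlib
import json
from typing import Any, Dict


def seal_record(raw_edi: str, mapped: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a mapped record with digests of the raw and mapped forms."""
    raw_digest = hashlib.sha256(raw_edi.encode("utf-8")).hexdigest()
    # Serialise with sorted keys and fixed separators so the same content
    # always yields the same digest, regardless of dict insertion order.
    canonical = json.dumps(mapped, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    mapped_digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "payload": mapped,
        "raw_sha256": raw_digest,
        "mapped_sha256": mapped_digest,
    }


sealed = seal_record("BGM+705+BL123456+9'", {"bl_number": "BL123456", "containers": []})
print(sealed["raw_sha256"][:16], sealed["mapped_sha256"][:16])
```

A receiver can recompute both digests at handoff and compare them with the stored values before trusting the record.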
Operational Deployment Notes
In production, carrier EDI streams rarely conform to textbook specifications. Route malformed or partially mapped records to a quarantine queue rather than failing the entire interchange. The logging configuration above emits JSON-shaped lines that can feed ELK stacks or Splunk for real-time anomaly detection, with two caveats: values passed through the extra argument appear only if the format string or a custom formatter renders them, and messages containing double quotes are not escaped. When scaling across multiple terminals, decouple the parsing step from the validation step to maintain low-latency ingestion while preserving strict audit trails. Reference the official UN/EDIFACT syntax rules (ISO 9735) and Python’s logging documentation to align your implementation with international standards and enterprise observability practices.
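Quarantine routing with parsing decoupled from validation can be sketched as a function that accepts both steps as callables. Everything named here is an assumption for illustration: `route_interchanges`, `toy_parse`, and `toy_validate` are hypothetical, the toy functions stand in for a real mapper and rule set, and in-memory lists stand in for real queues.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def route_interchanges(
    raw_messages: Iterable[str],
    parse: Callable[[str], Record],
    validate: Callable[[Record], List[str]],
) -> Tuple[List[Record], List[Record]]:
    """Split messages into accepted records and quarantined envelopes."""
    accepted: List[Record] = []
    quarantined: List[Record] = []
    for raw in raw_messages:
        try:
            record = parse(raw)
        except Exception as exc:
            # A parse failure quarantines this message only; the batch continues.
            quarantined.append({"raw": raw, "reasons": [f"parse_error:{exc}"]})
            continue
        reasons = validate(record)
        if reasons:
            quarantined.append({"raw": raw, "record": record, "reasons": reasons})
        else:
            accepted.append(record)
    return accepted, quarantined


def toy_parse(raw: str) -> Record:
    """Stand-in parser: expects 'UNB+<bl number>'."""
    if not raw.startswith("UNB"):
        raise ValueError("missing UNB header")
    return {"bl_number": raw.split("+")[-1] or None}


def toy_validate(record: Record) -> List[str]:
    """Stand-in validator: only checks that a B/L number is present."""
    return [] if record["bl_number"] else ["missing:bl_number"]


ok, held = route_interchanges(["UNB+BL1", "garbage", "UNB+"], toy_parse, toy_validate)
print(len(ok), len(held))  # 1 2
```

Keeping the raw message in each quarantined envelope means an operator can replay it after the carrier-specific rule is fixed.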