Connecting to public AIS feeds with Python asyncio

For port authorities, shipping operations centers, and Python automation engineers, moving from batch-processed vessel position logs to real-time telemetry requires a resilient asynchronous consumer. Holding sub-second latency against a live !AIVDM stream takes three things: non-blocking socket I/O, deterministic NMEA parsing, and explicit backpressure that survives the packet loss, multipart fragmentation, and format drift inherent to coastal and satellite AIS aggregators. This task page specifies the connector that everything downstream in the tracking pipeline depends on: the reconnection contract, the reassembly buffer, and the bounded queue.

Architecture Alignment

This connector is the transport tier of AIS Data Stream Integration, the ingestion boundary that turns raw AIS broadcasts into a validated vessel-telemetry schema for the whole Container Tracking & AIS Event Synchronization domain. Its single responsibility is transport hygiene: hold a live TCP or UDP feed open, reassemble multipart NMEA 0183 sentences, verify checksums, and hand clean payloads to a bounded queue — nothing more. Six-bit de-armouring, ITU-R M.1371 field extraction, and Pydantic modelling happen in the parent’s decode stage; correlation with terminal gate events happens later in Container Status Mapping Rules. Keeping this boundary thin is what lets the async event loop absorb burst traffic without blocking, and it mirrors the same non-blocking discipline the landside Terminal API Polling Strategies layer applies to REST endpoints.

Prerequisites & Environment Setup

Python 3.11+ — for asyncio.TaskGroup, asyncio.timeout(), and precise StreamReader semantics.
structlog — mandatory for JSON audit logging; bare print() and stdlib logging string formatting are not acceptable for maritime audit trails.
pyais (optional, downstream) — six-bit payload decoding to ITU-R M.1371 fields; this connector deliberately stops short of decoding.
A reachable public AIS endpoint. Many coastal administrations expose an unauthenticated TCP feed of !AIVDM sentences; configure it out of band rather than hard-coding it.

python -m venv .venv && . .venv/bin/activate
pip install "structlog>=24.1" "pyais>=2.6"
export AIS_FEED_HOST="153.44.253.27"   # your coastal/aggregator TCP feed
export AIS_FEED_PORT="5631"
export AIS_QUEUE_MAXSIZE="10000"        # bounded backpressure budget

from __future__ import annotations

import os

import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ]
)
log = structlog.get_logger("ais.ingest")

FEED_HOST: str = os.environ["AIS_FEED_HOST"]
FEED_PORT: int = int(os.environ["AIS_FEED_PORT"])
QUEUE_MAXSIZE: int = int(os.environ.get("AIS_QUEUE_MAXSIZE", "10000"))

Step-by-step Implementation

Each step is runnable in isolation and composes into a single long-lived AISFeedConsumer. The wire format is NMEA 0183 sentence framing (standardised internationally as IEC 61162-1) carrying ITU-R M.1371 message content; every field decision below is anchored to that contract.

Reconnect with capped exponential backoff and jitter. A public feed will drop the socket during RF fade, aggregator restarts, or network partitions. Never reconnect in a tight loop — cap the exponential term and add real jitter so a fleet of consumers does not synchronise a thundering-herd reconnect.

import asyncio
import random


async def open_feed(
    host: str, port: int, *, max_retries: int = 10, base_delay: float = 2.0
) -> tuple[asyncio.StreamReader, asyncio.StreamWriter]:
    for attempt in range(max_retries):
        try:
            reader, writer = await asyncio.open_connection(host, port)
            log.info("tcp_connection_established", host=host, port=port)
            return reader, writer
        except OSError as exc:
            delay = min(base_delay * (2 ** attempt), 60.0) + random.uniform(0, 1.0)
            log.warning("tcp_connect_failed", attempt=attempt, delay=round(delay, 2), error=str(exc))
            await asyncio.sleep(delay)
    raise ConnectionError("max reconnection attempts exhausted")
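
To make the retry budget concrete, the sketch below computes only the deterministic part of the delays that open_feed sleeps for, leaving jitter out so the numbers are reproducible. backoff_schedule is a hypothetical helper introduced for illustration; it is not part of the connector.

```python
def backoff_schedule(
    max_retries: int = 10, base_delay: float = 2.0, cap: float = 60.0
) -> list[float]:
    """Return the capped exponential delay, before jitter, for each attempt."""
    return [min(base_delay * (2 ** attempt), cap) for attempt in range(max_retries)]


# Up to 1 s of random jitter is added on top of each value at runtime.
schedule = backoff_schedule()
total_wait = sum(schedule)
```

With the defaults shown in open_feed, the cap takes effect from the sixth attempt, and the consumer waits about six minutes in total (362 s plus jitter) before raising ConnectionError.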

Read bounded chunks and split whole NMEA lines. Sockets deliver arbitrary byte boundaries; a single read() can split a sentence mid-line. Buffer partial input and only dispatch complete \n-terminated lines. Wrap the read in asyncio.timeout() so a silent feed is treated as a heartbeat gap, not a permanent hang.

async def read_lines(reader: asyncio.StreamReader):
    buffer = bytearray()
    while True:
        try:
            async with asyncio.timeout(30.0):
                chunk = await reader.read(4096)
        except TimeoutError:
            log.debug("read_timeout_heartbeat")
            continue
        if not chunk:
            log.warning("stream_eof")
            return
        buffer.extend(chunk)
        while b"\n" in buffer:
            raw, _, rest = buffer.partition(b"\n")
            buffer = bytearray(rest)
            line = raw.decode("ascii", errors="replace").strip()
            if line:
                yield line
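
The buffering rule can be checked without a socket. The following is a minimal sketch, assuming the same newline framing as read_lines; split_lines is a hypothetical pure-function variant written for testing, and it trims the buffer in place instead of rebuilding it.

```python
def split_lines(buffer: bytearray, chunk: bytes) -> list[str]:
    """Append chunk to buffer and return every complete line it now holds.

    Any trailing partial line stays in buffer for the next call.
    """
    buffer.extend(chunk)
    lines: list[str] = []
    while True:
        idx = buffer.find(b"\n")
        if idx == -1:
            return lines
        raw = bytes(buffer[:idx])
        del buffer[: idx + 1]  # drop the line and its terminator in place
        text = raw.decode("ascii", errors="replace").strip()
        if text:
            lines.append(text)


# One sentence split across two reads, followed by the start of the next one.
buf = bytearray()
first = split_lines(buf, b"!AIVDM,1,1,,B,15M67FC000G?ufb")
second = split_lines(buf, b"E`FepT@3n00Sa,0*5C\r\n!AIVDM,2,")
```

The first call yields nothing because no terminator has arrived yet; the second yields the joined sentence and leaves the unfinished fragment in buf.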

Verify the NMEA checksum. XOR every character between the leading !/$ delimiter and the *, then compare against the trailing two-digit hex value (NMEA 0183 §checksum). A mismatch means corrupt transport — drop before any further parsing.

def checksum_ok(sentence: str) -> bool:
    if "*" not in sentence:
        return False
    body, _, chk = sentence.partition("*")
    payload = body[1:]  # strip leading '!' or '$'
    computed = 0
    for ch in payload:
        computed ^= ord(ch)
    try:
        return computed == int(chk[:2], 16)
    except (ValueError, IndexError):
        return False
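
When building fixtures or replaying captured traffic it helps to compute the trailer instead of only verifying it. This is a small sketch of the same XOR rule run in the other direction; nmea_checksum and with_checksum are hypothetical helper names, not part of the connector.

```python
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR the characters between the '!'/'$' and the '*', as two hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def with_checksum(body: str, lead: str = "!") -> str:
    """Frame body as a complete sentence with its checksum trailer."""
    return f"{lead}{body}*{nmea_checksum(body)}"


sentence = with_checksum("AIVDM,1,1,,B,15M67FC000G?ufbE`FepT@3n00Sa,0")
```

Because every character of the body is covered, changing a single field such as the channel letter changes the trailer, which is why a fixture cannot be edited by hand without recomputing it.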

Reassemble multipart AIVDM fragments. A Type 5 static report or a long addressed message spans two !AIVDM sentences. The fragment count and number are fields 1 and 2; the sequential message ID is field 3. That sequence id is a single 0–9 digit shared across vessels and reused on both the A and B radio channels, so the reassembly key must combine the sequence id with the channel (field 4) or fragments from different vessels collide.

from collections import defaultdict


class Reassembler:
    def __init__(self) -> None:
        self._buffer: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def feed(self, sentence: str) -> str | None:
        parts = sentence.split(",")
        if len(parts) < 7 or not parts[0].endswith(("AIVDM", "AIVDO")):
            return None
        try:
            total, number = int(parts[1]), int(parts[2])
        except ValueError:
            return None  # malformed fragment counters: drop instead of crashing the loop
        seq_id, channel, payload = parts[3], parts[4], parts[5]
        if total == 1:
            return payload
        key = f"{seq_id}:{channel}"
        self._buffer[key].append((number, payload))
        if len(self._buffer[key]) < total:
            return None  # await remaining fragments
        ordered = sorted(self._buffer.pop(key), key=lambda x: x[0])
        return "".join(p for _, p in ordered)

Enforce bounded-queue backpressure and emit an audited event. An unbounded queue is the primary cause of OOM kills: one busy terminal can push 50,000+ sentences per minute. Cap the queue and, on overflow, drop rather than block the reader. The policy goal is to shed the lowest-value traffic (repeated Type 5 voyage data) first; the Enqueuer shown here is the simplest form of that policy and drops whichever payload arrives while the queue is full, whatever its type. Every enqueue and every drop is a structured JSON audit line carrying a correlation id.

import itertools
import time


class Enqueuer:
    def __init__(self, queue: asyncio.Queue[dict], source: str) -> None:
        self._queue = queue
        self._source = source
        self._ids = itertools.count(1)

    def submit(self, payload: str) -> None:
        corr_id = f"ais-{next(self._ids):08d}"
        event = {
            "corr_id": corr_id,
            "payload": payload,               # six-bit armoured, decoded downstream
            "ts_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "source": self._source,
        }
        try:
            self._queue.put_nowait(event)
            log.debug("payload_enqueued", corr_id=corr_id, qsize=self._queue.qsize())
        except asyncio.QueueFull:
            log.critical("backpressure_overflow", corr_id=corr_id,
                         action="dropped_low_priority", qsize=self._queue.qsize())
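
Shedding static data before position reports requires knowing the message type, and that can be read from the first character of the armoured payload without a full decode: the character maps to a six-bit value, and the first six bits of every AIS message are its type. The sketch below applies that as a soft-limit admission rule. admit, STATIC_TYPES, and the 0.8 soft limit are assumptions introduced for illustration; treating Type 24 as low-value alongside Type 5 is also an assumption.

```python
import asyncio

STATIC_TYPES = frozenset({5, 24})  # static/voyage reports; assumed lowest value here


def message_type(payload: str) -> int:
    """Decode the AIS message type from the first armoured payload character."""
    value = ord(payload[0]) - 48
    if value > 40:
        value -= 8
    return value


def admit(queue: asyncio.Queue, payload: str, soft_limit: float = 0.8) -> bool:
    """Enqueue payload unless the queue is full, or it is static data above the soft limit.

    Returns True when the payload was enqueued and False when it was shed.
    """
    if not payload:
        return False
    above_soft = queue.maxsize > 0 and queue.qsize() >= soft_limit * queue.maxsize
    if above_soft and message_type(payload) in STATIC_TYPES:
        return False  # shed low-value traffic first, keeping headroom for positions
    try:
        queue.put_nowait(payload)
        return True
    except asyncio.QueueFull:
        return False  # hard limit reached: even position reports are shed
```

The soft limit reserves the top of the queue for position reports, so static traffic starts being refused before the hard cap forces indiscriminate drops. Audit logging is left out of the sketch and would wrap both False branches.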

Compose the lifecycle loop. Reconnect, drain lines through checksum → reassembly → enqueue, and always close the writer on teardown so a failover does not leak sockets. A separate consumer task drains the queue into the decode stage, keeping database and API calls out of the ingestion hot path.

class AISFeedConsumer:
    def __init__(self, host: str, port: int, maxsize: int) -> None:
        self._host, self._port = host, port
        self.queue: asyncio.Queue[dict] = asyncio.Queue(maxsize=maxsize)
        self._reassembler = Reassembler()
        self._enqueuer = Enqueuer(self.queue, f"{host}:{port}")
        self._running = False

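    async def run(self) -> None:
        self._running = True
        while self._running:
            try:
                reader, writer = await open_feed(self._host, self._port)
            except ConnectionError:
                # open_feed exhausted its retry budget: stop instead of spinning.
                log.critical("ingestion_halted")
                return
            try:
                async for line in read_lines(reader):
                    if not checksum_ok(line):
                        log.warning("checksum_failed", raw=line[:48])
                        continue
                    payload = self._reassembler.feed(line)
                    if payload is not None:
                        self._enqueuer.submit(payload)
            except OSError as exc:
                # A mid-stream reset is a ConnectionError subclass; reconnect, do not halt.
                log.warning("stream_error", error=str(exc))
            finally:
                writer.close()
                try:
                    await writer.wait_closed()
                except OSError:
                    pass  # the peer already tore the socket down
            if self._running:
                await asyncio.sleep(5)  # brief pause before the next connection attempt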

    def stop(self) -> None:
        self._running = False

Edge Cases & Carrier Deviations

Sequence-id wrap and channel collision. The 0–9 sequence id wraps quickly on a busy receiver, so two interleaved multipart messages can share an id. Keying reassembly on seq_id alone silently splices fragments from different vessels; always combine it with the A/B channel, and evict stale partial groups on a bounded timeout so a lost second fragment cannot leak memory forever.
MMSI collisions and non-vessel identifiers. Base stations (MMSI prefix 00), aids-to-navigation (99), and SART/MOB test beacons broadcast on the same feed. They are structurally valid AIS but are not vessels; tag or filter them here so they never reach vessel-track correlation, and treat a duplicate MMSI on two distant positions as a spoof signal for the downstream guard, not a teleport.
Format drift from legacy Class B transponders. Older units emit bare \n terminators, omit the trailing *XX, or pad fields inconsistently. The checksum_ok guard drops the un-checksummed frames; do not “repair” them into the stream.
UDP multicast packet loss. On a UDP feed there is no reconnection to lean on — a dropped fragment simply never arrives. Bound the reassembly buffer with a TTL so the orphaned first half of a multipart message is evicted rather than accumulated.
Feed authorisation scope. A public position feed must never be trusted to write an operational decision on its own; that boundary is enforced by Maritime Security Boundary Setup, and a berth-approach transition only advances state when it agrees with a terminal event.

Verification & Testing

Assert the connector’s two hardest behaviours — checksum rejection and correct multipart reassembly — against fixed fixtures, and confirm the audit log shape. The fixtures below are canonical ITU-R M.1371 Type 1 and a two-part Type 5 sentence.

import asyncio  # used by the backpressure test

import pytest


# A valid single-sentence Type 1 position report and a corrupted copy.
# The channel letter is covered by the checksum, so it must match the *XX trailer.
GOOD = "!AIVDM,1,1,,B,15M67FC000G?ufbE`FepT@3n00Sa,0*5C"
BAD = "!AIVDM,1,1,,B,15M67FC000G?ufbE`FepT@3n00Sa,0*00"

# A two-fragment Type 5 static/voyage message (same channel A, seq id 1).
FRAG_1 = "!AIVDM,2,1,1,A,55?MbV02;H;s<HtKR20EHE:0@T4@Dn2222222216L961O5Gf0NSQEp6ClRp8,0*1C"
FRAG_2 = "!AIVDM,2,2,1,A,88888888880,2*25"


def test_checksum_gate_rejects_corruption() -> None:
    assert checksum_ok(GOOD) is True
    assert checksum_ok(BAD) is False


def test_multipart_reassembly_joins_in_order() -> None:
    r = Reassembler()
    assert r.feed(FRAG_1) is None          # buffered, awaiting fragment 2
    joined = r.feed(FRAG_2)
    assert joined is not None
    assert joined.startswith("55?MbV02")    # fragment 1 payload leads
    assert joined.endswith("88888888880")   # fragment 2 payload trails


def test_backpressure_drops_when_full() -> None:
    q: asyncio.Queue[dict] = asyncio.Queue(maxsize=1)
    enq = Enqueuer(q, "test:0")
    enq.submit("first")
    enq.submit("second")   # overflow -> dropped, logged critical
    assert q.qsize() == 1

A healthy run emits one JSON line per accepted payload and a critical line on every shed frame — the exact shape auditors and the structlog pipeline consume:

{"event": "payload_enqueued", "corr_id": "ais-00000042", "qsize": 118, "level": "debug", "timestamp": "2026-07-03T09:14:02Z"}
{"event": "backpressure_overflow", "corr_id": "ais-00000043", "action": "dropped_low_priority", "qsize": 10000, "level": "critical", "timestamp": "2026-07-03T09:14:02Z"}

Frequently Asked Questions

Should I reassemble multipart AIVDM sentences before or after the checksum check?

Checksum first, per fragment. Each !AIVDM sentence carries its own NMEA checksum over its own transport, so a corrupt fragment must be dropped before it can poison a reassembly group. If you defer the check until after joining, one bad fragment discards a message that a resend of just that fragment could have completed, and — worse — a fragment whose sequence id was corrupted can be filed under the wrong key. Validate every line as it arrives, then feed only checksum-clean fragments to the reassembler.

Why a bounded queue with drops instead of an unbounded queue that never loses data?

Because an unbounded queue converts a transient burst into a permanent outage. A busy terminal can emit 50,000+ sentences a minute; if the decode stage stalls for even a few seconds, an unbounded queue grows until the OS OOM-killer terminates the whole process and you lose everything, not just the overflow. A bounded queue makes the failure mode explicit and survivable: you shed the lowest-value traffic (repeated Type 5 voyage data), log every drop with a correlation id, and keep live position reports flowing. Losing stale static data is always cheaper than losing the process.

Do I need pyais inside this connector?

No — and keeping it out is deliberate. This tier’s job is transport hygiene: hold the socket, reassemble, checksum, and enqueue the raw six-bit payload. Six-bit de-armouring and ITU-R M.1371 field extraction belong in the decode stage documented under AIS Data Stream Integration, where pyais or a hand-rolled bit reader projects the payload into typed Pydantic models. A thin connector stays fast, is trivial to test against raw fixtures, and lets you swap the decoder without touching the network loop.

Related

AIS Data Stream Integration — the ingestion boundary and decode stage this connector feeds.
Container Status Mapping Rules — how validated positions are fused with terminal gate events into container states.
Terminal API Polling Strategies — the landside REST counterpart to this real-time feed.
Threshold Tuning for Alerts — adaptive alerting on the gap between AIS-derived and terminal-confirmed state.
Maritime Security Boundary Setup — the ISPS-aligned trust boundary that governs what a public feed is allowed to trigger.
