Architecture
Spegel can run as a stateless application by exploiting the fact that an image pulled by a node is not immediately garbage collected. Spegel is deployed as a Daemonset on each node which acts as both the registry and mirror. Each instance is reachable both locally through a host port and a Service. This enables Containerd to be configured to use the localhost interface as a registry mirror and for Spegel instances to forward requests to each other.
Images are composed of multiple layers which are stored as individual files on the node disk. Each layer has a digest which is its identifier. Every node advertises the digests which are stored locally on disk. Kademlia is used to enable a distributed advertisement and lookup of digests. An image pull consists of multiple HTTP requests with one request per digest. The request is first sent to Spegel when an image is pulled if it is configured to act as the mirror for the registry. Spegel will lookup the digest within the cluster to see if any node has advertised that they have it. If a node is found the request will be forwarded to that Spegel instance which will serve the file with the specified digest. If a node is not found a 404 response will be returned and Containerd will fallback to using the actual remote registry.
In its core Spegel is a pull only OCI registry which runs locally on every Node in the Kubernetes cluster. Containerd is configured to use the local registry as a mirror, which would serve the image from within the cluster or from the source registry.
Flow Diagrams
The flow diagrams provide a comprehensive set of diagrams explaining Spegel’s architecture, flows, and operations.
High-Level Cluster Architecture
Shows how Spegel pods form a P2P network within the cluster. Containerd interacts with Spegel for image pulls and handles fallback to external registry when needed.
graph TB subgraph "External" ER["External Registry"] end subgraph "Kubernetes Cluster" subgraph "Node 1" SP1["Spegel Pod"] CD1["Containerd"] SP1 <-->|interacts| CD1 CD1 -->|fallback| ER end subgraph "Node 2" SP2["Spegel Pod"] CD2["Containerd"] SP2 <-->|interacts| CD2 CD2 -->|fallback| ER end subgraph "Node 3" SP3["Spegel Pod"] CD3["Containerd"] SP3 <-->|interacts| CD3 CD3 -->|fallback| ER end SP1 <-->|P2P Network| SP2 SP2 <-->|P2P Network| SP3 SP3 <-->|P2P Network| SP1 end
Pod Component Architecture
Details the internal components of a Spegel pod and their relationships, showing how the registry service, P2P components, and state management interact with each other and with containerd.
graph TB subgraph "Spegel Pod" subgraph "Registry Service" RS[HTTP Server /v2/] RH[Request Handler] RS --> RH end subgraph "P2P Components" P2P[P2P Router] DHT[DHT Provider] BS[Bootstrapper] P2P --> DHT BS --> P2P end subgraph "State Management" ST[State Tracker] MT[Metrics] ST --> MT end CD[Containerd Client] RH --> P2P ST --> P2P CD --> ST end subgraph "Node Components" CDD[Containerd Daemon] CS[Content Store] CDD --> CS end CD --> CDD
Image Pull Flow
Shows the sequence of operations during an image pull request, demonstrating both successful peer pulls and fallback to external registry.
sequenceDiagram participant CD as Containerd participant SR as Spegel Registry participant P2P as P2P Router participant PR as Peer Registry participant ER as External Registry Note over SR,P2P: 20ms default resolve timeout Note over SR,P2P: 3 default resolve retries CD->>SR: GET /v2/{name}/manifests/{ref} SR->>P2P: Resolve(key, allowSelf, retries) alt Peer Found P2P-->>SR: Return Peer Address SR->>PR: Request Content PR-->>SR: Stream Content SR-->>CD: Return Content CD->>CS: Store Content else No Peers Available (within 20ms) SR-->>CD: 404 Not Found CD->>ER: Request from External ER-->>CD: Return Content CD->>CS: Store Content end
P2P Network Formation
Shows how nodes discover each other and form the P2P network through leader election and peer sharing.
sequenceDiagram participant N1 as Node 1 participant N2 as Node 2 participant N3 as Node 3 participant LE as Leader Election Note over N1,LE: 10s lease duration Note over N1,LE: 5s renew deadline Note over N1,LE: 2s retry period N1->>LE: Participate in Election N2->>LE: Participate in Election N3->>LE: Participate in Election LE->>N1: Elected Leader N2->>N1: Discover Leader N3->>N1: Discover Leader N1->>N2: Share Peer List N1->>N3: Share Peer List N2->>N3: Establish P2P Connection Note over N1,N3: P2P Network Formed
State Management and Content Advertisement
Shows how content availability is maintained and advertised in the P2P network, including periodic refresh cycles and event-driven updates.
sequenceDiagram participant ST as State Tracker participant CD as Containerd participant P2P as P2P Router participant DHT as DHT Network participant MT as Metrics Note over ST,DHT: Content TTL: 10 minutes Note over ST,DHT: Refresh: Every 9 minutes loop Every 9 minutes ST->>CD: List Images CD-->>ST: Image List loop For each image ST->>P2P: Advertise(image_keys) P2P->>DHT: Provide(keys) end ST->>MT: Update Metrics end CD-->>ST: Image Event (Create/Update/Delete) ST->>P2P: Update Advertisement ST->>MT: Update Metrics
Content Resolution Process
Shows how content is located and retrieved from peers in the network, including peer selection and retry mechanisms.
sequenceDiagram participant SR as Spegel Registry participant P2P as P2P Router participant DHT as DHT Network participant PR1 as Peer 1 participant PR2 as Peer 2 SR->>P2P: Resolve(content_key) P2P->>DHT: FindProviders(key) par Parallel Resolution DHT-->>P2P: Found Peer 1 DHT-->>P2P: Found Peer 2 end P2P->>SR: Return First Available Peer Note over SR,PR2: Default 20ms timeout Note over SR,PR2: 3 retry attempts alt Try Peer 1 SR->>PR1: Request Content PR1-->>SR: Stream Content else Peer 1 Fails SR->>PR2: Request Content PR2-->>SR: Stream Content end
Data Flow Paths
Shows the content paths and system control flows, including peer transfers and fallback mechanisms.
graph LR subgraph "Content Paths" CD[Containerd] SP[Spegel] P[Peers] ER[External Registry] CS[Content Store] CD -->|Request| SP SP -->|Check| P P -->|Content| SP SP -->|Return| CD CD -->|Store| CS SP -->|404| CD CD -->|Fallback| ER end subgraph "P2P Operations" P2P[P2P Network] DHT[DHT] ST[State Tracker] P2P -->|Advertise| DHT DHT -->|Discover| P2P ST -->|Update| P2P end
Failure Handling
Shows how different types of failures are handled in the system.
sequenceDiagram participant CD as Containerd participant SR as Spegel Registry participant P2P as P2P Router participant PR as Peer participant ER as External Registry Note over SR,ER: Failure Scenarios alt Peer Not Found CD->>SR: Request Content SR->>P2P: Resolve(key) P2P--xSR: No Peers Available SR-->>CD: 404 Not Found CD->>ER: Fallback Request end alt Peer Connection Failed SR->>PR: Request Content PR--xSR: Connection Failed SR->>P2P: Resolve(key) Retry P2P-->>SR: Alternative Peer end alt Content Corrupted SR->>PR: Request Content PR-->>SR: Stream Content SR--xCD: Verification Failed CD->>ER: Fallback Request end
Metrics Collection
Shows how metrics are collected and organized across the system components.
graph TB subgraph "Metrics Sources" RQ[Registry Requests] P2P[P2P Operations] ST[State Changes] end subgraph "Metric Types" CT[Counters] HT[Histograms] GT[Gauges] end subgraph "Prometheus Metrics" MR[mirror_requests_total] RD[resolve_duration_seconds] AI[advertised_images] AK[advertised_keys] RL[request_latency] IF[requests_inflight] end RQ --> CT RQ --> HT P2P --> HT P2P --> GT ST --> GT CT --> MR HT --> RD HT --> RL GT --> AI GT --> AK GT --> IF