Scientific Information Flow Tool - ISU thesis project

sift — Scientific Information Flow Tool

Hybrid master-peer file synchronization designed for scientific data distribution. The master node holds the single authoritative version of the data; peer nodes exchange file blocks directly with each other to reduce master load. Only the blocks that actually changed are transferred on each update.

How it works

flowchart TD
    subgraph M["Master Node (bind_addr)"]
        direction TB
        WD["watched_dir/\ndataset.h5  meta.json  ..."]
        MAN["Manifest version N\nfile -> ordered chunk list"]
        STORE["Chunk store\nBLAKE3 content-addressed"]
        WD -->|"FastCDC chunking + BLAKE3 hashing\n200 ms debounce on FS events"| MAN
        MAN --> STORE
    end

    subgraph PA["Peer A"]
        LA["local_dir/"]
    end

    subgraph PB["Peer B"]
        LB["local_dir/"]
    end

    M -->|"ManifestPush (new version number)"| PA
    M -->|"ManifestPush (new version number)"| PB
    PA -->|"Hello  ManifestRequest  ChunkRequest"| M
    PB -->|"Hello  ManifestRequest  ChunkRequest"| M
    PA <-->|"ChunkRequest / ChunkResponse  (direct P2P, bypasses master)"| PB
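
The diagram mentions a 200 ms debounce on filesystem events. As a rough illustration of what that can mean, the sketch below models trailing-edge debouncing as a pure function over event timestamps. The name `rescan_times` and the trailing-edge behavior are assumptions made for illustration; the source only states the 200 ms figure.

```rust
/// Debounce window taken from the diagram label.
const DEBOUNCE_MS: u64 = 200;

/// Given filesystem event timestamps in milliseconds (ascending), return the
/// times at which a rescan would fire under trailing-edge debouncing:
/// one rescan fires `DEBOUNCE_MS` after the last event of each burst.
/// Hypothetical helper, not taken from the sift code base.
fn rescan_times(events_ms: &[u64]) -> Vec<u64> {
    let mut fires = Vec::new();
    let mut iter = events_ms.iter().copied().peekable();
    while let Some(mut last) = iter.next() {
        // Extend the burst while the next event arrives inside the window.
        while let Some(&next) = iter.peek() {
            if next.saturating_sub(last) < DEBOUNCE_MS {
                last = next;
                iter.next();
            } else {
                break;
            }
        }
        fires.push(last + DEBOUNCE_MS);
    }
    fires
}
```

Under this model, events at 0, 50, 120, 1000 and 1100 ms produce two rescans (at 320 ms and 1300 ms) instead of five, so a burst of writes to one file leads to a single new manifest version.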

Key properties:

  • Strong consistency — the master is the single source of truth; peers always converge to the master's current version.
  • Reduced master load — peers exchange blocks directly; the master is only consulted for blocks that no peer holds yet.
  • Block-level delta sync — content-defined chunking (FastCDC) ensures that inserting or modifying bytes in a file only invalidates the affected blocks. The rest are re-used without re-downloading.
  • Integrity verification — every block is identified by its BLAKE3 hash. A peer rejects any block whose hash does not match the manifest.
  • Push notifications — the master pushes a ManifestPush notification over the existing persistent TCP connection whenever a file changes. Peers begin re-syncing within milliseconds.
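
The delta-sync and reduced-master-load properties can be pictured as a two-step fetch plan: first work out which chunks of the new manifest are missing locally, then pick a source for each one, preferring peers over the master. The sketch below is a minimal illustration under stated assumptions, not sift's actual implementation: `ChunkId`, `Source`, `missing_chunks` and `plan_fetch` are hypothetical names, and the "first peer in address order" policy is an arbitrary choice.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

/// Stand-in for a 32-byte BLAKE3 chunk hash (hypothetical alias).
type ChunkId = [u8; 32];

/// Where a missing chunk should be requested from.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Source {
    /// Address of a peer that reports holding the chunk.
    Peer(String),
    /// Fallback when no known peer holds the chunk.
    Master,
}

/// Return the chunks named by `manifest` that are not in `local`,
/// in manifest order and without duplicates.
fn missing_chunks(manifest: &[ChunkId], local: &HashSet<ChunkId>) -> Vec<ChunkId> {
    let mut seen = HashSet::new();
    manifest
        .iter()
        .filter(|id| !local.contains(*id) && seen.insert(**id))
        .copied()
        .collect()
}

/// Pick a source for each missing chunk: the first peer (in address order)
/// that holds it, otherwise the master. The selection policy is an assumption.
fn plan_fetch(
    missing: &[ChunkId],
    peer_holdings: &BTreeMap<String, BTreeSet<ChunkId>>,
) -> Vec<(ChunkId, Source)> {
    missing
        .iter()
        .map(|id| {
            let source = peer_holdings
                .iter()
                .find(|(_, held)| held.contains(id))
                .map(|(addr, _)| Source::Peer(addr.clone()))
                .unwrap_or(Source::Master);
            (*id, source)
        })
        .collect()
}
```

If a file's chunk list changes from `[1, 2, 3]` to `[1, 9, 3, 7]`, only chunks 9 and 7 appear in the plan; chunks 1 and 3 are reused from local storage. A chunk that some peer already holds is requested from that peer, and only the remainder goes to the master. In a real client each received chunk would still be hashed and compared against its `ChunkId` before being accepted.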

Installation and configuration via Nix/NixOS

{ config, inputs, ... }:
{
  imports = [
    inputs.sift.nixosModules.default
  ];

  services.sift = {
    masters.lab = {
      enable = true;
      watched_dir = "/data/science";
      bind_addr = "0.0.0.0:7777";
      pskFile = config.age.secrets.sift-psk.path;  # Use agenix or sops-nix for securely storing secrets
      openFirewall = true;
    };

    peers.edge1 = {
      enable = true;
      master_addr = "10.10.10.5:7777";
      local_dir = "/data/local";
      bind_addr = "0.0.0.0:7778";
      pskFile = config.age.secrets.sift-psk.path;
      openFirewall = true;
    };
  };
}

Building

Requires Rust 1.75 or later (stable).

git clone <repo-url> sift
cd sift
cargo build --release

The resulting binary is at target/release/sift.

Quick start

1. Set up the master node

Create master.toml:

watched_dir = "/data/science"   # directory to watch and serve
bind_addr   = "0.0.0.0:7777"   # TCP address peers connect to
psk         = "change-me"       # pre-shared key (share with peers out-of-band)

Start the master:

sift master start --config master.toml

2. Set up a peer node

Create peer.toml:

master_addr = "master.example.com:7777"  # where the master is running
psk         = "change-me"                # must match master's psk
local_dir   = "/data/local"              # where synced files are written
bind_addr   = "0.0.0.0:7778"            # this peer's listen address (for other peers)

Start the peer:

sift peer start --config peer.toml

The peer authenticates with the master, downloads the current file set, and then waits for change notifications. When the master detects a file change, it notifies all connected peers; each peer then downloads only the new or changed blocks.
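
How a peer reacts to a notification can be sketched as a small decision function. This is an illustration only: it assumes manifest versions are monotonically increasing `u64` values and that a peer with no manifest yet is represented by `None`. `PushAction` and `on_manifest_push` are hypothetical names.

```rust
/// What a peer does after receiving a pushed version number.
#[derive(Debug, PartialEq, Eq)]
enum PushAction {
    /// The local manifest is already at (or past) the pushed version.
    Ignore,
    /// Ask the master for the manifest, then fetch the missing chunks.
    RequestManifest { version: u64 },
}

/// Decide how to react to a ManifestPush carrying `pushed_version`.
/// `local_version` is `None` before the first successful sync.
fn on_manifest_push(local_version: Option<u64>, pushed_version: u64) -> PushAction {
    match local_version {
        Some(local) if local >= pushed_version => PushAction::Ignore,
        _ => PushAction::RequestManifest { version: pushed_version },
    }
}
```

Ignoring a push for a version the peer already has keeps duplicate or delayed notifications from triggering redundant manifest requests.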

3. Check system status

sift status --master master.example.com:7777 --psk change-me
# or: export SIFT_PSK=change-me && sift status --master master.example.com:7777

Example output:

Manifest version : 5
Files            : 3  (42 chunks, 128.4 MiB)
Connected peers  : 2

Files:
  dataset.h5   38 chunks   120.1 MiB
  meta.json     3 chunks     8.1 MiB
  readme.txt    1 chunk      0.2 MiB

Peers:
  10.0.0.2:7778   38 chunks held
  10.0.0.3:7778   42 chunks held

Configuration reference

See docs/config.md for all configuration fields, types, defaults, and examples.

Protocol overview

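sift uses a custom binary protocol over TCP. Each message is sent as a length-delimited frame with a 4-byte length prefix. The frame body starts with a 5-byte header (4-byte magic plus 1-byte protocol version), followed by the payload: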

[4B magic "SIFT"][1B version][payload: postcard-serialized Message enum]

Messages: Hello, HelloAck, ManifestRequest, ManifestResponse, ChunkRequest, ChunkResponse, PeerListRequest, PeerListResponse, ManifestPush, Goodbye.

Authentication uses a pre-shared key included in the Hello message. The connection is plaintext; TLS is documented as future work.
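
To make the frame layout concrete, here is a minimal std-only sketch of encoding and decoding one frame. Several details are assumptions, not confirmed by the source: the length prefix is taken to be a big-endian `u32` counting the header plus payload, the version byte is taken to be `1`, and the payload is treated as opaque bytes in place of the postcard-serialized `Message`. The names `encode_frame`, `decode_frame` and `FrameError` are hypothetical.

```rust
const MAGIC: &[u8; 4] = b"SIFT";
const VERSION: u8 = 1; // assumed value; the source does not state it
const LEN_PREFIX: usize = 4;
const HEADER_LEN: usize = 5; // 4-byte magic + 1-byte version

#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    /// The buffer ends before the frame is complete.
    Truncated,
    /// The declared length is too small to hold the header.
    BadLength,
    BadMagic,
    UnsupportedVersion(u8),
}

/// Build `[4B length][4B magic][1B version][payload]`.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let body_len = (HEADER_LEN + payload.len()) as u32;
    let mut out = Vec::with_capacity(LEN_PREFIX + HEADER_LEN + payload.len());
    out.extend_from_slice(&body_len.to_be_bytes());
    out.extend_from_slice(MAGIC);
    out.push(VERSION);
    out.extend_from_slice(payload);
    out
}

/// Parse one frame from the front of `buf`. On success, return the payload
/// slice and the total number of bytes consumed.
fn decode_frame(buf: &[u8]) -> Result<(&[u8], usize), FrameError> {
    if buf.len() < LEN_PREFIX {
        return Err(FrameError::Truncated);
    }
    let body_len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if body_len < HEADER_LEN {
        return Err(FrameError::BadLength);
    }
    if buf.len() < LEN_PREFIX + body_len {
        return Err(FrameError::Truncated);
    }
    let body = &buf[LEN_PREFIX..LEN_PREFIX + body_len];
    if !body.starts_with(MAGIC) {
        return Err(FrameError::BadMagic);
    }
    if body[4] != VERSION {
        return Err(FrameError::UnsupportedVersion(body[4]));
    }
    Ok((&body[HEADER_LEN..], LEN_PREFIX + body_len))
}
```

Checking the magic and version before touching the payload lets a receiver drop connections from unrelated or incompatible clients without attempting to deserialize anything.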

Workspace structure

sift/
  Cargo.toml              # workspace root + sift binary
  crates/
    sift-core/            # FastCDC chunking, BLAKE3 hashing, Manifest — no network deps
    sift-net/             # protocol messages, TCP framing (Connection type)
    sift-master/          # master daemon logic (library crate)
    sift-peer/            # peer daemon logic (library crate)
  src/
    main.rs               # CLI entry point (clap subcommands)
    status.rs             # sift status implementation
  tests/
    integration.rs        # in-process integration tests

Running tests

# All unit + integration tests
cargo test --workspace

# Linting
cargo clippy --workspace -- -D warnings

# Format check
cargo fmt --all -- --check

Known limitations

  • NAT traversal: direct peer-to-peer exchange requires each peer to be reachable at a public address and port. Peers behind strict NAT fall back to master-only mode for block downloads. STUN/ICE-based hole punching is documented as future work.
  • Plaintext transport: the current prototype does not encrypt the TCP connection. TLS via rustls is the planned production path.
  • No rolling chunk eviction: the master's in-memory chunk store grows monotonically (old chunks are retained so in-flight requests from peers on the previous manifest version can still be served).

License

Course project — all rights reserved.