sift — Scientific Information Flow Tool
Hybrid master-peer file synchronization designed for scientific data distribution. The master node holds the single authoritative version of the data; peer nodes exchange file blocks directly with each other to reduce master load. Only the blocks that actually changed are transferred on each update.
How it works
flowchart TD
    subgraph M["Master Node (bind_addr)"]
        direction TB
        WD["watched_dir/\ndataset.h5 meta.json ..."]
        MAN["Manifest version N\nfile -> ordered chunk list"]
        STORE["Chunk store\nBLAKE3 content-addressed"]
        WD -->|"FastCDC chunking + BLAKE3 hashing\n200 ms debounce on FS events"| MAN
        MAN --> STORE
    end
    subgraph PA["Peer A"]
        LA["local_dir/"]
    end
    subgraph PB["Peer B"]
        LB["local_dir/"]
    end
    M -->|"ManifestPush (new version number)"| PA
    M -->|"ManifestPush (new version number)"| PB
    PA -->|"Hello ManifestRequest ChunkRequest"| M
    PB -->|"Hello ManifestRequest ChunkRequest"| M
    PA <-->|"ChunkRequest / ChunkResponse (direct P2P, bypasses master)"| PB
Key properties:
- Strong consistency — the master is the single source of truth; peers always converge to the master's current version.
- Reduced master load — peers exchange blocks directly; the master is only consulted for blocks that no peer holds yet.
- Block-level delta sync — content-defined chunking (FastCDC) ensures that inserting or modifying bytes in a file only invalidates the affected blocks. The rest are re-used without re-downloading.
- Integrity verification — every block is identified by its BLAKE3 hash. A peer rejects any block whose hash does not match the manifest.
- Push notifications — the master pushes a ManifestPush notification over the existing persistent TCP connection whenever a file changes. Peers begin re-syncing within milliseconds.
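To illustrate why content-defined chunking keeps most blocks stable after an insert, here is a minimal sketch using only the standard library. It is a simplified gear-hash chunker, not FastCDC itself, and it compares chunks by their raw bytes rather than by BLAKE3 hash. The names `chunks`, `gear_table`, `splitmix64` and the `MIN`, `MAX`, `MASK` values are assumptions chosen for the demo, not sift's actual parameters.

```rust
use std::collections::HashSet;

// Illustrative limits only; sift's real chunk sizes are not stated in this README.
const MIN: usize = 64;
const MAX: usize = 1024;
const MASK: u64 = (1 << 8) - 1; // a cut is allowed when the low 8 hash bits are zero

/// splitmix64 step, used here only to produce deterministic pseudo-random values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// One pseudo-random 64-bit value per possible byte value.
fn gear_table() -> [u64; 256] {
    let mut seed = 1u64;
    let mut table = [0u64; 256];
    for slot in table.iter_mut() {
        *slot = splitmix64(&mut seed);
    }
    table
}

/// Split `data` at positions chosen by the content (rolling gear hash),
/// subject to the MIN and MAX chunk lengths.
fn chunks(data: &[u8]) -> Vec<&[u8]> {
    let gear = gear_table();
    let mut out = Vec::new();
    let (mut start, mut hash) = (0usize, 0u64);
    for (i, &byte) in data.iter().enumerate() {
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        let len = i + 1 - start;
        if (len >= MIN && (hash & MASK) == 0) || len >= MAX {
            out.push(&data[start..=i]);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        out.push(&data[start..]); // trailing remainder, may be shorter than MIN
    }
    out
}

fn main() {
    // Deterministic pseudo-random "file" of 64 KiB.
    let mut seed = 42u64;
    let old: Vec<u8> = (0..65536).map(|_| splitmix64(&mut seed) as u8).collect();

    // The same file with three bytes inserted at offset 100.
    let mut new = old[..100].to_vec();
    new.extend_from_slice(&[0xAA, 0xBB, 0xCC]);
    new.extend_from_slice(&old[100..]);

    let old_set: HashSet<&[u8]> = chunks(&old).into_iter().collect();
    let new_chunks = chunks(&new);
    let reused = new_chunks.iter().filter(|c| old_set.contains(**c)).count();
    println!("{} of {} chunks unchanged after the insert", reused, new_chunks.len());
}
```

Because cut points depend on nearby bytes rather than on absolute offsets, the chunker tends to fall back into step with the old boundaries shortly after the edit, so only chunks near the insert differ. With fixed-size blocks, every block after the insert would shift and change.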
Installation and configuration via Nix/NixOS
{ config, inputs, ... }:
{
  imports = [
    inputs.sift.nixosModules.default
  ];
  services.sift = {
    masters.lab = {
      enable = true;
      watched_dir = "/data/science";
      bind_addr = "0.0.0.0:7777";
      pskFile = config.age.secrets.sift-psk.path; # Use agenix or sops-nix to store secrets securely
      openFirewall = true;
    };
    peers.edge1 = {
      enable = true;
      master_addr = "10.10.10.5:7777";
      local_dir = "/data/local";
      bind_addr = "0.0.0.0:7778";
      pskFile = config.age.secrets.sift-psk.path;
      openFirewall = true;
    };
  };
}
Building
Requires Rust 1.75 or later (stable).
git clone <repo-url> sift
cd sift
cargo build --release
The resulting binary is at target/release/sift.
Quick start
1. Set up the master node
Create master.toml:
watched_dir = "/data/science" # directory to watch and serve
bind_addr = "0.0.0.0:7777" # TCP address peers connect to
psk = "change-me" # pre-shared key (share with peers out-of-band)
Start the master:
sift master start --config master.toml
2. Set up a peer node
Create peer.toml:
master_addr = "master.example.com:7777" # where the master is running
psk = "change-me" # must match master's psk
local_dir = "/data/local" # where synced files are written
bind_addr = "0.0.0.0:7778" # this peer's listen address (for other peers)
Start the peer:
sift peer start --config peer.toml
The peer authenticates with the master, downloads the current file set, and then waits for change notifications. When the master detects a file change it notifies all peers; each peer downloads only the new or changed blocks.
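The "only the new or changed blocks" step can be pictured as a set difference between the chunks a new manifest references and the chunks the peer already holds. The sketch below assumes a hypothetical `Manifest` shape (a version plus an ordered chunk list per file) and a hypothetical `missing_chunks` helper; sift's real types live in `sift-core` and may look different. Chunk ids are modeled as 32-byte arrays, the size of a BLAKE3 digest.

```rust
use std::collections::{BTreeMap, HashSet};

/// A chunk id: 32 bytes, the size of a BLAKE3 digest.
type ChunkId = [u8; 32];

/// Hypothetical manifest shape: a version and, per file, its ordered chunk list.
struct Manifest {
    version: u64,
    files: BTreeMap<String, Vec<ChunkId>>,
}

/// Chunks referenced by `new` that are not held locally,
/// de-duplicated and in first-use order.
fn missing_chunks(new: &Manifest, held: &HashSet<ChunkId>) -> Vec<ChunkId> {
    let mut seen = HashSet::new();
    let mut missing = Vec::new();
    for chunk_list in new.files.values() {
        for id in chunk_list {
            if !held.contains(id) && seen.insert(*id) {
                missing.push(*id);
            }
        }
    }
    missing
}

/// Recognizable dummy id for the demo; real ids would be BLAKE3 digests.
fn id(tag: u8) -> ChunkId {
    [tag; 32]
}

fn main() {
    // The peer already holds chunks 1, 2 and 3 from the previous version.
    let held: HashSet<ChunkId> = vec![id(1), id(2), id(3)].into_iter().collect();

    // In the next version, chunk 2 of dataset.h5 was replaced by chunk 4.
    let mut files = BTreeMap::new();
    files.insert("dataset.h5".to_string(), vec![id(1), id(4), id(3)]);
    let next = Manifest { version: 6, files };

    let todo = missing_chunks(&next, &held);
    println!("manifest v{}: {} chunk(s) to download", next.version, todo.len());
}
```

De-duplicating the result matters when the same chunk appears in several files or several times in one file: it only needs to be fetched once.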
3. Check system status
sift status --master master.example.com:7777 --psk change-me
# or: export SIFT_PSK=change-me && sift status --master master.example.com:7777
Example output:
Manifest version : 5
Files : 3 (42 chunks, 128.4 MiB)
Connected peers : 2
Files:
dataset.h5 38 chunks 120.1 MiB
meta.json 3 chunks 8.1 MiB
readme.txt 1 chunk 0.2 MiB
Peers:
10.0.0.2:7778 38 chunks held
10.0.0.3:7778 42 chunks held
Configuration reference
See docs/config.md for all configuration fields, types, defaults, and examples.
Protocol overview
sift uses a custom binary protocol over TCP. Each message is wrapped in a length-delimited frame: a 4-byte length prefix, followed by a 5-byte header and the payload:
[4B magic "SIFT"][1B version][payload: postcard-serialized Message enum]
Messages: Hello, HelloAck, ManifestRequest, ManifestResponse,
ChunkRequest, ChunkResponse, PeerListRequest, PeerListResponse,
ManifestPush, Goodbye.
Authentication uses a pre-shared key included in the Hello message. The
connection is plaintext; TLS is documented as future work.
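As a rough illustration of the framing, the sketch below writes and reads one frame using only the standard library. Several details are assumptions, since this README does not specify them: the length prefix is taken to be big-endian and to count the header plus the payload, the version byte is set to 1, and `write_frame` / `read_frame` are hypothetical names. The payload is treated as opaque bytes; in sift it would be the postcard-serialized message.

```rust
use std::io::{self, Cursor, Read, Write};

const MAGIC: &[u8; 4] = b"SIFT";
const VERSION: u8 = 1; // assumed value, for illustration only
const HEADER_LEN: usize = 5; // 4-byte magic + 1-byte version

/// Write one frame: [4B length][4B magic][1B version][payload].
/// Assumption: the length is big-endian and counts header + payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > (u32::MAX as usize) - HEADER_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = (HEADER_LEN + payload.len()) as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(MAGIC)?;
    w.write_all(&[VERSION])?;
    w.write_all(payload)
}

/// Read one frame and return its payload, rejecting a bad magic or version.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len < HEADER_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame shorter than header"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    if body[..4] != MAGIC[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    if body[4] != VERSION {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported version"));
    }
    Ok(body[HEADER_LEN..].to_vec())
}

fn main() -> io::Result<()> {
    let mut wire = Vec::new();
    write_frame(&mut wire, b"hello")?;
    let payload = read_frame(&mut Cursor::new(wire))?;
    println!("round-tripped {} payload bytes", payload.len());
    Ok(())
}
```

A production reader would also cap the accepted frame length before allocating, so that a malformed or hostile length prefix cannot force a multi-gigabyte buffer.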
Workspace structure
sift/
  Cargo.toml          # workspace root + sift binary
  crates/
    sift-core/        # FastCDC chunking, BLAKE3 hashing, Manifest — no network deps
    sift-net/         # protocol messages, TCP framing (Connection type)
    sift-master/      # master daemon logic (library crate)
    sift-peer/        # peer daemon logic (library crate)
  src/
    main.rs           # CLI entry point (clap subcommands)
    status.rs         # sift status implementation
  tests/
    integration.rs    # in-process integration tests
Running tests
# All unit + integration tests
cargo test --workspace
# Linting
cargo clippy --workspace -- -D warnings
# Format check
cargo fmt --all -- --check
Known limitations
- NAT traversal: peers must be reachable at a public address and port. Peers behind strict NAT fall back to master-only mode for block downloads. STUN/ICE-based hole punching is documented as future work.
- Plaintext transport: the current prototype does not encrypt the TCP connection. TLS via rustls is the planned production path.
- No rolling chunk eviction: the master's in-memory chunk store grows monotonically (old chunks are retained so in-flight requests from peers on the previous manifest version can still be served).
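The master-only fallback mentioned in the NAT traversal item can be pictured as a source-selection step: prefer a reachable peer that holds the chunk, otherwise ask the master. This is a minimal sketch under stated assumptions. The `Source` enum, the `pick_source` function and the `reachable` set are hypothetical, and how sift actually tracks peer holdings or detects reachability is not described in this README.

```rust
use std::collections::{BTreeMap, HashSet};
use std::net::SocketAddr;

/// A chunk id: 32 bytes, the size of a BLAKE3 digest.
type ChunkId = [u8; 32];

#[derive(Debug, PartialEq)]
enum Source {
    Peer(SocketAddr),
    Master,
}

/// Pick where to fetch `chunk` from: the first reachable peer that holds it,
/// otherwise the master.
fn pick_source(
    chunk: &ChunkId,
    holdings: &BTreeMap<SocketAddr, HashSet<ChunkId>>,
    reachable: &HashSet<SocketAddr>,
) -> Source {
    holdings
        .iter()
        .find(|(addr, held)| reachable.contains(*addr) && held.contains(chunk))
        .map(|(addr, _)| Source::Peer(*addr))
        .unwrap_or(Source::Master)
}

fn main() {
    let peer: SocketAddr = "10.0.0.2:7778".parse().expect("valid address");
    let mut holdings = BTreeMap::new();
    holdings.insert(peer, vec![[1u8; 32]].into_iter().collect::<HashSet<ChunkId>>());
    let reachable: HashSet<SocketAddr> = vec![peer].into_iter().collect();

    // Held by a reachable peer: fetched peer-to-peer.
    println!("{:?}", pick_source(&[1u8; 32], &holdings, &reachable));
    // Held by nobody yet: fetched from the master.
    println!("{:?}", pick_source(&[9u8; 32], &holdings, &reachable));
    // Peer not reachable (for example behind strict NAT): master-only mode.
    println!("{:?}", pick_source(&[1u8; 32], &holdings, &HashSet::new()));
}
```

Whichever source is chosen, the integrity guarantee is unchanged, because the receiving peer still checks every block against the hash in the manifest.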
License
Course project — all rights reserved.