Storage Model

Lowkey uses content-addressed storage with BLAKE3 hashing for chunk identification.

Directory Structure

data_dir/
├── identity.key       # Node keypair (Ed25519, protobuf encoded)
├── chunks/            # Content-addressed chunk storage
│   ├── a1b2c3d4...bin # Raw chunk data (256 KiB max)
│   └── ...
├── manifests/         # File metadata
│   ├── abc123...json  # Manifest JSON
│   └── ...
├── index/             # Keyword search index
│   ├── kw_music.json  # File IDs containing "music"
│   └── ...
└── downloads/         # Cached complete files
    └── ...

Chunk Store

Content Addressing

Files are split into 256 KiB chunks, each identified by its BLAKE3 hash:

const CHUNK_SIZE: usize = 256 * 1024;  // 256 KiB

fn chunk_file(path: &Path) -> Vec<ChunkInfo> {
    let mut chunks = Vec::new();
    let mut buf = vec![0u8; CHUNK_SIZE];

    loop {
        let n = file.read(&mut buf)?;
        if n == 0 { break; }
        buf.truncate(n);

        let hash = blake3::hash(&buf);
        let chunk_id = hash.to_hex().to_string();

        chunks.push(ChunkInfo { id: chunk_id, data: buf.clone() });
    }

    chunks
}

Deduplication

Identical content produces identical chunk IDs:

File A: "Hello World" -> chunk: a1b2c3... File B: "Hello World" -> chunk: a1b2c3... (same!) Storage: Only one copy stored

Compression

Certain file types are compressed before storage using zstd level 3:

Compressed	Not Compressed
text/plain, text/html, text/css	audio/mpeg, video/mp4
application/json, application/xml	image/jpeg, image/png
audio/wav	application/pdf, application/zip

Storage API

impl ChunkStore {
    /// Store chunk, returns chunk ID
    pub fn put(&self, data: &[u8]) -> Result<String> {
        let hash = blake3::hash(data);
        let chunk_id = hash.to_hex().to_string();

        let path = self.chunks_dir.join(format!("{}.bin", chunk_id));
        if !path.exists() {
            fs::write(&path, data)?;
        }

        Ok(chunk_id)
    }

    /// Retrieve chunk by ID
    pub fn get(&self, chunk_id: &str) -> Result<Option<Vec<u8>>> {
        let path = self.chunks_dir.join(format!("{}.bin", chunk_id));
        if path.exists() {
            Ok(Some(fs::read(&path)?))
        } else {
            Ok(None)
        }
    }
}

Manifests

Each shared file has a manifest describing its contents:

{
    "file_id": "abc123def456...",
    "title": "My Song.mp3",
    "mime_type": "audio/mpeg",
    "size_bytes": 5242880,
    "chunks": [
        "chunk_id_1",
        "chunk_id_2",
        "chunk_id_3"
    ],
    "created_at": 1704672000
}

File ID Generation

File ID is derived from content hash:

fn generate_file_id(chunks: &[String]) -> String {
    let mut hasher = blake3::Hasher::new();
    for chunk_id in chunks {
        hasher.update(chunk_id.as_bytes());
    }
    hasher.finalize().to_hex().to_string()
}

// Identical files have identical IDs (deduplication)

Keyword Index

File titles are tokenized for search:

fn tokenize(title: &str) -> Vec<String> {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| s.len() >= 2)
        .map(|s| s.to_string())
        .collect()
}

// "My Awesome Song.mp3" -> ["my", "awesome", "song", "mp3"]

Quota Management

Configuration

Setting	Default	Description
`data_dir`	Platform-specific	`~/.lowkey` or app data dir
`quota_bytes`	10 GB	Maximum storage usage
`chunk_size`	256 KiB	Chunk split size
`max_chunks`	50,000	Maximum chunks tracked

LRU Eviction

When quota is exceeded, least-recently-used chunks are evicted:

fn enforce_quota(&self) {
    let current = self.current_bytes();
    if current <= self.quota_bytes {
        return;
    }

    let excess = current - self.quota_bytes;
    let mut freed = 0u64;

    // Sort chunks by access time (oldest first)
    let mut chunks = self.list_chunks();
    chunks.sort_by_key(|c| c.accessed_at);

    // Evict until under quota
    for chunk in chunks {
        if freed >= excess { break; }
        fs::remove_file(&chunk.path)?;
        freed += chunk.size;
    }
}

Integrity Verification

All chunks are verified on storage and retrieval:

// On storage
fn store_chunk(chunk_id: &str, data: &[u8]) -> Result<()> {
    let computed = blake3::hash(data).to_hex().to_string();
    if computed != chunk_id {
        return Err("Hash mismatch");
    }
    // Store only after verification
    fs::write(path, data)
}

// On remote retrieval
let data = request_from_peer(chunk_id).await?;
let computed = blake3::hash(&data).to_hex().to_string();
if computed != chunk_id {
    return Err("Corrupted chunk from peer");
}

Lowkey Documentation