Storage Model
Lowkey uses content-addressed storage with BLAKE3 hashing for chunk identification.
Directory Structure
data_dir/
├── identity.key          # Node keypair (Ed25519, protobuf encoded)
├── chunks/               # Content-addressed chunk storage
│   ├── a1b2c3d4...bin    # Raw chunk data (256 KiB max)
│   └── ...
├── manifests/            # File metadata
│   ├── abc123...json     # Manifest JSON
│   └── ...
├── index/                # Keyword search index
│   ├── kw_music.json     # File IDs containing "music"
│   └── ...
└── downloads/            # Cached complete files
    └── ...
Chunk Store
Content Addressing
Files are split into 256 KiB chunks, each identified by its BLAKE3 hash:
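use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

const CHUNK_SIZE: usize = 256 * 1024; // 256 KiB

fn chunk_file(path: &Path) -> io::Result<Vec<ChunkInfo>> {
    let mut file = File::open(path)?;
    let mut chunks = Vec::new();
    loop {
        // Fill a fresh buffer with up to CHUNK_SIZE bytes; `take` + `read_to_end`
        // keeps reading through short reads, so only the last chunk can be smaller.
        let mut buf = Vec::with_capacity(CHUNK_SIZE);
        let n = file.by_ref().take(CHUNK_SIZE as u64).read_to_end(&mut buf)?;
        if n == 0 { break; }
        let chunk_id = blake3::hash(&buf).to_hex().to_string();
        chunks.push(ChunkInfo { id: chunk_id, data: buf });
    }
    Ok(chunks)
}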
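Because every chunk except possibly the last is exactly 256 KiB, chunk boundaries follow from the file size alone. The following is a minimal sketch of that arithmetic using only the standard library; `chunk_ranges` is a hypothetical helper introduced for illustration, not part of Lowkey's API.

```rust
/// Chunk size from the section above (256 KiB).
const CHUNK_SIZE: u64 = 256 * 1024;

/// Hypothetical helper: returns the (offset, length) of every chunk for a
/// file of `size_bytes` bytes. An empty file yields no chunks.
fn chunk_ranges(size_bytes: u64) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < size_bytes {
        // The final chunk may be shorter than CHUNK_SIZE.
        let len = CHUNK_SIZE.min(size_bytes - offset);
        ranges.push((offset, len));
        offset += len;
    }
    ranges
}
```

A 5,242,880-byte file (5 MiB) splits into exactly 20 chunks, while a file one byte longer than a single chunk yields a full chunk followed by a 1-byte chunk.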
Deduplication
Identical content produces identical chunk IDs:
File A: "Hello World" -> chunk: a1b2c3...
File B: "Hello World" -> chunk: a1b2c3... (same!)
Storage: Only one copy stored
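The effect can be sketched with an in-memory store keyed by content hash. To stay within the standard library, this sketch swaps BLAKE3 for `DefaultHasher`, which is only a stand-in; `MemStore` and `stand_in_id` are hypothetical names, and real chunk IDs are BLAKE3 hex digests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Stand-in for BLAKE3 so the sketch needs only the standard library.
/// ASSUMPTION: real chunk IDs are BLAKE3 hex digests, not this 64-bit hash.
fn stand_in_id(data: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory chunk store illustrating deduplication.
#[derive(Default)]
struct MemStore {
    chunks: HashMap<String, Vec<u8>>,
}

impl MemStore {
    /// Stores `data` only if no chunk with the same ID exists yet,
    /// and returns the content-derived ID either way.
    fn put(&mut self, data: &[u8]) -> String {
        let id = stand_in_id(data);
        self.chunks.entry(id.clone()).or_insert_with(|| data.to_vec());
        id
    }
}
```

Putting the same bytes twice returns the same ID and leaves a single entry in the map; different bytes get a different ID and a second entry.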
Compression
Certain file types are compressed before storage using zstd level 3:
| Compressed | Not Compressed |
|---|---|
| text/plain, text/html, text/css | audio/mpeg, video/mp4 |
| application/json, application/xml | image/jpeg, image/png |
| audio/wav | application/pdf, application/zip |
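The table can be read as a lookup on the MIME type. A minimal sketch follows; `should_compress` is a hypothetical helper, and treating unlisted MIME types as "not compressed" is an assumption, since the table does not say how they are handled.

```rust
/// Hypothetical helper mirroring the compression table.
/// ASSUMPTION: MIME types not listed in the table are stored uncompressed.
fn should_compress(mime_type: &str) -> bool {
    matches!(
        mime_type,
        "text/plain"
            | "text/html"
            | "text/css"
            | "application/json"
            | "application/xml"
            | "audio/wav"
    )
}
```

The split is consistent with skipping formats that are already compressed (MP3, MP4, JPEG, PNG, ZIP), where a second compression pass typically gains little, while WAV usually holds uncompressed PCM audio.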
Storage API
impl ChunkStore {
    /// Store chunk, returns chunk ID
    pub fn put(&self, data: &[u8]) -> Result<String> {
        let hash = blake3::hash(data);
        let chunk_id = hash.to_hex().to_string();
        let path = self.chunks_dir.join(format!("{}.bin", chunk_id));
        if !path.exists() {
            fs::write(&path, data)?;
        }
        Ok(chunk_id)
    }

    /// Retrieve chunk by ID
    pub fn get(&self, chunk_id: &str) -> Result<Option<Vec<u8>>> {
        let path = self.chunks_dir.join(format!("{}.bin", chunk_id));
        if path.exists() {
            Ok(Some(fs::read(&path)?))
        } else {
            Ok(None)
        }
    }
}
Manifests
Each shared file has a manifest describing its contents:
{
  "file_id": "abc123def456...",
  "title": "My Song.mp3",
  "mime_type": "audio/mpeg",
  "size_bytes": 5242880,
  "chunks": [
    "chunk_id_1",
    "chunk_id_2",
    "chunk_id_3"
  ],
  "created_at": 1704672000
}
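The ordered `chunks` list is what lets a node rebuild the file. The sketch below assumes reassembly is a plain concatenation in manifest order followed by a check against `size_bytes`; the `Manifest` struct and `assemble` function are hypothetical, and how chunks are fetched or decompressed is out of scope.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of the manifest fields used for reassembly.
struct Manifest {
    size_bytes: u64,
    chunks: Vec<String>,
}

/// Concatenates chunks in manifest order and checks the total size.
/// `store` maps chunk ID to chunk bytes.
fn assemble(manifest: &Manifest, store: &HashMap<String, Vec<u8>>) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for id in &manifest.chunks {
        let chunk = store
            .get(id)
            .ok_or_else(|| format!("missing chunk {}", id))?;
        out.extend_from_slice(chunk);
    }
    if out.len() as u64 != manifest.size_bytes {
        return Err(format!(
            "size mismatch: expected {}, got {}",
            manifest.size_bytes,
            out.len()
        ));
    }
    Ok(out)
}
```

A missing chunk or a total length that disagrees with `size_bytes` is reported as an error instead of yielding a partial file.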
File ID Generation
The file ID is the BLAKE3 hash of the file's chunk IDs, fed to the hasher in order:
fn generate_file_id(chunks: &[String]) -> String {
    let mut hasher = blake3::Hasher::new();
    for chunk_id in chunks {
        hasher.update(chunk_id.as_bytes());
    }
    hasher.finalize().to_hex().to_string()
}
// Identical files produce identical chunk lists, and therefore identical IDs (deduplication)
Keyword Index
File titles are tokenized for search:
fn tokenize(title: &str) -> Vec<String> {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| s.len() >= 2)
        .map(|s| s.to_string())
        .collect()
}
// "My Awesome Song.mp3" -> ["my", "awesome", "song", "mp3"]
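The tokens can then be turned into an inverted index from keyword to file IDs. The sketch below keeps the index in memory; `build_index` is a hypothetical helper, and the mapping of each keyword entry to one `index/kw_<keyword>.json` file is an assumption based on the directory layout.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Same tokenizer as in the text, repeated so the sketch is self-contained.
fn tokenize(title: &str) -> Vec<String> {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|s| s.len() >= 2)
        .map(|s| s.to_string())
        .collect()
}

/// Hypothetical in-memory index: keyword -> set of file IDs.
/// ASSUMPTION: each entry corresponds to one `index/kw_<keyword>.json` file.
/// `files` is a list of (file_id, title) pairs.
fn build_index(files: &[(&str, &str)]) -> BTreeMap<String, BTreeSet<String>> {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (file_id, title) in files {
        for token in tokenize(title) {
            index.entry(token).or_default().insert(file_id.to_string());
        }
    }
    index
}
```

Two titles that both contain "song" end up under the same keyword, so a search for that keyword returns both file IDs; single-character tokens are dropped by the length filter.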
Quota Management
Configuration
| Setting | Default | Description |
|---|---|---|
| `data_dir` | Platform-specific | ~/.lowkey or app data dir |
| `quota_bytes` | 10 GB | Maximum storage usage |
| `chunk_size` | 256 KiB | Chunk split size |
| `max_chunks` | 50,000 | Maximum chunks tracked |
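To see how these limits interact, the defaults can be written as a struct. This is a sketch only: `StorageConfig` is a hypothetical name, `data_dir` is omitted because it is platform-specific, and reading "10 GB" as 10 GiB is an assumption the source does not confirm.

```rust
/// Hypothetical configuration struct mirroring the settings table.
#[derive(Debug, Clone, PartialEq)]
struct StorageConfig {
    quota_bytes: u64,
    chunk_size: u64,
    max_chunks: u64,
}

impl Default for StorageConfig {
    fn default() -> Self {
        StorageConfig {
            // ASSUMPTION: "10 GB" is read as 10 GiB (binary units).
            quota_bytes: 10 * 1024 * 1024 * 1024,
            chunk_size: 256 * 1024,
            max_chunks: 50_000,
        }
    }
}

impl StorageConfig {
    /// Upper bound on stored bytes implied by `max_chunks` alone,
    /// assuming every chunk is full-sized.
    fn max_bytes_by_chunk_count(&self) -> u64 {
        self.max_chunks * self.chunk_size
    }
}
```

With the defaults, 50,000 full chunks amount to 13,107,200,000 bytes, which exceeds the quota under either a binary or decimal reading of 10 GB, so the byte quota is the limit that is reached first when chunks are full-sized.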
LRU Eviction
When the quota is exceeded, the least-recently-used chunks are evicted until usage is back under the limit:
fn enforce_quota(&self) -> Result<()> {
    let current = self.current_bytes();
    if current <= self.quota_bytes {
        return Ok(());
    }
    let excess = current - self.quota_bytes;
    let mut freed = 0u64;
    // Sort chunks by access time (oldest first)
    let mut chunks = self.list_chunks();
    chunks.sort_by_key(|c| c.accessed_at);
    // Evict until under quota
    for chunk in chunks {
        if freed >= excess { break; }
        fs::remove_file(&chunk.path)?;
        freed += chunk.size;
    }
    Ok(())
}
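The selection logic can be separated from the file deletion, which makes it easy to test. The sketch below is a pure function over in-memory metadata; `ChunkMeta` and `select_evictions` are hypothetical names, and the `accessed_at` unit (Unix seconds) is an assumption.

```rust
/// Hypothetical metadata for one stored chunk.
#[derive(Debug, Clone)]
struct ChunkMeta {
    id: String,
    size: u64,
    accessed_at: u64, // ASSUMPTION: Unix seconds of last access
}

/// Returns the IDs to evict, oldest access first, so that the remaining
/// total is at most `quota_bytes`. No files are touched.
fn select_evictions(mut chunks: Vec<ChunkMeta>, quota_bytes: u64) -> Vec<String> {
    let current: u64 = chunks.iter().map(|c| c.size).sum();
    if current <= quota_bytes {
        return Vec::new();
    }
    let excess = current - quota_bytes;
    chunks.sort_by_key(|c| c.accessed_at);

    let mut freed = 0u64;
    let mut victims = Vec::new();
    for chunk in chunks {
        if freed >= excess {
            break;
        }
        freed += chunk.size;
        victims.push(chunk.id);
    }
    victims
}
```

With three 100-byte chunks and a 150-byte quota, the two least recently accessed chunks are selected; eviction works in whole chunks, so usage can end up below the quota by up to one chunk's size.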
Integrity Verification
All chunks are verified on storage and retrieval:
// On storage
fn store_chunk(&self, chunk_id: &str, data: &[u8]) -> Result<()> {
    let computed = blake3::hash(data).to_hex().to_string();
    if computed != chunk_id {
        // Assumes the error type converts from &str (e.g. Box<dyn Error>)
        return Err("Hash mismatch".into());
    }
    // Store only after verification
    let path = self.chunks_dir.join(format!("{}.bin", chunk_id));
    fs::write(&path, data)?;
    Ok(())
}

// On remote retrieval
let data = request_from_peer(chunk_id).await?;
let computed = blake3::hash(&data).to_hex().to_string();
if computed != chunk_id {
    return Err("Corrupted chunk from peer".into());
}