Sunil Venkataram
Tag: architecture
3 notes
🌰 Edge vs Server Model Architecture - Why One DNA Cannot Serve Both
🌰 KV Cache Compression - The Primary Bottleneck in Long-Context Inference
🌰 Per-Layer Embeddings - Trading Flash for DRAM on Edge Models