- Complete transformer implementation from scratch
- Training pipeline with gradient accumulation and mixed precision
- Optimized inference with KV caching
- Multi-format data processing (PDFs, images, code, text)
- Comprehensive documentation
- Apache 2.0 license
- Example training plots included in docs/images/
/mnt/storage/sheepOp/data