4 Commits

8b604a1925  Carlos Gutierrez  2025-11-18 23:23:50 -05:00
    Adding paper
9f17e1db24  Carlos Gutierrez  2025-11-16 16:44:55 -05:00
    Fix optimized attention mask handling for training
    - Fix mask format conversion (float to boolean) for scaled_dot_product_attention
    - Fix mask dimensions for proper broadcasting: [batch, 1, seq_len, seq_len]
    - Resolve conflict between is_causal and custom mask parameters
    - Enable training with optimized attention and KV caching
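The mask fixes in this commit can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes a `[batch, seq_len]` padding mask with 1 marking real tokens, and the helper name `build_attn_mask` is hypothetical. It shows the boolean mask format (True = attend), the `[batch, 1, seq_len, seq_len]` shape that broadcasts over the head dimension, and why `is_causal` must be left unset when a custom mask is supplied (PyTorch's `scaled_dot_product_attention` rejects the combination, so causality is folded into the mask instead).

```python
import torch
import torch.nn.functional as F

def build_attn_mask(padding_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Combine a [batch, seq_len] padding mask (1 = real token) with a causal
    mask into a boolean [batch, 1, seq_len, seq_len] mask for SDPA.
    True = position participates in attention, False = masked out."""
    # Lower-triangular causal mask over (query, key) positions: [S, S]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Padding mask applied along the key axis: [B, 1, 1, S]
    pad = padding_mask.bool()[:, None, None, :]
    # Broadcast to [B, 1, S, S]; the singleton dim broadcasts over heads.
    return causal[None, None] & pad

batch, heads, seq_len, dim = 2, 4, 5, 8
q = k = v = torch.randn(batch, heads, seq_len, dim)
padding = torch.tensor([[1, 1, 1, 1, 0],
                        [1, 1, 0, 0, 0]])
mask = build_attn_mask(padding, seq_len)

# Pass the boolean mask only; do NOT also set is_causal=True, since SDPA
# does not accept a custom attn_mask together with is_causal.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Note that a legacy additive float mask (0.0 for allowed, -inf for masked) is converted differently from a 1/0 mask: compare against zero rather than calling `.bool()`, which would map -inf to True.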
3fef3e2689  Carlos Gutierrez  2025-11-16 16:39:11 -05:00
    fixing memory
3d2da94ce2  Carlos Gutierrez  2025-11-06 22:07:41 -05:00
    Initial commit: SheepOp LLM - Transformer-based language model implementation
    - Complete transformer implementation from scratch
    - Training pipeline with gradient accumulation and mixed precision
    - Optimized inference with KV caching
    - Multi-format data processing (PDFs, images, code, text)
    - Comprehensive documentation
    - Apache 2.0 license
    - Example training plots included in docs/images/
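The gradient accumulation and mixed precision mentioned in the initial commit combine in a standard pattern, sketched below. This is not the repository's training loop: the model, data, and `accum_steps` value are placeholders, and it uses CPU bfloat16 autocast so it runs anywhere (on CUDA with float16 one would additionally wrap the backward pass in a `GradScaler`). The key points are dividing the loss by the accumulation factor and stepping the optimizer only every `accum_steps` micro-batches.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)            # stand-in for the transformer
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4                           # hypothetical accumulation factor

# Dummy micro-batches: 8 steps -> 2 optimizer updates at accum_steps=4.
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

for step, (x, y) in enumerate(batches):
    # Mixed precision: run the forward pass under autocast.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        # Divide by accum_steps so the summed gradients match one
        # full-batch step.
        loss = F.cross_entropy(model(x), y) / accum_steps
    loss.backward()                       # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad(set_to_none=True)   # reset for the next accumulation window
```

The effective batch size is `accum_steps` times the micro-batch size, which is how a from-scratch pipeline fits large batches into limited memory.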