4 Commits

8b604a1925  Carlos Gutierrez  2025-11-18 23:23:50 -05:00
    Adding paper
9f17e1db24  Carlos Gutierrez  2025-11-16 16:44:55 -05:00
    Fix optimized attention mask handling for training
    - Fix mask format conversion (float to boolean) for scaled_dot_product_attention
    - Fix mask dimensions for proper broadcasting: [batch, 1, seq_len, seq_len]
    - Resolve conflict between is_causal and custom mask parameters
    - Enable training with optimized attention and KV caching
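The mask fixes in this commit can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes a `[batch, seq_len]` padding mask with 1 marking real tokens, and the helper name `build_attn_mask` is hypothetical. It shows the boolean mask format (True = attend), the `[batch, 1, seq_len, seq_len]` shape that broadcasts over the head dimension, and why `is_causal` must be left unset when a custom mask is supplied (PyTorch's `scaled_dot_product_attention` rejects the combination, so causality is folded into the mask instead).

```python
import torch
import torch.nn.functional as F

def build_attn_mask(padding_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Combine a [batch, seq_len] padding mask (1 = real token) with a causal
    mask into a boolean [batch, 1, seq_len, seq_len] mask for SDPA.
    True = position participates in attention, False = masked out."""
    # Lower-triangular causal mask over (query, key) positions: [S, S]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Padding mask applied along the key axis: [B, 1, 1, S]
    pad = padding_mask.bool()[:, None, None, :]
    # Broadcast to [B, 1, S, S]; the singleton dim broadcasts over heads.
    return causal[None, None] & pad

batch, heads, seq_len, dim = 2, 4, 5, 8
q = k = v = torch.randn(batch, heads, seq_len, dim)
padding = torch.tensor([[1, 1, 1, 1, 0],
                        [1, 1, 0, 0, 0]])
mask = build_attn_mask(padding, seq_len)

# Pass the boolean mask only; do NOT also set is_causal=True, since SDPA
# does not accept a custom attn_mask together with is_causal.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Note that a legacy additive float mask (0.0 for allowed, -inf for masked) is converted differently from a 1/0 mask: compare against zero rather than calling `.bool()`, which would map -inf to True.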
3fef3e2689  Carlos Gutierrez  2025-11-16 16:39:11 -05:00
    fixing memory
3d2da94ce2  Carlos Gutierrez  2025-11-06 22:07:41 -05:00
    Initial commit: SheepOp LLM - Transformer-based language model implementation
    - Complete transformer implementation from scratch
    - Training pipeline with gradient accumulation and mixed precision
    - Optimized inference with KV caching
    - Multi-format data processing (PDFs, images, code, text)
    - Comprehensive documentation
    - Apache 2.0 license
    - Example training plots included in docs/images/
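The gradient accumulation and mixed precision mentioned in the initial commit combine in a standard pattern, sketched below. This is not the repository's training loop: the model, data, and `accum_steps` value are placeholders, and it uses CPU bfloat16 autocast so it runs anywhere (on CUDA with float16 one would additionally wrap the backward pass in a `GradScaler`). The key points are dividing the loss by the accumulation factor and stepping the optimizer only every `accum_steps` micro-batches.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)            # stand-in for the transformer
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4                           # hypothetical accumulation factor

# Dummy micro-batches: 8 steps -> 2 optimizer updates at accum_steps=4.
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

for step, (x, y) in enumerate(batches):
    # Mixed precision: run the forward pass under autocast.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        # Divide by accum_steps so the summed gradients match one
        # full-batch step.
        loss = F.cross_entropy(model(x), y) / accum_steps
    loss.backward()                       # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad(set_to_none=True)   # reset for the next accumulation window
```

The effective batch size is `accum_steps` times the micro-batch size, which is how a from-scratch pipeline fits large batches into limited memory.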