- Fix mask format conversion (float to boolean) expected by `scaled_dot_product_attention`
- Fix mask dimensions to `[batch, 1, seq_len, seq_len]` so the mask broadcasts correctly over attention heads
- Resolve the conflict between `is_causal=True` and a custom `attn_mask` (the two are mutually exclusive in `scaled_dot_product_attention`)
- Enable training with optimized attention and KV caching
- Complete transformer implementation from scratch
- Training pipeline with gradient accumulation and mixed precision
- Optimized inference with KV caching
- Multi-format data processing (PDFs, images, code, text)
- Comprehensive documentation
- Apache 2.0 license
- Example training plots included in docs/images/
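The mask fixes above can be sketched as follows. This is a minimal illustration assuming PyTorch's `F.scaled_dot_product_attention`; the function and tensor names (`attend`, `float_mask`) are hypothetical, not from the repo:

```python
import torch
import torch.nn.functional as F

def attend(q, k, v, float_mask=None, causal=False):
    """q, k, v: [batch, heads, seq_len, head_dim].
    float_mask: optional [batch, seq_len] padding mask (1.0 = keep, 0.0 = pad)."""
    if float_mask is not None:
        # Float -> boolean: True means "may attend", matching SDPA's convention.
        bool_mask = float_mask.bool()                       # [batch, seq_len]
        # Broadcastable shape [batch, 1, seq_len_q, seq_len_k].
        attn_mask = bool_mask[:, None, None, :].expand(-1, 1, q.size(-2), -1)
        if causal:
            # Fold causality into the explicit mask instead of passing
            # is_causal=True alongside attn_mask (SDPA rejects that combination).
            causal_mask = torch.tril(torch.ones(
                q.size(-2), k.size(-2), dtype=torch.bool, device=q.device))
            attn_mask = attn_mask & causal_mask
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
    # No custom mask: let SDPA apply the causal mask itself.
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
```

With an all-ones padding mask and `causal=True`, this produces the same output as calling SDPA with `is_causal=True` directly.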
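The KV-caching idea behind the inference optimization can be sketched as follows; the `KVCache` class and `decode_step` helper are illustrative names, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

class KVCache:
    """Append-only cache of past keys/values for one attention layer (illustrative)."""
    def __init__(self):
        self.k = None
        self.v = None

    def update(self, k_new, v_new):
        # Concatenate along the sequence dimension: [batch, heads, seq, head_dim].
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=2)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

def decode_step(q_new, k_new, v_new, cache):
    """One generation step: the new query attends over all cached keys/values."""
    k, v = cache.update(k_new, v_new)
    # The single new query may attend to every cached (i.e. past) position,
    # so no causal mask is needed within this step.
    return F.scaled_dot_product_attention(q_new, k, v)
```

Decoding one token at a time through the cache reproduces full causal attention while recomputing keys and values only for the newest position.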
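The training-pipeline bullet (gradient accumulation plus mixed precision) can be sketched like this. The `train_steps` helper and its parameters are hypothetical; on CUDA with fp16 one would normally also wrap the backward pass in a `GradScaler`, omitted here for brevity:

```python
import torch

def train_steps(model, batches, optimizer, accum_steps=4, device_type="cpu"):
    """Accumulate gradients over `accum_steps` micro-batches under autocast."""
    use_amp = device_type == "cuda"
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        # autocast runs the forward pass in lower precision where it is safe.
        with torch.autocast(device_type=device_type, enabled=use_amp):
            loss = torch.nn.functional.cross_entropy(model(x), y)
        # Divide by accum_steps so the summed gradients average over the
        # virtual (accumulated) batch.
        (loss / accum_steps).backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Gradient accumulation trades wall-clock time for memory: the effective batch size is `accum_steps` times the micro-batch size without holding the full batch's activations at once.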