- Fix mask format conversion (float to boolean) for scaled_dot_product_attention
- Fix mask dimensions for proper broadcasting: [batch, 1, seq_len, seq_len]
- Resolve the conflict between is_causal and a custom mask parameter
- Enable training with optimized attention and KV caching
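
The fixes above can be sketched as follows. This is a minimal illustration, not the PR's actual code: it assumes a float padding mask where 1.0 means "attend" and 0.0 means "masked", converts it to the boolean format `scaled_dot_product_attention` expects, broadcasts it to `[batch, 1, seq_len, seq_len]`, and folds causality into the mask itself so `is_causal` is left at its default (avoiding the conflict between the two parameters):

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 4, 8, 16
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Hypothetical float padding mask: 1.0 = attend, 0.0 = masked out.
float_mask = torch.ones(batch, seq_len)
float_mask[:, -2:] = 0.0  # e.g. last two positions are padding

# Convert float -> boolean (True = attend), the convention SDPA uses
# for boolean attn_mask arguments.
pad = float_mask.to(torch.bool)                       # [batch, seq_len]

# Build a causal mask and combine it with padding, broadcast to the
# shape [batch, 1, seq_len, seq_len] so it spans all heads.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
attn_mask = (causal.unsqueeze(0) & pad.view(batch, 1, seq_len))
attn_mask = attn_mask.unsqueeze(1)                    # [batch, 1, S, S]

# Pass the combined boolean mask; leave is_causal at its default False
# so the two masking mechanisms never conflict.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 4, 8, 16])
```

Because causality is baked into `attn_mask`, the same call works for both training (full sequences) and cached decoding, where the query length and key length differ and `is_causal=True` would be ambiguous.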