sheepOp/models/optimized_attention.py
Carlos Gutierrez 9f17e1db24 Fix optimized attention mask handling for training
- Fix mask format conversion (float to boolean) for scaled_dot_product_attention
- Fix mask dimensions for proper broadcasting [batch, 1, seq_len, seq_len]
- Resolve conflict between is_causal and custom mask parameters
- Enable training with optimized attention and KV caching
2025-11-16 16:44:55 -05:00

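The fixes above can be illustrated with a minimal sketch. This assumes PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which accepts either a boolean mask (True = attend) or an additive float mask, and which raises an error when both `attn_mask` and `is_causal=True` are supplied. The helper name `build_sdpa_mask` is hypothetical and not taken from the commit; it shows one way to fold the causal constraint into a boolean mask broadcastable to `[batch, 1, seq_len, seq_len]`, so the custom mask replaces `is_causal` rather than conflicting with it.

```python
import torch
import torch.nn.functional as F

def build_sdpa_mask(pad_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Hypothetical helper: combine a [batch, seq_len] padding mask
    (1 = real token, 0 = pad) with a causal mask into a boolean
    [batch, 1, seq_len, seq_len] tensor. True means "may attend",
    matching SDPA's boolean-mask convention."""
    # Lower-triangular causal mask: query i may attend to keys j <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Reshape padding to broadcast over heads and query positions:
    # [batch, 1, 1, seq_len].
    keep = pad_mask.bool()[:, None, None, :]
    # Broadcasts to [batch, 1, seq_len, seq_len]; the singleton head
    # dim then broadcasts across all attention heads inside SDPA.
    return causal[None, None, :, :] & keep

batch, heads, seq_len, head_dim = 2, 4, 5, 8
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Second sequence is full length; first has two padded positions.
pad = torch.tensor([[1, 1, 1, 0, 0],
                    [1, 1, 1, 1, 1]])
mask = build_sdpa_mask(pad, seq_len)

# Pass the combined boolean mask and leave is_causal at its default:
# setting both attn_mask and is_causal=True is an error in SDPA.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Folding causality into the explicit mask is the usual way to keep KV-cache decoding and masked training on the same code path: during training the full square mask is built as above, while at decode time the query dimension shrinks to the new tokens and the same builder logic applies.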