sheepOp/models
Carlos Gutierrez 9f17e1db24 Fix optimized attention mask handling for training
- Fix mask format conversion (float to boolean) for scaled_dot_product_attention
- Fix mask dimensions so the mask broadcasts as [batch, 1, seq_len, seq_len]
- Resolve conflict between is_causal and custom mask parameters
- Enable training with optimized attention and KV caching
2025-11-16 16:44:55 -05:00
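
A minimal sketch of the mask handling this commit describes, not the repository's actual code. It assumes PyTorch 2.x, a padding mask of shape [batch, seq_len] where nonzero marks a real token, and right-padded sequences. The helper names `to_sdpa_mask` and `attention` are hypothetical.

```python
import torch
import torch.nn.functional as F


def to_sdpa_mask(padding_mask: torch.Tensor) -> torch.Tensor:
    """Build a boolean mask of shape [batch, 1, seq_len, seq_len].

    Assumption: padding_mask is [batch, seq_len], float or bool,
    with nonzero / True for real tokens and 0 / False for padding.
    """
    keep = padding_mask.to(torch.bool)  # float -> boolean
    seq_len = keep.shape[1]
    # Key padding: [batch, 1, 1, seq_len], broadcast over heads and queries.
    key_keep = keep[:, None, None, :]
    # Causal part: lower-triangular, [1, 1, seq_len, seq_len].
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=keep.device)
    )[None, None, :, :]
    # In a boolean attn_mask, True means "may attend".
    return key_keep & causal


def attention(q, k, v, padding_mask=None):
    """q, k, v: [batch, heads, seq_len, head_dim]."""
    if padding_mask is None:
        # No custom mask: let the kernel apply causality itself.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # attn_mask and is_causal=True must not be combined, so causality
    # is folded into the custom mask and is_causal stays False.
    attn_mask = to_sdpa_mask(padding_mask)
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_mask, is_causal=False
    )
```

The singleton second dimension lets one mask broadcast across all attention heads. With left padding, some query rows would be fully masked and produce NaNs, so that case would need separate handling.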