Measured against unmodified upstream llama.cpp at the same Bonsai/Q2_0 commit, same M4 Max:
- tg128: 309.82 → 442.42 t/s (+42.8%)
- pp512: 4250.32 → 4622.63 t/s (+8.8%)
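
For anyone who wants to re-derive the percentages from the raw throughput numbers, here is a minimal sketch. The helper name `percent_change` and the `results` dict are illustrative only and are not part of llama.cpp or its benchmark tooling; the figures are copied from the list above.

```python
def percent_change(baseline: float, patched: float) -> float:
    """Relative change from baseline to patched, in percent."""
    return (patched - baseline) / baseline * 100.0


# Throughput in tokens/s, copied from the list above: (upstream, patched).
results = {
    "tg128": (309.82, 442.42),
    "pp512": (4250.32, 4622.63),
}

for name, (baseline, patched) in results.items():
    delta = percent_change(baseline, patched)
    print(f"{name}: {baseline:.2f} -> {patched:.2f} t/s ({delta:+.1f}%)")
```

The change is computed relative to the upstream baseline, so a positive value means the patched build is faster.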