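Comparison of all metrics across engine/condition combinations. Green = better, Red = worse. WIP is inverted (higher is better). In the Text Norm column, "Brut" means raw (unnormalized) text and "Normalisé" means normalized text.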
Analyze performance distribution and resilience across different groupings.
| Engine | Degradation | Enhancement | Audio Norm | Text Norm | WER | CER | MER | WIL | WIP | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| whisper-base | None | - | Broadcast_standard | Brut | 0.0606 | 0.0235 | 0.0603 | 0.1132 | 0.8868 | 13.93 |
| whisper-base | None | - | Broadcast_standard | Normalisé | 0.0087 | 0.0008 | 0.0086 | 0.0129 | 0.9871 | 13.93 |
| whisper-base | None | - | No_normalization | Brut | 0.0736 | 0.0463 | 0.0733 | 0.1375 | 0.8625 | 7.14 |
| whisper-base | None | - | No_normalization | Normalisé | 0.0216 | 0.0226 | 0.0216 | 0.0385 | 0.9615 | 7.14 |
| whisper-base | Cathedral | - | Broadcast_standard | Brut | 0.2814 | 0.1436 | 0.2720 | 0.4446 | 0.5554 | 7.02 |
| whisper-base | Cathedral | - | Broadcast_standard | Normalisé | 0.2165 | 0.1168 | 0.2092 | 0.3448 | 0.6552 | 7.02 |
| whisper-base | Cathedral | - | No_normalization | Brut | 0.2857 | 0.1460 | 0.2762 | 0.4487 | 0.5513 | 7.99 |
| whisper-base | Cathedral | - | No_normalization | Normalisé | 0.2208 | 0.1201 | 0.2134 | 0.3489 | 0.6511 | 7.99 |
| whisper-base | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.1472 | 0.0856 | 0.1466 | 0.2621 | 0.7379 | 8.26 |
| whisper-base | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.0952 | 0.0620 | 0.0948 | 0.1700 | 0.8300 | 8.26 |
| whisper-base | Megaphone, outdoor | - | No_normalization | Brut | 0.1515 | 0.0848 | 0.1509 | 0.2695 | 0.7305 | 8.41 |
| whisper-base | Megaphone, outdoor | - | No_normalization | Normalisé | 0.0952 | 0.0604 | 0.0948 | 0.1700 | 0.8300 | 8.41 |
| whisper-base | None | Demucs_hybrid | Broadcast_standard | Brut | 0.0563 | 0.0228 | 0.0560 | 0.1051 | 0.8949 | 8.40 |
| whisper-base | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0087 | 0.0008 | 0.0086 | 0.0129 | 0.9871 | 8.40 |
| whisper-base | None | Demucs_hybrid | No_normalization | Brut | 0.0606 | 0.0243 | 0.0603 | 0.1132 | 0.8868 | 8.56 |
| whisper-base | None | Demucs_hybrid | No_normalization | Normalisé | 0.0130 | 0.0024 | 0.0129 | 0.0215 | 0.9785 | 8.56 |
| whisper-base | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.7532 | 0.4584 | 0.6641 | 0.8685 | 0.1315 | 17.72 |
| whisper-base | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.7143 | 0.4319 | 0.6298 | 0.8403 | 0.1597 | 17.72 |
| whisper-base | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.6926 | 0.4396 | 0.6400 | 0.8557 | 0.1443 | 8.25 |
| whisper-base | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.6494 | 0.4142 | 0.6000 | 0.8219 | 0.1781 | 8.25 |
| whisper-base | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.1688 | 0.0934 | 0.1674 | 0.2916 | 0.7084 | 7.32 |
| whisper-base | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.1082 | 0.0677 | 0.1073 | 0.1857 | 0.8143 | 7.32 |
| whisper-base | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.1645 | 0.0926 | 0.1638 | 0.2885 | 0.7115 | 8.06 |
| whisper-base | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.1039 | 0.0677 | 0.1034 | 0.1821 | 0.8179 | 8.06 |
| large-v3-turbo | None | - | Broadcast_standard | Brut | 0.0433 | 0.0196 | 0.0431 | 0.0804 | 0.9196 | 39.57 |
| large-v3-turbo | None | - | Broadcast_standard | Normalisé | 0.0087 | 0.0008 | 0.0086 | 0.0129 | 0.9871 | 39.57 |
| large-v3-turbo | None | - | No_normalization | Brut | 0.0433 | 0.0196 | 0.0431 | 0.0804 | 0.9196 | 39.83 |
| large-v3-turbo | None | - | No_normalization | Normalisé | 0.0087 | 0.0008 | 0.0086 | 0.0129 | 0.9871 | 39.83 |
| large-v3-turbo | Cathedral | - | Broadcast_standard | Brut | 0.7186 | 0.7057 | 0.4346 | 0.4713 | 0.5287 | 52.67 |
| large-v3-turbo | Cathedral | - | Broadcast_standard | Normalisé | 0.6753 | 0.6785 | 0.4084 | 0.4212 | 0.5788 | 52.67 |
| large-v3-turbo | Cathedral | - | No_normalization | Brut | 0.0693 | 0.0345 | 0.0690 | 0.1294 | 0.8706 | 38.92 |
| large-v3-turbo | Cathedral | - | No_normalization | Normalisé | 0.0260 | 0.0137 | 0.0259 | 0.0469 | 0.9531 | 38.92 |
| large-v3-turbo | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.7186 | 0.6962 | 0.4346 | 0.4713 | 0.5287 | 81.81 |
| large-v3-turbo | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.6753 | 0.6696 | 0.4084 | 0.4212 | 0.5788 | 81.81 |
| large-v3-turbo | Megaphone, outdoor | - | No_normalization | Brut | 0.6364 | 0.6185 | 0.4061 | 0.4472 | 0.5528 | 91.52 |
| large-v3-turbo | Megaphone, outdoor | - | No_normalization | Normalisé | 0.5887 | 0.5915 | 0.3757 | 0.3892 | 0.6108 | 91.52 |
| large-v3-turbo | None | Demucs_hybrid | Broadcast_standard | Brut | 0.6926 | 0.6939 | 0.4188 | 0.4415 | 0.5585 | 54.94 |
| large-v3-turbo | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.6580 | 0.6688 | 0.3979 | 0.4005 | 0.5995 | 54.94 |
| large-v3-turbo | None | Demucs_hybrid | No_normalization | Brut | 0.0866 | 0.0651 | 0.0826 | 0.1184 | 0.8816 | 41.16 |
| large-v3-turbo | None | Demucs_hybrid | No_normalization | Normalisé | 0.0519 | 0.0459 | 0.0496 | 0.0537 | 0.9463 | 41.16 |
| large-v3-turbo | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.6970 | 0.5557 | 0.4735 | 0.5896 | 0.4104 | 60.93 |
| large-v3-turbo | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.6450 | 0.5415 | 0.4382 | 0.5328 | 0.4672 | 60.93 |
| large-v3-turbo | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.6883 | 0.5549 | 0.4676 | 0.5804 | 0.4196 | 60.61 |
| large-v3-turbo | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.6494 | 0.5431 | 0.4412 | 0.5376 | 0.4624 | 60.61 |
| large-v3-turbo | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.7143 | 0.6954 | 0.4319 | 0.4664 | 0.5336 | 54.51 |
| large-v3-turbo | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.6753 | 0.6696 | 0.4084 | 0.4212 | 0.5788 | 54.51 |
| large-v3-turbo | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.7229 | 0.6970 | 0.4372 | 0.4762 | 0.5238 | 54.59 |
| large-v3-turbo | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.6753 | 0.6696 | 0.4084 | 0.4212 | 0.5788 | 54.59 |
| wav2vec2-english | None | - | Broadcast_standard | Brut | 0.3117 | 0.0808 | 0.3077 | 0.5124 | 0.4876 | 8.12 |
| wav2vec2-english | None | - | Broadcast_standard | Normalisé | 0.1385 | 0.0242 | 0.1368 | 0.2419 | 0.7581 | 8.12 |
| wav2vec2-english | None | - | No_normalization | Brut | 0.3117 | 0.0801 | 0.3077 | 0.5145 | 0.4855 | 7.10 |
| wav2vec2-english | None | - | No_normalization | Normalisé | 0.1299 | 0.0234 | 0.1282 | 0.2301 | 0.7699 | 7.10 |
| wav2vec2-english | Cathedral | - | Broadcast_standard | Brut | 0.9307 | 0.7198 | 0.9267 | 0.9895 | 0.0105 | 7.02 |
| wav2vec2-english | Cathedral | - | Broadcast_standard | Normalisé | 0.9177 | 0.7067 | 0.9138 | 0.9854 | 0.0146 | 7.02 |
| wav2vec2-english | Cathedral | - | No_normalization | Brut | 0.9307 | 0.7182 | 0.9267 | 0.9895 | 0.0105 | 6.96 |
| wav2vec2-english | Cathedral | - | No_normalization | Normalisé | 0.9177 | 0.7051 | 0.9138 | 0.9854 | 0.0146 | 6.96 |
| wav2vec2-english | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.6450 | 0.2520 | 0.6032 | 0.8296 | 0.1704 | 7.29 |
| wav2vec2-english | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.5281 | 0.2055 | 0.4939 | 0.7228 | 0.2772 | 7.29 |
| wav2vec2-english | Megaphone, outdoor | - | No_normalization | Brut | 0.6450 | 0.2512 | 0.6032 | 0.8296 | 0.1704 | 7.07 |
| wav2vec2-english | Megaphone, outdoor | - | No_normalization | Normalisé | 0.5281 | 0.2047 | 0.4939 | 0.7228 | 0.2772 | 7.07 |
| wav2vec2-english | None | Demucs_hybrid | Broadcast_standard | Brut | 0.3117 | 0.0801 | 0.3077 | 0.5124 | 0.4876 | 7.06 |
| wav2vec2-english | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.1385 | 0.0234 | 0.1368 | 0.2419 | 0.7581 | 7.06 |
| wav2vec2-english | None | Demucs_hybrid | No_normalization | Brut | 0.3074 | 0.0793 | 0.3034 | 0.5064 | 0.4936 | 7.06 |
| wav2vec2-english | None | Demucs_hybrid | No_normalization | Normalisé | 0.1342 | 0.0226 | 0.1325 | 0.2344 | 0.7656 | 7.06 |
| wav2vec2-english | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.9351 | 0.7700 | 0.9351 | 0.9912 | 0.0088 | 6.92 |
| wav2vec2-english | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.9307 | 0.7623 | 0.9307 | 0.9900 | 0.0100 | 6.92 |
| wav2vec2-english | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.9351 | 0.7692 | 0.9351 | 0.9912 | 0.0088 | 6.85 |
| wav2vec2-english | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.9307 | 0.7607 | 0.9307 | 0.9900 | 0.0100 | 6.85 |
| wav2vec2-english | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.6494 | 0.2512 | 0.6224 | 0.8455 | 0.1545 | 7.04 |
| wav2vec2-english | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.5238 | 0.2047 | 0.5021 | 0.7313 | 0.2687 | 7.04 |
| wav2vec2-english | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.6494 | 0.2527 | 0.6224 | 0.8455 | 0.1545 | 6.95 |
| wav2vec2-english | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.5238 | 0.2063 | 0.5021 | 0.7313 | 0.2687 | 6.95 |
| seamless-m4t-v2-large | None | - | Broadcast_standard | Brut | 0.5411 | 0.4333 | 0.5411 | 0.6526 | 0.3474 | 79.16 |
| seamless-m4t-v2-large | None | - | Broadcast_standard | Normalisé | 0.4286 | 0.4102 | 0.4286 | 0.4612 | 0.5388 | 79.16 |
| seamless-m4t-v2-large | None | - | No_normalization | Brut | 0.5411 | 0.4333 | 0.5411 | 0.6526 | 0.3474 | 76.47 |
| seamless-m4t-v2-large | None | - | No_normalization | Normalisé | 0.4286 | 0.4102 | 0.4286 | 0.4612 | 0.5388 | 76.47 |
| seamless-m4t-v2-large | Cathedral | - | Broadcast_standard | Brut | 0.9740 | 0.8728 | 0.9740 | 0.9951 | 0.0049 | 61.74 |
| seamless-m4t-v2-large | Cathedral | - | Broadcast_standard | Normalisé | 0.9697 | 0.8606 | 0.9697 | 0.9934 | 0.0066 | 61.74 |
| seamless-m4t-v2-large | Cathedral | - | No_normalization | Brut | 0.9740 | 0.8728 | 0.9740 | 0.9951 | 0.0049 | 61.85 |
| seamless-m4t-v2-large | Cathedral | - | No_normalization | Normalisé | 0.9697 | 0.8606 | 0.9697 | 0.9934 | 0.0066 | 61.85 |
| seamless-m4t-v2-large | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.7532 | 0.7229 | 0.7532 | 0.7836 | 0.2164 | 59.55 |
| seamless-m4t-v2-large | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.7359 | 0.7180 | 0.7359 | 0.7522 | 0.2478 | 59.55 |
| seamless-m4t-v2-large | Megaphone, outdoor | - | No_normalization | Brut | 0.7532 | 0.7229 | 0.7532 | 0.7836 | 0.2164 | 59.48 |
| seamless-m4t-v2-large | Megaphone, outdoor | - | No_normalization | Normalisé | 0.7359 | 0.7180 | 0.7359 | 0.7522 | 0.2478 | 59.48 |
| seamless-m4t-v2-large | None | Demucs_hybrid | Broadcast_standard | Brut | 0.5628 | 0.4474 | 0.5628 | 0.6729 | 0.3271 | 65.83 |
| seamless-m4t-v2-large | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.4545 | 0.4263 | 0.4545 | 0.4909 | 0.5091 | 65.83 |
| seamless-m4t-v2-large | None | Demucs_hybrid | No_normalization | Brut | 0.5455 | 0.4333 | 0.5455 | 0.6615 | 0.3385 | 67.02 |
| seamless-m4t-v2-large | None | Demucs_hybrid | No_normalization | Normalisé | 0.4372 | 0.4110 | 0.4372 | 0.4811 | 0.5189 | 67.02 |
| seamless-m4t-v2-large | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.9351 | 0.8619 | 0.9351 | 0.9762 | 0.0238 | 57.50 |
| seamless-m4t-v2-large | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.9134 | 0.8525 | 0.9134 | 0.9578 | 0.0422 | 57.50 |
| seamless-m4t-v2-large | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.9351 | 0.8619 | 0.9351 | 0.9762 | 0.0238 | 57.89 |
| seamless-m4t-v2-large | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.9134 | 0.8525 | 0.9134 | 0.9578 | 0.0422 | 57.89 |
| seamless-m4t-v2-large | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.7532 | 0.7229 | 0.7532 | 0.7836 | 0.2164 | 58.67 |
| seamless-m4t-v2-large | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.7359 | 0.7180 | 0.7359 | 0.7522 | 0.2478 | 58.67 |
| seamless-m4t-v2-large | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.7532 | 0.7229 | 0.7532 | 0.7836 | 0.2164 | 58.87 |
| seamless-m4t-v2-large | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.7359 | 0.7180 | 0.7359 | 0.7522 | 0.2478 | 58.87 |
| parakeet-ctc-1.1b | None | - | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.54 |
| parakeet-ctc-1.1b | None | - | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.54 |
| parakeet-ctc-1.1b | None | - | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 13.03 |
| parakeet-ctc-1.1b | None | - | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 13.03 |
| parakeet-ctc-1.1b | Cathedral | - | Broadcast_standard | Brut | 0.2424 | 0.0651 | 0.2393 | 0.4138 | 0.5862 | 11.29 |
| parakeet-ctc-1.1b | Cathedral | - | Broadcast_standard | Normalisé | 0.0433 | 0.0089 | 0.0427 | 0.0717 | 0.9283 | 11.29 |
| parakeet-ctc-1.1b | Cathedral | - | No_normalization | Brut | 0.2424 | 0.0651 | 0.2393 | 0.4138 | 0.5862 | 11.00 |
| parakeet-ctc-1.1b | Cathedral | - | No_normalization | Normalisé | 0.0433 | 0.0089 | 0.0427 | 0.0717 | 0.9283 | 11.00 |
| parakeet-ctc-1.1b | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 12.01 |
| parakeet-ctc-1.1b | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 12.01 |
| parakeet-ctc-1.1b | Megaphone, outdoor | - | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.32 |
| parakeet-ctc-1.1b | Megaphone, outdoor | - | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.32 |
| parakeet-ctc-1.1b | None | Demucs_hybrid | Broadcast_standard | Brut | 0.2294 | 0.0597 | 0.2265 | 0.3939 | 0.6061 | 11.09 |
| parakeet-ctc-1.1b | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0303 | 0.0032 | 0.0299 | 0.0467 | 0.9533 | 11.09 |
| parakeet-ctc-1.1b | None | Demucs_hybrid | No_normalization | Brut | 0.2294 | 0.0597 | 0.2265 | 0.3939 | 0.6061 | 11.00 |
| parakeet-ctc-1.1b | None | Demucs_hybrid | No_normalization | Normalisé | 0.0303 | 0.0032 | 0.0299 | 0.0467 | 0.9533 | 11.00 |
| parakeet-ctc-1.1b | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.3117 | 0.1217 | 0.3077 | 0.5082 | 0.4918 | 11.13 |
| parakeet-ctc-1.1b | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.1429 | 0.0701 | 0.1410 | 0.2429 | 0.7571 | 11.13 |
| parakeet-ctc-1.1b | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.3117 | 0.1217 | 0.3077 | 0.5082 | 0.4918 | 11.30 |
| parakeet-ctc-1.1b | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.1429 | 0.0709 | 0.1410 | 0.2429 | 0.7571 | 11.30 |
| parakeet-ctc-1.1b | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.13 |
| parakeet-ctc-1.1b | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.13 |
| parakeet-ctc-1.1b | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 10.99 |
| parakeet-ctc-1.1b | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 10.99 |
| parakeet-tdt-1.1b | None | - | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 13.46 |
| parakeet-tdt-1.1b | None | - | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 13.46 |
| parakeet-tdt-1.1b | None | - | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.41 |
| parakeet-tdt-1.1b | None | - | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.41 |
| parakeet-tdt-1.1b | Cathedral | - | Broadcast_standard | Brut | 0.2381 | 0.0651 | 0.2340 | 0.4032 | 0.5968 | 11.29 |
| parakeet-tdt-1.1b | Cathedral | - | Broadcast_standard | Normalisé | 0.0390 | 0.0089 | 0.0383 | 0.0591 | 0.9409 | 11.29 |
| parakeet-tdt-1.1b | Cathedral | - | No_normalization | Brut | 0.2338 | 0.0636 | 0.2298 | 0.3965 | 0.6035 | 12.02 |
| parakeet-tdt-1.1b | Cathedral | - | No_normalization | Normalisé | 0.0346 | 0.0073 | 0.0340 | 0.0508 | 0.9492 | 12.02 |
| parakeet-tdt-1.1b | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.80 |
| parakeet-tdt-1.1b | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.80 |
| parakeet-tdt-1.1b | Megaphone, outdoor | - | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.86 |
| parakeet-tdt-1.1b | Megaphone, outdoor | - | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.86 |
| parakeet-tdt-1.1b | None | Demucs_hybrid | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.77 |
| parakeet-tdt-1.1b | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.77 |
| parakeet-tdt-1.1b | None | Demucs_hybrid | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.88 |
| parakeet-tdt-1.1b | None | Demucs_hybrid | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.88 |
| parakeet-tdt-1.1b | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.2511 | 0.0808 | 0.2468 | 0.4204 | 0.5796 | 12.02 |
| parakeet-tdt-1.1b | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0649 | 0.0250 | 0.0638 | 0.1046 | 0.8954 | 12.02 |
| parakeet-tdt-1.1b | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.2511 | 0.0808 | 0.2468 | 0.4204 | 0.5796 | 11.92 |
| parakeet-tdt-1.1b | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.0649 | 0.0250 | 0.0638 | 0.1046 | 0.8954 | 11.92 |
| parakeet-tdt-1.1b | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.87 |
| parakeet-tdt-1.1b | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.87 |
| parakeet-tdt-1.1b | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.2251 | 0.0589 | 0.2222 | 0.3872 | 0.6128 | 11.78 |
| parakeet-tdt-1.1b | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.0260 | 0.0024 | 0.0256 | 0.0383 | 0.9617 | 11.78 |
| vosk-en | None | - | Broadcast_standard | Brut | 0.2381 | 0.0636 | 0.2340 | 0.4032 | 0.5968 | 12.45 |
| vosk-en | None | - | Broadcast_standard | Normalisé | 0.0390 | 0.0081 | 0.0383 | 0.0591 | 0.9409 | 12.45 |
| vosk-en | None | - | No_normalization | Brut | 0.2381 | 0.0636 | 0.2340 | 0.4032 | 0.5968 | 12.18 |
| vosk-en | None | - | No_normalization | Normalisé | 0.0390 | 0.0081 | 0.0383 | 0.0591 | 0.9409 | 12.18 |
| vosk-en | Cathedral | - | Broadcast_standard | Brut | 0.8831 | 0.8689 | 0.8831 | 0.9266 | 0.0734 | 42.81 |
| vosk-en | Cathedral | - | Broadcast_standard | Normalisé | 0.8485 | 0.8574 | 0.8485 | 0.8767 | 0.1233 | 42.81 |
| vosk-en | Cathedral | - | No_normalization | Brut | 0.8961 | 0.8862 | 0.8961 | 0.9344 | 0.0656 | 42.27 |
| vosk-en | Cathedral | - | No_normalization | Normalisé | 0.8615 | 0.8783 | 0.8615 | 0.8833 | 0.1167 | 42.27 |
| vosk-en | Megaphone, outdoor | - | Broadcast_standard | Brut | 0.4069 | 0.1923 | 0.4034 | 0.6055 | 0.3945 | 30.49 |
| vosk-en | Megaphone, outdoor | - | Broadcast_standard | Normalisé | 0.2468 | 0.1418 | 0.2446 | 0.3675 | 0.6325 | 30.49 |
| vosk-en | Megaphone, outdoor | - | No_normalization | Brut | 0.4069 | 0.1947 | 0.4034 | 0.6036 | 0.3964 | 28.41 |
| vosk-en | Megaphone, outdoor | - | No_normalization | Normalisé | 0.2468 | 0.1442 | 0.2446 | 0.3645 | 0.6355 | 28.41 |
| vosk-en | None | Demucs_hybrid | Broadcast_standard | Brut | 0.2381 | 0.0636 | 0.2340 | 0.4032 | 0.5968 | 12.22 |
| vosk-en | None | Demucs_hybrid | Broadcast_standard | Normalisé | 0.0390 | 0.0081 | 0.0383 | 0.0591 | 0.9409 | 12.22 |
| vosk-en | None | Demucs_hybrid | No_normalization | Brut | 0.2381 | 0.0636 | 0.2340 | 0.4032 | 0.5968 | 12.19 |
| vosk-en | None | Demucs_hybrid | No_normalization | Normalisé | 0.0390 | 0.0081 | 0.0383 | 0.0591 | 0.9409 | 12.19 |
| vosk-en | Cathedral | Demucs_hybrid | Broadcast_standard | Brut | 0.9567 | 0.9639 | 0.9567 | 0.9639 | 0.0361 | 30.53 |
| vosk-en | Cathedral | Demucs_hybrid | Broadcast_standard | Normalisé | 0.9567 | 0.9629 | 0.9567 | 0.9639 | 0.0361 | 30.53 |
| vosk-en | Cathedral | Demucs_hybrid | No_normalization | Brut | 0.9654 | 0.9765 | 0.9654 | 0.9654 | 0.0346 | 34.62 |
| vosk-en | Cathedral | Demucs_hybrid | No_normalization | Normalisé | 0.9654 | 0.9758 | 0.9654 | 0.9654 | 0.0346 | 34.62 |
| vosk-en | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Brut | 0.4416 | 0.2300 | 0.4378 | 0.6358 | 0.3642 | 28.98 |
| vosk-en | Megaphone, outdoor | Demucs_hybrid | Broadcast_standard | Normalisé | 0.2987 | 0.1821 | 0.2961 | 0.4293 | 0.5707 | 28.98 |
| vosk-en | Megaphone, outdoor | Demucs_hybrid | No_normalization | Brut | 0.4372 | 0.2316 | 0.4335 | 0.6338 | 0.3662 | 30.09 |
| vosk-en | Megaphone, outdoor | Demucs_hybrid | No_normalization | Normalisé | 0.2987 | 0.1837 | 0.2961 | 0.4348 | 0.5652 | 30.09 |
This section explains the metrics used to evaluate ASR (Automatic Speech Recognition) performance. All metrics are computed by comparing the reference (ground truth) text with the hypothesis (transcription) text.
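WER (Word Error Rate)
Range: 0 to ∞ (typically 0 to 1) | Lower is better
The most common ASR metric. Measures the word-level edit distance between reference and hypothesis, divided by the number of reference words. It can exceed 1 when the hypothesis contains many insertions.
WER = (Substitutions + Deletions + Insertions) / Total Reference Words
Example: If the reference is "the cat sat" and the hypothesis is "a cat sits", there are two substitutions, so WER = 2/3 = 0.667.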
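The WER formula above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and unit cost for each substitution, deletion, and insertion. The function names `word_edit_distance` and `wer` are illustrative and are not taken from the benchmark tool.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (unit cost per edit)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or hit
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


print(round(wer("the cat sat", "a cat sits"), 3))  # 0.667
print(wer("hi", "hello there world"))              # 3.0, WER can exceed 1
```

The second call shows why the range is open-ended: one substitution plus two insertions against a one-word reference gives 3/1.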
CER (Character Error Rate)
Range: 0 to ∞ (typically 0 to 1) | Lower is better
Like WER, but computed at character level. More granular, and useful for languages without clear word boundaries.
CER = (Char Substitutions + Char Deletions + Char Insertions) / Total Reference Characters
Note: CER is often lower than WER because a word that is only partly wrong still earns credit for its correct characters.
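The same edit-distance idea applies at character level. The sketch below assumes that spaces count as characters and that no text normalization is applied. Both are assumptions, and the tool's actual handling may differ.

```python
def char_edit_distance(ref, hyp):
    """Levenshtein distance between two strings (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters (assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return char_edit_distance(reference, hypothesis) / len(reference)


print(round(cer("cat", "cut"), 3))  # 0.333
# For the WER example sentence pair, CER comes out below the WER of 0.667.
print(cer("the cat sat", "a cat sits") < 2 / 3)  # True
```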
MER (Match Error Rate)
Range: 0 to 1 | Lower is better
Proportion of words that were not correctly matched between reference and hypothesis.
MER = (Substitutions + Deletions + Insertions) / (Hits + Substitutions + Deletions + Insertions)
Key difference from WER: insertions also count in the denominator, so MER accounts for hypothesis length and cannot exceed 1.0.
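MER needs the individual hit, substitution, deletion, and insertion counts, not just the total distance. One way to obtain them is to backtrack through the edit-distance table, as sketched below. The helper name `align_counts` and the tie-breaking order during backtracking are assumptions of this sketch. Other tools may pick a different, equally minimal alignment.

```python
def align_counts(ref, hyp):
    """Return (hits, subs, dels, ins) from one minimum-cost word alignment."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:  # walk back from the bottom-right corner
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            hits += 1; i -= 1; j -= 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs += 1; i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1; i -= 1
        else:
            ins += 1; j -= 1
    return hits, subs, dels, ins


def mer(reference, hypothesis):
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    total = h + s + d + i
    return (s + d + i) / total if total else 0.0


print(mer("the cat sat", "the cat sat on the mat"))  # 0.5 (WER would be 1.0)
print(mer("hi", "hello there world"))                # 1.0 (WER would be 3.0)
```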
WIL (Word Information Lost)
Range: 0 to 1 | Lower is better
Measures the proportion of word information that was lost in transcription.
WIL = 1 - (Hits² / (Reference Length × Hypothesis Length))
Interpretation: Hits / Reference Length is recall and Hits / Hypothesis Length is precision, so WIL is one minus their product, combining both into a single measure.
WIP (Word Information Preserved)
Range: 0 to 1 | Higher is better
The complement of WIL: measures how much word information was correctly preserved.
WIP = Hits² / (Reference Length × Hypothesis Length) = 1 - WIL
Note: This is the only metric where higher values indicate better performance.
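Given alignment counts, WIL and WIP follow directly from the formulas above. The sketch below takes the counts as inputs. The function name `wil_wip` and the choice to return WIP = 0 when either text is empty are assumptions made for illustration.

```python
def wil_wip(hits, substitutions, deletions, insertions):
    """Return (WIL, WIP) from word alignment counts."""
    ref_len = hits + substitutions + deletions   # words in the reference
    hyp_len = hits + substitutions + insertions  # words in the hypothesis
    if ref_len == 0 or hyp_len == 0:
        wip = 0.0  # assumed convention for empty input
    else:
        wip = hits ** 2 / (ref_len * hyp_len)
    return 1.0 - wip, wip


# "the cat sat" vs "a cat sits": 1 hit, 2 substitutions
wil, wip = wil_wip(hits=1, substitutions=2, deletions=0, insertions=0)
print(round(wil, 3), round(wip, 3))  # 0.889 0.111
```

The same complement relation is visible in the results table, where the WIL and WIP columns of every row sum to 1.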
Interactive scatter plot showing the trade-off between processing time (X-axis) and error rate (Y-axis). Points closer to the bottom-left corner represent faster and more accurate transcriptions.
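One way to read the bottom-left corner numerically is to keep only the configurations that no other configuration beats on both time and error (a Pareto front). The sketch below is an illustration and not a feature of the dashboard. The six points are copied from the results table (Normalisé rows), and the short labels are made up for readability.

```python
def pareto_front(points):
    """Keep points not dominated on (time, error); lower is better on both."""
    front = []
    best_error = float("inf")
    # Scan from fastest to slowest; keep a point only if it improves the error.
    for label, time_s, error in sorted(points, key=lambda p: (p[1], p[2])):
        if error < best_error:
            front.append((label, time_s, error))
            best_error = error
    return front


points = [  # (label, time in seconds, WER)
    ("whisper-base clean", 13.93, 0.0087),
    ("whisper-base clean + Demucs", 8.40, 0.0087),
    ("large-v3-turbo clean", 39.57, 0.0087),
    ("parakeet-ctc-1.1b clean", 11.54, 0.0260),
    ("wav2vec2-english clean", 7.10, 0.1299),
    ("wav2vec2-english cathedral + Demucs", 6.85, 0.9307),
]

for label, time_s, error in pareto_front(points):
    print(f"{label}: {time_s:.2f} s, WER {error:.4f}")
```

For these six points the front contains the two wav2vec2-english entries and the 8.40 s whisper-base run. The other three are both slower and no more accurate than that whisper-base run.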
Color-coded matrix comparing all metrics across configurations. Green indicates good performance, red indicates poor performance. WIP column is inverted since higher values are better.
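The inversion of the WIP column can be illustrated with a small color-mapping sketch. It assumes a linear green-to-red scale and values clipped to the range 0 to 1. The dashboard's real color scale is not described here, so `badness` and `to_hex` are hypothetical helpers.

```python
HIGHER_IS_BETTER = {"WIP"}  # every other metric is an error rate


def badness(metric, value):
    """Map a metric value to 0.0 (good, green) .. 1.0 (poor, red)."""
    clipped = min(max(value, 0.0), 1.0)  # WER and CER can exceed 1
    return 1.0 - clipped if metric in HIGHER_IS_BETTER else clipped


def to_hex(bad):
    """Linear blend from green (#00ff00) to red (#ff0000)."""
    red = round(255 * bad)
    green = round(255 * (1.0 - bad))
    return f"#{red:02x}{green:02x}00"


print(to_hex(badness("WER", 0.0)))  # #00ff00
print(to_hex(badness("WIP", 0.0)))  # #ff0000
```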
Shows the distribution of metric values grouped by engine, degradation type, or other factors. Useful for identifying which engines are most resilient to audio degradation.
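A simple numeric stand-in for resilience is the spread of WER across degradation types within each engine. The sketch below groups a few rows copied from the results table (no enhancement, Broadcast_standard, Normalisé) and reports the difference between each engine's worst and best WER. Using max minus min as the resilience measure is an assumption of this example, not the dashboard's definition.

```python
from collections import defaultdict

rows = [  # (engine, degradation, WER)
    ("whisper-base", "None", 0.0087),
    ("whisper-base", "Cathedral", 0.2165),
    ("whisper-base", "Megaphone, outdoor", 0.0952),
    ("parakeet-ctc-1.1b", "None", 0.0260),
    ("parakeet-ctc-1.1b", "Cathedral", 0.0433),
    ("parakeet-ctc-1.1b", "Megaphone, outdoor", 0.0260),
    ("vosk-en", "None", 0.0390),
    ("vosk-en", "Cathedral", 0.8485),
    ("vosk-en", "Megaphone, outdoor", 0.2468),
]


def wer_spread_by_engine(rows):
    """Group WER values by engine and return max - min per engine."""
    grouped = defaultdict(list)
    for engine, _degradation, wer in rows:
        grouped[engine].append(wer)
    return {engine: round(max(v) - min(v), 4) for engine, v in grouped.items()}


spread = wer_spread_by_engine(rows)
most_resilient = min(spread, key=spread.get)
print(spread)
print(most_resilient)  # parakeet-ctc-1.1b
```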
Side-by-side comparison of reference and hypothesis texts with diff highlighting.
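A word-level diff of this kind can be approximated with Python's standard `difflib` module. The sketch below marks removed reference words as `[-word-]` and added hypothesis words as `{+word+}`. That marker style is an assumption chosen for plain-text output and is not the dashboard's own highlighting.

```python
from difflib import SequenceMatcher


def word_diff(reference, hypothesis):
    """Render a word-level diff between reference and hypothesis."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(ref_words[i1:i2])
        if tag in ("delete", "replace"):
            parts.extend(f"[-{w}-]" for w in ref_words[i1:i2])
        if tag in ("insert", "replace"):
            parts.extend(f"{{+{w}+}}" for w in hyp_words[j1:j2])
    return " ".join(parts)


print(word_diff("the cat sat", "a cat sits"))
# [-the-] {+a+} cat [-sat-] {+sits+}
```

Note that `SequenceMatcher` finds matching blocks and not a minimum-edit alignment, so its grouping can differ from the alignment used to compute WER.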
Use the dropdown filters at the top to focus on specific subsets of results. All tabs respect the current filter selection. Click "Reset" to restore all filters to default.
The normalization checkboxes control how metrics (WER, CER, MER, WIL, WIP) are computed.
Note: Text normalization is now a grid search dimension - each combination produces separate results with normalized text shown in the diff view.
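To illustrate why raw and normalized text can score so differently, here is a minimal normalizer sketch. It assumes normalization means lowercasing, stripping ASCII punctuation, and collapsing whitespace. The actual normalization rules used for the results are not specified in this section.

```python
import re
import string

# Translation table that deletes ASCII punctuation (assumed rule set).
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(_PUNCT)
    return re.sub(r"\s+", " ", text).strip()


reference = "The cat sat."
hypothesis = "the cat sat"

# Raw text: casing and punctuation make two of the three words mismatch.
print(reference.split() == hypothesis.split())                        # False
# Normalized text: the word sequences are identical.
print(normalize(reference).split() == normalize(hypothesis).split())  # True
```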