New AI model can use human perception to help tune out noisy audio

Listeners rated the speech quality of each recording on a scale of 1 to 100.

The model's performance comes from a joint-learning method that combines a specialised speech enhancement language module with a prediction model that can anticipate the mean opinion score (MOS) that human listeners might give a noisy signal.
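To make the idea of "joint learning" more concrete, here is a minimal, illustrative Python sketch. It is not the researchers' method. It assumes a mean opinion score is simply the average of listener ratings on the 1 to 100 scale. It also assumes the joint objective is a weighted sum of two terms: a signal-level enhancement error (mean squared error is used here as a stand-in) and the error of the predicted MOS against the human-rated MOS. The function names, the MSE choice, the normalisation by 100 and the `weight` parameter are all hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings given on a 1-100 scale (hypothetical helper)."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 1 <= r <= 100:
            raise ValueError(f"rating {r} is outside the 1-100 scale")
    return mean(ratings)


def enhancement_loss(clean, enhanced):
    """Mean squared error between a clean reference and the enhanced signal.

    MSE is an illustrative stand-in; the actual model's loss is not stated.
    """
    if len(clean) != len(enhanced) or not clean:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)


def joint_loss(clean, enhanced, predicted_mos, target_mos, weight=0.5):
    """Weighted sum of enhancement error and MOS-prediction error.

    The MOS difference is divided by 100 so both terms are on a
    comparable scale; `weight` is a hypothetical hyperparameter.
    """
    signal_term = enhancement_loss(clean, enhanced)
    mos_term = ((predicted_mos - target_mos) / 100) ** 2
    return signal_term + weight * mos_term


if __name__ == "__main__":
    target = mean_opinion_score([70, 80, 90])
    clean = [0.0, 1.0, 0.0, -1.0]
    enhanced = [0.0, 0.5, 0.0, -0.5]
    print("target MOS:", target)
    print("joint loss:", joint_loss(clean, enhanced, predicted_mos=70, target_mos=target))
```

Training both parts against a single combined objective is what lets the quality predictor influence the enhancer. In a real system the two terms would be computed by neural networks and minimised with gradient descent rather than evaluated on hand-written lists.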

Results showed that the new approach outperformed other models, producing better speech quality as measured both by objective metrics such as perceptual quality and intelligibility and by human ratings.

However, using human perception of sound quality has its own issues, Williamson said.

“What makes noisy audio so difficult to evaluate is that it’s very subjective. It depends on your hearing capabilities and on your hearing experiences,” he said.

Factors such as using a hearing aid or a cochlear implant also affect how much a person perceives from their sound environment, the researcher added.