NCVS Insights – Science that Resonates
Voice and Artificial Intelligence: The Role of Vocologists
November 27, 2025
Volume 3, Issue 11 – November 2025
By Leonardo Lopes and Vera Medina
In recent years, Artificial Intelligence (AI) has transformed industries, including healthcare, and redefined professional roles globally. In voice science, clinicians and researchers use AI-driven tools for a range of applications, including vocal signal (or image) processing and analysis, the generation of synthetic data, the development of new diagnostic tools and algorithms, and the creation of clinical decision-making models. AI has the potential to extract insights from multidimensional vocal production data, enabling more effective diagnosis and treatment and improving our patients' and clients' voices. For instance, AI models allow us to handle large volumes of data and to identify associations among variables, and make predictions from them, that would be difficult to discern with the naked eye or through conventional statistical modeling. These associations and predictions, however, must make sense to voice experts.
AI-powered technologies also benefit from human expertise, particularly from vocologists who inform systems that rely on the voice as their primary data source. Speech-language pathologists, vocal scientists, otorhinolaryngologists, singing teachers, and vocal coaches are uniquely positioned to offer a holistic understanding of phonation, physiology, and pedagogy. In collaborative work with data scientists and engineers, vocologists add a level of insight that makes AI tools more inclusive and attuned to the diversity and authenticity of the human voice.
The vocologist's expertise, rooted in a deep understanding of prosody, articulatory phonetics, and the expressive and communicative dimensions of the human voice, is crucial for the naturalness, inclusivity, and effectiveness of AI systems that interact with voice. In the development of virtual assistants and voice interfaces, the vocologist refines the AI's "delivery," guiding, for example, the modulation of intonation and speech rhythm to convey naturalness, emphasis, or communicative intent. Furthermore, the vocologist's ability to identify and correct algorithmic biases in speech recognition is fundamental to ensuring that AI comprehends regional dialects, diverse accents, and atypical vocal characteristics (e.g., dysarthrias and voices with extreme deviations) with equal precision, thereby promoting inclusive, universally accessible technology. Without this qualitative contribution, interactions would feel artificial and communication would break down.
In applications such as security and voice biometrics, the vocologist's expertise is crucial for distinguishing natural vocal variability from fraudulent attempts or spoofing. Differentiating between authentic vocal variations (e.g., a voice altered by a cold) and imitation patterns requires an in-depth understanding of the physiological and behavioral factors of the voice. The vocologist can guide the inclusion of "voice change" scenarios in the training of biometric algorithms, enabling AI to identify vocal characteristics inherent to individual identity rather than superficial, easily manipulable traits. This collaboration is crucial for optimizing system robustness against sophisticated fraud, including deepfakes, and for preventing the undue rejection of legitimate users due to normal vocal variations. For instance, during the 2024 Brazilian mayoral elections, our team developed a support system for the Electoral Court to determine whether leaked audio recordings were human or synthesized.
Additionally, in the design of AI algorithms for dysphonia detection, the contribution of the vocologist is indispensable. Instead of merely feeding the system with healthy and pathological voices, the vocologist labels and categorizes specific nuances of vocal quality in extensive databases. These perceptual labels, which are intrinsically human and result from specialized auditory training, provide the necessary “ground truth” for autonomous AI learning. Without this qualitative and clinically informed input, the algorithm could focus on irrelevant acoustic features or fail to differentiate between dysphonia subtypes, compromising diagnostic accuracy. The relevance of this approach is demonstrated by our team’s development of the Integrated Vocal Deviation Index (IVDI) (Lima-Filho et al., 2024). In this work, experienced speech-language pathologists performed perceptual-auditory judgment of overall severity and specific voice characteristics (such as roughness, breathiness, instability, and tension), providing the basis for training a machine learning model (XGBoost). The final model integrated four acoustic measures and four perceptual-auditory judgment measures, achieving 93.75% accuracy in classifying the degree of vocal deviation, which underscores the indispensability of human expertise in calibrating diagnostic AI.
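The training setup described above, in which expert perceptual ratings supply the ground truth for a gradient-boosted model over mixed acoustic and perceptual features, can be sketched as follows. This is a minimal illustration on synthetic placeholder data, using scikit-learn's gradient boosting as a stand-in for XGBoost; the features, labels, and accuracy have no relation to the published IVDI.

```python
# Illustrative sketch (NOT the published IVDI): a gradient-boosted classifier
# trained on a mix of acoustic measures and perceptual-auditory ratings.
# All features and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Eight predictors per voice sample: four stand-ins for acoustic measures
# and four for perceptual-auditory ratings.
X = rng.normal(size=(n, 8))
# Synthetic "degree of vocal deviation" label driven by the features,
# so the model has a learnable signal (0 = lower, 1 = higher deviation).
y = (X[:, :4].mean(axis=1) + 0.5 * X[:, 4:].mean(axis=1) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

The key design point mirrored from the IVDI work is that the perceptual columns are first-class inputs alongside the acoustic ones, rather than an afterthought: the clinician's trained ear is encoded directly in the feature matrix and in the labels.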
As we transition into the digital age, important questions arise: What role does the human voice play in the rapidly expanding universe of AI? The key challenge is how we can direct AI developers toward creating solutions that enhance the human voice rather than simply automating processes. Furthermore, what role should we, as vocologists, play in ensuring these advancements respect and preserve vocal diversity and authenticity? These questions prompt us to reflect and engage in dialogue about the future path, emphasizing the ongoing importance of vocologists as they connect traditional methods with emerging technologies.
This manuscript aims to foster reflection and discussion on the evolving role of the vocologist as AI reshapes vocal rehabilitation and training practices. By avoiding excessively optimistic or pessimistic stances, we encourage a responsible and thoughtful exploration of AI’s integration and application across various domains (clinical, artistic, forensic, and others), recognizing the human voice as one of the most vital forms of expression.
It’s a fact: The voice is special in the context of AI and the connected world! The human voice stands uniquely apart from all the signals AI seeks to interpret. The world has discovered (and AI has highlighted) what voice experts have long known: the ability of voice to convey a wide range of information. The human voice is a rich and complex manifestation of individual identity, incorporating contextual, intentional, emotional, and regional variations, as well as the speaker’s physical attributes.
Voice, as a biomarker, holds significant potential for detecting and monitoring various health conditions, including neurodegenerative diseases, neurocognitive disorders, and mental health disorders. Vocologists play an essential role in curating and labeling vocal data, ensuring that AI algorithms are trained with relevant, explainable, and representative acoustic and perceptual patterns for the conditions under investigation. This curation involves not only the careful selection of vocal tasks that are simple and clinically relevant for each condition but also ensuring high-quality signal capture by controlling noise, selecting appropriate recording devices, and standardizing acoustic environments when necessary. Thus, by guaranteeing robust, well-labeled data, vocologists enable AI models to monitor speech patterns at scale, providing practical tools for both patients and clinicians.
This wealth of information positions the human voice as a central element in the development of AI-based technologies that extend beyond mere replication, aiming to capture and understand its subtleties for practical applications in various areas. By integrating the vocal signal into AI systems, we are essentially leveraging one of the most dynamic aspects of human communication, which can reveal information about an individual’s mental and physical state, for example.
Additionally, due to its remarkable sensitivity to subtle (neuro)physiological and organic changes, the human voice can serve as a valuable biomarker for monitoring health conditions. Its involvement with multiple body systems and its sophisticated neural control allow it to reflect shifts in a person's well-being, often before other symptoms manifest. In AI applications, this translates into a promising tool for early diagnosis and continuous monitoring of chronic health conditions. Recently, the U.S. Food and Drug Administration (FDA) and the National Institutes of Health created the Biomarkers, EndpointS, and other Tools (BEST) glossary, introducing the concept of "digital biomarkers." A digital biomarker is a quantifiable measurement derived from digital data that captures, monitors, or predicts health information related to physiological status or changes in a condition. The voice signal sits squarely in this context and is considered a promising digital biomarker of various physiological and mental conditions.
If we accept that the human voice holds a genuinely pivotal role within AI, vocologists should be involved in designing, developing, and implementing these transformative AI technologies. Vocologists of all kinds possess not only technical knowledge but also a deep understanding of vocal nuances that can transform technological advances into effective, human-centered solutions. Vocologists deal directly with the human element in their work, receiving and assisting individuals who require vocal rehabilitation or wish to enhance their performance, offering a perspective that extends beyond the technical. This direct experience with real individuals helps to incorporate contextualization and specific requirements into AI systems, ensuring that these technologies are truly useful and relevant in practical situations.
In the emerging field of vocal biomarkers, for example, we are witnessing an influx of startups focused on developing technologies to monitor various health conditions. This sector has attracted considerable investment, demonstrating growing interest in the transformative potential of the voice as a health-monitoring tool. It is expected that by 2028, more than $5.1 billion will be invested in AI technologies that use voice signals as biomarkers for the prognosis, diagnosis, or monitoring of various health conditions.
Another point that we should bring to the discussion is the role of vocologists in ensuring that the evolution of voice technologies considers ethical aspects. Their role is crucial in ensuring that these tools are built and utilized efficiently and responsibly. One way they achieve this is through their deep understanding of vocal diversity, encompassing the wide range of vocal characteristics across genders, ages, cultures, and health conditions. By applying this knowledge, vocologists can guide the development of AI systems to recognize and respect this diversity, ensuring that technologies are not biased toward a narrow subset of voices and that they accurately represent and serve all users equitably. This approach not only enhances the ethical deployment of AI technologies but also helps create more inclusive, universally applicable solutions.
AI's capacity to process large data volumes and identify subtle patterns is transforming professional practice, making applications like multilingual speech systems, voice cloning, text-to-speech (TTS), AI-generated singing, and vocal analysis widely accessible. Echoing the work of vocologists in vocal habilitation, AI-based tools are reshaping the development of communication skills. These platforms analyze voice signals, speech, and language structure to provide AI-driven coaching with instant feedback, enabling individuals to practice presentations, refine their public speaking, or prepare for critical conversations. Such tools not only give individuals a private practice space but also streamline organizational communication and accelerate skill acquisition.
However, this accessibility also raises risks. In music production, AI-generated cloned voices, editing, effects, and real-time pitch correction can enhance creativity. Still, they should not serve as a shortcut to bypass the training and deliberate development of vocal skills by human performers. Relying on predefined cloned voices or AI-generated vocal packages to mimic genuine artistic performance risks diluting artistic integrity and favoring convenience over skill. In telehealth, AI-powered vocal analysis enables remote diagnosis and follow-ups, reducing the need for in-person consultations. Yet, the growing reliance on automated tools also introduces risks of misdiagnosis, data privacy concerns, and a potential overreliance on AI at the expense of personalized, expert-led assessments. We have an ethical responsibility to ensure that these technologies truly enhance patient care and support voice professionals rather than merely substituting for their expertise.
In voice synthesis technology, there is a pressing need to address identity fraud and disinformation. Technologies that enable the generation of deepfakes, digital watermarking methods, the use of metadata, and robust ethical practices must be established to prevent the malicious use of AI-generated voices. Unfortunately, not all developers are aware of these risks when their tools go public, often neglecting issues related to terms of use and privacy.
In the context of forensics and speaker identification, AI-based technologies show promising potential, using advanced algorithms to analyze complex vocal patterns. AI techniques enhance the extraction of key vocal characteristics, enabling accurate individual identification even in challenging acoustic conditions. However, with the advent of AI-based voice cloning, significant challenges have emerged, such as detecting voice deepfakes. As noted earlier, during the 2024 Brazilian elections our team developed a system to detect cloned candidate voices, which the courts now use to identify and combat electoral crimes involving cloned voices. This case exemplifies the ongoing need to prioritize security and integrity in the application of these emerging technologies.
One risk all vocology communities face is the potential for AI to replace human professionals. However, many technologists argue that AI should be viewed as a tool for augmenting human capabilities rather than a substitute for human expertise. Rather than replacing professionals, AI can complement their work, enabling them to conduct more in-depth and complex analyses.
The possibilities and applications of AI associated with speech signals are vast and promising. However, many of these technologies are still in the early stages of development, with many documented in scientific publications or as prototypes. The improvements they promise to bring to our professional practice, as well as the outcomes for our patients and clients, are not yet fully understood or realized in our daily routines.
Significant technical and ethical challenges are associated with the development and implementation of AI technologies that rely on voice signals. These include building and sharing large databases with high-quality audio recordings related to patient clinical data and the identification of vocal biomarker candidates; harmonizing and standardizing audio data between studies; building databases in different languages, accents, and cultures; improving the accuracy of algorithms; and integrating these algorithms into medical devices and telemedicine or Internet of Things (IoT) systems. In addition, it is essential to ensure security in data collection and storage, to maintain variability in data profiles and avoid systemic biases, to maximize open data initiatives and guarantee transparency, cross-comparison, and interoperability, to increase the explainability of algorithms, and to prevent accentuating the existing digital divide, thereby ensuring universal access to innovation.
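One of the harmonization steps listed above, bringing recordings captured at different sample rates to a common rate before pooling them across studies, can be sketched as follows. Linear interpolation is used purely for brevity; a production pipeline would use a band-limited (polyphase) resampler, and the rates shown are arbitrary examples, not recommended standards.

```python
# Illustrative harmonization step: resample heterogeneous recordings to a
# common rate before pooling them into one dataset. Linear interpolation is
# a simplification; real pipelines use band-limited resamplers.
import numpy as np

def resample(signal, src_rate, dst_rate):
    """Resample a 1-D signal from src_rate to dst_rate by interpolation."""
    duration = len(signal) / src_rate
    n_out = int(round(duration * dst_rate))
    src_t = np.arange(len(signal)) / src_rate
    dst_t = np.arange(n_out) / dst_rate
    return np.interp(dst_t, src_t, signal)

# One second of a 100 Hz tone recorded at 44.1 kHz, harmonized to 16 kHz.
clip_44k = np.sin(2 * np.pi * 100 * np.arange(44100) / 44100)
clip_16k = resample(clip_44k, 44100, 16000)
print(len(clip_16k))  # one second's worth of samples at the target rate
```

Steps like this are exactly where clinically informed oversight matters: an engineer sees a format conversion, while a vocologist can judge whether downsampling discards spectral content relevant to the condition under study.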
However, the biggest challenge lies in the connection between researchers and clinicians, as well as the involvement of vocologists in developing these technologies. Surprisingly, more than 70% of studies that used AI to predict health conditions from voice data did not include clinicians or speech-language pathologists (vocologists) on their research teams. Vocologists should actively participate in AI development by collaborating with engineers, data scientists, and medical researchers to advance the field.
Initiatives such as the Bridge2AI-Voice Consortium, which unites a multidisciplinary team of clinicians, data engineers, computer scientists, AI experts, bioethicists, and vocologists from institutions across the U.S. and Canada, exemplify the effort required to harmonize and standardize audio data across studies and clinics. Working to build a high-quality, ethically sourced voice database linked to health information, the consortium focuses on generating a large-scale, AI-ready dataset, as detailed by Bensoussan et al. (2025) in "Bridge2AI-Voice: An ethically sourced, diverse voice dataset linked to health information."

In this context, the vocologist acts as a central liaison, developing and implementing rigorous, ethically grounded protocols for voice data collection: defining microphone types, recording software, controlled acoustic environments, and standardized vocal tasks tailored to the five disease cohorts studied (Voice Disorders, Neurological and Neurodegenerative Disorders, Mood and Psychiatric Disorders, Respiratory Disorders, and Pediatric Voice and Speech Disorders). Without such clinically informed standardization and a robust ethical focus, the collected voice data would be incompatible across sites, hindering the creation of comprehensive datasets, such as the 19,271 recordings from 442 participants across five sites contained in version 2.0.1, that are needed to train robust, generalizable AI models on the five categories of derived data (spectrograms, MFCCs, acoustic features, phonetic/prosodic features, and transcriptions). A tangible example of this collaboration is the creation of best-practice guides for clinical vocal recording, co-authored by vocologists and engineers, which serve as reference standards for vocal biomarker research worldwide.
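For readers unfamiliar with the derived-data categories mentioned above, the sketch below computes one of them, a magnitude spectrogram, from a raw waveform using a short-time Fourier transform. The window and hop sizes are generic defaults chosen for illustration, not the consortium's published settings.

```python
# Illustrative derived-data product: a magnitude spectrogram via a
# short-time Fourier transform. Window/hop sizes are generic defaults.
import numpy as np

def spectrogram(signal, n_fft=1024, hop=256):
    """Return a (frames, n_fft//2 + 1) magnitude spectrogram."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 220 * t))  # one second of a 220 Hz tone
print(spec.shape)
# Energy should concentrate near the bin closest to 220 * n_fft / sr.
print(spec.mean(axis=0).argmax())
```

Representations like this (and MFCCs built on top of them) are what the AI models actually consume, which is why the upstream recording protocols the vocologist defines propagate directly into model quality.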
The first step towards this is to deepen our understanding of the various AI techniques and their applications so that we can make informed decisions about how this technology can improve the quality of services we offer our clients and patients and effectively enhance human performance in ways that better meet their needs. Thoughtful collaboration between vocologists and AI developers is essential to ensure that AI applications are clinically valid, ethical, and practical. In addition, vocologists should stay up to date with the most relevant advances and seek to incorporate AI-related training into diagnostic, treatment, and vocal pedagogy programs, both to build public trust and to remain effective as these tools evolve. Despite AI's growing capabilities, human expertise remains irreplaceable. Vocology is inherently multidisciplinary, and while AI can aid in decision-making, the ability to see, hear, and assess a patient or performer in real time cannot be replicated by technology alone.
The benefits are as numerous as the risks. But this has always been the case when we face new ways of doing things. The benefits should prevail, as AI-powered tools that use the human voice have the potential to expand access to healthcare, particularly in regions with limited medical resources.
Vocologists should engage in interdisciplinary collaborations that enable them to maintain high standards of expertise when using AI-based tools responsibly within their professional communities, and they should act as educators for their patients. Any use of AI tools for this purpose should be indicated and closely monitored by experts in the field to prevent misuse by the general public. Vocologists are responsible for ensuring that AI enhances, rather than replaces, human judgment and care. As AI continues to evolve, so must the vocologist's role in shaping the accountable and effective integration of these technologies into vocal science and practice.
*The authors declare that this text was translated from Brazilian Portuguese to English and subsequently reviewed by Artificial Intelligence (Deep Translate, ChatGPT version 5, and Gemini).
REFERENCES
Babrak, L. M., Menetski, J., Rebhan, M., Nisato, G., Zinggeler, M., Brasier, N., et al. (2019). Traditional and digital biomarkers: Two worlds apart? Digital Biomarkers, 3(2), 92–102. https://doi.org/10.1159/000502000
Bensoussan, Y., Sigaras, A., Rameau, A., Elemento, O., Powell, M., Dorr, D., et al. (2025). Bridge2AI-Voice: An ethically sourced, diverse voice dataset linked to health information (version 2.0.1). PhysioNet. https://doi.org/10.13026/gzjs-0535
Briganti, G., & Lechien, J. R. (2025). Voice quality as digital biomarker in bipolar disorder: A systematic review. Journal of Voice. Advance online publication. https://doi.org/10.1016/j.jvoice.2025.01.002
Califf, R. M. (2018). Biomarker definitions and their applications. Experimental Biology and Medicine, 243(3), 213–221. https://doi.org/10.1177/1535370217750088
Evangelista, E., Kale, R., McCutcheon, D., Rameau, A., Gelbard, A., Powell, M., et al. (2024). Current practices in voice data collection and limitations to voice AI research: A national survey. Laryngoscope, 134(3), 1333–1339. https://doi.org/10.1002/lary.30691
Evangelista, E. G., Bélisle-Pipon, J.-C., Naunheim, M. R., Powell, M., Gallois, H., Bridge2AI-Voice Consortium, et al. (2024). Voice as a biomarker in health-tech: Mapping the evolving landscape of voice biomarkers in the start-up world. Otolaryngology–Head and Neck Surgery, 171(2), 340–352. https://doi.org/10.1002/ohn.830
Firestein, G. S. (2006). A biomarker by any other name…. Nature Clinical Practice Rheumatology, 2(12), 635. https://doi.org/10.1038/ncprheum0348
Hajjar, I., Okafor, M., Choi, J. D., Moore, E. II, Abrol, A., Calhoun, V. D., et al. (2023). Development of digital voice biomarkers and associations with cognition, cerebrospinal biomarkers, and neural representation in early Alzheimer’s disease. Alzheimer’s & Dementia, 15(1), e12393. https://doi.org/10.1002/dad2.12393
Härmä, A., den Brinker, B., Grossekathöfer, U., Ouweltjes, O., Nallanthighal, S., Abrol, S., et al. (2024). Survey on biomarkers in human vocalizations [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2408.05190
Kim, D. (2024). Digital biomarkers development using multimodal AI technology. The National High School Journal of Science, 1(1), 1–8.
Lima-Filho, L. M. A., Lopes, L. W., & Silva Filho, T. M. (2024). Integrated Vocal Deviation Index (IVDI): A machine learning model to classify the general grade of vocal deviation. Journal of Voice. Advance online publication. https://doi.org/10.1016/j.jvoice.2024.11.002
Macias Alonso, A. K., Hirt, J., Woelfle, T., Janiaud, P., & Hemkens, L. G. (2024). Definitions of digital biomarkers: A systematic mapping of the biomedical literature. BMJ Health & Care Informatics, 31(1), e100914. https://doi.org/10.1136/bmjhci-2023-100914
Motahari-Nezhad, H., Fgaier, M., Abid, M. M., Péntek, M., Gulácsi, L., & Zrubka, Z. (2021). Scoping review of systematic reviews of digital biomarker-based studies [Preprint]. JMIR Preprints. https://doi.org/10.2196/preprints.35722
Robin, J., Harrison, J. E., Kaufman, L. D., Simpson, W., Yancheva, M., & Rudzicz, F. (2020). Evaluation of speech-based digital biomarkers: Review and recommendations. Digital Biomarkers, 4(3), 99–108. https://doi.org/10.1159/000510922
Sara, J. D. S., Orbelo, D., Maor, E., Lerman, L. O., & Lerman, A. (2023). Guess what we can hear: Novel voice biomarkers for the remote detection of disease. Mayo Clinic Proceedings, 98(9), 1353–1375. https://doi.org/10.1016/j.mayocp.2023.04.017
Vasudevan, S., Saha, A., Tarver, M. E., & Patel, B. (2022). Digital biomarkers: Convergence of digital health technologies and biomarkers. NPJ Digital Medicine, 5, 36. https://doi.org/10.1038/s41746-022-00624-3
Yao, P., Usman, M., Chen, Y. H., German, A., Andreadis, K., Mages, K., et al. (2022). Applications of artificial intelligence to office laryngoscopy: A scoping review. Laryngoscope, 132(10), 1993–2016. https://doi.org/10.1002/lary.30026
Leonardo Lopes
Speech-language pathologist. Master's and PhD in Linguistics. Full Professor in the Department of Speech-Language Pathology and Audiology at the Federal University of Paraíba, Brazil. Scientific Director of the Brazilian Society of Speech-Language Pathology and Audiology (2025-2027). Research productivity fellow of the National Council for Scientific and Technological Development (CNPq). He received the "Speech-Language Pathology and Audiology Fellow" award from the Brazilian Society of Speech-Language Pathology and Audiology in recognition of his career and work toward the development of Speech-Language Pathology and Audiology. He leads nationally and internationally funded projects on the use of voice and artificial intelligence to predict various health conditions.
Vera Medina
Entrepreneur, singer, songwriter, and producer, Vera Medina integrates vocal technique and AI innovation in the creation and analysis of the singing voice at her studio, Neomundo Academy. An AI executive and Women in AI (Paris) Ambassador for Brazil, she is a PhD student at FEA-USP. She holds a Bachelor’s in Business Administration from FGV-EAESP, a Master’s in Production Systems (Centro Paula Souza – FATEC), and a Master of Music (General Studies) from Berklee College of Music. She also holds postgraduate diplomas in Vocal Pedagogy (FASM) and Creative Musical Practices, is certified in Somatic Voicework™ (The LoVetri Method), and has training in Soul Ingredients ®.
