Yan Zhao (赵岩)

About me

I obtained my Ph.D. in Computer Science at The Ohio State University, advised by Prof. DeLiang Wang at Perception and Neurodynamics Laboratory (PNL). Now I'm working as a Research Scientist at ByteDance.

My research interest focuses on speech enhancement/separation, audio processing and multimodal LLMs.

Contact

Email:
[lastname] [dot] [firstname] [at] bytedance [dot] com

Experience

Jul. 2020 - present: Research Scientist at Speech Team, ByteDance (San Jose, CA)
May 2019 - Aug. 2019: Applied Scientist Intern at AWS, Amazon (East Palo Alto, CA)
May 2018 - Aug. 2018: Research Intern at Machine Intelligence Technology, DAMO Academy, Alibaba Group (Bellevue, WA)
May 2017 - Aug. 2017: Research Intern at Signal Processing Research Department, Starkey Hearing Technologies (Eden Prairie, MN)

Teaching

Instructor: CSE 1223: Introduction to Computer Programming in Java, Fall 2018, OSU

Dissertation

Zhao Y. (2020): Deep learning methods for reverberant and noisy speech enhancement. The Ohio State University.

Publications

Full List: Google Scholar

Liu Y., Liu X., Zhao Y., Wang Y., Xia R., Tian P., Wang Y. (2024): Audio prompt tuning for universal sound separation. Proceedings of ICASSP-24, pp. 1446-1450.
Liu X., Kong Q., Zhao Y., Liu H., Yuan Y., Liu Y., Xia R., Wang Y., Plumbley M., Wang W. (2023): Separate anything you describe. arXiv preprint arXiv:2308.05037.
Tam K., Li L., Zhao Y., and Xu C. (2023): FedCoop: Cooperative federated learning for noisy labels. Proceedings of ECAI-23, pp. 2298-2306.
Shu X., Chen Y., Shang C., Zhao Y., Zhao C., Zhu Y., Huang C., and Wang Y. (2022): Non-intrusive speech quality assessment with a multi-task learning based subband adaptive attention temporal convolutional neural network. Proceedings of INTERSPEECH-22, pp. 3298-3302.
Liu H., Liu X., Kong Q., Tian Q., Zhao Y., Wang D.L., Huang C., and Wang Y. (2022): VoiceFixer: A unified framework for high-fidelity speech restoration. Proceedings of INTERSPEECH-22, pp. 4232-4236.
Liu H., Kong Q., Tian Q., Zhao Y., Wang D.L., Huang C., and Wang Y. (2021): VoiceFixer: Toward general speech restoration with neural vocoder. arXiv preprint arXiv:2109.13731.
Zhao Y., and Wang D.L. (2020): Noisy-reverberant speech enhancement using DenseUNet with time-frequency attention. Proceedings of INTERSPEECH-20, pp. 3261-3265.
Zhao Y., Wang D.L., Xu B., and Zhang T. (2020): Monaural speech dereverberation using temporal convolutional networks with self attention. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1598-1607.
Zhao Y., Wang Z.-Q., and Wang D.L. (2019): Two-stage deep learning for noisy-reverberant speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 53-62.
Zhao Y., Wang D.L., Johnson E.M., and Healy E.W. (2018): A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions. Journal of the Acoustical Society of America, vol. 144, pp. 1627-1637.
Zhao Y., Wang D.L., Xu B., and Zhang T. (2018): Late reverberation suppression using recurrent neural networks with long short-term memory. Proceedings of ICASSP-18, pp. 5434-5438.
Zhao Y., Xu B., Giri R., and Zhang T. (2018): Perceptually guided speech enhancement using deep neural networks. Proceedings of ICASSP-18, pp. 5074-5078.
Zhao Y., Wang Z.-Q., and Wang D.L. (2017): A two-stage algorithm for noisy and reverberant speech enhancement. Proceedings of ICASSP-17, pp. 5580-5584.
Zhao Y., Wang D.L., Merks I., and Zhang T. (2016): DNN-based enhancement of noisy and reverberant speech. Proceedings of ICASSP-16, pp. 6525-6529.
Wang Z.-Q., Zhao Y., and Wang D.L. (2016): Phoneme-specific speech separation. Proceedings of ICASSP-16, pp. 146-150.

Service

Journal/Conference Reviewer:

- IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
- IEEE Signal Processing Letters (SPL)
- IEEE Transactions on Multimedia (TMM)
- JASA Express Letters (JASA-EL)
- Journal of Speech, Language, and Hearing Research (JSLHR)
- Computer Speech and Language
- Digital Signal Processing

- ICASSP/INTERSPEECH/AAAI