Yan Zhao (赵岩)
About me
I obtained my Ph.D. in Computer Science from The Ohio State University, where I was advised by Prof. DeLiang Wang at the Perception and Neurodynamics Laboratory (PNL). I am now a Research Scientist at ByteDance.
My research focuses on speech enhancement and separation, audio processing, and multimodal LLMs.
Contact
Email:
[lastname] [dot] 836 [at] osu [dot] edu
[lastname] [dot] [firstname] [at] bytedance [dot] com
Experience
- Jul. 2020 - present: Research Scientist at Speech Team, ByteDance (San Jose, CA)
- May 2019 - Aug. 2019: Applied Scientist Intern at AWS, Amazon (East Palo Alto, CA)
- May 2018 - Aug. 2018: Research Intern at Machine Intelligence Technology, DAMO Academy, Alibaba Group (Bellevue, WA)
- May 2017 - Aug. 2017: Research Intern at Signal Processing Research Department, Starkey Hearing Technologies (Eden Prairie, MN)
Teaching
- Instructor: CSE 1223: Introduction to Computer Programming in Java, Fall 2018, OSU
Dissertation
Publications
Full List: Google Scholar
- Liu Y., Liu X., Zhao Y., Wang Y., Xia R., Tian P., Wang Y. (2024): Audio prompt tuning for universal sound separation. Proceedings of ICASSP-24, pp. 1446-1450.
- Liu X., Kong Q., Zhao Y., Liu H., Yuan Y., Liu Y., Xia R., Wang Y., Plumbley M., Wang W. (2023): Separate anything you describe. arXiv preprint arXiv:2308.05037.
- Tam K., Li L., Zhao Y., and Xu C. (2023): FedCoop: Cooperative federated learning for noisy labels. Proceedings of ECAI-23, pp. 2298-2306.
- Shu X., Chen Y., Shang C., Zhao Y., Zhao C., Zhu Y., Huang C., and Wang Y. (2022): Non-intrusive speech quality assessment with a multi-task learning based subband adaptive attention temporal convolutional neural network. Proceedings of INTERSPEECH-22, pp. 3298-3302.
- Liu H., Liu X., Kong Q., Tian Q., Zhao Y., Wang D.L., Huang C., and Wang Y. (2022): VoiceFixer: A unified framework for high-fidelity speech restoration. Proceedings of INTERSPEECH-22, pp. 4232-4236.
- Liu H., Kong Q., Tian Q., Zhao Y., Wang D.L., Huang C., and Wang Y. (2021): VoiceFixer: Toward general speech restoration with neural vocoder. arXiv preprint arXiv:2109.13731.
- Zhao Y., and Wang D.L. (2020): Noisy-reverberant speech enhancement using DenseUNet with time-frequency attention. Proceedings of INTERSPEECH-20, pp. 3261-3265.
- Zhao Y., Wang D.L., Xu B., and Zhang T. (2020): Monaural speech dereverberation using temporal convolutional networks with self attention. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1598-1607.
- Zhao Y., Wang Z.-Q., and Wang D.L. (2019): Two-stage deep learning for noisy-reverberant speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 53-62.
- Zhao Y., Wang D.L., Johnson E.M., and Healy E.W. (2018): A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions. Journal of the Acoustical Society of America, vol. 144, pp. 1627-1637.
- Zhao Y., Wang D.L., Xu B., and Zhang T. (2018): Late reverberation suppression using recurrent neural networks with long short-term memory. Proceedings of ICASSP-18, pp. 5434-5438.
- Zhao Y., Xu B., Giri R., and Zhang T. (2018): Perceptually guided speech enhancement using deep neural networks. Proceedings of ICASSP-18, pp. 5074-5078.
- Zhao Y., Wang Z.-Q., and Wang D.L. (2017): A two-stage algorithm for noisy and reverberant speech enhancement. Proceedings of ICASSP-17, pp. 5580-5584.
- Zhao Y., Wang D.L., Merks I., and Zhang T. (2016): DNN-based enhancement of noisy and reverberant speech. Proceedings of ICASSP-16, pp. 6525-6529.
- Wang Z.-Q., Zhao Y., and Wang D.L. (2016): Phoneme-specific speech separation. Proceedings of ICASSP-16, pp. 146-150.
Service
Journal/Conference Reviewer:
- IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
- IEEE Transactions on Multimedia (TMM)
- JASA Express Letters (JASA-EL)
- Journal of Speech, Language, and Hearing Research (JSLHR)
- Computer Speech and Language
- ICASSP/INTERSPEECH/AAAI