[Academic Conference] Text Representation Learning and Pre-trained Models
Topic: Text Representation Learning and Pre-trained Models
Speaker: Prof. Yan SONG, CUHK-Shenzhen
Time: 12:00 pm - 1:00 pm, Wednesday, November 18, 2020
Current natural language processing (NLP) requires text to be properly represented for neural models in order to improve NLP applications. Neural text representation models, including word embeddings, context-aware embeddings, and current pre-trained models, have been developed over the past decade and have contributed greatly to the NLP community. Pre-trained models in particular have become the prevailing technique, helping researchers and engineers not only obtain state-of-the-art performance on many NLP tasks but also simplify the NLP process into a two-stage paradigm. This tutorial revisits the development of text representation techniques, with detailed descriptions of particular models, and analyzes how pre-trained models are learned and what resources they require (which explains why NLP has become an arms race among giant companies in the current AI field).
Prof. Yan SONG joined CUHK-SZ in 2020 and is currently an associate professor in SDS. His research focuses on Natural Language Processing (NLP), Information Retrieval and Extraction, and Text Representation Learning. He has published extensively in world-leading AI conferences and journals, such as the Annual Meeting of the Association for Computational Linguistics (ACL), Association for the Advancement of Artificial Intelligence (AAAI), Empirical Methods in Natural Language Processing (EMNLP), International Joint Conference on Artificial Intelligence (IJCAI), and the Computational Linguistics journal. Apart from his academic publications, Prof. Song also has extensive industry and research experience. He joined Microsoft Research Asia as a visiting researcher in 2010, where his main contribution was the first large-scale CCG treebank and parser for Chinese. He was a visiting scholar at the University of Washington from 2011 to 2012, and later joined Microsoft AI & Research as a researcher, where he was one of the founding members of Microsoft XiaoIce. From 2017 to 2019, he was a principal researcher in the natural language processing center at Tencent AI Lab. A representative work he led during this period is the large-scale Chinese word embeddings, an open-source embedding resource for the Chinese language that covers over 8 million words and filled a gap in this area. This resource was recognized as one of 10 AI open-source datasets in 2018.