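Yan Song (宋彦)

Associate Professor

Education

Ph.D. (City University of Hong Kong)

Master's degree (Shenyang Aerospace University)

Bachelor's degree (Xiangtan University)

Research Areas
Natural language processing, information retrieval and extraction, text representation learning
Personal Website
Email
songyan@cuhk.edu.cn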
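Biography

Professor Yan Song is currently an Associate Professor at The Chinese University of Hong Kong, Shenzhen. He received his bachelor's degree in physics from Xiangtan University in 2004, his master's degree in computer science from Shenyang Aerospace University in 2008, and his Ph.D. in computational linguistics from City University of Hong Kong in 2013. Since 2019 he has also served as a visiting professor at the University of Washington.

Professor Song's research interests include natural language processing, information retrieval and extraction, and text representation learning. His work has appeared many times at leading international venues, including the Association for Computational Linguistics (ACL), the Association for the Advancement of Artificial Intelligence (AAAI), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the International Joint Conference on Artificial Intelligence (IJCAI).

In addition to his extensive publication record, Professor Song has substantial practical experience. In 2010 he was a visiting researcher at Microsoft Research Asia, where he took part in building the first large-scale Chinese Combinatory Categorial Grammar (CCG) treebank and parser. From 2011 to 2012 he was a visiting scholar at the University of Washington. From 2013 to 2017 he worked at Microsoft's artificial intelligence research center, where he was one of the founders of the "Microsoft Xiaoice" project. From 2017 to 2019 he worked at Tencent AI Lab as a principal researcher on the natural language understanding (NLU) team, where he led the construction of the Tencent AI Lab large-scale Chinese word embedding dataset (covering 8 million Chinese words), which was named one of the top ten open-source AI datasets of 2018. He is currently a research scientist at the Shenzhen Research Institute of Big Data, working on representation learning for medical text.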

Publications

1. Yuanhe Tian, Yan Song, Fei Xia, Tong Zhang and Yonggang Wang. 2020. Improving Chinese Word Segmentation with Wordhood Memory Networks. in ACL-2020

2. Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang and Yonggang Wang. 2020. Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge. in ACL-2020 

3. Kun Li, Chengbo Chen, Xiaojun Quan, Qing Ling and Yan Song, 2020. Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation. in ACL-2020

4. Kun Xu, Linfeng Song, Yansong Feng, Yan Song, Dong Yu. 2020. Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment. in AAAI-2020

5. Xintong Yu, Hongming Zhang, Yangqiu Song, Yan Song, Changshui Zhang. 2019. What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues. in EMNLP-2019

6. Ling Luo, Xiang Ao, Yan Song, Feiyang Pan, Min Yang, Qing He. 2019. Reading Like HER: Human Reading Inspired Extractive Summarization. in EMNLP-2019

7. Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu. 2019. Multiplex Word Embeddings for Selectional Preference Acquisition. in EMNLP-2019

8. Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang. 2019. ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations. arXiv preprint:1911.00720

9. Miaofeng Liu, Yan Song, Hongbin Zou, Tong Zhang. 2019. Reinforced Training Data Selection for Domain Adaptation. in ACL-2019

10. Hongming Zhang, Yan Song, Yangqiu Song, Dong Yu. 2019. Knowledge-aware Pronoun Coreference Resolution. in ACL-2019

11. Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu. 2019. Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network. in ACL-2019

12. Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He, Dong Yu. 2019. Unsupervised Neural Aspect Extraction with Sememes. In IJCAI-2019

13. Zhaofeng Wu, Yan Song, Sicong Huang, Yuanhe Tian, and Fei Xia. 2019. A Hybrid Approach to Biomedical Natural Language Inference. In BioNLP-2019

Our NLI system ranked first among 42 teams in the MEDIQA-2019 medical natural language inference shared task.

14. Yuanhe Tian, Weicheng Ma, Fei Xia, Yan Song. 2019. ChiMed: A Chinese Medical Corpus for Question Answering. in BioNLP-2019

15. Hongming Zhang, Yan Song, Yangqiu Song. 2019. Incorporating Context and External Knowledge for Pronoun Coreference Resolution. in NAACL-2019

16. Guoyin Wang, Yan Song, Dong Yu. 2019. Learning Word Embeddings with Domain Awareness. arXiv preprint:1906.03249

17. Jing Li, Yan Song, Zhongyu Wei, KF Wong. 2018. A Joint Model of Conversational Discourse and Latent Topics on Microblogs. Computational Linguistics, Sep, 1-51

18. Dingmin Wang, Yan Song, Jing Li, Jialong Han, and Haisong Zhang. 2018. A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check. in EMNLP-2018.

19. Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R. Lyu, and Irwin King. 2018. Topic Memory Networks for Short Text Classification. in EMNLP-2018.

20. Juntao Li, Yan Song, Haisong Zhang, Dongmin Chen, Shuming Shi, and Rui Yan. 2018. Generating Classical Chinese Poems via Conditional Variational Autoencoder and Adversarial Training. in EMNLP-2018.

21. Xiuying Chen, Shen Gao, Chongyang Tao, Yan Song, Dongyan Zhao, and Rui Yan. 2018. Iterative Document Representation Learning Towards Summarization with Polishing. In EMNLP-2018.

22. Yan Song, Shuming Shi. 2018. Complementary Learning of Word Embeddings, in IJCAI-2018.

23. Yan Song, Shuming Shi, Jing Li. 2018. Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks, in IJCAI-2018.

24. Jialong Han, Yan Song, Xin Zhao, Shuming Shi, Haisong Zhang. 2018. hyperdoc2vec: Distributed Representations of Hypertext Documents, in ACL-2018.

25. Yingyi Zhang, Jing Li, Yan Song, Chengzhi Zhang. 2018. Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts, in NAACL-2018.

26. Yan Song, Shuming Shi, Jing Li, Haisong Zhang. 2018. Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings, in NAACL-2018.

27. Jing Li, Yan Song, Haisong Zhang, Shuming Shi. 2018. A Manually Annotated Chinese Corpus for Non-task-oriented Dialogue Systems. arXiv preprint:1805.05542

28. Nan Wang, Yan Song, Fei Xia. 2018. Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions, in LREC-2018

29. Nan Wang, Yan Song, Fei Xia. 2018. Coding Structures and Actions with the COSTA Scheme in Medical Conversations. in BioNLP-2018

30. Miaofeng Liu, Jialong Han, Haisong Zhang, Yan Song. 2018. Domain Adaptation for Disease Phrase Matching with Adversarial Networks. in BioNLP-2018

31. Haisong Zhang, Zhangming Chan, Yan Song, Dongyan Zhao, Rui Yan. 2018. When Less Is More: Using Less Context Information to Generate Better Utterances in Group Conversations. in NLPCC-2018

32. Yan Song, Chia-Jung Lee, Fei Xia. 2017. Learning Word Representations with Regularization from Prior Knowledge, in CoNLL-2017.

33. Yan Song, Chia-Jung Lee. 2017. Learning User Embeddings from Emails. in EACL-2017

34. Yan Song, Chia-Jung Lee. 2017. Embedding Projection for Query Understanding. in WWW-2017

35. Chang-Ning Huang, Yan Song. 2015. Chinese CCGbank Construction from Tsinghua Chinese Treebank, in Journal of Chinese Linguistics Monograph Series. 274-311.

36. Yan Song and Fei Xia, 2014. Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties, in LREC-2014.

37. Jingfei Du, Yan Song, Chi-Ho Li. 2014. Perceptron-based Tagging of Query Boundaries for Chinese Query Segmentation, In WWW-2014.

38. Yan Song and Fei Xia, 2013. A Common Case of Jekyll and Hyde: the Synergistic Effect of Using Divided Source Training Data for Feature Augmentation, in IJCNLP-2013.

39. Xiaojun Quan, Chunyu Kit, Yan Song. 2013. Non-Monotonic Sentence Alignment via Semisupervised Learning, in Proceedings of ACL-2013.

40. Yan Song, Prescott Klassen, Fei Xia, Chunyu Kit. 2012. Entropy-based Training Data Selection for Domain Adaptation, In COLING-2012.

41. Yan Song, Fei Xia. 2012. Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation, In LREC-2012.

42. Yan Song, Chang-Ning Huang, Chunyu Kit. 2012. Construction of Chinese CCGbank, in Journal of Chinese Information Processing, vol. 26(3). 2012.

43. Chang-Ning Huang, Yan Song. 2011. Chinese CCGbank Construction from Tsinghua Chinese Treebank. In Proceedings of the Roundtable Conference on Linguistic Corpus and Corpus Linguistics in the Chinese Context. Hong Kong, May, 2011.

44. Bin Lu, Yan Song, Xing Zhang and Benjamin K. Tsou. Learning Chinese Polarity Lexicons By Integration of Graph Models and Morphological Features. in Information Retrieval Technology (AIRS-2010). Lecture Notes in Computer Science, vol.6458. pp. 466-477. 2010.

45. Xing Zhang, Yan Song, Alex Chengyu Fang. 2010. Conditional Random Fields for Term Extraction. in KDIR-2010

46. Xing Zhang, Yan Song, Alex Chengyu Fang. 2010. Term recognition using conditional random fields. in NLPKE-2010

47. Xing Zhang, Yan Song, A.C.Fang. 2010. How Well Conditional Random Fields Can be Used in Novel Term Recognition. in Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC-2010). Sendai, Japan, 4-7 Nov 2010. (Best student paper).

48. Yan Song, Chunyu Kit and Hai Zhao. 2010. Reranking with multiple features for better transliteration. In NEWS-2010. Our transliteration system ranked first in all Chinese-related tasks in the NEWS 2010 machine transliteration evaluation.

49. Hai Zhao, Yan Song and Chunyu Kit. 2010. How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method, in LREC-2010.

50. Cong Hui, Hai Zhao, Yan Song, Bao-Liang Lu. 2010. An empirical study on development set selection strategy for machine translation learning. in WMT-2010

51. Yan Song, Chunyu Kit. 2010. Does joint decoding really outperform cascade processing in English-to-Chinese transliteration generation? The role of syllabification. in ICMLC-2010

52. Yan Song and Chunyu Kit. 2010. PCFG parsing with CRF tagging for head recognition, in Proceedings of CIPS-ParsEval-2009, pp.133-137. Nov, 2009.

53. Yan Song, Dongfeng Cai, Guiping Zhang, Hai Zhao, 2009. An Approach to Chinese Word Segmentation based on Character-Word Joint Decoding, Journal of Software, 20 (9), 2236-2376.

54. Yan Song, Chunyu Kit and Xiao Chen. 2009. Transliteration of Name Entity via Improved Statistical Translation on Character Sequences, in Proceedings of the 2009 ACL Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pp.57-60. Aug, 2009.

55. Hai Zhao, Yan Song, Chunyu Kit, Guodong Zhou, 2009. Cross Language Dependency Parsing using a Bilingual Lexicon, in ACL-2009.

56. Yan Song, Chunyu Kit, Ruifeng Xu, Hai Zhao. 2009. How Unsupervised Learning Affects Character Tagging based Chinese Word Segmentation: A Quantitative Investigation. in ICMLC-2009.

57. Yan Song, Guiping Zhang, Dongfeng Cai. 2007. N-gram based Sentence Similarity Computation, in CNCCL-2007.

58. Guiping Zhang, Chao Yu, Dongfeng Cai, Yan Song, Jingguang Sun. 2006. Research on concept-sememe tree and semantic relevance computation. In PACLIC-2006.

59. Yan Song, Jiaqing Guo, Dongfeng Cai, 2006. Chinese Word Segmentation based on an Approach of Maximum Entropy Modeling, in Proc. of Fifth SIGHAN Workshop on Chinese Language Processing.