In recent years, we have witnessed the prosperity of a variety of knowledge graphs (KGs), which contain entities, concepts, and the semantic relationships among them. Driven by the ever-increasing demand for machine understanding of semantics, KGs have become one of the prominent forms of knowledge representation in the big data era. KGs provide rich background knowledge for semantic understanding and are a key foundational technology for next-generation intelligent information processing and for building smart machine brains. KGs have already delivered great value in real applications such as smart search and text understanding. Recently, with further improvements in data processing and machine learning capabilities and the deepening application of KGs, KG technology faces new opportunities. Meanwhile, the massive scale, heterogeneous sources, multi-modal forms, and complex structure of KGs pose new challenges for KG research.
In this workshop, more than 10 leading researchers from around the world will give talks. The workshop aims to introduce the current progress of knowledge graph research in both academia and industry, discuss the major research challenges, and outline a future plan for knowledge graph research. We welcome all researchers and students to attend.
| Time | Speaker: Talk title | Session chair |
|---|---|---|
| 8:45-9:15 | Dr Haixun Wang (王海勋): Text Processing with Neural Network | Wei Wang (汪卫) |
| 9:15-9:45 | Prof. Seung-won Hwang: (Spatial) Entity Search and Intelligence | |
| 9:45-10:15 | Prof. Yueguo Chen (陈跃国): A System for Entity Exploration and Debugging in Large-Scale Knowledge Graphs | |
| 10:30-11:00 | Prof. Lei Zou (邹磊): gStoreD: A Distributed Graph-based RDF Triple Store | Wei Wang (汪卫) |
| 11:00-11:30 | Dr Zhongyuan Wang (王仲远): Conceptualization for Short Text Understanding | |
| 11:30-12:00 | Dr Bin Shao (邵斌): Real-time knowledge graph serving | |
| 13:30-14:00 | Prof. Jianyong Wang (王建勇): Entity Linking with a Knowledge Base for Heterogeneous Data | Dr Haixun Wang (王海勋) |
| 14:00-14:30 | Prof. Kenny Qili Zhu (朱其立): Representing verbs as argument concepts | |
| 14:30-15:00 | Prof. Xin Lin (林欣): Cleaning Uncertain Knowledge Graphs Using Crowdsourcing Platforms | |
| 15:15-15:45 | Prof. Kewei Tu (屠可伟): Knowledge Representation and Learning Based on Stochastic Grammars | Dr Haixun Wang (王海勋) |
| 15:45-16:15 | Prof. Haofen Wang (王昊奋): A Brief Look at Data Fusion Techniques for Knowledge Graph Construction | |
The deep learning tsunami continues to take over NLP. In many cases, by replacing hand-crafted and heavily engineered traditional NLP methods, it reduces development cost and improves performance. In this talk, I will discuss deep learning approaches to parsing and describe their potential for handling many other text processing tasks.
Haixun Wang is a research scientist / engineering manager at Facebook. Before Facebook, he was with Google Research, working on natural language processing. He led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009. He was Technical Assistant to Stuart Feldman (Vice President of Computer Science of IBM Research) from 2006 to 2007, and Technical Assistant to Mark Wegman (Head of Computer Science of IBM Research) from 2007 to 2009. He received the Ph.D. degree in computer science from the University of California, Los Angeles in 2000. He has published more than 150 research papers in refereed international journals and conference proceedings. He served as PC Chair of conferences such as CIKM’12, and he is on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (TKDE) and the Journal of Computer Science and Technology (JCST). He won the best paper award at ICDE 2015, the 10-year best paper award at ICDM 2013, and the best paper award at ER 2009.
Knowledge graphs have lately been considered a machine intelligence platform for Web search and for software in general. In this talk, I will present my recent research on automatically harvesting, maintaining, and integrating (spatial) data intelligence from the Web.
Seung-won Hwang is a Professor of Computer Science at Yonsei University. Prior to joining Yonsei, she was an Associate Professor at POSTECH for 10 years, after receiving her PhD in Computer Science from UIUC. Her recent research interest has been data(-driven) intelligence, which has led to 100+ publications at top-tier database/mining, AI, and NLP venues, including ACL, AAAI, SIGMOD, VLDB, ICDE, and WSDM (best paper).
Many large-scale machine-readable entity knowledge bases have emerged in recent years and have proven very useful in building semantic search and deep Q/A systems. As an important tool for enriching knowledge bases, entity linking links entity mentions appearing in Web text to their corresponding entities in a knowledge base, and it has many applications in content analysis and business intelligence beyond knowledge base population. However, the task is challenging due to name variations and entity ambiguity. In this talk, we will introduce some of our efforts on entity linking for heterogeneous data, including entity linking for unstructured Web free text, for structured Web lists and Web tables, and for tweets, and we will also discuss its various applications.
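The two difficulties named above, name variations and entity ambiguity, map onto the usual candidate-generation and disambiguation steps of entity linking. The following is a minimal Python sketch of those steps, not the speaker's method: the toy knowledge base, the `ALIASES` table, the `prior` values, and the mixing weight `alpha` are all hypothetical, and practical systems use much richer context models.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    description: str  # keywords profiling the entity (hypothetical)
    prior: float      # popularity prior, e.g. a normalized link count (hypothetical)


# Toy knowledge base; all entries are illustrative.
KB = {
    "Michael_Jordan_(basketball)": Entity(
        "Michael_Jordan_(basketball)", "american basketball player chicago bulls nba", 0.7),
    "Michael_I._Jordan": Entity(
        "Michael_I._Jordan", "machine learning professor berkeley statistics", 0.3),
}
# Alias table handles name variations: several surface forms map to candidate entities.
ALIASES = {
    "michael jordan": ["Michael_Jordan_(basketball)", "Michael_I._Jordan"],
    "jordan": ["Michael_Jordan_(basketball)", "Michael_I._Jordan"],
    "michael i. jordan": ["Michael_I._Jordan"],
}


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(entity, context, alpha=0.5):
    """Mix the popularity prior with the fraction of profile words found in the context."""
    profile = tokenize(entity.description)
    overlap = len(profile & context) / len(profile) if profile else 0.0
    return alpha * entity.prior + (1 - alpha) * overlap


def link(mention, text):
    """Return the best-scoring entity id for a mention, or None (NIL) if no candidate exists."""
    candidates = [KB[eid] for eid in ALIASES.get(mention.lower().strip(), [])]
    if not candidates:
        return None
    context = tokenize(text)
    return max(candidates, key=lambda e: score(e, context)).entity_id


print(link("Jordan", "He scored 30 points for the Chicago Bulls."))  # Michael_Jordan_(basketball)
```

The same mention "Michael Jordan" resolves to `Michael_I._Jordan` when the surrounding text talks about Berkeley and machine learning, because context overlap outweighs the popularity prior in that case.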
Jianyong Wang is currently a professor in the Department of Computer Science and Technology, Tsinghua University, Beijing, China. He received the PhD degree in computer science in 1999 from the Institute of Computing Technology, Chinese Academy of Sciences. He was previously an assistant professor at Peking University, and visited Simon Fraser University, the University of Illinois at Urbana-Champaign, and the University of Minnesota at Twin Cities before joining Tsinghua University. His research interests mainly include data mining and Web information management. He has co-authored over 70 papers in leading international conferences and journals. He has served as a PC Co-Chair for WISE’15, BioMedCom'14, WAIM'13, ADMA'11, and NDBC'10, and is an associate editor of IEEE TKDE.
Verbs play an important role in the understanding of natural language text. This work studies the problem of abstracting the subject and object arguments of a verb into a set of noun concepts, known as the “argument concepts”. This set of concepts, whose size is parameterized, represents the fine-grained semantics of a verb. For example, the object of “enjoy” can be abstracted into time, hobby, event, etc. We present a novel framework to automatically infer human-readable and machine-computable action concepts with high accuracy.
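A size-parameterized set of argument concepts can be pictured as a coverage problem: choose k concepts that together abstract as many observed arguments as possible. The sketch below uses plain greedy maximum coverage, a swapped-in illustration rather than the framework presented in the talk; the `ISA` taxonomy and the list of observed objects of "enjoy" are hypothetical.

```python
from collections import Counter

# Hypothetical isA taxonomy: argument term -> concepts that subsume it.
ISA = {
    "weekend": {"time"},
    "holiday": {"time", "event"},
    "reading": {"hobby"},
    "swimming": {"hobby", "sport"},
    "concert": {"event"},
    "party": {"event"},
}


def argument_concepts(arguments, isa, k):
    """Greedily choose up to k concepts covering the most argument occurrences."""
    counts = Counter(arguments)
    uncovered = set(counts)
    chosen = []

    def gain(concept):
        # Total frequency of still-uncovered arguments that this concept abstracts.
        return sum(counts[t] for t in uncovered if concept in isa.get(t, set()))

    pool = sorted(set().union(*(isa.get(t, set()) for t in counts)))
    while len(chosen) < k and uncovered:
        remaining = [c for c in pool if c not in chosen]
        if not remaining:
            break
        best = max(remaining, key=gain)  # ties go to the alphabetically first concept
        if gain(best) == 0:
            break
        chosen.append(best)
        uncovered = {t for t in uncovered if best not in isa.get(t, set())}
    return chosen


objects_of_enjoy = ["weekend", "holiday", "reading", "swimming",
                    "concert", "party", "party", "weekend"]
print(argument_concepts(objects_of_enjoy, ISA, k=3))  # ['event', 'hobby', 'time']
```

Raising or lowering `k` trades conciseness for coverage, which mirrors the parameterized set size described above.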
Kenny Qili Zhu is a Distinguished Research Professor (PhD advisor) in the Department of Computer Science and Engineering at Shanghai Jiao Tong University. He received a B.Eng. (Hons) in Electrical Engineering in 1999 and a PhD in Computer Science in 2005, both from the National University of Singapore. He was a postdoctoral researcher and lecturer from 2007 to 2009 at Princeton University. Prior to that, he was a software design engineer at Microsoft, Redmond, WA. From Feb 2010 to Aug 2010, he was a visiting professor at Microsoft Research Asia in Beijing. Kenny's main research interests are data and knowledge engineering and programming languages. He has published extensively in databases, AI and programming languages at top venues. He has served on the PC of WWW, CIKM, ECML, COLING, SAC, WAIM, APLAS and NDBC, etc. His research has been supported by NSF China, MOE China, Microsoft, Google, Oracle, Morgan Stanley and AstraZeneca. Kenny is the winner of the 2013 Google Faculty Research Award and the 2014 DASFAA Best Paper Award.
We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a “partial evaluation and assembly” framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over the RDF graph G. Based on the properties of subgraph matching over a distributed graph, we introduce local partial matches as partial answers in each fragment of the RDF graph G. For assembly, we propose two methods: centralized and distributed assembly. We analyze our algorithms both theoretically and experimentally. Extensive experiments over both real and benchmark RDF repositories of billions of triples in a cluster of 10-50 machines confirm that our method is superior to state-of-the-art methods in both system performance and scalability.
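The overall shape of "partial evaluation and assembly" can be sketched in Python: every site evaluates the query against its own fragment, and a coordinator performs centralized assembly. This sketch is deliberately simpler than gStoreD: instead of local partial matches with crossing edges, each site returns bindings for individual triple patterns, and the coordinator joins them. It assumes each triple is stored in exactly one fragment (replicated triples would produce duplicate answers); all function names are hypothetical.

```python
# Triples and triple patterns are 3-tuples; terms starting with "?" are variables.


def match_pattern(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:  # a repeated variable must bind consistently
                return None
        elif p != t:
            return None
    return binding


def partial_evaluate(fragment, query):
    """Runs at each site: local matches of every triple pattern, using only this fragment."""
    results = []
    for pattern in query:
        matches = [match_pattern(pattern, t) for t in fragment]
        results.append([m for m in matches if m is not None])
    return results


def compatible(a, b):
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def assemble(site_results, n_patterns):
    """Centralized assembly: union per-pattern matches from all sites, then join them."""
    per_pattern = [[] for _ in range(n_patterns)]
    for site in site_results:
        for i, matches in enumerate(site):
            per_pattern[i].extend(matches)
    answers = [{}]
    for matches in per_pattern:
        answers = [{**a, **m} for a in answers for m in matches if compatible(a, m)]
    return answers


F1 = [("Alice", "knows", "Bob"), ("Bob", "worksAt", "PKU"), ("Alice", "knows", "Dave")]
F2 = [("Carol", "knows", "Dave"), ("Dave", "worksAt", "MSRA")]
QUERY = [("?x", "knows", "?y"), ("?y", "worksAt", "?org")]
answers = assemble([partial_evaluate(f, QUERY) for f in (F1, F2)], len(QUERY))
```

The answer binding Alice, Dave, and MSRA spans both fragments, which is exactly the case a per-fragment evaluation alone would miss and assembly must recover.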
Lei Zou received his BS degree and Ph.D. degree in Computer Science at Huazhong University of Science and Technology (HUST) in 2003 and 2009, respectively. He received a CCF (China Computer Federation) Doctoral Dissertation Nomination Award in 2009 and won the Second Class Prize of the CCF Natural Science Award in 2014. In September 2009, he joined the Institute of Computer Science and Technology (ICST) of Peking University (PKU) as a faculty member, and he has been an associate professor at PKU since August 2012. Before joining PKU, he visited Hong Kong University of Science and Technology (HKUST) and the University of Waterloo (UW) in 2007 and 2008, respectively. His recent research interests include graph databases and RDF knowledge graphs, particularly graph-based RDF data management. He has published more than 30 papers, including more than 15 papers in reputed journals and major international conferences such as SIGMOD, VLDB, ICDE, TKDE, and the VLDB Journal. His personal homepage is at http://www.icst.pku.edu.cn/intro/leizou/index.html.
Knowledge acquisition has become scalable, and machine-readable knowledge is keeping pace with the phenomenal “Big Data” era. On the one hand, we have a revolutionary way of piling up knowledge; on the other hand, the technology for making knowledge graphs accessible, i.e., for serving knowledge to support real-life applications, has evolved slowly. Because of its high connectedness, knowledge data is by its very nature a complex entity graph with rich schemata. This talk presents our efforts in serving real-world knowledge graphs for real-time query processing at scale.
Bin Shao is a lead researcher at Microsoft Research (Beijing, China). He joined Microsoft after receiving his Ph.D. degree from Fudan University in July 2010. Bin Shao is the architect and a core developer of Microsoft Graph Engine, which is a distributed, in-memory, large graph processing engine. His research interests include in-memory databases, distributed systems, graph query processing, and concurrency control algorithms.
Integrating, representing, and reasoning over human knowledge is a computational grand challenge for the 21st century. Currently, most IR approaches are keyword-based statistical approaches. When the input is sparse, noisy, and ambiguous, knowledge is needed to fill the gap. In this talk, I will focus on knowledge-powered short text understanding. I will introduce the Probase project at Microsoft Research, whose goal is to enable machines to understand human communications. Probase is a universal, probabilistic semantic network. It contains millions of concepts, harnessed automatically from a corpus of billions of web pages. It enables probabilistic interpretations of search queries, document titles, ad keywords, etc. Its probabilistic nature also enables it to incorporate heterogeneous information naturally. I will introduce the core technique, called conceptualization, that we developed on top of this probabilistic semantic network. The goal of conceptualization is to infer the concepts in a piece of text. I will show how we leverage conceptualization to improve current web search, ads matching, query recommendation, etc.
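Conceptualization over a probabilistic isA network can be sketched as ranking each concept c by P(c | t_1, ..., t_n) ∝ P(c) · ∏ P(t_i | c). The naive Bayes model, the add-alpha smoothing, and the toy `ISA_COUNTS` below are assumptions for illustration; the techniques actually used in Probase may differ. The example shows how context disambiguates a term such as "apple".

```python
import math
from collections import defaultdict

# Hypothetical isA evidence: (concept, instance) -> how often the pair was extracted.
ISA_COUNTS = {
    ("company", "apple"): 80, ("fruit", "apple"): 60,
    ("company", "microsoft"): 90,
    ("fruit", "banana"): 70, ("fruit", "orange"): 40,
    ("device", "ipad"): 50,
}


def conceptualize(terms, isa_counts, alpha=1.0):
    """Rank concepts by P(c | terms) under a naive Bayes model with add-alpha smoothing."""
    concept_totals = defaultdict(int)
    vocab = set()
    for (concept, instance), n in isa_counts.items():
        concept_totals[concept] += n
        vocab.add(instance)
    grand_total = sum(concept_totals.values())
    known = [t for t in terms if t in vocab]  # terms absent from the network are ignored

    log_scores = {}
    for concept, n_c in concept_totals.items():
        score = math.log(n_c / grand_total)  # log P(c)
        for t in known:
            n_ct = isa_counts.get((concept, t), 0)
            score += math.log((n_ct + alpha) / (n_c + alpha * len(vocab)))  # log P(t | c)
        log_scores[concept] = score

    # Normalize to posteriors with the log-sum-exp trick.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    posteriors = {c: math.exp(s - top) / z for c, s in log_scores.items()}
    return sorted(posteriors.items(), key=lambda item: item[1], reverse=True)


print(conceptualize(["apple", "microsoft"], ISA_COUNTS)[0][0])  # company
print(conceptualize(["apple", "banana"], ISA_COUNTS)[0][0])     # fruit
```

Working in log space avoids underflow when many terms are multiplied, and smoothing keeps a concept alive even when one term was never observed with it.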
Dr. Zhongyuan Wang is a Researcher at Microsoft Research Asia (MSRA). He leads two projects at MSRA: Enterprise Dictionary (knowledge mining from enterprises) and Probase (knowledge mining from the Web). He received his master’s and bachelor's degrees in computer science from Renmin University in 2010 and 2007, respectively. While at the university, Zhongyuan Wang won the Wu Yuzhang Scholarship (the top-level scholarship at Renmin University), the Kwang-Hua Scholarship, and the ACM SIGMOD07 Undergraduate Scholarship (one of seven winners worldwide). After graduating from RUC, he joined MSRA as a Research Software Development Engineer and then became an Associate Researcher. To date, Zhongyuan Wang has published 10+ papers (including an ICDE 2015 Best Paper) in leading international conferences such as VLDB, ICDE, and CIKM. He is also the translator of the book “Windows Phone 7 Programming for Android and iOS Developers”, published in 2012, and the co-author of the book “Web Data Management: Concepts and Techniques”, published in 2014. He has mentored 30+ interns, who received PhD offers from Harvard, Yale, CMU, UW, etc. His research interests include knowledge bases, web data mining, semantic networks, machine learning, and natural language processing.
Large-scale knowledge graphs (KGs) contain rich entities and abundant relationships among them. Data exploration over KGs allows users to browse the attributes of entities as well as the relations among entities, and therefore provides a good way of learning the structure and coverage of KGs. In this talk, we introduce a system called SEED that is designed to support entity-oriented exploration in large-scale KGs. It retrieves entities similar to a set of seed entities, together with the semantic relations that show how these entities are similar to each other. A by-product of entity exploration in SEED is that it facilitates discovering deficiencies in KGs, so that detected bugs can easily be fixed by users as they explore the KGs.
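Retrieving entities similar to a set of seeds, together with the relations that explain the similarity, can be sketched by describing each entity with its (predicate, object) pairs. This is a hypothetical simplification, not SEED's algorithm: the toy `TRIPLES`, the use of the features shared by all seeds, and the count-based ranking are assumptions.

```python
from collections import defaultdict

# Hypothetical toy KG as (subject, predicate, object) triples.
TRIPLES = [
    ("Fudan", "locatedIn", "Shanghai"), ("Fudan", "type", "University"),
    ("SJTU", "locatedIn", "Shanghai"), ("SJTU", "type", "University"),
    ("ECNU", "locatedIn", "Shanghai"), ("ECNU", "type", "University"),
    ("PKU", "locatedIn", "Beijing"), ("PKU", "type", "University"),
    ("Oriental_Pearl_Tower", "locatedIn", "Shanghai"), ("Oriental_Pearl_Tower", "type", "Tower"),
]


def entity_features(triples):
    """Describe each subject entity by its set of (predicate, object) pairs."""
    feats = defaultdict(set)
    for s, p, o in triples:
        feats[s].add((p, o))
    return feats


def similar_entities(seeds, triples, top_k=5):
    """Rank non-seed entities by how many of the seeds' shared features they also have."""
    feats = entity_features(triples)
    common = set.intersection(*(feats[s] for s in seeds))  # features shared by all seeds
    ranked = []
    for entity, f in feats.items():
        if entity in seeds:
            continue
        shared = f & common
        if shared:
            ranked.append((entity, sorted(shared)))  # shared features explain the similarity
    ranked.sort(key=lambda r: (-len(r[1]), r[0]))
    return sorted(common), ranked[:top_k]


common, ranked = similar_entities({"Fudan", "SJTU"}, TRIPLES)
print([e for e, _ in ranked])  # ['ECNU', 'Oriental_Pearl_Tower', 'PKU']
```

Returning the shared features alongside each entity lets a user see why it was retrieved; in this toy setting, an entity that a user expects near the top but that ranks low could hint at a missing triple, though SEED's own bug-detection mechanism may work differently.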
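Yueguo Chen (陈跃国), PhD, is an associate professor and PhD supervisor at Renmin University of China, a corresponding member of the CCF Technical Committee on Big Data, and a member of CCF YOCSEF. He received his PhD from the National University of Singapore in 2009. His current research focuses on real-time big data analytics systems and exploratory search over knowledge graphs. He has published more than 20 papers in journals and conferences such as TKDE, ICDE, AAAI, EDBT, and CIKM. He has led one NSFC Young Scientists project and one NSFC General Program project, the Guangdong Province major science and technology project "Industrialization of a High-Throughput Real-Time Business Intelligence System for Big Data", and the Renmin University team pre-research project "Key Technologies of Big Data Management for Socialized Services". He organized the WISE 2013 entity annotation challenge, the Sogou / China Database Conference knowledge extraction competition, and the 2015 China Youth Big Data Innovation Competition. He serves as a reviewer for journals and conferences such as VLDBJ, TKDE, ICDE, WWW, and CIKM, and as a young editorial board member of FCS.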
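Male, PhD, born in July 1981, he is currently an associate professor at the School of Information Science and Technology, East China Normal University. His research focuses on new types of data management. He has published more than 30 papers in this field, including 4 papers in the past three years in TKDE and ICDE, a journal and a conference rated Class A by the China Computer Federation (CCF). In 2011, he was selected for the first cohort of the "Hong Kong Scholars Program" and spent two years as a visiting researcher at Hong Kong Baptist University. After returning in 2014, he was selected for the Shanghai "Pujiang Talent Program". He is currently a young associate editor of the SCI-indexed journal Frontiers of Computer Science, a reviewer for leading journals such as TKDE and TPDS, and has served several times as a PC member of international conferences such as WAIM and ICPADS.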
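Since Google introduced the concept of the knowledge graph in 2012 and successfully applied it to Web search, knowledge graphs and semantic technologies have received growing attention from both academia and industry. In particular, how to fuse heterogeneous knowledge from different sources into a more complete knowledge base has become a research hotspot. In this talk, I will share the fusion challenges I have faced while building knowledge graphs (especially Chinese knowledge graphs) and the corresponding solutions. Specifically, I will introduce several fusion algorithms used in Zhishi.me (the first Chinese open data resource), including instance matching, efficient online knowledge graph aggregation, and cross-lingual schema mapping.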
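Instance matching, the first fusion step mentioned above, can be sketched as blocking on a normalized label followed by verification with attribute overlap. The NFKC normalization (which folds the full-width characters common in Chinese sources), the Jaccard threshold of 0.3, and the two toy sources are assumptions for illustration, not the algorithms used in Zhishi.me.

```python
import unicodedata


def normalize_label(label):
    """NFKC-fold full-width forms, casefold, and drop whitespace."""
    return "".join(unicodedata.normalize("NFKC", label).casefold().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0


def match_instances(source_a, source_b, threshold=0.3):
    """Each source maps entity_id -> (label, set of (attribute, value)). Returns matched pairs."""
    # Blocking: index source_b by normalized label so only same-label pairs are compared.
    index = {}
    for bid, (label, _) in source_b.items():
        index.setdefault(normalize_label(label), []).append(bid)
    matches = []
    for aid, (label, attrs_a) in source_a.items():
        for bid in index.get(normalize_label(label), []):
            sim = jaccard(attrs_a, source_b[bid][1])
            if sim >= threshold:  # verification: same label and enough shared attributes
                matches.append((aid, bid, round(sim, 2)))
    return matches


SOURCE_A = {
    "a:IBM": ("IBM", {("founded", "1911"), ("headquarters", "Armonk")}),
    "a:Apple_fruit": ("Apple", {("type", "fruit"), ("family", "Rosaceae")}),
    "a:Apple_Inc": ("Apple", {("type", "company"), ("headquarters", "Cupertino")}),
}
SOURCE_B = {
    "b:1": ("ＩＢＭ", {("founded", "1911"), ("headquarters", "Armonk"), ("industry", "IT")}),
    "b:2": ("apple", {("type", "company"), ("headquarters", "Cupertino"), ("founded", "1976")}),
}

print(match_instances(SOURCE_A, SOURCE_B))
```

Label blocking alone would wrongly pair the fruit with the company; the attribute check rejects that pair while accepting the full-width "ＩＢＭ" entry.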
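Haofen Wang (王昊奋) received his PhD in engineering from Shanghai Jiao Tong University in 2013 and is currently a lecturer at East China University of Science and Technology, where he also serves as assistant director of the Institute of Computer Technology and deputy director of the Natural Language Processing and Big Data Mining Lab. He has extensive experience in semantic technologies and graph data management and has published more than 40 high-quality papers, including more than 20 CCF Class A and Class B papers. As technical lead, he led a team whose semantic search system won 2nd place worldwide in the Billion Triple Challenge and 1st place worldwide in the instance matching task of OAEI, the well-known ontology matching competition. He led the team that built zhishi.me, the first Chinese semantically linked knowledge base, and was invited to present it at a W3C multilingual workshop. He also co-organized three semantic search workshops (WWW Workshops SemSearch09, SemSearch10, and SemSearch11) and ISWC 2010, the top international Semantic Web conference, and he has long served on the program committees of top international conferences such as ISWC, WWW, and AAAI. He also led teams that won first place two years in a row in the Baidu big data knowledge mining competition. He has led or participated in multiple projects funded by the National Natural Science Foundation of China, the national 863 Program, and the National Key Technology R&D Program. During his PhD studies, he received the IBM PhD Fellowship two years in a row and was deeply involved in the development of IBM's Watson system. He is currently a member of the CCF YOCSEF Shanghai academic committee, a member of the Language and Knowledge Computing Committee of the Chinese Information Processing Society of China, chair of the knowledge graph track of NLPCC 2015, and a lecturer on knowledge graphs at CCF ADL 55.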
Yanghua Xiao received his PhD degree in software theory from Fudan University, Shanghai, China, in 2009. He is now an associate professor of computer science at Fudan University and one of the young 973 Program scientists. His research interests include big data management and mining, graph databases, and knowledge graphs. He was a visiting professor at the Human Genome Sequencing Center, Baylor College of Medicine, and a visiting researcher at Microsoft Research Asia. He won the Best PhD Thesis Nomination of the CCF (China Computer Federation) in 2010, the CCF 2014 Natural Science Award (second level), and the ACM (CCF) Shanghai Distinguished Young Scientists Nomination Award. Recently, he has published more than 50 papers in international leading journals and top conferences, including Pattern Recognition, Physical Review E, PLoS ONE, Information Systems, Computers & Mathematics with Applications, BMC Systems Biology, Physica A; SIGMOD, VLDB, ICDE, IJCAI, AAAI, EDBT, ICSE, OOPSLA, WWW, ICDM, SDM, ECML/PKDD, ICWS, ICSM, CIKM, DASFAA, COMPSAC, SSDBM, etc. He is the PI or Co-PI of more than ten projects supported by the Natural Science Foundation of China, the Ministry of Education of China, the Shanghai Municipal Science and Technology Commission, Microsoft, IBM, China Telecom, Baidu, etc. He regularly serves as a reviewer for the Natural Science Foundation of China; a PC member of IJCAI, SIGKDD, ICDE, WWW, CIKM, ICDM, COLING, WAIM, GDM, etc.; an Associate Editor of Frontiers of Computer Science; and a reviewer for leading journals such as PLoS ONE, IEEE Transactions on Computers, TKDE, KIS, WWW Journal, JCST, Physica A, IEEE Intelligent Systems, BMC Bioinformatics, Distributed and Parallel Databases, etc. He is a member of ACM and IEEE and a senior member of CCF. He is the director of GDM@FUDAN.