Method of Building BFS-CTC: A Chinese Tagged Corpus of Sentential Semantic Structure

Abstract: Drawing on modern Chinese semantics, a hierarchical model of sentential semantic structure is constructed, and on the basis of this model a Chinese corpus annotated with sentential semantic structure, BFS-CTC (Beijing forest studio-Chinese tagged corpus), is built. Self-developed annotation and management tools allow the sentential semantic components of the model and their combination relations to be tagged quickly, which reduces annotator training effort and annotation cost. BFS-CTC covers six sentence-pattern types and contains about 10,000 sentences. It provides lexical and syntactic annotations that conform to existing standards, together with sentential semantic structure annotations that follow a custom-defined specification. This supports comparative analysis across the lexical, syntactic, and sentential semantic levels, as well as combined use and horizontal analysis of the corpus. In addition, BFS-CTC is readily extensible: other extension banks and annotation resources can be generated from the core tagged bank.

     
