[Purpose/Significance] Research methodology is an important research direction in information science. Building a fine-grained research method corpus and extracting research method entities from it can help scholars quickly understand the methods used in the field. It can also help them trace how these methods evolve and where they are heading. Such a corpus lays a foundation for services and applications built on research method corpora amid the ongoing wave of digitalization. [Method/Process] First, the study took academic papers published in the Journal of the China Society for Scientific and Technical Information from 2000 to 2023 as its data source. From these, 50 articles were randomly selected, and the research method entities in them were manually annotated to serve as the training corpus for entity extraction. Second, two models, BERT-base-chinese and Chinese-BERT-wwm-ext, were trained, and the better-performing model was selected for research method entity extraction. Finally, this model was used to extract fine-grained research method entities from the unannotated corpus. [Result/Conclusion] This paper constructs a fine-grained annotated corpus of research methods in information science that covers six entity types: theory, method, dataset, indicator, tool, and other. When trained on the manually annotated corpus, the Chinese-BERT-wwm-ext model performed better, achieving a precision of 0.8082, a recall of 0.8467, and an F1 score of 0.8270. An analysis of the research method entities and their categories further shows that research methods in information science are becoming increasingly diverse. Emerging technologies and traditional methods coexist, and each has its own strengths.
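The abstract reports precision, recall, and F1 for the extraction model but does not state the tagging scheme or the matching criterion. The sketch below assumes character-level BIO tags and exact-match, entity-level scoring, which is a common convention for Chinese NER but is not confirmed by the source. Tag names such as `B-METHOD` and the helpers `bio_to_spans`, `f1_score`, and `prf1` are hypothetical. The sketch also shows that the reported F1 is the harmonic mean of the reported precision and recall: 2 × 0.8082 × 0.8467 / (0.8082 + 0.8467) ≈ 0.8270.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start index, end index exclusive). Types such as METHOD or
# TOOL are hypothetical stand-ins for the six classes named in the abstract.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence (assumed scheme) into typed entity spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if continues:
            continue
        if etype is not None:  # close the currently open entity
            spans.add((etype, start, i))
            etype = None
        if tag.startswith(("B-", "I-")):
            # A stray I- tag is treated as the start of a new entity.
            etype, start = tag[2:], i
    return spans


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span-and-type matching."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a hit needs identical type, start, and end
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    gold = [["B-METHOD", "I-METHOD", "O", "B-TOOL"]]
    pred = [["B-METHOD", "I-METHOD", "O", "B-DATASET"]]
    print(prf1(gold, pred))
    print(round(f1_score(0.8082, 0.8467), 4))
```

In the toy example, the method span is matched exactly, but the tool span is predicted with the wrong type. Precision, recall, and F1 are therefore all 0.5. Under exact matching, a span with the correct boundaries but the wrong type earns no credit. Other schemes, such as partial-overlap matching, would give different figures.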