Extraction of Fine-grained Research Methods in the Field of Information Science

HAO Jiayi, WANG Yuzhuo, ZHANG Chengzhi

Scientific Information Research ›› 2025, Vol. 7 ›› Issue (1) : 16-29. DOI: 10.19809/j.cnki.kjqbyj.2025.01.002
Special topic: Research Methods

Abstract

[Purpose/significance] Research methods are one of the important research directions in information science. Constructing a fine-grained research method corpus and extracting research method entities helps scholars quickly understand the methods used in the field, trace how those methods have evolved, and anticipate future trends. It also lays the foundation for services and applications built on a research method corpus amid the ongoing wave of digitalization. [Method/process] First, taking academic articles published in the Journal of the China Society for Scientific and Technical Information from 2000 to 2023 as the data source, this study randomly selected 50 articles and manually annotated the research method entities in them to serve as the training corpus for entity extraction. Second, two models, BERT-base-chinese and Chinese-BERT-wwm-ext, were trained, and the better-performing model was selected for research method entity extraction. Finally, the selected model was used to extract fine-grained research method entities from the unannotated corpus. [Result/conclusion] This paper constructs a fine-grained annotated corpus of research methods in information science covering six entity types: theory, method, dataset, indicator, tool, and other. When the entity extraction models were trained on the manually annotated corpus, Chinese-BERT-wwm-ext performed better, with precision, recall, and F1 of 0.8082, 0.8467, and 0.8270, respectively. An analysis of the extracted entities and their categories further shows that research methods in information science are becoming increasingly diverse, with emerging technologies and traditional methods coexisting, each with its own strengths.
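As a quick consistency check on the reported scores, the minimal sketch below recomputes F1 as the harmonic mean of the first two reported figures. It assumes the first figure is precision, as the standard F1 definition requires; the helper name `f1_score` is illustrative and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract for Chinese-BERT-wwm-ext.
reported_precision = 0.8082  # assumption: the first reported figure is precision
reported_recall = 0.8467
reported_f1 = 0.8270

computed_f1 = f1_score(reported_precision, reported_recall)

# Compare at the four-decimal precision used in the abstract.
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Under that assumption the recomputed value matches the reported F1 at four decimal places, which supports reading the three figures as precision, recall, and F1.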

Key words

information science / corpus of research method / research method entities / research method recognition / fine-grained research method

Cite this article

HAO Jiayi, WANG Yuzhuo, ZHANG Chengzhi. Extraction of Fine-grained Research Methods in the Field of Information Science[J]. Scientific Information Research, 2025, 7(1): 16-29 https://doi.org/10.19809/j.cnki.kjqbyj.2025.01.002
CLC number: G350
