Bibliometric Analysis of Literature on Stroke Based on Text Mining: A Case Study of the PubMed Database
Posted by: yaot  Posted: 2023/4/13 14:43:46

YE Quanwei, YANG Xiaoguang, YE Liping, LIU Jing, ZHOU Ping

[Abstract] Objective: To retrospectively trace the thematic evolution and the spatiotemporal distribution of global stroke research from 2010 to 2020. Methods: Stroke-related literature published from 2010 to 2020 was retrieved from the PubMed database. Basic bibliographic information was fetched in batches with the pubmedR package in R 4.1.2, a corpus was built with the quanteda package, a structural topic model was used for the analysis, and the ggplot2 package was used to analyze collaboration networks among countries or regions. Results: Global stroke research grew rapidly from 2010 to 2013, with annual growth rates of 11.94% to 17.60%. Growth was flat from 2014 to 2018, at about 3% to 4% per year. It accelerated again in 2019 and 2020, with growth rates of 12.81% and 17.96%, respectively. Publications from mainland China grew by 102.99% in 2012 and by 43.85% in 2020, contributing substantially to both periods of rapid global growth. Based on the text of the abstracts, 11 topic categories were identified, among which "clinical research" and "stroke rehabilitation" accounted for a large share, with a combined heat value of 46.66%. In recent years, research in China has focused mainly on "molecular epidemiology" and "animal experiments". Conclusion: Global interest in stroke research continues to grow, and the topic categories are gradually becoming richer. China needs to further strengthen research on front-end prevention and back-end rehabilitation of stroke and to optimize prevention and treatment strategies.
[Keywords] Text Mining; Bibliometric Analysis; Stroke; PubMed Database
CLC number: R743  Document code: A
Citation: YE Quanwei, YANG Xiaoguang, YE Liping, et al. Bibliometric Analysis of Literature on Stroke Based on Text Mining: a Case Study of PubMed Database. Chinese Health Quality Management, 2023, 30(3): 25-30
Firstauthor's address School of Public Health, Fudan University/Key Lab of Health Technology Assessment, National Health Commission of the People's Republic of China, Shanghai,200032, China


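Stroke is an acute cerebrovascular disease that includes ischemic and hemorrhagic stroke [1]. Globally, stroke is the leading cause of death or disability among patients [2-4]. It is also the number one cause of death and disability among adults in China, and it is characterized by five features: high incidence, high disability, high mortality, high recurrence, and a heavy economic burden [5-6]. Some scholars have analyzed the trends and characteristics of stroke research from a bibliometric perspective [7-10], aiming to provide a reference for deepening and extending related research and for improving policy analysis. Traditional bibliometric analysis takes two forms. The first integrates already structured bibliographic variables (such as time, country, institution name, and author name) into a database and uses software such as CiteSpace and VOSviewer for analysis and plotting. The second handles text that is not yet structured (such as keywords, abstracts, and full text) by manually extracting the relevant variables into a database, which is then analyzed systematically. The former can process large numbers of publications but struggles with unstructured text; the latter, constrained by labor and time, cannot cope with large volumes of text.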
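To make the structured-variable route concrete, the sketch below counts publications per year and derives year-over-year growth rates of the kind reported in the abstract. It is a minimal illustration in Python using only the standard library, not the authors' R workflow; the function name `yearly_growth` and the sample years are hypothetical and do not come from the study.

```python
from collections import Counter


def yearly_growth(years):
    """Return {year: growth in %} relative to the previous calendar year.

    `years` is an iterable of publication years, one entry per record.
    Years without an immediately preceding year in the data are skipped.
    """
    counts = Counter(years)
    ordered = sorted(counts)
    growth = {}
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev != 1:
            continue  # a gap in the series: no meaningful year-over-year rate
        change = counts[cur] - counts[prev]
        growth[cur] = round(change / counts[prev] * 100, 2)
    return growth


# Hypothetical records (NOT data from the study): one year per publication.
records = [2010] * 100 + [2011] * 112 + [2012] * 130
print(yearly_growth(records))
```

The same counting step generalizes to any other structured field (country, institution, author) by swapping the key passed to `Counter`.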
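Text mining refers to computational techniques for extracting valuable information and knowledge from text data. It has been applied to electronic medical records [11], news media text analysis [12], public opinion analysis [13], and other areas. This study combined text mining with traditional bibliometric methods to retrospectively trace the thematic evolution and the spatiotemporal distribution of global stroke research from 2010 to 2020.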
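As a rough illustration of the first step in text mining, the sketch below turns unstructured abstract text into a document-term matrix, the kind of structured input a topic model consumes. It is a simplified stand-in written in Python with the standard library only; it does not reproduce the quanteda corpus the study built, and the stopword list, function names, and sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, far from a real one).
STOPWORDS = {"the", "of", "and", "in", "a", "after", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical abstract snippets (NOT taken from the study's corpus).
docs = [
    "Rehabilitation after ischemic stroke",
    "Stroke in a mouse model of ischemic stroke",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

Each row of the matrix is one abstract and each column one term, so texts that previously required manual extraction can be compared and grouped numerically.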

……