1 Introduction
1.1 Type of approach
-
Author Grouping methods group references to the same author by applying a similarity measure to the attributes of these references. They usually rely on clustering techniques, pre-defined similarity functions, or machine learning techniques, extracting information from co-authorship relationships or from a set of heuristic rules.
-
Author Assignment methods build a model that represents each author and use it to assign authorship records. They aim to attribute each authorship record directly to its respective author, adopting a classification or clustering technique.
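The difference between the two families can be illustrated with a minimal sketch. It assumes that each reference is described only by its set of co-author names and that Jaccard overlap is the similarity function; the names `group_references`, `assign_reference`, the `threshold` value, and the data schema are hypothetical and are not taken from any of the surveyed methods.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_references(refs, threshold=0.3):
    """Author grouping: single-link clustering on co-author overlap.

    `refs` maps a reference id to its set of co-author names (assumed schema).
    References whose similarity reaches `threshold` end up in the same group.
    """
    parent = {r: r for r in refs}

    def find(r):
        # Union-find lookup with path compression.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for r1, r2 in combinations(refs, 2):
        if jaccard(refs[r1], refs[r2]) >= threshold:
            parent[find(r1)] = find(r2)

    clusters = {}
    for r in refs:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


def assign_reference(coauthors, author_models):
    """Author assignment: pick the known author whose model is most similar.

    `author_models` maps an author id to the co-author set seen so far.
    """
    return max(author_models, key=lambda a: jaccard(coauthors, author_models[a]))


refs = {
    "r1": {"coauthor_a", "coauthor_b"},
    "r2": {"coauthor_b", "coauthor_c"},
    "r3": {"coauthor_d"},
}
print(group_references(refs))
print(assign_reference({"coauthor_b"}, {"author_1": {"coauthor_a", "coauthor_b"},
                                         "author_2": {"coauthor_d"}}))
```

Grouping needs no prior knowledge of who the authors are, whereas assignment presupposes an existing model per author; real methods replace the Jaccard overlap with richer similarity functions or learned classifiers.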
1.1.1 Explored evidence
-
Citation Information is extracted directly from the citation records, such as author and co-author names, paper titles, year of publication, and other information. These attributes are the ones most commonly used by the AND methods available in the literature. However, they sometimes do not provide enough information for disambiguation.
-
Web Information is extracted from the Web and used as supplementary information about an author’s publication profile. The retrieved information is used as attributes when calculating the similarity between authorship records.
-
Implicit Evidence is inferred from the visible attributes, such as the latent topics of a citation, expressed as the probability of each topic given that citation. These values are used as attributes or evidence to calculate the similarity between authorship records.
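As a rough illustration of how the three kinds of evidence could feed one similarity score, the sketch below combines title overlap (citation information), a shared homepage (web information), and the cosine of topic-probability vectors (implicit evidence). The record fields `title`, `homepage`, and `topics`, as well as the weights, are assumptions made for this example only.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def token_overlap(a, b):
    """Jaccard overlap between the lower-cased word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def record_similarity(r1, r2, weights=(0.4, 0.2, 0.4)):
    """Weighted mix of citation, web, and implicit evidence (weights are illustrative)."""
    citation = token_overlap(r1["title"], r2["title"])
    same_page = r1.get("homepage") and r1.get("homepage") == r2.get("homepage")
    web = 1.0 if same_page else 0.0
    implicit = cosine(r1["topics"], r2["topics"])
    w_citation, w_web, w_implicit = weights
    return w_citation * citation + w_web * web + w_implicit * implicit


a = {"title": "name disambiguation in digital libraries",
     "homepage": "https://example.org/~author", "topics": [0.7, 0.2, 0.1]}
b = {"title": "author name disambiguation with graphs",
     "homepage": None, "topics": [0.6, 0.3, 0.1]}
print(round(record_similarity(a, b), 3))
```

The topic vectors here are typed in by hand; in practice they would come from whatever topic model a given method uses.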
2 Methodology
2.1 Research preparation
-
IC-1: primarily addressing AND as an integral component of the study.
-
IC-2: published in peer-reviewed conferences or journals, available in major bibliographic repositories.
-
EC-1: works that are not available in online repositories.
-
EC-2: primarily associated with domains other than information systems, computer science, and engineering.
-
EC-3: published outside the 2003–2022 timeframe.
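The inclusion and exclusion criteria above can be read as a simple filter over candidate records. The sketch below is one possible encoding, assuming a hypothetical `Record` structure whose fields were filled in during manual screening; it is not the tooling used in the study.

```python
from dataclasses import dataclass

# Domains accepted under EC-2, as listed in the criteria.
ALLOWED_DOMAINS = {"information systems", "computer science", "engineering"}


@dataclass
class Record:
    title: str
    year: int
    addresses_and: bool      # IC-1: AND is an integral component of the study
    peer_reviewed: bool      # IC-2: peer-reviewed venue in a major repository
    available_online: bool   # EC-1 applies when this is False
    domain: str              # EC-2 applies when outside ALLOWED_DOMAINS


def is_eligible(rec: Record) -> bool:
    """A record is kept when it meets every IC and triggers no EC."""
    included = rec.addresses_and and rec.peer_reviewed
    excluded = (
        not rec.available_online
        or rec.domain.lower() not in ALLOWED_DOMAINS
        or not (2003 <= rec.year <= 2022)   # EC-3, bounds assumed inclusive
    )
    return included and not excluded


candidates = [
    Record("Paper A", 2015, True, True, True, "Computer Science"),
    Record("Paper B", 2001, True, True, True, "Computer Science"),
    Record("Paper C", 2019, True, True, True, "Medicine"),
]
print([r.title for r in candidates if is_eligible(r)])
```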
WoS | Scopus |
---|---|
Information Science & Library Science | Computer Science |
Computer Science Information Systems | Social Sciences |
Computer Science Interdisciplinary Applications | Mathematics |
Computer Science Theory Methods | Engineering |
Computer Science Artificial Intelligence | Decision Sciences |
Engineering Electrical Electronic | Medicine |
Computer Science Software Engineering | Multidisciplinary |
Multidisciplinary Sciences | Business, Management and Accounting |
Telecommunications | Arts and Humanities |
Computer Science Hardware Architecture | Materials Sciences |
Medical Informatics | Agricultural and Biological Sciences |
Health Care Sciences Services | Biochemistry, Genetics and Molecular Biology |
Mathematics Interdisciplinary Applications | Energy |
Medicine General Internal | Neuroscience |
Operations Research Management Science | Physics and Astronomy |
Physics Multidisciplinary | |
Business | |
Cardiac Cardiovascular Systems | |
Computer Science Cybernetics | |
Education Educational Research | |
Education Scientific Disciplines | |
Engineering Mechanical | |
Management | |
Mathematical Computational Biology | |
Medicine Research Experimental | |
Physics of Fluids and Plasmas | |
Physics Mathematical | |
Regional Urban Planning | |
Social Sciences Mathematical Methods | |
Statistics Probability | |
2.2 Data presentation and interrelation
2.2.1 Laws
-
Bradford’s law [20] identifies the journals that publish the most on a topic. The scientific journals of an area are ranked in decreasing order of productivity, which yields a nucleus of a few journals that account for a high share of the total publications, while a large number of journals publish only a few articles in the area [21]. The law also measures bibliographic dispersion, that is, how widely knowledge is scattered across journals. Bradford’s law is computed from the core, the small set of journals that have published the most articles on the subject. As one moves away from the core, each subsequent zone needs an increasing number of journals to supply a comparable number of articles, in the proportion \(1 : n : n^{2} : n^{3}\). In the context of this study, Bradford’s law makes it possible to single out a limited number of scientific journals in the AND area that collectively account for a substantial portion of the total publications.
-
The Elitism or Price law derives from Lotka’s law [22], one of the most discussed models in bibliometrics, which states that the number of authors making n contributions is about \(1/n^{2}\) of those making a single contribution. The Elitism law seeks to reveal the most important (most-cited) authors and papers by taking the square root of the total number of authors, which delimits what is considered the elite. If N represents the total number of authors, \(\sqrt{N}\) is the size of the elite of the studied area. In this study, the law is used to identify the most-cited authors and documents, which are responsible for more than half of the contributions in the AND area.
-
The 80/20 law (Pareto rule) [23] is inspired by information systems used in commerce and industry, where 80% of information demand is satisfied by 20% of the set of information sources. In this work, this law is used to identify the most relevant journals, conferences, countries, and universities publishing in the AND area, and to select the most representative keywords.
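The zone-based reading of Bradford’s law and the 80/20 cut can be sketched as follows. This is a minimal illustration under stated assumptions: the journal names and counts are invented, zones are formed by equal shares of articles, and the function names `bradford_zones` and `pareto_core` are hypothetical.

```python
def bradford_zones(journal_counts, n_zones=3):
    """Split journals, ranked by productivity, into zones of roughly equal article totals."""
    ranked = sorted(journal_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(journal_counts.values())
    zones = [[] for _ in range(n_zones)]
    if total == 0:
        return zones
    cumulative = 0
    for name, count in ranked:
        # The zone is chosen from the articles accumulated before this journal.
        idx = min(cumulative * n_zones // total, n_zones - 1)
        zones[idx].append(name)
        cumulative += count
    return zones


def pareto_core(counts, share=0.8):
    """Shortest prefix of the ranked sources that covers `share` of all items."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    target = share * sum(counts.values())
    core, covered = [], 0
    for name, count in ranked:
        if covered >= target:
            break
        core.append(name)
        covered += count
    return core


# Invented counts, chosen so that the zones follow 1 : n : n^2 with n = 2.
counts = {"J1": 12, "J2": 6, "J3": 6, "J4": 3, "J5": 3, "J6": 3, "J7": 3}
print([len(z) for z in bradford_zones(counts)])
print(pareto_core(counts))
```

With real data the zones rarely follow the ideal proportion exactly; the ratio between consecutive zone sizes gives an empirical estimate of the multiplier n.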
2.2.2 Tools
Document type | WoS | Scopus | Merged |
---|---|---|---|
Journal Article | 81 | 90 | 98 |
Conference Article | 62 | 85 | 87 |
Conference Review | 0 | 14 | 14 |
Review | 3 | 5 | 5 |
Book Chapter | 0 | 1 | 1 |
Data Paper | 1 | 1 | 1 |
Erratum | 2 | 1 | 3 |
Early Access Article | 2 | 0 | 2 |
Total | 151 | 197 | 211 |
Journal | h-index | SJR | WoS | Scopus | Merged |
---|---|---|---|---|---|
Scientometrics | 123 | 0.929 | 21 | 19 | 21 |
Journal of the Association for Information Science and Technology | 150 | 0.848 | 8 | 8 | 8 |
Journal of the American Society for Information Science and Technology | 18 | 4 | 4 | 4 | |
Journal of Information Science | 69 | 0.761 | 3 | 3 | 3 |
Journal of Informetrics | 77 | 1.437 | 4 | 3 | 4 |
IEEE Access | 158 | 0.927 | 2 | 3 | 4 |
2.3 Detailing, integrating model and validation by evidence
Number of documents | Number of authors |
---|---|
13 | 2 |
12 | 1 |
11 | 1 |
7 | 1 |
6 | 2 |
5 | 6 |
4 | 8 |
3 | 13 |
2 | 53 |
1 | 351 |
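The author productivity counts in the table above can be checked against Lotka’s law and used to size the elite under the Elitism law. The sketch below only re-uses the table’s figures; the variable and function names are illustrative.

```python
import math

# Number of documents -> number of authors, copied from the table above.
productivity = {13: 2, 12: 1, 11: 1, 7: 1, 6: 2, 5: 6, 4: 8, 3: 13, 2: 53, 1: 351}

total_authors = sum(productivity.values())
# Elitism law: the elite is about the square root of the author population.
elite_size = round(math.sqrt(total_authors))


def lotka_expected(n, single_paper_authors):
    """Authors expected to have n papers under Lotka's inverse-square law."""
    return single_paper_authors / n ** 2


print("total authors:", total_authors)
print("elite size:", elite_size)
for n in sorted(productivity):
    expected = lotka_expected(n, productivity[1])
    print(n, "observed:", productivity[n], "expected:", round(expected, 1))
```

The table sums to 438 authors, so the elite comprises about 21 of them. Comparing the observed and expected columns shows how closely the sample follows the inverse-square pattern.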
3 Results and analysis
3.1 Data presentation and interrelation
Author | Doc. WoS | Citations WoS | Doc. Scopus | Citations Scopus | Doc. Merged | Citations Merged |
---|---|---|---|---|---|---|
Gonçalves, M. A. (Orcid: 0000-0002-2075-3363) | 9 | 305 | 13 | 507 | 13 | 507 |
Kim, J. (Orcid: 0000-0001-6481-2065) | 12 | 85 | 13 | 166 | 13 | 166 |
Ferreira, A. A. (Orcid: 0000-0002-2487-6600) | 8 | 302 | 12 | 500 | 12 | 500 |
Laender, A. H. F. (Orcid: 0000-0001-5032-2233) | 7 | 297 | 11 | 499 | 11 | 499 |
Asghar, S. (Orcid: 0000-0001-6883-3584) | 6 | 43 | 6 | 72 | 7 | 72 |
Hussain, I. (Orcid: 0000-0002-1586-1503) | 6 | 43 | 6 | 72 | 6 | 72 |
Smalheiser, N. R. (Orcid: 0000-0003-1079-3406) | 3 | 306 | 6 | 524 | 6 | 524 |
Chandra, J. (Orcid: 0000-0001-5994-9024) | 5 | 13 | 5 | 18 | 5 | 18 |
Giles, C. L. (Orcid: 0000-0002-1931-585X) | 4 | 35 | 5 | 168 | 5 | 168 |
Mondal, S. (Orcid: 0000-0002-2159-3410) | 5 | 13 | 5 | 18 | 5 | 18 |
Torvik, V. I. (Orcid: 0000-0002-0035-1850) | 3 | 340 | 5 | 549 | 5 | 549 |
Veloso, A. (Orcid: 0000-0002-9177-4954) | 3 | 69 | 5 | 166 | 5 | 166 |
Zhang, L. (Orcid: 0000-0003-2104-0194) | 2 | 5 | 5 | 10 | 5 | 10 |
Organization | Country | Doc. WoS | Citation WoS | Doc. Scopus | Citation Scopus | Doc. Merged | Citation Merged |
---|---|---|---|---|---|---|---|
School of Information Sciences, University of Illinois at Urbana-Champaign | USA | 11 | 482 | 2 | 58 | 11 | 498 |
Departamento de Ciência da Computação, Universidade Federal de Minas Gerais | Brazil | 9 | 305 | 7 | 359 | 10 | 392 |
Departamento de Computação, Universidade Federal de Ouro Preto | Brazil | 7 | 228 | 4 | 231 | 8 | 290 |
Institute for Research on Innovation & Science, University of Michigan | USA | 12 | 176 | 6 | 76 | 12 | 192 |
School of Information Management, Wuhan University | China | 4 | 65 | 4 | 28 | 7 | 115 |
Heidelberg Institute for Theoretical Studies (gGmbH) | Germany | 4 | 58 | 4 | 65 | 4 | 65 |
Microsoft Research | USA | 3 | 25 | 2 | 32 | 4 | 44 |
Mathematics Department, FIZ Karlsruhe, Berlin | Germany | 2 | 36 | 2 | 41 | 2 | 41 |
Computer Science and Engineering, Pennsylvania State University, Univ. Park | USA | 4 | 42 | 2 | 27 | 4 | 61 |
3.2 Detailing, integrating model and validation by evidence
3.2.1 Co-citation analysis
Co-citation analysis (Fig. 8) | Coupling analysis (Fig. 10) |
---|---|
Shin [38]—Cluster 1 | Zhang [39]—Cluster 1 |
Ferreira [40]—Cluster 1 | Kim [41]—Cluster 1 |
Kim [42]—Cluster 2 | Xu [33]—Cluster 2 |
Kim [43]—Cluster 2 | Colavizza [44]—Cluster 3 |
Levin [45]—Cluster 3 | |
Ferreira [11]—Cluster 3 | |
Cota [46]—Cluster 3 | |
Smalheiser [35]—Cluster 3 | |
Torvik [36]—Cluster 4 | |
Torvik [34]—Cluster 4 |
3.2.2 Bibliographic coupling analysis
3.2.3 Overview of AND publications
References | Author grouping: similarity function | Author grouping: clustering method | Author assignment: classification method | Explored evidence | Dataset | Citation database |
---|---|---|---|---|---|---|
Kim and Owen-Smith [10] | Transfer Learning | Agglomerative | | Citation Information and Researchers’ personal files from Web | DBLP, AMiner, KISTI, MEDLINE, American Physical Society | Scopus, WoS |
Xu et al. [33] | Authority with a learned probabilistic metric; Semantic Scholar with error-driven and rank-based learning | Agglomerative | | Medline Metadata | PubMed | Scopus, WoS |
D’Angelo and van Eck [70] | Rule-Based Scoring | Agglomerative | | Author’s information (name, academic rank, research fields, and institutional affiliation) | Data source from the Italian Ministry of Education, Universities and Research | Scopus, WoS |
Chuanming et al. [75] | Unsupervised Learning | Agglomerative | | Citation Information | CiteSeerX, AMiner, DBLP | Scopus |
Jhawa et al. [60] | | | Ensemble-based Classification with Random Forest and Gradient Boosted Tree | Medline Metadata | PubMed | Scopus |
Li et al. [67] | Heuristic | Partitioning | | XML files with author and publication attributes | Scopus, WoS | Scopus |
Ma et al. [76] | Graph Auto-Encoder and Graph Embedding with Word2Vec | Agglomerative | | Citation Information and Researchers’ personal files from Web | AMiner | Scopus |
Jinqi et al. [73] | Maximum flow in network graph | Agglomerative | | Citation Information and Researchers’ personal files from Web | AMiner, Microsoft Academic | Scopus |
Ma et al. [77] | Meta-path based algorithm with node embeddings in a homogeneous network | Agglomerative | | Citation Information and Researchers’ personal files from Web | AMiner | Scopus |
Wang et al. [78] | | | Supervised classification technique with Random walk based model | DBLP data dump with author’s information | DBLP | Scopus, WoS |
Wang et al. [79] | Adversarial representation learning model with heterogeneous information network | Agglomerative | | Author’s name and citation information | AMiner | Scopus, WoS |
Zhang and Ban [64] | Rule-based disambiguation in a graph model | Agglomerative | | Author’s name and citation information | AMiner | Scopus |
Zhang et al. [71] | Convolutional Neural Network to compare clusters of publications | Agglomerative | | Citation Information and Researchers’ personal files from Web | AMiner | Scopus |
Pooja et al. [80] | Graph with Edge Pruning-Based Approach | Agglomerative | | Citation Information and Researchers’ personal files from Web | AMiner, Scopus, WoS | Scopus |
Rodrigues et al. [68] | Multi-strategic approach with comparison of strings and author’s networks | Agglomerative | | Citation Information and Researchers’ personal files from Web | DBLP | Scopus |
Zhou et al. [74] | Graph similarity with Inverse Document Frequency | Partitioning | | Author’s name and citation information | AMiner | Scopus, WoS |
Firdaus et al. [81] | | | Naïve Bayes, Random Forest, Support Vector Machine, and Deep Neural Network | DBLP data dump with author’s information | DBLP | Scopus |
Xiong et al. [82] | Unsupervised Learning with Variational AutoEncoder | Agglomerative | | Author’s name and citation information | AMiner, DBLP, CiteSeerX | Scopus |
Kim and Owen-Smith [65] | Authority similarities | Agglomerative | | Medline Metadata | PubMed | Scopus, WoS |
Mozafari [72] | Genetic Algorithm to learn from the available samples | Agglomerative | | Author’s information (name, academic rank, research fields, and institutional affiliation) | Iranian Ministry of Science, Ministry of Health | Scopus |
Mihaljević and Santamaría [63] | Supervised Learning with Decision Tree, Random Forest, and Histogram-based Gradient Boosting | Agglomerative | | Author’s name and documents | NASA/ADS | Scopus |
Correia et al. [83] | | | | Web page with a form for a crowdsourcing campaign | | Scopus, WoS |
Zhang et al. [84] | Graph Attention Networks | Spectral Clustering | | Author’s name and citation information | AMiner | Scopus, WoS |
Pooja et al. [85] | Multi-dimensional representation learning with meta-content and author similarity graphs | Agglomerative | | Author’s name and citation information | AMiner, DBLP, CiteSeer, zbMATH | Scopus |
Zhang et al. [86] | | | Supervised Learning with Random Forest | Citation Information and Researchers’ personal files from Web | PubMed, Microsoft Academic, Semantic Scholar | Scopus, WoS |
Kim et al. [62] | | | Gradient Boosting, Logistic Regression, Naïve Bayes, and Random Forest | Author’s name and citation information | KISTI, AMiner, GESIS, UM-IRIS | Scopus, WoS |
Waqas and Qadir [69] | Multilayer heuristics-based clustering with Research2vec and cosine similarity | Agglomerative | | Author’s name and citation information | AMiner, BDBComp | Scopus, WoS |
Firdaus et al. [87] | | | Cost-Sensitive Deep Neural Network | Author’s name and citation information | DBLP | Scopus |
Rehs [61] | Random Forest and Logistic Regression | Partitioning | | Author’s name and documents | WoS | Scopus, WoS |
Färber and Lamprecht [88] | Rule-based with Jaro-Winkler similarity | Agglomerative | | XML files with author and publication attributes | OpenAire, WikiData | Scopus, WoS |
Pooja et al. [9] | Attention-Based Graph Convolution with a multihop neighborhood | Agglomerative | | Author’s name and citation information | AMiner | Scopus |
Backes and Dietze [89] | Progressive block merging | Agglomerative | | Author’s name and documents | WoS | Scopus, WoS |
Manzoor et al. [90] | Convolutional Neural Network for classification | Agglomerative | | Medline Metadata | PubMed | Scopus, WoS |
Boukhers and Asundi [66] | Neural network that learns author and co-author representations | Agglomerative | | Author’s name and citation information | DBLP | Scopus, WoS |
Färber and Ao [91] | Unsupervised approach with rule-based classifier | Agglomerative | | Author’s name and documents | MAKG | Scopus, WoS |
Qiping et al. [92] | Network representation learning | Agglomerative | | Author’s name and citation information | AMiner, DBLP, CiteSeerX | Scopus, WoS |
Santini et al. [93] | Multimodal Knowledge Graph Embeddings | Agglomerative | | Author’s name and citation information | AMiner, ORCID | Scopus, WoS |
Waqas and Qadir [94] | Manual cross-check and cosine similarity to detect ambiguities | Agglomerative | | Citation Information and Researchers’ personal files from Web | Google Scholar, DBLP | Scopus, WoS |