The most-cited papers of the twenty-first century

Citation databases index different sets of documents and therefore report different citation counts. Nature took the median of each paper's rankings across the five databases that were selected, which together cover tens of millions of papers published in the twenty-first century.

Much of the early academic work in machine learning was released as free, open-source software, which has helped it accrue citations. The sixth-most-cited paper, titled ‘Random forests’, presents an improved machine-learning method developed by the late US statistician Leo Breiman, working with Adele Cutler, a statistician at Utah State University. Cutler says the paper is popular because the method is free and easy to use, and works extremely well off the shelf, with little or no customization required.
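
A toy illustration of Breiman's recipe (bootstrap sampling, random feature selection and majority voting), using one-level ‘stump’ trees instead of full decision trees to keep the sketch short. The data and all function names here are invented for illustration; this is not the authors' implementation:

```python
import random
from collections import Counter

def train_stump(X, y, n_candidates=1):
    """Fit a one-level tree on a random feature subset (stand-in for a full tree)."""
    best = None  # (error, feature, threshold, left_label, right_label)
    for f in random.sample(range(len(X[0])), n_candidates):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    if best is None:  # degenerate bootstrap: fall back to the majority label
        m = Counter(y).most_common(1)[0][0]
        return (0, -1.0, m, m)
    return best[1:]

def random_forest(X, y, n_trees=25):
    """Breiman's recipe in miniature: bootstrap the rows, randomize the features."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict(forest, row):
    """Majority vote across the ensemble."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return Counter(votes).most_common(1)[0][0]

random.seed(0)
X = [[a, b] for a in range(10) for b in range(10)]
y = [int(a + b > 9) for a, b in X]
forest = random_forest(X, y)
print(predict(forest, [9, 9]), predict(forest, [1, 1]))  # 1 0
```

The “off the shelf” appeal Cutler describes comes from the same structure: averaging many noisy, randomized trees needs almost no tuning to work reasonably well.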

Hot on the heels of that work came Google's landmark paper ‘Attention is all you need’, published in the summer of 2017. It underlies the advances in large language models that power tools such as ChatGPT, by efficiently implementing a mechanism called self-attention, which lets networks prioritize relevant information when learning patterns. The paper is the seventh-most-cited of the century.
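
A minimal sketch of the self-attention idea, assuming identity query/key/value projections for brevity (a real transformer learns separate weight matrices for each, and runs many attention heads in parallel):

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X, d_k):
    """Scaled dot-product self-attention over a sequence of vectors X.
    Here Q = K = V = X (identity projections), purely for illustration."""
    out = []
    for q in X:                       # one output per sequence position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in X]
        weights = softmax(scores)     # how strongly this position attends to each other one
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d_k)])
    return out

# Three token vectors; the first two are similar, the third is orthogonal.
X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
Y = self_attention(X, d_k=2)
print(Y[0])  # token 0's output is pulled towards its similar neighbour, token 1
```

Each output is a weighted mixture of the whole sequence, with the weights computed from similarity: that is what lets the network prioritize relevant context.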

Three years after the publication describing AlexNet, an influential paper reported modifications to the network's architecture, resulting in U-nets, which require less training data to process images6. It now sits 12th on the list. Co-author Olaf Ronneberger, a researcher who was recruited by Google DeepMind in London as a result of the publication, says the conference that eventually accepted the paper almost rejected it for not being novel enough. The architecture, he says, is still the main workhorse in most image-processing models.

These are among the findings of an analysis by Nature's news team of the 25 most-cited papers published in the twenty-first century. The articles garnering the most citations report developments in artificial intelligence (AI); approaches to improve the quality of research or systematic reviews; cancer statistics; and research software. However, a pioneering 2004 paper on experiments with graphene1 — work that won its authors the Nobel Prize in Physics in 2010 — is also among the twenty-first century's most-cited.

AI papers come with natural advantages in the citation stakes, notes Geoffrey Hinton, a computer scientist at the University of Toronto in Canada who won a share of the Nobel Prize in Physics last year for his work on AI. The twenty-first century has seen rapid progress and a large number of papers in this area, and AI is relevant to a huge number of fields.

The first quarter of the twenty-first century has produced some huge scientific breakthroughs, ranging from the first mRNA vaccines and CRISPR-based gene-editing techniques to the discovery of the Higgs boson and the first measurements of gravitational waves. But you won’t find any of these advances described in the top-cited papers published since 2000.

Citations — the means by which authors acknowledge previous sources in the literature — are one measure of a paper's influence. But the most celebrated scientific discoveries are not the most highly cited papers. Instead, the top of the rankings is dominated by the methods and software that serve as the workhorses of everyday science. Sociologists of science observe that researchers say they value discoveries most, yet in practice it is methods papers that accumulate the citations.


The concept behind ResNets, in which each layer adds a learned correction to its own input so that much deeper networks can be trained, was a factor that led to tools that could play board games and model language. Kaiming He, one of the authors, who now works at the Massachusetts Institute of Technology in Cambridge, says that before ResNets, deep learning was not that deep.
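
The core trick can be illustrated numerically. A residual block computes y = x + F(x), so even when the learned transformation F shrinks its input, the identity path preserves the signal through many layers. A toy sketch, with a fixed linear map standing in for a trained layer (not the authors' network):

```python
def layer(x, scale=0.1):
    """A toy transformation standing in for a trained layer (conv + nonlinearity)."""
    return [scale * v for v in x]

def plain_net(x, depth):
    """Stack layers directly: the signal is multiplied by 0.1 at every step."""
    for _ in range(depth):
        x = layer(x)
    return x

def residual_net(x, depth):
    """Stack residual blocks: y = x + F(x), so the identity path carries the signal."""
    for _ in range(depth):
        x = [xi + fi for xi, fi in zip(x, layer(x))]
    return x

x = [1.0, 2.0]
print(plain_net(x, 50))     # shrinks towards zero (0.1 ** 50)
print(residual_net(x, 50))  # the skip connections keep the signal alive
```

The same reasoning applies to gradients flowing backwards during training, which is why very deep stacks became trainable.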

Databases differ in how they handle preprints and the final published articles; some try to merge the two and combine their citations. Google Scholar, for example, tries to group all versions of a work and aggregate citations to it, says co-founder Anurag Acharya, who works for Google in Mountain View, California.

About 25 years ago, pharmaceutical scientist Thomas Schmittgen submitted a paper that included data from a technique called quantitative PCR, which allows researchers to quantify the amount of DNA in a sample. To analyse his data, Schmittgen used equations found in a technical manual. “One of the reviewers came back and said, ‘You can’t cite a user manual in a paper,’” he says. So Schmittgen contacted Kenneth Livak, who had devised the equations, and together they published a paper that researchers could cite.

“The citations just grew and grew and grew,” says physicist Kieron Burke, who co-authored the article with physicists John Perdew and Matthias Ernzerhof. In fact, one-quarter of the paper’s total citations were garnered in the past two years, according to the Dimensions research database, and it is also the fourth most-cited paper of all time (see ‘These are the most-cited research papers of all time’).


The formulae provided by Schmittgen's paper are popular because they give biologists an easy way to calculate changes in gene activity, for example in response to a drug. A software program called DESeq2, described in a paper10 that makes the list at number 18, is also used to calculate changes in gene activity, but uses RNA-sequencing data to do so.
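
The central formula of the Livak–Schmittgen method is the 2^−ΔΔCt calculation: normalize the target gene's threshold cycle (Ct) to a reference gene, compare the treated sample with a control, and convert the difference in cycles into a fold change. A small sketch (the Ct values are invented for illustration):

```python
def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative gene expression via the 2^-ddCt (Livak & Schmittgen) method.
    Each Ct is the qPCR cycle at which the signal crosses a threshold;
    a lower Ct means more starting template, and each cycle is a doubling."""
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to the reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                # compare with the control sample
    return 2 ** (-dd_ct)

# The target gene crosses the threshold 2 cycles earlier after treatment
# (reference gene unchanged), i.e. a four-fold increase in expression:
print(fold_change(20.0, 15.0, 22.0, 15.0))  # 4.0
```

The double normalization is what makes the formula so convenient: it cancels out differences in sample loading and in overall reaction efficiency.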

George Sheldrick, a British chemist who died in February this year, wrote the paper at number five on the list. He created the SHELX suite of computer programs to analyse the scattering patterns of X-rays after they are shot through crystals of molecules, with the aim of revealing the molecules’ atomic structure. When he began the work in the 1970s, “my job was to teach chemistry, and I wrote the programs as a hobby in my spare time”, he told Nature a decade ago. A review paper he wrote in 2008, intended to be cited whenever any of the SHELX programs was used, has since been cited tens of thousands of times.

Three top-cited papers are familiar fare in the introduction sections of cancer-research papers. Two of them, at numbers nine and ten, come from a World Health Organization project that tracks global cancer statistics every two years. Freddie Bray is the lead author of both papers, which draw on the project's GLOBOCAN database to give researchers and advocates cancer incidence and mortality rates.

The third cancer paper (number 19) is a review14 that attempts to distil the complexity of cancer into a handful of characteristics commonly found in tumours. These ‘hallmarks of cancer’ have helped to shape the field, says co-author Douglas Hanahan, a researcher at the Ludwig Institute for Cancer Research in Lausanne, Switzerland.

Source: Exclusive: the most-cited papers of the twenty-first century


At number four on the list is what is sometimes called ‘psychiatry’s bible’: the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), which was published in 2013, nearly 20 years after the previous iteration. The book15 describes the criteria for categorizing and diagnosing mental disorders, including addiction and depression, and is used extensively by researchers and health professionals worldwide. It is the only book in Nature's list, included because most of the selected databases record it.

Psychologists Virginia Braun and Victoria Clarke were used to getting only a handful of citations on their papers about gender and sexuality. So they watched with astonishment as their 2006 paper16 became the third-most-cited article published this century and took on a life of its own.

After the paper’s publication, researchers started referring to thematic analysis — as outlined by Braun and Clarke — as the method they used, which sent its citation count off the charts. The paper has “been completely life-changing”, says Clarke. She and Braun have since pivoted much of their work to thematic analysis and have received invitations to meetings across the world. “It was entirely accidental,” adds Braun, who works at the University of Auckland in New Zealand.

Papers on biological laboratory techniques dominate the list of the most highly cited papers of all time, according to data provided to Nature by the US firm Clarivate, which owns the Web of Science (WoS). The top-100 list also includes papers on artificial intelligence (AI), research software and statistical methods.

One way of answering that question is to find out which articles appear most frequently in the reference lists of today's research papers. Nature asked bibliometricians to trace those patterns of references in scientific publications. They churned through tens of millions of references cited in the papers published in 2023, the most recent complete year available in research repositories at the time.

A 1951 paper describing a widely used protein assay heads all of the all-time lists. But a paper2 by Microsoft researchers, presented at an AI conference and first released in 2015, is already ranked fifth when median rankings are analysed across three databases (or seventh in the WoS alone; see ‘Top ten cited papers’).

Rising annual volumes of research papers — meaning more references each year — and the greater visibility of that research online and on social media might explain how some modern papers have shot up the charts, says Paul Wouters, a retired scientometrics researcher at Leiden University in the Netherlands.

Researchers advance by standing on the shoulders of giants, to paraphrase Isaac Newton. Which research giants are still being cited frequently today?

“Sadly, the field of AI and machine learning is rife with plagiarism,” says computer scientist Jürgen Schmidhuber, “and some of the most famous papers actually failed to cite the original work, published when compute was millions of times more expensive than today.”

Another of the top-cited methods papers was written nearly three decades ago. In 1996, three researchers at Tulane University in New Orleans, Louisiana, published a clever, fast approximation that software can use to help researchers calculate the interactions of electrons in materials, as a way of understanding the materials' properties.
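
That approximation is known as the PBE generalized gradient approximation. To give a flavour of it, the sketch below evaluates the PBE exchange enhancement factor with its published constants; real density-functional codes apply this inside an integral over the electron density, which is far beyond this snippet:

```python
def pbe_enhancement(s, kappa=0.804, mu=0.2195149727645171):
    """PBE exchange enhancement factor F_x(s) for reduced density gradient s.
    F_x(0) = 1 recovers the uniform-electron-gas limit; as s grows, F_x
    saturates at 1 + kappa, enforcing an exact bound on the exchange energy."""
    return 1.0 + kappa - kappa / (1.0 + mu * s * s / kappa)

print(pbe_enhancement(0.0))    # 1.0 for a homogeneous electron density
print(pbe_enhancement(100.0))  # approaches the 1 + kappa = 1.804 ceiling
```

The speed comes from this being a simple closed-form function of the local density and its gradient, rather than anything that must be solved iteratively.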
