We Review ‘Semantic Scholar’: An AI-Powered Literature Searching Tool

Semantic Scholar is a free literature search tool developed by the Allen Institute for AI (nicknamed AI2), a non-profit research institute. It has had a meteoric rise to prominence recently. Back in 2015, it could only be used to search through about 3 million computer science papers; not exactly useful to a wide range of scholars. But today, Semantic Scholar can search through over 180 million papers across all academic disciplines, and it’s starting to become a mainstream research tool.

The team behind Semantic Scholar claims to be all about smart information retrieval and reducing information overload. Their mission has been to make the (sometimes onerous) job of searching through research papers faster and easier. As if to prove their dedication to this mission, they preview their own research papers on their website not with abstracts, but with ‘TL;DR’ summaries*… and they’ve just started publishing auto-generated single-sentence TL;DR summaries for papers in their database too.

But can you really take a TL;DR approach to literature searching? After all, surely the whole point of a literature review is to help researchers develop a nuanced and rich understanding of the subject matter. I’ve had countless supervisors and professors over the years who have preached the merits of deep reading, and bemoaned the abbreviations and oversimplifications of internet-era research. ‘TL;DR research?’ I can almost hear them crying. ‘What is this, BuzzFeed?’

Hypothetically offended past professors: hear me out. I’m quietly excited about this tool, and others like it. Let’s face it: the volume of available research has exploded in the age of the internet, and methods of literature searching have changed accordingly.

Infographic showing how literature searching has changed. Where once researchers would rely on the printed books and journals in their local libraries, today's researchers have access to millions of outputs across countless databases. Where once researchers knew the limits of their field and its standard texts, increasingly today's researchers work across disciplinary lines and work with a huge diversity of source materials. Where once researchers browsed through card catalogues, we now use keyword searches.

But despite these updates to our literature searching, the way we narrow down our reading list hasn’t really changed. Fifty years ago, you could assess a paper’s relevance by reading a 200-word abstract. And now… yeah, same thing. So we have exponentially more sources to wade through, and no obvious quicker way to identify what’s relevant to our research. That’s a problem.

Tools like Semantic Scholar don’t (and shouldn’t) replace deep reading. But they can (and do) limit the amount of time spent on fruitless searching.

Here’s how it works. Semantic Scholar looks and feels like a scholarly search engine – just type in your search query, and the results pop up. There are filters to enable you to limit your search to particular date ranges, publication types, and so on. At first glance it’s not dissimilar to, say, Google Scholar.

But the difference is that there’s artificial intelligence at play here.** Semantic Scholar uses natural language processing and machine learning models to power its search engine, giving you (in theory) more relevant search results.

I tested Semantic Scholar against Google Scholar by using the same search phrase in both – a phrase relevant to my literary PhD research, but which also applies in multiple other disciplines. Since Semantic Scholar has focused mainly on the sciences, I was skeptical that my search phrase would bring up anything at all. On the contrary: it brought up more results, and more relevant results, than Google Scholar.

Infographic showing that a search for 'hyperreal in posthumanism' with no quote marks or Boolean operators brought up different results in Google Scholar vs. Semantic Scholar. In Google, it brought up 2660 results; the top result was highly cited but out of date; there were 3 options to filter/sort; and clicking the paper brought up the publisher's website. In Semantic Scholar, there were 5350 results; the top result was a recent thesis on the topic; there were 7 options to filter/sort; and clicking a paper brought up a Semantic Scholar page with the abstract & related papers.

Not only were my Semantic Scholar search results more relevant; the post-search tools are fabulous too. This search phrase occurs across disciplines, but I was able to quickly filter down to the philosophy papers I was interested in. Plus, the option to filter for listings that include a PDF gave me a quick way to find full-text results.
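For the curious, here is a minimal sketch of what that kind of post-search filtering amounts to. It is purely illustrative: the `Paper` record, the `filter_results` function, and the sample entries are all made up for this example, and none of it reflects how Semantic Scholar actually stores or filters its listings.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    field: str
    year: int
    has_pdf: bool


def filter_results(papers, field=None, pdf_only=False):
    """Keep papers in the chosen field and, optionally, only those with a PDF."""
    kept = []
    for paper in papers:
        if field is not None and paper.field != field:
            continue
        if pdf_only and not paper.has_pdf:
            continue
        kept.append(paper)
    return kept


# Made-up search results, not real papers.
results = [
    Paper("Example paper A", "Philosophy", 2019, True),
    Paper("Example paper B", "Computer Science", 2020, True),
    Paper("Example paper C", "Philosophy", 2015, False),
]

philosophy_with_pdf = filter_results(results, field="Philosophy", pdf_only=True)
print([paper.title for paper in philosophy_with_pdf])  # ['Example paper A']
```

Stacking two filters like this is what turned a cross-disciplinary pile of results into a short list of readable philosophy papers.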

But by far, my favourite feature of Semantic Scholar is how it connects papers together. When you click on a search result, you go to a page with different tabs. The ‘references’ tab lifts and digitises the bibliography of the paper; and the ‘citations’ tab lists other papers that have cited this one. In both cases, you can filter and sort the results to find what you need. Semantic Scholar also distinguishes between citations that are incidental, and those that are ‘highly influential.’ This makes it much easier to trace the relationships between sources and quickly see what you should read next.
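If it helps to picture the two tabs, the sketch below treats ‘references’ and ‘citations’ as two views of the same data: the citations list is just the references list turned inside out. Everything here is an assumption for illustration, including the paper names, the `build_citations` and `influential_citers` helpers, and the simple true/false ‘influential’ flag; how Semantic Scholar actually decides which citations are highly influential is not something this sketch attempts to reproduce.

```python
from collections import defaultdict

# Hypothetical data: each paper maps to the works in its bibliography,
# with a flag marking whether that citation counts as "highly influential".
references = {
    "Thesis 2020": [("Classic 1981", True), ("Survey 2010", False)],
    "Article 2018": [("Classic 1981", False)],
    "Survey 2010": [("Classic 1981", True)],
}


def build_citations(references):
    """Invert the references map: for each cited work, list the papers citing it."""
    citations = defaultdict(list)
    for citing, refs in references.items():
        for cited, influential in refs:
            citations[cited].append((citing, influential))
    return dict(citations)


def influential_citers(citations, paper):
    """Return, in alphabetical order, the papers that cite `paper` influentially."""
    return sorted(
        citing for citing, influential in citations.get(paper, []) if influential
    )


citations = build_citations(references)
print(influential_citers(citations, "Classic 1981"))  # ['Survey 2010', 'Thesis 2020']
```

Following references takes you backwards to a paper's sources; following citations takes you forwards to the work that built on it.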

So, should Semantic Scholar become part of your regular literature searching strategy? It only takes a minute to have a go – I say give it a shot, and see how it works for you.


*TL;DR is internet speak for ‘too long; didn’t read’ – often used to denote a quick summary of a long block of text.

**Those interested in the technology behind the AI might like to read AI2’s research website here, and staffer Sergey Feldman’s write-up of their development process here.

About Anaise Irvine

Dr Anaise Irvine is the Editor of Thesislink. She has a research background in science and narrative. Her PhD research analysed how contemporary films and novels represent genetic engineering as a social justice issue. She has previously researched fictional representations of evolution and quantum mechanics. She has taught such diverse texts as Blade Runner and Bridget Jones’s Diary, and her most obscure skill is being able to turn novels into phylogenetic trees!
