Inspired by all the COVID-19 pre-print research being made publicly available, I wanted to apply some data science skills and see if any trends were emerging. There were a few surprises.
As terrible as the currently unfolding Coronavirus epidemic has been, it’s been fascinating to observe how quickly academic and research communities across the world have scrambled to start understanding this virus and its potential impact.
Because of this, there are lots of interesting pre-print academic papers coming out fast. I would encourage you to read pre-prints with caution as the claims made are unverified, but I wanted to see if there were any discernible patterns in the topics and conclusions these papers are discussing.
So I manually scraped the results and insights sections of these pre-print papers, based on the list from the Elsevier Novel Coronavirus Information Center, and used the popular Gensim library to do some topic modelling with Latent Dirichlet Allocation (LDA). I’ll be sure to include the GitHub link at the bottom of this post if you want to try this yourself.
After training multiple LDA models on a sample of 75 pre-prints, 32 topics appeared optimal (maybe 20 would have been OK), with a coherence value of 0.54. I then selected the most prominent and exciting topic keyword clusters, inferred the central insight of each and located its most representative article.
This post is by no means a scientific review, but rather a little experiment I wanted to share. Perhaps we can use tools like this to identify patterns from multiple sources faster and foster collaboration.
9 Prominent Coronavirus Topics and their Most Representative Papers
1. CT Scans Appear Promising for Screening COVID-19
Keywords: pneumonia, confirm, evidence, fatality, beijing, focus, large, prediction, effort, propagation
Although the virus has demonstrated that it is highly contagious and causes infection in both lungs spontaneously, clinical evidence shows that Wuhan-viral pneumonia has a low fatality rate. CT Imaging plays a pivotal role in the screening, diagnosis, isolation plan, treatment, management or prognosis of patients with Wuhan-viral pneumonia.
Most Representative Paper: [Clinical and Imaging Evidence of Wuhan-Viral Pneumonia: A Large-Scale Prospective Cohort Study](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3537088&utm_source=EC&utm_medium=Connect)
Percentage of Documents With Topic: 8%
2. Using AI to Screen COVID-19 Patients
Keywords: clinical, diagnosis, significant, highly, characteristic, symptom, aim, index, eosinophil, admission
Using AI technology to screen patients on attributes like WBC, eosinophil count, eosinophil rate, 2019 novel coronavirus RNA (2019-nCoV) and Amyloid-A, this team developed a faster method of diagnosing COVID-19 with an improved confirmed diagnosis rate for clinical use.
Most Representative Paper: Artificial Intelligence Application in COVID-19 Diagnosis and Prediction
Percentage of Documents With Topic: 7%
3. Countries are not ready for 2019-nCoV
Keywords: prevent, health, capacity, effectively, emergency, manage, strengthen, support, readiness, exist
Countries vary widely in terms of their capacity to prevent, detect and control outbreaks, which is underpinned by global variances in the strength of health systems to manage health emergencies. We need to strengthen global readiness to contain existing outbreaks including the ongoing international spread of 2019-nCoV.
Most Representative Paper: Review of Health Security Capacities in Light of 2019-nCoV Outbreak — Opportunities for Strengthening IHR (2005) Implementation
Percentage of Documents With Topic: 7%
4. Medical Staff Insomnia, Psychological Issues and COVID-19
Keywords: psychological, factor, find, disease, social, isolation, staff, identify, depression, stay
A study found that more than one-third of medical staff suffered from insomnia symptoms during the COVID-19 outbreak. Related factors included education level, isolation environment, social-psychological worries about the COVID-19 outbreak, and working as a doctor. Insomnia interventions for medical staff are needed and should take these different social-psychological factors into account.
Most Representative Paper: Survey of Insomnia and Related Social Psychological Factors Among Medical Staffs Involved with the 2019 Novel Coronavirus Disease Outbreak
Percentage of Documents With Topic: 7%
5. Should we use strong prevention measures to control the epidemic?
Keywords: measure, prevention, city, strong, strict, individual, expect, maintain, expose, stop
Strong prevention measures are being encouraged until the Coronavirus epidemic is over. Other places in China and overseas that have confirmed infected individuals should follow China’s example and make strong interventions immediately. Earlier strong prevention measures could efficiently stop independent, self-sustaining outbreaks in other cities globally.
Most Representative Paper: Simulating the Infected Population and Spread Trend of 2019-nCov Under Different Policy by EIR Model
Percentage of Documents With Topic: 7%
6. New Rapid Genetic Diagnostic Test Identified for COVID-19
Keywords: test, pcr, sample, rt, lamp, diagnostic, reverse, nucleic, swab, screen
Quantitative reverse transcription PCR (qRT-PCR) is currently the standard for COVID-19 detection; however, Reverse Transcription Loop-Mediated Isothermal Amplification (RT-LAMP) may allow for faster and cheaper field-based testing at point-of-risk. The objective of this study was to develop a rapid screening diagnostic test that could be completed in under 30 minutes.
Most Representative Paper: Rapid Detection of Novel Coronavirus (COVID19) by Reverse Transcription-Loop-Mediated Isothermal Amplification
Percentage of Documents With Topic: 5%
7. Distinguishing COVID-19 and other infections quickly
Keywords: low, patient, acid, fever, lung, opacity, ground, process, feature, image
Little is known about the clinical features that distinguish COVID-19 patients from nucleic acid negative patients in fever clinics. The highest nucleic acid detection rate for 2019-nCoV infection was observed in patients with muscle ache, followed by dyspnea. The combination of fever, lower counts of eosinophils and the imaging features of ground-glass opacity in bilateral lungs might be a valuable indicator for 2019-nCoV infection.
Most Representative Paper: Analysis of 2019-nCoV Infection and Clinical Manifestations of Outpatients: An Epidemiological Study from the Fever Clinic in Wuhan, China
Percentage of Documents With Topic: 5%
8. Who is most at risk for Severe Infection of COVID-19?
Keywords: patient, symptom, system, government, people, age, person, significantly, improve, renal
Investigations confirmed there was no significant age limit on who could be infected, but older adults were still a vulnerable group. Patients with diabetes were more likely to develop severe disease, and their probability of admission to the ICU was significantly increased.
Most Representative Paper: Epidemiological and Clinical Features of 197 Patients Infected with 2019 Novel Coronavirus in Chongqing, China: A Single Center Descriptive Study
Percentage of Documents With Topic: 4%
9. Treatments for Severe COVID-19
Keywords: severe, respiratory, treatment, acute, method, syndrome, similar, level, origin, occur
COVID-19 infection causes severe respiratory disease, similar to severe acute respiratory syndrome coronavirus, and is associated with ICU entry and high mortality. The authors studied the origin, epidemiology, treatment methods and other aspects of the disease. Following the treatment plan formulated by Wuhan Union Hospital and certified by the Ministry of Health of China, they hope to develop an effective treatment method to reduce the mortality of the disease.
Most Representative Paper: Clinical Characteristics and Treatment of Patients Infected with COVID-19 in Shishou, China
Percentage of Documents With Topic: 4%
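The "Most Representative Paper" and "Percentage of Documents With Topic" figures listed for each topic above can be derived from the per-document topic distributions an LDA model produces. The sketch below shows one common way to do it, assuming each paper is assigned to its single highest-weighted (dominant) topic; the post does not spell out the exact method, and the paper names and probabilities here are made up. The input is shaped like the `(topic_id, probability)` pairs returned by Gensim's `LdaModel.get_document_topics`.

```python
from collections import Counter

# Hypothetical per-paper topic distributions: lists of (topic_id, probability).
doc_topics = {
    "Paper A": [(0, 0.81), (1, 0.19)],
    "Paper B": [(0, 0.35), (1, 0.65)],
    "Paper C": [(0, 0.62), (2, 0.38)],
    "Paper D": [(1, 0.10), (2, 0.90)],
}


def dominant_topic(distribution):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(distribution, key=lambda pair: pair[1])


# Assign every paper to its dominant topic.
dominant = {title: dominant_topic(dist) for title, dist in doc_topics.items()}

# Most representative paper per topic: the paper with the highest weight
# among those assigned to that topic.
representative = {}
for title, (topic_id, weight) in dominant.items():
    if topic_id not in representative or weight > representative[topic_id][1]:
        representative[topic_id] = (title, weight)

# Percentage of papers whose dominant topic is each topic.
counts = Counter(topic_id for topic_id, _ in dominant.values())
share = {topic_id: 100 * n / len(doc_topics) for topic_id, n in counts.items()}

for topic_id in sorted(share):
    title, weight = representative[topic_id]
    print(f"Topic {topic_id}: {share[topic_id]:.0f}% of papers, "
          f"most representative: {title} ({weight:.2f})")
```

With only one dominant topic per paper the percentages sum to 100 across all topics, which is why nine hand-picked topics out of 32 cover only part of the sample.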
Conclusion
I enjoyed this weekend project — it was my first attempt to apply some NLP skills to a problem. I know people out there who will see issues with my approach (please do reach out to me), but this idea is intriguing. I guess it’s like another form of summarization and the more I play with it, the more I think about the possibilities for medical and academic research.
Perhaps some kind of wine social where we invite academic authors based on clustered topic keywords and do some sort of person matching based on the topic crossover %. I better stop there before I give out any more fantastic ideas for free. Feedback welcome!
Full Disclaimer: I work as a product manager for Elsevier. This blog post and analysis are my own creation and in no way represent the thoughts and opinions of Elsevier.
GitHub Repo
GitHub repo with Python scripts, input data and outputs: https://github.com/Raudaschl/coronvaviruspreprintresearch_nlp
Want More?
If you enjoyed the article, be sure to also check out a comic I made about Coronavirus around the start of February 2020: The Coronavirus Outbreak So Far and Why It’s So Concerning [Comic]
Bibliography
- I collected all the pre-prints listed above from the Elsevier Novel Coronavirus Information Center on March 1st 2020 and downloaded them from SSRN.
- Coronavirus: latest information and advice. GOV.UK. https://www.gov.uk/guidance/wuhan-novel-coronavirus-information-for-the-public Published 2020. Accessed February 2, 2020.