Every node in the network is a preprint, and a link is drawn between preprints based on the semantic similarity of their abstracts.
Before the network is built, the abstracts are preprocessed with spaCy. English stopwords and punctuation are removed, along with frequently occurring but uninformative words such as "covid". The abstracts are then lemmatized, and lemmas that are not nouns, proper nouns, adjectives, or verbs are discarded.
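The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the original pipeline: the function names, the `TRIVIAL_WORDS` set (the text only names "covid" as an example), the `en_core_web_sm` model, and the choice to keep each lemma once per abstract are all assumptions.

```python
from typing import Iterable, List, Set

# Assumption: the text only gives "covid" as an example of a trivial word.
TRIVIAL_WORDS = {"covid"}
# Universal POS tags for nouns, proper nouns, adjectives, and verbs.
KEPT_POS = {"NOUN", "PROPN", "ADJ", "VERB"}


def extract_lemmas(doc: Iterable) -> Set[str]:
    """Return the distinct lemmas retained for one parsed abstract.

    `doc` is a spaCy Doc, or any iterable of token-like objects exposing
    `text`, `lemma_`, `pos_`, `is_stop`, and `is_punct`.
    """
    lemmas: Set[str] = set()
    for token in doc:
        # Drop English stopwords and punctuation.
        if token.is_stop or token.is_punct:
            continue
        lemma = token.lemma_.lower()
        # Drop trivial words, matched on the surface form or the lemma.
        if token.text.lower() in TRIVIAL_WORDS or lemma in TRIVIAL_WORDS:
            continue
        # Keep only nouns, proper nouns, adjectives, and verbs.
        if token.pos_ not in KEPT_POS:
            continue
        lemmas.add(lemma)
    return lemmas


def lemmas_for_abstracts(abstracts: List[str],
                         model_name: str = "en_core_web_sm") -> List[Set[str]]:
    """Parse raw abstracts with spaCy and return one lemma set per abstract."""
    import spacy  # imported lazily; the model name is an assumption

    nlp = spacy.load(model_name)
    return [extract_lemmas(doc) for doc in nlp.pipe(abstracts)]
```

Calling `lemmas_for_abstracts(["Masks reduce the transmission of covid in hospitals."])` would return a list with one lemma set; the exact contents depend on the spaCy model used.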
Links are drawn between nodes according to the following rule:
Let 𝑀𝑖,𝑗 be the number of lemmas shared by the abstracts of nodes 𝑖 and 𝑗.
A link is created between 𝑖 and 𝑗 if 𝑀𝑖,𝑗 > 𝑡𝑙.
The threshold 𝑡𝑙 was set to 16 after exploratory testing.
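The link rule can be illustrated with a short sketch. It assumes each abstract has already been reduced to a set of distinct lemmas, so that the shared-lemma count is the size of a set intersection; whether the original analysis counts distinct or repeated lemmas is not stated. The names `build_links` and `T_L` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

T_L = 16  # threshold value given in the text


def build_links(lemmas: Dict[str, Set[str]],
                threshold: int = T_L) -> List[Tuple[str, str, int]]:
    """Return (i, j, shared_count) for every pair sharing more than `threshold` lemmas."""
    links = []
    # Visit each unordered pair of nodes exactly once, in a stable order.
    for i, j in combinations(sorted(lemmas), 2):
        shared = len(lemmas[i] & lemmas[j])
        # Strict inequality: exactly `threshold` shared lemmas is not enough.
        if shared > threshold:
            links.append((i, j, shared))
    return links
```

The pairwise loop is quadratic in the number of preprints, which is usually acceptable for a few thousand nodes; larger collections would need an inverted index or similar shortcut.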
The network is laid out with a force-directed algorithm, which places linked nodes closer to each other, using the force-graph library.
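Since force-graph is a JavaScript library, the network has to be handed over as data. The sketch below shows one way to serialize nodes and links into the `nodes`/`links` object that force-graph accepts as graph data, with `id`, `source`, and `target` as its default field names. The function name and the extra `value` field carrying the shared-lemma count are assumptions, not part of the described method.

```python
import json
from typing import Dict, Iterable, List, Tuple


def to_force_graph(node_ids: Iterable[str],
                   links: Iterable[Tuple[str, str, int]]) -> Dict[str, List[dict]]:
    """Build a graph-data dictionary with `nodes` and `links` lists."""
    return {
        "nodes": [{"id": node_id} for node_id in node_ids],
        # `value` is an optional, illustrative field holding the shared-lemma count.
        "links": [{"source": s, "target": t, "value": m} for s, t, m in links],
    }


def dump_force_graph(node_ids: Iterable[str],
                     links: Iterable[Tuple[str, str, int]]) -> str:
    """Serialize the graph data to a JSON string for the browser side."""
    return json.dumps(to_force_graph(node_ids, links))
```

The resulting JSON can be loaded in the browser and passed to the library's graph-data setter; the layout itself is computed there, not in Python.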