METHODS
All submitted abstracts from 2002 to 2023 were scraped from the DPG SKM website.
In the abstract similarity network, every node is an abstract, and a link is drawn between two abstracts if they are semantically similar. To quantify similarity, the abstracts were pre-processed (stop-word removal, lemmatization) and transformed into tf-idf vectors. The pairwise cosine similarity between all vectors was then computed, which yields the adjacency matrix of a weighted, undirected graph. To sparsify this graph, we removed all edges with a weight below a threshold, chosen such that no node becomes isolated. We then identified clusters of densely connected nodes using modularity optimization. These clusters roughly correspond to the topics in the corpus, and we hand-labeled them based on the words with the highest mean tf-idf score within each cluster.
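The construction of the abstract network up to the sparsification step can be sketched as follows. This is a minimal, standard-library-only illustration and not the code used for the study. The function names, the toy token lists, and the smoothed idf variant are assumptions. The threshold rule is also an assumption: the text only requires that no node is isolated, and the sketch picks the largest such threshold, which equals the smallest of the per-node maximum similarities.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists (already stop-word filtered and lemmatized)
    into sparse tf-idf vectors stored as {term: weight} dicts."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf; the exact weighting variant is an assumption.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors):
    """Symmetric matrix of pairwise cosine similarities (zero diagonal)."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


def sparsify(sim):
    """Keep only edges at or above the largest threshold that leaves
    every node with at least one edge. Needs at least two documents."""
    n = len(sim)
    # Each node's strongest edge must survive, so the threshold is the
    # weakest of these strongest edges.
    threshold = min(
        max(sim[i][j] for j in range(n) if j != i) for i in range(n)
    )
    # Zero-weight pairs are never edges, even if the threshold is zero.
    edges = {
        (i, j): sim[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] >= threshold and sim[i][j] > 0
    }
    return threshold, edges


# Hypothetical toy corpus: two magnetism and two polymer abstracts.
docs = [
    ["spin", "lattice", "magnet"],
    ["spin", "magnet", "order"],
    ["polymer", "chain", "melt"],
    ["polymer", "melt", "flow"],
]
threshold, edges = sparsify(similarity_matrix(tfidf_vectors(docs)))
```

The resulting weighted edge list could then be passed to any modularity-based community detection routine, such as the Louvain method. The text does not state which modularity optimization algorithm was used.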
In the co-authorship network, every node is an author, and a link is drawn between two authors for every abstract they submitted together.
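The co-authorship rule can be illustrated with a short sketch. It assumes that each abstract is available as a list of author names, and that the repeated links between the same pair of authors are summarized as an integer edge weight. That representation is an assumption, since the text could equally describe a multigraph. The function name and the author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(abstract_authors):
    """Count, for every pair of authors, the abstracts they share.

    abstract_authors: iterable of author-name lists, one per abstract.
    Returns a Counter mapping (author_a, author_b) to a joint-abstract
    count, with each pair stored in sorted order.
    """
    weights = Counter()
    for authors in abstract_authors:
        # set() guards against a name listed twice on one abstract;
        # sorted() makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy data: three abstracts with placeholder author labels.
toy_abstracts = [
    ["A", "B", "C"],
    ["B", "A"],
    ["D"],
]
coauthors = coauthor_edges(toy_abstracts)
```

In this toy data, authors A and B share two abstracts and are therefore linked twice, while the single-author abstract by D contributes no link.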