Variation on many levels: why and how comparing corpora and (Slavic) languages makes sense

Ruprecht von Waldenfels

In my talk, I present a number of projects that approach comparative Slavic corpus linguistics from different perspectives. First, I show how a parallel corpus can be used to identify interesting synchronic contrasts between the Slavic languages that otherwise easily escape attention. Then, I trace some of these contrasts in a diachronic, comparable corpus of West Slavic languages in order to gain historical insight into how they developed and how they can be explained. Next, I use a regionally tagged corpus of Ukrainian to investigate the geographical distribution of contrasting forms and to put them into perspective with respect to general patterns of variation in standard Ukrainian today. Finally, I outline how a nascent network of dialect speech corpora could be used to help explain this variation. The talk thus attempts a tour de force through several projects that employ different approaches to analyze data from diverse sources and in distinct languages. I aim to show the benefit of such an eclectic approach, which is becoming increasingly feasible and, I would argue, necessary as more and more corpora of diverse types become available and relatively easy to use for non-computational linguists.