====== "What's up, Switzerland?" ======

===== The project =====

The data underlying the corpus was collected in 2014 to serve as the database of the research project "What's up, Switzerland?", led by [[https://www.rose.uzh.ch/de/seminar/wersindwir/mitarbeitende/stark.html|Prof. Elisabeth Stark]] (University of Zurich). The project was funded by the [[http://www.snf.ch/en/Pages/default.aspx|Swiss National Science Foundation]] (Sinergia: CRSII1_160714) with CHF 1'832'647 and ran from 2016 to 2020.

[[3_project|More about the project ...]]

===== Using the corpus =====

[[https://corpora.linguistik.uzh.ch/annis/|This corpus is freely available]] for academic, non-commercial research. When using the corpus, please make sure to cite it correctly.

===== The corpus =====

Our authentic WhatsApp chats were gathered in summer 2014. Not all of them made it into the corpus (e.g. duplicates and chats or messages without permission were excluded). In its present form, the corpus comprises:

  * Number of chats: 617
  * Number of messages (with permission to be used): 763'644
  * Number of informants (who gave their permission): 944
  * Number of tokens: 5'155'476 (without redactedQ.* (cf. [[01_corpus:02_preprocessing:02_without_permission|Messages without permission]]))
  * Number of emojis: 382'116

The corpus consists of chats in all four national languages of Switzerland, i.e. German (both Swiss German dialect and non-dialectal German), French, Italian and Romansh (in several varieties).
In more detail, the following languages and varieties can be found in the corpus:

Available languages:

  * fra: French
  * ita: Italian
  * roh: any variety of Romansh
  * gsw: dialectal German as used in Switzerland
  * deu: non-dialectal German
  * eng: English
  * spa: Spanish
  * sla: any Slavic language

Romansh varieties:

  * roh-ja: Jauer Romansh
  * roh-sr: Romontsch Sursilvan
  * roh-st: Rumàntsch Sutsilvan
  * roh-sm: Rumantsch Surmiran
  * roh-pt: Rumauntsch Puter
  * roh-vl: Rumantsch Vallader
  * roh-gr: Rumantsch Grischun

More information about the corpus can be found in the section [[01_corpus:start|corpus]] and in the following publication: [[https://bop.unibe.ch/linguistik-online/article/view/3849|Ueberwasser, Simone/Stark, Elisabeth (2017). "What’s up, Switzerland? A corpus-based research project in a multilingual country". Linguistik online 84/5, 105-126]]. DOI: [[https://doi.org/10.13092/lo.84.3849|https://doi.org/10.13092/lo.84.3849]].

===== Quoting =====

When using the corpus, please cite it as follows:

==== The corpus ====

Stark, Elisabeth; Ueberwasser, Simone; Göhring, Anne (2014-2020). //Corpus "What’s up, Switzerland?"//. University of Zurich. www.whatsup-switzerland.ch.

==== This documentation ====

Stark, Elisabeth; Ueberwasser, Simone (2020): The corpus "What's up, Switzerland?". Documentation, facts and figures. www.whatsup-switzerland.ch.

==== Creation of the corpus ====

Ueberwasser, Simone; Stark, Elisabeth (2017): "What’s up, Switzerland? A corpus-based research project in a multilingual country". In: Linguistik online, 84/5, 105-126. https://bop.unibe.ch/linguistik-online/article/view/3849/5834

==== The project ====

Stark, Elisabeth (2016-2020). //SNSF project "What’s up, Switzerland?"// (Sinergia: CRSII1_160714). University of Zurich. www.whatsup-switzerland.ch.

===== Raw data =====

If you want to use our raw data for computational linguistic projects, please contact [[estark@rom.uzh.ch|Prof. Elisabeth Stark]] to check whether your project complies with our requirements. Any data we make available is released under a CC BY-NC-ND license.