SLS 2025 (24.07.2025)

Our abstract has been accepted for presentation at the 20th Annual Meeting of the Slavic Linguistics Society in Verona, Italy (10-12 September 2025), in a special session on Slavic prosody co-organized by Zofia Malisz. Presentation title: "Speaking with expectation: Predictability and prosody in Polish read and spontaneous speech" (with Dr. Zofia Malisz and Jan Foremski, MA).

PLM 2025 (30.06.2025)

Our abstract has been accepted for presentation at PLM 2025 in Poznań (21-24 September 2025). Presentation title: "How predictability shapes the way we speak: Lessons from Polish conversations and reading aloud" (with Dr. Zofia Malisz and Jan Foremski, MA).

Interspeech 2025 (30.06.2025)

We will present our research at Interspeech 2025 in Rotterdam, Netherlands (17-21 August 2025). Presentation title: "Contextual predictability effects on acoustic distinctiveness in read Polish speech" (with Dr. Zofia Malisz and Jan Foremski, MA).

@inproceedings{malisz2025contextual,
  title={Contextual predictability effects on acoustic distinctiveness in read Polish speech},
  author={Zofia Malisz and Jan Foremski and Małgorzata Kul},
  booktitle={Proceedings of Interspeech 2025},
  year={2025},
  address={Rotterdam, Netherlands}
}

MA Defense (30.06.2025)

Jan Foremski has defended his MA thesis, titled "Contextual predictability and speech variation: A study of read and spontaneous Polish using discourse-aware language models".

Audio samples (02.01.2025)

We have made selected audio samples available for listening and download.

Female: Conversation

Female: Reading

Female: Answering

Male: Conversation

Male: Reading

Male: Answering

ICL 2024 (10.09.2024)

We presented our preliminary findings at ICL 2024 in Poznań.

Dr. Zofia Malisz presenting preliminary results

LREC 2024 (08.05.2024)

Our team will attend LREC 2024 to present our paper, "PRODIS — a speech database and a phoneme-based language model for the study of predictability effects in Polish".

ICL 2024 (08.05.2024)

Two abstracts based on our project work have been accepted at the International Conference of Linguistics (ICL), to be held in Poznań in September 2024.

The project investigates why speakers lengthen or shorten speech sounds, and why they pronounce them more or less carefully. Factors that influence this mechanism include word frequency and surprisal, that is, how predictable a word is given the words in its immediate context: the less predictable the word, the higher its surprisal.
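Surprisal is commonly defined as the negative log probability of a word given its context, S(w) = -log2 P(w | context), measured in bits. The sketch below illustrates the idea with a toy bigram model estimated from raw counts. This is an assumption made for illustration only: the project estimates surprisal with neural language models, and the function names `train_bigram` and `surprisal` and the mini-corpus are hypothetical. In the toy corpus, "wiem" follows "nie" more often than "mam" does, so it receives lower surprisal, mirroring why "nie wiem" ("I don't know") is easy to anticipate.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left contexts in a list of tokens (toy model)."""
    contexts = Counter(tokens[:-1])             # how often each token starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each word pair occurs
    return contexts, bigrams


def surprisal(prev, word, contexts, bigrams):
    """Surprisal in bits: -log2 P(word | prev), with P estimated by relative
    frequency count(prev, word) / count(prev, *). Unseen pairs get infinity;
    a real model would smooth or use a neural network instead."""
    pair = bigrams[(prev, word)]
    if pair == 0:
        return math.inf
    return -math.log2(pair / contexts[prev])


# Hypothetical toy corpus; "." marks sentence boundaries.
tokens = "nie wiem . nie wiem . nie mam czasu .".split()
contexts, bigrams = train_bigram(tokens)
frequency = Counter(tokens)

print(frequency["nie"] / len(tokens))               # relative frequency of "nie"
print(surprisal("nie", "wiem", contexts, bigrams))  # more predictable continuation
print(surprisal("nie", "mam", contexts, bigrams))   # less predictable continuation
```

Frequency and surprisal are computed separately here because they are distinct predictors: a word can be frequent overall yet unexpected in a particular context.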

Celem projektu jest zbadanie, dlaczego mówcy wydłużają lub skracają dźwięki mowy, a także wymawiają je mniej lub bardziej starannie. Na przykład, niektórymi czynnikami wpływającymi na ten mechanizm jest częstość występowania elementów mowy oraz efekt zaskoczenia, czyli przewidywalność słowa w stosunku do słów w jego bezpośredniej relacji do kontekstu.

When we speak, we lengthen and highlight some elements of speech and shorten others. As a rule, we lengthen important or new words and shorten words that are obvious from context or occur very frequently. For example, English expressions such as "I don't know" and "because" are often reduced to "dunno" and "'cuz", respectively, and Polish expressions such as "w ogóle" ('at all') and "na przykład" ('for example') are often shortened to "wgle" and "nprzykłd". This happens because we say these expressions very often, but also because we can successfully guess in advance, from the meaning of the preceding sentence, that someone is about to say them.

Why we lengthen and why we reduce is therefore influenced by factors such as the frequency of words and phrases in a language and by surprisal: whether or not we expect a certain word to appear in a given context. Another problem we need to tackle is that lengthening and highlighting also occur under the influence of grammatical accent. Grammatical accent, for example, requires us to lengthen and highlight the syllable "tu" in the English word "constitution" and the syllable "pe" in the Polish word "encyklopedia". It is therefore important to answer the following question: does the fact that Polish has rules of grammatical accent help or hinder the shortening of words that are very frequent or have low surprisal?
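In Polish, the default rule of grammatical accent places stress on the penultimate (second-to-last) syllable, which is why "pe" is stressed in "encyklopedia". The sketch below implements only that default rule, under loudly simplifying assumptions: syllables are counted by vowel letters, "i" directly before another vowel is treated as a softness marker rather than a syllable of its own (so "dia" counts as one syllable), and lexical exceptions such as "matematyka", stressed on the antepenultimate syllable in careful speech, are ignored. The function names are hypothetical; this is not a tool used in the project.

```python
from typing import Optional

POLISH_VOWELS = set("aąeęioóuy")


def syllable_nuclei(word: str) -> list[int]:
    """Return the character indices of syllable nuclei (vowel letters).
    Assumption: "i" before another vowel only marks palatalization."""
    w = word.lower()
    nuclei = []
    for i, ch in enumerate(w):
        if ch not in POLISH_VOWELS:
            continue
        if ch == "i" and i + 1 < len(w) and w[i + 1] in POLISH_VOWELS:
            continue  # e.g. "dia" in "encyklopedia" is a single syllable
        nuclei.append(i)
    return nuclei


def stressed_nucleus(word: str) -> Optional[int]:
    """Index of the vowel letter carrying default (penultimate) stress."""
    nuclei = syllable_nuclei(word)
    if not nuclei:
        return None
    if len(nuclei) == 1:
        return nuclei[0]  # a monosyllable carries stress on its only syllable
    return nuclei[-2]


def mark_stress(word: str) -> str:
    """Upper-case the stressed vowel so the rule's output is easy to read."""
    idx = stressed_nucleus(word)
    if idx is None:
        return word
    return word[:idx] + word[idx].upper() + word[idx + 1:]


for w in ["encyklopedia", "przykład", "ogóle", "kot"]:
    print(w, "->", mark_stress(w))
```

Because the rule is predictable from word length alone, a stressed syllable's position is known in advance, which is one reason it may interact with frequency- and surprisal-driven shortening.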

To carry out our research, we need to create a new speech database. The recordings must be of very high sound quality, because our goals require measuring prominence in speech accurately, that is, acoustically. We will record Polish speakers in a recording studio that allows speech to be captured in complete silence. In addition, we will build artificial neural networks that "learn" a language model of Polish from large amounts of text. This model will allow us to measure accurately which words have high or low surprisal in a given context. Once we have collected the surprisal, frequency and grammatical accent measurements, we will be able to examine how they affect the duration and pronunciation of speech.
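Once surprisal, frequency and accent are measured for each word, their effect on duration can be estimated with a regression model. The sketch below is an illustration under stated assumptions, not the project's analysis: it fits an ordinary least-squares model (duration ~ surprisal + log frequency + stressed) to simulated word durations generated from made-up coefficients, so it merely recovers those coefficients and reports no finding. The helpers `solve` and `ols` and all numbers are hypothetical. The project's actual statistical models are not described on this page; studies of this kind often use mixed-effects models with random effects for speakers and words.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y.
    An intercept column of ones is prepended to every row of X."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated words: (surprisal in bits, log frequency, stressed 0/1). Hypothetical.
X = [(0.6, 3.2, 1), (1.6, 2.1, 0), (4.0, 1.0, 1),
     (2.5, 2.8, 0), (5.1, 0.5, 0), (3.3, 1.7, 1)]
TRUE = (80.0, 6.0, -4.0, 25.0)  # made-up intercept (ms) and slopes, not results
y = [TRUE[0] + TRUE[1] * s + TRUE[2] * f + TRUE[3] * st for s, f, st in X]

coef = ols(X, y)
for name, b in zip(["intercept", "surprisal", "log_freq", "stressed"], coef):
    print(f"{name:>9}: {b:7.2f}")
```

With real recordings the durations would be noisy, so the fitted slopes would be estimates with uncertainty rather than exact values.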

Aby zrealizować nasze badania, musimy stworzyć nową bazę danych mowy. Baza ta musi być bardzo dobrej jakości dźwiękowej. Potrzebujemy doskonałej jakości, ponieważ nasze cele wymagają dokładnego (czyli akustycznego) pomiaru uwypuklenia w mowie. Będziemy nagrywać mowę użytkowników języka polskiego w studiu nagraniowym, które pozwala na uchwycenie mowy w całkowitej ciszy. Ponadto, zbudujemy sztuczne sieci neuronowe, które "nauczą się" modelu języka polskiego przy użyciu dużej ilości tekstu. Model ten pozwoli nam uzyskać dokładny pomiar tego, które słowa mają wysoki lub niski efekt zaskoczenia w danym kontekście. Po zebraniu pomiarów efektu zaskoczenia, częstotliwości i gramatycznego akcentowania, będziemy mogli zbadać, jak wpływają one na czas trwania wypowiedzi oraz wymowę.

Our research on Polish is important because it focuses specifically on the role of grammatical accent. Studying Polish will therefore expand our knowledge of how surprisal and frequency affect human speech. In addition, at the end of the project, we will examine whether listeners can actually hear the surprisal effects that speakers produce. This research question has not yet been answered.

Nasze badania nad językiem polskim są ważne, ponieważ skupiają się na gramatycznym akcentowaniu. Badania języka polskiego poszerzą naszą wiedzę o tym, jak efekt zaskoczenia i częstotliwość wpływają na ludzką mowę. Ponadto, pod koniec projektu zbadamy, czy słuchacz jest w stanie rzeczywiście usłyszeć efekt zaskoczenia, podczas gdy wytwarzany jest on przez mówców. Na to pytanie badawcze nie ma jeszcze odpowiedzi.

PRODIS

Probabilistic, prosodic and discourse effects on acoustic distinctiveness in speech

Probabilistyczne, prozodyczne i dyskursowe efekty na odrębność akustyczną w mowie