Measuring formality through word frequencies
Principia Cybernetica Web

The degree of formality of a text can be measured by adding the frequencies of context-independent words, subtracting the frequencies of context-dependent (deictic) words, and normalizing the result.
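Here a word frequency is a relative frequency: the percentage of all words in the text that belong to a given category. The following is a rough Python sketch of that counting step. It assumes the text has already been part-of-speech tagged. The tag labels (`NOUN`, `ART`, `VERB`, ...) and the function name `category_percentages` are hypothetical, and real taggers use their own tag sets, which would first need mapping onto the traditional categories.

```python
from collections import Counter


def category_percentages(tagged_tokens):
    """Return {tag: percentage of all tokens carrying that tag}.

    tagged_tokens: iterable of (word, tag) pairs whose tags are assumed to be
    already mapped onto traditional grammatical categories (hypothetical labels).
    """
    tags = [tag for _word, tag in tagged_tokens]
    if not tags:
        raise ValueError("cannot compute frequencies for an empty text")
    counts = Counter(tags)
    total = len(tags)
    # Relative frequency in percent, so all categories together sum to 100.
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Hypothetical hand-tagged sentence of 8 words.
sentence = [("the", "ART"), ("committee", "NOUN"), ("approved", "VERB"),
            ("the", "ART"), ("new", "ADJ"), ("budget", "NOUN"),
            ("on", "PREP"), ("Monday", "NOUN")]
freqs = category_percentages(sentence)
print(freqs["NOUN"])  # 37.5
```

Three of the eight words are tagged as nouns, which gives a noun frequency of 37.5%.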


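When words are grouped into the traditional grammatical categories (nouns, verbs, prepositions, etc.), this yields the following formula for formality (F), with each frequency expressed as a percentage of all words in the text:

F = (noun frequency + adjective freq. + preposition freq. + article freq. - pronoun freq. - verb freq. - adverb freq. - interjection freq. + 100)/2

Adding 100 and dividing by 2 keeps F between 0 (a text consisting only of deictic words) and 100 (a text consisting only of context-independent words).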
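The formula translates directly into code. Below is a minimal Python sketch. It assumes the frequencies are supplied as a dict of percentages. The function name `formality` and the lowercase category keys are hypothetical.

```python
# Categories added to and subtracted from the score, following the formula.
CONTEXT_INDEPENDENT = ("noun", "adjective", "preposition", "article")
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def formality(freq):
    """Formality F from category frequencies given as percentages of all words.

    Categories missing from `freq` count as 0; categories not listed above
    (e.g. conjunctions) do not affect F.
    """
    plus = sum(freq.get(c, 0.0) for c in CONTEXT_INDEPENDENT)
    minus = sum(freq.get(c, 0.0) for c in DEICTIC)
    return (plus - minus + 100.0) / 2.0


# Extreme cases: only context-independent words vs. only deictic words.
print(formality({"noun": 100.0}))     # 100.0
print(formality({"pronoun": 100.0}))  # 0.0
```

Because the code looks up each category with `freq.get(c, 0.0)`, a category that never occurs, or that a data source does not report, simply contributes 0 to the score.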

Such a formula provides an easily applicable measure for ordering language from different sources, genres or styles according to formality. The calculated formality generally agrees well with intuitive expectations: official documents and scientific texts are more formal than personal letters, speeches are more formal than conversations, and so on. Data for Dutch, for example, give the following ordering (frequencies in percent of all words; conjunctions are listed but do not enter the formula):

                          context-independent categories         deictic categories
Text type               Nouns Articles    Prep.  Adject.    Pron.    Verbs     Adv.    Conj.        F
Oral female             10.40     6.89     5.86     8.09    16.95    19.35    17.45     7.47     38.7
Oral non-academic       12.75     8.50     6.34     6.71    16.01    18.80    19.31     6.34     40.1
Oral male               11.48     8.16     6.69     7.63    15.84    18.45    16.53     7.05     41.6
Oral academic           13.16     9.58     7.91     7.13    13.96    17.75    17.88     7.13     44.1
Novels                  18.52    10.48    10.26    10.00    13.25    20.62    10.47     6.06     52.5
Family magazines        21.78     9.77    12.21    11.14    10.09    18.71     9.74     6.39     58.2
Magazines               24.20    11.61    13.90    10.93     8.55    17.68     8.73     4.34     62.8
Scientific              23.10    15.00    13.75    10.75     6.71    16.58     7.98     5.98     65.7
Newspapers              25.97    14.68    14.54    10.57     5.62    16.69     7.21     4.70     68.1
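The following small Python check recomputes F for each row of the Dutch table. It assumes the interjection frequency, which the table does not list, is zero. The names `ROWS` and `f_score` are hypothetical, and the row values are copied from the table.

```python
# Rows from the Dutch table: text type, then the percentages of nouns,
# articles, prepositions, adjectives, pronouns, verbs and adverbs, then the
# reported F. Conjunctions are left out because the formula does not use them.
ROWS = [
    ("Oral female",       10.40,  6.89,  5.86,  8.09, 16.95, 19.35, 17.45, 38.7),
    ("Oral non-academic", 12.75,  8.50,  6.34,  6.71, 16.01, 18.80, 19.31, 40.1),
    ("Oral male",         11.48,  8.16,  6.69,  7.63, 15.84, 18.45, 16.53, 41.6),
    ("Oral academic",     13.16,  9.58,  7.91,  7.13, 13.96, 17.75, 17.88, 44.1),
    ("Novels",            18.52, 10.48, 10.26, 10.00, 13.25, 20.62, 10.47, 52.5),
    ("Family magazines",  21.78,  9.77, 12.21, 11.14, 10.09, 18.71,  9.74, 58.2),
    ("Magazines",         24.20, 11.61, 13.90, 10.93,  8.55, 17.68,  8.73, 62.8),
    ("Scientific",        23.10, 15.00, 13.75, 10.75,  6.71, 16.58,  7.98, 65.7),
    ("Newspapers",        25.97, 14.68, 14.54, 10.57,  5.62, 16.69,  7.21, 68.1),
]


def f_score(nouns, articles, preps, adjectives, pronouns, verbs, adverbs,
            interjections=0.0):
    """F from percentage frequencies; interjections default to 0 because the
    table does not list them (an assumption)."""
    return (nouns + articles + preps + adjectives
            - pronouns - verbs - adverbs - interjections + 100.0) / 2.0


# Sort by computed F and print it next to the reported value.
for name, *freqs, reported in sorted(ROWS, key=lambda row: f_score(*row[1:8])):
    computed = f_score(*freqs)
    print(f"{name:<18} computed {computed:6.3f}  reported {reported}")
```

With interjections treated as zero, every computed value rounds to the reported F at one decimal place. Sorting by the computed score also reproduces the table's order, from the oral female sample up to newspapers.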


Copyright © 1995 Principia Cybernetica

Author
F. Heylighen & J.-M. Dewaele

Date
Jul 13, 1995
