Journal article

Deviation of Zipf’s and Heaps’ laws in human languages with limited dictionary sizes

  • Lü, Linyuan Institute of Information Economy, Hangzhou Normal University, Hangzhou, China
  • Zhang, Zi-Ke Department of Physics, University of Fribourg, Switzerland - Beijing Computational Science Research Center, China
  • Zhou, Tao Institute of Information Economy, Hangzhou Normal University, Hangzhou, China - Web Sciences Center, University of Electronic Science and Technology of China, Chengdu, China
    30.01.2013
Published in:
  • Scientific Reports. - 2013, vol. 3, no. 1, p. 1082
Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages consist of characters and have very limited dictionary sizes. Extensive experiments show that: (i) the character frequency distribution follows a power law with an exponent close to one, at which the corresponding Zipf exponent diverges; indeed, the character frequency decays exponentially in the Zipf plot. (ii) The number of distinct characters grows with the text length in three stages: it grows linearly at the beginning, then turns to a logarithmic form, and eventually saturates. A theoretical model of the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of a limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf's and Heaps' laws in human language systems.
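The two measurements described in the abstract (the rank-frequency of characters and the growth of distinct characters with text length) and the two model ingredients (rich-get-richer reuse and a finite dictionary) can be illustrated with a short Python sketch. The helpers heaps_curve, rank_frequency and simulate_writing, and the parameters dictionary_size and alpha, are hypothetical names chosen for this illustration. In particular, simulate_writing is an assumed, simplified Simon-style process and not the writing model proposed in the paper. It only shows why a finite dictionary forces saturation, and it is not claimed to reproduce the intermediate logarithmic stage.

```python
import random
from collections import Counter


def heaps_curve(tokens):
    """Number of distinct tokens seen after each position, i.e. N(t)."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def rank_frequency(tokens):
    """Token frequencies sorted in decreasing order (the y-values of a Zipf plot)."""
    return sorted(Counter(tokens).values(), reverse=True)


def simulate_writing(length, dictionary_size, alpha, seed=0):
    """Hypothetical rich-get-richer writer with a finite dictionary.

    ASSUMPTION: a simplified Simon-style process, not the authors' model.
    With probability alpha the writer draws a character uniformly from the
    whole dictionary {0, ..., dictionary_size - 1} (it may already be used).
    Otherwise it copies a uniformly chosen earlier position of the text, so an
    existing character is reused with probability proportional to its frequency.
    """
    rng = random.Random(seed)
    text = []
    for _ in range(length):
        if not text or rng.random() < alpha:
            text.append(rng.randrange(dictionary_size))  # fresh draw from the dictionary
        else:
            text.append(rng.choice(text))  # preferential reuse (rich get richer)
    return text


if __name__ == "__main__":
    text = simulate_writing(length=200_000, dictionary_size=3_000, alpha=0.1)
    curve = heaps_curve(text)
    for t in (10, 100, 1_000, 10_000, 100_000, 200_000):
        print(f"t={t:>7}  distinct={curve[t - 1]}")
    print("top-5 frequencies:", rank_frequency(text)[:5])
```

In this toy process only the fresh draws can introduce a new character, and they do so with probability (V - N)/V, where V is the dictionary size and N the number of distinct characters used so far. In the mean-field approximation this gives dN/dt = alpha (1 - N/V), so N(t) ≈ V (1 - e^{-alpha t / V}). The curve therefore grows linearly with slope alpha at first and saturates at V.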
Faculty
Faculté des sciences et de médecine
Department
Département de Physique
Language
  • English
Classification
Language, linguistics
License
License undefined
Identifiers
Persistent URL
https://folia.unifr.ch/unifr/documents/303114