Authorship attribution and profiling in Spanish and English language
51 p.
Master's thesis: Université de Fribourg, 2014
English
Authorship attribution is the practice of inferring the author of a given text from an analysis of his or her writing style. It has been widely used in disputes over literary works, and it has other applications such as forensics and plagiarism detection. The purpose of this project is to develop and evaluate a solution that can identify the authors of the texts in a given corpus. We analyse two corpora: Spanish literature of the 19th century, and blogs written in English and Spanish. We aim to identify the author from a list of candidates, or to infer the author's gender or age range. We propose to use the Kullback-Leibler divergence (KLD), an information-based measure of disparity between models. To validate the proposal, we use as a baseline the naive Bayes classifier, whose performance is generally accepted for this kind of problem. The results show a significant improvement of the proposed method over the baseline when enough text is available for training, and they are very promising for detecting gender and age in the English-language blogs. Performance with little training data could improve under some input conditions identified and described in this report, which could serve as a starting point for future work.
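As an illustration of the KLD approach named in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes whitespace tokenisation, a small fixed vocabulary, additive smoothing, and the divergence direction D(query || author); the abstract specifies none of these, and the function names and toy texts are hypothetical.

```python
import math
from collections import Counter


def term_distribution(text, vocabulary, alpha=1.0):
    """Relative frequencies of the vocabulary terms in `text`.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability strictly positive so the divergence stays finite.
    """
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in bits over a shared vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def attribute(query_text, profiles, vocabulary):
    """Return the candidate whose profile diverges least from the query text."""
    query = term_distribution(query_text, vocabulary)
    return min(
        profiles,
        key=lambda author: kld(
            query, term_distribution(profiles[author], vocabulary)
        ),
    )


# Hypothetical toy data: one training text per candidate author.
vocabulary = {"the", "of", "and", "but", "upon", "while"}
profiles = {
    "author_a": "the the of upon upon upon and",
    "author_b": "while while but but the of and",
}
print(attribute("upon the upon of", profiles, vocabulary))
```

The same scheme extends to profiling by replacing the per-author profiles with one profile per gender or age range. A naive Bayes baseline would instead pick the class that maximises the likelihood of the query terms.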
Faculty
- Faculté des sciences et de médecine
Department
- Département d'Informatique
Classification
- Applied sciences
License
- License undefined
Identifiers
- RERO DOC: 323082
- RERO: R007902228
Persistent URL
- https://folia.unifr.ch/unifr/documents/306862