Forskerskolen i språkvitenskap og filologi


Introduction to statistics for linguists

Introduction to basic statistical methods and their use in the study of language, 24–28 October.


Dates: 24–28 October, 2011 (10.00–15.00)

Venue: Studentsenteret, University of Bergen. Monday and Wednesday in Seminar room C; Tuesday, Thursday and Friday in Seminar room B. The session after lunch on Friday will be in Faklab 1130 at Høyteknologisenteret.

Description [UPDATED]: This course aims to give a thorough grounding in basic statistical methods and their use in the study of language, with an emphasis on explaining how to decide which particular techniques and tests are appropriate for particular kinds of data. The course will be illustrated throughout by examples of statistical analyses of language, conducted using SPSS.

We begin with a discussion of why many kinds of research in linguistics require statistical techniques, and then move on to descriptive statistics: summarising data in the form of frequency tables and graphical representations; measures of central tendency (mean, median, mode); measures of variability (standard deviation and variance, interquartile range).
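As an informal illustration of the descriptive measures listed above, here is a minimal sketch in Python. The course itself uses SPSS; Python is used here only because it is freely available, and the sentence-length data are invented for the example.

```python
import statistics

# Hypothetical data: sentence lengths (in words) from a small text sample.
sentence_lengths = [12, 15, 9, 22, 15, 18, 7, 15, 30, 11]

# Measures of central tendency.
mean = statistics.mean(sentence_lengths)
median = statistics.median(sentence_lengths)
mode = statistics.mode(sentence_lengths)

# Measures of variability (sample versions, dividing by n - 1).
variance = statistics.variance(sentence_lengths)
std_dev = statistics.stdev(sentence_lengths)

# Quartiles and interquartile range. Note that statistical packages differ
# in how they interpolate quartiles, so other tools may give slightly
# different values for the same data.
q1, q2, q3 = statistics.quantiles(sentence_lengths, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, sd={std_dev:.2f}, IQR={iqr}")
```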

We then discuss the basic principles of hypothesis testing, covering types of research design, the setting up of null and alternative hypotheses, and the concepts of test statistic, critical value, significance level and effect size. After this we proceed to examine ‘parametric’ tests for the significance of the difference between two means (t-tests for independent and paired samples) and for the significance of the differences across several means (Analysis of Variance, or ANOVA). We then discuss some useful ‘non-parametric’ tests for the differences between two groups (Mann-Whitney, Wilcoxon and sign tests) or across more than two groups (Kruskal-Wallis test, Friedman’s ANOVA), which do not impose as many restrictions as the parametric tests.
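To make the ideas of test statistic, critical value and effect size concrete, the following sketch computes an independent-samples t-test by hand in Python. It is an illustration only, not the SPSS procedure used in the course. The vowel-duration figures are invented, and the sketch assumes equal variances in the two groups (the pooled-variance form of the test).

```python
import math
import statistics

# Hypothetical data: vowel durations (ms) for two independent groups of speakers.
group_a = [102, 98, 110, 105, 95, 108]
group_b = [120, 115, 125, 118, 122, 130]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance: assumes the two populations have equal variances.
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Test statistic: difference between means divided by its standard error.
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_statistic = (mean_a - mean_b) / std_error

# Two-tailed critical value of t for df = 10 at the 0.05 significance level,
# taken from standard t tables.
critical_value = 2.228
reject_null = abs(t_statistic) > critical_value

# Effect size (Cohen's d): difference between means in pooled SD units.
cohens_d = (mean_a - mean_b) / math.sqrt(pooled_var)

print(f"t({df}) = {t_statistic:.2f}, reject H0: {reject_null}, d = {cohens_d:.2f}")
```

The null hypothesis of equal means is rejected when the absolute value of the test statistic exceeds the critical value for the chosen significance level.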

The course then goes on to explore the chi-square test, which is most often used to investigate association, or the lack of it, between two nominal variables (i.e. variables which can only be classified into categories).
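A minimal sketch of the chi-square test of association for a 2 × 2 table is given below, again in Python rather than SPSS and purely as an illustration. The counts are invented (two hypothetical speaker groups cross-classified by which of two variants they use), and no continuity correction is applied.

```python
# Hypothetical 2 x 2 contingency table of observed frequencies:
# rows = speaker group (A, B), columns = variant used (1, 2).
observed = [[30, 10],
            [15, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells, where the expected
# frequency under independence is (row total * column total) / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Critical value of chi-square for df = 1 at the 0.05 level (standard tables).
critical_value = 3.841
association_found = chi_square > critical_value

print(f"chi-square({df}) = {chi_square:.2f}, significant: {association_found}")
```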

We then look at describing relationships between variables using correlation and regression techniques. We also examine very briefly a set of ‘multivariate’ methods: Multivariate Analysis of Variance or MANOVA, used when more than one dependent variable is measured; techniques for providing visual representations of relationships among variables (hierarchical cluster analysis, multidimensional scaling); methods for reducing dimensions of variability (factor analysis, principal component analysis).
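For correlation and simple linear regression, the underlying arithmetic can be sketched as follows. This is an illustrative Python example with invented data (hypothetical years of study and test scores), not output from SPSS; it computes Pearson's correlation coefficient and the least-squares regression line from sums of squares and cross-products.

```python
import math

# Hypothetical data: years of language study (x) and test score (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products of deviations from the means.
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
ss_yy = sum((yi - mean_y) ** 2 for yi in y)

# Pearson's correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Least-squares regression line: y = intercept + slope * x.
slope = ss_xy / ss_xx
intercept = mean_y - slope * mean_x

print(f"r = {r:.3f}, y = {intercept:.2f} + {slope:.2f}x")
```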

If time permits, we will also look briefly at statistical techniques which are commonly used in the analysis of corpus materials.

Finally, participants will carry out a set of practical exercises in the computer lab, using SPSS.

Teacher [UPDATED]: Chris Butler is Honorary Professor in the Department of Languages, Translation and Media, College of Arts and Humanities, Swansea University. He is also a Visiting Professor at the University of Huddersfield and Visiting Fellow in the Centre for Translation Studies, University of Leeds. He is a member of the research group SCIMITAR run by the University of Santiago de Compostela, Spain. His main areas of expertise are functional linguistics, corpus linguistics (especially of English and Spanish) and the application of statistical techniques to the study of language.

Preparations: Please read (at least) the first three chapters of Chris Butler’s book Statistics in Linguistics.

Credits: 2 ECTS credit points.

Programme: Daily classes from 10.00 to 15.00, with a one-hour lunch break at 12.00.

Organised by the PhD research school in linguistics and philology at the University of Bergen. Please sign up by sending an e-mail to Martin Paulsen no later than 30 September, but be aware that we can only take a limited number of participants, and the first to sign up will be given priority.