Noves SL journal





Methodology of sociolinguistic research
Spring 2002

Statistics in the analysis of phonetic variation: application of the Goldvarb programme, by Josefina Carrera

In this article the author describes how the statistical programme Goldvarb works, focusing on the statistical applications that can be used in sociolinguistic analysis, especially in variationist studies of phonetic variation.




1. General aspects
2. Aims
3. Study variables
4. Application of the Goldvarb statistical programme
5. Goldvarb in variationist sociolinguistics
6. Bibliography

1. General aspects

Variationist sociolinguistics has taken major steps towards perfecting the quantitative analysis techniques used to demonstrate the importance of social and linguistic contexts in variation. Hence, the application of "probability theory to the data allows us to extract higher-order regularities that govern variation in the community" (Labov, 1994:25). To this end, "the variationist method aims to calculate the probability that a given linguistic feature will appear in specific linguistic, sociological and contextual circumstances. On the basis of frequency data gathered from a group of speakers, a theoretical model is created from the probabilities of a certain phenomenon occurring when a number of circumstances converge. Statistics marks the extent to which the calculated probabilities are likely and the circumstances that, when occurring simultaneously, best explain a linguistic fact" (Moreno, 1994:95).

Sociolinguistics works with two types of statistics: a) descriptive statistics, which counts and orders, in quantitative terms, data extracted from reality; and b) inferential statistics, which takes the results of descriptive statistics and extends them to aspects of a specific linguistic community that have not been directly studied.

The main object of study in inferential statistics is the "dependent variable". This variable is determined by independent, or explanatory, variables, which in language are linguistic and sociosituational factors. To establish the probability that a variable phenomenon will occur in a given way, we first need to know how many times it has occurred out of all possible cases. This figure is obtained by counting how often a given feature appears under each of the envisaged conditions in the discourse gathered from a sample of speakers. Once the cases in which each factor is present have been counted, we then look at how often the phenomenon occurs when different explanatory factors coincide.
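A minimal Python sketch of the counting step just described: it tallies, for hypothetical tokens of initial e-, how often the [e] variant appears under each factor and under each combination of co-occurring factors. The token fields (`variant`, `age`, `education`) and the function name `application_rates` are illustrative assumptions, not part of Goldvarb.

```python
from collections import defaultdict


def application_rates(tokens, groups, application="e"):
    """Count how often the application variant occurs for each combination
    of the factor groups named in `groups`. One group gives per-factor rates;
    several groups give cross-tabulated rates for co-occurring factors.

    tokens: list of dicts with a "variant" key and one key per factor group.
    Returns {factor tuple: (applications, total, proportion)}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [applications, total]
    for tok in tokens:
        key = tuple(tok[g] for g in groups)
        counts[key][1] += 1
        if tok["variant"] == application:
            counts[key][0] += 1
    return {key: (app, tot, app / tot) for key, (app, tot) in counts.items()}


# Hypothetical tokens of initial pretonic e-, realized as [e] or [a]
tokens = [
    {"variant": "e", "age": "young", "education": "university"},
    {"variant": "e", "age": "young", "education": "primary"},
    {"variant": "a", "age": "young", "education": "primary"},
    {"variant": "a", "age": "old", "education": "primary"},
    {"variant": "e", "age": "old", "education": "university"},
    {"variant": "a", "age": "old", "education": "primary"},
]

print(application_rates(tokens, ("age",)))               # one factor group
print(application_rates(tokens, ("age", "education")))  # factors combined
```

The first call corresponds to counting the cases where each factor is present; the second looks at the phenomenon when factors from different groups coincide.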

Probabilistic analysis allows us to find out: 1) the extent to which different groups of explanatory factors determine the variation of an element when all of these factors act together; and 2) the general behaviour of a community, even though data are collected only from a stratified sample of speakers. The probabilities are then used to build a model of speakers' sociolinguistic competence in order to predict future trends.

The first probabilistic model applied to linguistic analysis was additive; it was later superseded by a logistic-multiplicative model (1), expressed by the following formula:
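In the form usually given for variable-rule analysis (assumed here; the author's notation may differ), $p$ is the probability that the variant appears in a given context, $p_0$ is the input probability (the overall tendency regardless of context), and $p_i, p_j, \ldots$ are the weights of the factors present, one from each factor group. A weight of 0.5 leaves the prediction unchanged; weights above 0.5 favour the variant and weights below 0.5 disfavour it.

```latex
\log\frac{p}{1-p} = \log\frac{p_0}{1-p_0} + \log\frac{p_i}{1-p_i} + \log\frac{p_j}{1-p_j} + \cdots

\frac{p}{1-p} = \frac{p_0}{1-p_0} \cdot \frac{p_i}{1-p_i} \cdot \frac{p_j}{1-p_j} \cdots
```

The second line is the same model written as a product of odds: the effects add on the log-odds scale and multiply on the odds scale.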


The mathematical advances in sociolinguistics made between 1969 (beginning with Labov's work of that year) and 1978 were in turn complemented by computer applications that perform the statistical calculations. Two programmes in this line are Varbrul, for the PC and VAX, which performs multinomial analyses (2), and Goldvarb, for the Macintosh, which calculates probabilities for a binomial dependent variable.
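To illustrate what a binomial analysis of this kind computes, the sketch below fits the logistic model to hypothetical counts by plain gradient ascent on the log-likelihood. It is not Goldvarb's own algorithm; the names `fit_weights` and `predict` are hypothetical, and centring each factor group at 0 in log-odds (so that 0.5 is the neutral weight) is an assumption about how the weights are reported.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(input_prob, weights, factors):
    """Logistic model: logit(p) = logit(p0) + sum of logits of factor weights."""
    eta = logit(input_prob) + sum(logit(weights[g][f]) for g, f in enumerate(factors))
    return logistic(eta)


def fit_weights(tokens, n_iter=5000, lr=0.5):
    """Maximum-likelihood fit of a binomial logistic model by gradient ascent.

    tokens: list of (y, factors); y = 1 for the application value, 0 otherwise;
    factors holds one factor label per factor group.
    Returns (input_prob, weights) with weights[g][factor] in (0, 1).
    """
    n = len(tokens)
    n_groups = len(tokens[0][1])
    intercept = 0.0
    effects = [{} for _ in range(n_groups)]  # log-odds effect of each factor
    for _, factors in tokens:
        for g, f in enumerate(factors):
            effects[g].setdefault(f, 0.0)

    for _ in range(n_iter):
        grad_b = 0.0
        grads = [dict.fromkeys(group, 0.0) for group in effects]
        for y, factors in tokens:
            eta = intercept + sum(effects[g][f] for g, f in enumerate(factors))
            resid = y - logistic(eta)  # derivative of the log-likelihood w.r.t. eta
            grad_b += resid
            for g, f in enumerate(factors):
                grads[g][f] += resid
        intercept += lr * grad_b / n
        for g, group in enumerate(effects):
            for f in group:
                group[f] += lr * grads[g][f] / n
            # Centre the group on 0 in log-odds and move the shift into the
            # intercept; every token's prediction stays exactly the same.
            mean = sum(group.values()) / len(group)
            for f in group:
                group[f] -= mean
            intercept += mean

    weights = [{f: logistic(e) for f, e in group.items()} for group in effects]
    return logistic(intercept), weights


# Hypothetical counts: (age, education) -> ([e] tokens, [a] tokens)
cells = {
    ("young", "university"): (5, 1),
    ("young", "primary"): (3, 3),
    ("old", "university"): (2, 2),
    ("old", "primary"): (1, 5),
}
tokens = []
for factors, (n_e, n_a) in cells.items():
    tokens += [(1, factors)] * n_e + [(0, factors)] * n_a

p0, weights = fit_weights(tokens)
print(round(p0, 3), weights)
```

At the maximum-likelihood solution the fitted probabilities reproduce the observed number of [e] tokens for every factor, which is a convenient check on the fit. Knockouts (factors that always or never co-occur with the variant) make the weights diverge, so they have to be dealt with before fitting.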

2. Aims

The aim of this article is to explain briefly how the Goldvarb calculation programme works, using as a case study a process of change in progress analysed in Alguaire (a town in the Segrià region), and to evaluate the usefulness of the programme in variationist sociolinguistics.

The process of phonic variation analysed here belongs to the unstressed vowel system of Lleidatà and involves e- in absolute initial pretonic position, in words such as enciam, escala or eriçó. Various dialectological studies carried out during the twentieth century show that this vowel was traditionally pronounced [a]; however, we now observe a process of variation in which [a] is being replaced by [e], the realization that corresponds to the written form of the language. The analysis covers informants aged 3 to 80 from the town of Alguaire and reveals, above all, the influence of the written language on the formal model of North-western Catalan, since the phonic changes observed depend on the speaker's knowledge of Catalan, the type of education they received and, linked to this, their age. (3)
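As a sketch of how tokens of this variable might be coded for a binomial analysis, the function below turns one observation into a short string with the dependent variable first and one character per factor group, in the style of Goldvarb token strings, which open with a parenthesis. The factor groups, codes and age cut-points are hypothetical, not the study's actual coding scheme, and the token-file details are an assumption.

```python
# Hypothetical factor groups and one-character codes (not the study's own coding)
FACTOR_GROUPS = [
    ("variant", {"e": "e", "a": "a"}),              # binomial dependent variable
    ("age_group", {"young": "1", "middle": "2", "old": "3"}),
    ("knowledge", {"high": "h", "low": "l"}),       # knowledge of Catalan
]


def age_group(age):
    """Bin an informant's age (3 to 80) using hypothetical cut-points."""
    if age < 20:
        return "young"
    if age < 50:
        return "middle"
    return "old"


def encode_token(variant, age, knowledge):
    """Build a token string: '(' followed by one code per factor group, in order."""
    values = {"variant": variant, "age_group": age_group(age), "knowledge": knowledge}
    return "(" + "".join(codes[values[name]] for name, codes in FACTOR_GROUPS)


print(encode_token("a", 72, "low"))  # prints (a3l
```

A further group for type of education could be added to `FACTOR_GROUPS` in the same way.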
