
Corpus Linguistics

Submitted by Victor Birkner


  1. Semantics
  2. Corpus linguistics
  3. Preliminary considerations
  4. Corpus linguistics and its pedagogical transfer
  5. Modified texts
  6. Conclusions
  7. Bibliography

Semantics

The use of corpora in semantic studies has not been frequent. Semantics has seen its greatest developments in highly theoretical work that has produced systems with high levels of abstraction. This has often been perceived as a detachment from its raw material, language, intensified by the study of decidedly marginal phenomena and the use of examples that can seem strange to a typical native speaker (Kilgarriff, 2003).

Perhaps this is why the discipline feels difficult and mysterious to non-specialists, and why it has remained far from practical approaches (for example, corpus annotation and computational linguistics in general, which are usually based on phonological and morphosyntactic aspects). The leap into empirical research is not straightforward and brings major problems with it.

Linguistic studies have approached descriptions, and also requirements, from different angles: from those whose interest focused on the origins and evolution of languages to those which, thanks mainly to the increasing storage capacity of computers, set out to account for the behavior of specific languages through the observation and study of vast collections of speech and text.

Corpus linguistics

Corpus-based studies are, contrary to popular belief, considerably older than one might think; the recent appearance of computational tools, however, is responsible both for the statistical character of such studies and for their rapid growth.

Without explicitly using the term corpus linguistics, Käding (1897) gathered, at the end of the 19th century, a corpus of German of over eleven million words in order to determine, among other things, sequences of letters in that language. Likewise, some of the most prominent scholars of the structuralist period, among them Boas (1940), also made use of corpora in order to observe the behavior of Amerindian languages.

In the 1930s a significant number of investigations were carried out in which corpora were analyzed to establish lexical frequencies in real language use (Palmer, 1933; Palmer & Hornby, 1937). This work was characterized by being led and carried out by professionals closely connected with the teaching of English.

In 1960, Quirk designed and conducted a study called the Survey of English Usage, research that Svartvik later digitized and complemented with the famous Brown corpus, giving rise to what Leech (1991) considered an unparalleled resource for those interested in the study of spoken English.

The use of diaries (a type of corpus), carefully prepared by linguists dedicated to language acquisition processes, has been until now a rich source of material for the study of language. Longitudinal studies in this line of research, a tradition that is also quite extensive, have made use of important collections of utterances, for example Brown (1973) and Bloom (1970).

Chomsky, the linguist who by his own merit marked a turning point in modern linguistics, raised criticisms that bear on corpus linguistics: for him, the core of linguistic inquiry lies in what he calls competence, i.e. the knowledge that speakers possess of (the rules of) their language; what he calls performance, on the other hand, is a poor reflection of the richness contained in competence.

In the following section, some preliminary considerations on the use of corpora in language studies are introduced, along with the advantages that corpus linguistics offers. The debate on the (possible) transfer of corpus linguistics studies to the teaching of English is then addressed more directly. Finally, we analyze a position that gathers some of the most salient aspects of that debate, together with concrete examples applicable to English classes.

Preliminary considerations

Corpora today must meet three basic general requirements: (i) size, although this depends on the motivation for their construction and their specific uses; (ii) internal balance, because a corpus must reflect the period it claims to represent and the dialectal variety chosen, with an internal structure whose proportion of texts validates the conclusions drawn from it; and (iii) ease of use, which depends, once again, on the use to which it will be put. In addition, there are difficulties concerning copyright, especially when several participants are involved in an oral communicative act chosen for the corpus, and care is required in the encoding of the linguistic data.

The impact of corpus linguistics has meant that virtually all sub-components of linguistics are affected by technological developments. These include phonetics (articulatory, acoustic and auditory); the same applies to phonology, which explores the distribution of both segments and suprasegmentals in a particular language. Grammar is probably one of the areas that has given rise to the most productive body of literature; however, it is the study of the lexicon that has generated a series of dictionaries and narrowly focused studies which sometimes aspire to become innovative materials for educational purposes.

As Leech (1991) claims, for many researchers in the pre-Chomskyan period the corpus was conceived as the sole source of evidence for linguistic theory, from its most passionate exponents, such as Harris (1951), to more moderate ones, such as Hockett (1948). Chomsky raises a series of objections to this and values introspection instead, i.e. the space where linguistic competence lies; introspection alone, on this view, allows us to disambiguate utterances and to establish which sentences are grammatically correct or incorrect.

Among the benefits normally attributed to corpus-based studies we can include the following:

(i) Data emerging from corpora are observable and objective (Leech, 1992);

(ii) The vast majority of the sentences contained in corpora are grammatically correct (Labov, 1969), in clear opposition to Chomsky's statements (1968);

(iii) The capacity to process linguistic data is growing at ever higher speeds, with margins of error close to zero;

(iv) The examples of linguistic data correspond to real speech (even though this term is contentious for many scholars) found in (small) contexts;

(v) Access to corpora that can be analyzed and worked on is now so easy that highly sophisticated technological tools are not required; access can be personal and domestic.

Such is the enthusiasm with which this renewed, technology-driven stage of corpus linguistics has been received that, according to Johansson (1991), the number of significant corpus-based studies went from ten in 1965 to more than 320 carried out within a span of just fifteen years. This has gone hand in hand with the construction of corpora of English, the first of which is the aforementioned Survey of English Usage, in 1960; then came the so-called Brown University Corpus of American English, the first computerized corpus, in the sixties; the Lancaster-Oslo/Bergen Corpus of British English followed in the seventies; in the 1980s the Collins Birmingham University International Language Database (COBUILD) was created, work from which the dictionary that bears its name emerged. In the nineties a project called the Bank of English emerged, along with the British National Corpus (BNC) and the International Corpus of English, among others.

Corpus linguistics and its pedagogical transfer

The debate here is based mainly on the academic writings collected in Controversies in Applied Linguistics, edited by Barbara Seidlhofer (2003), which presents two positions in relative opposition on the possible pedagogical transfer of the paradigm underlying corpus linguistics to the English as a foreign language classroom.

In this respect, Carter and McCarthy, along with Gavioli and Aston, are presented as defenders of the link between corpus linguistics and the teaching of English, while Prodromou and Cook question this passage from linguistic description to pedagogical prescription.

A feature often presented as an advantage of linguistic descriptions, with its subsequent pedagogical implications, is that the speech collected is natural, 'real', although, as Carter (1998) asserts, the term "real" is heavily loaded with positive connotations. This stands in clear contrast to the forms of speech contained in textbooks.

What seems even more audacious is to claim, as McCarthy & Carter (1995) do, that informal British speech is 'real' English. It is true that the concordances widely used by those fond of the study of corpora reveal certain linguistic truths that often run counter to traditional teaching.

Nevertheless, these 'real' samples are bound to an individual's membership of the cultural community of the language in question. In other words, it is not possible to speak like a speaker of informal British English without belonging to that community of practice, particularly once the segmental and prosodic aspects of English are added to the demand. This matters all the more because an analysis of a significant sample of the curricula of almost 100 English teacher-training programmes in Chile shows a gradual disappearance of classes in English phonetics and phonology. For this reason it is worth asking whether it sounds more "strange" (McCarthy and Carter, 1995:207) to use a bookish lexicon or to use the lexicon proper to the variety and register in question, but with a clearly foreign rhythm, stress and intonation.

On the other hand, as Prodromou (1996) asks: to what extent can a non-native teacher, under the premise just mentioned, teach "real" English? In the same vein, the principle underlying the teaching of "real" English from samples belonging to a particular dialectal variety and register includes, perhaps indirectly, the assumption that our students of English as a foreign language learn the language in order to communicate with native speakers of English. It also assumes that the 'native speaker' (an increasingly elusive concept in anthropolinguistic reflection) is invariably our model, which empowers the native speaker at the expense of the non-native teacher. This assumption turns out to be extraordinarily fallacious when one observes that the growing number of speakers of English as a second and as a foreign language far exceeds 400 million; the adoption of a single dialectal variety as representing "real" English must therefore be reconsidered, given the intrinsic value of English as the lingua franca of modern times.

It is probably true that language teachers are absorbed by the processes of natural speech, and since this is (partially, of course) contained in corpora, we tend to think that this is what should make its way into classrooms and language textbooks.

Modified texts

This position taken by Carter is called 'moderate' or even weak by Cook (1998), who nonetheless recognizes, for example, that one of the great contributions of corpus linguistics is to show that language in use is not limited to the mastery of grammatical rules in harmonious combination with lexical items; it is, rather, a vast collection of collocations, a principle to which Willis (2003), Larsen-Freeman (2003) and McCarthy & Carter (1995) subscribe. In this connection, the virtual conviction has spread, both in academic circles and among the bodies responsible for public policy, that the degree of comprehension of a text in English is exclusively and mathematically related to the number and type of words (ranked by frequency) that the student knows. However, this purely mathematical relationship ignores fundamental aspects inherent in teaching, namely how the teacher handles students' expectations; individual differences, including learning strategies; the attitudes of teacher and students; and cultural diversity, among others.
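
The purely mathematical relationship between comprehension and known vocabulary mentioned above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the toy corpus, the function names tokenize and coverage, and the cut-off of three "known" words are all hypothetical and do not come from any of the studies cited. It computes the share of running words in a text that fall within a learner's frequency-ranked vocabulary.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(text, known_words):
    """Share of running words (tokens) in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


# Toy reference corpus (an assumption, for illustration only).
corpus = "the cat sat on the mat and the dog sat on the rug"
freq = Counter(tokenize(corpus))

# Hypothetical learner vocabulary: the three most frequent word types.
top_words = {word for word, _ in freq.most_common(3)}

# Proportion of a new sentence covered by that vocabulary.
print(coverage("the cat sat on the rug", top_words))
```

A figure of this kind says how many tokens a student can recognize, and nothing about expectations, strategies, attitudes or cultural background, which is precisely the limitation the paragraph above points to.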

In addition, as Cook (1998) points out, concordance displays of a corpus in terms of lexical frequency or range show speech as produced, but ignore another equally important aspect, namely the perception and interpretation of speech, aspects now covered by pragmatics. Such concordance displays, useful as they are, do not consider other equally real aspects of language use, such as an item that is infrequent yet of great potential usefulness or pedagogical relevance, or an item that is frequent yet narrow in its contextual range.
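
To show what a concordance display in terms of frequency and range actually contains, the following Python sketch may help. It is a minimal illustration, not the procedure of any particular concordancer: the three sample texts and the names frequency_and_range and kwic are assumptions made for the example. Frequency is taken here as the total number of occurrences of an item, range as the number of texts in which it occurs, and a key-word-in-context (KWIC) line shows a few words either side of each hit.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_and_range(texts):
    """Return two counters: total occurrences of each word (frequency)
    and the number of texts in which each word occurs (range)."""
    freq, rng = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        freq.update(tokens)
        rng.update(set(tokens))  # count each word once per text
    return freq, rng


def kwic(texts, keyword, width=3):
    """Key-word-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical mini-corpus of three short texts (illustration only).
texts = [
    "well you know it was sort of late you know",
    "the report was late",
    "you know the answer",
]
freq, rng = frequency_and_range(texts)
print(freq["know"], rng["know"])
for line in kwic(texts, "late"):
    print(line)
```

The counts describe what was produced; they carry no information about how an utterance was perceived or interpreted, nor about how useful a rare item might be to a learner.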

Finally, if we assume that speaking samples contained in corpora must become models for our students, especially about the logic of mathematical registers of frequency.

Conclusions

On the evidence presented regarding the debate set out here, we can conclude the following:

(i) Corpus-based research and analysis are here to stay (McCarthy & Carter, 1995). This rests not only on the number of investigations of this kind, but also on the productivity of the debate that corpus linguistics has raised in the academic community and among relevant participants. In this increase in academic production, technological tools for processing and storing linguistic data have definitely played a fundamental role.

(ii) Types of corpora are already widely diversified, and inquiry into the lexicon outweighs their other available uses.

(iii) The contributions that corpus linguistics can make are extraordinarily rich in terms of objectivity, because it accounts not only for absolute mathematical aspects such as the frequency of linguistic items, but also for the lexical nature of language, and of English in particular.

(iv) An important part of the findings, drawing on the empirical evidence provided by the analysis of concordances, eloquently challenges a significant number of linguistic beliefs found in traditional textbooks based on introspection.

(v) It seems extremely dangerous to assert, as McCarthy & Carter (1995) do, that real speech corresponds to one dialectal variety in a given register, thereby discrediting other varieties, other registers and an already indisputable truth: our student teachers of English, together with their own students in the school system, will, probabilistically speaking, use English with other non-native speakers. The inevitable question therefore arises: why shape our teaching on the basis of a variety that our students will most likely not hear in real contexts?

(vi) What is conceived of as real speech, by virtue of coming from a corpus, is invariably wrapped in a socio-cultural context that is virtually non-transferable to the classroom. If we add linguistic aspects such as the prosodic elements of speech which, given the characteristics of our students and the related studies, are not acquired in regular training programmes, we should question the relevance of an emphasis on "real speech" understood as real lexical use but encapsulated in a foreign suprasegmental wrapper. The principle of real speech based on the native speaker makes that kind of speech unteachable for the non-native teacher and, moreover, invalidates that teacher socially.

(vii) It seems that the invaluable data provided by corpus linguistics lend themselves more readily to discourse analysis or conversation analysis than to immediate use in the classroom. The main reason is that this material is usually full of ellipsis, interruptions in turn-taking, discourse markers, false starts, hesitations, etc. That is why it is sensible to consider intermediate positions such as the one Carter offers in his suggestion to modify and remodel texts.

(viii) The data provided by corpus concordances in terms of lexical frequencies should not be the only criterion to determine what is taught and what is not.

Bibliography

Meyer, C. (ed.) (2007). English Corpus Linguistics: An Introduction. New York: Cambridge University Press.

Bloom, L. (1970). Language development: form and function in emerging grammars. Cambridge, MA: MIT Press.

Boas, F. (1940). Race, language and culture. New York: Macmillan.

Brown, R. (1973). A first language: the early stages. Cambridge, MA: Harvard University Press.

 

 

Author:

Victor Birkner