A lexicometric analysis of the poems from O Guardador de Rebanhos/Analise lexicometrica dos poemas de O guardador de rebanhos.

Introduction (1)

This article has three parts, and its main objective is to present a lexical study of the poems from O guardador de rebanhos (The keeper of flocks), a collection of poems written by Alberto Caeiro, a heteronym of Fernando Pessoa (2007). The poems were written in 1914, and Fernando Pessoa traced their genesis to a single night when Caeiro was suffering from insomnia. Fernando Pessoa is the greatest Portuguese poet of the 20th century and his works are translated into many languages, thus being studied in many countries outside the lusophone world. This is why we decided to write this text in English, so that it can be read by a wider community of readers. The current study was undertaken using the Portuguese version of O guardador de rebanhos and NooJ, a computer programme that performs lexical analysis. This computational resource enables the text to be analysed in Portuguese, whereas most of the available resources are designed for data written in English. NooJ is a linguistic development environment that, on the one hand, enables formal descriptions (dictionaries and grammars) of a wide range of natural languages to be built and, on the other hand, effectively applies these descriptions to long texts. NooJ makes it possible to put into practice a diverse range of procedures (Figure 1).

There are other computer programs for lexicometric analysis: lexico, stablex, tropes, wordsmith, microconcord, TACT, corpus presenter, intex, lexa, concordance, protan, SALT, text analysis, and so on. AntConc, for example, is a corpus analysis toolkit designed by Laurence Anthony (Anthony, 2004). As Fadanelli and Monzon (2017) noted,

It is equipped with a Concordance program, word frequency list generators and the tool used for extraction, a keyword function (offering an on-screen list of candidates for keywords in a corpus, comparing the study corpus with another corpus, at least five times greater - called the reference corpus); the reference corpus was composed of texts extracted from the COCA (Contemporary Corpus of American English) and the BNC (British National Corpus) corpora, researching the occurrence of the 30 most frequent words in the COCA Corpus wordlist. The reference corpus has approximately 211,000 tokens, more than five times the size of the study corpus (Sardinha, 2000). The texts originate from diverse domains, academic, news, literature, Internet, and went through the same punctuation cleaning, etc. to integrate the corpus of research (Fadanelli & Monzon, 2017, p. 363, our translation) (2).

In the first part, after a definition of lexicometry, brief reference is made to its potential and to some of the concepts that it covers, such as keyword, theme word, word frequency counters, taggers, concordancers and concordances. There is also a brief presentation of the NooJ programme and a definition of thematic field.

In the second part, using NooJ, we will carry out a lexicometric analysis, based on the statistical analysis of the theme words, in order to define possible thematic fields in those poems. To this end, we will begin by presenting the general data of the corpus and by organising a list of theme words, starting from the list of tokens in descending order of frequency. The lexicometric analysis of the poems from The keeper of flocks, by Alberto Caeiro (Pessoa, 2007), ends with the presentation of the thematic fields, drawn from the corresponding theme words.

In the last part of this work, we will present a pedagogical proposal that explores the potentialities of the lexicometric analysis of the poems from The keeper of flocks (Pessoa, 2007). The poem O guardador de rebanhos (Pessoa, 2007) is studied by secondary school students attending 12th grade and it is compulsory for all, regardless of the area of study. These are 17- or 18-year-old students. At university level, the poem is studied in curricular units related to literature in degrees in humanities, specifically in letters.


Lexicometry consists of a set of objective, descriptive, inductive and scientific technological methods, which, by means of statistical analysis programmes, enable "[...] formal reorganisations of the vocabulary (set of forms actualised in discourse, attested in a text or in a corpus of texts). The lexicometric study implies an exhaustive survey of ALL occurrences, of ALL the forms of the corpus to be studied" (Carvalho, Marques, & Silva, 1999, p. 225, our translation) (3). This means that, besides granting us access to a detailed and rigorous linguistic register of the vocabulary of the corpus under analysis, lexicometry also gives us systematic results, and this contributes to an objective presentation of the quantified linguistic data.

For example, as we show in sections 3 and 4 of this study, lexicometry allows us to observe rigorously the frequency with which the words of The keeper of flocks (Pessoa, 2007) occur in it, and to identify the keywords, the theme words and the forms with a frequency of 1, that is, the 'hapax legomena' or 'hapaxes'. These results are obtained with the various computational resources used in corpus linguistics: frequency counters, which provide the frequency of words; taggers, which perform automatic analysis and tagging of the corpus; textual engineering software; and concordancers.

As their name implies, concordancers allow the extraction of concordances. According to Berber Sardinha (2004), a concordance is a list of the contexts of occurrence of a given linguistic form, the search word, which appears centralised and accompanied by the linguistic forms that occur to its left and its right, that is, by its original co-text. Because concordances grant access to the meaning and contextualised use of words, their analysis has great pedagogical potential (McCullough, 2001), as we will show in section 4 of this study.
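The notion of a concordance described above can be illustrated with a minimal keyword-in-context sketch in Python. It is not how NooJ is implemented: the function name kwic, the window size and the rule that a word form is a maximal run of letters are assumptions made only for this illustration.

```python
import re

def kwic(text, search_word, window=3):
    """Return (left, match, right) co-text triples for each occurrence of search_word."""
    # Assumption of this sketch: a word form is a maximal run of letters.
    tokens = re.findall(r"[^\W\d_]+", text)
    target = search_word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

sample = (
    "Sou um guardador de rebanhos. "
    "O rebanho é os meus pensamentos "
    "E os meus pensamentos são todos sensações."
)

# The search word is centred, with its co-text to the left and to the right.
for left, match, right in kwic(sample, "pensamentos"):
    print(f"{left:>20} | {match} | {right}")
```

Each printed line shows the search word in the middle column, so that its occurrences can be compared in context, which is the property that makes concordances useful for resolving ambiguity.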

In order to undertake this lexicometric analysis of the poems from The keeper of flocks, by Alberto Caeiro (Pessoa, 2007), we used NooJ (Silberztein, 2015), a computer programme for lexical analysis that is available online. NooJ is a linguistic engine

[...] based on an annotation structure. An annotation is a pair (position, information) that determines that a certain position in the text has certain properties. When NooJ processes a text, it produces a set of annotations that are stored in the Text Annotation Structure (TAS) and are synchronised with it (Mota & Silberztein, 2007, p. 196, our translation).
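The idea of an annotation as a (position, information) pair can be sketched in Python as follows. This is only an illustration of the data structure described in the quotation, not NooJ's Text Annotation Structure itself: the Annotation type, the annotate function, the toy lexicon and its labels are all hypothetical.

```python
import re
from typing import NamedTuple

class Annotation(NamedTuple):
    position: int      # character offset in the text
    information: str   # properties asserted for that position

def annotate(text, lexicon):
    """Attach every reading found in the lexicon to the position of the word form."""
    annotations = []
    for match in re.finditer(r"[^\W\d_]+", text):
        for reading in lexicon.get(match.group().lower(), []):
            annotations.append(Annotation(match.start(), reading))
    return annotations

# Hypothetical toy lexicon; the labels are illustrative, not NooJ's tag set.
toy_lexicon = {
    "os": ["determiner"],
    "montes": ["noun: monte (hill)", "verb: montar (to ride)"],
}

for annotation in annotate("os montes", toy_lexicon):
    print(annotation.position, annotation.information)
```

An ambiguous form such as 'montes' simply receives more than one annotation at the same position, which is how the sketch represents ambiguity.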

After having presented some of the concepts that the lexicometric analysis considers within the scope of this study, we shall now refer to what we understand by theme word and by thematic field.

Regarding the concept of 'theme word', Genouvrier and Peytard (1974) argued that the "[...] theme word is a word characterised by a very high frequency and that, in a list sorted by decreasing frequency of the vocabulary of an author, it belongs, for example, to the first 50 positions" (Genouvrier & Peytard, 1974, p. 313, our translation) (4). As the above-mentioned authors point out, the analysis according to theme words allows the characterisation of the "[...] author's style as deviation from a norm [...]. Both the key words and the 'hapaxes legomena', that is, the exclusive terms, are analysed as a way to characterise typical thematic and semantic areas" (Genouvrier & Peytard, 1974, p. 317-318, our translation) (5).

Galisson and Coste (1983), relating the concepts of keyword and theme word, express their views on this issue as follows:

The keyword is a full (non-grammatical) word, of great frequency in a work (or in the whole work) of an author; this frequency has the characteristic - when compared to the 'theme word' - of being very far from the frequency of the same word in a corpus of works of the same kind. In other words, the 'keyword' has the peculiarity of being abnormally frequent in a work or in an author (Galisson & Coste, 1983, p. 114-115, our translation) (6).
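The contrast between the two concepts can be made concrete with a short Python sketch. A theme word is selected by its rank within one frequency list, whereas a keyword is selected by comparing its frequency with that of the same word in a reference corpus. The ratio of relative frequencies used here is an assumption chosen for simplicity, not the measure used by the cited authors or by any particular tool, and the token lists are invented toy data.

```python
import math
from collections import Counter

def theme_words(tokens, top_n=50):
    """Return the top_n most frequent forms, in descending order of frequency."""
    return [form for form, _ in Counter(tokens).most_common(top_n)]

def relative_frequency_ratio(word, study_tokens, reference_tokens):
    """Compare a word's relative frequency in the study corpus with a reference corpus."""
    study_rel = Counter(study_tokens)[word] / len(study_tokens)
    reference_rel = Counter(reference_tokens)[word] / len(reference_tokens)
    if reference_rel == 0:
        return math.inf  # the word never occurs in the reference corpus
    return study_rel / reference_rel

# Invented toy data, not counts taken from the poems.
study = ["sol", "flores", "sol", "rio", "sol", "flores"]
reference = ["sol", "rio", "rio", "casa", "rio", "casa", "gente", "rio"]

print(theme_words(study, top_n=2))
for word in ("sol", "rio", "flores"):
    print(word, relative_frequency_ratio(word, study, reference))
```

A ratio well above 1 marks a candidate keyword (abnormally frequent in the study corpus), while a high rank in theme_words only says that the word is frequent within the work itself.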

In line with Galisson & Coste (1983), the thematic fields

[...] constitute sets of terms that are functionally possible within a given thematic situation and whose internal organisation depends on a number of parameters paired with a psychosocial activity. For example, the thematic field of the 'house' would include 'building' (hall, staircase, elevator, step, etc.), 'construction' (materials, etc.), 'place of residence' (function, decoration, etc.), [...] and the organisation of these terms would depend on the activities of the individual in this thematic situation (Galisson & Coste, 1983, p. 104, our translation) (7).

Next, based on the linguistic analysis performed by NooJ, we will present the lexicometric analysis of O guardador de rebanhos, by Alberto Caeiro (Pessoa, 2007). In light of Silberztein's theorisation, it is important to warn against possible confusion between the terms 'simple word', which is a type of minimal meaningful language unit, and 'word form', which is a type of 'token'. Tokens are the basic linguistic objects processed by NooJ. They are classified into three types: word forms, which are sequences of letters between two delimiters; digits; and delimiters. A 'word form' is a sequence of letters that does not necessarily correspond to a minimal language unit. A 'simple word' is, by definition, a minimal language unit, that is, the smallest non-analysable element of the vocabulary.
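The three token types can be illustrated with a small Python sketch. It is not NooJ's tokeniser: the function classify_tokens is hypothetical, and two simplifications are assumed, namely that blank spaces only separate tokens and are not reported, and that a run of digits counts as a single token.

```python
import re

# Letters | digits | any single non-blank character that is neither.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")

def classify_tokens(text):
    """Split text into tokens and label each as word form, digits or delimiter."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        piece = match.group()
        if piece.isdigit():
            kind = "digits"
        elif piece.isalpha():
            kind = "word form"
        else:
            kind = "delimiter"
        tokens.append((piece, kind))
    return tokens

for piece, kind in classify_tokens("Pensar uma flor é vê-la (1914)."):
    print(f"{piece!r:10} {kind}")
```

In this sketch the hyphen in 'vê-la' is a delimiter, so 'vê' and 'la' are returned as two separate word forms, which shows why a word form does not necessarily correspond to a minimal language unit.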

Lexicometric analysis of the poems from The keeper of flocks, by Alberto Caeiro

Before proceeding to the lexicometric analysis of Alberto Caeiro's work, let us look at a short passage from the poem Sou um guardador de rebanhos (Pessoa, 2013, p. 31).
   Sou um guardador de rebanhos.
   O rebanho é os meus pensamentos
   E os meus pensamentos são todos sensações.
   Penso com os olhos e com os ouvidos
   E com as mãos e os pés
   E com o nariz e a boca.
   Pensar uma flor é vê-la e cheirá-la
   E comer um fruto é saber-lhe o sentido.
   [I am a keeper of sheep.
   The sheep are my thoughts
   And all my thoughts are sensations.
   I think with my eyes and ears
   And with my hands and feet
   And with my nose and mouth.
   To think of a flower is to see it and smell it
   And to eat a fruit is to know its meaning.]

Once the linguistic analysis starts, NooJ displays the general distinguishing features of the text, as shown in Table 1.

The NooJ programme analysed the tokens and their frequencies. Tokens can be presented in descending order of their frequency and/or alphabetically.

Through the analysis of the most frequent items, we found that, as in most corpora, the most frequent forms are functional or grammatical words. For example, in The keeper of flocks (Pessoa, 2007), the five most frequent tokens are 'E' / 'e', 'que' / 'Que', 'a', 'o' and 'de' (8), with frequencies of 448, 373, 223, 205 and 142, respectively.

In the analysis of the most frequent tokens, it is important to note that, as happens in most lexicometric applications, NooJ distinguishes between upper and lower case, counting each different form of the same lemma separately.

In this context, the token that appears in the first position of the list in descending order of frequency, 'que [that]', has a frequency of 307, to which the frequency of the form 'Que [That]', which is 66, must be added. Similarly, the token that appears in the second position of the list, 'E [And]', has a frequency of 237, while the form 'e [and]' has a frequency of 211.
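The arithmetic behind these figures can be checked with a short Python sketch that merges case variants of the same form. The four case-sensitive counts are the ones reported above; the function merge_case_variants is a hypothetical helper for this illustration, not a NooJ feature.

```python
from collections import Counter

def merge_case_variants(case_sensitive_counts):
    """Add together the frequencies of forms that differ only in case."""
    merged = Counter()
    for form, frequency in case_sensitive_counts.items():
        merged[form.lower()] += frequency
    return merged

# Case-sensitive frequencies as reported in the text.
counts = Counter({"que": 307, "E": 237, "e": 211, "Que": 66})

merged = merge_case_variants(counts)
for form, frequency in merged.most_common():
    print(form, frequency)
```

Merging the variants changes the ranking: 'que' leads the case-sensitive list, but once 'E' and 'e' are added together the conjunction becomes the most frequent form, in agreement with the totals of 448 and 373 given earlier.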

We selected the 60 most frequent tokens, filtered and exported the data, and copied them to Microsoft Word, producing a list of theme words and their frequencies, which we transcribe in Table 2. We chose to list the tokens using the classes of common nouns, adjectives, verbs and adverbs as our main criterion.

In the first stage of the analysis of theme words, we came across phenomena of potential ambiguity, inherent to the majority of linguistic forms: the token list provided by NooJ does not show the theme words in their contexts of occurrence, so it was necessary to extract concordances in NooJ. We therefore analysed all the contexts of occurrence of each theme word of the corpus and looked up its definitions in the dictionary. The dictionary used was the Dicionario Infopedia da lingua portuguesa da Porto editora[R] (online version), chosen not only for its availability, but also because it is the dictionary most commonly used by both teachers and students in elementary and secondary schools in Portugal.

In fact, the examination of concordances and of the dictionary allows for a distancing from the intuition inherent to the observer's point of view. As an example, we can see the theme word 'montes', which constitutes a phenomenon of partial homonymy, that is, 'montes' as both 'rides' [verb to ride] and 'hills' [noun], as we notice when browsing through the dictionary (9), referring to the form of the verb 'to ride' and to the masculine noun 'hill'.

As we can see in Figure 2, from the analysis of concordances of the theme word 'montes', we may conclude that the form 'montes' matches the masculine plural noun 'hill' and the first of the nine meanings of this noun that are in the above-mentioned dictionary ('1. elevation of land above the earth's surface, less extensive and not as high as a mountain; 2. pile or set of any stacked objects, a collection; 3. 'figurative' considerable portion; 4. group; gathering; 5. games of chance using cards; 6. portion of goods as part of an inheritance, 7. playing cards; 8. 'regionalism' (from the Alentejo) estate quarters made up of several buildings around a courtyard; designation sometimes attributed to the actual farmstead; 9. 'regionalism' (from Tras-os-Montes) tract of land covered with spontaneous shrubby (usually, woody) and herbaceous vegetation'). Thus, as we notice in Table 3, the theme word 'montes' is part of the thematic field of 'universe'.

In addition, in NooJ, we can also view the ambiguous words (Figure 3) and the unambiguous ones (Figure 4), as we are granted access to all the annotations produced for the corpus under study.

Annotations can also be viewed, verse by verse. For example, in Figure 5, we can examine the annotations of the first verse of the poems from The keeper of flocks.

Based on this methodology, as we can see in Table 3, in the poems from The keeper of flocks, by Alberto Caeiro (Pessoa, 2007), we could define six thematic fields within different domains: the thematic field within the domain of sight was delimited from the theme words olhos, cor, olhar and ver (10); the thematic field of transcendence was delimited from the theme words Deus and alma (11); the thematic field within the domain of universe, from the theme words flores; sol; arvores; Natureza; ser; luar; ceu; rio; rios; homens; agua; dia; noite; pedras; montes; natural; flor; terra; vento; coisas; coisa; gente; aldeia; vezes; passa; casa; cor; vida; janela and estrada (12); the thematic field of the senses, related to the theme words sinto; sei; saber; sabe; sentir (13); the thematic field of reflection was built upon the theme words pensar; penso (14); the thematic field within the domain of refusal was delimited from the theme words nao / Nao; nunca / Nunca; nada (15).
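The grouping of theme words into thematic fields can be expressed as a simple mapping, which also makes it possible to add up the frequencies of each field. The Python sketch below uses the frequencies listed in Table 2 for five of the six fields (the long 'universe' field is left out for brevity). The totals are computed here for illustration only and are not figures reported in this study; the names thematic_fields and field_totals are hypothetical.

```python
# Frequencies copied from Table 2 (forms written without diacritics, as in the table).
theme_word_frequency = {
    "olhos": 16, "cor": 10, "olhar": 10, "ver": 19,
    "Deus": 20, "alma": 12,
    "sinto": 13, "sei": 18, "saber": 13, "sabe": 11, "sentir": 9,
    "pensar": 24, "penso": 11,
    "nao": 140, "nunca": 13, "nada": 20,
}

thematic_fields = {
    "sight": ["olhos", "cor", "olhar", "ver"],
    "transcendence": ["Deus", "alma"],
    "senses": ["sinto", "sei", "saber", "sabe", "sentir"],
    "reflection": ["pensar", "penso"],
    "refusal": ["nao", "nunca", "nada"],
}

def field_totals(fields, frequencies):
    """Sum the frequencies of the theme words assigned to each thematic field."""
    return {
        field: sum(frequencies[word] for word in words)
        for field, words in fields.items()
    }

totals = field_totals(thematic_fields, theme_word_frequency)
for field, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{field:15} {total}")
```

Because a theme word may belong to more than one field ('cor' appears under both sight and universe in Table 3), such totals should be read as a rough indication of the weight of each field rather than as a partition of the vocabulary.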

Pedagogical potentialities introduced by the lexicometric analysis of the poems from The keeper of flocks

The Programa e metas curriculares de portugues do ensino secundario (16) (Buescu, Maia, Silva, & Rocha, 2014) argues for a "[...] global language pedagogy [...]" (Buescu et al., 2014, p. 5, our translation) (17), which implies interaction among the five domains (speaking, reading, writing, literary education and grammar). For the 12th grade, in the domain of literary education, it recommends the study of poems by Alberto Caeiro, in particular, and of the poetry attributed to the heteronyms of Fernando Pessoa, in general (Buescu et al., 2014). In this field of action, we believe that the study of Alberto Caeiro's poems may benefit from the contribution of lexicometry. Like Biber (2011), we believe that the adoption of methods from corpus linguistics in the study of literary texts opens a new direction for Portuguese classes because, as mentioned in Section 2 of this work, the data resulting from the lexicometric analysis provide rigorous and comprehensive knowledge of the author's vocabulary. In light of the above, we present some pedagogical approaches that promote vocabulary learning from the poems of The keeper of flocks, by Alberto Caeiro (Pessoa, 2007).

Because the theme words that are part of Table 2, in Section 3, challenge students with problems of lexical ambiguity, the teacher should prepare activities that guide students in the adoption of strategies to overcome those difficulties.

Students, acting as discoverers of meaning, may observe and analyse the concordances related to the theme words of the poems from The keeper of flocks, provided by NooJ, as well as look up their definitions in the dictionary (18), as exemplified in Table 4, which focuses on the theme word 'trees'.

Similarly, for other theme words, the teacher may suggest tasks such as the one shown in the previous table, leading students to the meaning that each theme word actually has in The keeper of flocks (Pessoa, 2007) and enabling thematic fields to be identified through them.

Based on the analysis of the theme words shown in Table 2, Section 3, students may build thematic fields in poems from The keeper of flocks (Pessoa, 2007). Consequently, the teacher may suggest that the students carry out tasks such as the ones presented in Table 5.

To sum up, the analysis of cases of lexical ambiguity, including both homonymy and polysemy, allows the students to discover new meanings. This discovery, which results from observation, selection, verification and inference, gives the students the opportunity to develop a better representation of the meanings of words and greatly expands their lexical competence; it must therefore be constantly encouraged in Portuguese classes.

Therefore, in the context of teaching and learning, the lexicometric analysis of the poems from The keeper of flocks (Pessoa, 2007) offers a fruitful possibility of didactic intervention, because the full access to the vocabulary of the poems from The keeper of flocks (Pessoa, 2007) enables the development of the students' linguistic competence.

Finally, we should note that it is up to the teacher of Portuguese to adapt the lexicometric analysis of the poems from The keeper of flocks (Pessoa, 2007) to each particular pedagogic and didactic context, bearing in mind the students' individual needs.

Final considerations

The contemporary framework for the teaching and learning of Portuguese in Portugal requires the development and availability of didactic resources that can support the demanding and thorough pedagogical practices of the 21st-century educational context. 'NooJ' can therefore be seen as a proposal for a didactic approach, within the scope of the new methodologies for language teaching. The didactic potential of 'NooJ' is considerable, and how far it is exploited depends on the teacher's creativity and on the student's curiosity. By extracting sequences or linguistic contexts from works recommended by the Portuguese programmes for various levels, the teacher can approach certain linguistic concepts (word classes, morphology, among others) in an objective, appealing and motivating way. At the same time, this resource allows the teacher to acquire the necessary skills to foster in students a critical, self-reflective attitude towards the language, developing their skills of observation and analysis in a process of discovery of how the language works.

Thus, 'NooJ', in an educational setting, sets up a new teaching and learning architecture that the Portuguese language classes should incorporate and operationalise, since it is considered a relevant didactic resource. Actually, 'NooJ' can help teachers to work out another way of handling this precious and powerful tool, the Language, and, in the analysis and for the analysis of Language itself, make learning happen.

In light of the above considerations, a lexicometric analysis may have a prominent role in the process of didactic operationalisation in secondary education, revealing 'NooJ' as a potential didactic resource and setting forth the undeniable contribution of Corpus linguistics to teaching, while emphasising a more objective and scientific analysis of linguistic data, using computational tools.

From what we have discussed, we may conclude that the lexicometric analysis of the poems from The keeper of flocks, by Alberto Caeiro (Pessoa, 2007), paves the way for students to actively participate in linguistic discoveries, turning them into "[...] researchers and not merely 'receivers' of language" (Berber Sardinha, 2011, p. 305, our translation) (21).

In fact, this didactic approach shows that the lexicometric analysis provides the teacher and the students with the stimulating opportunity to gain access to an objective and detailed inventory of all the linguistic forms of the poems from The keeper of flocks, enabling a clear definition of thematic fields in the literary work of the Pessoan heteronym and reaching a more objective and neutral methodology than the traditional analysis that adopts a hermeneutical stance.

We should note that, although the lexicometric analysis of the poems from The keeper of flocks (Pessoa, 2007) differs from traditional analysis, it does not exclude it. It is one of the many didactic activities available to the teacher, depending on his/her creativity, and the subjective points of view resulting from traditional analysis may be confirmed (or not) by lexicometric analysis.

Finally, we should point out that the diversity of teaching methodologies is of the utmost importance in the teaching of Portuguese: because students are heterogeneous, teaching must also be diversified and renewed, which challenges the teacher of Portuguese to adopt new alternatives for didactic action, enhanced by technological developments.

Doi: 10.4025/actascilangcult.v41i1.42079

Received on March 20, 2018.

Accepted on February 13, 2019.


Anthony, L. (2004). Design and development of a freeware corpus analysis toolkit for the technical writing classroom. Retrieved from or_the_technical_writing_classroom

Berber Sardinha, T. (2004). Linguistica de corpus. Sao Paulo, SP: Manole.

Berber Sardinha, T. (2011). Como usar a linguistica de corpus no ensino de lingua estrangeira. In V. Viana, & S. E. O. Tagnin (Orgs.), Corpora no ensino de linguas estrangeiras (p. 301-356). Sao Paulo, SP: HUB Editorial.

Biber, D. (2011). Corpus linguistics and the study of literature: back to the future? Scientific Study of Literature, 1(10), 15-23.

Buescu, H. C., Maia, L. C., Silva, M. G., & Rocha, M. R. (2014). Programa e metas curriculares de portugues ensino secundario. Lisboa, PT: Ministerio da Educacao e Ciencia.

Carvalho, D., Marques, M. E. R., & Silva, M. F. (1999). Discurso: praticas lexicometricas. In M. A. Mota, & P. Marrafa (Orgs.), Linguistica computacional: investigacao fundamental e aplicacoes (p. 255-262). Lisboa, PT: Edicoes Colibri / Associacao Portuguesa de Linguistica.

Dicionario Infopedia da lingua portuguesa da Porto editora[R]. (2017). Porto, PT: Porto Editora. Retrieved on July 10, 2017 from

Fadanelli, S. B., & Monzon, A. J. (2017). Generos textuais datasheet e artigo cientifico em aulas de ESP: levantamentos lexico-estatisticos para fins educacionais. Dominios de Lingu@gem, 11(2), 351-378.

Galisson, R., & Coste, D. (1983). Dicionario de didactica das linguas. Coimbra, PT: Livraria Almedina.

Genouvrier, E., & Peytard, J. (1974). Linguistica e ensino do portugues. Coimbra, PT: Livraria Almedina.

McCullough, J. L. (2001). Los usos de los corpora de textos en la ensenanza de lenguas. In T. M. Parera (Ed.), Nuevas tecnologias para el autoaprendizaje y la didactica de lenguas (p. 125-140). Lleida, ES: Milenio.

Mota, C., & Silberztein, M. (2007). Em busca da maxima precisao sem almanaques: O Stencil/Nooj no HAREM. In D. Santos, & N. Cardoso (Eds.), Reconhecimento de entidades mencionadas em portugues: documentacao e actas do HAREM, a primeira avaliacao conjunta na area (p. 191-208). Retrieved from

Pessoa, F. (2007). The collected poems of Alberto Caeiro (C. Daniels, Trans.). Exeter, UK: Shearsman Books. Retrieved from caeiro-SAMPLE.pdf.

Pessoa, F. (2013). Poemas completos de Alberto Caeiro. Retrieved from https://www.luso-

Silberztein, M. (2015). A linguistic development environment. Retrieved from

Carlos Assuncao [1] * and Carla Araujo [2]

[1] Universidade de Tras-os-Montes e Alto Douro, Quinta de Prados, 5001-801, Vila Real, Portugal. [2] Instituto Politecnico de Braganca, Braganca, Portugal.

* Author for correspondence. E-mail:

(1) The keeper of flocks, using Chris Daniels's translation (Pessoa, 2007). From here onwards, this translated title will be adopted and used in this study.

(2) "equipado com um concordanciador geradores de listas de frequencia de palavras e a ferramenta utilizada para a extracao, a funcao Keyword (oferece na tela a lista da candidatas a palavras-chave de um corpus, comparando o corpus de estudo com um outro corpus no minimo cinco vezes maior - chamado de corpus de referencia); o corpus de referencia foi composto de textos extraidos do corpus COCA (Contemporary Corpus of American English) e do BNC (British National Corpus), pesquisando a ocorrencia das 30 palavras mais frequentes na wordlist do Corpus COCA. O corpus de referencia possui aproximadamente 211 mil tokens, mais do que cinco vezes o tamanho do corpus de estudo (Sardinha, 2000). Os textos tem origens de diversos dominios, academico, noticiarios, literatura, internet, e passaram pela mesma limpeza de pontuacao, etc. que o Corpus de pesquisa".

(3) "reorganizacoes formais do vocabulario (conjunto de formas atualizadas no discurso, atestadas num texto ou num corpus de textos). O estudo lexicometrico impoe o levantamento exaustivo de TODAS as ocorrencias, de TODAS as formas do corpus a estudar".

(4) "palavra-tema e uma palavra caracterizada por uma frequencia muito elevada e que, numa ordenacao por frequencia decrescente do vocabulario de um autor, pertence, por exemplo, aos primeiros 50 lugares".

(5) "estilo do autor como desvio a partir de uma norma [...]. Tanto as palavras-chave como os "hapaxes legomena", isto e os termos exclusivos", sao analisados no sentido de caracterizar areas tematico semanticas tipicas".

(6) "A palavra-chave e uma palavra plena (nao gramatical), de grande frequencia numa obra (ou em toda a obra) de um autor; esta frequencia apresenta a caracteristica - em relacao a palavra-tema - de estar muito longe da frequencia da mesma palavra num corpus de obras do mesmo genero. Por outras palavras, a palavra-chave possui a particularidade de ser anormalmente frequente numa obra ou num autor".

(7) "constituem conjuntos de termos funcionalmente possiveis no interior de uma determinada situacao tematica e cuja organizacao interna depende de um certo numero de parametros emprestados a atividade psicossocial. Ex: o campo tematico da "casa" compreenderia o que diz respeito ao "edificio" (hall, escada, elevador, degrau, etc.), a "construcao" (materiais, etc.), ao "lugar de habitacao" (funcao, decoracao, etc.), [...] e a organizacao destes termos dependeria das atividades do individuo que se encontrasse nessa situacao tematica".

(8) And / and, that / That, the [feminine and masculine definite articles], of.

(9) monte In Dicionario Infopedia da lingua portuguesa com acordo ortografico [online].

(10) eyes, colour, look and see.

(11) God and soul.

(12) Flowers; sun; trees; Nature; to be; moonlight; sky; river; rivers; men; water; day; night; stones; mountains; natural; flower; land; wind; things; thing; people; village; times; passes/goes by; house; colour; life; window and road.

(13) [I] feel; [I] know; to know; [s/he] knows; to feel.

(14) to think, [I] think.

(15) no / No; never / Never; nothing.

(16) The national curriculum secondary programme of study for the Portuguese language.

(17) "preconizando [...] uma pedagogia global da lingua [...]"

(18) Dicionario Infopedia da lingua portuguesa da Porto editora[R] (Online version).

(19) arvore [tree] in Dicionario Infopedia da lingua portuguesa da Porto editora[R] [online].

(20) Chris Daniels's translation (Pessoa, 2007, p. 18).

(21) "[...]pesquisadores e nao meros "receptores" da lingua"

Caption: Figure 1. NooJ, a computational resource of lexical analysis (Silberztein, 2015).

Caption: Figure 2. Concordances of the theme word 'montes' (Pessoa, 2007).

Caption: Figure 3. Ambiguous words (Pessoa, 2007).

Caption: Figure 4. Unambiguous words (Pessoa, 2007).

Caption: Figure 5. Annotations of the first verse of the poems from O guardador de rebanhos (Pessoa, 2007).
Table 1. General features of the 'corpus'.

General features of the 'corpus'--The keeper of flocks

Text units                    1074
No. of characters             39137 (29357 letters; 6386 blank
                                spaces; 1394 other delimiters)
* tokens                      6603
* word forms                  7409
* delimiters                  1394
Annotations                   27473
Ambiguity                     1139 different types of ambiguity
Non-ambiguous linguistic      501

Source: Silberztein (2015).

Table 2. List of theme words.

Frequency                   Token

146                      e/E [is/Is]
140                    nao/Nao [no/No]
74            eu/Eu [I (lower-case)/I (upper-case)]
43                     flores [flowers]
40                     coisas [things]
31                        sol [sun]
30                        sao [are]
27                     arvores [trees]
24                ha/Ha [there is/There is]
24                    Natureza [Nature]
24                    Pensar [to think]
24                       ser [to be]
22                      coisa [thing]
20                        Deus [God]
20                      nada [nothing]
19                        ver [see]
18                       sei [I know]
17                        tem [has]
16                        esta [is]
16                       olhos [eyes]
16                     luar [moonlight]
16                         sou [am]
15                        ceu [sky]
15                     Tem [they have]
15                        toda [all]
15                        ceu [sky]
14                       rio [river]
14                      rios [rivers]
13                       todos [all]
13                     Saber [to know]
13                      sinto [I feel]
13                      vezes [times]
13                      ter [to have]
13                       homens [men]
13                nunca/Nunca [never/Never]
12                     pedras [stones]
12                       alma [soul]
12                      tenho [I have]
12                      gente [people]
12               passa [s/he passes/goes by]
12                        dia [day]
12                      noite [night]
12                        era [was]
11                    natural [natural]
11                       sabe [knows]
11                       tinha [had]
11               Montes [s/he rides / hills]
11                     penso [I think]
10                       casa [house]
10                      flor [flower]
10                       terra [land]
10                       olhar [look]
10                       vento [wind]
10                     aldeia [village]
10                       cor [colour]
9                      sentir [to feel]
9                        agua [water]
9                       estrada [road]
9                        vida [life]
9                      janela [window]

Source: The keeper of flocks (Pessoa, 2007).
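
A frequency list such as Table 2 can be approximated outside NooJ. The sketch below is only an illustration in Python, not the procedure NooJ applies: the function name `frequency_list`, the `stopwords` parameter and the toy input sentence are assumptions made for this example. It merges upper- and lower-case variants, as the table does for pairs such as e/E and nao/Nao.

```python
import re
from collections import Counter


def frequency_list(text, stopwords=frozenset(), min_count=1):
    """Return (word, count) pairs sorted by descending count.

    Case variants are merged by lower-casing every token.
    `stopwords` is a hypothetical filter for function words.
    """
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text)
    counts = Counter(token.lower() for token in tokens)
    return [
        (word, n)
        for word, n in counts.most_common()
        if n >= min_count and word not in stopwords
    ]


# Toy input built for this example; it is not a quotation from the poems.
sample = "O rio passa. Nao sei nada. O sol e o rio."
top = frequency_list(sample, stopwords={"o"}, min_count=2)
print(top)  # [('rio', 2)]
```

Lower-casing every token is a simplification: Table 2 keeps some capitalised forms (for example Tem and tem) as separate entries, so a closer reproduction would need a rule for when case is significant.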

Table 3. Thematic fields in The keeper of flocks.

          Theme words                      Thematic fields

olhos; cor; olhar; ver (6).          Thematic field of sight (O
                                           ver) [The look]

Deus; alma (7).                           Thematic field of
                                     transcendence (A metafisica)
                                            [Metaphysics]

flores; sol; arvores;                Thematic field of universe
Natureza; ser; luar; ceu;            (O real objetivo) [The real
rio; rios; homens; agua;                     objective]
dia; noite; pedras; montes;
natural; flor; terra; vento;
coisas; coisa; gente;
aldeia; vezes; passa; casa;
cor; vida; janela; estrada

sinto; sei; saber; sabe;            Thematic field of the senses
sentir (9).                               (O sensacionismo)

pensar; penso (10).                 Thematic field of reflection
                                    (O pensamento) [The thought]

nao/Nao; nunca/Nunca; nada          Thematic field of refusal (A
(11).                                  rejeicao do pensamento/
                                      misticismo) [Rejection of
                                         thought/mysticism]

Source: The keeper of flocks (Pessoa, 2007).
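
The grouping in Table 3 can be combined with the counts in Table 2 to estimate the weight of each thematic field. The following Python sketch is an illustration under stated assumptions, not part of the authors' NooJ workflow: the names `FREQ`, `FIELDS` and `field_totals` are hypothetical, the English field labels are shorthand, and only four of the fields are included. The frequencies are copied from Table 2, with keys lower-cased as in Table 3.

```python
# Frequencies copied from Table 2 (keys lower-cased as in Table 3).
FREQ = {
    "olhos": 16, "cor": 10, "olhar": 10, "ver": 19,
    "deus": 20, "alma": 12,
    "pensar": 24, "penso": 11,
    "nao": 140, "nunca": 13, "nada": 20,
}

# Theme words per thematic field, following Table 3.
FIELDS = {
    "sight": ["olhos", "cor", "olhar", "ver"],
    "transcendence": ["deus", "alma"],
    "reflection": ["pensar", "penso"],
    "refusal": ["nao", "nunca", "nada"],
}


def field_totals(fields, freq):
    """Sum the frequency of every theme word in each thematic field."""
    return {
        name: sum(freq.get(word, 0) for word in words)
        for name, words in fields.items()
    }


totals = field_totals(FIELDS, FREQ)
print(totals)
# {'sight': 55, 'transcendence': 32, 'reflection': 35, 'refusal': 173}
```

On these figures the field of refusal outweighs the others, largely because of nao/Nao (140 occurrences).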

Table 4. Example of activity--definition /concordance of
the theme word.

Theme word arvores [trees]

Entry (19)                                     Concordance

1. BOTANICA planta lenhosa             nas ruas, E a maneira como
que pode atingir grandes                dava pelas coisas, E o de
alturas e cujo tronco se              quem olha para arvores, E de
ramifica na parte superior              quem desce os olhos pela
                                      estrada por onde vai andando
                                                E anda a

[BOTANICS Woody plant that            [the streets, And the way he
can reach great heights and           had of taking things in, Was
whose trunk branches in the              like someone looking at
upper part]                             trees, Or lowering their
                                       eyes to the road where they
                                        go walking Or taking in]

2. Representacao de alguma
coisa em forma de um esquema
com tronco e ramificacoes

[Representation of something
in the form of a schema with
trunk and branches]

3. MECANICA peca principal
rotativa de uma maquina

[MECHANICS main rotating
part of a machine]

4. MECANICA eixo, veio, fuso

[MECHANICS shaft, axle,
spindle]

5. NAUTICA mastro completo
do navio

[SAILING the ship's full-mast]

6. 'figurado' pessoa muito
alta

['figurative meaning' a very
tall person]

7. LINGUISTICA representacao
grafica da estrutura de uma
frase ou oracao, salientando
as relacoes de hierarquia e
derivacao por meio de linhas
descendentes

[LINGUISTICS graphical
representation of the
structure of a sentence or
clause, emphasising
relationships of hierarchy
and derivation by means of
descending lines]

arvore de Natal

[Christmas tree]

arvore genealogica

[Family tree]

Source: Dicionario Infopedia da lingua portuguesa da Porto
Editora® (2017)
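
The concordance column of Table 4 shows the theme word with the words on either side of it. A keyword-in-context search of this kind can be sketched in a few lines of Python. This is an illustration only, not NooJ's concordancer: the function name `concordance` and the `width` parameter are assumptions, and the input is the fragment shown in the table.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, match, right context) for each hit.

    `width` is the number of words kept on each side of the keyword.
    Matching ignores case.
    """
    tokens = re.findall(r"[^\W\d_]+", text)
    key = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == key:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Fragment taken from the concordance column of Table 4.
sample = "E o de quem olha para arvores, E de quem desce os olhos pela estrada"
hits = concordance(sample, "arvores", width=3)
print(hits)  # [('quem olha para', 'arvores', 'E de quem')]
```

Setting the context side by side with the dictionary senses makes it easier to decide which entry applies; here the botanical sense (entry 1) fits.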

Table 5. Example of activity--thematic fields in poems from
The keeper of flocks.

Theme words                                  Thematic fields

Example:                           Thematic field of Visao [Sight]
olhos; cor; olhar;
ver [eyes; colour; look; see]      1. Thematic field of--
                                   2. Thematic field of--
                                   3. Thematic field of--
                                   4. Thematic field of--
                                   5. Thematic field of--
                                   6. Thematic field of--

Source: The keeper of flocks (Pessoa, 2007).
COPYRIGHT 2019 Universidade Estadual de Maringa

Article Details
Author: Assuncao, Carlos; Araujo, Carla
Publication: Acta Scientiarum. Language and Culture (UEM)
Article Type: Report
Date: Jan 1, 2019
