
Sunday, 28 June 2009

NSM Semantic Mark-up for Dongba Pictographs

This short draft is taken from CL.A.U.D.I.A.'s intro page and focuses on a particular approach to the semantic annotation of Dongba manuscripts, using NSM primes as mark-up tags.

The theory known as “Natural Semantic Metalanguage” (NSM) was developed by A. Wierzbicka and, later, by C. Goddard and other scholars.

The “basic idea is that we should try to describe complex meanings in terms of simpler ones. For example, to state the meaning of a semantically complex word we should try to give a paraphrase composed of words which are simpler and easier to understand than the original. This method of semantic description is called reductive paraphrase. It prevents us from getting tangled up in circular and obscure definitions, problems which bedevil conventional dictionaries and other approaches to linguistic semantics. No technical terms, neologisms, logical symbols, or abbreviations are allowed in a reductive paraphrase – only plain words from ordinary natural language.” (NSM Semantics in Brief, accessed 28 June 2009)

Reductive paraphrase thus serves as a method of meaning analysis that leads to every language’s “semantic core”: “a language-like structure, with a lexicon of indefinable expressions ("semantic primes") and a grammar [...] governing how the lexical elements can be combined”. In other words, it is a “mini-language” with the same expressive power as a full natural language.

Semantically complex concepts can thus be explained by the reductive paraphrase method. Here is a pair of samples (from Wierzbicka 1996: 251-253; Goddard 1988):

  • Semantic emotion terminology: feeling envious

    • X felt envious =

      • X felt something bad

      • because X thought like this about someone else:

        • something good happened to this person

        • it didn’t happen to me

        • this is bad

        • I want things like this to happen to me

  • Causative verb break, as in Person-X broke Y (e.g. Pete broke the window)

    • X broke Y =

      • X did something to Y

      • because of this, something happened to Y at this time

      • because of this, after this Y wasn’t one thing any more

Anna Wierzbicka has always aimed, through experimentation, to reduce the terms of the explications to the smallest and most versatile set. She thus identified a core of 60 semantic primes (Goddard & Wierzbicka, eds., “Meaning and Universal Grammar”, 2002).

Semantic primes such as I, YOU, SOMEONE, SOMETHING, THIS, HAPPEN, MOVE, etc., which identify simple and intuitively intelligible meanings, are essential for explicating the meanings of numerous other words and grammatical constructions in a non-circular way.

According to Wierzbicka, the English core currently consists of:


  • Relational substantives: KIND, PART

  • Determiners: THIS, THE SAME, OTHER/ELSE

  • Quantifiers: ONE, TWO, SOME, ALL, MUCH/MANY

  • Evaluators: GOOD, BAD

  • Descriptors: BIG, SMALL

  • Mental predicates: THINK, KNOW, WANT, FEEL, SEE, HEAR

  • Speech: SAY, WORDS, TRUE

  • Actions, events, movement, contact: DO, HAPPEN, MOVE, TOUCH

  • Location, existence, possession, specification: BE (SOMEWHERE), THERE IS, HAVE, BE (SOMEONE/SOMETHING)

  • Life and death: LIVE, DIE



  • "Logical" concepts: NOT, MAYBE, CAN, BECAUSE, IF

  • Intensifier, augmentor: VERY, MORE

  • Similarity: LIKE

Such primes could well be implemented as a well-formed XML tree, forming the skeleton of an NSM ontology, as exemplified below:
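What follows is only an illustrative sketch, not the tree used in CL.A.U.D.I.A. It assumes hypothetical element and attribute names (nsm, category, prime, name, lang) and uses Python's standard xml.etree.ElementTree module to build a well-formed tree from the categories listed above.

```python
import xml.etree.ElementTree as ET

# Categories and primes as listed above. The element and attribute names
# used below ("nsm", "category", "prime", "name", "lang") are hypothetical.
NSM_CORE = {
    "Relational substantives": ["KIND", "PART"],
    "Determiners": ["THIS", "THE SAME", "OTHER/ELSE"],
    "Quantifiers": ["ONE", "TWO", "SOME", "ALL", "MUCH/MANY"],
    "Evaluators": ["GOOD", "BAD"],
    "Descriptors": ["BIG", "SMALL"],
    "Mental predicates": ["THINK", "KNOW", "WANT", "FEEL", "SEE", "HEAR"],
    "Speech": ["SAY", "WORDS", "TRUE"],
    "Actions, events, movement, contact": ["DO", "HAPPEN", "MOVE", "TOUCH"],
    "Location, existence, possession, specification": [
        "BE (SOMEWHERE)", "THERE IS", "HAVE", "BE (SOMEONE/SOMETHING)",
    ],
    "Life and death": ["LIVE", "DIE"],
    '"Logical" concepts': ["NOT", "MAYBE", "CAN", "BECAUSE", "IF"],
    "Intensifier, augmentor": ["VERY", "MORE"],
    "Similarity": ["LIKE"],
}


def build_nsm_tree(core):
    """Return an XML element with one <category> per group of primes."""
    root = ET.Element("nsm", {"lang": "en"})
    for category, primes in core.items():
        cat_el = ET.SubElement(root, "category", {"name": category})
        for prime in primes:
            ET.SubElement(cat_el, "prime").text = prime
    return root


tree = build_nsm_tree(NSM_CORE)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Keeping the primes grouped by category means that a prime can later be looked up either by its own name or by the kind of meaning it expresses.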

Implementation of NSM Semantic Mark-up for Dongba pictographs thus consists of:

  • annotating the English translations of the Dongba pictographs stored in CL.A.U.D.I.A. with mark-up drawn from the NSM English semantic core. For instance, pictograph Ea1 will be semantically marked up as
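The mark-up of Ea1 itself is not reproduced here. As a rough illustration of the general idea only, the following sketch tags each word of an English gloss that coincides with a single-word prime from the list above. The element names (pictograph, w), the prime attribute, the identifier "X000" and the gloss are all made up for the example. Multi-word primes and inflected forms are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Single-word primes taken from the English core listed above.
# Multi-word primes (e.g. THE SAME) and inflected forms are not handled.
PRIMES = {
    "KIND", "PART", "THIS", "ONE", "TWO", "SOME", "ALL", "GOOD", "BAD",
    "BIG", "SMALL", "THINK", "KNOW", "WANT", "FEEL", "SEE", "HEAR", "SAY",
    "WORDS", "TRUE", "DO", "HAPPEN", "MOVE", "TOUCH", "HAVE", "LIVE", "DIE",
    "NOT", "MAYBE", "CAN", "BECAUSE", "IF", "VERY", "MORE", "LIKE",
}


def mark_up(pictograph_id, gloss):
    """Wrap every word of the gloss in <w>, flagging words that are primes."""
    entry = ET.Element("pictograph", {"id": pictograph_id})
    for word in gloss.split():
        w = ET.SubElement(entry, "w")
        w.text = word
        key = word.strip(".,;:!?").upper()
        if key in PRIMES:
            w.set("prime", key)
    return entry


# "X000" and the gloss are placeholders, not data from CL.A.U.D.I.A.
entry = mark_up("X000", "two big good things")
print(ET.tostring(entry, encoding="unicode"))
```

Words that are not primes, such as "things" in the placeholder gloss, stay untagged. These are the words that would still need a reductive paraphrase.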

This sample represents a very basic approach to the NSM mark-up of Dongba pictographs that I am working on.

NSM studies have been carried out in a wide range of languages, including English, Russian, French, Spanish, Polish, Italian, Ewe, Malay, Japanese, East Cree, Chinese, Mbula, Yankunytjatjara, Arrernte, and Maori, among others.

Considering the Naxi language and the Dongba pictographic writing system as closely linked but independent phenomena, my aim is to study Naxi and Dongba through the NSM approach, for:

  • identification of Naxi metalanguage semantic core

  • identification of Dongba meta-pictographs semantic core

  • comparison of the Naxi and Dongba cores, to identify and analyse patterns and relationships

Do.M.En.I. - Dongba Manuscripts Encoding Initiative

Do.M.En.I. stands for Dongba Manuscript Encoding Initiative. It is a project parallel to CL.A.U.D.I.A., originating in and shaped by the study of the Dongba pictograph manuscript tradition and of digital humanities.

Do.M.En.I. focuses on implementing an on-line encoded corpus of Dongba manuscripts as a resource, accessible worldwide, for Dongba manuscripts and Naxi pictograph literature, following W3C standards for text and corpus encoding.

Today the World Wide Web should be seen as the widest library available for information retrieval and document browsing, especially in the academic world, where it allows the exchange of material and up-to-date information faster than any paper publication or journal.

Many digital humanities applications are dedicated to browsing texts and retrieving documents, and they cover a wide range of information: from daily newspapers to linguistically annotated corpora and more specialized archives. The abundance of digital texts and their extreme heterogeneity pushed the W3C consortium (among other worldwide concerns) to define basic guidelines for the digital encoding of texts and corpora. Today the recommended guidelines are TEI (Text Encoding Initiative) for encoding digital texts and CES (Corpus Encoding Standard) for encoding corpora.

Do.M.En.I., a project for disseminating knowledge of the Dongba pictograph manuscript tradition and the culture of the Naxi people, follows the W3C standards recommendations because this is the best way to open such cultural and artistic treasures to the academic world and to all of humankind. Do.M.En.I. thus works by retrieving the on-line resources already available and reorganizing them into a CES-conformant corpus, then encoding the individual manuscripts according to the TEI recommendations.

At present there are only two on-line resources for the Naxi pictograph manuscript tradition, both displaying a few dozen digital reproductions of manuscript pages:

They have heterogeneous standards for the digital reproduction of manuscripts; moreover, both corpora stop at the "0 level" of text encoding (Lenci et al., 2005: 57). For Dongba pictograph texts, this means reproductions of manuscript pages as sequences of images, without any deeper structural, meta-structural, linguistic, syntactic or semantic annotation.

The Do.M.En.I. project is focused on building a complete resource (open to updating initiatives) in the form of an annotated corpus of Dongba manuscripts, and this contributes to:

  • safeguarding and conservation of Dongba manuscripts and Naxi pictograph literature

  • dissemination and enjoyment of the Dongba tradition and Naxi pictograph literature

  • convergence and unification of already existing online resources

  • implementation of new resources according to W3C standards

  • study of the Dongba pictograph writing system

  • study of the Naxi language

Do.M.En.I.'s targets can finally be summarized as two main goals:

  • defining and building an encoding schema for Dongba pictograph manuscripts according to the TEI standard

  • implementing a worldwide corpus of encoded Dongba manuscripts according to the CES standard
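To give a rough idea of what moving beyond the "0 level" could look like for a single manuscript, here is a minimal sketch that wraps a sequence of page images in a TEI skeleton. The element names (TEI, teiHeader, fileDesc, titleStmt, publicationStmt, sourceDesc, facsimile, surface, graphic, text, body) come from the TEI P5 guidelines. The function name, the title, the file names and the placeholder sentences are assumptions made for the example. This is not the Do.M.En.I. schema, and it has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def encode_manuscript(title, page_images):
    """Build a bare TEI document: header, page facsimiles, empty text body."""
    root = ET.Element(tei("TEI"))

    # Header: the three parts TEI requires inside <fileDesc>.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished draft."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Digital reproduction of a manuscript."

    # Facsimile: one <surface> per page image (the "0 level" material).
    facsimile = ET.SubElement(root, tei("facsimile"))
    for url in page_images:
        surface = ET.SubElement(facsimile, tei("surface"))
        ET.SubElement(surface, tei("graphic"), {"url": url})

    # Text: where structural and semantic annotation would be added later.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    ET.SubElement(body, tei("p")).text = "Annotation of the pictographs goes here."
    return root


# Hypothetical title and file names, for illustration only.
doc = encode_manuscript("Sample Dongba manuscript", ["page01.jpg", "page02.jpg"])
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the page images in the facsimile section and the annotation in the text section lets existing image-only resources be imported first and annotated later.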

For further documentation, please visit the dedicated Do.M.En.I. and CL.A.U.D.I.A. pages.