Infrastructure to Empowerment:
An OSWA+GIS Model for Documenting Local Languages

    
    Moses Ekpenyong, Nnamso Umoh (University of Uyo)
Mfon Udoinyang, Golden Ibiang, Eno-Abasi Urua (University of Uyo)
Dafydd Gibbon (Bielefeld University)

     1.0 Introduction

Language is an important feature of human society. Its main functions include the informative, which describes conditions; the expressive, which conveys the feelings of language users; and the directive, which steers behaviour in a specific way. Language is the primary means of communication between members of a speech community and therefore permeates every facet of human life. Different languages are spoken by different communities in different parts of the world, and maps of where these languages are spoken can be produced or reproduced digitally with Geographic Information System (GIS) technology. GIS technology is useful for scientific investigations, resource management, access management, development planning, cartography and route planning. With GIS, emergency response times in the event of a natural disaster can be computed, and wetlands that require protection from pollution can be located.

Various techniques are used in GIS. Some of them, as outlined in [1], are:

(1) Relating information from different sources: Information about rainfall in a state could be related to aerial photographs of a country. This might help to determine which wetlands dry up at certain times of the year. A GIS, which gathers information from many different sources in different forms, can be used for the analysis.
(2) Data representation: GIS data represent real-world objects (roads, land use, elevation) as digital data. Real-world objects fall into two kinds: discrete objects (a house) and continuous fields (rainfall amount or elevation).
(3) Data capture: This involves entering data/information into the system and consumes much of the time of GIS practitioners. Various data capture methods exist: scanning, survey, remote sensing, photogrammetry, etc.
(4) Data manipulation: GIS data can be converted into different formats. For instance, a GIS may be used to convert a satellite image map into a vector structure by generating lines around all cells with the same classification, while determining the spatial relationships between cells, such as adjacency or inclusion.
(5) Projections, coordinate systems and registration: A property ownership map and a soils map might show different data at different scales. Map information in a GIS must be manipulated so that it registers, or fits, with information gathered from other maps.
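
The raster-to-vector conversion described in item (4) can be illustrated with a minimal Python sketch. It is not taken from [1] or from any GIS package: the grid `RASTER`, its class codes and the function names `label_regions` and `region_boundaries` are assumptions made for illustration. The sketch groups neighbouring cells of the same class into regions, emits the cell edges that separate regions (the "lines around all cells with the same classification") and records which regions are adjacent.

```python
from collections import deque

# Hypothetical classified raster (illustrative only): W = wetland, F = forest, U = urban.
RASTER = [
    ["W", "W", "F", "F", "F"],
    ["W", "W", "F", "U", "U"],
    ["W", "F", "F", "U", "U"],
    ["F", "F", "F", "F", "U"],
]


def label_regions(raster):
    """Group 4-connected cells of the same class into numbered regions."""
    rows, cols = len(raster), len(raster[0])
    labels = [[None] * cols for _ in range(rows)]
    classes = []  # classes[i] is the class code of region i
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            region = len(classes)
            classes.append(raster[r][c])
            labels[r][c] = region
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over same-class neighbours
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and labels[ny][nx] is None and raster[ny][nx] == raster[y][x]:
                        labels[ny][nx] = region
                        queue.append((ny, nx))
    return labels, classes


def region_boundaries(labels):
    """Return the cell edges separating different regions and the adjacent region pairs."""
    rows, cols = len(labels), len(labels[0])
    edges = []         # each edge is ((x0, y0), (x1, y1)) in grid-corner coordinates
    adjacency = set()  # pairs (a, b) with a < b of regions sharing at least one edge
    for r in range(rows):
        for c in range(cols):
            here = labels[r][c]
            if c + 1 < cols and labels[r][c + 1] != here:  # vertical edge to the right
                edges.append(((c + 1, r), (c + 1, r + 1)))
                adjacency.add(tuple(sorted((here, labels[r][c + 1]))))
            if r + 1 < rows and labels[r + 1][c] != here:  # horizontal edge below
                edges.append(((c, r + 1), (c + 1, r + 1)))
                adjacency.add(tuple(sorted((here, labels[r + 1][c]))))
    return edges, adjacency


if __name__ == "__main__":
    labels, classes = label_regions(RASTER)
    edges, adjacency = region_boundaries(labels)
    for a, b in sorted(adjacency):
        print(f"region {a} ({classes[a]}) is adjacent to region {b} ({classes[b]})")
```

A production GIS would additionally chain these edge segments into closed polygons and attach map coordinates; the sketch stops at the adjacency relation named in the text.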

According to [1], the GIS market has driven down costs and brought continual improvements in the hardware and software components of GIS. These developments have, in turn, spread GIS technology to all facets of life, including science, government, business and industry. GIS applications can be developed for real estate, public health, crime mapping, national defense, sustainable development, natural resources, transportation and logistics.


     2.0 Language Archiving/Preservation

Language archiving has become prominent in today's society and is undergoing a radical transformation. In fact, the need to archive/preserve languages has shifted the focus from descriptive linguistics to documentary linguistics. [2] defines the aim of language documentation as providing a comprehensive record of the linguistic practices characteristic of a given speech community.

Minority languages are disappearing at an alarming rate. The Foundation for Endangered Languages in England estimates that more than half of the world's approximately 6,000 languages are not being transmitted effectively to the next generation [3]. Hence a disaster, for instance, could wipe out such languages; if no proper records exist, it will be as though the languages had never existed at all. This would represent a great loss of wisdom/knowledge and a catastrophic loss of information. The concern is particularly relevant to preliterate societies, as stated in [4]:

In as much as the primary motivation for LD was the attempt to contain language endangerment in order to preserve the fast disappearing languages and the concomitant cultural artifacts and histories, LD has also been invaluable and beneficial in relation to languages that are unwritten and only orally transmitted. It is vital to make this distinction between an endangered language and a language which is unwritten and whose transmission is oral, primarily because a language may be used in all the cultural activities of a community, transmitted from the adult generation to the next generation, and not be necessarily 'endangered' in the regular sense of the word.

This paper seeks to combat this imminent loss of linguistic diversity by empowering the native speakers of undocumented languages to document their own languages using available infrastructure. [5] proposes five benchmark principles required for a Workable Efficient Language Documentation (WELD). In addition to these principles, documented languages must also be dependable, that is, have content that is rich and reliable enough to fall back on. In this era of Information Technology, there has been a shift from traditional methods (pen and paper) to contemporary methods (multimedia equipment). This infrastructure should be provided to local communities and made affordable, so as to empower them in the language documentation process.

The University of Hawaii Student Directed Language Documentation Project (LDP), for instance, has produced a wide range of digital materials on minority languages in the form of Web pages, making audio and visual records of twenty-three ethnolinguistic groups currently represented on its campus accessible on the Internet. Though these Web pages provide basic information on languages from many places in the world (from Tiwa in New Mexico to Kalmyk in Russia, and including many Austronesian languages spoken in Southeast Asia and the Pacific), they do not provide the local content needed to withstand the "endangerment threat". Providing that local content requires, most importantly, extensive and effective fieldwork/research in these communities, plus relevant infrastructure for documentation and archiving that makes retrieval easy.

[6] encourages the creation of high-quality decentralized repositories for documenting West African languages. It recommends that these repositories be made accessible via an Internet metadata portal such as the Open Language Archives Community (OLAC) and discusses criteria with reference to documentation experiences in three ongoing projects: (1) designing an encyclopedia, (2) documenting an endangered language and (3) creating a speech synthesizer. The paper pays particular attention to the provision of metadata, a formal variety of library-catalogue and maintenance information about the data.
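
As a rough illustration of the catalogue-style metadata discussed in [6], the following Python sketch builds a flat record from Dublin Core element names, on which OLAC metadata is based. It is not the OLAC schema and not the record format used in [6]: the wrapper element `record`, the function `build_record` and all field values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical field values for one archived recording (illustrative only).
EXAMPLE_FIELDS = [
    ("title", "Market conversation (hypothetical recording)"),
    ("language", "ibb"),  # assumed to be the ISO 639-3 code for Ibibio
    ("type", "Sound"),
    ("format", "audio/x-wav"),
    ("rights", "Access terms agreed with the speech community"),
]


def build_record(fields):
    """Build a flat metadata record with one Dublin Core element per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element, not an OLAC name
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


record = build_record(EXAMPLE_FIELDS)
xml_text = ET.tostring(record, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Keeping the record flat and limited to well-known element names is a deliberate choice here: it keeps the catalogue information readable by generic harvesters, while repository-specific maintenance information can be stored separately.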

[7] addresses seven dimensions of portability for digital language documentation and description, identifying problems, establishing core values and proposing best practices. It surveys existing tools and technologies and discusses the problems that arise with resources created using them. The portability problems are discussed under content, format, discovery, access, citation, preservation and rights.


     2.1 Information Access

When information access becomes an integral part of the development package for rural areas, the root of many superstitions and beliefs, which is a lack of information rather than a lack of scientific temper, will be addressed. Our hybrid system, as stated in the abstract, will not only document culture and heritage but also broaden access to information that is vital to the people: an unlimited-domain text-to-speech (TTS) system will synthesize important information/news in local languages and make it available to local people in a language they are familiar with. Given the initial achievements of a parallel project in the Local Language Speech Technology Initiative (LLSTI) consortium, we hope that the TTS resource will be incorporated into this project when fully developed. The LLSTI collaboration aims at developing TTS synthesizers for local languages using the Festival speech synthesis system [8], [9], [10]. At the moment the consortium has uploaded to its website (http://llsti.org) synthesizers for the following languages: Hindi (an Indian language), IsiZulu (a South African language), Ibibio (a West African language of Nigeria) and KiSwahili (an East African language of Kenya). The consortium members are working to improve the synthesis quality of these prototypes. Incorporating TTS will greatly empower pre-literate societies as well as people who are physically and/or mentally challenged.

In this paper, we focus on providing these resources for Ibibio, Nkari, Anaang and O`ro`, with the long-term goal of developing something equivalent to the Simple Computer-Handheld Community Digital Assistant (CDA) enterprise of the "Bangalore Seven" used in India (http://www.simputer.org/), or even of adapting most properties of that design to cell phones, given sufficient funding. The design will no doubt serve as a model for other languages and will empower language communities to document their languages properly.


     3.0 Components and Methods

     3.1 GIS Components

A GIS comprises five elements: computer hardware, computer software, data, liveware (that is, people) and methods [11], [12].


     3.2 Infrastructure/Manpower Development

Given the global threat of language endangerment and the large number of undocumented languages, a tripartite research cooperation involving the University of Uyo (Nigeria), the Universität Bielefeld (Germany) and the Université de Cocody, Abidjan (Côte d'Ivoire) was established in 2002. The collaboration has resulted in a pilot MA programme in Computational Language Documentation (CLD) at the University of Uyo, Nigeria, starting in the 2005/2006 academic session. The MA programme, the first of its kind in Nigeria, is also expected to commence in the other two universities soon. This exchange of knowledge will not only foster effective research but also prepare manpower in advance for the great task of documenting as many local languages as possible. Through the collaboration, students will receive sound training in documentation technology from experts at these universities and will concentrate on documenting local languages, laying a firm foundation for models and strategies that can be applied to the documentation of other local languages.


     4.0 Documentation Model – What to Document?

We present an architectural model for our design, developed after fieldwork/research in some local communities in Nigeria. The model contains the local content needed to inform a visitor, for instance, about what to expect when visiting such communities; more importantly, it is capable of preserving the languages and cultures should the threat of endangerment materialize:

Fig. 1. OSWA+GIS Architecture

The needed local content will be provided in the form of visuals/pictures and mostly in the local languages, with translations into English, the official language of Nigeria.


     4.1 Pilot Design

In this section we present sample GUIs of an ongoing Web archive of four local languages (Ibibio, Nkari (endangered), Anaang and O`ro`) in Akwa Ibom State of Nigeria. We intend to restructure/modify the archive until it is as descriptive as possible and rich in local content. The project home page displays a map of Nigeria from which the needed resources (Archive/GIS) can be accessed by clicking on the state of interest. Currently, we have activated Akwa Ibom State, which is our initial implementation domain.


Fig. 2. OSWA+GIS Home Page

On clicking a state, the map of that state is displayed together with the languages spoken there. The user can then select a language, which leads to the language community interface, as shown in Fig. 3 below:


Fig. 3. Nkari Local Community – Interface

From the language community interface the user can access a wealth of information about that community. At the moment we focus on digital information. Figs. 4 and 5 display information about the development and the culture/tradition of two local communities in Akwa Ibom State.


Fig. 4. Development in Nkari – Interface


Fig. 5. Culture/Tradition (Market/Fishing) of O`ro` – Interface

This archive will be published on the Internet and made accessible to all once sufficient fieldwork/research has been carried out in these communities to ensure that all the model elements are properly covered. Currently we are interested in the documentary aspect of the languages. The descriptive aspect will be added later.
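
The navigation described in this section (country map, state, language, community interface) can be sketched as a small nested data structure. This is only an illustrative Python sketch, not the implementation of the archive: the names `ARCHIVE`, `languages_in` and `community_sections` are hypothetical, and only the sections visible in Figs. 4 and 5 are filled in.

```python
# Hypothetical in-memory model of the archive: state -> languages -> community sections.
ARCHIVE = {
    "Akwa Ibom": {
        "active": True,  # the only state activated in the pilot
        "languages": {
            "Ibibio": {"sections": []},               # sections not shown in the paper
            "Nkari": {"sections": ["Development"]},   # cf. Fig. 4
            "Anaang": {"sections": []},               # sections not shown in the paper
            "O`ro`": {"sections": ["Culture/Tradition"]},  # cf. Fig. 5
        },
    },
}


def languages_in(state):
    """Return the languages listed for a state, or an empty list if it is not activated."""
    entry = ARCHIVE.get(state)
    if entry is None or not entry["active"]:
        return []
    return list(entry["languages"])


def community_sections(state, language):
    """Return the sections available on a language community interface."""
    if language not in languages_in(state):
        raise KeyError(f"{language!r} is not available for state {state!r}")
    return ARCHIVE[state]["languages"][language]["sections"]


if __name__ == "__main__":
    for language in languages_in("Akwa Ibom"):
        print(language, community_sections("Akwa Ibom", language))
```

Clicking a state on the home page would correspond to `languages_in`, and selecting a language to `community_sections`; activating a further state then amounts to adding one more top-level entry.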


     5.0 Challenges and Implementation Strategies

Challenges/experiences in repository creation have been discussed in [6]. Below we present strategies that will ensure effective implementation of this initiative:

1. Researchers and institutions working on language documentation in Africa need proper and adequate infrastructure support in the form of computers, recording equipment (audio and video) and appropriate software to carry out their work efficiently and effectively.
2. Regular and steady power supply is a prerequisite for data capture, data processing and data retrieval. This critical ingredient is absent in Nigeria, for instance. It is important to have an alternative source of power rather than rely on the national grid. Regular and steady power supply will make it possible for both researchers and the communities to access information on the languages easily and quickly.
3. As mentioned in the abstract, local communities have always received poor developmental attention. There is therefore a need to reverse the approach to development from the usual Top-Down-Approach (TDA) to a Bottom-Top-Approach (BTA). Because it ensures community participation and involvement, BTA will guarantee rapid development and maintenance of the infrastructure.
4. Local communities should be empowered by providing them with affordable infrastructure and, more importantly, by training them to use and manage such facilities. We here stress the need for fairness (returning the by-products to the local communities) in language documentation.
5. At least one IT center should be established in each local community. This will enable broader access to information and frequent updates where necessary, and will transform the way people experience public services by making these services more accessible, more convenient, more responsive and more cost-effective [13].
6. There is an utmost need to create awareness and to sensitize local communities, industries and government alike to appreciate this unique heritage. This awareness will orient the native speakers and make them willing to provide data/information about their language.
7. A pilot implementation of an Ibibio TTS synthesizer has been made, with fair results. Though much work still needs to be done, we are optimistic that when we approach interest groups, telecommunication companies (MTN, VMobile, etc.) and international institutions (UNDP, EU, etc.), we will have the needed drive and funding to empower as many Nigerian local communities as possible to witness a Language Technology (LT) revolution, in which important information/news is communicated to them at their doorsteps in a language they are already familiar with. Our immediate target is the most common communication infrastructure, the mobile phone, which has already reached a handful of the local people.
8. Collaboration between linguists, computer scientists and engineers should be encouraged. This collaboration will not only foster efficient research but will also produce a comprehensive and acceptable solution to language archiving/preservation.


     6.0 Conclusion

Languages are in dire need of documentation, and this calls for an efficient documentation procedure. In addition to being dependable, documented languages must meet the five benchmarks (affordability, efficiency, comprehensibility, fairness and state-of-the-art) proposed by [5] to ensure that our unique resources/heritage are not doomed to extinction.

This paper has initiated a practical model/design for documenting local languages and has focused on providing a workable platform for local communities. The development of manpower/infrastructure is ongoing (Uyo-Bielefeld-Abidjan project), important devices/infrastructure have been explored, and strategies to ensure a successful implementation have been presented.


     References

[1] Geographic Information System. From Wikipedia, the Free Encyclopedia. http://en.wikipedia.org/wiki/GIS
[2] Himmelmann, N. (1998). Documentary and Descriptive Linguistics. Linguistics 36, Berlin: de Gruyter, pp. 161-195.
[3] UH Language Documentation Project. Helping to Save Endangered Languages. http://www.LLL.hawaii.edu/sltcc/
[4] Urua, E. E.; Gibbon, D.; Ahoua, F.; Ekpenyong, M. and Gbery, E. (2006). The WALA Initiative: An Overview. Paper presented at the UNESCO/ACALAN meeting, Bamako, Mali, March 23-25, 2006.
[5] Gibbon, D. (2002). Workable Efficient Language Documentation: A Report and a Vision. ELSnews, 11.3, Autumn: 3-5.
[6] Gibbon, D.; Ahoua, F.; Gbéry, E.; Urua, E. and Ekpenyong, M. (2004). WALA: A Multilingual Resource Repository for West African Languages. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2004, Lisbon. Vol. II: 579-582.
[7] Bird, S. and Simons, G. (2003). Seven Dimensions of Portability for Language Documentation and Description. Language 79, 557-582.
[8] Black, A. and Taylor, P. (1997). Festival Speech Synthesis System: System Documentation (1.1.1). Human Communication Research Centre Technical Report HCRC/TR-83.
[9] Black, A.; Taylor, P. and Caley, R. (2002). Festival Speech Synthesis System: System Documentation (1.4). University of Edinburgh, UK.
[10] Gibbon, D.; Urua, E-A. and Ekpenyong, M. (2006). Problems and Solutions in African Tone Language Text-To-Speech. In ISCA Tutorial and Research Workshop (ITRW) on Multilingual Speech and Language Processing, Stellenbosch, South Africa.
[11] Cho, G. (2001). A Self Teaching Student's Manual for GIS. University of Canberra, Australia.
[12] Leon, A. and Leon, M. (2004). Fundamentals of Information Technology. Leon Vikas Publishing House, New Delhi.
[13] Ekpenyong, M.; Urua, E. and Gibbon, D. (2004). Local e-Government Text To Speech: A Speech Technology Initiative. Journal of Computer Science and its Applications, Vol. 10, No. 2: 125-131.