Taxonomy of Knowledge Organization Sources/Systems (1)

Draft June 7, 2000 (revised July 31, 2000)

The descriptions given here are intended simply to provide an overview of the types of knowledge organization systems (KOSs) that can serve as sources for organizing digital libraries. The descriptions are based on characteristics such as structure and complexity, the relationships between terms, and historical function. The list is not intended to be comprehensive, nor are the definitions based specifically on standards. The specific types are grouped into three general categories: term lists, which emphasize lists of terms, often with definitions; classifications and categories, which emphasize the creation of subject sets; and relationship lists, which emphasize the connections between terms and concepts. This is a very early draft that was originally produced for a different purpose, so little thought has yet been given to how these types would need to behave differently in a networked environment.

Term Lists

Authority Files

Authority Files are lists of terms that are used to control the variant names for an entity or the domain value for a particular field. Examples include names for countries, individuals, and organizations.  Non-preferred terms may be linked to the preferred versions. This type of KOS generally does not include a deep organization or complex structure.  The presentation may be alphabetical or organized by a shallow classification scheme.  There may be some limited hierarchy applied in order to allow for simple navigation, particularly when the authority file is being accessed manually or is extremely large.  Specific examples of authority files include the Library of Congress Name Authority File and the Getty Geographic Authority File. 
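As a rough illustration of how an authority file controls variant names, the sketch below maps non-preferred forms to a single preferred form. The class `AuthorityFile`, its normalization rule, and the sample entries are hypothetical; they are not drawn from the Library of Congress Name Authority File or any Getty vocabulary.

```python
from typing import Dict, Optional


class AuthorityFile:
    """Hypothetical minimal authority file mapping name variants to a preferred form."""

    def __init__(self) -> None:
        self._preferred: Dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumed normalization: collapse whitespace and ignore case.
        return " ".join(name.split()).casefold()

    def add(self, preferred: str, *variants: str) -> None:
        # The preferred form resolves to itself; each variant resolves to it too.
        for name in (preferred, *variants):
            self._preferred[self._key(name)] = preferred

    def resolve(self, name: str) -> Optional[str]:
        # Return the preferred form, or None if the name is not under authority control.
        return self._preferred.get(self._key(name))


authority = AuthorityFile()
authority.add("United States", "USA", "U.S.A.", "United States of America")
print(authority.resolve("usa"))       # United States
print(authority.resolve("Atlantis"))  # None
```

The lookup is deliberately flat, matching the shallow structure described above; any navigational hierarchy for a very large file would be a separate layer on top of it.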

Glossaries

A glossary is a list of terms, usually with definitions. The terms may be from a specific subject field or those used in a particular work. The terms are defined within that specific environment, and variant meanings are rarely provided. An example is the EPA's Terms of the Environment.

Gazetteers

A gazetteer is a dictionary of place names. Traditional gazetteers have been published as books or as indexes to atlases. Each entry may also be identified by feature type, such as river, city, or school. Geospatially referenced gazetteers provide coordinates for locating the place on the earth’s surface. An example is the Geographic Names Information System (GNIS) <http://www-nmd.usgs.gov/www/gnis/>. Note that the term “gazetteer” has several other meanings, including an announcement publication such as a patent or legal gazetteer. These gazetteers are often organized using classification schemes or subject categories.
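To make the idea of a geospatially referenced gazetteer concrete, here is a minimal sketch of entries carrying a feature type and coordinates, with a "what is near this point" query. The record layout, the function names `haversine_km` and `near`, and the sample places are all hypothetical; the distance uses the standard haversine formula on a spherical Earth, which is an approximation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


@dataclass(frozen=True)
class GazetteerEntry:
    # Hypothetical record layout: place name, feature type, decimal-degree coordinates.
    name: str
    feature_type: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points given in decimal degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument of asin above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def near(entries: List[GazetteerEntry], lat: float, lon: float,
         radius_km: float, feature_type: Optional[str] = None) -> List[GazetteerEntry]:
    # Entries within radius_km of (lat, lon), optionally restricted to one feature type.
    return [
        e for e in entries
        if (feature_type is None or e.feature_type == feature_type)
        and haversine_km(lat, lon, e.lat, e.lon) <= radius_km
    ]


# Hypothetical sample data: the names and coordinates are invented for illustration.
gazetteer = [
    GazetteerEntry("Alder Creek", "river", 0.0, 0.5),
    GazetteerEntry("Birchton", "city", 0.0, 1.0),
    GazetteerEntry("Cedar School", "school", 10.0, 10.0),
]
print([e.name for e in near(gazetteer, 0.0, 0.0, 120.0)])          # ['Alder Creek', 'Birchton']
print([e.name for e in near(gazetteer, 0.0, 0.0, 120.0, "city")])  # ['Birchton']
```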

Dictionaries

Dictionaries are alphabetical lists of terms and their definitions that provide variant senses for each term, where applicable. They are more general in scope than glossaries. They may also provide information about the origin of a term, its variants (both by spelling and morphology), and its multiple meanings across disciplines. While a dictionary may also provide synonyms and, through its definitions, related terms, there is no explicit hierarchical structure or attempt to group terms by concept.

Classification and Categorization

Subject Headings

Subject headings provide a set of controlled terms to represent the subjects of items in a collection. Subject heading lists can be extensive, covering a broad range of subjects. However, their structure is generally very shallow, with only limited hierarchy. In use, subject headings tend to be pre-coordinated, with rules for how headings can be joined to express more specific concepts. Examples include the Medical Subject Headings (MeSH) and the Library of Congress Subject Headings (LCSH).
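Pre-coordination can be pictured as building one heading string from a main heading plus subdivisions, subject to rules about which combinations are allowed. The sketch below assumes a hypothetical rule table (`ALLOWED_SUBDIVISIONS`) and a hypothetical function `precoordinate`; it uses the double-hyphen separator common in LCSH displays, but the terms and rules are invented and do not reproduce the actual MeSH or LCSH rules.

```python
from typing import Dict, Sequence, Set

# Hypothetical rule table: which subdivisions may follow each main heading.
ALLOWED_SUBDIVISIONS: Dict[str, Set[str]] = {
    "Fishes": {"Anatomy", "Diseases", "Identification"},
    "Water quality": {"Law and legislation", "Measurement"},
}


def precoordinate(main: str, subdivisions: Sequence[str] = ()) -> str:
    """Join a main heading and its subdivisions into one pre-coordinated heading."""
    if main not in ALLOWED_SUBDIVISIONS:
        raise ValueError(f"not a controlled main heading: {main!r}")
    for sub in subdivisions:
        if sub not in ALLOWED_SUBDIVISIONS[main]:
            raise ValueError(f"subdivision {sub!r} is not allowed under {main!r}")
    # Double hyphens separate the parts, following the common LCSH display form.
    return "--".join([main, *subdivisions])


print(precoordinate("Fishes", ["Diseases"]))  # Fishes--Diseases
print(precoordinate("Water quality"))         # Water quality
```

Because the combination is fixed at indexing time, a searcher matches the whole string; this is the practical difference from post-coordinated systems, where terms are combined only at search time.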

Classification Schemes, Taxonomies and Categorization Schemes

These terms are often used interchangeably. Though there may be subtle differences from example to example, in general these types of KOSs provide ways to separate entities into “buckets” or relatively broad topic levels. Some examples provide a hierarchical arrangement of numeric or alphabetic notation to represent broad topics. These types of KOSs may not follow the strict rules for hierarchy required in the ANSI/NISO thesaurus standard (Z39.19) (NISO), and they lack the explicit relationships presented in a thesaurus. Examples of classification schemes include the Library of Congress Classification Schedules (an open, expandable system), the Dewey Decimal Classification (a closed system of 10 numeric sections with decimal extensions), and the Universal Decimal Classification (based on Dewey but extended to include facets). Subject categories are often used to group thesaurus terms into broad topic sets, outside the hierarchical scheme of the thesaurus. Taxonomies are increasingly being used in object-oriented design and knowledge management systems to indicate any grouping of objects based on a particular characteristic. "Taxonomy" may also refer to a scheme that presents biota in a hierarchical arrangement based on some characteristic.
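A hierarchical notation lets the hierarchy be read off the codes themselves. The sketch below derives broader classes from a Dewey-style notation (three digits, optionally followed by a decimal extension) using a hypothetical simplified rule: drop the last decimal digit, otherwise zero out the last non-zero digit. The functions `broader` and `ancestors` are illustrative only; the real Dewey schedules contain many exceptions this rule ignores.

```python
from typing import List, Optional


def broader(notation: str) -> Optional[str]:
    """Hypothetical rule for the parent of a Dewey-style notation; None for a main class.

    Assumes three digits optionally followed by '.' and more digits.
    """
    if "." in notation:
        base, ext = notation.split(".", 1)
        ext = ext[:-1]  # drop the last decimal digit
        return f"{base}.{ext}" if ext else base
    # Zero out the last non-zero digit after the first: 516 -> 510 -> 500.
    digits = list(notation)
    for i in range(len(digits) - 1, 0, -1):
        if digits[i] != "0":
            digits[i] = "0"
            return "".join(digits)
    return None  # e.g. "500": one of the ten main classes 000-900


def ancestors(notation: str) -> List[str]:
    # Walk upward until a main class is reached.
    chain: List[str] = []
    parent = broader(notation)
    while parent is not None:
        chain.append(parent)
        parent = broader(parent)
    return chain


print(ancestors("516.37"))  # ['516.3', '516', '510', '500']
```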

Relationship Groups

Thesauri

These KOSs are based on concepts, and they show relationships between terms. Relationships commonly expressed in a thesaurus include hierarchical, equivalence, and associative (or related) relationships. These are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associative or related term). Associative relationships may be more granular in some schemes. For example, the Unified Medical Language System (UMLS) from the National Library of Medicine has defined over 40 relationships, many of which are associative in nature. Preferred terms for indexing and retrieval are identified. Entry terms (or non-preferred terms) point to the preferred terms that are to be used for each concept.
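The relationships described above can be sketched as a small in-memory structure: BT and NT are stored as reciprocal pairs, RT as a symmetric link, and entry terms as pointers to preferred terms. The class `Thesaurus`, its method names, and the sample terms are hypothetical and not taken from any published thesaurus or from the Z39.19 or ISO standards.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, List, Set


class Thesaurus:
    """Hypothetical in-memory thesaurus using the BT/NT/RT/SY notation."""

    def __init__(self) -> None:
        self.bt: DefaultDict[str, Set[str]] = defaultdict(set)  # broader terms
        self.nt: DefaultDict[str, Set[str]] = defaultdict(set)  # narrower terms
        self.rt: DefaultDict[str, Set[str]] = defaultdict(set)  # related terms
        self.use: Dict[str, str] = {}  # entry (non-preferred) term -> preferred term

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # BT and NT are reciprocal, so both directions are recorded.
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def add_synonym(self, preferred: str, entry_term: str) -> None:
        # SY: the entry term points to the preferred term used for indexing.
        self.use[entry_term] = preferred

    def preferred(self, term: str) -> str:
        return self.use.get(term, term)

    def all_narrower(self, term: str) -> List[str]:
        # Breadth-first walk of NT links, e.g. to expand a search query.
        start = self.preferred(term)
        result: List[str] = []
        seen = {start}
        frontier = deque([start])
        while frontier:
            current = frontier.popleft()
            for child in sorted(self.nt.get(current, set())):
                if child not in seen:
                    seen.add(child)
                    result.append(child)
                    frontier.append(child)
        return result


# Hypothetical sample terms, not taken from any published thesaurus.
th = Thesaurus()
th.add_hierarchy("Fish", "Freshwater fish")
th.add_hierarchy("Fish", "Marine fish")
th.add_hierarchy("Freshwater fish", "Carp")
th.add_related("Fish", "Fisheries")
th.add_synonym("Fish", "Fishes")
print(th.all_narrower("Fishes"))  # ['Freshwater fish', 'Marine fish', 'Carp']
```

Recording both directions of BT/NT when a link is added is one way to keep the reciprocity that thesaurus standards expect; a stricter implementation would also reject cycles in the hierarchy.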

There are standards for the development of monolingual thesauri (NISO, 1998; ISO, 1986) and multilingual thesauri (ISO, 1985). However, these standards define a thesaurus fairly narrowly: standard relationships are assumed, preferred terms must be identified, and there are specific rules for creating the relationships between terms. The definition of a thesaurus in these standards is often at variance with schemes that are actually called thesauri; many thesauri do not follow all the rules of the standards but are still generally thought of as thesauri. Note: another type of "thesaurus", such as Roget's Thesaurus, represents only equivalence (synonymy), with the addition of classification categories.

Many thesauri are very large (more than 50,000 terms). Most were developed for a specific discipline or to support a specific product or family of products. Examples include the Food and Agriculture Organization’s Aquatic Sciences and Fisheries Thesaurus and the NASA Thesaurus for aeronautics and aerospace-related topics.

Semantic Networks

With the advent of natural language processing, there have been significant developments in the area of semantic networks. These KOSs structure concepts and terms not as hierarchies but as a network or web. Concepts are thought of as nodes with various relationships branching out from them. The relationships generally go beyond the standard BT, NT and RT, and may include specific whole-part, cause-effect, or parent-child relationships. One of the best-known semantic networks is Princeton’s WordNet, which is now used in a variety of search engines.
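A semantic network can be sketched as a graph whose edges carry a relation type, so that queries can follow one kind of relationship (such as whole-part) rather than a single generic hierarchy. The class `SemanticNetwork`, the relation names, and the sample concepts are hypothetical; this is not WordNet's data model or API. `reachable` treats the chosen relation as transitive, which is itself a modelling assumption that holds for some relations (like part_of here) and not others.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set, Tuple


class SemanticNetwork:
    """Hypothetical semantic network: concepts are nodes, typed relations are edges."""

    def __init__(self) -> None:
        # source concept -> set of (relation, target concept)
        self.out: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.out[source].add((relation, target))

    def relations(self, concept: str) -> List[Tuple[str, str]]:
        # All outgoing (relation, target) pairs, sorted for stable output.
        return sorted(self.out.get(concept, set()))

    def reachable(self, concept: str, relation: str) -> List[str]:
        # Follow a single relation type transitively, breadth first.
        result: List[str] = []
        seen = {concept}
        frontier = deque([concept])
        while frontier:
            current = frontier.popleft()
            for rel, target in self.relations(current):
                if rel == relation and target not in seen:
                    seen.add(target)
                    result.append(target)
                    frontier.append(target)
        return result


# Hypothetical sample concepts and relation names.
net = SemanticNetwork()
net.relate("spoke", "part_of", "wheel")
net.relate("wheel", "part_of", "bicycle")
net.relate("bicycle", "is_a", "vehicle")
net.relate("rain", "causes", "flood")
print(net.reachable("spoke", "part_of"))  # ['wheel', 'bicycle']
print(net.relations("rain"))              # [('causes', 'flood')]
```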

Ontologies

“Ontology” is the newest label attached to some KOSs.  Ontologies are being developed as specific concept models by the Knowledge Management community. They can represent complex relationships between objects, and include the rules and axioms missing from semantic networks.  Ontologies that describe knowledge in a specific area are often connected with systems for data mining and knowledge management.

(1) Hodge, Gail. “Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files.” CLIR Pub91. April 2000. (www.clir.org/pubs/abstract/pub91abst.html)

Figure 1. Various Types of KOS (2)

(2) Source: Zeng, Marcia Lei. “Knowledge Organization Systems (KOS).” Knowledge Organization 35(2/3). 2008.

Figure 1 shows the types of knowledge organization systems (KOSs), arranged according to the degree of control introduced (from natural language to controlled language) and the strength of their semantic structure (from weakly structured to strongly structured), corresponding to the major functions of KOSs. It is a visual summary of the Taxonomy of Knowledge Organization Sources/Systems (http://nkos.slis.kent.edu/KOS_taxonomy.htm) adopted by the NKOS group, based on Gail Hodge’s article on KOSs (www.clir.org/pubs/abstract/pub91abst.html).