Many current developments in collection building, distributed search and retrieval, and special-topic clearinghouses are struggling with the basic question of the need for terminology and classification tools to support information description and retrieval. Experience has shown, once again, that controlled sets of terms are required and that the simple domain list approach has limited value. Thesauri, classification systems, and authority files are well-developed information description systems that need to be adapted to the digital library environment.
This workshop will focus on networked implementation of thesauri, classification tools, and authority files, but it is limited neither to the technical challenges of doing this nor to actual standards development. We hope to have participation from those whose applications would benefit from networked terminology/classification/authority systems so that we can identify the needs, understand the advantages, and lay the groundwork for subsequent standards and technology developments. We hope the workshop will result in the creation of a working group that would develop standards, for example an XML definition for a thesaurus as a starting point, or a general scenario for searching and navigating a networked thesaurus or classification system. It may also lead to ongoing ACM DL workshops on this topic, if appropriate.
Participants are invited to the workshop who
Individuals who are known to have active projects in this area will be invited to participate. General announcements through appropriate email distribution will be made. The organizers will select full participants on the basis of the fit with the workshop topic and the potential contribution to progress in this area.
The workshop will run for a full day, from 9:00am to 4:00pm. The morning will be devoted to general discussion of issues and development paths; there will be breakout sessions in the afternoon to develop the major issues in more detail, followed by a summary session and identification of the next steps toward our goals. Participation will be limited to provide a good workshop environment for fruitful discussions.
Discussion leaders among the participants will be assigned to develop crosscutting discussion points. The goal by the end of the meeting is for communities of interest to form around important issues for further development.
Linda L. Hill is a Research Specialist with the Alexandria Digital Library Project at the University of California, Santa Barbara. She has worked extensively with thesaurus and metadata development and digital library projects. Linda says "My personal goal in relation to this workshop is to get thesaurus principles and practices enabled within digital libraries so that existing thesauri can be more known and accessible, so that developing projects will recognize the value of the thesaurus approach and will develop and use thesauri according to established standards. Also, to evaluate the usefulness of thesauri and classification systems for networked information discovery." lhill@alexandria.ucsb.edu
Gail Hodge, Information International Associates (IIa), has been
involved with production systems for abstracting/indexing services for
20 years. She's currently working as a consultant to USGS on a biodiversity
vocabulary for the National Biological Information Infrastructure (NBII).
She previously held positions with the NASA Center for AeroSpace Information
and with Biosis. Gail says "My main goal is to promote this discussion
at a high level so that what we do within the U.S. Geological Survey, Biological
Resources Division (BRD) context is based on where we are within the technical
community. We know that we can't wait until all the problems are solved,
but we want to be in synch as much as possible. We need an understanding
of the issues involved in using distributed thesauri: How do we handle
rights management, authentication, etc. and charging for commercial databases?
What will users want to do with these distributed thesauri? This
affects navigation, searching, "transfer," etc. What do we need to know
about other thesauri and vocabularies in order to use them in a distributed
fashion? Is a registry both of thesaurus elements and of particular thesauri
necessary to this effort? Where does this effort intersect with the efforts
of others: RDF, XML, metadata schema, metadata registries, Z39.19, Z39.50,
search engine vendors, etc.? I would like to come out of this with at least
a start toward a way to deal with the architecture so that we can move
forward and integrate this effort with other metadata and Internet efforts."
gailhodge@aol.com
Ron Davies is a consultant with Bibliomatics, Inc., a Canadian
information systems consulting firm. He has designed and developed thesaurus
management systems for the Organisation for Economic Cooperation and Development
(OECD) and the International Development Research Centre (IDRC), and led
a project to create a subject classification system for United Nations
information available over the Internet. He is currently developing Java-based
software for distributed thesaurus management and use. Ron says "My
personal goal has always been to get some agreement on an interoperable
way to connect to thesauri over a network, so that I could access a thesaurus
at one site, and use it to index or search resources at another site. This
would mean development of standards in terms of the semantics of thesaural
relations as well as the syntax of consulting a thesaurus. This effort
could build on other standards (e.g. Z39.50, XML) but there's a lot of
specific work that still has to be done."
rdavies@bibliomatics.com
Time and Place
The workshop will be held in conjunction with the ACM Digital Libraries
'98 Conference, Marriott City Center, Pittsburgh, PA, USA, June 23-26, 1998,
details of which can be found at the URL above. It will be held on the
Saturday following the Conference, June 27th, from 9:00am to 4:00pm. Lunch
will be provided.
Schedule for applications and structuring the workshop:
June 5 (Friday)
The convenors of the workshop have developed four "strawman" topics in order to provide a framework for discussion at the workshop. We plan to hold discussions on the first two topics in the morning and then give participants a choice of discussing the last two in the afternoon or continuing the morning discussions.
Please note that thesaurus is a term often used for convenience in the descriptions of the topics, but that we do not intend that any of the discussions should be confined to traditional thesauri. We believe that the problems and solutions apply to a wide range of terminology tools including classification schemes, taxonomies and other structured authority files.
The topics are:
1. The Data Model
What kind of data model is needed to support the interactive use of
thesauri and other terminologies in online information services such as
digital libraries? What data elements and/or relations are needed
to convey the content of these resources? What data elements and relations
are important for multilingual thesauri? How do we represent classification systems
and notations in a way that client software can understand what the notation
means? Does XML hold promise for the representation of thesaurus structures?
At the end of this breakout session, we hope to have concrete proposals
for developing a generalized model thesaurus.
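As a starting point for that discussion, the questions above can be made concrete with a minimal sketch of one possible record structure. This is only an illustration under stated assumptions, not a proposed standard: the class name `Concept`, its field names, the helper `link_broader`, and the sample terms and notations are all hypothetical. It shows multilingual preferred terms, the usual broader/narrower/related relations, and a notation carried alongside the term.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One thesaurus entry. Every field name here is an illustrative assumption."""
    concept_id: str
    labels: Dict[str, str]                                        # language code -> preferred term
    use_for: Dict[str, List[str]] = field(default_factory=dict)   # language code -> non-preferred terms
    broader: List[str] = field(default_factory=list)              # ids of broader concepts (BT)
    narrower: List[str] = field(default_factory=list)             # ids of narrower concepts (NT)
    related: List[str] = field(default_factory=list)              # ids of related concepts (RT)
    notation: Optional[str] = None                                # classification notation, if any
    scope_note: Optional[str] = None


def link_broader(thesaurus: Dict[str, Concept], child_id: str, parent_id: str) -> None:
    """Record a BT/NT pair in both directions so the relation stays reciprocal."""
    child, parent = thesaurus[child_id], thesaurus[parent_id]
    if parent_id not in child.broader:
        child.broader.append(parent_id)
    if child_id not in parent.narrower:
        parent.narrower.append(child_id)


# Hypothetical sample data: two concepts with English and French labels.
thesaurus = {
    "c1": Concept("c1", {"en": "chemistry", "fr": "chimie"}, notation="540"),
    "c2": Concept("c2", {"en": "organic chemistry", "fr": "chimie organique"}, notation="547"),
}
link_broader(thesaurus, "c2", "c1")
```

Keying relations by concept identifier rather than by term string is one design choice among several; it lets the same relation serve every language, at the cost of requiring stable identifiers.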
2. The Functional Model
How do users want to use thesauri and other authority files in searching
and resource description? What kind of access is important in exploring
or "navigating" through the thesaurus? For example, can a user ask for
a single term (e.g., "chemistry"), a subset of terms (all terms with "chemical"
in them) or a range of terms (e.g. "chemi*"), or all three? How do users
indicate that they want to see an alphabetical view, or a hierarchical
list or a classified (systematic) list? Are there other useful ways of
asking for or looking at thesaurus information? How do
you indicate how much of the list you want to see at one time? At the end
of this breakout session, we hope to have the beginning of a functional
model of the features that are most important to users consulting terminology
services over a network.
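The kinds of requests listed above can be sketched as a handful of small functions. This is a rough illustration only, assuming an in-memory vocabulary; the sample terms, the `BROADER` mapping, and all function names are hypothetical and do not reflect any existing thesaurus service or protocol.

```python
from fnmatch import fnmatchcase

# Hypothetical sample vocabulary: term -> broader term (None for a top term).
BROADER = {
    "chemistry": None,
    "organic chemistry": "chemistry",
    "chemical engineering": None,
    "biochemistry": "chemistry",
    "physics": None,
}


def exact(term):
    """Single-term lookup."""
    return [t for t in BROADER if t == term]


def containing(text):
    """Subset of terms that contain the given string."""
    return sorted(t for t in BROADER if text in t)


def truncated(pattern):
    """Range of terms matching a truncation pattern such as 'chemi*'."""
    return sorted(t for t in BROADER if fnmatchcase(t, pattern))


def alphabetical(offset=0, limit=10):
    """Alphabetical view, returned one page at a time."""
    return sorted(BROADER)[offset:offset + limit]


def hierarchical(parent=None, depth=0):
    """Hierarchical view as indented lines, top terms first."""
    lines = []
    for term in sorted(t for t, b in BROADER.items() if b == parent):
        lines.append("  " * depth + term)
        lines.extend(hierarchical(term, depth + 1))
    return lines
```

For example, `truncated("chemi*")` matches only terms that begin with "chemi", while `containing("chem")` would also return "biochemistry". The `offset` and `limit` arguments are one simple answer to the question of how much of the list a user sees at a time.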
3. Thesaurus-level Metadata and Thesaurus Registries
What thesaurus-level metadata is needed to represent the scope, structure,
size, ownership, access constraints, etc. of a thesaurus so that potential
users (for all applications) will know what is available and how to access
and use it? ("Metadata" is intended to mean not the actual attributes of
individual terminology tools but the "collection-level metadata" that would
describe the terminology tool as a whole.) What is the role of thesaurus-level
metadata in enabling the interoperability of online-accessible thesauri?
What role could thesaurus registries play in "advertising" the availability
of thesauri and facilitating access and use? What tasks are involved in
maintaining a registry? What kind of
organizations would best fulfill the registry function?
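To make the idea of thesaurus-level metadata and a registry more tangible, here is a minimal sketch. It is an assumption-laden illustration rather than a proposal: the class names `ThesaurusDescription` and `Registry`, the field names, and the two sample entries (including their names, counts, and example.org locations) are all invented for the purpose.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ThesaurusDescription:
    """Collection-level description of one thesaurus (hypothetical fields)."""
    name: str
    owner: str
    scope: str
    structure: str                 # e.g. "hierarchical" or "flat list"
    term_count: int
    languages: Tuple[str, ...]
    access: str                    # e.g. "public" or "licensed"
    location: str                  # where the thesaurus can be consulted


class Registry:
    """In-memory stand-in for a thesaurus registry."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, description: ThesaurusDescription) -> None:
        # Keyed by name, so re-registering replaces the earlier description.
        self._entries[description.name] = description

    def find(self, language: Optional[str] = None,
             access: Optional[str] = None) -> List[ThesaurusDescription]:
        """Return descriptions matching every criterion that was supplied."""
        matches = []
        for entry in self._entries.values():
            if language is not None and language not in entry.languages:
                continue
            if access is not None and entry.access != access:
                continue
            matches.append(entry)
        return sorted(matches, key=lambda e: e.name)


# Hypothetical sample entries.
registry = Registry()
registry.register(ThesaurusDescription(
    name="Example Biodiversity Vocabulary", owner="Example Agency",
    scope="biodiversity", structure="hierarchical", term_count=1200,
    languages=("en",), access="public",
    location="https://example.org/vocab/biodiversity"))
registry.register(ThesaurusDescription(
    name="Example Economics Thesaurus", owner="Example Publisher",
    scope="economics", structure="hierarchical", term_count=6500,
    languages=("en", "fr"), access="licensed",
    location="https://example.org/vocab/economics"))
```

A call such as `registry.find(language="fr")` would then return only the descriptions that list French among their languages, which is the kind of "advertising" function the questions above have in mind.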
4. The Business/Intellectual Property Model
What types of collaborative agreements and partnerships are necessary
between thesaurus owners? What kinds of relationships are possible between
owners of vocabularies published over a public network and users? What
are the issues involved? How can users draw on vocabularies from
a variety of different organizations -- government, commercial, academic,
not-for-profit, and international? What are the issues involved with
copyright restrictions, payment or limited access to a thesaurus? What
technologies are necessary to address some of these issues? What impacts
will the expanded use of terminology tools beyond their initial application
have on the structure, design, and maintenance of the tools?