Tuesday, December 18, 2007

Definitions in RDA Scope

Originally posted to the RDA list.

The RDA scope document defines some basic concepts that presumably will be used throughout RDA. Some of these concepts it takes from the Dublin Core Abstract Model. In particular, it uses "literal value surrogate" and "non-literal value surrogate." These are defined in footnotes of the scope document as:

The term literal value surrogate is used as defined in the DCMI Abstract Model: “a value surrogate for a literal value, made up of exactly one value string (a literal that encodes the value)”.

The term non-literal value surrogate is used as defined in the DCMI Abstract Model: “a value surrogate for a non-literal value, made up of a property URI (a URI that identifies a property), zero or one value URI (a URI that identifies the non-literal value associated with the property), zero or one vocabulary encoding scheme URI (a URI that identifies the vocabulary encoding scheme of which the value is a member), zero or more value strings (literals that represent the value)”.

I found a more concise definition of this in a PPT by Lutz Maicher, University of Leipzig:

- a resource which is a non-literal value is represented by a proxy
- a resource which is a literal value is represented as literal

In the above, "literal" means a text string. So "Melville, Herman" is a literal, while "http://www.loc.gov/names/#n_79006936" is a non-literal proxy (because it points to the authority record, which is where the actual value is held).
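To make the distinction concrete, here is a rough sketch of the two surrogate types as small Python records. It is only an illustration of the footnote definitions quoted above, not part of the DCMI Abstract Model or the RDA scope document: the class names, field names, and the `is_literal` helper are assumptions, and the property URI in the example is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LiteralValueSurrogate:
    """Exactly one value string that encodes the value."""
    value_string: str


@dataclass
class NonLiteralValueSurrogate:
    """Field names are illustrative; cardinalities follow the quoted footnote."""
    property_uri: str
    value_uri: Optional[str] = None                       # zero or one
    vocabulary_encoding_scheme_uri: Optional[str] = None  # zero or one
    value_strings: List[str] = field(default_factory=list)  # zero or more


def is_literal(surrogate: object) -> bool:
    """True when the value is carried by the text string itself."""
    return isinstance(surrogate, LiteralValueSurrogate)


# The name as plain text: the string is the value.
name_as_text = LiteralValueSurrogate("Melville, Herman")

# The name as a proxy: the URI points at the authority record.
name_as_proxy = NonLiteralValueSurrogate(
    property_uri="http://example.org/terms/creator",  # hypothetical property URI
    value_uri="http://www.loc.gov/names/#n_79006936",
    value_strings=["Melville, Herman"],
)
```

Note that the non-literal surrogate can still carry value strings; what makes it non-literal is that the value itself is identified elsewhere.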

The scope document then states:

- A label is represented by a literal value surrogate.
- A quantity is represented by a non-literal value surrogate.
- A quality is represented by a non-literal value surrogate.
- A type is represented by a non-literal value surrogate.
- A role is represented by a non-literal value surrogate.

However, in the element analysis in the scope document, it shows that quantities can be represented identically to labels (and I suspect that all other data types can as well). So that document has (and here there is a diagram that I cannot reproduce in email):

label
[resourceURIref] -> rda:title_proper -> [plain value string]

quantity
[resourceURIref] -> rda:extent -> [typed value string]^^[syntax encoding scheme]
- or -
[resourceURIref] -> rda:non_linear_scale -> [plain value string]

Given that the label example and the second example under quantity are structurally the same, I don't see how one can be a literal and one a non-literal.
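The structural point can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Statement` tuple and the `shape` function are hypothetical names, and the bracketed placeholders from the diagram are kept as placeholder strings rather than replaced with invented values.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    subject: str       # stands in for [resourceURIref]
    prop: str          # the RDA element used as the property
    value_string: str  # stands in for the value string placeholder
    syntax_encoding_scheme: Optional[str] = None  # present only for typed value strings


def shape(statement: Statement) -> str:
    """Classify a statement by the structure of its value alone."""
    if statement.syntax_encoding_scheme is not None:
        return "typed value string"
    return "plain value string"


label = Statement("resourceURIref", "rda:title_proper", "plain value string")
quantity_typed = Statement(
    "resourceURIref", "rda:extent", "typed value string", "syntax encoding scheme"
)
quantity_plain = Statement("resourceURIref", "rda:non_linear_scale", "plain value string")

# Nothing in the structure separates the label from the second quantity.
same_structure = shape(label) == shape(quantity_plain)
```

Only the property name differs between `label` and `quantity_plain`, so any literal/non-literal difference between them would have to come from somewhere other than the structure of the statement.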

I see two possibilities here. One is that all of the above has no real effect on the development of RDA, and therefore any errors in interpretation of the DCMI model can be ignored. The other is that the misunderstanding (which I think it is, though I am waiting to be proven wrong) is significant, and therefore needs to be corrected as part of the development of RDA.

My gut feeling is that it is the former -- I don't see references to these definitions in the RDA text itself, and all values are treated as simple value strings. For example, dates are just text:

Record the date of the expression by giving the year or years alone.
1940 (p. 6-47 5rda_sec2349.pdf)

And quantities also seem to be just text strings:
46 slides
12 cm (from 5rda-parta-ch3rev.pdf)

Thus, at least as far as the RDA text is concerned, there are only literal values.

If this is not the case, would someone please present the argument for a different understanding? Thank you.

Friday, December 07, 2007

Interpretations of FRBR Classes

Because it makes use of an entity-relationship model, FRBR consists of two primary concepts: things and relationships. (I often think of them as nouns and verbs.) In the "things" category, FRBR defines 10, which it calls entities. They are: Work, Expression, Manifestation, Item, Person, Corporate Body, Concept, Object, Event, Place.

This is an admirably short list of basic building blocks for bibliographic data. The question is: is it enough? Can we really express our bibliographic data with just these basic concepts? The answer is: probably not, although we should take a lesson from FRBR and try to keep our set of basic entities small, while allowing them to be extended to express more complex concepts.

As an exercise, I took two well-known attempts to model FRBR using formal definitions. One is FRBR in RDF, the other is FRBRoo. I also took the RDF entries that Martha Yee created for her cataloging rules and added those to the comparison, although it is important to note that Yee's set of RDF statements is intended to go beyond FRBR, since it is an expression of cataloging rules, not just the FRBR model.

In each of these three efforts, the FRBR entities are recorded as classes, and the FRBR relationships are recorded as properties. This is in keeping with the definitions in the RDF schema. What is interesting is the number of classes that are defined:

  • FRBR in RDF: 13 classes
  • FRBRoo: 23 classes, 18 sub-classes, 41 total
  • Yee's schema: 23 classes

These are compared to the 10 classes (entities) defined in FRBR. Since no one defined fewer classes, we need to look at what additional classes were defined. But first, there are a few cases where FRBR classes were not included, usually because they were substituted with a set of more detailed classes.

  • FRBRoo does not include Manifestation, but instead has Manifestation product type and Manifestation singleton
  • Yee's schema substitutes Event as subject for the FRBR class Event, and substitutes Place as geographic area and Place as Jurisdictional Corporate Body for the FRBR class Place

FRBR in RDF

FRBR in RDF adds only three classes. Two of these (Endeavor and ResponsibleEntity) are supersets of FRBR classes. Endeavor is a generalization that can be related to a work, expression, or manifestation. Similarly, ResponsibleEntity is a more general term that can relate to either a corporate body or a person. Both of these seem fairly sensible, allowing you to refer to the intellectual content or some actor without having to specify more information. It's like being able to say "it" without having to say exactly what you are referring to.
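As a loose analogy, the two superset classes behave like parent classes in an object-oriented language. The sketch below uses Python inheritance to show the idea; it is not the FRBR in RDF vocabulary itself, and the `describe` function is a hypothetical helper added for illustration.

```python
class Endeavor:
    """Generalization over work, expression, and manifestation."""


class Work(Endeavor):
    """FRBR Work."""


class Expression(Endeavor):
    """FRBR Expression."""


class Manifestation(Endeavor):
    """FRBR Manifestation."""


class ResponsibleEntity:
    """Generalization over person and corporate body."""


class Person(ResponsibleEntity):
    """FRBR Person."""


class CorporateBody(ResponsibleEntity):
    """FRBR Corporate Body."""


def describe(thing: object) -> str:
    """Talk about something at the general level, without naming its exact class."""
    if isinstance(thing, Endeavor):
        return "some endeavor"
    if isinstance(thing, ResponsibleEntity):
        return "some responsible entity"
    return "something else"
```

With this arrangement, `describe(Expression())` and `describe(Work())` give the same answer, which is the "it" of the paragraph above: a statement can be made about the intellectual content without committing to which level is meant.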

The third class that is added is Subject. As a matter of fact, all three of these efforts include some instance of subjects as classes in their schemas. FRBR clearly treats subject as a relationship. (And I would like to understand why these three interpreted subject as a class -- so post if you have ideas/knowledge on that, please.)


FRBRoo

FRBRoo is a very interesting interpretation of FRBR. As they state in the document, attempting to re-define FRBR using object-oriented rules rather than entity-relationship rules is a way to test the underlying concepts in FRBR. They also tackle the elements that in FRBR are called "attributes." (Aside: The FRBR attributes are a bit odd, IMO. They seem to be all over the place and there is no explanation of how they were determined or any way to give them some organization. I don't think they actually fit the definition of attributes in E-R, which seem instead to be on the order of identifiers.) The folks working on FRBRoo decided to treat the attributes as properties, that is, relationships between the classes.

FRBRoo defines 23 primary classes with 18 subclasses. They address the issue of complex items, such as articles within serials or collections of essays, by creating classes for aggregate and serial works. Some of the classes seem to be what I would normally understand as genres. As an example, there is a class Performance Plan that is described as:

This class comprises sets of directions to which individual performances of theatrical, choreographic, or musical works and their combinations should conform.

Another example of a new class is Publication Event. This is an action that is part of the work flow of publication, such as

Establishing in 1972 the layout, features, and prototype for the publication of “The complete poems of Stephen Crane, edited with an introduction by Joseph Katz” (ISBN “0-8014-9130-4”), which served for a second print run in 1978.

Because this is an action, I would tend to express it as a property (a verb). So the layout, features, etc. could be subclasses of a manifestation, there would be an actor (a noun, or a class, probably the publishing house, or more specifically a book designer), and a time. The verb (or property) could be "designed," "typeset," "printed," etc. This makes me wonder about the FRBR class Event as a noun, but I think I could buy into a concept of named events ("WWII," "Election day 2008," "Beatles' first appearance on Ed Sullivan"). Interestingly, it does appear that all of these are events as subjects, as Event is defined in FRBR; the FRBRoo event does not appear to have this noun-ish characteristic.
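The two ways of modeling the publication example can be put side by side in a small Python sketch. Everything here is an assumption made for illustration: the class names, the field names, and the choice of "book designer" and "designed" as actor and verb are hypothetical, and neither form is taken from FRBRoo.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class PublicationEvent:
    """Event as a class: the action is a thing with its own attributes."""
    actor: str
    action: str
    target: str
    year: int


class VerbStatement(NamedTuple):
    """Event as a property: the action is the verb linking actor and target."""
    subject: str
    verb: str
    obj: str
    year: int


def as_statement(event: PublicationEvent) -> VerbStatement:
    """Re-express an event object as an actor-verb-object statement."""
    return VerbStatement(event.actor, event.action, event.target, event.year)


event = PublicationEvent(
    actor="book designer",  # hypothetical actor
    action="designed",      # one of the candidate verbs
    target="The complete poems of Stephen Crane",
    year=1972,
)
statement = as_statement(event)
```

The same four pieces of information appear in both forms; the difference is whether the action gets an identity of its own or exists only as the link between two things.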

Yee Schema

Martha Yee's set of classes (23 of them, but not the same 23 as FRBRoo) includes Genre/Form as a class. Genre/form seems to be more of an attribute about a work rather than something that has "thingness" in itself. It's hard to imagine how you can have genre/form without it relating to a work. (As opposed to: you can have a person or a corporate body that are things in and of themselves -- that have specific, unique identities.)

It has some classes that might be considered sub-classes. For example, Place as geographical area and Place as jurisdictional corporate body would seem to be sub-classes of Place, although Yee does not include Place itself in her schema. I'm less clear about classes such as Corporate Subdivision, which has a part/whole relationship with Corporate Body, not a sub-class relationship. (Sub-class would be an "is a type of" relationship, and corporate subdivision is not a type of corporate body, it's a part of a corporate body.) Ditto the subject-related terms: Subject, Subject subdivision, Subject chronological subdivision, Subject form subdivision, Subject geographical subdivision, Subject topical subdivision. In FRBR, the subject is a relationship with the work. These look to me to be relationships with the subject heading, although there is no class for subject headings (unless that is what is meant by the class Subject, but I don't think it would be a good idea to equate subject with subject heading because it makes it impossible to include classifications as subjects or keywords as subjects).
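The difference between "is a type of" and "is a part of" can be shown with a short Python sketch, using inheritance for the first and containment for the second. This is an illustration only: the parent class `Place` is assumed here even though Yee's schema does not include it, and the example names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Place:
    """Assumed parent class; not present in Yee's schema."""
    name: str


@dataclass
class PlaceAsGeographicArea(Place):
    """Sub-class: a geographic area IS A TYPE OF place."""


@dataclass
class CorporateSubdivision:
    """Not a kind of corporate body, so it does not inherit from one."""
    name: str


@dataclass
class CorporateBody:
    """Part/whole: a corporate body HAS subdivisions as parts."""
    name: str
    subdivisions: List[CorporateSubdivision] = field(default_factory=list)


area = PlaceAsGeographicArea("Example Valley")   # hypothetical name
library = CorporateSubdivision("Library")         # hypothetical name
university = CorporateBody("Example University", subdivisions=[library])
```

Here `isinstance(area, Place)` is true, while `isinstance(library, CorporateBody)` is false; the subdivision is reachable only through the list of parts held by the whole.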

What's the upshot? Well, it would take a good sit-down with all involved to hash out the differences, to understand what each group or person was thinking, and to see if we can formulate a theory of how one extends FRBR to meet one's needs. If a number of people turn out to have the same needs, then it may be that the FRBR model itself needs to take in those ideas. The only way to work this out is to keep modeling and sharing. So I thank the three featured here for the extensive work that they have done in this area.