Wednesday, January 31, 2007

Legislating technology - RFID and California SB 30

The California state Senate will be considering a bill that would regulate the use of RFID in identity documents. The bill is over 5700 words long, so it's a bit hard to address its details (and boy, does it have details!) in a short blog post. Instead, I'll just make a few provocative statements and encourage you to read the text for yourself.

Too Much, Too Soon

It is almost always a very bad idea to legislate specific technology, especially when the technology is in its infancy. As far as I know, there have been no privacy breaches related to the use of RFID in any form, much less on identity documents. This legislation devotes 5700 words and many complex do's and don'ts to something that may not even be an issue five years from now. To its credit, it has a sunset date of December 31, 2013, allowing for change over that time period. Still, I regard this legislation as being too much, too soon.

Background

The legislation is at least partly fueled by the work of the Electronic Frontier Foundation on RFID. EFF's view is that RFID is the camel's nose under the tent for a ubiquitous surveillance society. The EFF and the local ACLU argued against the implementation of RFID for materials in the San Francisco Public Library. At this moment, funding for SFPL to purchase an RFID system has been removed by the city council.

What the Bill Says

Having read the bill, I can say that its content would be a very interesting "best practices" document for the use of RFID for identification documents. Of course, such a document would not be written as law, and therefore might be easier to digest. Basically, the bill sets some very strict rules for the implementation of government-issued identity documents that can be read "remotely." These documents would be given much greater scrutiny than, for example, those with visible data or those with data in a magnetic strip. All cards using RF must meet these two standards:

(1) In order to prevent duplication, forgery, or cloning of the identification document, the identification document shall incorporate tamper-resistant features.
(2) In order to determine to a reasonable certainty that the identification document was legitimately issued by the issuing entity, is not cloned, and is authorized to be read, the identification document and authorized reader, in conjunction with related, functionally integrated software, shall implement an authentication process.
I wish I could tell you what this means for someone wishing to implement an identity card, but I'm afraid I don't know. Is this a level of complexity that adds a great deal to the cost? To what extent do these features create vendor lock-in, making it hard for an agency to move to a different vendor's platform? I have lots of questions of this nature throughout the document, and hope that some explanatory documentation will be provided at some point.

The technical details in the document are fairly specific, such as:

(3) If personally identifiable information is transmitted remotely from the identification document, the identification document and authorized reader, in conjunction with related, functionally integrated software, shall not only meet the requirements of paragraph (2) but also shall implement mutual authentication in order to prevent the transmission of personally identifiable information between identification documents and unauthorized readers.

(4) If personally identifiable information is transmitted remotely from the identification document, the identification document shall make the data unreadable and unusable by an unauthorized person through means such as encryption of the data during transmission, access controls, data association, encoding, obfuscation, or any other measures, or combination of measures, that are effective to ensure the confidentiality of the data transmitted between the identification document and authorized reader.
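
To make paragraphs (3) and (4) a little more concrete, here is a minimal sketch of what "mutual authentication" could look like, assuming the card and the reader share a secret key and each answers the other's random challenge with an HMAC. The bill does not prescribe this or any other mechanism; the function names, the shared-key design, and the role labels are assumptions made purely for illustration, and a real card system would also need key management and protection of the data in transit.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def prove(key: bytes, role: bytes, challenge: bytes) -> bytes:
    """Answer a challenge with an HMAC over a role label plus the nonce."""
    return hmac.new(key, role + challenge, hashlib.sha256).digest()


def mutual_authenticate(card_key: bytes, reader_key: bytes) -> bool:
    """Return True only if card and reader each prove they hold the same key."""
    reader_nonce = secrets.token_bytes(16)  # challenge sent by the reader
    card_nonce = secrets.token_bytes(16)    # challenge sent by the card

    # Step 1: the card answers the reader's challenge; the reader checks it.
    card_proof = prove(card_key, b"card", reader_nonce)
    expected = prove(reader_key, b"card", reader_nonce)
    if not hmac.compare_digest(card_proof, expected):
        return False  # the reader rejects a card it cannot verify

    # Step 2: the reader answers the card's challenge; the card checks it.
    reader_proof = prove(reader_key, b"reader", card_nonce)
    expected = prove(card_key, b"reader", card_nonce)
    return hmac.compare_digest(reader_proof, expected)


def read_card(card_key: bytes, reader_key: bytes, payload: str) -> Optional[str]:
    """Release the payload only after both sides have authenticated."""
    return payload if mutual_authenticate(card_key, reader_key) else None
```

An unauthorized reader (one holding a different key) gets nothing back from `read_card`, which is the behavior paragraph (3) asks for. Even this toy version hints at the cost and vendor lock-in questions: both sides must share keys and agree on the exact exchange.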

As with most complex legislation, you have to keep track of the "subdivision (a) shall not apply to" or the "except as provided in subdivision (b)", which is why I wish this had been written as a best practices document before getting turned into legislation-ese. However, if I read the document correctly, at one point it suggests that the holder of the document must be able to exercise control over whether the data is transmitted or not, and this requires some kind of physical contact, such as keying on a keypad, or having another person visually check one's identity. This control is even suggested in the case of identity cards for school students, and I am immediately struck by the difficulty of getting elementary school students to use PINs successfully. That aside, this brings us to a dilemma -- if the person is there and available to key in a PIN, why would you go to the expense of using RFID rather than the less expensive magnetic strip or even a barcode?

Exceptions are included for the incarcerated, who presumably have little right to privacy but must be carefully identified, and those in government-run hospitals. In the latter case, however, each new hospitalization requires the creation of a new number. What puzzles me in this section is the last statement below (and note the all in the first sentence):

(5) An identification document issued to a patient who is in the care of a government-operated or government-owned hospital, ambulatory surgery center, or oncology or dialysis clinic if all of the following requirements are met:
(A) The identification document is valid for only a single episode of care.
(B) The identification document may be removed and reattached when used on a nonemergency outpatient.
(C) The identification document does not transmit or enable the remote reading using radio waves of personally identifiable information. [My emphasis]
Would this bill prevent hospitals from coming up with a wrist ID that could provide information about the patient's condition? To have their chart number, or their date of birth so the patient's identity could be easily verified on the way into surgery? Without a discussion with the bill's drafters, it's hard to extract from all of this detail what capabilities the bill will and will not allow.

There are also sections that would cover law enforcement use of IDs (I'm thinking of mass arrests during riots, but I'm sure some people will imagine even more dastardly motivations), and use by emergency response personnel. There are some exceptions for locating people in immediate physical danger, but if I've read the technical protections sections correctly, there will be few opportunities to make use of the radio frequency device to perform these kinds of operations. I think that this section arises from our "MacGyver Miracle" wishful thinking -- that if it did happen that I were buried alive, I could somehow turn on my cell phone and the hero(ine) of my fantasy would be able to use the technology in some highly creative way to discover and rescue me. Comforting, but unlikely.

The bill prescribes user education and notification, and states that the agency must provide a notice on each reader, or a list of the locations of all the readers that can be used to read the card, or a web site address where such locations can be found.

Personally Identifying Information

Library cards are mentioned in a list of possible uses for identity cards; however, a card that only has an assigned identifier such as a patron ID appears not to be covered under the restrictions of the bill relating to personally identifiable information as it is defined:

(o) "Personally identifiable information" includes any of the following data elements to the extent that they are used alone or in conjunction with any other information to identify an individual:

(1) First or last name.

(2) Address.

(3) Telephone number.

(4) E-mail address.

(5) Date of birth.

(6) Driver's license number or State identification card number.

(7) Any unique personal identifier number contained or encoded on a driver's license or identification card issued pursuant to Section 13000 of the Vehicle Code.

(8) Bank, credit card, or other financial institution account number.

(9) Credit or debit card number.

(10) Any unique personal identifier number contained or encoded on a health insurance, health benefit, or benefit card issued in conjunction with any government-supported aid program.

(11) Religion.

(12) Ethnicity or nationality.

(13) Photograph.

(14) Fingerprint or other biometric identification.

(15) Social security number.
I can understand most of these but I'm puzzled by numbers 11 and 12, Religion and Ethnicity or nationality. There is no question that these are sensitive bits of information, but they can hardly be considered "personally identifiable" under most circumstances. If anything, they are elements of group identification.

As with any statement about what is personally identifiable, however, it comes down to the fact that the right context can link almost any information to you. Your library card number becomes you when combined with the library's patron database. Your credit card number identifies you if one has access to the bank's records. Quibbling over what is and what isn't personally identifiable just doesn't jibe with the reality of our data-mined world, and it is unclear to me why a bank card number is personally identifiable but a library card number is not (if it isn't, by this definition).
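
The point about context can be sketched in a few lines. Everything here is hypothetical (the patron number, the name, the database layout); it only shows that an "anonymous" identifier stops being anonymous the moment it is joined with the records that issued it.

```python
# Hypothetical data: what a reader picks up from the card, and what the
# library's patron database holds. None of this comes from a real system.
card_read = {"patron_id": "21223000123456"}

patron_db = {
    "21223000123456": {"name": "Pat Example", "address": "1 Main St"},
}


def identify(card, database):
    """Return the person behind a card number, if the database knows it."""
    return database.get(card["patron_id"])


# With access to the database, the bare number identifies a person.
person = identify(card_read, patron_db)

# Without that access, the same number identifies no one.
nobody = identify(card_read, {})
```

Whether the number counts as "personally identifiable" therefore depends less on the number itself than on who can perform the lookup.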

Bottom Line

There must be some way to promulgate best practices for new technologies without creating laws. Since this legislation relates to identity documents issued by state or local government agencies, couldn't the agencies refuse to do business with anyone who can't provide the required level of security? I must say, however, that if the goal of this legislation is to make it just too complex and too expensive to use RFID in government-issued identity documents, it is probably a good vehicle for achieving that goal.

Tuesday, January 16, 2007

Comments on D-Lib Article: "RDA... for the 20th c."

Diane Hillmann and I wrote an article called "Resource Description and Access: Cataloging for the 20th Century." It is a critique of what is being developed as the successor to the current library cataloging rules. I'm posting this here primarily to provide a place for comments... so feel free to comment, criticize, add to, or simply vent. Note that I have to approve comments so there will be some delay, and something of an interruption at times due to my ALA schedule.

Also check out the article in the same issue of D-Lib by Karen Markey. One of her points is similar to what Diane and I say, which is that it may be time to de-emphasize descriptive cataloging (at least for regularly published materials) and put that energy into better subject access. Markey suggests adding tables of contents and index terms to records, and developing ranking algorithms to help get the most appropriate material in front of users. Some of what she suggests I would put under the heading of "context" -- categories like reader level vis-a-vis the topic (beginner, expert), general topic area (science, history).

Tuesday, January 02, 2007

RDA at MARBI

There are two interesting documents on RDA that are being presented to the MARC standards group, MARBI, at ALA in Seattle.

The first is a "crosswalk" from the MARC format to RDA. Those of us who think about the MARC standard have been rather anxiously awaiting a look at MARC from the RDA perspective. We're still waiting, because this document is a look at RDA from the MARC perspective. It concludes that with a few "tweaks" you can fill in a MARC21 record using RDA data. By the same token you could show that you can fill in a Dublin Core record using RDA data. That's backwards, of course -- MARC is supposed to allow markup of the cataloging record; the cataloging record is not supposed to fit into MARC. But given how strong the MARC culture is, I wouldn't be surprised if some people consider "fitting the cataloging rules to MARC" to be a logical step.

This report is comforting because it appears to show that MARC does not need to change. The forthcoming report that maps from RDA to MARC will be considerably less reassuring. Even some of the suggestions in this report, such as
RDA has elements that are recorded using terms for an English language context, e.g., publisher unknown. It may be useful to identify such elements through MARC 21 encoding.

could have significant implications for the MARC record.

The document suggests that the various code and authority lists in MARC21 might be better managed as part of the RDA standard. I'd go this one further and say that values in authority lists should not be part of either standard. One of the big problems with MARC21 today is that it takes a change to the standard to add values to a list. Because the standards process is slow, by the time you've added a new physical format to the appropriate list, you've got two years of cataloged materials that you have to go back and add the code to. Code lists should be managed by the communities for whom they are relevant. There should be a process for updating them and a standard location for them on the net, just as there is for the larger lists managed by the Library of Congress for geographical names, languages, and others.
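
As a rough sketch of the separation being argued for here: the software (standing in for "the standard") knows only how to read a code list, while the list itself is published and updated elsewhere. The codes, labels, and function names below are made up for illustration, and the JSON strings stand in for whatever a community would actually publish at its standard location.

```python
import json

# Stand-ins for a list published by its community at a stable location.
# The codes and labels are invented; they are not real MARC21 values.
PUBLISHED_LIST_V1 = '{"bk": "book", "dvd": "DVD video"}'
PUBLISHED_LIST_V2 = '{"bk": "book", "dvd": "DVD video", "newfmt": "new format"}'


def load_code_list(text):
    """Parse a published code list into a code -> label mapping."""
    return json.loads(text)


def label_for(code, code_list):
    """Look up a code in whichever version of the list is current."""
    return code_list.get(code, "unknown code: " + code)


old_list = load_code_list(PUBLISHED_LIST_V1)
new_list = load_code_list(PUBLISHED_LIST_V2)
```

Adding `newfmt` required a new version of the list, not a new version of the code that reads it, which is the whole point.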

Note: this document refers at points to sections 9-13 of the RDA draft. This appears to be Part B of RDA, which I cannot find on the JSC site. If anyone knows where it is, please let me know.

The second document is a work in progress to categorize media types for resources. This was developed in conjunction with the publishing industry standards group that has produced the ONIX standards. The problem tackled is what is often referred to as "content versus carrier." (See the recent article by Gordon Dunsire in D-Lib on this project.) These two have become rather hopelessly muddled in the MARC format, so this is an opportunity to get it straightened out. The level of abstraction here is high, so an item's content can be described as being of Character=language, SensoryMode=sight, ImageDimensionality=two-dimensional, Interactivity=non-interactive, and the carrier could be StorageMediumFormat=sheet, HousingFormat=binding, BaseMaterial=paper, IntermediationTool=not required.
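
One way to picture the separation is as two independent records attached to a resource. The attribute names and values below are the ones quoted above; the class layout itself is only an assumed illustration, not anything taken from the RDA/ONIX document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    """What the resource expresses, independent of its physical form."""
    character: str
    sensory_mode: str
    image_dimensionality: str
    interactivity: str


@dataclass(frozen=True)
class Carrier:
    """The physical or digital form that conveys the content."""
    storage_medium_format: str
    housing_format: str
    base_material: str
    intermediation_tool: str


@dataclass(frozen=True)
class Resource:
    content: Content
    carrier: Carrier


# The example from the text: printed, bound text on paper.
printed_text = Resource(
    content=Content(
        character="language",
        sensory_mode="sight",
        image_dimensionality="two-dimensional",
        interactivity="non-interactive",
    ),
    carrier=Carrier(
        storage_medium_format="sheet",
        housing_format="binding",
        base_material="paper",
        intermediation_tool="not required",
    ),
)
```

Because the two halves are separate, the same `Content` could in principle be paired with a different `Carrier` without either description leaking into the other.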

In the end, however, users will come to the library catalog looking for a book, or a DVD, or a music CD. I hope we can present the catalog data in the user's language. Note that today's MARC21 record does not unambiguously identify books. The closest it gets is "language material" plus "Monograph/item". Unfortunately, things other than books fit that bill, including pamphlets and digital documents. Many library catalogs extrapolate the designation "book" from that coding because that's right most of the time. But we really need to keep the users in mind when we start categorizing materials.
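
The extrapolation that catalogs make can be sketched as follows. In MARC21 the "language material" and "Monograph/item" codes live in the record leader, at character positions 06 (type of record, `a`) and 07 (bibliographic level, `m`). The function name and the sample leaders are made up for illustration.

```python
def looks_like_book(leader: str) -> bool:
    """Guess 'book' from the leader: language material plus monograph/item.

    This is the same heuristic many catalogs apply, so it is also true
    for pamphlets and digital documents coded the same way.
    """
    return len(leader) >= 8 and leader[6] == "a" and leader[7] == "m"


# Made-up sample leaders: only positions 06 and 07 matter here.
text_monograph = "00000nam a2200000 a 4500"   # type 'a', level 'm'
music_recording = "00000njm a2200000 a 4500"  # type 'j', level 'm'
text_serial = "00000nas a2200000 a 4500"      # type 'a', level 's'
```

Nothing in those two positions distinguishes a book from a pamphlet, which is why the "book" label shown to users is a guess, not a fact recorded in the data.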

FRBR OO - Not?

Posted on the FRBR blog was a link to an article by Allen Renear and Yunseon Choi:
Allen H. Renear and Yunseon Choi: Modeling Our Understanding, Understanding Our Models: The Case of Inheritance in FRBR (95 KB PDF). In Grove, Andrew, Eds. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology (ASIST) 43. Here’s the abstract:
They argue against seeing FRBR as having inheritance between the Group 1 entities because only the Item entity is concrete, the others are abstract.

The argument is simple: FRBR describes works as abstract and items as concrete. If all properties of “higher” entities are inherited by “lower” entities then items inherit the property of being abstract, and therefore items will be both abstract and concrete. But nothing is both abstract and concrete - therefore there is no unlimited general property inheritance in FRBR.
They make their point using a symbol set that isn't part of my vocabulary, so I'm taking on faith that they've proven this adequately. I have to say that I tend to consider all aspects of metadata to be abstract in nature, since it is a representation of something else, so their argument doesn't quite work for me.

This brings up for me, however, some larger issues, such as: Do we need a bibliographic concept that we can describe as a formal model? The FRBR model doesn't appear to survive formal analysis (see citations in the article), but does that really matter? I'm not a great fan of formality (at least not compared to some other folks), but it worries me that we are embracing a concept that we may not all understand in the same way. I have twice seen references to the "Work" entity as being "the idea." This strikes me as being horribly wrong, but without something a bit more (pardon the expression) concrete to go on, I don't see how we are going to come out with a definition that we can all agree on. And if we need to jigger the FRBR model a bit to make it work better, what's the mechanism for doing so?