Wednesday, October 29, 2008

Thoughts on crosswalking

For the second Integrating Digital Papyrology project, we need to develop a method for crosswalking between EpiDoc (which is a dialect of TEI) and various database formats. We've thought about this quite a bit in the past, and we don't want to write a one-off conversion because (a) there will be more than one such conversion and (b) we want to be able to document the mappings between data sources in a stable format that isn't just code (a script, XSLT, etc.).

Some of the requirements for this notional tool are:


  • should document mappings between data formats in a declarative fashion

  • certain fields will require complex transformations. For example, the document text will likely be encoded in some variant of Leiden in the database, and will need to be converted to EpiDoc XML. This is currently accomplished by a fairly complex Python script, so it should be possible to define categories of transformation which would signal a call to an external process.

  • some mappings will involve the combination of database fields into a single EpiDoc element, and others, the division of a single field into multiple EpiDoc elements

  • Context-specific information (not included in the database) will need to be inserted into the EpiDoc document, so some sort of templating mechanism should be supported.

  • The mapping should be bidirectional. We aren't just talking about exporting from a database to EpiDoc, but also about importing from EpiDoc, which is envisioned as an interchange format as well as a publication format. This is why a single mapping document, rather than a set of instructions for getting from one to the other, would be nice.
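To make the requirements above a bit more concrete, here is a minimal sketch of what a declarative, bidirectional mapping might look like. Everything in it is an assumption for illustration: the mapping is plain Python data (it could just as well be XML or JSON), the rule kinds ("copy", "combine", "split", "external"), the database field names, and the element paths are all hypothetical, and the paths are simplified, namespace-free stand-ins rather than real EpiDoc structure. The TRANSFORMS registry is a hypothetical placeholder for calls to an external process such as the Leiden-to-EpiDoc script; here it just swaps "|" separators for newlines and back.

```python
"""Hypothetical sketch of a declarative, bidirectional crosswalk."""
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical registry of named transforms, each a (forward, inverse) pair.
# A real crosswalk would dispatch to an external converter here instead.
TRANSFORMS = {
    "pipe_lines": (lambda s: s.replace("|", "\n"), lambda s: s.replace("\n", "|")),
}

# Hypothetical mapping document: a template for context-specific information
# not stored in the database, plus declarative rules used in both directions.
MAPPING = {
    "template": "<TEI><teiHeader><fileDesc><publicationStmt>"
                "<authority>{authority}</authority>"
                "</publicationStmt></fileDesc></teiHeader></TEI>",
    "rules": [
        {"kind": "copy", "fields": ["title"],
         "path": "teiHeader/fileDesc/titleStmt/title"},
        # Several database fields combined into one element.
        {"kind": "combine", "fields": ["place", "region"], "sep": ", ",
         "path": "teiHeader/profileDesc/origPlace"},
        # One database field divided across several elements.
        {"kind": "split", "fields": ["date_range"], "sep": "/",
         "paths": ["teiHeader/profileDesc/dateFrom", "teiHeader/profileDesc/dateTo"]},
        # A field that needs an external transformation.
        {"kind": "external", "transform": "pipe_lines", "fields": ["text"],
         "path": "text/body/div/ab"},
    ],
}


def _ensure(root, path):
    """Find or create the element at a slash-separated path under root."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:  # compare to None: childless Elements are falsy
            child = ET.SubElement(node, tag)
        node = child
    return node


def to_epidoc(row, mapping, context):
    """Export a database row (a dict) to an XML tree using the mapping."""
    safe = {k: escape(v) for k, v in context.items()}
    root = ET.fromstring(mapping["template"].format(**safe))
    for rule in mapping["rules"]:
        values = [row[f] for f in rule["fields"]]
        kind = rule["kind"]
        if kind == "copy":
            _ensure(root, rule["path"]).text = values[0]
        elif kind == "combine":
            _ensure(root, rule["path"]).text = rule["sep"].join(values)
        elif kind == "split":
            for path, part in zip(rule["paths"], values[0].split(rule["sep"])):
                _ensure(root, path).text = part
        elif kind == "external":
            forward, _ = TRANSFORMS[rule["transform"]]
            _ensure(root, rule["path"]).text = forward(values[0])
    return root


def from_epidoc(root, mapping):
    """Import a database row from an XML tree by reading the same rules backwards."""
    row = {}
    for rule in mapping["rules"]:
        kind = rule["kind"]
        if kind == "split":
            parts = [root.findtext(p) for p in rule["paths"]]
            row[rule["fields"][0]] = rule["sep"].join(parts)
            continue
        text = root.findtext(rule["path"])
        if kind == "copy":
            row[rule["fields"][0]] = text
        elif kind == "combine":
            for field, value in zip(rule["fields"], text.split(rule["sep"])):
                row[field] = value
        elif kind == "external":
            _, inverse = TRANSFORMS[rule["transform"]]
            row[rule["fields"][0]] = inverse(text)
    return row
```

The point of the design is that one mapping drives both directions, so the document itself records how the formats correspond. A round trip would look like `from_epidoc(to_epidoc(row, MAPPING, {"authority": "..."}), MAPPING) == row`. Note that "combine" and "split" are only true inverses when the separator never occurs inside the data; a real mapping language would need escaping or an explicitly declared inverse for such cases.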


So far, my questions to various lists have turned up favorable responses (i.e. "yes, that would be a good thing") but no existing standards....

Monday, October 20, 2008

On Bamboo the 2nd

I spent Thursday through Saturday last week at the second Bamboo workshop in San Francisco. Some reactions:

1) The organizers are well-intentioned and are sincerely trying to wrestle with the problem of cyberinfrastructure for Digital Humanities.

2) That said, it isn't clear that the Bamboo approach is workable. The team is very IT focused, and while they seem to have a solid grasp of large-scale software architecture, the ways in which that might be applied to the Humanities with any success aren't obvious. There was a lot of misdirected effort between B1 and B2 by some very smart people, who I must say had the good grace to admit it was a nonstarter. Their attempt to factor the practices of scholars into implementable activities resulted in something that lacked enough context and specificity to be useful. A refocusing on context and on the processes that contain and help define the activities happened at the workshop and seems likely to go forward.

3) The workshops themselves seem to have been quite useful. I wasn't at any of the round one workshops, and I doubt I'll be at any of the others (I represented the UNC Library because the usual candidates weren't available), but everyone I talked to was very engaged (if often skeptical). The connections and discussion that seem to have emerged so far probably make the investment worthwhile, even if "Bamboo" as conceived doesn't work.

4) The best idea I heard came (not surprisingly) from Martin Mueller, who suggested Bamboo become a way to focus Mellon funding on projects that conform to certain criteria (such as reusable components and standards) for a defined period (say five years). The actual outcome of the current Bamboo would be the criteria for the RFP. Simple, encourages institutions to think along the right lines, might actually do some good, and might allow participation by smaller groups as well.

5) There was a lot of talk about the people who are both researchers and technologists (guilty). These were variously defined as "hybrids," "translators," and, most offensively, "the white stuff inside the Oreo." None of this was meant to be offensive, but in the end, it is. People who can operate comfortably in both the worlds of scholarship and IT can certainly be useful go-betweens for those who can't, but that is not our sole raison d'être. Until recently there haven't been many jobs for us, but that seems to be changing, and I hope it continues to. See Lisa Spiro's excellent recent post on Digital Humanities Jobs, and Sean Gillies, who, without having been there, manages to capture some of the reservations I feel about the current enterprise and picks up on the educational aspect. One possible useful future for Bamboo would be simply to foster the development of more "hybrids."

6) The Bamboo folks have set themselves a truly difficult task. They are making a real effort to tackle it in an open way, and should be commended for it. But it is a very hard problem, and one for which there is still not a clear definition. The software engineer part of my hybrid brain wants problems defined before it will even consider solutions. The classicist part believes some things are just hard, and you can't expect technology to make them easy for you.