Archive for the 'Metadata' Category

Stop Making Sense

Last night I attended a talk at Princeton titled Stop Making Sense: On Collecting, Sorting and Presenting Data presented by Rudolf Frieling, Curator of Media Arts at SFMOMA, San Francisco. I have to start by saying that the artsy parts lost me! Frieling would show an art piece and say - of course you’ve seen this or - you know this - and I’d be thinking “huh? should I?”

Other than that - this was an interesting talk about how we organize our data and how technology is changing so fast and so much that our delivery methods and storage methods are not going to be the delivery methods and storage methods of the future - so how does one successfully archive media materials? When Frieling was introduced, the professor mentioned a few stories that were a bit funny - but also very sad if you think about it. The first was that when presenting in a newly built theater, he found that he could not play his VHS tape because the people who designed the theater had decided that VHS was no longer a valid storage format. The other was about a store here in town that actually sold its entire collection of VHS tapes to an artist so that he could make a sculpture out of them - this store no longer sells VHS tapes. The final story was about the library at the university no longer storing VHS tapes. He had approached them to ask for a space in the high-density storage unit for his tapes and the library said they were no longer keeping tapes and that anyone who had contributed to the VHS collection at the library could come pick up their items or they would be given away first come, first served.

Along those lines, my husband and I donated all of our VHS tapes to the local public library a couple of years ago - the plan being to replace them with DVDs - a media type that takes up less space on our shelves and that we found ourselves using more than VHS.

Frieling provided some keywords for his talk (I didn’t catch them all): collecting, linking, presenting, all in terms of data. The fact of the matter is (and we librarians know this already) not everything is available online and if it is - it’s possible that it’s not accessible because of hardware, software, or firewall reasons. He spoke of a tool that he and others had developed for CD-ROM that no longer worked on current systems due to hardware and software changes in these systems. He spoke of websites developed at the beginning of the web that no longer work as they were intended because they were developed with system limitations in mind. The long and short of it is that systems change, so as archivists and curators how are we going to preserve information for future generations?

Frieling mentioned a TV show collector by the name of (excuse mis-spellings - the font was small and I was in the back of the room) Pentti Pajukallio. This man has spent most of his life recording TV shows and collecting these VHS tapes. He only stopped to have open heart surgery and even then his wife recorded what she could for him. The question is: what value does this collection have to anyone but Pentti? And if it does have value for others, how will we access it?

One of the best slides (for me) was the one of a pile of 3×5 index cards that Frieling had put together as his first database. These cards contained bibliographic references that were of use to him. He keeps this “database” today because it has nostalgic value for him - but most of the references are probably inaccessible or unavailable - or even out-dated. This collection only has value to him or those studying him. Another great point that he brought up in reference to his note cards - information like technology is always changing so databases like this are not always going to be valuable - so are they worth archiving and making accessible? I don’t know - that was the question of the night.

One great quote was when Frieling mentioned that now that we have search engines and the world wide web it’s even harder to find the “pearl among the rubbish” when we’re browsing through collections. Books are a strong model to provide content. They can be browsed, you can jump back and forth, or you can read cover to cover. This 2D model (sounds a bit like Weinberger’s first order of order) allows the user to read the text as is or randomly, but it’s physical - it’s the pearl and it’s easy (in theory) to find because it’s not (in theory) surrounded by rubbish.

When it comes to webpages we may think of the “home” page as the entry point into our site but in reality people are entering our sites from every which way because search engines are indexing all (once again - in theory) of our pages and providing them piecemeal to searchers. Frieling described this as users coming at our sites diagonally instead of straight on like they do with books. This means they only get parts of the information we’re providing rather than the whole picture.

One way to look at information or media is that each item has two stories. One story is that of the artist or the collector and is usually personal in nature. The other story is that of the viewer. This story gives us the perspective of the outsider. This is the perspective that we’re giving in our catalogs - the perspective of the cataloger when viewing the item - so why not let the other “viewers” (our patrons) add their perspectives as well? This isn’t something that Frieling said exactly - just something I thought when he started talking about the two stories. What he did show us was Steve and how allowing others to add tags to art gave the piece a whole new perspective and a whole new value.

He ended by showing us the Way Maker (if you have a link please share it with me). This program is downloaded to your phone and then you attach your phone to your body and you record your life through your eyes. Does this hold value for anyone but you? Maybe not - but it allows you to see your life from another perspective. It shows you things that you maybe weren’t paying attention to throughout the day - and maybe even makes you more aware of your surroundings. Would a series of videos like this be worth archiving? Who knows - maybe it would be educational for future generations or other cultures to see what a day in the life of Nicole is like. Would I do it? Nope! I don’t need to go to that level of sharing my life - I have this blog and my personal networks - that’s enough for me :)

It was a great talk. While the art aspects were over my head, I’m glad I attended - I just wish that there were more links provided or that the slides were available as I’d like to link you to more information and I don’t have the time just now to do the research on Pentti or the Way Maker.

Metadata Tools

I just read a few quotes from the report of the RLG Programs metadata practice survey on Lorcan Dempsey’s blog (I haven’t read the whole report yet) and wanted to add to his comments. The report says:

… RLG Programs surveyed 18 Partner institutions in July and August 2007 to obtain a baseline understanding of their current descriptive metadata practices. Although we saw some expected variations in practice across libraries, archives and museums, we were struck by the high levels of customization and local tool development, the limited extent to which tools and practices are, or can be, shared (both within and across institutions), the lack of confidence institutions have in the effectiveness of their tools, and the disconnect between their interest in creating metadata to serve their primary audiences and the inability to serve that audience within the most commonly used discovery systems (such as Google, Yahoo, etc.).

I have heard this many times. At our library we use a combination of metadata standards and the MarkLogic XML Content Server to deliver the information to our patrons.

That said - while our delivery system is awesome, creating a METS document is one of the most cumbersome things I’ve ever had to do! This standard is amazing - it has such power and I can’t think how to make it less stressful to create documents - but it just seems like someone created this standard to torture librarians. This is probably why so many librarians are unsure of their tools and their metadata.

I also find that there are many choices - somewhat too many choices on how we can format our data. There is Dublin Core, MODS, MARCXML, etc. As a cataloger I say we need to use MARCXML - it holds the most data and stays in line with our print collections. As a programmer I say MODS is the easiest to read and retrieve data from. And as a lazy person (yes I too can be lazy) I say Dublin Core because I only need to enter minimal information.
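To make the programmer's side of that trade-off concrete, here is a minimal Python sketch that pulls a title out of each of the three formats. The namespace URIs are the published ones for each schema; the three sample records and the title_from helper are made up for illustration and are not taken from our collection or from any library's tooling.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for each schema.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "mods": "http://www.loc.gov/mods/v3",
    "marc": "http://www.loc.gov/MARC21/slim",
}

# Hypothetical sample records, trimmed down to just a title.
DUBLIN_CORE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A. Janse over Karl Barth</dc:title>
</record>"""

MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>A. Janse over Karl Barth</title></titleInfo>
</mods>"""

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A. Janse over Karl Barth /</subfield>
  </datafield>
</record>"""


def title_from(xml_text, path):
    """Return the text of the first element matching path, or None."""
    node = ET.fromstring(xml_text).find(path, NS)
    return node.text if node is not None else None


# The lookup path gets longer as the format gets more detailed.
dc_title = title_from(DUBLIN_CORE, "dc:title")
mods_title = title_from(MODS, "mods:titleInfo/mods:title")
marc_title = title_from(
    MARCXML, "marc:datafield[@tag='245']/marc:subfield[@code='a']"
)

print(dc_title, mods_title, marc_title, sep="\n")
```

The MARCXML lookup is the only one that has to filter on attribute values, and it is also the only one that hands back the ISBD slash along with the title.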

But how do you make these decisions? And have I gotten totally off track? I don’t have any hard and fast answers for you - all I know is that I sympathize with librarians who are unsure and think I should go and read the entire report before adding anything else.

The Return of Everything is Miscellaneous

Last week I wrote about my impressions of David Weinberger’s Everything is Miscellaneous. Well, this morning (around 2am) I finished the book and am so impressed! I love books that make me think - and Weinberger really left my head reeling.

In my role as Metadata Librarian I not only have to work with metadata, but think about ways in which we can manipulate it to provide a better product for our patrons and that’s just what the third order of order is all about - well, not exactly, the third order of order allows the patrons to add value and I hope down the road to be able to open up our metadata to allow for user input.

But, back to the book. David mentions something I’ve heard in several presentations lately. The simple fact that the more “mess” you have the more valuable the data becomes. Basically if you have a tool like Flickr that keeps data from every picture we upload, results can be clustered in ways that are impossible in the first order world (the physical world). This is why LibraryThing is so amazing and the fact that they’re sharing their data with libraries is so great. By using data from LibraryThing, libraries have access to a much wider mess than they would ever be able to compile with their own patron base.

Throughout the book, Weinberger uses Wikipedia as an amazing example of how the third order of order has been successful. On page 208 he makes a great point:

The Britannica includes references at the end of articles to remind us that topics are related to other topics, literally afterthoughts. Wikipedia, on the other hand, is besotted with links…These links are not even bread crumbs, for with two clicks we well may be going down a path no one has trod before and that no one anticipated…In the miscellaneous order, a topic is anything someone somewhere is interested in. Anyone can pull a topic together by contributing to Wikipedia, writing a blog post, creating a playlist, or starting a discussion thread.

While librarians and researchers question the accuracy of Wikipedia (and rightly so) it cannot be dismissed - it is a powerful research tool. I like looking at Wikipedia and following the links to find additional information. As a librarian, I then go and research the topic further using additional tools to confirm accuracy - but if I hadn’t used Wikipedia in the first place I may not have ended up down the path I did.

Along similar lines, the value of tools like Wikipedia and the blogosphere is that they share information in the words of the users - these sites include language that matches how the average person thinks and speaks. Weinberger used the example of the blogosphere’s reaction to Bush’s speech on immigration on May 15, 2006. After the talk the blogosphere exploded in comments and interpretations. Weinberger explains the speech as “Simple arguments, simple ideas, simple language.” and goes on to say, “That’s how politicians talk. But it’s not how we, their constituents talk.” (p.209).

Next, as I mentioned yesterday, Weinberger touches on the future of the ebook. He talked about how we could collect data from how people read books, the passages they highlight, where people read books and so much more using wireless enabled ebook readers (p.222) - and while it sounds like science fiction - we’re almost there. Kindle has the power of wireless technology - meaning that in theory, Amazon could connect to our readers and collect data. While this sounds scary and like a huge invasion of privacy - imagine the power that this data could provide. Some examples Weinberger gives are a list of books that people most often read at the beach or a list of books people stopped reading 1/2 way through - how cool would that be?

So, like I said at the beginning - my head is reeling with information and I’ll probably have to read this book again to get a real hold on some of the theory involved, but I loved the book! I think it’s a great read for all librarians - but if I have to be specific - Metadata Librarians in particular.

PS. In this article I linked you out to 9 other resources on the topics I was covering - what print product can do that??

Everything is Miscellaneous

This is not a review - so much as it is a review of points that have stuck with me from my reading of Everything is Miscellaneous by David Weinberger. I’m not done yet - but I can’t hold it in anymore - and my husband is tired of listening to me rant about library-type stuff :)

Point one: Allowing users to write reviews:

When I was at the NFAIS Humanities Roundtable, I faced this very question. “Why would we want to let amateurs write reviews?” and “Publishers will pull their content if we let them do that!” It was for this reason that I found page 59 so funny!

[Greg] Hart remarks. “Publishers said you’re allowing users to say that they hate a book.” The response from Jeff Bezos, Amazon’s founder, as Hart recalls it, was: “It will sell more books…just not ones customers don’t like.”

This was in response to Amazon allowing users to review books in their store - and it’s perfect! My answer at the conference was another question. What’s to stop a professional reviewer from saying they hate the book? The fact of the matter is that the average reader cares more about what other readers think than what professional book reviewers think - at least I do!

Point two: Library catalog limitations:

Weinberger points out (on page 119) that when looking at a record in a library card catalog:

Generally you will not find how well the book sold, if it’s been banned in any countries, a list of the books it cites, the college the author attended, what the reviewers said about it, the full index from the back of the book, or how many times it’s been checked out of the library…

Now, while we aren’t using cards to store our data anymore (well most of us aren’t) we’re still following the same rules - and more importantly, we’re still thinking about how much time it would take for us to add that extra metadata.

This is the beauty of LibraryThing’s new Common Knowledge - while it doesn’t have all of these things it does have some and they’re adding new fields all of the time! I love it! One day I spent hours just filling in all of the info I could find on my favorite authors - not a great use of time - but so useful to someone searching for that book!

Point three: Knowledge is social:

Starting on page 144, Weinberger discusses our education system here in the U.S. and how we’re taught to work in silos. Students are made to sit and take tests to measure what they’ve learned:

The implicit lesson is unmistakable: Knowing is something done by individuals. It is something that happens inside your brain. The mark of knowing is being able to fill in a paper with the right answers. Knowledge could not get any less social. In fact, in those circumstances when knowledge is social we call it cheating.

When I was in college, I lived with my husband (boyfriend at that time) and we took many of the same classes - since we had the same degree. We would sit and do our homework together and yes, come up with the same answers. Most of the professors were okay with this as long as we could fill out those test papers on our own come exam time - all except one - but we won’t go there. Now, Weinberger guarantees that students are on IM, chatting while doing homework - which probably ends up with the same result - shared knowledge. This - in my eyes - is the way of the world! You learn so much more by sharing with others than you do sitting alone at your desk. This is part of the reason why I started this blog - I wanted to share what I was learning so that others could learn too.

Two more quotes from Weinberger in this section that made me interrupt my husband as he tried to read his book last night …

Memorizing facts is often now a skill more relevant to quiz shows than to life … One thing is for sure: When our kids become teachers, they’re not going to be administering tests to students sitting in a neat grid of separated desks with the shades down.

So true!! And:

One of the lessons of Wikipedia is that conversation improves expertise by exposing weaknesses, introducing new viewpoints, and pushing ideas into accessible form.

Long story short - knowledge should be shared! And in doing so learning will be more valuable.

More points to come:

I’m only 1/2 way through with the book - and I’m sure I’ll have more to share with you as I finish - if you haven’t read the book - I highly recommend it just based on the first 150 pages and the conversations that I’ve seen spring up from it!

Open Source MODS-generating software

Via Metadatalibrarians:

The University of Tennessee Digital Library Center is proud to announce the release of the DLC-MODS Workbook, version 1.2 under the GNU General Public License version 3.

The DLC-MODS Workbook provides a series of web pages that enable users to easily generate complex, valid MODS metadata records that meet the 1-4 levels of specification outlined in the Digital Library Federation Implementation Guidelines for Shareable MODS Records, (DLF Aquifer Guidelines November 2006).

Developed by programmer Christine Haygood Deane under the direction of metadata librarian Melanie Feltner-Reichert, this open source client-side software provides control of date formats and other problematic fields at the point of creation, while shielding creators from the need to work in XML. Metadata records created can be partially created, saved to the desktop, reloaded and completed at a later date.

Final versions can be downloaded or cut-and-pasted into text editors for use elsewhere.

Developed in support for our state-wide digitization project, Volunteer Voices, we hope this system will assist others in their efforts to create valuable digital libraries also. The software can be viewed here and downloaded here.

Please address comments and questions to Melanie Feltner-Reichert ( mfeltner@utk.edu ) and Cricket Deane ( cdeane@utk.edu ).

New Mark Twain Digital Collection

I just got this via a few of my mailing lists and thought I should share with you all.

I'm happy to announce that today the University of California launched the beta version of Mark Twain Project Online, a digital critical edition of the writings of Mark Twain, providing access to more than twenty-three hundred letters written between 1853 and 1880, including nearly 100 facsimiles of originals. The site is driven by metadata captured in METS records, the content was encoded in TEI P4, and the search, browse and display functionality was built using the XTF (the eXtensible Text Framework).

Read the full press release here.

PTSEM Digital Library

I know I haven’t spoken much about my new job, but now I have something big to announce. The Princeton Theological Seminary has signed with Mark Logic to assist in the development of our digital library! The big release was today at a conference at the seminary, but I was at a training class so I’m not sure how it was received.

We’re just starting out, but I’m very excited about the potential this system holds for us - so keep an eye out for new great things!

Information R/evolution

I just love every video I’ve seen by this man! Michael Wesch, Assistant Professor of Cultural Anthropology and author/director of The Machine is Us/ing Us has another video addressing the issues brought up in Everything is Miscellaneous (a book I’m finally reading now that I have some time):

Future of MARC

I feel like I should duck after hitting submit on this post - I might be opening up the flood gates - but here it goes!

My friend (and colleague) Chris Schwartz has written about the future of MARC (and probably will have many more posts on this topic).

When it is mentioned, MARC usually gets a bad rap. It’s often viewed as worn out legacy metadata better suited for card catalogs with an antiquated late 1960’s data structure that mystifies computer programmers when they first encounter it.

Personally, I wasn’t mystified when I first saw MARC - it all made perfect sense ;) What doesn’t make sense to me are the silly ISBD punctuation rules - these are what’s really being carried over from the card catalog days.

My opinion on the matter? Well, I don’t think MARC can go anywhere. It’s at the center of nearly every library system - and I’m not sure that’s a bad thing. The way that MARC breaks up our data into parts makes it much more searchable. By breaking things down into pieces you have access to very detailed levels of data.

Now, you might be saying - who in the world needs that level of accuracy? Well, there are many researchers out there who want to find the edition that was published by publisher X in town Y and by having a schema that allows access on that level we make it easier for them (now the fact that our systems don’t offer that functionality is a whole other issue - but the point is that it is possible).
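Here is a rough Python sketch of what that kind of search could look like against MARCXML, assuming the place of publication sits in 260 $a and the publisher in 260 $b. The sample collection, the placeholder names "Town Y" and "Publisher X", and the helper functions are all invented for illustration; this is not how any particular library system does it.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical two-record collection; only the fields we need are shown.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">First sample title /</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">Town Y :</subfield>
      <subfield code="b">Publisher X,</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Second sample title /</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">Town Z :</subfield>
      <subfield code="b">Publisher X,</subfield>
    </datafield>
  </record>
</collection>"""


def subfield(record, tag, code):
    """Text of the first matching subfield in a record, or an empty string."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return (node.text or "") if node is not None else ""


def clean(text):
    """Drop the trailing ISBD punctuation (' :', ',', ' /') before comparing."""
    return text.strip().rstrip(" :,;/").strip()


def find_editions(collection_xml, place, publisher):
    """Titles of records published in the given place by the given publisher."""
    root = ET.fromstring(collection_xml)
    hits = []
    for record in root.findall("marc:record", MARC_NS):
        same_place = clean(subfield(record, "260", "a")).lower() == place.lower()
        same_pub = clean(subfield(record, "260", "b")).lower() == publisher.lower()
        if same_place and same_pub:
            hits.append(clean(subfield(record, "245", "a")))
    return hits


matches = find_editions(SAMPLE, "Town Y", "Publisher X")
print(matches)
```

Notice that the only fiddly part is clean(): the data is already split into the right pieces, and it's the ISBD punctuation stored inside each subfield that has to be scrubbed off before anything can be compared.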

It is this level of detail that has me pushing for our library to use MARCXML for our digital collections - it just makes the most sense for our very specialized collection and patrons. I want to be able to provide searchability down to the tiniest level if the user wants it. My only complaint about MARCXML (if you want me to get techie on you) is that every field is titled “datafield” and the attributes are where you get the MARC fields.

<datafield tag="245" ind1="1" ind2="0">
  <subfield code="a">A. Janse over Karl Barth /</subfield>
  <subfield code="c">samensteller: J. L. Struik.</subfield>
</datafield>

Why not have it like this:

<m245 ind1="1" ind2="0">
  <a>A. Janse over Karl Barth /</a>
  <c>samensteller: J. L. Struik.</c>
</m245>

Which probably has some validation issues - but you get the idea.
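For the curious, here is a small Python sketch that rewrites a standard datafield into the shorter shape. The to_short_form function and the "m" prefix are just my illustration, not part of MARCXML or any schema, and I've left the MARCXML namespace off the sample to keep it short. It also shows one of those validation issues: MARC allows numeric subfield codes such as $2, and XML element names can't start with a digit, so this version simply refuses them.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML shape (namespace omitted for brevity).
STANDARD = """<datafield tag="245" ind1="1" ind2="0">
  <subfield code="a">A. Janse over Karl Barth /</subfield>
  <subfield code="c">samensteller: J. L. Struik.</subfield>
</datafield>"""


def to_short_form(datafield):
    """Rewrite <datafield tag="245"> as <m245>, and each subfield as <code>."""
    short = ET.Element(
        "m" + datafield.get("tag"),
        {"ind1": datafield.get("ind1", " "), "ind2": datafield.get("ind2", " ")},
    )
    for sub in datafield.findall("subfield"):
        code = sub.get("code")
        if not code or not code[0].isalpha():
            # Numeric codes like "2" would give an illegal element name.
            raise ValueError(f"subfield code {code!r} is not a legal XML name")
        child = ET.SubElement(short, code)
        child.text = sub.text
    return short


standard = ET.fromstring(STANDARD)
short = to_short_form(standard)

# Same title, two very different lookups.
title_standard = standard.findtext("subfield[@code='a']")
title_short = short.findtext("a")

print(ET.tostring(short, encoding="unicode"))
```

With the short form the lookup is just the element name; with the standard form every lookup has to filter on an attribute value.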

I didn’t mean for this to turn into an evaluation of MARCXML, so I’ll leave the XML discussion at that for now.

The way I see it, MARC does what it needs to do - it’s the rules surrounding our cataloging (AACR2 & ISBD) that are holding us back - and for that reason (and the one I mentioned earlier about it being central to our systems) I don’t think MARC is going anywhere soon.

ALA 2007: Metadata Presentations

Via Cataloging Futures (Chris won’t mind…):

Over on the LITA Blog, Rebecca Guenther provides information about an ALA Annual 2007 program that was sponsored by the LITA Standards Interest Group, Using Metadata Standards in Digital Libraries: implementing METS, MODS, PREMIS and MIX: