Code4Lib 2008: Open Library

In his presentation, Building the Open Library, Aaron Swartz introduced us to his vision of an online library. Like Brewster, he sees a wiki with one page for every book. For this reason, the small group (6 people spread out around the world) is starting their project with monographs.

To achieve this feat, the team is using their own database framework called ThingDB:

ThingDB stores a collection of objects, called “things”. For example, on the Open Library site, each page, book, author, and user is a thing in the database. Each thing then has a series of arbitrary key-value pairs as properties. For example, a book thing may have the key “title” with the value “A Heartbreaking Work of Staggering Genius” and the key “genre” with the value “Memoir”. Each collection of key-value pairs is stored as a version, along with the time it was saved and the person who saved it. This allows us to store full semi-structured data, as well as travel back thru time to retrieve old versions of it.
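To make the description above a bit more concrete, here is a minimal sketch of the idea in Python. This is not ThingDB's actual code or API; the `Thing` and `Version` classes, the `save` and `get` methods, and the sample names are all made up for illustration. It only shows how arbitrary key-value pairs can be stored as versions, each with a timestamp and the person who saved it, so that old versions can be retrieved later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One saved snapshot of a thing's key-value pairs (hypothetical class)."""
    properties: Dict[str, Any]
    saved_at: datetime
    saved_by: str


@dataclass
class Thing:
    """A page, book, author, or user, kept with its full version history."""
    name: str
    versions: List[Version] = field(default_factory=list)

    def save(self, properties: Dict[str, Any], saved_by: str) -> int:
        """Store a new version and return its 1-based revision number."""
        # Copy the dict so later changes by the caller cannot alter history.
        snapshot = Version(dict(properties), datetime.now(timezone.utc), saved_by)
        self.versions.append(snapshot)
        return len(self.versions)

    def get(self, revision: Optional[int] = None) -> Dict[str, Any]:
        """Return the latest properties, or those of an older revision."""
        if not self.versions:
            raise LookupError(f"{self.name} has no saved versions")
        if revision is None:
            revision = len(self.versions)
        if not 1 <= revision <= len(self.versions):
            raise LookupError(f"{self.name} has no revision {revision}")
        return dict(self.versions[revision - 1].properties)


# Hypothetical usage: the thing name and user names are invented.
book = Thing("book/heartbreaking-work")
book.save({"title": "A Heartbreaking Work of Staggering Genius"}, saved_by="alice")
book.save(
    {"title": "A Heartbreaking Work of Staggering Genius", "genre": "Memoir"},
    saved_by="bob",
)

print(book.get())            # latest version, including "genre"
print(book.get(revision=1))  # travel back to the first saved version
```

Keeping every version as a whole snapshot is the simplest way to get the "travel back through time" behavior; a real system would more likely store this in database tables than in an in-memory list.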

Gathering Data

Obviously a library isn’t anything without data, so to start, the team contacted publishers for their ONIX data. Surprisingly, the publishers were mostly receptive, since they wanted their books to be findable.

Next, they contacted librarians to ask for data dumps of their catalogs. Unsurprisingly, they didn’t get the same kind of response that they got from the publishers. Librarians wanted to think about it for a while… Long story short, they have some library data, but would love more.

Now that they had book data, they wanted to enhance it with additional content like book reviews from the New York Times, Harper’s, Reader’s Catalog, and the New York Review of Books. These titles will all soon have their reviews integrated into the site!!

Lastly, they’re scanning books to get data. This is where the Internet Archive comes in. They are providing their scans and data for the Open Library project.

The Library

The library itself has to focus on display. When you enter a search term, you get back a book page, and each book page gives you more info about the book: buy, borrow, download. Each author has a page as well, linked from the book pages, so they’ll be able to auto-generate a bibliography for each author. This is very much like the LibraryThing author pages.

So, now that we have a library with pages for books and authors, we need to organize the data. Aaron was awfully funny here. He had librarians arguing: but what subjects should we use? Which classification scheme do we use? We’re going to have to think about this! Aaron says quite simply that there is no need to argue, because we can use them all!! I love it. It’s very Everything is Miscellaneous: we can organize things in any way we want on the web, and we aren’t limited by the physical world!!

There is also a sort of FRBR where you can link books together.

So now we have an online library. How do we keep it updated? Each page (book, author, etc.) is editable: it’s a wiki!! In addition to that, you can easily edit the templates for your own needs or fix bugs you find in the templates that the Open Library is using.

The Future

In the future, they want to provide scan on demand – for $20 or $30 they’ll go get a scanned copy of the book. Then the PDF is put online with a bookplate saying that you paid for that book to be digitized. Now, the PDF is available to everyone!!

Aaron’s dream is to have a web of books online – all the information about the book – all the people who reviewed it, all the libraries that have it – all the places you can buy it – all in one place – so that everyone can find any book and find out how to access the information it holds.

In order to fulfill Aaron’s dream, we have to share. “We want your data”: share your MARC data with the project (something that a few people at the conference did as a gift to Brewster for his keynote). If this is to be an open-source project, you need to share. Also, as an open-source project, they need all the help they can get, so chip in!

Questions & Answers

Q: Can we scan on demand now?

A: Scan on demand is not available now, but it should be done in the next couple of weeks. We’ll see.

Q: Will we get a copy of the items to put in our catalogs if we pay for it to be scanned?

A: The idea is that the book will be scanned, then a URL will be provided that can be put in the 856 field in your catalog.

Q: What about books that are only published online?

A: Yes – any and all books – get as much in there as possible

Q: Is there an API?

A: They are planning an API, so that you can get any book page in the format you need.

Q: Where are you getting cover art?

A: LibraryThing (user-scanned covers), publishers who give us covers, and a dump of covers we got from Amazon. We want to let libraries use them, so we got as many covers as possible.

Q: Plans for Internationalization?

A: It should be translatable in the future
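
For the answer above about putting the scan URL in the 856 field, here is a rough sketch of what such a field could look like in the usual human-readable MARC display. The `format_856` helper and the example.org URL are made up for illustration and are not anything Open Library provides. The sketch assumes first indicator 4 (access via HTTP), second indicator 1 (version of the resource, since the scan is an electronic version of the print book), subfield $u for the URL, and subfield $z for an optional public note.

```python
def format_856(url: str, note: str = "") -> str:
    """Render an 856 field as a display string (hypothetical helper)."""
    # First indicator 4 means HTTP access, so reject anything else.
    if not url.startswith(("http://", "https://")):
        raise ValueError("first indicator 4 implies an HTTP(S) URL")
    subfields = [("u", url)]           # $u = Uniform Resource Identifier
    if note:
        subfields.append(("z", note))  # $z = public note
    body = " ".join(f"${code} {value}" for code, value in subfields)
    # Second indicator 1 = the URL points to a version of the resource.
    return f"856 41 {body}"


# Placeholder URL; the real one would come from the scanning service.
line = format_856("http://example.org/scans/book.pdf", "Scanned copy")
print(line)  # 856 41 $u http://example.org/scans/book.pdf $z Scanned copy
```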

More Info


This article (subscription required) discusses the potential friction between Open Library and WorldCat. Will the success of the former spell doom for the latter? How will librarians respond to the invitation to send records to one or the other, or both? [via LISNews]

Find more press about Open Library.


There were no negatives out of this guy!!! The project sounds so much better than I had even realized from reading articles and blog posts. I love it – this is amazing 🙂 and I can’t wait to see more!
