This session on Saturday 11th March had me from hello, coming as it did after a fascinating half-day conference on book digitisation, held by the ALPSP in London, which I attended and reported on last month…
SXSW 2006 session page
Liz Lawley – Professor at Rochester Institute of Technology, a trained librarian who used to work at the Library of Congress, and visiting researcher at Microsoft Research. Liz blogs at mamamusings, Corante's social software weblog Many-to-Many, and Misbehaving.net
Danielle Tiedt – Head of Microsoft’s Books Program and General Manager of MSN Search
Bob Stein – Institute for the Future of the Book, Visiting Fellow at the USC Annenberg Center for Communication and founder of Night Kitchen
Daniel Clancy – Director, Google Book Search
Sue Thomas, Professor of New Media at De Montfort University (UK), jumped right in with a question about trans-literacy just as the session started. A guy from Cyworld asked how something written now fits into future writing – “the living book” – and changing texts (eg Wikipedia). A delegate from New York Library asked whether, as we digitise things, we are actually shrinking the realm of knowledge, since people will think that what is online is everything.
Liz Lawley first addressed privacy concerns, for instance the privacy surrendered by having to log in through Google to look at books. An easy criticism of Google Book Search was that it only offered options for where you could find the book online, but it now also shows where you can borrow it from, as well as books in the public domain.
Daniel Clancy recounted how he went to a talk recently where half the students hadn’t been in the library in the last six months. There is a vast amount of authoritative content available and Google want it to be available at any time and everywhere. To this end, Google have their Book Search and their Library Program.
Mary Hodder spoke up from the floor, positing that Google are not being good community members if they are signing exclusive contracts with publishers, etc, because others should be able to crawl and re-scan that information… Daniel Clancy responded that for the public domain, it’s limited, but Hodder queried in turn: can I crawl all Google’s public domain content and use that for other things? Can I build new knowledge on top of it and build communities?
Danielle Tiedt said she got into book search for a lot of the same reasons as Google, for example to improve the answers in Microsoft Search. Only 5% of the world’s information is online today. Book digitisation is a very long-game process and is going to require a lot of people working together to make it happen. One of the reasons Microsoft joined the Open Content Alliance (OCA), she continued, is that it is specifically focused on public domain work and makes it freely available to everyone. There are three copies of everything – one goes to the Internet Archive [run by Brewster Kahle – cheers Brewster, fragments of three former websites I’ve worked on that went bust or were retired are stored there!], one to the OCA, and one to a commercial company, eg Microsoft.
At this point Bob Stein countered that it is scary that Mary Hodder has to act like a supplicant if everything is “going to be okay”. Hodder commented that Kahle says “trust me”, but if you put the info out there, the concept of having as many copies as possible forces you to build a business model around better services, based on better user interfaces and on trust that validates (eg make an API for all the content so others can remix, mash up and build upon it).
Stein said he has a tremendous problem with any commercial organisation controlling the archive that is our culture, citing the instance of censorship in China. Ceding our culture in this way to large organisations is scary. We are giving up the role of the public librarian too easily, he stressed.
Danielle Tiedt noted that Europe is taking a more public approach with governments supporting digitisation. There’s not enough money to make it happen without public involvement, she added.
Daniel Clancy explained that it takes $1.5 billion to digitise 30 million books – so how would Bob Stein et al have Google behave, and is Stein comfortable with the US government being the source of digitisation?
Liz Lawley interjected that maybe we need to look at more decentralized options, wondering how much anyone would have costed Wikipedia at in advance.
An issue around the idea of the perfect book was raised from the audience – if pages online are collected from different editions, what edition [or what translation, I wonder] am I reading? What effect is this having on scholarship, he asked.
A delegate from iBiblio described the broadcasting and webcasting treaty currently being negotiated as “the Rome Convention on steroids!” as it transfers the copying rights onto the web and broadcasting world.
Responding to the point that there is not a lot of demand for digitisation, Danielle Tiedt said there is with regard to search. People want authoritative, book-sourced / originated content and a lot of search queries aren’t being answered because there’s a lack of authoritative content in the search results.
Bob Stein put forth the case that the books Google has digitized are reading us. But Daniel Clancy countered that you don’t have to log in for public domain content if you check 'fully accessible' in Google's 'advanced search'.
Liz Lawley asserted that Google aren’t organizing anything, they’re just indexing it, and usability issues have to be addressed. For instance, the best edition of Hamlet for a six-year-old and the best for a PhD scholar aren’t one and the same. Librarians, however, do have expertise in searching and sourcing the correct texts.
Danielle Tiedt took up the point about indexing and organizing – ranking pages for search is a lot harder to do with the types of technology we have today, and she reckoned we’re still going to need a lot of human intervention.