NFAIS 2009: Digital Natives and Professional Searching: Improving the User Experience

The third panel for today was titled Digital Natives and Professional Searching: Improving the User Experience.

Chris Lamb from Thomson Reuters took the podium first with his talk, Desperately Seeking Paris, in which he talked about Calais, a free web service and open API (there are already plugins for Drupal, OpenOffice, WordPress and others) that makes obvious to computers what is obvious to humans: for example, the difference between ‘Paris, France’, ‘Paris, Texas’ and ‘Paris Hilton’ when you search for ‘Paris’. Basically it reads the metadata to distinguish between results.

It takes unstructured documents (text, HTML, XML), extracts named entities, facts, events, and categories from them, and makes connections between the entities in your content and related data in DBpedia, GeoNames, the CIA World Factbook and more. In short, it’s a realization of the semantic web.
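To make the ‘Paris’ problem concrete, here is a toy sketch in Python. It is not how Calais works and it does not call the Calais API; the cue-word lists and the `disambiguate` function are made up purely for illustration. It only shows the general idea of using the surrounding text to decide which ‘Paris’ a document means.

```python
import re

# Hypothetical cue words for each sense of "Paris" (illustration only,
# not taken from Calais or any real knowledge base).
SENSES = {
    "Paris, France": {"france", "seine", "eiffel", "louvre"},
    "Paris, Texas": {"texas", "lamar", "dallas"},
    "Paris Hilton": {"hilton", "heiress", "celebrity"},
}


def disambiguate(text):
    """Return the sense whose cue words overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # No cue word at all means we cannot tell which Paris is meant.
    return best if scores[best] > 0 else None
```

Calling `disambiguate("We flew to Paris and walked along the Seine")` picks the French city because of the word "Seine". A real entity extractor would rely on far richer signals than a handful of keywords.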

Some sample extraction applications for Calais include:

  • Indexing and Abstracting (sorting and collating)
  • Investigative Reporting (tag background documents and reveal hidden connections)
  • Media Monitoring (competitive intelligence and blogosphere monitoring)
  • Online publishing

How are people using Calais in search? Calais is a platform, not an application – and so it’s not a search engine. People are using Calais to supplement indexing engines like FAST. The content Calais returns is semantically enhanced, which allows the index engine to support semantically enabled results and links. There are people working on projects like this – but they have yet to be released or even announced, so there is no way to see it in action online yet.

That said, there are over 9,000 developers using it already and there are over 1 million daily submissions – so if you want to play with Calais you won’t be a guinea pig.

This talk made me curious enough to check out the plugins for apps that I already use to see what kind of value it will add.

Rudy Potenzone from Microsoft came next with his talk, And the Barbarians have Phasers: Authors and Their Tools Come of Age.

This is the most information-aware group (the digital natives) that has ever come into our offices, our libraries and our universities, and they come equipped – already knowing how to use Twitter and blogs – and they expect these tools when they come into the workforce. How do we deal with this and prepare for them?

Microsoft is envisioning a new era of research reporting. The author of today is the reader of tomorrow – so how do we capture enough information to make the content interesting to the reader? Office 2007 and SharePoint are the ways they’re opening up to these new ideals – SharePoint is the most popular product that Microsoft has, with the highest sales of any product they have. This is a sign of the times – of how people want to work in their offices.

Rudy talked about Microsoft’s efforts to bring their tools up to the expectations of today’s academic environment. There are a lot of projects going on to try and bring these ideals to life. One example is The British Library’s Research Information Centre (RIC) and another is an eJournal Publishing Service. All of these examples are built on Microsoft products and can be found online here. You can also find code online at

One tool that sounded neat to me was the author add-in tool, which lets you get the rules for the publication you’re writing for and add your own metadata as you’re writing the article in Microsoft Word. My only problem is that while these add-on tools are open source and free to download – you still need the Microsoft software to use them – which I do – but you get my point :)

Kristian Hammond finished up with his presentation Frictionless Information: Adding Value in the Age of Google.

Coming from the IT world, he’s trying to understand our world – the world of publishers and content providers. He thinks we create really high value, fabulous content – and that it used to be that people were willing to pay for that – but they’re not anymore :) He listed our pressing problems as:

  • Google
  • Social Media
  • Content, content, content
  • Bounce (somebody finds something on Google that leads them to your site and they bounce on and then bounce off – never to return again)
  • Free

His department decided that the way around these problems is focussing only on the user. Giving the user what they want, when they want it, without taking them away from what they’re doing. He doesn’t care what the source is (blogs, services, web, news, opinion, video, etc.). Nothing is going to stop people from using web resources – so let’s embrace it – bring together the high-class content and mix it with the other content – provide it all to the user.

His solution to this is the Relevance Engine. While he’s writing it’s reading, while he’s reading it’s reading – it’s building a gist – what it thinks this document is about – and gives additional information about it from anywhere on the web – all sources. Because it’s reading the context of the document you’re working on, it’s going to find better results than if you typed in a query. This can be done both on the desktop and on the web – from a piece of indexed text we can find anything! The better the indexing the better the results.
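Just to picture the "gist" idea, here is a minimal Python sketch. It is my own simplification and not the Relevance Engine itself: the `gist_query` function, the tiny stopword list and the choice of plain word counts are all assumptions. It turns the text you are working on into a short search query, so the user never has to type one.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "and", "are", "into", "for", "with", "that", "this", "from"}


def gist_query(text, k=3):
    """Build a query from the k most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return " ".join(word for word, _ in counts.most_common(k))
```

Feeding it a paragraph that keeps mentioning solar power yields a query led by "solar" and "power", which could then be sent to any search engine in the background.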

It all comes down to loving your user, protecting your user from the horror of the text box!! Providing your user with your amazing content so that they never have to go looking for it elsewhere.

Wow, what an animated speaker – I wish he was one of my professors when I was in school :) I bet that class is so much fun!!


Social Mention

I just learned about Social Mention from ResourceShelf.

Social Mention is a social media search engine that searches user-generated content such as blogs, comments, bookmarks, events, news, videos, and microblogging services.

It allows you to easily track what people are saying about you, your company, a new product, or any topic across the web’s social media landscape in real-time.

I tried it out and like it – but I want one feature that isn’t there. I’d love to see the ability to remove items from a specific domain in the advanced search. That way if I’m searching for mentions of me or my company – I don’t find the posts that I’ve written.

Other than that – it’s a pretty cool tool.

New Google Gadgets

I just logged into my homepage and clicked on my Gmail link, like I always do, and it opened up in my Google homepage instead of its own window like I’m used to. Seems that Google has redone their homepage to make access to the gadgets different. There is now a menu on the left with your gadgets – you click gadgets there and they open up in your Google homepage.

Learn more.


MarkMail: Mailing List Search

Last year I bookmarked MarkMail and then promptly forgot about it. Today I was searching for someone’s email address and was brought back to MarkMail – wow has it grown!!

Searching 4,310 lists and 22,872,030 messages. First list started in November 1992. There are 2,898 active lists, recently accumulating 5,907 messages per day.

Basically, MarkMail is a search engine you can use to search mailing lists. It’s built by MarkLogic – a group of awesome people!

More about MarkMail:

MarkMail is a free service for searching mailing list archives, with huge advantages over traditional search engines. It is powered by MarkLogic Server: Each email is stored internally as an XML document, and accessed using XQuery. All searches, faceted navigation, analytic calculations, and HTML page renderings are performed by a small MarkLogic Server cluster running against millions of messages.
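To get a feel for what "each email as an XML document" might look like, here is a small Python sketch. The element names (`list`, `message`, `sender`, `subject`) are invented for this example and are not MarkMail's actual schema, and it uses the limited XPath support in Python's standard library rather than XQuery on MarkLogic Server.

```python
import xml.etree.ElementTree as ET

# A hypothetical mini archive; the structure is an assumption for illustration.
ARCHIVE = """
<list name="example-dev">
  <message id="1"><sender>alice@example.org</sender><subject>Release plan</subject></message>
  <message id="2"><sender>bob@example.org</sender><subject>Build broken</subject></message>
  <message id="3"><sender>alice@example.org</sender><subject>Re: Release plan</subject></message>
</list>
"""

root = ET.fromstring(ARCHIVE)


def subjects_from(sender):
    """Return the subjects of all messages written by the given sender."""
    # [sender='...'] keeps only messages whose <sender> child has that text.
    matches = root.findall(f"message[sender='{sender}']")
    return [m.findtext("subject") for m in matches]
```

Because every message is structured, a question like "everything Alice posted" becomes a path expression instead of a fuzzy keyword search.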

Check it out for yourself!


Google Search for Macs

Oooo – this looks neat:

I haven’t played with it yet, but Google has a Mac search now.

If you run into a problem on a Windows computer, all you have to do is type a little description of the problem and Google takes care of the rest; Mac users, on the other hand, often need to include a little context in their search—instead of typing a query like text editor, you type text editor mac. Google’s Mac-specific portal, found at, now includes a Mac-specific search box. It’s not groundbreaking, but the guaranteed Mac-specific results could come in handy next time you’re looking for a specific application or you’re troubleshooting your Mac.

Found via Lifehacker.

CIL2008 – Super Searcher

An awesome list of tools from Mary Ellen Bates:

  • – blog of alternative and niche search engines – click the top 100 tab – subscribe to rss feed
  • Keotag – search across web 2.0 sites (Technorati, Delicious, Twitter and more)
  • MSN product reviews – search for a specific brand
  • Google’s new n improved timelines – creates a readable page easy to scan and identify trends (find when there was a buzz about a particular topic) – yellow line at the top shows where there was a buzz
  • Watch for blended search results – lower precision results, but more long-tail content, esp. for obscure topics – seeing a lot more other search results (products, directions – watch for what else appears at the top of the screen) – look at search results with new eyes
  • searchCrystal – touchy-feely
  • – clustering on demand with a choice of search engines – lets you determine how the search results are organized – uses different algorithms
  • Loki toolbar – find location-dependent content – based on IP address or nearby wifi signals – tells you where you are now and locates you on a map – search locally
  • – Firefox fix for Google – nice customization – removes ads – infinite scroll results
  • Google has experimental search – new way to see results – add view:timeline or view:info to your search query and you see things like dates or images or measurements on the pages – more efficient way to find images on a page
  • Searchmash – unbranded Google site – cool interface – why do i care? it’s extremely cool – that’s why! free of ads – lets you see other search indexes on the top right
  • google date-limiting – advanced search screen (remember a date search on the web is never a reliable thing) can also roll your own – add +&as_qdr=dn to the SERP (search results page) URL – where n is the number of days (d15 = 15 days) – items spidered in the last n days
  • – a tool for comparing search results – i prefer more results from Google or Yahoo – trust-o-meter
  • I’d prefer this… – add prefer:word to query – ranks these search results higher – test search “hybrid car prefer:convertible”
  • MSN’s misspelling-suggestion engine – lets you find ways to misspell things since things on the web are not always spelled right
  • Ask’s maps – both driving and walking directions – – takes local topography (san fran – hills=bad) into account (i always use this tool when at conferences – to find out how to walk somewhere)
  • – use Exalead’s NEAR/n operator — (solar OR sun) NEAR/3 power
  • use search engines’ quick answer features – Smart Answers – Google’s OneBox – Yahoo’s Shortcuts – MSN’s Instant Answers (at the top of the search results)
  • Gigablast – limit to multiple sites – has all kinds of advanced search features
  • SnapSearch – visual search results – lets you preview the page and lets you interact with the page on the search results screen – based on the Gigablast search engine
  • Pagebull – metasearch tool – entirely visual – no words – all pictures – good if you remember what the page looked like and can’t remember the name
  • – search results deliver small fact-bites – max 30 results – pull factual sentence from the search results
  • TextRunner “information mining” looks for statements like factbites
  • – source for national stats – cool tool for presenting graphical info (also a statemaster)
  • TouchGraph – find relationship among URLs – finds related books in amazon (uses subject terms) – graphical results
  • just a reminder here – check out podcast lectures from Yale, Princeton, UC Berkeley, Stanford, Johns Hopkins – all providing lectures online for free
  • Kosmix – a vertical search engine on steroids – more than just websites – trusted sources – other concepts/related concepts – videos – yahoo questions and answers
  • LOUIS – library of unified information sources – searchable documents from congressional reports
  • for the full text of US Supreme Court cases – incomplete now – but keep an eye on this one –
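The "roll your own" date-limiting tip from the list can be sketched in a few lines of Python. This assumes the `as_qdr=dn` parameter behaves the way it was described in the talk (I haven't verified it beyond that); the function name `add_date_limit` is just something I made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_date_limit(serp_url, days):
    """Return the results-page URL limited to items spidered in the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    parts = urlsplit(serp_url)
    # Drop any existing as_qdr value so the new limit is the only one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "as_qdr"]
    query.append(("as_qdr", f"d{days}"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

So `add_date_limit("https://www.google.com/search?q=hybrid+car", 15)` tacks `&as_qdr=d15` onto the end of the URL. As Mary Ellen said, a date search on the web is never a reliable thing, so treat the results as a rough filter.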

I know this is a very note-like post – but this presentation lent itself to this style. See Mary Ellen’s list of links.
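One more from the list above: if the NEAR/n operator is new to you, this little Python sketch shows the general idea of a proximity search like (solar OR sun) NEAR/3 power. It is only my reading of the operator; whether Exalead counts the distance exactly this way is an assumption, and the `near` function is hypothetical.

```python
import re


def near(text, left_terms, right_terms, n):
    """True if any left term occurs within n words of any right term."""
    words = re.findall(r"\w+", text.lower())
    left = {t.lower() for t in left_terms}
    right = {t.lower() for t in right_terms}
    left_pos = [i for i, w in enumerate(words) if w in left]
    right_pos = [i for i, w in enumerate(words) if w in right]
    # Distance is measured in word positions, in either direction.
    return any(abs(i - j) <= n for i in left_pos for j in right_pos)
```

With this reading, "Solar thermal power plants" matches, while a sentence where "sun" and "power" sit six words apart does not.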
