Tuesday, June 24, 2008

Find Your Information Without the Camels

Sixty-seven years ago, in 1941, Jorge Luis Borges published a story he called “The Library of Babel.” Borges describes a library that contains all the knowledge in the world. And not merely all the knowledge, but all conceivable information, including “the interpolations of every book in all books” and a detailed history of the future.

"When it was proclaimed that the Library contained all books," Borges writes, "the first impression was one of extravagant happiness. All men felt themselves to be masters of an intact and secret treasure." But soon, this "inordinate hope was followed by an excessive depression." Some of Borges' characters grew frustrated, others went mad. The library contained everything, but because it was so imponderably vast, no one could ever find the book they needed.

The problem of how to find it is not new. In the tenth century, a Persian scholar named Abdul Kassem Ismael decided that he must take his entire library with him when he traveled: a library of 117,000 books. He stored these books on the backs of 400 camels. The camels had been trained to walk in alphabetical order, so that the scholar could, with relative ease, find any one of his precious books.

For my book collection, I can train camels; for searching the Web, I have Google. But how can I find the information on my own computer’s hard drive, which grows larger every year and now holds (on one computer alone) 500 gigabytes?

(One gigabyte, by the way, is slightly more than 1 billion bytes. If a novel runs to 50,000 words, and each word takes about six bytes of plain text, my hard drive can hold roughly as much information as 1.8 million novels.)
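For anyone who wants to check that arithmetic, here is a back-of-the-envelope sketch in Python. The figure of six bytes per word (five letters plus a space) is an assumption of mine, not a measured value, and the function name is made up for illustration. Under those assumptions the count comes out in the millions.

```python
# Back-of-the-envelope estimate: how many plain-text novels fit on a 500 GB drive?
# Assumption (not a measured value): about 6 bytes per word of plain text.

BYTES_PER_GIGABYTE = 2 ** 30   # 1,073,741,824 bytes: "slightly more than 1 billion"
DRIVE_GIGABYTES = 500
WORDS_PER_NOVEL = 50_000
BYTES_PER_WORD = 6             # assumed: five letters plus a space


def novels_that_fit(drive_gb, words_per_novel, bytes_per_word):
    """Return how many whole novels fit on a drive of the given size."""
    drive_bytes = drive_gb * BYTES_PER_GIGABYTE
    novel_bytes = words_per_novel * bytes_per_word
    return drive_bytes // novel_bytes


count = novels_that_fit(DRIVE_GIGABYTES, WORDS_PER_NOVEL, BYTES_PER_WORD)
print(f"{count:,} novels")
```

Change BYTES_PER_WORD if you think your novelist favors longer words; the answer moves, but it stays in the millions.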

So how do I find the vast quantities of data that I create or store? There are three approaches: you can make a search tool; you can buy a search tool; or you can use a free search tool.


1. Make Your Own Search Tool With Ferret


The camel can’t do it for you, but a Ferret can. Ferret is a powerful text search engine library that works with the Ruby programming language. Together, Ferret and Ruby can find data on hard drives and on servers. Ferret, a 100-page book by David Balmain (O’Reilly, 2008), is written in a concise and friendly style. It shows you, step by step, how to create a powerful tool for finding your data in many file formats, including text, OpenOffice, MS Word, HTML, and PDF, as well as the EXIF tags stored in JPEG files and the tag information stored in MP3s. You’ll learn how to install Ferret, set up your index, fine-tune and optimize it, and run searches. There is an enormous amount of information on the Web about Ruby but hardly anything about Ferret, so a beginner-level programmer like me could never get along without a book like this. More information about the book is available at the publisher’s website: http://oreilly.com/catalog/9780596519407/index.html.
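To make "set up your index, then search" a little more concrete, here is a minimal sketch of the idea behind a text search library. It is written in Python rather than Ruby, and it is emphatically not Ferret's API: the names TinyIndex and tokenize are hypothetical, invented for this illustration. A real library adds ranking, file-format parsing, and on-disk storage on top of this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index (hypothetical, not Ferret): word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document by recording which words it contains."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents containing every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        # .get avoids creating empty entries in the defaultdict during lookups
        matches = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            matches &= self.postings.get(word, set())
        return sorted(matches)


index = TinyIndex()
index.add("notes.txt", "Borges wrote about an infinite library.")
index.add("travel.txt", "The scholar's library traveled on 400 camels.")
index.add("todo.txt", "Buy a larger hard drive.")

print(index.search("library"))
print(index.search("library camels"))
```

The point of the design is that the slow work (reading every document) happens once, at indexing time. After that, a search is just a few dictionary lookups, which is why an indexed search tool can answer quickly even on a very large drive.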


2. Buy a Search Tool

Mac users who want to drop 99 euros might look at the newly released search tool named “Fox Trot” (http://www.ctmdev.com/foxtrot/), which I have not yet tried. “Confidence,” wrote Robert Benchley, “is going after Moby Dick with a fishhook and a jar of tartar sauce.” Confidence might be redefined this way: creating a search tool for the Mac that’s better than the already-amazing “Spotlight”. Time and testers will tell.

Windows PC users who want a superb tool (one that I use frequently) can test File Locator Pro (http://www.mythicsoft.com/Page.aspx?type=filelocatorpro&page=home). It’s fast, it’s flexible, and it finds everything I’ve ever lost. The software lets you search with simple words, Boolean expressions, or regular expressions. It costs $29.95, and it pays for itself in the time it saves. File Locator Pro is “the best of the best” in the world of Windows shareware.
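If the three search styles are unfamiliar, the following Python sketch shows roughly how they differ. It is only an illustration under my own assumptions and says nothing about how File Locator Pro works internally; the function names and the sample file contents are made up.

```python
import re


def contains(text, term):
    """Case-insensitive substring test (so "camel" also matches "camels")."""
    return term.lower() in text.lower()


def matches_words(text, words):
    """Simple-word search: every word must appear somewhere in the text."""
    return all(contains(text, word) for word in words)


def matches_boolean(text, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    if not all(contains(text, term) for term in all_of):
        return False
    if any_of and not any(contains(text, term) for term in any_of):
        return False
    return not any(contains(text, term) for term in none_of)


def matches_regex(text, pattern):
    """Regular-expression search: the pattern must match somewhere in the text."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def search(files, predicate):
    """Return the sorted names of files whose text satisfies the predicate."""
    return sorted(name for name, text in files.items() if predicate(text))


# Made-up sample "files" (name -> contents) so the sketch runs without a disk.
FILES = {
    "invoice.txt": "Invoice dated 2008-06-24 for one 500 GB hard drive.",
    "camels.txt": "Four hundred camels carried the library.",
    "borges.txt": "The Library of Babel contains every book.",
}

print(search(FILES, lambda t: matches_words(t, ["library"])))
print(search(FILES, lambda t: matches_boolean(t, all_of=["library"], none_of=["camels"])))
print(search(FILES, lambda t: matches_regex(t, r"\d{4}-\d{2}-\d{2}")))
```

Simple words find anything that mentions the library; the Boolean query narrows that to files that mention the library but not the camels; the regular expression finds anything containing a date written as year-month-day, whatever the date happens to be.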

3. Use A Free Search Tool: Google Desktop or Spotlight


Google’s free tool “Google Desktop” works on Windows PCs and Macs, and finds your computer’s information using a Google-like interface. You’ll need to download the software to your computer, and then (just as with Google’s picture-organizing software, Picasa) let it run so it can index your data. It will find data in text files, HTML files, Microsoft Office files, zip files, and PDFs. You can search additional file formats by installing any of the 81 plug-ins available at the time of this writing. For more information, take a look at the Google Desktop overview: http://desktop.google.com/features.html#overview

Spotlight is the search tool that comes free with every copy of the Leopard operating system for Mac computers. Like many aspects of the Mac, it just works: type in your search term and you’ll find things in a blink. Spotlight has its plug-ins, too (see this page: http://www.apple.com/downloads/macosx/spotlight/). And Spotlight offers many advanced features, all explained in documentation on Apple’s website.

Years ago, your 40-megabyte hard drive (which you were proud of, thanks to its enormous capacity) was manageable: you could organize your data into folders, label these folders (work, projects, persons … ) and then use nothing but your eyeballs to find your files. These days, with home-computer hard drives moving toward the tremendous terabyte, you need to take some time to find the tools that are just right for you, and then take a bit more time to learn how to use these tools proficiently.

You've got it, but can you find it? ... As the African proverb reminds us: “Gold has no value if it remains inside the mountain.”