Best of the Lists (including Best of BUSLIB-L)
Do portal search engines work?

Posted to kmgov@list.jpl.nasa.gov on 10/20/2010

Question (posted by Jo Ann Remshard)

In addition, I've learned that NASA has implemented Autonomy and Google. How were these implemented? What tips do you have? Were the two search engines integrated into one interface? If so, how?

Answer (posted by Michael Corrigan)

There is a perception within the general Air Force (AF) user population that Portal discovery, well, sucks. That perception is anecdotal, but after 6 years working for the CIO and now the CMO of the Air Force, I can't think of a single positive response regarding Portal discovery, whether in personal conversations, technical meetings, conferences, or a range of other fora. The portal engineers might have other data to present, so I do not want this taken as an official AF position. It is purely observational.

That said, the AF has tried other search engines to improve the responsiveness of Portal discovery, and not much improvement has been realized. We recently had one of the newer vendors come in to talk to us. He asked us directly what we thought of Portal discovery, and the unanimous response was, well, it sucks. He said he was surprised that so many users, not just us, said the same thing. We told him that we all used Google first to find DOD and AF information (the commercial site, not the appliance you can install on your own network), and that we found it far more useful than the portal.

So the question becomes: is it the technology? My answer is no. The key is how the engine is set up. Most search engines, Autonomy included, like to autogenerate a taxonomy for the content they will be searching by perusing a corpus of that content. This generates a basic taxonomy, which then needs to be edited by subject matter experts (SMEs). My advice: do not underestimate this process. When we did some work with Autonomy, they wanted a corpus of 10,000 documents to ensure they got a comprehensive taxonomy.
Of course, garbage in, garbage out: if those 10,000 documents include emails about birthdays and NCAA basketball pools, well, you get the idea.

So the Air Force tested the ability of COTS (commercial off-the-shelf) search engines to autogenerate discovery metadata for us. We used an approach in which our SMEs, with the help of an ontologist (actually, a knowledge engineer), developed our own taxonomies. These taxonomies represented the information in a specific problem context (we now work purely with ontologies, which brings us much closer to actual knowledge representation). We found that, given these taxonomies, the discovery metadata the search engines generated matched the metadata the SMEs generated manually with 95% accuracy. (In fact, the search engines were far more accurate than the SMEs; we had to train the SMEs to generate metadata using the taxonomies before they got decent results. This confirms something we have all known: the average human consumer of information doesn't generate metadata very well. Working with an ontologist/knowledge engineer, the SMEs built excellent taxonomies, but left to their own devices to apply those same taxonomies in metadata generation, they weren't so effective.)

Forgive the long-windedness, but the bottom line is that the most critical aspect of a successful search engine is the specification of the problem context, that is, the vocabulary, represented as a taxonomy or ontology, that will drive the engine. Even Google now recognizes this, acknowledging the limitations of its core search technology and acquiring Metaweb (www.metaweb.com) to integrate context-based search into its engine. The engines we tested (Autonomy, FastSearch, ConceptSearching, and Convera) all performed well given a good vocabulary. That is the key.
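
To make the taxonomy-driven approach concrete, here is a minimal Python sketch of an engine tagging documents against an SME-built taxonomy and of measuring agreement with hand-assigned SME tags. Everything in it is an assumption for illustration: the three-concept taxonomy, the sample memos, whole-word term matching, and "agreement" defined as an exact match of concept sets. The post does not say how the 95% figure was computed, and commercial engines such as Autonomy use far more sophisticated, proprietary methods.

```python
import re

# Hypothetical SME-built taxonomy (illustrative only): concept -> signal terms.
TAXONOMY = {
    "logistics": {"airlift", "cargo", "supply chain"},
    "personnel": {"promotion", "training", "assignment"},
    "cyber": {"malware", "intrusion", "firewall"},
}


def tag(text, taxonomy=TAXONOMY):
    """Return the concepts that have at least one term appearing as a whole word."""
    lowered = text.lower()
    return {
        concept
        for concept, terms in taxonomy.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms)
    }


def agreement(engine_tags, sme_tags):
    """Fraction of SME-tagged documents whose engine tags match exactly."""
    matches = sum(1 for doc_id, tags in sme_tags.items() if engine_tags.get(doc_id) == tags)
    return matches / len(sme_tags)


# Hypothetical documents and the tags an SME assigned by hand.
DOCS = {
    "memo1": "Cargo airlift schedule for next quarter",
    "memo2": "Malware found in cargo tracking system",
    "memo3": "Training and assignment policy update",
    "memo4": "Office birthday party and NCAA pool",
}
SME_TAGS = {
    "memo1": {"logistics"},
    "memo2": {"cyber"},  # the SME missed the logistics angle
    "memo3": {"personnel"},
    "memo4": set(),      # off-topic: no concept applies
}

engine_tags = {doc_id: tag(text) for doc_id, text in DOCS.items()}
print(sorted(engine_tags["memo2"]))      # ['cyber', 'logistics']
print(agreement(engine_tags, SME_TAGS))  # 0.75
```

Note that the matching code is trivial; what decides the quality of the tags is the content of `TAXONOMY`, which is the post's point about vocabulary.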

If I can make one more recommendation: build the vocabulary first, using your SMEs and good knowledge elicitation, rather than having the search engine generate it from a corpus. We found that editing an engine-generated taxonomy takes just as long as, if not longer than, building one from scratch with SMEs, and the engine-generated taxonomy is less comprehensive. If you build it first, you reduce the time you spend encountering and then eliminating errors such as the one we discovered when searching for information about terrorist incidents: we found out the Yankees had bombed the Red Sox.
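
The Yankees/Red Sox error can be illustrated with a toy Python sketch contrasting a bare keyword search with a search driven by a small SME-defined vocabulary. The vocabulary, the trigger/context/exclusion scheme, and the sample documents are all hypothetical, and real engines represent context very differently; the sketch only shows how a vocabulary that encodes the problem context keeps the sports sense of "bombed" out of a terrorism search.

```python
import re

# Hypothetical SME vocabulary for the concept "terrorist incident" (illustrative only).
TRIGGERS = {"bombed", "bombing", "explosion"}
CONTEXT = {"device", "explosive", "casualties", "attack"}
EXCLUDE = {"yankees", "sox", "inning", "game"}  # sports sense of "bombed"


def words(text):
    """Lowercase alphabetic word tokens of a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(docs, term):
    """Plain keyword search: any document containing the term."""
    return [doc for doc in docs if term in words(doc)]


def vocabulary_search(docs):
    """Concept search: a trigger term plus supporting context, and no excluded-domain terms."""
    hits = []
    for doc in docs:
        w = words(doc)
        if (w & TRIGGERS) and (w & CONTEXT) and not (w & EXCLUDE):
            hits.append(doc)
    return hits


DOCS = [
    "The Yankees bombed the Red Sox 12-2 in the ninth inning",
    "Militants bombed a checkpoint with an explosive device; casualties reported",
]

print(keyword_search(DOCS, "bombed"))  # both documents match
print(vocabulary_search(DOCS))         # only the checkpoint report
```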