Enough Keyword Searches. Just Answer My Question.

SEARCH engines are so powerful. And they are so pathetically weak.

When it comes to digging up a specific name, date, phrase or price, search engines are unstoppable. The same is true for details from the previously concealed past. For better and worse, any information about any of us - true or false, flattering or compromising - that has ever appeared on a publicly available site is likely to be retrievable forever, or until we run out of electricity for the server farms. Carefree use of e-mail was once a sign of sophistication. Now to trust confidential information to e-mail is to be a rube. Despite the sneering term snail mail, plain old letters are the form of long-distance communication least likely to be intercepted, misdirected, forwarded, retrieved or otherwise inspected by someone you didn't have in mind.

Yet for anything but simple keyword queries, even the best search engines are surprisingly ineffective.

Recently, for example, I was trying to track the changes in California's spending on its schools. In the 1960's, when I was in public school there, the legend was that only Connecticut spent more per student than California did. Now, the legend is that only the likes of Louisiana and Mississippi spend less. Was either belief true? When I finally called an education expert on a Monday morning, she gave me the answer off the top of her head. (Answer: right in spirit, exaggerated in detail.) But that was only after I'd wasted what seemed like hours over the weekend with normal search tools. If it sounds easy, try using keyword searches to find consistent state-by-state data covering the last 40 years.

We live with these imperfections by trying to outguess the engines - what if I put "per capita spending by states" in quotation marks? - and by realizing that they're right for some jobs and wrong for others.

One branch of the federal government is desperate enough for a better search tool that its efforts could be a stimulus for fundamental long-term improvements. Last week, I spent a day at a workshop near Washington for the Aquaint project, whose work is unclassified but has gone virtually unnoticed in the news media. The name stands for "advanced question answering for intelligence," and it refers to a joint effort by the National Security Agency, the C.I.A. and other federal intelligence organizations. To computer scientists, "question answering," or Q.A., means a form of search that does not just match keywords but also scans, parses and "understands" vast quantities of information to respond to queries. An ideal Q.A. system would let me ask, "How has California's standing among states in per-student school funds changed since the 1960's?" - and it would draw from all relevant sources to find the right answer.

In the real Aquaint program, the questions are more likely to be, "Did any potential terrorist just buy an airplane ticket?" or "How strong is the new evidence of nuclear programs in Country X?" The presentations I saw, by scientists at universities and private companies, reported progress on seven approaches to the problem. (The new I.B.M. search technology discussed here last year is also part of the Aquaint project.)

There will be more to say later about this effort. On the bright side, apart from whatever the project does for national security, its innovations could eventually improve civilian search systems, much as the Pentagon's Arpanet eventually became the civilian Internet. Of course, the dark potential in ever more effective search-and-surveillance systems is also obvious.

For the moment, consider several here-and-now innovations that can improve on the standard Google-style list of search hits. Ask Jeeves recently introduced two features that enhance its long-established question-and-answer format. One tries to recast search terms into a question that can be answered on the Web; the other offers suggestions to broaden or narrow the search. A free version of what was once called GuruNet combines conventional search results with questions and answers.

Two related sites, one the parent of the other, categorize the hits from each search, producing a kind of table of contents of results. Another site does something similar in a visual form; it is free online or $49 for a desktop version. And one bizarrely named but extremely useful site has become my favorite search portal, because it allows quick, easy comparisons of the results of the same search on virtually any major engine.

