Enough Keyword Searches. Just Answer My Question.

Search engines are so powerful. And they are so pathetically weak.

When it comes to digging up a specific name, date, phrase or price, search engines are unstoppable. The same is true for details from the previously concealed past. For better and worse, any information about any of us - true or false, flattering or compromising - that has ever appeared on a publicly available site is likely to be retrievable forever, or until we run out of electricity for the server farms. Carefree use of e-mail was once a sign of sophistication. Now to trust confidential information to e-mail is to be a rube. Despite the sneering term snail mail, plain old letters are the form of long-distance communication least likely to be intercepted, misdirected, forwarded, retrieved or otherwise inspected by someone you didn't have in mind.

Yet for anything but simple keyword queries, even the best search engines are surprisingly ineffective.

Recently, for example, I was trying to track the changes in California's spending on its schools. In the 1960's, when I was in public school there, the legend was that only Connecticut spent more per student than California did. Now, the legend is that only the likes of Louisiana and Mississippi spend less. Was either belief true? When I finally called an education expert on a Monday morning, she gave me the answer off the top of her head. (Answer: right in spirit, exaggerated in detail.) But that was only after I'd wasted what seemed like hours over the weekend with normal search tools. If it sounds easy, try using keyword searches to find consistent state-by-state data covering the last 40 years.

We live with these imperfections by trying to outguess the engines - what if I put "per capita spending by states" in quotation marks? - and by realizing that they're right for some jobs and wrong for others.

One branch of the federal government is desperate enough for a better search tool that its efforts could be a stimulus for fundamental long-term improvements. Last week, I spent a day at a workshop near Washington for the Aquaint project, whose work is unclassified but has gone virtually unnoticed in the news media. The name stands for "advanced question answering for intelligence," and it refers to a joint effort by the National Security Agency, the C.I.A. and other federal intelligence organizations. To computer scientists, "question answering," or Q.A., means a form of search that does not just match keywords but also scans, parses and "understands" vast quantities of information to respond to queries. An ideal Q.A. system would let me ask, "How has California's standing among states in per-student school funds changed since the 1960's?" - and it would draw from all relevant sources to find the right answer.

In the real Aquaint program, the questions are more likely to be, "Did any potential terrorist just buy an airplane ticket?" or "How strong is the new evidence of nuclear programs in Country X?" The presentations I saw, by scientists at universities and private companies, reported progress on seven approaches to the problem. (The new I.B.M. search technology discussed here last year is also part of the Aquaint project.)

There will be more to say later about this effort. On the bright side, apart from whatever the project does for national security, its innovations could eventually improve civilian search systems, much as the Pentagon's Arpanet eventually became the civilian Internet. Of course, the dark potential in ever more effective search-and-surveillance systems is also obvious.

For the moment, consider several here-and-now innovations that can improve on the standard Google-style list of search hits. Ask Jeeves, whose site is Ask.com, recently introduced two features that enhance its long-established question-and-answer format. One tries to recast search terms into a question that can be answered on the Web; the other offers suggestions to broaden or narrow the search. Answers.com, a free version of what was once called GuruNet, combines conventional search results with questions and answers.

Two related sites, Clusty.com and its parent, Vivisimo.com, categorize the hits from each search, producing a kind of table of contents of results. Another site, Grokker.com, does something similar in a visual form; it is free online or $49 for a desktop version. And the bizarrely named but extremely useful MrSapo.com has become my favorite search portal, because it allows quick, easy comparisons of the results of the same search on virtually any major engine.

By JAMES FALLOWS.
