Saturday, May 23, 2009

Wolfram|Alpha and non-searchable questions

What it is not

This product was widely panned in the blogosphere during its first week after release. But I want you to know...

WolframAlpha is not a search engine. It says it is a computational knowledge engine.

One obvious way to see this is to submit this string (stopping power air, 0.731MeV electron) into both products, like this WA Query and this Google Query.

The answer, if you could call it that, amounts to computing the stopping power of a material: here, air, for an electron at 0.731 MeV.

WolframAlpha gives a bunch of numbers and graphs. I am not sure if these are correct. I am not a student of nuclear physics.

The Google result is significantly more disappointing: there is no answer at all, not even a bunch of numbers whose veracity I cannot check.

The reason is that WolframAlpha did not search the web for the answer to this question. Google did. Neither Google nor Wolfram can find it on the web, because it is not on the web!!

It does it while you wait

The Alpha product computed the answer in real time. Now maybe this is a parlor trick.

We could give the string from the Alpha examples page (fuel cost 50 miles, 20 mpg, $2.09/gal). Now Google finds the question because it crawled Alpha's page. But, it still does not have the answer! Because the answer is not on the web! And if you make the miles in the question 507, it has to be recomputed. It cannot be looked up.
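To make the "it cannot be looked up" point concrete, here is a minimal sketch of the arithmetic behind the fuel-cost query. The function name is mine, and I am assuming the obvious formula (gallons burned times price per gallon); I do not know how Alpha actually computes it.

```python
def fuel_cost(miles: float, mpg: float, price_per_gallon: float) -> float:
    """Fuel cost of a trip: gallons burned times the price per gallon."""
    gallons = miles / mpg
    return gallons * price_per_gallon


# The example from Alpha's page, then the same question with 507 miles.
print(fuel_cost(50, 20, 2.09))
print(fuel_cost(507, 20, 2.09))
```

Change any one of the three numbers and a stored page is useless; the function simply runs again.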

So what I have been doing is giving Alpha questions that I know are not on the web. Maybe, cube root of longitude of chicago. Does this make any sense?

Well, it certainly makes numerical sense. What Alpha does in this case is to "parse" this phrase which is almost English into a few functions. I guess that this is where the NKS stuff came in. I certainly do not understand that!

Now the meat of it

Anyway, Alpha has "curated data" containing lots of cities around the globe. In this data are properties of those cities. One property that has been curated is longitude. This is a number (a magnitude) with units degrees.

Alpha has a version of Mathematica running underneath it. Mathematica has a function to compute the cube root of a number. So Alpha asks Mathematica to compute the cube root of the magnitude and return it, which is trivial for Mathematica.

Then Alpha displays the result. Does this question make sense? Does the question: compute-me-the-cube-root-of-longitude-of-chicago make sense?

I don't think so. I think that it is probably a non-sensical value.
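The parse, look up, compute flow described in this section can be sketched in a few lines. This is only a toy under my own assumptions: the CURATED table, the FUNCTIONS table and the "function of property of entity" pattern are hypothetical stand-ins, the longitude is approximate, and none of it is how Alpha or Mathematica is really built.

```python
# Hypothetical stand-in for Alpha's curated data (value is approximate).
CURATED = {"chicago": {"longitude": -87.65}}  # degrees; negative means west

# Hypothetical table of functions the toy parser knows about.
FUNCTIONS = {
    "cube root": lambda x: x ** (1 / 3),
    "square root": lambda x: x ** 0.5,
    "square": lambda x: x ** 2,
}


def parse(query: str):
    """Split '<function> of <property> of <entity>' into its three parts."""
    func, prop, entity = [part.strip() for part in query.lower().split(" of ")]
    return func, prop, entity


def answer(query: str) -> float:
    """Look up the property, take its magnitude, apply the function."""
    func, prop, entity = parse(query)
    magnitude = abs(CURATED[entity][prop])
    return FUNCTIONS[func](magnitude)


print(answer("cube root of longitude of chicago"))
```

Nothing in the result was stored anywhere beforehand; only the longitude was. Whether the number means anything is a separate question.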

The cost to become a champion

But when I first used Alpha and I saw it had some sports stuff, the first thing I wanted was "Tiger Woods: number of strokes career".

Alas, I could not compute this with Alpha's help. This number might be out on the web. Or someone might be working to keep such a number and post it on the web in the future. When it is out there, Google will crawl it and find it.

But it will be static. There probably will not be an application out there which will compute it for you whenever you want, until the Alpha curators put all of Tiger's matches in the DB. Then it will appear and you will be able to do other calculations with it.

I used to ask avid golfers: "Can you average under 90 if you only play once a week?" Most said no. I don't think Alpha knows this one either.

Other non-searchable questions which can be asked of Alpha now

Some other queries which cannot be searched by anyone; they must be computed:

- square of molecular weight of iron
- eleventh largest US state (or thirteenth or 23rd)
- 2nd highest elevation in Africa
- orbital position of the international space station; of course Alpha will give a different answer at different times
- square root of longitude of chicago
- father's mother's sister's son's aunt's brother's father (play with adding more relations to the end of the string, such as "..sister's husband's mother's father")
- Vitamin c in 214g orange juice (try 271 grams)
- elevation des moines, minneapolis, phoenix

Are these queries nonsense?

Of course these can be seen as non-sensical. Their absurdity does again point to a parlor trick.

Maybe it is just a savant who can detangle complicated English expressions easily. It is certainly possible.

When curated-data was added to Mathematica V6 & V7, I assumed it was for social scientists who wanted to ask: "How does the rainfall in Iowa vary with the square root of the annual in-migration to the US between 1936 and 1947?" With Mathematica and the requisite curated data, you can answer this question.

Further, a great deal of the curated data is based in the sciences. It has lots of stuff like nuclear physics, computational biology and chemical compound properties.

So this meant (to me) that scientists could ask their own "nonsensical questions" as part of their research.

And Alpha's plan is to broaden the scope of people who can ask these types of non-searchable questions and get an answer. And as more data becomes curated (as so much in our world is going that way), Alpha will be able to answer no end of non-sensical questions.

Syntax and adverbs

Alpha has been billed as understanding English. We all know about Jeeves and even the mixed success of putting full sentences into Google. Sometimes a cryptic phrase works better.

There is an interesting genealogical example mentioned above. You can start by typing in "father's sister". Alpha will display a graph of a genealogical tree. Then you can add an apparently infinite number of family relations onto this simple query and each time Alpha will display a gradually more complicated tree. This is great and a perfect example of a computed result from sociological data.

The issue is that if you type "father's mother's sister' son's aunt" rather than "father's mother's sister's son's aunt", Alpha gives no assistance at all, neither toward the answer nor toward what is wrong with your question. If you crawl through these two strings the way programmers do all day, you will see that the "s" of the possessive is missing from the third relation.

These are two distinct but related problems. One is grammar: I called it an adverb, but it is really just a possessive noun. The other is a simple syntax problem. The meaning is clear even in the broken case, but the syntax issue "defies" Alpha. Problems like this are rampant in all current command-line software, such as Mathematica, Matlab and many others.
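The kind of help I was hoping for is not hard to imagine. Here is a minimal sketch of a checker that points at the broken relation; the function name and the rule (every relation but the last must end in 's) are my own assumptions for illustration, not anything Alpha is known to do.

```python
def check_relation_chain(query: str):
    """Return (position, word) for each relation missing its possessive 's.

    Positions are 1-based. The final word names the person being asked
    about, so it is not expected to be possessive.
    """
    words = query.split()
    problems = []
    for position, word in enumerate(words[:-1], start=1):
        if not word.endswith("'s"):
            problems.append((position, word))
    return problems


broken = "father's mother's sister' son's aunt"
for position, word in check_relation_chain(broken):
    print(f"relation {position} ({word}) is missing its possessive 's")
```

Run on the broken string above, it flags the third relation; run on the correct string, it returns an empty list.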


Many of these drawbacks are similar to those with Mathematica V7.

how to know how much is there? - I have not found an easy way to dump all the entities for any given subject. Sometimes Vitamin A works and Vitamin b does not?!? I want to be able to tell what the range of valid inputs is.

how to know what properties exist for which data? - Similarly, there appears to be no easy way to find out the possible properties for an entity. In V7 Mathematica, this is simple. Also, if there were a syntax to pick up a property, it would help with the last drawback mentioned here.

how to deal with syntax issues? - I guess this goes to the heart of the NKS piece. I am sure there are many issues like those described above which cannot be as easily solved (or understood).

how to deal with semantics? (hurricane delores and hurricane sally) - I found the first query in the example text. Alpha thought the second query was comparing two movies. Of course, there is no "memory" across queries.

how to "pipeline" results? - Of course the intent would be to take a property of an entity, modify it, then use it as input to another function, then compare it to another suitably modified property. I cannot see a syntax to do this in the current version.

Any meaningful use?

Methinks so. In fact, there are many more relevant questions, however non-sensically structured, that are not out on the web and that we need to figure out. This is in contrast to finding something that someone else already wrote, putting your own branding at the top and selling it to someone.

Alpha has many issues and significant drawbacks, but I believe that it is the closest so far to these hard-to-reach answers.


Roger Williams
Franklin Laboratory
