Sunday, August 9, 2009

video hosting business models

I used screencast to upload my latest video presentation. I mainly used it because I did not have my Amazon S3 access set up yet, and because it was hooked directly into the Camtasia V4 platform I used to record the video.

Apparently the guy who I worked with on the video is fairly popular on the Mma newsgroup. So one night last week I announced it on a newsgroup he frequents with a URL to

Early the next morning I got a notice from screencast saying that my free account was about to be shut off because I had used up 75% of its 2GB data transfer allowance. This did not make sense to me, because I knew the video had compressed down to only about 50 MB.

But maybe the people on that group wanted to see it. So I purchased the $9.95/month pro account from screencast. This gives 25GB of storage and 200GB of data transfer each month.

Then I downloaded and installed Firefox (no! :-) because it has an Amazon S3 plug-in. And I practiced uploading and downloading from there. My plan was to move all my stuff over to S3 (at the end of the month) because it is way cheaper, right? Uhhh?

I tell lots of people to use Amazon's S3 service because it costs only pennies per GB of storage and transfer. I looked this week and it was $0.17/GB to transfer data out from their servers. That's way less than screencast, right?

But if you figure it out, 200GB for $10/month is actually only $0.05 per GB!

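So if you look at the break-even point, transferring roughly 50 to 60GB out of Amazon costs about the same as screencast charges for the month ($9.95 divided by $0.17/GB is about 58GB). Everything after that, up to the 200GB allowance, is cheaper on screencast!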
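The comparison above can be sketched in a few lines of Python. This is only a rough sketch: it uses the prices quoted in this post, it ignores S3 storage charges, and the function names are made up for illustration.

```python
# Prices as quoted in this post (August 2009); they are not current prices.
S3_TRANSFER_PER_GB = 0.17          # dollars per GB transferred out of S3
SCREENCAST_PRO_MONTHLY = 9.95      # flat dollars per month for the pro account
SCREENCAST_PRO_ALLOWANCE_GB = 200  # monthly transfer allowance on the pro account


def s3_transfer_cost(gb):
    """Transfer-out cost on S3 (storage charges are ignored in this sketch)."""
    return gb * S3_TRANSFER_PER_GB


def screencast_pro_cost(gb):
    """Flat monthly fee; what happens past the allowance is not covered in the post."""
    if gb > SCREENCAST_PRO_ALLOWANCE_GB:
        raise ValueError("beyond the 200GB allowance; overage pricing unknown")
    return SCREENCAST_PRO_MONTHLY


# Transfer volume at which the two monthly bills are equal.
break_even_gb = SCREENCAST_PRO_MONTHLY / S3_TRANSFER_PER_GB

# Effective per-GB price if the whole screencast allowance is used.
effective_rate = SCREENCAST_PRO_MONTHLY / SCREENCAST_PRO_ALLOWANCE_GB

print(f"break-even: {break_even_gb:.1f} GB")
print(f"screencast at full allowance: ${effective_rate:.3f}/GB")

for gb in (10, 50, 100, 200):
    s3 = s3_transfer_cost(gb)
    sc = screencast_pro_cost(gb)
    cheaper = "S3" if s3 < sc else "screencast"
    print(f"{gb:>3} GB: S3 ${s3:.2f} vs screencast ${sc:.2f} -> {cheaper}")
```

Below the break-even volume S3 wins; above it the flat fee wins, which is the whole point of the paragraph above.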

So even though many people with Macs have had trouble seeing the video, lots of people have downloaded it and I have transferred more than 2.5GB (+1.5GB on the free account) out of screencast so far. At 50MB per viewing, that could be 75 viewings, but it probably isn't...just me testing 50+ times! :-)

Finally, if I make a video (or a set) that will be viewed enough times to transfer more than 50GB or so of data in a month, screencast would be the way to go.


video about publishing in Mathematica

I have posted 2 versions of this:

the PC version is here.

the Mac version is here

Also this was originally announced here

I am interested in any and all comments.


Sunday, July 12, 2009

the "rain Google Columbus" question

The video for David Huynh's Parallax faceted browser for Freebase describes how to use the application to answer the following question: give me a list of schools attended by the children of Republican presidents.

This is another example of a query that you cannot answer using a search engine. In fact, it is a Semantic Web type query.

I attended the Freebase hack-day and I spoke to David about Parallax. While describing W|A, I posited another query that Alpha may answer one day and that Mathematica (with a suitably written program) can answer today. This question is:

How many days in 2009 did the Google stock price decline on the same day that it rained in Columbus, Ohio?

It is interesting to note that (according to a Metaweb engineer), Freebase cannot answer the Google component of this question either! :-)

This is a simple calculation using the Mathematica primitives and curated data. It is explained in detail in the short series rain Google Columbus Part 1, Part 2 and Part 3. There are also the Google query results. As stated earlier, it is not a fair question to ask Google.

We wanted to answer the rain Google Columbus question. The answer is: there are 26 trading days in the first half of 2009 when the Google stock price declined (compared to the previous day) and it rained in Columbus Ohio more than 0.3 inches on the same day.

Of course this could be done with some other stock (not GOOG), some other finance-related data (not amount closing price declined), some other city (not Columbus) and some other weather-related data (not rainfall).

We derived this answer using Mathematica curated WeatherData[] and FinancialData[]. We made a routine to return the precipitation in Columbus on a particular day. Then we computed which days the stock price declined. Then we passed that list of dates to the precipitation computation and restricted the list to rainy days.

In addition to the curated data functions, we also looked at some other Mathematica constructs such as:
1) functional programming (i.e. no loops are used, only Map[], Apply[], etc.)
2) pure and nested pure functions
3) assorted list manipulation techniques (e.g. interleaving)
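
The original work was done in Mathematica with WeatherData[] and FinancialData[]. The sketch below is not that code. It is a Python stand-in for the same three steps (find the decline days, look up the rain, keep the rainy ones), written loop-free with map and filter. The prices and rainfall numbers are invented toy values, not real GOOG or Columbus data, and all the names are hypothetical.

```python
from datetime import date

# HYPOTHETICAL toy data, invented for illustration only.
# (trading day, closing price in dollars)
closing_prices = [
    (date(2009, 1, 5), 100.0),
    (date(2009, 1, 6), 98.0),
    (date(2009, 1, 7), 99.0),
    (date(2009, 1, 8), 97.0),
    (date(2009, 1, 9), 96.0),
]

# HYPOTHETICAL toy data: inches of precipitation per day.
rainfall_inches = {
    date(2009, 1, 6): 0.5,
    date(2009, 1, 7): 0.4,
    date(2009, 1, 8): 0.1,
    date(2009, 1, 9): 0.35,
}

RAIN_THRESHOLD_INCHES = 0.3  # the threshold used in the post


def precipitation_on(day):
    """Stand-in for the 'precipitation on a particular day' routine."""
    return rainfall_inches.get(day, 0.0)


# Pair each trading day with the previous one, then keep the days
# whose close is lower than the previous close.
consecutive = zip(closing_prices, closing_prices[1:])
decline_days = list(
    map(lambda pair: pair[1][0],
        filter(lambda pair: pair[1][1] < pair[0][1], consecutive))
)

# Restrict the decline days to the rainy ones.
rainy_decline_days = list(
    filter(lambda day: precipitation_on(day) > RAIN_THRESHOLD_INCHES,
           decline_days)
)

print(len(rainy_decline_days), "rainy decline days:", rainy_decline_days)
```

Swapping in a different stock, city or weather property only changes the two data sources; the pipeline stays the same.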


Monday, June 8, 2009

Polyominoes and Graph Layouts

I have been doing some experiments with plotting graphs and their layouts. Apparently one method of layout of disconnected graphs is called "polyominoes". And this is a generalization of the word "dominoes". :-)

A log of one of my experiments with annotation is stored here.

Ref: The Geometry Junkyard

Saturday, May 30, 2009

what Alpha knows

Due to its large store of terms, W|A makes great attempts at disambiguation. Because of this certain queries will display multiple possible references.

One of the weaknesses of Alpha is the opacity of the volume of data behind it. Everyone (I guess) finds something they did not expect, then gets excited to look for something nearby in their own mind. They type in the nearby term, only to find Alpha has gone stupid, saying repeatedly: "W|A does not know what to do with your input".

There are 2 ways to find out more about "what Alpha knows".

One is the More dropdown button. In Wolfram's Overview video, he shows what happens when "Springfield" is entered and how the More button shows other Springfields.

The other is the Assumptions and suggestions that Alpha displays. For example, "Springfield" can be used "as a phrase".

The query that really showed this to me is "cookie".

There are 5 separate other assumptions as well as more than 20 other things in the More button.

Each of these is another valid query that results in info about what Alpha knows. Have at it!


Roger Williams
Franklin Laboratory

Monday, May 25, 2009

some example Alpha queries by SWolfram

The breadth of this list is amazing.

First some "Higher Math", let's integrate x^2 sin^3 x dx

Now some civics, how about gdp france

And some "civics with arithmetic", what is the gdp of france / italy

And some "internet statistics", internet users in europe compared to china and the US

And some "geography", springfield

And some "meteorology", weather springfield, for the last 5 years

And some "meteorology on a date", weather springfield 11/6/89

And some "meteorology related to a particular day", weather in chicago when barack obama was born

And "a speed value", 5 miles/sec

And "an earnings or charging rate", $17/hour

And "a temperature", 6000C

And "a quantity of text", 6000 words

And some "info about a word", accretion

And "an amount of a precious metal", 133 g of gold

And some "chemistry", 2.5 molar H2SO4

And some "more chemistry", water 2.5 atm 200C

And some "medicine, a number from a test", LDL 50

And the "same test for a particular demographic", LDL 50 smoker male age 40

And "comparing 2 tests for that demographic", LDL vs. serum potassium male age 40 smoker

And "another common medical test", psa 0.04

And some "straight demographics", life expectancy male age 40 finland

And "someone's stats", 5'8" 160 lbs

And some "calculations about someone with those stats", running 4 mph 30 minutes 5'8" 160 lbs age 40 female

And some "bioinformatics, a DNA sequence", ATAGTCCTAGTTAAA

And let's "pick a gene from the sequence", gene FASTKD2

And let's "computation near that gene", 500 bp upstream gene FASTKD2

And "a stock symbol", MSFT

And "comparing 2 companies' stock", MSFT Apple

And some "mortgage finance", mortgage 5% 30 years

now use 10000 euros

And some "financial arithmetic", bond 7% 21 years

now change the yield

And some "engineering, an airfoil computation", NACA 4351 15 deg

And some "assorted colors", red + yellow

And "a musical scale", D# minor

now play it

And some "website info",

now show history

And some "social sciences", high school teacher median wages

And some "more social sciences", france fish production

And some "comparative social science", france fish production vs. poland

And some "nutrition", Vitamin c in 214g orange juice

And "a dynamic calorie chart", 2 cups OJ + 1 slice cheddar cheese

And some "searching for a crossword entry", a__t_r

And "back to geography", mt everest

And some "computations with that", height mt everest / length golden gate bridge



And some "computational geography", 3rd largest country in europe

And some "comparative geography", gdp vs. railway length in europe

And some "civics", president of brazil in 1922

And "a name", andrew

And "two names", andrew paul

And some "probabilities", 10 flips 2 heads

And a "numerical sequence", 3, 7, 15, 31, 63, ...

And some "US aeronautics", ISS


Saturday, May 23, 2009

Wolfram|Alpha and non-searchable questions

What it is not

This product was widely panned in the blogosphere during its first week after release. But I want you to know...

WolframAlpha is not a search engine. It says it is a computational knowledge engine.

One obvious way to see this is to submit this string (stopping power air, 0.731MeV electron) into both products, like this WA Query and this Google Query.

The answer, if you could call it that, amounts to computing the stopping power of a material.

WolframAlpha gives a bunch of numbers and graphs. I am not sure if these are correct. I am not a student of nuclear physics.

The Google result is significantly more disappointing. Specifically, there is no answer, not even a bunch of numbers whose veracity I cannot validate.

The reason is that WolframAlpha did not search the web for the answer to this question. Google did. Neither Google nor Wolfram can find it on the web, because it is not on the web!!

It does it while you wait

The Alpha product computed the answer in real time. Now maybe this is a parlor trick.

We could give the string from the Alpha examples page (fuel cost 50 miles, 20 mpg, $2.09/gal). Now Google finds the question because it crawled Alpha's page. But, it still does not have the answer! Because the answer is not on the web! And if you make the miles in the question 507, it has to be recomputed. It cannot be looked up.
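
To make the point concrete, here is the fuel cost example as a tiny Python function. This is just the obvious arithmetic (gallons used times price per gallon), not a claim about how Alpha computes it, and the function name is made up for this sketch.

```python
def fuel_cost(miles, miles_per_gallon, dollars_per_gallon):
    """Cost of the fuel burned on a trip: gallons used times price per gallon."""
    gallons_used = miles / miles_per_gallon
    return gallons_used * dollars_per_gallon


# The query from the Alpha examples page.
print(f"50 miles:  ${fuel_cost(50, 20, 2.09):.4f}")

# Change the miles to 507 and the answer has to be recomputed;
# there is nothing to look up.
print(f"507 miles: ${fuel_cost(507, 20, 2.09):.4f}")
```

Every change to an input gives a new answer that nobody has published anywhere, which is why a crawler cannot help here.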

So what I have been doing is giving Alpha questions that I know are not on the web. Maybe, cube root of longitude of chicago. Does this make any sense?

Well, it certainly makes numerical sense. What Alpha does in this case is to "parse" this phrase which is almost English into a few functions. I guess that this is where the NKS stuff came in. I certainly do not understand that!

Now the meat of it

Anyway, Alpha has "curated data" containing lots of cities around the globe. In this data are properties of those cities. One property that has been curated is longitude. This is a number (a magnitude) with units of degrees.

Alpha has a version of Mathematica running underneath it. Mathematica has a function to compute the cube root of a number. So Alpha asks Mathematica to compute the cube root of the magnitude and return it, which is trivial for Mathematica.

Then Alpha displays the result. Does this question make sense? Does the question: compute-me-the-cube-root-of-longitude-of-chicago make sense?

I don't think so. I think that it is probably a non-sensical value.
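
The lookup-then-compute flow described in this section can be sketched in Python. This is only an illustration of the idea, not how Alpha or Mathematica works inside. The one-entry table stands in for the curated data, and the Chicago longitude is an approximate, hand-entered value (about 87.65 degrees west).

```python
# Stand-in for curated city data. ASSUMPTION: approximate magnitude of the
# longitude of Chicago in degrees (west), hand-entered for this sketch.
CITY_LONGITUDE_DEGREES = {
    "chicago": 87.65,
}


def cube_root_of_longitude(city):
    """Look up the curated property, then apply the numeric function to it."""
    magnitude = CITY_LONGITUDE_DEGREES[city.lower()]
    return magnitude ** (1.0 / 3.0)


result = cube_root_of_longitude("Chicago")
print(f"cube root of longitude of chicago: {result:.3f}")
```

The number comes out fine; whether it means anything is a separate question.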

The cost to become a champion

But when I first used Alpha and I saw it had some sports stuff, the first thing I wanted was "Tiger Woods: number of strokes career".

Alas, I could not compute this with Alpha's help. This number might be out on the web. Or someone might be working to keep such a number and post it on the web in the future. When it is out there, Google will crawl it and find it.

But it will be static. There probably will not be an application out there which will compute it for you whenever you want, until the Alpha curators put all of Tiger's matches in the DB. Then it will appear and you will be able to do other calculations with it.

I used to ask avid golfers: "Can you average under 90 if you only play once a week?" Most said no. I don't think Alpha knows this one either.

Other non-searchable questions which can be asked of Alpha now

Some other queries which cannot be searched by anyone; they must be computed:

- square of molecular weight of iron
- eleventh largest US state (or thirteenth or 23rd)
- 2nd highest elevation in Africa
- orbital position of the international space station; of course Alpha will give a different answer at different times
- square root of longitude of chicago
- father's mother's sister's son's aunt's brother's father (play with adding more relations to the end of the string "..sister's husband's mother's father")
- Vitamin c in 214g orange juice (try 271 grams)
- elevation des moines, minneapolis, phoenix

Are these queries nonsense?

Of course these can be seen as non-sensical. Their absurdity does again point to a parlor trick.

Maybe it is just a savant who can detangle complicated English expressions easily. It is certainly possible.

When curated-data was added to Mathematica V6 & V7, I assumed it was for social scientists who wanted to ask: "How does the rainfall in Iowa vary with the square root of the annual in-migration to the US between 1936 and 1947?" With Mathematica and the requisite curated data, you can answer this question.

Further a great deal of the curated data is also based in the sciences. It has lots of stuff like nuclear physics and computational biology and chemical compound properties.

So this meant (to me) that scientists could ask their own "nonsensical questions" as part of their research.

And Alpha's plan is to broaden the scope of people who can ask these types of non-searchable questions and get an answer. And as more data becomes curated (as so much in our world is going that way), Alpha will be able to answer an infinitum of sense-less questions.

Syntax and adverbs

Alpha has been billed as understanding English. We all know about Jeeves and even the mixed success of putting full sentences into Google. Sometimes a cryptic phrase works better.

There is an interesting genealogical example mentioned above. You can start by typing in "father's sister". Alpha will display a graph of a genealogical tree. Then you can add an apparently infinite number of family relations onto this simple query and each time Alpha will display a gradually more complicated tree. This is great and a perfect example of a computed result from sociological data.

The issue is that if you type this "father's mother's sister' son's aunt" rather than this "father's mother's sister's son's aunt", Alpha is utterly unable to provide any assistance, either towards the answer or even about what is wrong with your question. If you crawl through these 2 strings like programmers do all day, you will see that the "s" of the possessive is missing from the 3rd relation.

These are 2 distinct, but related problems. One is grammar. I called it an adverb, but it is really just a possessive noun. The other problem is just a simple syntax problem. The meaning is clear even in the broken case, but the syntax issue "defies" Alpha. Problems like this are rampant in all current command-line software like Mathematica, Matlab and many others.
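
Just to show how little it would take to say what is wrong with the broken query, here is a small Python checker. This is a hypothetical helper written for this post, not anything Alpha does: it only checks that every relation except the last one ends in the possessive 's.

```python
def find_possessive_errors(query):
    """Return (position, word) for each relation that lacks the possessive 's.

    Positions count from 1. The last word is the plain noun being asked
    about, so it is not expected to be possessive.
    """
    words = query.split()
    problems = []
    for position, word in enumerate(words[:-1], start=1):
        if not word.endswith("'s"):
            problems.append((position, word))
    return problems


good_query = "father's mother's sister's son's aunt"
broken_query = "father's mother's sister' son's aunt"

print(find_possessive_errors(good_query))
print(find_possessive_errors(broken_query))
```

For the broken query it points at the 3rd relation, which is exactly the hint Alpha does not give.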


Many of these drawbacks are similar to those with Mathematica V7.

how to know how much is there? - I have not found an easy way to dump all the entities for any given subject. Sometimes, Vitamin A works and Vitamin b does not?!? I want to be able to tell what the range of valid inputs is.

how to know what properties exist for which data? - Similarly there appears to be no easy way to find out the possible properties for an entity. In V7 Mathematica, this is simple. Also if there were a syntax to pick up the property, it would help the last drawback mentioned here.

how to deal with syntax issues? - I guess this goes to the heart of the NKS piece. I am sure there are many issues like those described above which cannot be as easily solved (or understood).

how to deal with semantics? (hurricane delores and hurricane sally) - I found the first query in the example text. Alpha thought the second query was comparing two movies. Of course, there is no "memory" across queries.

how to "pipeline" results? - Of course the intent would be to take a property of an entity, modify it, then use it as input to another function, then compare it to another suitably modified property. I cannot see a syntax to do this in the current version.

Any meaningful use?

Methinks. In fact, there are many more non-sensically structured relevant questions that are not out on the web, that we need to figure out. This is in contrast to finding something that someone else already wrote, putting your own branding at the top and selling it to someone.

Alpha has many issues and significant drawbacks, but I believe that it is the closest so far to these hard-to-reach answers.


Roger Williams
Franklin Laboratory