Friday, June 20, 2008

Cloud Computing and Math

I've just spent the better part of two hours trying to figure out Cloud Computing. I'm focused, of course, on where my product line can fit in this latest hyped paradigm. While it's positioned right now as more of a disruptive technology than anything else, there's clearly going to be some kind of future here. There are a number of nuggets out there, but so much of what I'm seeing is based on scalable storage (which seems almost solved with things like Hadoop and MapReduce). That's fine for the data center people, but what about the analysts? How do you distribute your long-running simulations in these environments?

Grid technologies have been around for a long time now. The Globus Toolkit has been around for more than a decade. I actually set it up a few years ago on a couple of dual-CPU Linux boxes that have since been retired to my basement - so I know it's real and it works. This is certainly one avenue to a distributed math solution, but it's very different from all these "cloud" solutions. Web Services underlie the communication in many architectures, but Grid isn't about SOA or SaaS.

So if you have "real" problems to solve (beyond SQL queries and other general database/data center type problems), how do you get there? I'm just not seeing it. What's actually running on the Cloud? With volatile virtual Xen images, where do my compilers live? What runtime libraries are available? I haven't found anything real here at all. One of our customers is actively looking at Amazon's EC2 in concert with our Java library, and he's worried about licensing issues as the solution possibly scales out to 1000s of nodes. Of course we have pricing for this, having lived in the MPI world for a long time, but this isn't your father's parallel computing environment anymore.

Apparently you can create your own Amazon Machine Image with all your software on it. But how do these images talk to each other? If you spin up 500 images to run a simulation, what are the APIs for getting your data back out? Is it really just like having 500 of the same machine sitting in the back room and you just have each one dump data to a web service running on a master node on your desktop?
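To make the "500 identical machines reporting to a master" picture concrete, here is a minimal sketch of that scatter/gather pattern. It is an illustration under loud assumptions, not an answer to the questions above: local threads stand in for machine images, no Amazon API is involved, and the names `run_chunk` and `estimate_pi` are made up for this example. Each "node" runs an independent Monte Carlo chunk with its own seed, and the master just adds up the counts it gets back.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, samples):
    """Work one node would do: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # independent, reproducible stream per worker
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_workers, samples_per_worker):
    """Master side: hand out one chunk per worker, then combine the counts.

    Assumption: threads stand in for remote machines. In a real deployment
    each chunk would run on its own node and report back over the network.
    """
    seeds = range(num_workers)
    sizes = [samples_per_worker] * num_workers
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        hits = list(pool.map(run_chunk, seeds, sizes))
    return 4.0 * sum(hits) / (num_workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi(num_workers=8, samples_per_worker=20000))
```

The point of the sketch is that the simulation itself is the easy part. Each chunk only needs a seed and a sample count going in and a single number coming out; what the sketch does not address is exactly what I'm asking about, namely how those inputs and outputs travel between real images.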

I guess nobody is writing articles to address these kinds of questions, so I'll have to immerse myself in Amazon's developer forums -- but that's immensely frustrating, as 99% of the posts there are about solving regular old data center things and not doing anything interesting.
