
The problem of slide 19

04/28/09

  03:57:00 pm by The Jeering Mole, Categories: Announcements

The Mole recently attended an SDForum Software Architecture and Modeling SIG talk on Cassandra, a "distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers". Cassandra was developed at Facebook to speed up certain searches; to the Mole's understanding, it was built to support indexing of user-to-user messages by the words in those messages. (That is, if a message contains the phrase "Jeering Mole", the words "Jeering" and "Mole" will be put into the index as keys with the message id as the corresponding value.) The developers seemed to hope or expect that Cassandra would be rolled out more widely into the Facebook infrastructure.
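
For readers who prefer code to parentheticals, here is a minimal sketch of that word-to-message index. It is an illustration only, under loud assumptions: the function name `index_message`, the in-memory dictionary, and whitespace tokenization are all the Mole's inventions, and nothing here reflects Cassandra's actual data model or API.

```python
from collections import defaultdict


def index_message(index, message_id, text):
    """Record message_id under every word that appears in text.

    Hypothetical helper for illustration; splits on whitespace only.
    """
    for word in text.split():
        index[word].add(message_id)


# The index maps each word (the key) to the set of message ids containing it.
index = defaultdict(set)
index_message(index, 1, "Jeering Mole")
index_message(index, 2, "The Mole returns")

# Looking up a word yields the ids of the messages that contain it.
matches = sorted(index["Mole"])
```

A search for "Mole" then becomes a single key lookup rather than a scan of every message, which is presumably the speed-up the developers were after.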

The Mole hesitates to contrast this talk with the previous one ("The Magic Behind Multi-tenancy", which The Mole described as a righteous hack) with the phrase "from the sublime to the ridiculous". There is some possibility that The Mole's understanding was more limited than he realized, a deficiency not compensated for by excellent presentation skills. More likely, however, is the possibility that Cassandra is just hacking -- fancy, upscale hacking by talented programmers -- without the clean architectural lines of Salesforce.com's delightful perversion of relational database design principles. [The Mole promises to blog at greater length about that talk.]

But what really distressed The Mole was slide 19 of the Cassandra talk, "Information Flow in the Implementation". The content does not bother The Mole: it actually seems quite interesting, in its original context. The problem is that the diagram and the text below it are taken without attribution from the paper "The \phi Accrual Failure Detector" by Naohiro Hayashibara, Xavier Defago, Rami Yared and Takuya Katayama [pdf]. Poor form, gentlemen.

Overall, The Mole's conclusion is that these speakers are clever but sloppy. They have mashed up several good ideas -- deliberately accepting errors in order to achieve high availability, using hashing to load balance, accrual failure detection -- into a system that apparently works. While the result may scale to larger server farms, the process by which it was delivered will not scale to more substantial projects.

[The title of this entry is indeed meant to echo the title of the famous story by Jacques Futrelle.]

1 comment

Comment from: The Jeering Mole

Ah, the interwebs, a land of broken links. Since this was posted, SDForum has morphed into SVForum, replaced the technology serving up their site, and deleted at least some historical content. A version of the slides that prompted this post may be found (for now, at least) at http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod/ where the diagram mentioned appears on slide 16.

04/26/13 @ 10:25

