My bad opinions

Beating the CAP Theorem Checklist

    Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment
    advocates a way to beat the CAP theorem. Your idea will not work. Here is why
    it won't work:
    
    ( ) you are assuming that software/network/hardware failures will not happen
    ( ) you pushed the actual problem to another layer of the system
    ( ) your solution is equivalent to an existing one that doesn't beat CAP
    ( ) you're actually building an AP system
    ( ) you're actually building a CP system
    ( ) you are not, in fact, designing a distributed system
    
    Specifically, your plan fails to account for:
    
    ( ) latency is a thing that exists
    ( ) high latency is indistinguishable from splits or unavailability
    ( ) network topology changes over time
    ( ) there might be more than one partition at the same time
    ( ) split nodes can vanish forever
    ( ) a split node cannot be differentiated from a crashed one by its peers
    ( ) clients are also part of the distributed system
    ( ) stable storage may become corrupt
    ( ) network failures will actually happen
    ( ) hardware failures will actually happen
    ( ) operator errors will actually happen
    ( ) deleted items will come back after synchronization with other nodes
    ( ) clocks drift across multiple parts of the system, forward and backward in time
    ( ) things can happen at the same time on different machines
    ( ) side effects cannot be rolled back the way transactions can
    ( ) failures can occur while in a critical part of your algorithm
    ( ) designing distributed systems is actually hard
    ( ) implementing them is harder still
    
    And the following technical objections may apply:
    
    ( ) your solution requires a central authority that cannot be unavailable
    ( ) read-only mode is still unavailability for writes
    ( ) your quorum size cannot be changed over time
    ( ) your cluster size cannot be changed over time
    ( ) using 'infinite timeouts' is not an acceptable solution to lost messages
    ( ) your system accumulates data forever and assumes infinite storage
    ( ) re-synchronizing data will require more bandwidth than everything else put together
    ( ) acknowledging receipt is not the same as confirming consumption of messages
    ( ) you don't even wait for messages to be written to disk
    ( ) you assume short periods of unavailability are insignificant
    ( ) you are relying on a paper or theory that has not yet been proven
    
    Furthermore, this is what I think about you:
    
    ( ) nice try, but blatantly false advertising
    ( ) you are badly reinventing existing concepts and should do some research
    ( ) in particular, you should read the definition of the word 'theorem'
    ( ) also you should read the definition of 'distributed system'
    ( ) you have no idea what you are doing
    ( ) do you even know what a logical clock is?
    ( ) you shouldn't be in charge of people's data
    

    Thanks also to tef for some editing.