
Showing posts with the label Nosql

Schemaless — The Dark Side

“We store everything as JSON, that way we don’t worry about schema” - #CowboyDeveloper

You’ve probably heard some variation of the above too, probably from some of the more religious of the Schemaless crowd. Oh, there are many good reasons for doing this - things like deferring decisions to the last responsible moment, being agile, and so forth. There is, however, a much darker - and sadder - possibility: that they just don’t get it. The thing is, a database schema defines its structure: the types/formats of data that are permitted, the relationships between them, their integrity constraints, etc. In relational databases, most of this is directly codified in the form of tables, views, indexes, triggers, and whatnot. What we tend to forget, however, is that all of these exist for a reason - they provide the rules, the framework if you will, of how we interact with the database. These rules need to exist somew...
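If the database does not hold the rules, something else has to. As a rough illustration only, here is a minimal Python sketch of what carrying those rules in application code might look like for a schemaless store. The `USER_RULES` table, the field names, and `validate_user` are all hypothetical and are not taken from the original post.

```python
# Hypothetical rules for a "user" document; field names are illustrative only.
USER_RULES = {
    "name": str,
    "email": str,
    "age": int,
}


def validate_user(doc):
    """Return a list of rule violations (an empty list means the document passes)."""
    errors = []
    for field, expected in USER_RULES.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    # An integrity constraint that a relational schema would express as a CHECK.
    if isinstance(doc.get("age"), int) and doc["age"] < 0:
        errors.append("age must be non-negative")
    return errors
```

Every writer of user documents would have to call something like `validate_user` before saving, which is exactly the work a declared schema does once, in one place.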

Mahesh's Twentythird Law - JSON + NoSQL =/= Automatic Scale

Throwing JSON documents into a NoSQL database is not going to automatically allow your system to scale.

Corollary: No, MongoDB is not the answer.

Note: OK, MongoDB is pretty good from a prototyping perspective, but it will end up biting you in the butt... In fact, the general response for anything other than prototyping should really be in two parts: "Do you really know what you are doing?" and "No you don't." I'm getting pretty seriously curmudgeonly about this nowadays ("Get off my lawn", etc.). People probably shouldn't be using anything NoSQL at all - they invariably end up shooting themselves in the foot...

Wither Couch? (Base, DB, whatever)

Curt Monash talks to James Phillips at Couchbase about their future, and comes away, well, pretty much where he was before. There is nothing drastically new in the article as far as Couch (Base/DB) is concerned; there is plenty of information available through The Googles about what's going on there as far as futures, players, etc. are concerned. The part I found fascinating was at the very end, when he says:

MongoDB is the big competition. He believes Couchbase has an excellent win rate vs. 10gen for actual paying accounts.
DataStax/Cassandra wins over Couchbase only when multi-data-center capability is important. Naturally, multi-data-center capability is planned for Couchbase. (Indeed, that’s one of the benefits of swapping in CouchDB at the back end.)
Redis has “dropped off the radar”, presumably because there’s no particular persistence strategy for it.
Riak doesn’t show up much.

Which is interesting, to say the least. The MongoDB is the big competition part is absolute...

SQL Joins - Visualized

Back in 2009, C.L. Moffatt put together this gorgeous cheat-sheet on SQL JOINs. As he put it:

I'm a pretty visual person. Things seem to make more sense as a picture. I looked all over the Internet for a good graphical representation of SQL JOINs, but I couldn't find any to my liking. Some had good diagrams but lacked completeness (they didn't have all the possible JOINs), and some were just plain terrible. So, I decided to create my own and write an article about it. Enjoy...

Hat tip: Alex Popescu
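For anyone who prefers running things to looking at pictures, here is a small sketch of three of the joins such a cheat-sheet typically covers, using Python's built-in `sqlite3` module. The tables `table_a` and `table_b`, their contents, and the `ids` helper are made up for illustration and are not from Moffatt's article.

```python
import sqlite3

# Two tiny, hypothetical tables that overlap on ids 2 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO table_b VALUES (2, 'deux'), (3, 'trois'), (4, 'quatre');
""")


def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# INNER JOIN: only rows present in both tables.
inner = ids(
    "SELECT a.id FROM table_a a INNER JOIN table_b b ON a.id = b.id ORDER BY a.id"
)

# LEFT JOIN: every row of table_a, matched or not.
left = ids(
    "SELECT a.id FROM table_a a LEFT JOIN table_b b ON a.id = b.id ORDER BY a.id"
)

# LEFT JOIN excluding matches: rows of table_a with no partner in table_b.
left_only = ids(
    "SELECT a.id FROM table_a a LEFT JOIN table_b b ON a.id = b.id "
    "WHERE b.id IS NULL ORDER BY a.id"
)

print(inner, left, left_only)
```

The right-hand variants are the same queries with the tables swapped, which also sidesteps the fact that older SQLite builds lack RIGHT and FULL OUTER JOIN.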

The Database Landscape - Visualized

Matthew Aslett has an updated version of the database landscape chart that he had put together a while back - done in hardcore London Underground style (and it is the better for it). Check it out (click to embiggen...)

MySQL vs Postgres (vs MongoDB)

Chris Travers has a fairly nifty article up titled O/R Modelling interlude: PostgreSQL vs MySQL, wherein he makes the claim:

MySQL is what you get when application developers build an RDBMS. PostgreSQL is what you get when database developers build an application development platform.

This isn't really flame-bait - it's intended as a statement to show how people approach arguments (flame wars?) about MySQL vs PostgreSQL. To paraphrase Chris, MySQL has been massively disruptive because it tends to really, really look at the world from a Use Case perspective, answering the question "What problem are you trying to solve", while Postgres has been massively disruptive because it tends to look at the world from a theory perspective, answering the question "How should the database work in the solution". I'd add MongoDB to this mix though, and extend the above quote to say:

MySQL is what you get when application developers build an RDBMS. PostgreS...

High Performance SQL @ Google

There is a paper out by Google on how they migrated from MySQL to F1 for AdWords. It's fascinating reading on its own, but it's got some key take-aways for scaling and reliability. I'd recommend going and reading it, but these two are worth emphasizing:

Scaling: Eliminate the R in ORM. In short, instead of building out a nice layer (Hibernate, whatever) between your DB and your code, go the other way, and deliberately expose the workings of your DB to your application developers. This actually makes tremendous sense - in many ways the advent of NoSQL has been due to people specifically picking data stores that map to their application requirements. Need a key-value store? Use Riak. Need a document store with guaranteed writes? Use CouchDB. Hiding the specifics of your data store behind an ORM layer is - increasingly - becoming irrelevant, and Google quite probably just made it official.

Fault-Tolerance: Use 5 replicas. 5? Why 5...

Payware - and the BigData Ecosystem

Dan Woods has an article at Forbes about "How Hadoop and SAP HANA can accelerate Big Data Startups". It's pretty hard to read - you have to get past the obvious shilling for SAP (the byline is a bit of a giveaway - "He has written several books and created other research and educational content for SAP"), and after that, you have to ignore the Hadoop-centric nature of the post (there are other fish in the sea, you know?). You have to get all the way to the end of the article before you get to the meat, which boils down to:

... can SAP make it as easy to experiment with SAP HANA as it is to download and use open source?
... will developers buy into SAP’s efforts at being open and making SAP HANA easy to use?
(If it is) Priced too high or with onerous terms, SAP HANA won’t make sense to startups.

And these points, in the end, are what it's all about. A significant - and somewhat under-reported - aspect of the current BigData boom is that people ar...

NoSQL will rot your brain! (Or something like that)

D'you remember this ridiculous article in The Database Journal on "The Hidden Cost of Scaling in NoSQL"? I won't go into it again - if I remember correctly, I summed up my take on the whole article as:

Mind you, they did miss the following very important points too:
Baby Seals: NoSQL developers club baby seals with joy and abandon, and make gloves with their flippers
Puppies: Using NoSQL makes you want to kick puppies (and kittens!)
Nazis: Hitler used a NoSQL database
Seriously, if this isn't sufficient reason for The Database Journal to be shut down, well, I can't think of a better one...

Now Sreedhar Kajeepeta in InformationWeek has tried to one-up them with the provocatively titled NoSQL Everywhere? Not So Fast. The difference, this time, is that instead of a series of ad hominem attacks, this article takes the Cool-Whip approach (An argument that is so content-free you end up nauseated, and acquiesce by default). For ...

The CAP Theorem is Wrong! (Not really)

I've been having this long argument w/ a colleague about the merits of SQL vs NoSQL, which I'm not going to get into here, but recently it devolved into claims along the lines of "The CAP Theorem is Wrong" (no, that's not my quote). It took me a while to figure out his point, but it turned out to be a fairly common misunderstanding, viz., If you have predictable Network Partitions, you can guarantee C, A, and P. I.e., if you know, beforehand, exactly how your network is going to get blowed up, then you can protect against it, and have Consistency and Availability and Partition Tolerance. This is absolutely true! Huh? Say what? Well, yeah. Note the weasel phrase "if you know, beforehand". This is not unlike saying, "If you know, beforehand, what the winning lottery numbers are going to be, you can win the lottery every time". The point about Partition Tolerance is that it needs to handle arbitrary / random scenarios. ...
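To make the trade-off concrete, here is a deliberately tiny toy model in Python. It is a sketch under loud assumptions: two replicas, one link, and a made-up `Cluster` class with a `mode` flag. It is not how any real data store works. The only thing it shows is that once a partition happens without warning, a write either gets refused (giving up availability) or gets accepted on one side (giving up consistency).

```python
class Replica:
    """One copy of a single value (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas joined by a link that can be cut at any time."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        """Try to write via one replica; return True if the write was accepted."""
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            replica.value = value
            other.value = value
            return True
        if self.mode == "CP":
            # Refuse the write: replicas stay in sync, but the system is unavailable.
            return False
        # AP: accept the write locally: the system answers, but the copies diverge.
        replica.value = value
        return True

    def consistent(self):
        return self.a.value == self.b.value
```

Usage is as simple as building `Cluster("CP")` or `Cluster("AP")`, setting `partitioned = True`, and calling `write`: the first refuses and stays consistent, the second accepts and diverges.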

Eventual Consistency is not something that needs to be Worked Around. It's a feature

You can absolutely go read up on Eventual Consistency, the CAP theorem, etc. to understand Eventual Consistency, quite a bit of which is good. They tend to skimp on something I find quite important though, i.e., the role of White Lies in an Eventually Consistent environment. Depending on the architectural context, you can end up with a system where there is a requirement for consistency, but only once all the Business Processes are taken into account. Or, to phrase it differently, if you are willing to lie Just a Little, you can relax constraints without getting yourself into trouble. Consider my company, which, to put it technically, does all sorts of big cloud telephony thingies. These thingies are all done with clumps of servers ("clump" --> another technical term). At the simplest level, what we have is just a big honking phone system. Y'know, the kind where people call each other and leave voicemails and stuff. So, consider a fairl...

NoSQL developers club baby seals! With Abandon!

Herewith I present to you a <sarcasm> spectacular </sarcasm> article in The Database Journal on The Hidden Cost of Scaling with NoSQL which includes brilliant bon-mots like the following (bolding is mine):

Data integrity - In order to achieve high performance despite massive size, non-relational database systems compromise data correctness guarantees. The traditional rules about writing data are loosened, making it far more likely that data can be lost or overwritten. Thus the best applications for a non-relational approach are those that have low-to-medium requirements for data integrity, for example, social media applications. Any application whose data integrity requirements are absolute requires a relational database; NoSQL is a non-starter.

Flexible indexing - Relational databases are very good at letting users query data from multiple perspectives. Joins and indexes are not weaknesses of relational databases, they are strengths. To achiev...

I have a Hammer! Show me them Nails!

It's part of the human condition - the desire to solve all the problems you encounter with the tools that you currently have. It's not necessarily a bad thing - when Goongrah was being attacked by the Bear in the days of yore, he defended himself with the club he was carrying, since saying "Yo Mr. Bear. Hang on whilst I go fetch my spear from yonder cave. And by the way, Ugh" wasn't really an option. But ye gods, has this ever evolved into some remarkable insanity, especially in The World Of Development. Everything, but everything, needs to be done using the same tools. People will go out of their way, and twist themselves into knots, to retrofit all their problems to their one solution. Take databases for example - till fairly recently people had two hammers (if that): MySQL / PostgreSQL / Oracle for their persistence needs, and memcache for their key-value store. And to be serious, some of the more devout ones...

.NET brings the pain to DynamoDB

Oh Nifty - for all you .NET developers out there, here is a Step-by-Step Guide to Amazon DynamoDB w/ .NET. 11 short steps to creating a table and setting up CRUD on it.

<pause while - hopefully - people go look>
<pause>
<you're back, right?>

Is it just me? Or does this seem overly complex? Then again, I don't do Visual Studio, and I *really* don't do .NET (Does one imply the other? I don't really want to know), but this feels, well, pretty painful to me. Have I mentioned that I hate Java too? Do I sound like Andy Rooney?

Documents as the single source of truth - Telephony Edition

Paul Hammant has an interesting article up about The document as the single source of truth - which made me think about How We Do Things. Aeons ago, we wrote the first version of our telephony platform, and it was State Of The Art, and It Was Good. It had turbochargers and coffee-grinders, and a honking huge database into which we dumped all of our data, but first the data was broken up so that it fit well into various tables and columns, and it was fast and well thought out and It Was Good. Well, maybe Not So Good, because pretty early on we realized that it would be useful if we had some record of changes - when a user calls up and sez "I never turned on Call Recording", it's useful to be able to say "Why yes sir, you did indeed, it was on the 17th at 3:30pm, and by the way, Yes, we *are* Big Brother thank you very much". So we added _changes tables that logged the updates to users' addresses, and profiles, and parameters, and, well, pret...

GeoSpatial BigData Analytics (ok - "Buzzword Bingo")

SpaceCurve just scored $2.7M in new funding. And, why do you care? Well, you may not, but they are doing something interesting vis-a-vis the merging of BigData and Spatial Data. And yes, that is a deliberately vague description, but that's pretty much how things work in the BigData world nowadays, don't you think? Seriously though, think of it this way - Spatial data is not inherently easy to deal with in traditional terms (think How do I get the distance between two points? How do I find all the data-elements in this cube? etc.). The big SQL players all have Spatial extensions to their DBs, as do some of the NoSQL players (CouchDB, MongoDB, and Cassandra). Now, add in "Location Services" - where you need to keep track of not just data scattered in space, but time too, e.g., your Foursquare check-ins last week, your GPS readings last week, etc. And *then*, add in connections between the entities generating this data....
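To give a feel for why even the two "simple" questions above are not one-liners, here is a small Python sketch. It uses the haversine formula as one common way to approximate great-circle distance on a spherical Earth, plus a naive axis-aligned "is this point in the cube" check. The function names, the `EARTH_RADIUS_KM` constant, and the choice of formula are assumptions made for illustration; none of this is from SpaceCurve or any of the spatial extensions mentioned.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating the Earth as a sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_box(point, lower, upper):
    """True if every coordinate of point lies between the matching lower and upper corners."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, lower, upper))
```

A brute-force scan with `in_box` over every record is fine for a handful of points; doing it over billions of them, with time as an extra dimension, is where spatial indexes earn their keep.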

SQL vs NoSQL - Scooters vs Motorcycles

There seems to be a pretty potent Meme floating around right now, viz. The Future Of NoSQL - To Be Borged. The general argument seems to be some combination of:

Databases are complex beasts
We've spent eons simplifying access to these data stores (SQL)
BigData revolves around an even more complex set of problems
Everybody is building custom solutions to these problems
Regression to the mean will result in them all becoming somewhat similar
SQL stores will just add these common features
All your bases will belong to them

As you might imagine, I quite disagree with this. Oh yes, I fully expect SQL DBs to incorporate some NoSQL features over time, but it really isn't the same as when Object Oriented Databases went to the Great Dustbin In The Sky. As I've mentioned before, NoSQL stores are more solution-oriented - there is a huge and thriving ecosystem, and the vast majority of them are designed around specific types of solutions. Can they be ...

CouchDB - blessings and curses

John Wood (of Signal fame) has a post up about Signal's experience moving to, and away from, CouchDB. It's an interesting real-world example of what I've described in "NoSQL - What you'll find (for sure!)". To recap, when first getting into NoSQL, you are sure to find that:

You didn't understand your own problem-space as well as you thought you did.
You didn't understand the package that you are using as well as you thought you did.
It will not scale the way you thought it would. Oh, it'll scale all right, just not the way you thought it would.
Your object/document/JSON/whatever model really doesn't map exactly the way you expected it to.

In John's case, they found that:

HTTP is a Very Slow Database Protocol
MVCC Overhead (is bad)
Large Databases Beat Up the Hard Disk
CouchDB is not a Distributed Database (by default)
map/reduce takes a while to get used to
Views take forever to build
Views are gigantic on disk
Replicati...

MongoDB - the MySQL of the NoSQL movement?

It's a pretty serious point since, as far as I can tell, it is rapidly becoming the default DB that everybody uses when they have to do something "cloud-y". Where's the problem with that? Well, let's think back to SQL databases - by and large MySQL and PostgreSQL really do tend to be "one size fits all". If you're already thinking relational, then it really doesn't matter which of these two you use (no flames please!). Yes, there are differences in how they replicate, shard, do stored procedures, etc., but they're really the same damn thing (as is Oracle, for what it's worth). NoSQL databases? Ah well, that's different. They come in all sorts of shapes and sizes (document, column-oriented, key-value, etc.) and each of these has a different sweet-spot. I tend to think of them as solution-oriented data stores, i.e., each DB is tuned towards a specific solution domain. Oh yes, they are certainly moving towards each other - Riak has secondary index...