“I went through the usual stages: imp, rascal, scalawag, whippersnapper. And, of course, after that it’s just a small step to full-blown sociopath.”
― George Carlin, Brain Droppings
Hi, Search Curmudgeon here – at least that’s what I call myself. Think of me as that old guy in the back of the conference room that mutters things like “that’ll never work” or “you think you invented that?” during architecture reviews or code walk-throughs. If you listen carefully to what the old geezer is muttering about, you get the gist of what I will be conveying in this blog. Assume that my coding skills did not stop with COBOL like many of my contemporaries, but that said, some of this new stuff that you kids are inventing is a) awesome and b) scary to old farts like myself. I can’t drink with my buddies and then come back home and code till 2-3 AM like I used to do (or was that in a dream?) That scene in The Social Network where Zuckerberg is having a live boozed-up hacking contest to see who gets a job at Facebook scares the hell out of me.
Anyway, time marches on, but what we old guys know is that some of the hot things that you whiz kids are doing now were done before, i.e., “back in the day”. But we didn’t have the advantage of Moore’s Law. So it’s not that you kids are any smarter than we were, you just have much better hardware to play with. Reminds me of how Microsoft used to talk about how each new OS was much faster and more powerful than the previous one – yeah right, like they had something to do with that – mostly their new OS gobbled up most of the gains in memory, CPU and disk that the silicon packing guys gave us (I remember when 5 MB hard drives were a BIG deal – no more suitcases full of floppies!) In other words Bill, bullshit – the real heroes of the PC revolution worked at Intel for Mr. Gordon Moore himself.
So this first installment will be a random collection of discussions – eh, OK – rants, styled after a book by one of my absolute favorite comedians – George Carlin – “Brain Droppings” (may you rest in peace George – and believe me, after the ride that you had, you need it!) So if I may be so bold to try to step into Sir George’s shoes imagine that this blog were being written by Carlin or his disciple Lewis Black – i.e. if those guys were software dudes like yours truly. I’ll try to make this PG 13 unlike the stuff that George and Lewis are famous for but you have to admit Mr. Black is one ‘effin funny dude! I’m certainly not striving for a ‘G’ rating – then I couldn’t have any fun. Another way to think of it is that I am writing jokes for a Software Geek Celebrity Roast – this is safer to do under a pseudonym – trust me.
And if we are lucky, I may even be able to slip in some pearls of wisdom that you young guys need to hear from the elder generation. You are not as smart as you think you are kiddies – dual quad core, 3 GHz CPUs and 512 GB of RAM can hide lots of coding sins. (When I was your age sonny, we had to walk three miles through snow to submit our box of punch cards … talk about crappy BAUD rates!) And we were (are?) pretty damn smart too. It’s just that we can’t remember stuff now, have to take a few Tums and a few Tylenols once in a while, especially after too many beers, and need visual aids to read the monitor (cell phone text fonts? No freakin’ way!) I just keep forgetting to bring my reading glasses to restaurants, but now with that killer iPhone app (I am sure that you Android losers have one of these too) that turns my camera into a magnifying glass with an LED spotlight, problem solved! Yeah, I’m even old enough to remember when we didn’t have smart phones – how the hell did we ever get anything done?
Rant 1: Notes from Palookaville: I coulda been a Contender
“… you said, ‘Kid, this ain’t your night. We’re going for the price on Wilson.’ You remember that? ‘This ain’t your night!’ My night! I coulda taken Wilson apart! So what happens? He gets the title shot outdoors on the ballpark and what do I get? A one-way ticket to Palookaville! You was my brother, Charley, you shoulda looked out for me a little bit. You shoulda taken care of me just a little bit so I wouldn’t have to take them dives for the short-end money. … You don’t understand. I coulda had class. I coulda been a contender. I coulda been somebody. Instead of a bum, which is what I am. Let’s face it. It was you, Charley.”
– Marlon Brando as Terry Malloy in On the Waterfront
It was the Summer of 1976 – I was starting grad school. I had come to campus that summer to find a place to live and was killing time before classes started and making some new friends like Gavin Perry, who was a year ahead of me. Gavin was a techie like me (my Dad was an EE who would do ‘father-son’ projects with me where we – uh, he – would build cool things with microchips – so I wasn’t Steve Wozniak but I had a little bit of The Woz in my DNA at that point). So one day, Gavin said to me “Why don’t we forget grad school and go build a computer” (the first Motorola CPU had come out a few months before that). I said without thinking too much “Nahhh!” – maybe because I didn’t know what Gavin’s Electrical Engineering skills were and knew that mine were totally inadequate for the task. Anyway, I remember thinking that this was a very tempting idea but that I really wanted to do Science, so I said no. At that very moment while this conversation was happening, I’m sure that Jobs and Wozniak were in their garage working on their first prototype for the Apple I. I can just imagine that their conversation went something like this:
Jobs: “Steve, is it working yet?”
Woz: “Grunt, leave me alone Steve.”
Jobs: “Steve, how much do you think we can sell this thing for?”
Woz: “Dunno, Steve – let me figure out why this signal is not getting to the disk drive first.”
Jobs: “Steve, do we have enough money left for a Pizza? I’m starving.”
But as you all know by now, Jobs and Wozniak created a machine that launched Apple Computer – now one of the richest companies in the world with a revenue stream that Oil Companies can respect. Malcolm Gladwell’s book “Outliers” makes the claim that being at the right place at the right time is one of the keys to success; i.e. one of the reasons that Bill Gates and Steve Jobs are household names right now is that they were born at just the right time (1955) so that they could be at the right age with enough experience in 1975-1976 when the first CPU chips were coming out. These guys are exactly my age (actually a year YOUNGER than me goddammit, so yup – I’m a legitimate curmudgeon – get junk mail from AARP), and if I had not turned Gavin down and we actually were able to beat Steve and Woz to the punch … (that part seems very implausible because Woz is a technical genius – Tesla-like really – to have any chance, it would have had to have been Gavin, me and my Dad but that wasn’t going to happen) So, yeah, I coulda been somebody! I coulda been a contenda! What I need now is a Wayne’s World alternate mega-happy ending (“diddly-oop diddly-oop diddly-oop”). Now you know why I’m such a curmudgeon.
Rant 2: “Big Data”
As Gilda Radner’s character Emily Litella would say on SNL: “And what’s all this I hear about ‘Big Data’?” What does the phrase “big data” actually tell you? That it’s “Big” and that it’s “Data”. Big Whoop! I’m not imagining that we are talking about a 50 foot tall Brent Spiner here so help me out on this one. I get it though – we generate tremendous amounts of data every day, much of it is junk (like this blog post?) and we need to “mine” it to get out the “nuggets” (good thing that we are not talking about Chicken Nuggets here because that would be gross). Big data isn’t really all that interesting, it’s just running the same boring little subroutine over and over and over again on huge amounts of information. We want it to get done in a reasonable amount of time (so we can go home at a decent hour mostly). Why “big” though? That seems underwhelming. Why not “unbefreakinlevably huge data”? Now, that has some pizzazz! And let’s face it – search is still important here (see below)! Without good search, all of this folderol around HDFS, Hive, Pig, Oozie, YARN (the buzzwords just keep on coming don’t they?) is just a very, very, very expensive exercise in “write-only” memory! I mean if you can’t find stuff in your “data lake” it becomes like one of those real deep-water lakes where mythical creatures abound. You don’t want your data to become mythical do you? If it lies at the bottom of a very deep data lake and you can’t find it again, then you may as well have sent it to that great bit-bucket in the sky (I mean heaven of course, not “The Cloud” – ah but that’s a subject for a future rant!)
The MapReduce paradigm – invented by Google and popularized by Sir Douglas Cutting, KBE (I mean, too bad he is not a Brit, he would certainly have been knighted by now just for Lucene alone – but Nutch and Hadoop too? Most Excellent Resume dude!) – uh, basically solves the “big data problem” in a very elegant way by breaking it down into small chunks that can be processed in parallel and then buying a gazillion dirt-cheap PCs to run it on (or as I should say “commodity hardware” whoop-de-freakin’ doo). The bad guys are reputedly doing this under our noses by hacking into our home PCs and stealing our CPU cycles to crunch RSA keys and then break into government computers. So if your password is something stupid like uh ‘password’ or even the slightly more secure ‘secretpassword’ or you haven’t renewed your Anti-Virus subscription, you may be inadvertently (ignorantly?) helping the criminals to rob our National Nuggets – shame on you! Those bad guys are pretty damn smart, which is not mutually exclusive with “evil” unfortunately. (And they are also government funded! Foreign though, we hope, but cyber espionage is probably one of those Black Ops things that Robert Ludlum and John le Carré like to write about …).
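For you kids who want to see the “break it into chunks” trick without buying the gazillion PCs, here is a minimal sketch of the map, shuffle and reduce steps as a word count. To be clear, this is not Hadoop’s API or anybody’s production code: the function names (`map_chunk`, `shuffle`, `reduce_group`, `word_count`) are made up for illustration, and a thread pool on one machine stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_group(word, counts):
    """Reduce step: collapse one word's list of counts into a total."""
    return word, sum(counts)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently of the others, which is what lets
    # the real thing spread the work over many machines. Threads are only a
    # stand-in for those machines here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return dict(reduce_group(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    chunks = ["big data big whoop", "data lake data nuggets"]
    print(word_count(chunks))
```

Notice that it really is the same boring little subroutine run over and over: `map_chunk` never needs to know about any chunk but its own, and that independence is the whole trick.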
Rant 3: Search vs. Information Access
The term “Search” became blasé after Google had apparently commoditized it, so enterprise search vendors started referring to what they do as “Information Retrieval” or “Information Access” rather than “Search”. OK, so what? If you have to dress up what you call what you do so people don’t think it’s trivial, you have a lot more to worry about than your choice of buzz words. Yeah, I get it, “Information Access” encompasses more than search – it’s also browse and security and digital rights management and God knows what else. Or we don’t just do Search, we do Semantic Search – a righteous if also facetious distinction because we ALL do semantic search whether we know it or not because we are not building freakin’ RDBMS applications here (nothing wrong with that though, ask Larry Ellison) – users generally type in words and phrases, not SKUs. Some of us just do it better than others, that’s all. Also, Information Access is vague, yes it means that we want to get “access” to our “information” but how? Is our information in prison? (Yes, in some cases, I guess that it is! Some of it is locked up to protect the innocent and some of it should never be allowed back into society again!)
Or what about the term “Information Architecture”? Yeah it sounds good but what does it mean? I guess that bad IA would be like designing a building that’s like a maze inside so that newbies get lost for like hours going to and from the bathroom (I like to use the word ‘like’ instead of commas – it makes me feel connected to the younger generation – but my son freaks out when I try to wear my baseball cap backwards). So if you like bill yourself as an “Information Architect”, no offense but that in my experience is like a smoke screen for “taxonomist” (no not taxidermist) or the even more hoity toity job description “ontologist” – which gives your customers sticker shock if they know what building a taxonomy or an ontology really means in terms of TCO. “Back off man, I’m an Ontologist! – no not an Oncologist for God’s sake!”
I don’t mean to say that we shouldn’t want to structure our information in such a way as to enhance our ability to access it, of course we should. My rant here is about buzz words – yes an easy target I know, but too often the hype gets in the way of getting the job done. Bullshit doesn’t make it to production in other words. Computers are hardass things. You can’t dress up crappy code with slick marketing. Come to think of it, “Big Data” is one of those “buzz word” things – words that generate a lot of “buzz” – which with the double-z has kind of a “sizzle” to it so it sounds “hot”. But like 4th of July sparklers, the sizzle doesn’t often last. I’m not saying that Hadoop or MapReduce are not way cool – they absolutely are. But there is more to understanding “big data” than a choice of platform or algorithm. And for those of you that speak only in buzz words because you want to appear to be a) hip and b) knowledgeable – watch out for the engineers in your audience. Using too many buzz words in a sentence without explaining any of them in any detail will trip their BS meters big time.
Rant 4: NoSQL (No sequel?)
This buzz word has really gotten out of hand. It has become a meaningless catchall phrase for “database” – which means a “base” for “data” I guess – that doesn’t use Structured Query Language (SQL), as if SQL is a bad thing (it’s not). There are many types of NoSQL databases vying for our attention. The important distinction is that some of these scale much better than SQL DBs do – also known as Relational Database Management Systems or RDBMS. RDBMSes struggle with tables with millions of rows in them. NoSQL datastores like Couchbase, Mongo or Cassandra can have billions or 100’s of billions of items without breaking a sweat.
OK, that’s the benefit, but what’s the cost of removing SQL? That’s the thing that the hype-cyclists miss. When they tell you how much better their new thing is by focusing on what things it does much better than the old one, and gloss over (or don’t mention) what it doesn’t do as well (like join things) – keep your hand on your wallet. So if you just need a base for your data where you can stuff humongous quantities of junk and get it back with a simple key, then NoSQL is the way to go. If you need to analyze stuff and mine relationships between different types of things, then maybe not. However, Graph databases, which are a type of NoSQL database designed specifically to model relationships that can be organized as a ‘graph’ – a thingie with “nodes” and “edges” – are way freakin’ cool (What are those you ask? Sorry, I could explain it but then you’d want to shoot me.) – and for some use cases, can do a serious case of whoop ass on RDBMS. And for the most part (Neo4j being one exception), NoSQL databases don’t do ACID – yes, I’m thinking what you’re thinking here – “Lucy in the Sky with Diamonds” right? Hmmm … I wonder if that explains something about the Lotus Notes APIs?
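Since I brought up the “get it back with a simple key” versus “join things” trade-off, here is a rough sketch of what it looks like in code. Everything in it is an assumption for illustration: a plain Python dict stands in for a key-value NoSQL store (no real product’s API is being shown), the customers and orders are invented, and SQLite from the standard library plays the part of the RDBMS.

```python
import sqlite3

# Key-value side: dicts standing in for a NoSQL store (hypothetical data).
customers = {"c1": {"name": "Larry"}, "c2": {"name": "Doug"}}
orders = {
    "o1": {"customer": "c1", "item": "yacht"},
    "o2": {"customer": "c2", "item": "elephant"},
    "o3": {"customer": "c1", "item": "another yacht"},
}


def get(store, key):
    """The one thing a key-value store is great at: fetch by key."""
    return store[key]


def items_by_customer_name_kv(name):
    """A 'join' done by hand: scan every order and look up its customer."""
    return sorted(
        o["item"]
        for o in orders.values()
        if get(customers, o["customer"])["name"] == name
    )


def items_by_customer_name_sql(name):
    """The same question put to a relational engine: one declarative JOIN."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, item TEXT)"
    )
    db.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(k, v["name"]) for k, v in customers.items()],
    )
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(k, v["customer"], v["item"]) for k, v in orders.items()],
    )
    rows = db.execute(
        "SELECT o.item FROM orders o JOIN customers c ON o.customer = c.id "
        "WHERE c.name = ? ORDER BY o.item",
        (name,),
    ).fetchall()
    db.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(get(orders, "o1"))                    # the easy case: one key, one value
    print(items_by_customer_name_kv("Larry"))   # the join you wrote yourself
    print(items_by_customer_name_sql("Larry"))  # the join the engine wrote
```

Both functions give the same answer. The difference is who does the work: with the key-value store the scan-and-look-up logic lives in your application, while with SQL you state the relationship and the engine figures out how to walk it.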
So as the old saying goes “caveat emptor” – buyer beware – if you don’t understand what you are buying and jump off the cliff with the other lemmings because you want to be one of the cool kids, then it’s your funeral. Larry’s software and its ilk may be old fashioned SQL DBs, but he will be able to buy new yachts for many years to come. And the role of DBA will likely be staffed by curmudgeons like me – so be nice to them – they can save your ass. We’ve seen our share of techno cliff jumpers – it doesn’t end well.
Curmudgeon sticker courtesy of Clip arts