The Twilight of the Vengine Gods (Die Göttervenginedämmerung) or Die Hard with A Vengines!!!

The term ‘Vengines’** is short for “Vendor Engines” – like HP Autonomy, Google Search Appliance, MS Fast and Oracle Endeca, which as we speak are fading from the scene. Not that this is news to anyone who works in this field. The Curmudgeon doesn’t dispense news, he just tells you what information, new or old, sucks or what pisses him off and then rants about it. I should also say that there is absolutely no fact checking in this post. For some things that I am absolutely sure about, I don’t have to – I mean if you can’t trust the Curmudgeon, who can you trust? Seriously. For other things that may or may not be true, I decided to put them on the Internet so that they would become so. Also, I’m just lazy.

“Back in the day” as us geezers are wont to say – about 1995 or so, Linus Torvalds introduced Linux and the concept of Open Source to the world. Torvalds published a web page in which he stated that it should be pronounced “Lee-nooxe” like his name not “Lih-nix” and we’ve been ignoring him on that point ever since. At the time, I was working in a small Computer Graphics company and my boss Jim Spatz predicted that Linux would ultimately rule the OS world, replacing Windows as the dominant server OS for Intel machines. Jim was not a developer so I disagreed with him, thinking that hundreds or thousands of programmers working independently can never get anything right. (I remembered a sign on an outhouse once that said “Eat poo because 40 trillion flies can’t be wrong” – although it didn’t say “poo” – use your imagination here – I’m just trying to be PC like Mike Rowe used to do on Discovery Channel’s “Dirty Jobs”). Of course, as it turned out, the Curmudgeon was wrong!! I know, hard to believe, right? Obviously one of the very few times that this has ever happened, but in my defense I wasn’t a curmudgeon then. Also, I couldn’t convert erroneous assertions to facts by uploading them to a web site as I can now. Even the Search Curmudgeon sucks at prognostication. I’m more into nostalgia now or in this case bad dreams as I think about the times that the Vendor Gods Ruled the Planet. Eric Raymond published a great book – “The Cathedral and the Bazaar” that explains why the open-source paradigm works so well. Still a great read even though it is 15 years old now. Get it on Amazon like everything else.

So just like OSes for commodity Intel hardware, Open Source is killing the search vendor engines, as a buddy of mine predicted in a previous blog on this site. He told me that he fantasizes that the Larrys read his blog, decided that he was right and made big business decisions because of it. (So if you are reading this blog Larry E. – we’ll take a ride on your America’s Cup yacht as a thank you – that would be cool – just post a comment to this blog and we’ll set up a date. Thanks in advance.) We know that Bill bought Fast just to improve SharePoint search and then imprison it in his .NET castle, but one of the first things that Microsoft did was to deprecate support for Fast ESP on Linux – who could have seen that one coming?

Autonomy is an interesting story. I heard a story once that when Lynch was pitching it to HP he pointed at a closet and told them that there were 50 developers in there – or maybe he just didn’t spend much time hanging with the worker bees. Anyway, HP was ultimately mighty disillusioned with the purchase and there are rumors that they want to dump it but nobody wants to buy it. Before that, Autonomy bought Verity, of course, to get rid of them as a competitor and to get their customers. Then they created this total kludge called K2 V7 which had a Verity K2 API and an IDOL core that they never really got working, and it was rumored that getting it working was never their intention. They just wanted to upsell K2 customers to IDOL – what, by pissing them off? The joke was on them because most of Verity’s customers were Ultraseek customers who couldn’t afford IDOL anyway. Most or all of these have almost certainly transitioned to Open Source by now. Ultraseek wasn’t bad as search engines go (it belonged to Inktomi before Verity bought it) – a little bit ahead of its time actually, but it was totally buried within Autonomy and certainly within HP – but that’s the way it crumbles vendor-wise. Verity used to give it away for free up to a certain document count.

Anyway, compared to Solr, IDOL (I forget what that stands for but I really don’t give a poo) should really be spelled IDLE because Solr blows its doors off for speed and such. It’s also kinda black-boxy like GSA. That’s why it is called Autonomy – you just plug in your data and it works. But not really, as customers are discovering. That, as some would say, was another hoax perpetrated by Mike Lynch. IDLE also has a horrendous configuration layer that always caused problems for us. Good riddance I say. I started off with Verity, which was a fine search engine, and Autonomy at that time was our enemy until they bought Verity and we started to work with them – but they sucked way more than Verity did.

Another vendor is Fast Search and Transfer, which pitched itself as highly scalable and fast but is really neither. I remember a project at a pharmaceutical company in which we were trying to index about 1.5 TB of eRoom data (Fast was claiming petabyte scale back then). The project never really got done because jobs would fail in the middle of the night and you had to try to figure out where it crashed so you could start from there. One of those very tedious, months-long firefights where the Fast indexes were erroring out all the time and you had to write your own code to error check, collate and reindex. Fast spent a large amount of effort on error correction and recovery – much more than SolrCloud for example. It needed to because its clusters were inherently unstable. Another thing is speed. I was once on a customer call where we were pitching a Fast-to-Fusion/Solr conversion or ‘rip-and-replace’ and I quipped that of the two engines in question, one is named Fast but the other one actually is fast. Everyone laughed.

When Microsoft bought them, everybody knew that it was to replace native SharePoint search (Bing is an entirely separate project). I had the dubious pleasure of working on Fast Search for SharePoint 2010 – another kludge that has been replaced, I believe, with a more .NET-based version. Fast has now disappeared into the MS fortress, never to be used for anything but SharePoint search, all as originally intended by the Gods of Redmond. Fast ESP has disappeared less completely and continues to be a rip-and-replace target, although I think that its sunset date has passed.

That brings us to Endeca – a company in Cambridge MA that I worked with a lot and that was geared primarily for eCommerce. Endeca is now of course owned by Larry E’s company and is fast losing ground to Open Source. One reason is that most (all?) of the engineers that were at Endeca left after Oracle acquired them (including the principal Steve Papa) and now nobody in this behemoth organization really understands it. I had a brilliant young engineer that used to work for me who had Endeca chops, joined Oracle to get that on his resume and spent several miserable months being thrown into the deep end of the pool as “The Endeca Guy”. He then left to do other things. We had tried to recruit him to Lucidworks and are interested again (I’ll use the pseudonym Saurav for him) – more on that later. Endeca doesn’t scale well – we used to cringe when customers told us that they wanted to index 1 million documents, which Solr eats for a quick breakfast snack. Getting that much into Endeca was a struggle. It’s also not as fast as the others. My first experience indexing data into Solr that I had previously indexed into Endeca was a revelation – what took several hours in Endeca indexed in about 10 minutes and at first I thought that my Solr setup was broken. Many others have experienced the same thing. We are doing a lot of rip-and-replaces with Endeca now and while Solr is not built primarily as an eCommerce engine as Endeca was – we are putting features into Fusion so that it can do all the things that Endeca does at much better scale and speed and at lower cost than Larry’s product – and with much better support.

Finally to what a buddy of mine called the Google Toaster in his blog, which they are now putting into End of Life. The main problem there is lack of flexibility and programmability, but also scale. It is literally a black box except that it’s not black – it has a user-friendly yellow color. I remember being at a Search Summit where Google was paying for our lunch and making us watch a presentation on GSA and they presented some pedestrian number for scalability that we were chuckling over. I am really glad that Larry P.’s company is getting out of the Enterprise Search business and other do-it-yourself usages because I am sick and tired of having potential customers tell us that “We want it to work just like Google” – although many times that’s nonsensical because it’s Enterprise not Web search and you can’t use things like PageRank. Of course Google will continue to innovate on their awesome web search engine, which is where they make the vast majority of their money anyway.

All four of them overpromise, overcharge and underdeliver, whereas Solr even with Fusion on top does the opposite in every case. As a result, the Vengines are disappearing like the Dinosaurs once did, leaving us to compete mostly with that other distributed search platform that is built on top of Lucene. I won’t mention their name because Lucidworks might ask me to remove it (no free press for them) but you know who I am talking about. Lucidworks by the way has never asked me to change anything by sending an email saying “Could you tone it down Curmudgeon?” or something like that but the joke’s on them because I don’t have an email address – but they could send something to my buddy and ask him to relay it to me. Anyway to avoid redaction by the LW censors, I’ll use code words for now. The name of the company is like the material that is used to hold up my Jockey Shorts (hint, hint).

I heard a story from one of my Solr committer friends that the guys that started Fruit-of-the-Loom Finders had been in the Solr community but got into a disagreement with Yonik, Hoss, Erickson (?) and others about the direction of the architecture, stormed out like petulant little boys with a “We’ll show you!!” attitude and started the company named like a Tightie Whitie Quest. If so, maybe they have a few curmudgeons of their own. So if someone who works or evangelizes for the RubberBand Finders wants to challenge me to a mud-slinging debate à la Trump/Hillary or WWF, I think that would be fun. We could trade hyperbolic insults about each other’s flaws like Trump does – which make for more heated entertainment but don’t really make a point. “Your code absolutely SUCKS!”, “It can’t scale for POO!”, “We’re easier to use”, “We’re less filling” or “GC HELL Raisers!”, “Brain Splitters” and other such geeky stuff. Or we could do some trash talk with ridiculous, trumped-up (rimshot please) numbers like “400 Quadrillion documents running on 24 thousand shards at 75 thousand QPS with 10 ms average latency, 500 Billion updates a second, 24/7 for MONTHS without a reboot – in your FACE BungeeSeek!!!” Hey come on – it would be fun. To keep my anonymity maybe I could come with a paper bag over my head like the Unknown Comic used to do.

Anyway, as we all know, all software has bugs but in our view, theirs has more and we fix ours faster. All fair game for a manno-a-manno or manno-a-womanno style mud wrestling contest. If anyone from the Lucene Dark Side wants to take me up on this, just comment on this blog post and my representatives will get back to you. (Note that we are the Bright Side because Solr is the Sun and it’s Hot!) It would also give me a chance to bone up on your code so I don’t look totally clueless and, as Sun Tzu would say, a chance to study my enemy. Another thing about the bra-strap guys is that they call their input things “Rivers” while we at Lucidworks call them “Pipelines”. I have heard stories that their customers sometimes feel that they are up poo creek without a paddle. But whatever, we are both children of what The Doug has brought forth and will duke it out for market supremacy. I’m betting on Solr (surprised?). And whatever the outcome, we are both pushing each other to get even better, which will be good for all of us, so I don’t see a resurgence of the vengines anytime soon (like Night of the Living Dead or something).

Another sign that Open Source search is booming is that everyone is looking to hire Solr engineers. Our strategy is to train them through our Solr Unleashed and Solr Under The Hood training courses, send them out into the world to mature and ripen and then poach them back from our customers to work for our company when they get really good. Working well I hear, but we still need more of them. If you come to Lucidworks, you might even get to work with me, but I would understand if you made a stipulation on accepting a job offer that “I’ll come if you don’t make me work with that crotchety old bastard!” I’m sure that the Stretch Armstrong Boys are also hiring vigorously.

So getting back to the vendor gods. To their credit, they are also embracing the Open Source Revolution in a big way. If Oracle does nothing more than to contribute, maintain and evolve Java, they’ve done plenty! Google of course has contributed many awesome things including Guice, AngularJS and Word2Vec to mention a few. They have also published research papers on their core technologies like the Google File System and MapReduce that helped The Doug when he started working on HDFS and Hadoop MapReduce respectively. So they “get” it. Microsoft is of course a totally different story and always will be. HP isn’t even a software company – really. We’ve learned to interoperate with the MS parallel universe but it’s always been a royal pain. The thing that floors me is how they managed to rip off Java when they created C-sharp (D flat?) – and made some improvements to be sure – but kept us from being able to easily interoperate with it. It only works natively with their own poo. Black-box genius.

So to close, it is abundantly clear that the search world is fully embracing the enterprise that Yonik and The Chump have created for us and is running with it. I for one am having a great time and don’t miss working with the vengines at all. Thanks Yonik. Thanks Chumpman! Thanks Solr Community – especially the committers, thou Sultans of Solr, Lords of Lucene – super job! And I’ll see you Rubbermaid Retrievalware knuckleheads on the debate floor/pit when you’re ready to take the gloves off (just not on a Football night please). Bring it on!

** I stole the term ‘vengines’ from my close friend Ted Sullivan’s Well Tempered Search Application blogs published on Lucidworks almost two years ago. To paraphrase Groucho – if Ted were any closer, he’d be in back of me. Check out the hint at the bottom of this post.