Your company or organization is energized. Finally, you’re going to provide an enterprise search solution for the organization or enhance your online retail customer experience! You get everyone on board, make some technology decisions, deploy a solution and thud… it doesn’t work. Where did you go wrong? Here is where others have failed. Learn from their misery.
- Bad schema design – Just like your relational database or any NoSQL database, your search solution requires some thought on how you represent an entry (aka document). Bungle this and you end up with sub-optimal search performance or an indexing process that takes too long to complete.
- Inadequate resource planning – That old desktop in the corner of your cube makes a fine footrest. Its Itanium processor was ahead of its time and totally unappreciated. However great it is at alleviating your back problems or restless leg syndrome, it probably won’t on its own be an adequate host for an enterprise search solution. Complicated queries or calculations may require more.
- Inadequate scalability testing – You bought some hardware, ingested some data, designed a UI and turned your users loose on it… It sank. Queries started taking minutes to return, “connection refused” errors started appearing, and you started thinking that maybe your life is just not what you want it to be. With some testing, you’d have realized that the combination of your schema design, hardware choices and use cases doesn’t work.
- Returning too much data – Whether you stuck a big bag of everything in your bad schema design as a “just in case” and called it a “grab-bag pattern” or designed dumb queries that return more than you need, remember that less is more, and remember The Buffet Rule: you can always go back for more. Deceptively, returning too much works fine until a few more users put load on the system.
- Not using compatible index and query-time analyzers – The most common example of this is stemming on the index side but not on the query side, or vice versa, which appears to work… until it doesn’t. Sooner or later someone searches on a word that the two sides stem differently, and nothing comes back when it should.
- Not planning for and testing relevance – A big fallacy of big data is that you can find an answer without any idea what you’re looking for. The Rolling Stones said you MIGHT get what you need, but you do have to know what you want! This means understanding what your users are looking for and testing that they get it before you roll the whole thing out. See how Salesforce does this with subsequent releases.
- Not planning for HA and DR – High Availability and Disaster Recovery aren’t the hottest buzzwords anymore, but good gosh, keeping your service constantly available and planning for a fiber cut or a lost data center is like remembering to buy food. You just need to do it. (See SolrCloud and Cross Data Center Replication.)
- Not capturing signals from the start – Or capturing incomplete signal data, e.g. a click event that doesn’t include where the clicked-on doc appeared in the ranked results. Too often, user interaction with search results is an afterthought, and then you have to piece together an incomplete story from query logs.
- Inadequate KPIs – Whether it is performance tuning or relevance, you need to have goals. To know when you’re done tuning you need to measure those goals. Fail on either side of that and you won’t even know if you’re failing.
- Using a technology not proven to reliably scale – We get it. You read that this one search technology is the hot shizzle. You went to their conference and even heard about someone using it on a big project. This is all fine and well until you have a split-brain problem. Or maybe your company used a client-server solution in the past, or something based on RDBMS technology, and now you find that your data and search requirements exceed its capabilities.
- Rolling your own – In any major search project, and even in many minor ones, you have: a data ingestion function that has to connect to some data sources and transform the data appropriately; a server/search management and monitoring process; access control; a UI; a query process that may need to use more than one datasource or collection. Writing all of these pieces is a lot of work and cost. Then there is the ongoing cost of maintaining all of that. There is no reason to take that on yourself. Use a product written to manage all of that for you.
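
To make the analyzer mismatch from the list above a little more concrete, here is a minimal Python sketch. Nothing in it comes from a real search engine: `toy_stem`, `analyze`, `build_index` and `search` are hypothetical names, and the "stemmer" is a deliberately naive suffix stripper that only exists to show the failure mode. The index is built with stemming, and the same index is then queried with and without it.

```python
from typing import Dict, List, Set


def toy_stem(token: str) -> str:
    """Naive suffix stripper (an assumption for illustration, not a real stemmer)."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stem: bool) -> List[str]:
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens] if stem else tokens


def build_index(docs: Dict[int, str], stem: bool) -> Dict[str, Set[int]]:
    """Build a tiny inverted index: term -> set of document ids."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str, stem: bool) -> Set[int]:
    """Return the ids of documents that contain every analyzed query term."""
    terms = analyze(query, stem)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "red laptops", 2: "laptop bags"}
index = build_index(docs, stem=True)  # index side stems: "laptops" -> "laptop"

# Query side does NOT stem. The singular happens to match, so it "appears to work".
works_by_luck = search(index, "laptop", stem=False)   # {1, 2}

# The plural is never stemmed at query time, so it finds nothing.
silent_miss = search(index, "laptops", stem=False)    # set()

# Using the same analysis on both sides fixes it.
consistent = search(index, "laptops", stem=True)      # {1, 2}
```

The unstemmed query for "laptop" succeeds only because the word is already in its stemmed form, which is why this kind of mismatch tends to survive casual testing.
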
That’s an accountant’s dozen ways you can blow your next search project that aren’t specific to any particular technology. Which is your favorite?