You can bet that one of the world’s largest oil and gas companies has a lot of data. Going back almost 150 years. On paper. In databases. Hidden in apps. Squirreled away in email and personal drives. Sitting on shared drives. Spread across the world.
And they’ve figured out how to search it. Successfully.
The company had tried search solutions before, but they tended to be niche – focused on a single workflow, application, or data store.
People couldn’t find data, so they squirreled away their own copies. This created a data proliferation problem – to the tune of 250 million documents – which made finding the most up-to-date, accurate version of anything difficult.
These niche solutions fell short because the company needed a unified way to search its data – across, literally, hundreds of sources that required at least 28 different tools. Some data wasn’t indexed at all, and other sources were more than a terabyte in size – too large to index practically.
And even when the tools returned good results, people didn’t trust them.
To overcome these issues, they also had to solve technical problems such as large-file ingestion, file permissions, and server acquisition.
They handled the large seismic files – which consist of header data followed by non-relevant amplitude data – by implementing file streaming, so that processing server memory wouldn’t be overloaded. They also consolidated more than 100 data owners down to four, who quickly granted permission to access the data.
They moved processing to the cloud, which reduced on-premises bottlenecks and let capacity scale dynamically. And they gave users a readable URL for each hit to increase trust in search results.
Look for more details post-Activate on how they accomplished all of this with millions and millions and millions of files.