mlockall For All
Sometimes several ideas get into your head at the same time and mingle around. They bounce up against each other, and in some cases they fuse together and plant the seed for an interesting new hybrid idea. This is an example of how that happened to me three weeks ago at Lucene Revolution, and the end result that came out of it.
On the last day of Lucene Revolution, I was attending Gregg Donovan’s “Living with Garbage” session. While the majority of Gregg’s talk was about garbage collection, there were two independent comments he made during that session that started to mingle around in my brain.
Idea #1: mlockall
One of Gregg’s comments was about how Etsy had seen good results using native code to call mlockall() in their embedded Solr application.
In an ideal world, any performance critical java application would be run on dedicated hardware with swap disabled, because nothing will slow down the responsiveness of a java application more than having big sections of the heap swapped to disk when a garbage collection pass happens. But people don’t always get to run their software in an ideal world. This is where the mlockall() C function can come in handy. It instructs the OS to lock all virtual memory the process is using into RAM, so that it can’t be swapped out.
Gregg’s talk wasn’t the first time I’d heard of using mlockall with java applications. A little while back someone had mentioned that Apache Cassandra was using mlockall via JNA. At the time I didn’t think much of it: it seemed like an unnecessary complication and a poor substitute for disabling swap on your production servers. I probably still wouldn’t have thought much about it after Gregg’s talk, except for how it intermingled in my head with something else Gregg mentioned…
Idea #2: Java Agents
Several times during his talk, Gregg referred to running java applications with monitoring agents like NewRelic and YourKit enabled. Many agents like this work using the Java Instrumentation API that lets you specify a jar containing code to run even before the application’s main method.
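For readers who haven't written one, here is a minimal sketch of what an agent entry point looks like. The `premain` signature and the `Premain-Class` manifest attribute come from the `java.lang.instrument` package; the class name `HelloAgent` and its two fields are hypothetical and exist only so the example can show that the hook ran. It is not the code of any of the agents mentioned above.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class, for illustration only. In a real agent jar this
// would normally be a public class in its own file, and the jar's manifest
// would name it in a "Premain-Class" attribute.
final class HelloAgent {
    // Recorded by premain so other code can tell that the agent ran.
    static volatile boolean ran = false;
    static volatile String lastArgs = null;

    // When the JVM is started with -javaagent:agent.jar=options it calls this
    // method before the application's main method. agentArgs holds whatever
    // followed the '=' on the command line (null if nothing was given).
    public static void premain(String agentArgs, Instrumentation inst) {
        ran = true;
        lastArgs = agentArgs;
        System.err.println("agent loaded before main, args=" + agentArgs);
    }
}
```

The `Instrumentation` argument is what monitoring agents use to register class transformers; an agent that only needs to run some code at startup can ignore it.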
The predominant reason for implementing a Java instrumentation agent is to do exactly what it sounds like: instrument the application. Agents can do just about anything, including modifying the byte code of classes in the application. But it occurred to me sitting there in Gregg’s talk….
“Why not make a tiny little agent
that does nothing but call mlockall?”
Idea #1 + Idea #2 = mlockall-agent
So a few days later, when I had some time to kill in an airport, I started poking around in the Cassandra code base to see how they decided to deal with mlockall.
I didn’t make much progress in that airport lounge beyond reading up on mlockall and JNA, but it didn’t take much work a few days later to distill the key bits of Cassandra’s code into a new project I’ve named mlockall-agent.jar. It’s a simple little agent that can be specified on the command line when running your java app.
java -Xms1024m -Xmx1024m -javaagent:lib/mlockall-agent.jar -jar yourapp.jar
There are a few important caveats to using mlockall-agent.jar, mostly because there are some important caveats to using mlockall in general (pay attention to your ulimits) and some specific caveats to using it in java (make sure your min heap = your max heap). The details can be found in the README.
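The heap caveat is easy to check mechanically. The sketch below is not part of mlockall-agent; `HeapFlagCheck` and its methods are hypothetical names, and it assumes the heap is sized with the `-Xms`/`-Xmx` flags using a `k`, `m` or `g` suffix (it does not look at `-XX:InitialHeapSize` or `-XX:MaxHeapSize`). It reads the JVM's startup arguments through the standard `RuntimeMXBean` and reports whether the two sizes match.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class HeapFlagCheck {
    // Parses a size such as "1024m", "1g", "512k" or "1073741824" into bytes.
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        if (v.isEmpty()) throw new IllegalArgumentException("empty size");
        long mult = 1;
        switch (v.charAt(v.length() - 1)) {
            case 'k': mult = 1024L; break;
            case 'm': mult = 1024L * 1024; break;
            case 'g': mult = 1024L * 1024 * 1024; break;
            default: break; // no suffix: the value is already in bytes
        }
        String digits = (mult == 1) ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * mult;
    }

    // True only if both -Xms and -Xmx are present and name the same size.
    static boolean heapIsFixed(List<String> jvmArgs) {
        Long min = null;
        Long max = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) min = parseSize(arg.substring(4));
            else if (arg.startsWith("-Xmx")) max = parseSize(arg.substring(4));
        }
        return min != null && max != null && min.equals(max);
    }

    public static void main(String[] args) {
        // The flags this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(heapIsFixed(jvmArgs)
                ? "min heap equals max heap"
                : "warning: -Xms and -Xmx are missing or differ");
    }
}
```

A check like this could just as well live inside an agent's `premain`, so the warning shows up before the application starts.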
I’m interested to know what people think of this idea. It should hopefully serve as a nice turnkey solution for anyone wanting to lock their java application into RAM. If you have any feedback or suggestions, let me know in the comments (or via pull requests).