How to achieve high encryption performance in Java

If, dear reader, you are a Java developer with any involvement in Java security or encryption, then you have most likely used, or at least heard of, the Bouncy Castle encryption library for Java.

It is an excellent library, especially for working with ASN.1, digital certificates, etc. However, is it the best library for working with AES?! To find out, don't stop here; continue with the text below...

A quick intro

General modern encryption, especially in a server environment, boils down to a relatively simple concept. We won't go too deep into the technology, but rather take a simple bird's-eye view. So, here it goes...

The main asymmetric encryption techniques, such as RSA or EC (which most people, wrongly, associate only with digital certificates), are used to safely establish a symmetric key (mostly AES) that both sides (browser and server) will use for data encryption. EC is a more modern, much faster algorithm that is slowly replacing RSA, but the bottom line is the same: with RSA, a shared secret key is encrypted and safely exchanged between the two parties; with EC, both parties derive the same AES key.
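To make the EC half of that idea concrete, here is a minimal sketch using only the standard JDK APIs. The class name EcdhKeySketch, the 256-bit curve, and the bare SHA-256 step are assumptions made for illustration; real TLS uses its own, more elaborate key derivation.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.KeyAgreement;

class EcdhKeySketch {

    // Generates an EC key pair (the 256-bit size is an assumption for this sketch).
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(256);
        return generator.generateKeyPair();
    }

    // Combines our private key with the peer's public key and hashes the
    // resulting shared secret into 32 bytes, usable as an AES-256 key.
    // Assumption: plain SHA-256 stands in for the key derivation TLS performs.
    static byte[] deriveAesKey(PrivateKey ours, PublicKey theirs) throws GeneralSecurityException {
        KeyAgreement agreement = KeyAgreement.getInstance("ECDH");
        agreement.init(ours);
        agreement.doPhase(theirs, true);
        byte[] sharedSecret = agreement.generateSecret();
        return MessageDigest.getInstance("SHA-256").digest(sharedSecret);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair browser = newKeyPair();
        KeyPair server = newKeyPair();
        // Each side only ever sees the other side's public key.
        byte[] browserKey = deriveAesKey(browser.getPrivate(), server.getPublic());
        byte[] serverKey = deriveAesKey(server.getPrivate(), browser.getPublic());
        System.out.println("Keys match: " + Arrays.equals(browserKey, serverKey));
    }
}
```

Both sides end up with the same key bytes without the key itself ever travelling over the wire, which is the whole point of the exchange.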

AES, a symmetric algorithm, has several variations (modes) for different uses, but again, it boils down to passing data through AES to get encrypted data.

In other words, once the AES key is exchanged through TLS, all further encryption is done with AES. So, for servers to achieve high throughput and serve as many parallel requests as possible, AES performance is of crucial importance.

Where is the problem, then?!

In the beginning, like many others, we used Bouncy Castle as the go-to framework for everything related to security (and we still use it for most of it). However, there is a catch...

Do you remember MMX and SSE, the instruction set extensions that arrived with the Pentium MMX and Pentium III processors? Modern processors bring AES-NI (Advanced Encryption Standard New Instructions) to the table: hardware-implemented AES operations that allow up to 10x better encryption performance without breaking a sweat, achieving ~2 million transactions per second on a single core.

Well, the problem lies in the fact that Bouncy Castle uses a pure Java AES implementation that does not use AES-NI. If you hoped that the Java VM would recognize such demanding mathematical functions and translate the Java code into native AES-NI instructions, you would be very wrong. The answer is no.

Even though, for our product use case, Bouncy Castle AES performance is quite good at around 200,000 transactions per second per core, we still wanted to reduce JVM stress and make the Green Screens server even more performant.

The solution...

After a deep investigation, we discovered that the JVM actually does support AES-NI for AES encryption. The only question is how to activate it.

Well, believe it or not, the whole solution boils down to a single parameter. With that single parameter, AES encryption performance increases about 10x, from 200k to 2M transactions per second per core in our measurements.

Here is the answer... instead of the "BC" provider, simply use "SunJCE", the JDK's built-in provider, whose AES implementation the HotSpot JVM can replace with AES-NI instructions.

Cipher.getInstance("AES/CTR/NoPadding", "BC");

// switch to enable AES-NI hardware acceleration

Cipher.getInstance("AES/CTR/NoPadding", "SunJCE");

Here is the full source code, part of our Quark engine.
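For readers who want to try the provider switch in isolation, here is a minimal round-trip sketch. It is not the Quark code; the class name AesCtrSketch, the 128-bit key, and the random IV are assumptions made for illustration. On HotSpot, the relevant JVM flags are UseAES and UseAESIntrinsics, which are normally enabled automatically on CPUs that support AES-NI.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class AesCtrSketch {

    // The JDK's built-in provider; swap in "BC" here to compare with Bouncy Castle.
    static final String PROVIDER = "SunJCE";

    // Runs AES/CTR in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static byte[] apply(int mode, byte[] key, byte[] iv, byte[] input) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding", PROVIDER);
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key (assumption for this sketch)
        byte[] iv = new byte[16];  // CTR counter block; never reuse it with the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "{\"hello\":\"world\"}".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = apply(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] decrypted = apply(Cipher.DECRYPT_MODE, key, iv, encrypted);

        System.out.println("Round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

Because only the provider name changes, the rest of the encryption code can stay exactly as it was with Bouncy Castle.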

Sugar for the end...

Besides AES, Java uses native code for several other important features, such as the commonly used DEFLATE/ZLIB/GZIP compression. GZIP is very often used on servers to deliver compressed web resources, mostly JavaScript, CSS, and web pages.

GZIP has a setting called "compression level", which ranges from 0 to 9. The higher the number, the better the compression and the more processing power used. However, the problem with Java's GZIPOutputStream is that it always uses the default level (6), and its public API offers no way to change it. Sure, for general use, especially for images and other binary data, level 6 is a good balance between compressed size and performance. For textual data, however, level 3 is more than enough, and it is roughly 4x faster than level 6.

For dynamic content, where caching is not possible and all the data is text, such as JSON, we might need a better solution. This is exactly what we did in our Quark Engine to significantly increase compression performance, from 50,000 to 200,000 transactions per second per core (measured for 14 KB of JSON data).

We extended the standard Java GZIPOutputStream to allow setting a custom compression level (defaulting to level 3). Here is the full source code. It's quite simple, isn't it?!
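The Quark source is not reproduced here, but the idea can be sketched in a few lines. GZIPOutputStream inherits a protected Deflater field named def, so a subclass can change the level right after construction. The class name LevelGzipOutputStream and the helper methods are hypothetical, chosen for this illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class LevelGzipOutputStream extends GZIPOutputStream {

    static final int DEFAULT_LEVEL = 3; // level suggested in the text for textual data

    LevelGzipOutputStream(OutputStream out) throws IOException {
        this(out, DEFAULT_LEVEL);
    }

    LevelGzipOutputStream(OutputStream out, int level) throws IOException {
        super(out);
        // 'def' is the protected Deflater inherited from DeflaterOutputStream.
        def.setLevel(level);
    }

    // Helper: compresses a byte array at the given level.
    static byte[] compress(byte[] data, int level) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (LevelGzipOutputStream gzip = new LevelGzipOutputStream(buffer, level)) {
            gzip.write(data);
        }
        return buffer.toByteArray();
    }

    // Helper: decompresses with the standard GZIPInputStream to show the output is ordinary GZIP.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        return buffer.toByteArray();
    }
}
```

Anything that reads standard GZIP can consume the result; only the CPU time spent on the producing side changes with the level.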

Conclusion

We might wrap it up with this... Just because everyone uses a library and swears it is the best, with the best performance, does not mean you should follow the crowd. In many cases, a simple and often not so obvious solution is the best one.