Getting full SSE performance from Java

843810, Mar 15 2007 (edited Jun 25 2008)
Like most C++ compilers, the JVM is not able to take full advantage of the SSE capabilities of Intel/AMD CPUs. Other CPUs (PowerPC, Cell) have similar features, and future CPUs are likely to rely on these special instructions even more. The current difference in floating-point performance between using these instructions and not using them is 3x-4x! The difference will probably increase with future CPUs, such as AMD's Fusion.

The main reason compilers and JVMs can't generate efficient SSE code is that the fast SSE load and store instructions require 16-byte alignment, yet neither Java nor C++ has a standard way to tell the compiler when data is guaranteed to be 16-byte aligned.
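To make the alignment point concrete, here is a minimal sketch, assuming an x86 target and a compiler that ships `<xmmintrin.h>`. The helper names (`is_aligned16`, `add4_aligned`, `add4_unaligned`) are made up for illustration; only the `_mm_*` intrinsics are real. It contrasts the aligned load/store forms, which need the guarantee, with the unaligned forms, which do not.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical helper: true when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds four floats using the aligned load/store forms. These may fault
// if any address is not 16-byte aligned, so the precondition is checked.
inline void add4_aligned(const float* a, const float* b, float* out) {
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    _mm_store_ps(out, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
}

// Same operation with the unaligned forms: valid for any address, but
// historically slower, which is why a compiler wants an alignment guarantee.
inline void add4_unaligned(const float* a, const float* b, float* out) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Without a guarantee from the language, a compiler or JIT generally has to fall back to something like the second form, or check alignment at run time as the first form does.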

It would seem that an easy way around this limitation is to create a simple class, similar to the F32vec4 class used in C++. That class is essentially just a wrapper for the four aligned floating-point values that SSE instructions like to operate on. The Java class would have to be part of the core language, both to ensure the alignment and so that the compiler knows it can safely take advantage of it.

So, what are the chances of getting these special classes in Java to take advantage of these new chip features?

Appendix: Here are the main elements of the F32vec4 class:
 
#include <xmmintrin.h>	// SSE intrinsics: __m128, _mm_set_ps, _mm_add_ps, ...

class F32vec4
{
protected:
	__m128 vec;	// 4 packed floating point values; the compiler keeps __m128 16-byte aligned
public:

	/* Constructors: default, __m128, 4 floats */
	F32vec4() {}

	/* initialize 4 SP FP with __m128 data type */
	F32vec4(__m128 m)					{ vec = m; }

	/* initialize 4 SP FPs with 4 floats (f0 ends up in the lowest element) */
	F32vec4(float f3, float f2, float f1, float f0)		{ vec = _mm_set_ps(f3,f2,f1,f0); }

	/* Conversion functions */
	operator  __m128() const	{ return vec; }		/* Convert to __m128 */

	/* Logical Operators */
	friend F32vec4 operator &(const F32vec4 &a, const F32vec4 &b) { return _mm_and_ps(a,b); }
	friend F32vec4 operator |(const F32vec4 &a, const F32vec4 &b) { return _mm_or_ps(a,b); }
	friend F32vec4 operator ^(const F32vec4 &a, const F32vec4 &b) { return _mm_xor_ps(a,b); }

	/* Arithmetic Operators */
	friend F32vec4 operator +(const F32vec4 &a, const F32vec4 &b) { return _mm_add_ps(a,b); }
	friend F32vec4 operator -(const F32vec4 &a, const F32vec4 &b) { return _mm_sub_ps(a,b); }
	friend F32vec4 operator *(const F32vec4 &a, const F32vec4 &b) { return _mm_mul_ps(a,b); }
	friend F32vec4 operator /(const F32vec4 &a, const F32vec4 &b) { return _mm_div_ps(a,b); }

};
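
For a sense of how a wrapper of this kind gets used, here is a rough sketch, again assuming an x86 target. `Vec4f` and `mul_add` are hypothetical names invented for this example; `Vec4f` is a cut-down stand-in for the class above rather than the real F32vec4. It uses unaligned loads and stores so it works on any float array.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hypothetical minimal stand-in for F32vec4: one __m128 plus two operators.
class Vec4f {
    __m128 v;
public:
    Vec4f(__m128 m) : v(m) {}
    explicit Vec4f(const float* p) : v(_mm_loadu_ps(p)) {}  // unaligned load
    operator __m128() const { return v; }
    void store(float* p) const { _mm_storeu_ps(p, v); }     // unaligned store
    friend Vec4f operator+(const Vec4f& a, const Vec4f& b) { return _mm_add_ps(a, b); }
    friend Vec4f operator*(const Vec4f& a, const Vec4f& b) { return _mm_mul_ps(a, b); }
};

// out[i] = a[i] * b[i] + c[i], four elements per loop iteration.
// Assumes n is a multiple of 4 to keep the sketch short.
inline void mul_add(const float* a, const float* b, const float* c,
                    float* out, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        Vec4f r = Vec4f(a + i) * Vec4f(b + i) + Vec4f(c + i);
        r.store(out + i);
    }
}
```

The loop body reads like ordinary scalar arithmetic, but each operator handles four floats at a time, which is the kind of class the question asks about for Java.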
Post Details
Locked on Jul 23 2008
Added on Mar 15 2007