90% of the time is spent in 10% of the code, so make that 10% the fastest code it can be.
Load-Hit-Store (LHS) is one of those quirky CPU implementation details that can cause significant performance problems in high-level code. It happens when the compiler writes data to an address 'x' and then tries to load the data from 'x' again too soon.
This sequence of a memory read operation (LOAD), the assignment of the value to a register (HIT) and the actual writing of the value to memory (STORE) is usually hidden away in the stages of the pipeline, so these operations cause no stalls. However, if the memory location being read is one that was written to very recently, the load has to wait for that store to complete, which can take as many as 40 cycles.
stfs fr3, 0(r3) // Store the float - takes up to 40 cycles
lwz r9, 0(r3) // Load the word at address r3 into r9
add r9, r1, r9 // Stall: use r9 before the store operation has finished
There are different ways to generate LHS:
Using member variables, references or pointers as counters in tight loops
Example A:
for( int i = 0; i < 100; i++ )
{
m_iData++; // As a member variable it is stored in memory
}
//-----------------------------------------------------------
Example B:
void foo( int & count ) // the variable count is memory bound
{
for( int i = 0; i < 100; i++ )
{
count++; // count is a reference, so it is read from and written to memory
}
}
Solution: use local variables, which the compiler can keep in registers and which incur no penalty
Example A:
int iData = m_iData;
for( int i = 0; i < 100; i++ )
{
iData++; // The local variable is stored in a register
}
m_iData = iData;
//-----------------------------------------------------------
Example B:
void foo( int & output )
{
int count = output;
for( int i = 0; i < 100; i++ )
{
count++; // The local copy is kept in a register
}
output = count;
}
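Example A above is only a fragment, so here is a minimal self-contained sketch of the same idea inside a class. The Counter class and its AddSteps method are hypothetical names used for illustration. An optimizing compiler may already perform this transformation on its own in simple cases like this one; the explicit local copy just removes the doubt.

```cpp
#include <cassert>

// Hypothetical class for illustration; not part of the original examples.
class Counter
{
public:
    explicit Counter( int iStart ) : m_iData( iStart ) {}

    // Increments the member iSteps times through a local copy, so the
    // loop body works on a value that can stay in a register.
    void AddSteps( int iSteps )
    {
        assert( iSteps >= 0 );
        int iData = m_iData;              // one load before the loop
        for( int i = 0; i < iSteps; i++ )
        {
            iData++;                      // no member access in the loop
        }
        m_iData = iData;                  // one store after the loop
    }

    int Get() const { return m_iData; }

private:
    int m_iData;
};
```

The member is read once and written once, regardless of how many iterations the loop runs.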
Conversion between int and float
Try to avoid int to float conversions like:
float fAngle = (float)i * fAngleDelta;
Solution: keep duplicate int and float members, so the conversion happens only once, when the value is set.
struct ScreenSize
{
int m_iWidth;
int m_iHeight;
float m_fWidth;
float m_fHeight;
// Update both the int and the float copy
inline void SetWidth( int iWidth )
{
m_iWidth = iWidth;
m_fWidth = static_cast<float>( iWidth );
}
};
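The same duplication idea can be applied to a loop index such as the one in the fAngle line above. The sketch below keeps a float counter next to the int index so that no int to float conversion is needed inside the loop. FillAngles and its parameter names are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical helper: fills pfAngles[0..iCount-1] with i * fAngleDelta
// without converting the int index to float on every iteration.
void FillAngles( float * pfAngles, int iCount, float fAngleDelta )
{
    assert( iCount >= 0 );
    float fIndex = 0.0f;                  // float shadow of the int index
    for( int i = 0; i < iCount; i++ )
    {
        pfAngles[i] = fIndex * fAngleDelta;
        fIndex += 1.0f;                   // stays in the float unit
    }
}
```

A float represents whole numbers exactly up to 2^24, so for loop counts below that limit the float counter matches the int index exactly.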
C++ constructors that have just one parameter automatically
perform implicit type conversion. If you pass an int when the
constructor expects a float, the compiler will add the
necessary code to convert int to float. This will cause a
Load-Hit-Store issue. It is possible to add the explicit
keyword to the constructor declaration to prevent implicit
conversions. This forces the code to either use a parameter of
the correct type, or cast the parameter to the correct type.
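As a minimal sketch of the explicit keyword in use, the Angle type and the Twice function below are hypothetical names chosen for illustration. With explicit, a call such as Twice( 3 ) no longer compiles, so the hidden conversion becomes visible at the call site. Note that a direct construction like Angle( 3 ) still compiles and still converts the int, so the keyword exposes the conversion rather than removing every instance of it.

```cpp
#include <type_traits>

// Hypothetical wrapper type for illustration.
struct Angle
{
    // explicit: an int (or float) argument no longer turns into an
    // Angle silently at a call site.
    explicit Angle( float fRadians ) : m_fRadians( fRadians ) {}
    float m_fRadians;
};

float Twice( const Angle & angle )
{
    return angle.m_fRadians * 2.0f;
}

// Twice( 3 );               // does not compile: no implicit int -> Angle
// Twice( Angle( 3.0f ) );   // intended call, parameter already a float

// Compile-time check that the implicit conversion really is disabled.
static_assert( !std::is_convertible<int, Angle>::value,
               "int must not convert to Angle implicitly" );
```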
Reading and writing the same memory location too close together
int CauseLHS( int *ptrA )
{
int a,b;
int * ptrB = ptrA; // B and A point to the same address
*ptrA = 5; // Write data to address ptrA
b = *ptrB; // Read that data back again
//(won't be available for 40/80 cycles)
a = b + 10;// Stall! The data b isn't available yet
return a;
}
Solution: this seems like the sort of thing the compiler should notice and fix by simply keeping the content of *ptrA in a register. But in general it can't: it is obliged to read memory back through a pointer every time you dereference it, because any other pointer in the function might alias it and have modified the data. The keyword __restrict on a pointer promises the compiler that it has no aliases: nothing else in the function points to that same data. Thus, this keyword helps to avoid LHS.
With __restrict, the compiler knows that if it writes data through a pointer, it doesn't need to read it back into a register later on, because nothing else could have written to that address. Without __restrict, the compiler is forced to reload data through every pointer every time it is used, because another pointer may alias the same address.
This keyword is a promise you make to the compiler. If you break your promise, you can get incorrect results: if pointers pA and pB are both __restrict and pA == pB, the result is mysterious bugs.
int slow( int * a, int * b)
{
*a = 5;
*b = 7;
return *a + *b; // Stall! The compiler doesn't know whether
// a==b, so it has to reload both
// before the add
}
int fast( int *__restrict a, int *__restrict b)
{
*a = 5;
*b = 7; // Restrict promises that a!=b
return *a + *b; // No stall, the values of *a and *b are still in registers
}
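The slow and fast functions show the idea on two single stores. As a sketch of how the same promise pays off in a loop, the hypothetical AddOffset function below reads an offset through one pointer while storing through another. The __restrict spelling is the compiler extension accepted by GCC, Clang and MSVC; other compilers may need a different spelling.

```cpp
#include <cassert>

// Hypothetical helper: adds *piOffset to each of the first iCount
// elements of piOut. Because both pointers are __restrict, the compiler
// may load *piOffset once instead of reloading it after every store.
void AddOffset( int * __restrict piOut,
                const int * __restrict piOffset,
                int iCount )
{
    assert( iCount >= 0 );
    for( int i = 0; i < iCount; i++ )
    {
        piOut[i] += *piOffset;
    }
}
```

The caller must keep the promise: passing a piOffset that points into the piOut array would break it.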
Not every compiler lets you mark references as __restrict (GCC, for instance, does accept restricted references). Where it is unavailable, copy the parameters to local variables inside your function, then write the final values back out again at the end, as we saw in the previous solutions.
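As a minimal sketch of that copy-in, copy-out pattern for a reference parameter, the hypothetical Accumulate function below sums an array into a total that the caller passes by reference. The names are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical helper: adds piValues[0..iCount-1] to the caller's total.
void Accumulate( const int * piValues, int iCount, int & iTotal )
{
    assert( iCount >= 0 );
    int iLocal = iTotal;                  // read the reference once
    for( int i = 0; i < iCount; i++ )
    {
        iLocal += piValues[i];            // accumulate in a local
    }
    iTotal = iLocal;                      // write the reference once
}
```

Like __restrict, this assumes iTotal does not refer to one of the elements of piValues. If it did, the result would differ from a version that updates iTotal directly inside the loop.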