## PR3 final assignment

Programming topics can be found here.
bikker
Posts: 67
Joined: Fri Sep 27, 2013 12:40 pm

### PR3 final assignment

Hi guys,

The description of the final assignment for PR3 is now up on N@TSchool. Have fun!

- Jacco.

131143
Posts: 2
Joined: Fri Feb 07, 2014 4:02 pm

### Re: PR3 final assignment

Just to make sure, pre-calculating still counts as a High Level Optimization right?

bikker
Posts: 67
Joined: Fri Sep 27, 2013 12:40 pm

### Re: PR3 final assignment

No, in general.

A high-level optimization is one that reduces algorithmic complexity. So you start with a sorting algorithm that is O(N^2), and you replace it with one that is O(N log N); that is a high-level optimization.

Sometimes however, precalculations achieve this. And sometimes, one O(N) algo is cheaper than another, e.g. when you replace sprite scaling (which has a constant cost per pixel, thus O(N)) with precalculated sprites (also O(N), but now the cost per pixel is much lower). This would count as a high-level optimization.

So, you need to replace an algorithm to claim it as a high-level optimization. If this involves precalculations, you're fine. But often, the precalculations replace smaller scale operations, and do not effect the algorithm itself.

Does that help? Please say if it's still unclear; this is has proven to be a tricky concept in previous years.

- Jacco.

130048
Posts: 4
Joined: Wed Feb 19, 2014 6:24 pm

### Re: PR3 final assignment

So when you have an algorithm that is O(n*k) where k = 100, reducing k to 50 would count as a high-level optimization?
Eric Polman

bikker
Posts: 67
Joined: Fri Sep 27, 2013 12:40 pm

### Re: PR3 final assignment

Yes, unless 'k' denotes the cost of doing one operation. In 'big O' notation, you normally leave out the constants, so any algorithm that requires N iterations for N items in the input data set is O(N). Obviously, if one operation is a http lookup, the cost is going to be different from a situation where one operation is a dot product on data in L1 cache, but in terms of complexity, these algorithms would have the same complexity (they both scale linearly).

Strictly, an optimization is only high-level if you reduce time complexity, so O(N) becomes O(logN), or O(N^3) becomes O(N^2). In your example, O(N*k) should become O(logN * k) or O(N * log k). However, if reducing the constant factor is caused by a change in algorithm, I still consider it a high-level optimization. So replacing a sprite scale function by displaying pre-scaled sprites would be considered high-level.

More on this in tomorrow's optional work lecture at 13.00.

- Jacco.

122828
Posts: 11
Joined: Tue Feb 11, 2014 11:24 pm

### Re: PR3 final assignment

Has anyone figured out a way to improve the Sprite::Draw function yet? And the background drawing? I can't seem to find any improvements in those (Without SIMD and multithreading)

bikker
Posts: 67
Joined: Fri Sep 27, 2013 12:40 pm

### Re: PR3 final assignment

Here's the code I promised, with the SIMD / masking sticky balls:

Code: Select all

``````// Template, major revision 5
// IGAD/NHTV - Jacco Bikker - 2006-2013

#include "string.h"
#include "surface.h"
#include "stdlib.h"
#include "template.h"
#include "game.h"

using namespace Tmpl8;

static union { __m128 bx4; float bx; };
static union { __m128 by4; float by; };
static union { __m128 vx4; float vx; };
static union { __m128 vy4; float vy; };
Sprite* ball = new Sprite( new Surface( "assets/ball.png" ), 1 );

void Game::Init()
{
for ( int i = 0; i < 64; i++ )
{
bx[i] = Rand( SCRWIDTH - 100 ) +  50;
by[i] = Rand( SCRHEIGHT - 100 ) +  50;
vx[i] = Rand( 1 ) - 0.5f;
vy[i] = Rand( 1 ) - 0.5f;
float rl = 1.0f / sqrtf( vx[i] * vx[i] + vy[i] * vy[i] );
vx[i] *= rl * 0.1f;
vy[i] *= rl * 0.1f;
}
}

void Game::Tick( float a_DT )
{
m_Screen->Clear( 0 );
for ( int i = 0; i < 64; i++ )
{
ball->Draw( bx[i], by[i], m_Screen );
bx[i] += vx[i];
by[i] += vy[i];
if (bx[i] < 0) bx[i] = 0, vx[i] = -vx[i];
if (bx[i] > (SCRWIDTH - 50)) bx[i] = SCRWIDTH - 50, vx[i] = -vx[i];
if (by[i] < 0) by[i] = 0, vy[i] = -vy[i];
if (by[i] > (SCRHEIGHT - 50)) by[i] = SCRHEIGHT - 50, vy[i] = -vy[i];
}
#if 1
for ( int i = 0; i < 64; i++ )
{
__m128 i4 = _mm_set_ps1( float( i ) );
for ( int j = 0; j < 16; j++ ) // if (j != i)
{
// i   i     i     i
// j*4 j*4+1 j*4+2 j*4+3
__m128 j4 = _mm_cvtepi32_ps( _mm_set_epi32( j * 4 + 3, j * 4 + 2, j * 4 + 1, j * 4 + 0 ) );
__m128 mask1 = _mm_cmpneq_ps( i4, j4 );
// float dx = bx[i] - bx[j];
// float dy = by[i] - by[j];
__m128 dx4 = _mm_sub_ps( _mm_set_ps1( bx[i] ), bx4[j] );
__m128 dy4 = _mm_sub_ps( _mm_set_ps1( by[i] ), by4[j] );
// float d = sqrtf( dx * dx + dy * dy );
__m128 d4 = _mm_sqrt_ps( _mm_add_ps( _mm_mul_ps( dx4, dx4 ), _mm_mul_ps( dy4, dy4 ) ) );
__m128 mask2 = _mm_cmplt_ps( d4, _mm_set_ps1( 50.0f ) );

// if (d < 50)

// float rd = 1.0f / d;
__m128 rd4 = _mm_rcp_ps( d4 );
// float ndx = dx * rd;
union { __m128 ndx4; float ndx; };
ndx4 = _mm_mul_ps( dx4, rd4 );
// float ndy = dy * rd;
union { __m128 ndy4; float ndy; };
ndy4 = _mm_mul_ps( dy4, rd4 );

ndx4 = _mm_and_ps( ndx4, finalmask );
ndy4 = _mm_and_ps( ndy4, finalmask );

// bx[i] += ndx, by[i] += ndy;
// bx[j] -= ndx, by[j] -= ndy;
for ( int q = 0; q < 4; q++ )
{
bx[i] += ndx[q];
by[i] += ndy[q];
}
bx4[j] = _mm_sub_ps( bx4[j], ndx4 );
by4[j] = _mm_sub_ps( by4[j], ndy4 );
}
}
#else
for ( int i = 0; i < 64; i++ )
{
for ( int j = 0; j < 64; j++ ) if (j != i)
{
float dx = bx[i] - bx[j];
float dy = by[i] - by[j];
float d = sqrtf( dx * dx + dy * dy );
if (d < 50)
{
float rd = 1.0f / d;
float ndx = dx * rd;
float ndy = dy * rd;
bx[i] += ndx;
by[i] += ndy;
bx[j] -= ndx;
by[j] -= ndy;
}
}
}
#endif
}``````

130802
Posts: 10
Joined: Tue Feb 18, 2014 5:41 pm
Location: Breda

### Re: PR3 final assignment

122828 wrote:Has anyone figured out a way to improve the Sprite::Draw function yet? And the background drawing? I can't seem to find any improvements in those (Without SIMD and multithreading)
Well for starters, in the Sprite::Draw function is an if condition in a loop that doesn't has to be in that loop.

For the background drawing, have you ever heard of the function called memcpy? It's the fastest way of copying data (that I know of). You can only use it with surfaces that are the same size. Luckily the background and the screen buffer are EDIT: It might also be helpful to look at how sequential the data is in a sprite.
- Max Oomen 2GA-2PR (Avatar was made by Anthea van Leeuwen)

122828
Posts: 11
Joined: Tue Feb 11, 2014 11:24 pm

### Re: PR3 final assignment

Thanks 130802 for your answer. For the background I already use 1 line of code that does memcpy. Still it takes about 2.5 million ticks (with timerRDTSC) which is quite a lot, isn't it? Maybe I'm just wrong and it's supposed to be taking that amount of ticks. Everything is relatively fast ATM sitting on 250 FPS with 750 total tanks in the field. The problem now is, when tanks die they produce this smoke particle. When all of my tanks have died (150) it takes 100 FPS from the game to produce the smoke. That stuff is killing me Anyone found something nice for that yet? 122828
Posts: 11
Joined: Tue Feb 11, 2014 11:24 pm

### Re: PR3 final assignment

Or am I the only one experiencing this issue? ### Who is online

Users browsing this forum: No registered users and 3 guests