problems with globulation2

Feature Requests have moved to http://glob2.uservoice.com
This section is read only for reference.
Locked
Ogorter
Worker
Worker
Posts: 6
Joined: Sat Dec 17, 2005 9:38 am

problems with globulation2

Post by Ogorter »

Hi,

I'm new to globulation, I tried it and liked it, but admittedly my main interest is porting it to the 770.

So I did some profiling because currently it doesn't even run smoothly on my 15" powerbook 1.5 Ghz ...

Now I am sitting on a reasonably large patch that helps in different area's:

The main offending problem is drawing, if you use SDL in software the alpha blitting is very slow. So I de-coupled the UI drawing from the rest of the engine, I do have some problems getting the framerate stable if frames need to be dropped. (The internal engine assumes we run at 25 fps, otherwise globs run faster ... however, running without drawing can be done at a speeds of 150+ fps on my laptop.) So I'll need some advise here. Please see somewhat further down.

When using opengl, the main problem becomes the gradient calculation. and then the minimap generation. And then the locate resources from gradient (unsure about how the func is called) ...

So I re-implemented the gradient calculation. I had an idea that would touch each tile only once, turned out the existing algorithm worked exactly the same, except that the code obfuscated everything, and there was no documentation in the code whatsoever. Also it seemed to have been optimized for memory instead of speed.

The re-implementation runs only a little faster, but it did enable me to merge the updateResourceGradient and updateGlobalGradient functions into one.Because the first already calculates some stuff that the second needs, 30% of the updateGlobalGradient was spend in recalculating this (making the first list of x,y coords where resource are located).

When I have some more spare hours, I will move the resource gradients into a container object, such that each gradient can be updated incrementally. And remove the horrible gradient[(y << wDec | x)] which is an example of memory optimization, while what we need is speed optimization gradient[y][x] ... (or gradient[y*256 + x] but lets leave that up to the compiler, shall we).

But as a final remark, the code is not in too good a shape. There is hardly any documentation or comments, which makes it hard to figure out what certain portions of the code do. And some programming practices are making it also difficult to read just the code: who can read this:

Code: Select all

Sint32 ticksToWait=computationAvailableTicks+ticksToDelayInside+missedTicksToWait;
(line 462 in Engine.cpp)

It violates a few rules.
* First it declares a new variable mid function ... makes it hard to read where which data is use.
* There are no spaces between variables and operators, making it hard to read what it actually says. (WhatifIdon'tusespaceshere,canyoustillreadit?)
* the variable names don't are what they say they are, and are obscure

computationAvailableTicks is actually milliseconds, not Ticks, as SDL_GetTicks() gives you a timestamp (milliseconds since init of SDL). And the milliseconds are the time not spend in calculation; they are not available for calculation, as we are going to use them to wait! So ms_left_after_calculation is more accurate and readable.

(It is my personal opinion that camel casing is not useful as it is hard to read.)

Due to these issues I ignored the coding styles there were in the code, because this style actually leads to quality code ...

The new gradient implementataion:

Code: Select all

void Map::updateGlobalGradientSmall(Uint8 *gradient)
{
	// this function does it the same way as previous function, except I wrote it :)
	 
	Uint8 * x_queue = new Uint8[size]; // todo, could be static!
	Uint8 * y_queue = new Uint8[size];
	Uint16 write_pos = 0;
	Uint16 read_pos = 0;
	
	Uint8 x;
	Uint8 y;
	Uint8 x_dec, x_inc, y_dec, y_inc; // x_dec = do_wrap(x - 1), x_inc = do_wrap(x + 1)  ...
	Uint8 v, cur_v; // value, current_value
	Uint16 p; //position
	
	// all squares with interesting items go in the queues to be processed
	// this list should be given to the function, because the caller already knows about this list!
	for (y=0; y < h; y++) {
		for (x=0; x < w; x++) {
			if (gradient[y << wDec | x] >= 3) {
				y_queue[write_pos] = y;
				x_queue[write_pos] = x;
				write_pos++;
			}
		}
	}
	
	// each square in the queue is read and its neighbours are filled, unless already filled with higher number
	// note that every square only gets touched once!
	while (read_pos < write_pos) {
		
		x = x_queue[read_pos];
		y = y_queue[read_pos];
		cur_v = gradient[y << wDec | x] - 1;
		
		read_pos++;
		
		x_dec = (x - 1) & wMask;
		x_inc = (x + 1) & wMask;
		y_dec = (y - 1) & hMask;
		y_inc = (y + 1) & hMask;
		
		// 1
		p = y << wDec | x_dec;
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x_dec;
			y_queue[write_pos] = y;
			write_pos++;
		}
		// 2
		p = (y << wDec | x_inc);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x_inc;
			y_queue[write_pos] = y;
			write_pos++;
		}
		// 2
		p = (y_dec << wDec | x);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x;
			y_queue[write_pos] = y_dec;
			write_pos++;
		}
		// 4
		p = (y_inc << wDec | x);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x;
			y_queue[write_pos] = y_inc;
			write_pos++;
		}
	}
	
	delete[] x_queue;
	delete[] y_queue;
}

// a gradient is a number array that given a certain x,y gives you the closest resource
// this function puts 0 1 or 255 for no-go, can-go, resource then updateGlobalGradient generates the 
// wave pattern, every neighbor of a 255 square gets 254, the neighbor of that get 253 etc ...
void Map::updateRessourcesGradient(int teamNumber, Uint8 ressourceType, bool canSwim)
{
	Uint8 * x_queue = new Uint8[size]; // todo, could be static!
	Uint8 * y_queue = new Uint8[size];
	Uint16 write_pos = 0;
	Uint16 read_pos = 0;

	Uint8 x;
	Uint8 y;
	Uint8 x_dec, x_inc, y_dec, y_inc; // x_dec = do_wrap(x - 1), x_inc = do_wrap(x + 1)  ...
	Uint8 v, cur_v; // value, current_value
	Uint16 p; //position
	
	Uint8 *gradient=ressourcesGradient[teamNumber][ressourceType][canSwim];
	assert(gradient);
	
	Uint32 teamMask=Team::teamNumberToMask(teamNumber);
	assert(globalContainer);
	
	Case *c; // the tile we are about to check
	
	for (y=0; y < h; y++) {
		for (x=0; x < w; x++) {
			p = y << wDec | x;
			c = cases + p; // get current tile pointer
			
			// there is something to discuss here:
			// this means that going to somewhere, included knowing a route has been blocked somehow
			// eventhough globs should know this because of fog of war and discovery ...
			if (c->forbidden & teamMask) {
				gradient[p] = 0;
			} else if (c->ressource.type == NO_RES_TYPE) {
				if (c->building != NOGBID) {
					gradient[p] = 0;
				} else if (!canSwim && (c->terrain >= 256 && c->terrain < 16 + 256)) {//!canSwim && isWater
					gradient[p] = 0;
				} else {
					gradient[p] = 1; // can only walk here
				}
			} else if (c->ressource.type == ressourceType) {
 				// there is the longed for resource type on this tile!!
				if (globalContainer->ressourcesTypes.get(ressourceType)->visibleToBeCollected && !(fogOfWar[p] & teamMask)) {
					gradient[p] = 0;
				} else {
					// this is the thing we were looking for
					gradient[p] = 255;
					x_queue[write_pos] = x;
					y_queue[write_pos] = y;
					write_pos++;
				}
			} else {
			// there is nothing on this tile
			gradient[p] = 0;
			}
		}
	}

	// each square in the queue is read and its neighbours are filled, unless already filled with higher number
	// note that every square only gets touched once!
	while (read_pos < write_pos) {

		x = x_queue[read_pos];
		y = y_queue[read_pos];
		cur_v = gradient[y << wDec | x] - 1;

		read_pos++;

		x_dec = (x - 1) & wMask;
		x_inc = (x + 1) & wMask;
		y_dec = (y - 1) & hMask;
		y_inc = (y + 1) & hMask;

		// 1
		p = y << wDec | x_dec;
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x_dec;
			y_queue[write_pos] = y;
			write_pos++;
		}
		// 2
		p = (y << wDec | x_inc);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x_inc;
			y_queue[write_pos] = y;
			write_pos++;
		}
		// 2
		p = (y_dec << wDec | x);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x;
			y_queue[write_pos] = y_dec;
			write_pos++;
		}
		// 4
		p = (y_inc << wDec | x);
		v = gradient[p];
		if (v < cur_v && v > 0) {
			gradient[p] = cur_v;
			x_queue[write_pos] = x;
			y_queue[write_pos] = y_inc;
			write_pos++;
		}
	}

	delete[] x_queue;
	delete[] y_queue;
}
Now I need some help

1) in the Engine main loop, we wait for the network, do we wait specifically in the place where we do, or can we wait at any place. The problem is that the waiting code is now spread out through the loop, and it will be much more accurate when we place it completely at the end of the loop, where we know exactly how much time we spend where ... and how much time we need to make up for, or we need to sleep in order to get at 25 fps ...

Code: Select all

net->ticksToDelayInside();
Do we wait here to make all engines run on the same fps? or because we wait for network events/packets in particular?

Code: Select all

bool Map::getGlobalGradientDestination(Uint8 *gradient, int x, int y, Sint32 *targetX, Sint32 *targetY)
This function, why do we want a complete destination, why don't we make the globs check the gradient every time as they go, because that is a matter of polling exactly 4 positions to know where to go ...


hope somebody takes the time to respond to this way to long forum message, but thanks for getting this far anyway!

happy hacking!
Ogorter
Worker
Worker
Posts: 6
Joined: Sat Dec 17, 2005 9:38 am

Post by Ogorter »

the patch against latest cvs:

http://tech.inhelsinki.nl/globulation2/ ... patch.diff

current speed measurements, These are the functions that themselves take the most processing time.

Notice that I aggregated all gl related stuff and it is using 13.8%.

And I aggregated all kernel activity, which you can ignore, as it goes for all processes, including the measuring itself.

Code: Select all

13.8%	ATIRadeon9700GLDriver	gldUpdateDispatch	
4.0%	mach_kernel	shandler	
3.2%	glob2	Map::isRessource(int, int)	
3.1%	glob2	Game::renderMiniMap(int, bool, int, int)	
3.1%	glob2	Map::getGroundUnit(int, int)	
2.8%	mach_kernel	thandler	
2.8%	com.apple.ATIRadeon9700	IOATIR300GLContext::clientMemoryForType(unsigned long, unsigned long*, IOMemoryDescriptor**)	
2.7%	glob2	Map::isFreeForBuilding(int, int)	
2.7%	glob2	Map::getBuilding(int, int)	
2.3%	glob2	Map::updateRessourcesGradient(int, unsigned char, bool)	
2.2%	glob2	Map::getTerrain(int, int)	
1.9%	glob2	Map::isGrass(int, int)	
1.3%	glob2	Map::getGlobalGradientDestination(unsigned char*, int, int, int*, int*)	
1.2%	glob2	GAGCore::GraphicContext::drawSurface(float, float, float, float, GAGCore::DrawableSurface*, int, int, int, int, unsigned char)	
1.1%	glob2	GAGCore::DrawableSurface::uploadToTexture()	
1.1%	glob2	dyld_stub__ZN3Map11getBuildingEii	
1.0%	glob2	dyld_stub__ZN3Map11isRessourceEii	
0.9%	glob2	Map::isMapDiscovered(int, int, unsigned)	
0.9%	glob2	Team::checkSum(std::vector<unsigned, std::allocator<unsigned> >*, std::vector<unsigned, std::allocator<unsigned> >*, std::vector<unsigned, std::allocator<unsigned> >*)	
0.9%	glob2	dyld_stub__ZN3Map10getTerrainEii	
0.8%	glob2	Game::drawMap(int, int, int, int, int, int, int, unsigned)	
0.8%	glob2	dyld_stub__ZN3Map13getGroundUnitEii	
0.8%	glob2	AINumbi::nbFreeAround(int, int, int, int, int)	
0.7%	glob2	std::valarray<unsigned>::operator[](unsigned long)	
0.7%	libSystem.B.dylib	pthread_mutex_lock	
Genixpro
Developer
Posts: 26
Joined: Sun Dec 18, 2005 4:15 am

Re: problems with globulation2

Post by Genixpro »

Ogorter wrote:
But as a final remark, the code is not in too good a shape. There is hardly any documentation or comments, which makes it hard to figure out what certain portions of the code do. And some programming practices are making it also difficult to read just the code: who can read this:

Code: Select all

Sint32 ticksToWait=computationAvailableTicks+ticksToDelayInside+missedTicksToWait;
(line 462 in Engine.cpp)

It violates a few rules.
* First it declares a new variable mid function ... makes it hard to read where which data is use.
I don't think declaring variables mid function makes it *harder* to know where that variable is used. You can automagically assume that the variable is used in the code that follows. If anything, declaring variables mid function makes it *easier* to read, and its a recommended c++ style, top declarations are a C thing.
* There are no spaces between variables and operators, making it hard to read what it actually says. (WhatifIdon'tusespaceshere,canyoustillreadit?)
The fact that there are no spaces obvioussly indicates the code was written rather quickly, you know those programming rushes you get when you drink two cans of soda. I do agree that lines like this should have proper whitespacing.
* the variable names don't are what they say they are, and are obscure
I don't mean to be naggy, but I don't think declaring variables mid function makes it *harder* to know where that variable is used. You can automagically assume that the variable is used in the code that follows.
(It is my personal opinion that camel casing is not useful as it is hard to read.)
TitleCasing is hard to read, camalCasing is quiet easy, we all use it because you don't have needless seperators, yet you can still understand the variables names. Your not going to be able to change the naming styles of this program, however input is welcome.

As with the debugging issues, the code is currently undergoing a major change. The programmers are trying to rewrite the program bottom up, with full documentation. My personal contributation to the project is AINicowar, which examplifies my prefered style, and what all the code might look like 6 months down the road.
Genixpro
Developer
Posts: 26
Joined: Sun Dec 18, 2005 4:15 am

Post by Genixpro »

We just made an improvement to the gradient calculation a couple of weeks ago, saving 30%, andrew thought it was all that could be done. So.. how much did you manage to shave off? (It doens't say in your speed report). Also, on non opengl machines, SDL does take up allot of proccessing time, its been a major issue, particularly on net games, a person using SDL can severly lag the game.
Genixpro
Developer
Posts: 26
Joined: Sun Dec 18, 2005 4:15 am

Re: problems with globulation2

Post by Genixpro »

Ogorter wrote:
This function, why do we want a complete destination, why don't we make the globs check the gradient every time as they go, because that is a matter of polling exactly 4 positions to know where to go ...
!
Polling 8 directions, actually, globs can move diaganolly. I'm not overly knowledgable on glob2's internals.

your ideas, and patch, is quite welcome, if you would mind pasting it to the glob2-devel@nongnu.org mailing list, it would get responded to, and discussed, much quicker.
Ogorter
Worker
Worker
Posts: 6
Joined: Sat Dec 17, 2005 9:38 am

Post by Ogorter »

I now have a new version of the patch, I got the engine to run continuously on 25fps, regardless of how many frames we don't draw.

http://tech.inhelsinki.nl/globulation2/ ... atch2.diff

Currently I think I have the multiplayer directive wrong on how much to wait, but if everybody's engine runs at exactly 25 fps, it is not needed to tell the network to wait for the slower machine ...

About the gradient calculation, I only calculate one gradient every 4 frames, so I cheat a little ...

but I got about 30% off of total gradient calculation if I don't apply the extra cheat ...

but gradients can be done even more efficiently, and I will implement that next coding session I do on glob2, though anybody else is free to do it.

Here is how:

Anything that changes the map such that the gradient needs to be recalculated knows about the x,y coordinates where stuff changed. This change is registered to the gradient recalculation queue. (x_queue, y_queue for relevant gradient)

or if this is not possible, the gradient pre-calculation knows about this. See my patch on how updateResourceGradient does it now.

Then the algorithm that is currently used can be executed 100 steps at the time, next frame we can do another 100 steps. etc.

Gradients will be in a incomplete form (but never invalid form) sometimes, but the worst thing that will happen is that a glob stands still or walks in the wrong direction for a couple of frames ...

Besides, by using a changed queue, you don't need to erase the entire gradient, you can simply start from the updates applying the same algorithm as we do now, and the gradient will always be in a valid form. (Though special care when registering discoveries/fog or war, we need to check around for highest value, decrease it by one and continue ... )

And a complete recalculation has good changes of stopping very early, because it only needs to update a specific path, not the whole gradient ...

we will see what we can do with this, but I think we can shave off much, much more ...
Moa3333
Posts: 1
Joined: Sat Nov 26, 2005 10:23 pm

Post by Moa3333 »

Good news. I didn't had the courage to undestand the gradient code yet, mainly because i assumed the gradient system as it works now cannot be improoved much.

I thinked a lot about new ways of rewriting everythink, but this double problem is always dificult to avoid:
- the map is changing all the time
- there are many globules, and they move all the time

The gradient system does not take into accound the seccond problem witch produce high congestion quite offten.

It would be logical to think of a diferent algorithm for short distance and one for long distance. It would also be logical to think the algorithm should take into account the big obstacles in the road as big "things" (areas,regions) to avoid, and the little obsacles or globules as inpredictible.

The algorithm should know the big geometry of the map. But the little obstacles should be avoided by using a combination of a statistical/pheromonal algorithm with just-on-time decisions.

Supposing there is a GOD (that we call the gadient system) outside the game that tells each globule how to move will not poduce a fast game, because GOD is consuming al the cpu (lol)

I have posted a few emails on the mailing list recently, but none is yet a ready-to-implement solution, but a starting point. The last one is:

http://lists.gnu.org/archive/html/glob2 ... 00100.html


All contributions are wellcome, especialy if the code gets more clean and commented.
Kai Antweiler
Posts: 2
Joined: Mon Dec 19, 2005 1:10 pm
Location: Germany

Post by Kai Antweiler »

Ogorter wrote: Then the algorithm that is currently used can be executed 100 steps at the time, next frame we can do another 100 steps. etc.

Gradients will be in a incomplete form (but never invalid form) sometimes, but the worst thing that will happen is that a glob stands still or walks in the wrong direction for a couple of frames ...

Besides, by using a changed queue, you don't need to erase the entire gradient, you can simply start from the updates applying the same algorithm as we do now, and the gradient will always be in a valid form. (Though special care when registering discoveries/fog or war, we need to check around for highest value, decrease it by one and continue ... )

And a complete recalculation has good changes of stopping very early, because it only needs to update a specific path, not the whole gradient ...

we will see what we can do with this, but I think we can shave off much, much more ...
I think you will run into a terrible problem when a resource vanishes
or gets blocked. You'll have to reinitialize the gradient or the globs
might cumulate infront of a forbidden area or a no longer existing fruit.

You might treat such issues as special cases:
Whenever an isolated resource vanishes or a new
forbidden area / building / resource blockes a path, the grudient must
be reinitialized.

I guess such things happen to often. Using the sloppy gradient
update combined with a now and then recalculation will be better.
(that is: the mixing based on a fixed frequency instead of special cases)
Locked