Flash performance slideshow.

21 posts


http://www.slideshare.net/fenomas1/flash-performance-tuning-en

Kewry shared this in the GDR.

My views on it:

Fast code should be the standard way of development for anyone, not something one should do once the application starts running slow.

Sometimes it’s even quicker to type fast code than to code the wrong way.

So I kinda agree with the next slide (Architect for performance → Build for speed → Optimize). The issue is that many take “premature optimization is the root of all evil” to mean “don’t optimize until the lag makes it unplayable”, and that’s totally incorrect.

When we say “optimize”, we mean “making already fast code even faster”, not “fix your ugly, slow code”.

About the pooling slide, I agree with the use, reuse and abuse of constants, but I wouldn’t advise people to pool objects all the time; there’s a time for everything, and pooling when you don’t need it may result in slower performance.
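For the cases where profiling does show that allocation is the bottleneck, a pool can be as small as the sketch below. PointPool is a hypothetical name, not something from the slides, and whether it beats plain allocation has to be measured per project.

```actionscript
package {
	import flash.geom.Point;

	// Hypothetical helper: recycles Point instances instead of allocating new ones.
	public class PointPool {
		private var free:Vector.<Point> = new Vector.<Point>();

		// Reuse a released Point if one is available, otherwise allocate.
		public function acquire(x:Number = 0, y:Number = 0):Point {
			var p:Point = free.length > 0 ? free.pop() : new Point();
			p.x = x;
			p.y = y;
			return p;
		}

		// Hand a Point back once nothing else references it.
		public function release(p:Point):void {
			free[free.length] = p;
		}
	}
}
```

The caller is responsible for not touching a Point after releasing it, which is exactly the kind of bookkeeping that makes pooling a poor default.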

Then there’s the “avoid function calls”; I totally agree with this.
I always see people telling others to “call enemy.move() every frame”. No; throw the enemy movement logic in your frame handler.
Some of this won’t be necessary anymore since inlining is being implemented (with limitations), but there are still many places where people call methods in critical areas for no good reason (like a lot of the Math functionality, which can be replaced by typing a dozen characters).

Most of the other things have already been mentioned, like using the “smallest” object inheritance (don’t use MovieClip if you need an effing thing with x and y), being careful with blend modes and filters, cache stuff, etc.
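As an illustration of the “smallest” base class point, something that only needs a position and a bit of vector graphics can extend Shape. EnemyDot is a hypothetical class made up for this sketch.

```actionscript
package {
	import flash.display.Shape;

	// Hypothetical enemy marker: it only needs x, y and some vector graphics,
	// so it extends Shape rather than Sprite or MovieClip.
	public class EnemyDot extends Shape {
		public function EnemyDot(radius:Number = 5) {
			graphics.beginFill(0xFF0000);
			graphics.drawCircle(0, 0, radius);
			graphics.endFill();
		}
	}
}
```

Shape has no children, no mouse interaction and no timeline, so move up to Sprite or MovieClip only when one of those is actually needed.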

Also, the Monocle thing seems nice, I’ll give it a try later.

 

Haven’t seen the slideshow yet, but in terms of optimization, I almost always code with optimization, readability, and flexibility in mind. At the end, when the game is mostly done, I’ll run through and do the things I forgot, such as removing listeners, switching a variable to the most fitting type (i.e. int, uint, Number), and using bit shifting and 1D Arrays/Vectors wherever I can. So I think a good rule is to code optimally, but not to the point where 90% of your time goes to optimizing. Code well, clean, and not slow, and then you can do the BIG changes at the end.

 

Sometimes you can’t help but write the code first and optimize later.

For instance, in my Unity tower defense game, I had no idea when I started as to the impact Draw.VBO had. So I had to do research on how to reduce draw counts.

That still left me with a 60ms spike every time an object was destroyed (and pooling made things worse!) so I ended up having to use a hybrid method: instead of destroying creeps when they die, simply put them into a suspended state off-screen, and do an end-of-wave cleanup. Essentially batching the destroy calls and putting the lag spike where it would go unnoticed.

Despite all of that, I still had another problem:
Because I had 300+ objects that were all independently dynamic, they couldn’t have their draw calls combined (so, still a Draw.VBO problem, but not standardly solvable). There I had to think about how I could display them when there was CPU time, and hide them when there was no CPU time. The solution? During a wave when draw counts go up naturally, hide the 300 objects. Otherwise show them.

So yes. Plan for optimization where you can. Prototype and test and improve where you cannot plan.

 

Yeah, I have Monocle but have yet to experiment with it, because it requires publishing for FP11.4 (and adding some code if you want detailed information) and I haven’t downloaded the playerglobal.swc.

Code architecture is quite important. Instead of diving into a problem, people need to take a step back and design a solution that isn’t slow because of a lack of design.

Great architecture will possibly lead to faster execution and development time. Something we could all use.

 

I always type the remove listener line as soon as I type the add listener one, and make sure to add a var=null; line whenever I declare a complex class variable.

Many people say that it’s a lot of effort or it’s extra work, but I don’t feel it’s like that; once you start doing it, it just comes naturally. And again, most of that is because they started coding the wrong way from the start.
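A minimal sketch of that habit, with a hypothetical Enemy class: the remove call and the null assignment are written at the same time as the add call and the declaration they undo.

```actionscript
package {
	import flash.display.Sprite;
	import flash.events.Event;

	// Hypothetical class showing the add/remove pairing and the null-out.
	public class Enemy extends Sprite {
		private var target:Sprite; // complex reference, cleared in dispose()

		public function Enemy(target:Sprite) {
			this.target = target;
			addEventListener(Event.ENTER_FRAME, onFrame);
		}

		private function onFrame(e:Event):void {
			// Drift toward the target a little each frame.
			x += (target.x - x) * 0.1;
			y += (target.y - y) * 0.1;
		}

		// Written together with the constructor above.
		public function dispose():void {
			removeEventListener(Event.ENTER_FRAME, onFrame);
			target = null;
		}
	}
}
```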

Also, have this in mind while reading your code:
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”

E: The first part was @RTL.

E2: @Draco. You’re right, it’s not always possible and sadly, a lot of times bad performance ends up in the hands of the tool we use. In Flash’s case, the rendering is a mess and it all goes through its API.

But yeah, optimize within your possibilities. There will always be things we can’t control, but the ones we can control should run fast enough that if there’s a performance issue, we can blame the language, the interpreter or the computer’s limitations, but not our code.

 

http://www.onflex.org/ACDS/AS3TuningInsideAVM2JIT.pdf
http://www.onflex.org/download/AS3Perf.pdf

It would be nice to have the audio that accompanies these.

 

I almost always use typed vectors now, not arrays.

And if I have relational data I want to store (e.g. [a] with [b] and [c], even if all three are ints) I create a custom class to contain the data, and vectorize that.
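Roughly, that approach could look like the sketch below. Triple and its demo method are hypothetical names chosen for illustration.

```actionscript
package {
	// Hypothetical record type: three related ints kept together.
	public class Triple {
		public var a:int;
		public var b:int;
		public var c:int;

		public function Triple(a:int = 0, b:int = 0, c:int = 0) {
			this.a = a;
			this.b = b;
			this.c = c;
		}

		// Usage sketch: a typed Vector of records, appended by index.
		public static function demo():void {
			var rows:Vector.<Triple> = new Vector.<Triple>();
			rows[rows.length] = new Triple(1, 2, 3);
			rows[rows.length] = new Triple(4, 5, 6);
			var r:Triple = rows[1]; // typed read, no cast needed
			trace(r.a + r.b + r.c); // 15
		}
	}
}
```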

 

AS3 needs structs.

 

Thanks for the links.

The “Sometimes just using Number is more efficient” tip is important. Many (including me) are used to typing many things as int and forget all the conversions that can occur while manipulating the values, especially with multiplication or division.

 

Strongly disagree.

Computers have been built to serve you, not the other way round. Your time as a programmer is infinitely more valuable than a nanosecond on a computer somewhere on the other side of the world. Reusable, human-readable code is golden, even if that means you’ve got an extra layer of functions (or several).

Computer speed has increased exponentially over the last 20 years. When computers were 1/1000th the speed they are now, optimisation was really important. Some coders invested a lot of time and effort becoming very good at it, and because they don’t want that investment to lose its value, they will defend optimisation aggressively. Don’t let them drag you down into the mire. It’s better to release 3 normal games than 1 super optimised one.

Optimisation can be useful if you’re making a really intense game like a bullet hell. Understanding the basic principles is important when you’re building games to avoid falling into the lag swamp in the first place. But general case? Optimisation is a waste of time. Especially for the type of games we’re likely to develop here.

 

That’s all subjective, CuriousGaming.

 

I don’t think it’s a waste of time. Actually, doing things the right way is just as easy (slower version on the left, faster version on the right):

a.push(value); | v[v.length]=value;
while(i<a.length)i++,stuff; | while(--i>-1)stuff;
if(a&&b)stuff | if(a)if(b)stuff
v=Math.abs(v); | v=v<0?-v:v;
if(someExpensiveTest&&boolean) | if(boolean)if(expensivetest)
p={x:10,y:20}; | p=new Point(10,20);

The list could go on for a long time.
Or what about the ones who create like 10 variables to achieve what you can do in a single statement, with no extra variables declared?
Or the ones who extend MovieClip when they only need Shape functionality?
Etc, etc ad infinitum.

I don’t see how that’s a waste of time; in many cases, you even have to type less, and in every case, the application will run faster.

As I said, fast code should be your default; it’s not something that takes time from you, since you’re supposed to do it naturally. The problem is that most people are used to bad code, and it feels like many think that coding properly is an extra effort which would require a lot of time, when it’s not.

 

Interesting list. Along with the usual suspects there are two I didn’t know about. Can you elaborate on how much faster a double if statement is compared to the && operation? And on the last line, are you really advocating the creation of a Point object as the faster solution?

As a rule, if I have to reassign an object every frame (a Point or a Rectangle, usually, for blitting) I’ll create the object as part of the class declaration and only recompute the properties I need changed in the frame handler (usually x and y). I had assumed that making a new instance every frame would be wasteful not just for performance reasons, but also for memory usage. Was I wrong in my thinking?
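In code, that pattern might look like the sketch below. Blitter, the 32x32 tile size and the single-row sprite sheet layout are assumptions made up for the example.

```actionscript
package {
	import flash.display.BitmapData;
	import flash.geom.Point;
	import flash.geom.Rectangle;

	// Hypothetical blitter: one Point and one Rectangle live as long as the object.
	public class Blitter {
		private var canvas:BitmapData;
		private var sheet:BitmapData;
		private var src:Rectangle = new Rectangle(0, 0, 32, 32); // assumed 32x32 tiles
		private var dest:Point = new Point();

		public function Blitter(canvas:BitmapData, sheet:BitmapData) {
			this.canvas = canvas;
			this.sheet = sheet;
		}

		// Called per sprite per frame; only the changing fields are updated.
		public function draw(tileIndex:int, x:Number, y:Number):void {
			src.x = tileIndex * src.width; // assumes tiles laid out in one row
			dest.x = x;
			dest.y = y;
			canvas.copyPixels(sheet, src, dest);
		}
	}
}
```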

Edit: And one last thing, regarding the first line. Isn’t v[v.length] outside of the Array? What if that memory is already used by something else? Or is my expectation that an Array be a contiguous block of memory an antiquated idea?

 

I expect that both Array and Vector are stored as lists in memory, except maybe when a Vector is set to fixed length: then the Flash player can allocate a correctly sized chunk of memory, transfer the data there once, and use optimised address arithmetic to access elements, invisibly to the programmer.

 

So I went ahead and tried to answer my own question as to how much faster if-if is compared to &&. The result is: I don’t know. Here’s the piece of code I used to run the tests:

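		// Needs: import flash.utils.getTimer;
		private static const DEFAULTSAMPLESIZE:int = 20;      // data points per case
		private static const DEFAULTITERATIONS:int = 4000000; // loop iterations per data point
		private static var dummy:int;                         // assignment target for the tested branch
		
		public static function run(sample:int = DEFAULTSAMPLESIZE, 
			iterates:int = DEFAULTITERATIONS):void
		{
			var i:int = 0;
			var j:int;
			var timedata:Array = new Array(sample);
			trace("Starting...");
			var start:int = getTimer();
			while (i < sample)
			{
				j = 0;
				while (j < iterates)
				{
					// The only line that changes between test cases.
					if (i >= 0 && j >= 0) dummy = 0;
					j++;
				}
				// Absolute timestamp (ms) at the end of data point i.
				timedata[i] = getTimer();
				i++;
			}
			// postProcess (not shown) turns the timestamps into statistics.
			postProcess(start, timedata);
		}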

As you can see from the code above, I ran four million iterations for each data point, and did 20 points for each case, so I would have a somewhat statistically significant sample to process. For each case, I computed the fastest and slowest times, as well as the average and standard deviation. The only line that changed between cases was the if statement.
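The postProcess function was not posted, so the following is only a guess at its shape: a sketch that turns the absolute timestamps into per-point durations and then computes the four statistics. BenchStats is a hypothetical class name, and the use of the sample standard deviation (n - 1) is an assumption.

```actionscript
package {
	// Hypothetical stand-in for the unposted postProcess step.
	public class BenchStats {
		// start: getTimer() value before the first data point.
		// timedata: one absolute getTimer() stamp per data point (needs 2+ entries).
		public static function postProcess(start:int, timedata:Array):void {
			var n:int = timedata.length;
			var min:Number = Number.MAX_VALUE;
			var max:Number = 0;
			var sum:Number = 0;
			var sumSq:Number = 0;
			var prev:int = start;
			for (var i:int = 0; i < n; i++) {
				var dt:Number = timedata[i] - prev; // duration of data point i, in ms
				prev = timedata[i];
				if (dt < min) min = dt;
				if (dt > max) max = dt;
				sum += dt;
				sumSq += dt * dt;
			}
			var mean:Number = sum / n;
			// Sample standard deviation (n - 1 in the denominator).
			var sd:Number = Math.sqrt((sumSq - n * mean * mean) / (n - 1));
			trace(min, max, mean.toFixed(2), "+-", sd.toFixed(1));
		}
	}
}
```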

First I ran an empty case where the statement was commented out, it is the case labeled ‘empty’. Then I ran another where I replaced the if statement with an assignment of dummy = 0, hence what would happen if both conditions were systematically true. That case is labeled ‘assign’. Finally, I ran two series of four cases, with serial ifs and the && operator inside a single if, respectively. The four cases correspond to the four possible situations (true, true; true, false; false, true; and false, false). The two conditions involved comparing the counters i and j to 0, so the condition would be systematically false if the comparison operator was ‘<’, and systematically true for the operator ‘>=’. The results are shown below:

Series   min (ms)  max (ms)  average (ms)  std dev (ms)
Empty    405       437       410.25        9.3
Assign   592       640       609.20        8.1
ifFifF   608       640       616.20        9.5
ifFifT   624       655       628.65        8.9
ifTifF   624       686       633.35        14.6
ifTifT   624       640       631.00        7.9
F&&F     624       640       628.65        7.3
F&&T     593       609       607.60        3.5
T&&F     639       656       643.50        6.9
T&&T     639       656       641.95        5.9

First, it is obvious that the empty case takes a significant time, and all it does is iterate the counters. Second, the ‘assign’ case is only 3-5 standard deviations faster than the true-true cases, so it is not at all obvious exactly how much time was spent in the actual tests. Finally, it is not apparent either by comparing cases with a false first clause to cases with a first true clause that the second clause was left untested if the first one was found false. It appears that this might have been the case for the && operator, but the data are too noisy to be sure.

My general conclusion is that one of three things happened:
1) I bungled the test. I’ve never studied how to do proper CS benchmarking, and I could very possibly have done this test completely wrong, to the point that the results are worthless. If it is so, please teach me how to do it properly so I can do a better job next time.
2) The code is not doing what I think it’s doing. For instance, one would expect the true-true cases to be significantly longer than any other combination, since they involve an extra variable assignment as a consequence. They are not. That result alone is very puzzling to me.
3) Comparison tests and assignments are much faster than parsing an extra line of code, so their cost is absorbed and drowned in the cost of running that extra line. That would explain the difference between the ‘empty’ case and everything else. It would explain why true-false and true-true take as much time even though one requires an extra assignment, or why the false-first cases don’t seem to be all that much faster than the true-first cases, if at all.

As a conclusion, I just want to say that, unless someone can show me how I bungled and teach me how to correctly run these tests, I’m just going to go ahead and state I could not measure any difference in execution time between serial ifs and the && operator.

 
Originally posted by Ace_Blue:

Interesting list. Along with the usual suspects there are two I didn’t know about. Can you elaborate on how much faster a double if statement is compared to the && operation?

The difference is really small (I wasn’t able to find the thread where I posted the actual values), but there is one, as the compiled output with && contains extra instructions.
But again, extra ms are gained when tons of those small gains start to stack.

Also, since your test could translate into if(true&&true), there’s a good chance that the JIT or any other witchery is already optimizing it. Not sure about this, I don’t know that much about the subject. I’ll just run a test:

private function test(e:MouseEvent):void{
	var start:int,
		i:int=1000000,
		j:int;
	start=getTimer();
	while(--i>-1)
		if(Math.random()>0.5&&Math.random()>0.5)++j;
	trace(getTimer()-start);
}

Results: 422, 438, 453 ms.
Changing it to if(Math.random()>0.5)if(Math.random()>0.5)++j;
Results: 406, 437, 437 ms.

And on the last line are you really advocating the creation of a Point object as the faster solution?


Totally:
public function test(e:MouseEvent):void{
	var start:int,
		i:int=1000000,
		j:int,
		v:Vector.<Object>=new Vector.<Object>(i),
		p:Vector.<Point>=new Vector.<Point>(i);
	// Case 1: one million Point instances.
	start=getTimer();
	while(--i>-1){
		p[i]=new Point(10,20);
	}
	trace(getTimer()-start);
	// Case 2: one million dynamic Object literals.
	start=getTimer();
	i=1000000;
	while(--i>-1){
		v[i]={x:10,y:20};
	}
	trace(getTimer()-start);
}

Results: 953 ms (Point), 1906 ms (Object).

Of course, the memory used by a Point is bigger because of all the functions. If you need an object with x and y without all the Point methods, it’d be better to create a class with nothing but x and y.

As a rule, if I have to reassign an object every frame (a Point or a Rectangle, usually, for blitting) I’ll create the object as part of the class declaration and only recompute the properties I need changed in the frame handler (usually x and y). I had assumed that making a new instance every frame would be wasteful not just for performance reasons, but also for memory usage. Was I wrong in my thinking?

No, you’re correct.

And one last thing, regarding the first line. Isn’t v[v.length] outside of the Array?

Yes, it’s technically outside, but you can access it without any problems if it’s an array, or as long as the previous index exists in the case of a vector.
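A short sketch of that difference, wrapped in a hypothetical AppendDemo class. It assumes the documented behaviour that an out-of-range Vector write throws a RangeError while an Array simply grows.

```actionscript
package {
	// Hypothetical demo of appending by index on Array and Vector.
	public class AppendDemo {
		public static function run():void {
			var a:Array = [];
			a[a.length] = 1; // fine: Arrays grow on write
			a[5] = 2;        // also fine: Arrays may be sparse, length becomes 6

			var v:Vector.<int> = new Vector.<int>();
			v[v.length] = 1; // fine: index == length appends to a non-fixed Vector
			try {
				v[v.length + 1] = 2; // skips an index, so this throws
			} catch (e:RangeError) {
				trace("gap write rejected");
			}

			var f:Vector.<int> = new Vector.<int>(3, true); // fixed length
			try {
				f[f.length] = 1; // fixed Vectors cannot grow
			} catch (e2:RangeError) {
				trace("fixed Vector cannot grow");
			}
		}
	}
}
```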

 

Well, the trick with Object is that the class is dynamic, while Point and any custom class you write are sealed unless declared dynamic. So Flash has to do extra work to allocate an instance of a dynamic class.

 

Actually, tbh, I think CuriousGaming is right. You don’t need to try and over-optimize things, unless you need it. Sticking everything in one frame loop without calling methods can be very messy. I’d prefer more organized code at the cost of a little speed, tbh. There are other things you can optimize, which, imho, are much more important. You don’t need to do everything.

 
Originally posted by TheAwsomeOpossum:

Sticking everything in one frame loop without calling methods

Doing this isn’t necessary for avoiding multiple event listeners, which may or may not carry significant overhead in your program.

 

I hold everybody who thinks performance isn’t important personally responsible for the popular public view that Flash is slow and bothersome. Coding with performance in mind takes very little effort once you have learned it, but gives a lot in return.

 