Identical, but not equal

31 posts

Flag Post

I am trying to figure out the fastest way to check that two identical Arrays are in fact identical, even if they are not the same. For instance:

			var a:Array = [1, 2, 3, 4];
			var b:Array = [1, 2, 3, 4];
			trace(a == b);

traces to false, because a and b do not point to the same object. They point to two different Arrays, that happen to have identical contents. I need a fast test that will recognize that a and b are the same.

Right now I am looking at each component of each array and comparing those, but that’s incredibly slow. Any ideas?

 
Flag Post

Could making the arrays into strings and comparing the two strings work?

 
Flag Post

^ If integers, then most likely this will work. If objects, then it will not.

 
Flag Post
Originally posted by Drakim:

Could making the arrays into strings and comparing the two strings work?

Oooh, clever! I like. ^^

And yes, they’re Arrays of integers.

Edit: And after a little moment of panic at the lack of a toString() function, I even learned about join().

Second Edit: join() and compare strings is actually much slower than just doing systematic element comparisons. So much for that idea…

 
Flag Post

Edit: And after a little moment of panic at the lack of a toString() function,

http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/Array.html#toString()

 
Flag Post

Mmh, you’re right, it does have it. Why doesn’t FD autocomplete for it like it does for every other function and property?

Anyway. Since I have a very large series of these arrays to compare to remove duplicates, I thought of using a dictionary, in which I copy the arrays at the key formed with their contents. I’m not sure if it’s faster to systematically overwrite keys or to test if the key is in use and only copying if it’s not. But in any case, it’s still somewhat slow.

 
Flag Post

String comparison is mind-bogglingly slow. It’d actually be faster to compare strings by turning them into an array of characters and doing an array compare.

(Hint: string comparison has to compare every character to the other string, even if it finds a non-match, it doesn’t exit. Nor does it exit if the strings are of different lengths)

 
Flag Post

I’m not sure if this will be at all helpful (since I’ve really no idea why you want this or the implementation details), but if you use the first element of each array to store some [simplified] content information, you can test array length and this content information element to determine if a “deeper” test is needed. For example:

// Array a length is 1, a[0] == 0. 
a.push(cat);
a[0] += cat.getID();                // say cat ID is 1001001 -- its an electric cat
// etc.
// Array b is similar to a, but the last element is different

if (a.length == b.length)     // a and b match
{
  if (a[0] == b[0])           // a and b content summary elements don't match, so exit
  {
      // if their content summary elements match, do an element-by-element test
  }
}

It would, of course, be possiible to get the same values in the first element with different objects in the arrays, but you would cut down on quite a bit of unnecessary tests, I would think.

Addition: Depending on the number of object types being stored (and the number of objects in total in any given array), you could make the possibility of false equivalence almost non-existent if you set ID values far enough apart. If cat’s ID is 1, dog’s is 100, llama is 10000, and wombat is 1000000, you’d need [exactly] a hundred of a given object to get a false equivalence result. Of course you’d need to be concerned about overflow values if your IDs go high enough and/or you could potentially have many large-ID objects in an array…

Edit: Wait, nevermind, you said they’re arrays of integers, so the first element would be a sum of the integers in the array, rather than an object ID value (and thus, the paragraph above is irrelevant).

 
Flag Post
Originally posted by Ace_Blue:

Right now I am looking at each component of each array and comparing those, but that’s incredibly slow. Any ideas?

Have you tried factoring in the optimizations Draco mentioned? I think that, unfortunately, looping through the array contents really is the fastest and only reliable way to do it. Anyway, I cobbled this thing together, partially as a bit of practice in writing efficient code. Could I have done this better?

		public function ArraysHaveIdenticalContents(a:Array, b:Array):Boolean
		{
			var bReturn:Boolean = (a.length === b.length);
			
			if (bReturn)
			{
				for (var i:int = 0; i < a.length; i++)
				{
					if (a[i] !== b[i])
					{
						bReturn = false;
						break;
					}
				}
			}
			
			return bReturn;
		}
 
Flag Post
Originally posted by Aesica:
public function ArraysHaveIdenticalContents(a:Array, b:Array):Boolean
{
var bReturn:Boolean = (a.length == b.length);

if (bReturn)
{
for (var i:int = a.length-1; i >= 0; i--)
{
if (a[i] != b[i])
{
bReturn = false;
break;
}
}
}

return bReturn;
}

Make it so you only evaluate the length of a once. You can do this by traversing the arrays backwards, or storing the value of a.length outside the loop.
Also, “if (a[i] !== b[i])” should be “if (a[i] != b[i])”, I think??
Looked it up: the only thing I noticed is that, if they are both null, strict equality will return false, whereas normal equality will return true. Not sure which you’d prefer.
Why is my formatting broken?

 
Flag Post

I would loop through the arrays backwards.

for(var i:int = a.length; i--; )

As the arrays are the same length, there won’t be any out-of-range errors, and the loop execution will be ever so slightly faster.

 
Flag Post
Originally posted by Draco18s:

String comparison is mind-bogglingly slow. It’d actually be faster to compare strings by turning them into an array of characters and doing an array compare.

(Hint: string comparison has to compare every character to the other string, even if it finds a non-match, it doesn’t exit. Nor does it exit if the strings are of different lengths)

Wat. How is that a thing?

 
Flag Post

@feartehstickman: I seem to recall seeing somewhere that strict equality was a wee bit faster, but also, since he wants to check for identical arrays, strict equality will ensure that when two different data types are compared (and would return true via normal equality) that one doesn’t get converted before comparing. The speed boost might only apply to non-primitive datatypes, though.

@Draco: There shouldn’t be any out of range errors, since the loop isn’t even touched if they aren’t equal from the start. As for loop speed, is it faster by virtue of what feartehstickman mentioned—that you’re only determining the length of the array once? Or is there even more to it?

 
Flag Post
Originally posted by Aesica:

@Draco: There shouldn’t be any out of range errors, since the loop isn’t even touched if they aren’t equal from the start. As for loop speed, is it faster by virtue of what feartehstickman mentioned—that you’re only determining the length of the array once? Or is there even more to it?

I just mean I’d change the order of execution of the loop.

I’d only need to poll a.length once rather than every iteration.

 
Flag Post

I would suggest when creating the arrays to create some hash and keep it updated. Then, you can be pretty sure with quite high confidence that the arrays are different.

 
Flag Post

Are any of these faster than one another:
i++, ++i, i—, —i ?

Or are they effectively the same?

 
Flag Post
Originally posted by feartehstickman:

Are any of these faster than one another:
i++, ++i, i—, —i ?

Or are they effectively the same?

++i and —i are slightly faster than their counterparts. But it’s on the order of 1 ms every million calls.

(Has to do with the fact that i++ and i— return their value twice, effectively: once before the increment/decrement which is what the expression evaluates to, and once after the increment/decrement, which is what the variable is equal to after the expression evaluates).

 
Flag Post

Fair enough.

 
Flag Post

Thanks for all the answers. So if I must loop

for (var i:int = array.length - 1; i >= 0 ; --i)

is superior to

for (var i:int = 0; i < array.length; i++)

but they both scale linearly with the size of the list of arrays, and linearly with the size of each array.

Alternately, I can avoid the loop altogether by creating a hash of the contents of each array in my list and storing those hashes in an ordered array, at which point I can simply hash the array I wish to compare to those in the list and get a match (or not) in O(log N) time. but if the hash is based on all the contents of the array creating it I still have a O(M) step to create the hash. (O(M log N) is still an improvement over O(MN).)

Thank you all.

 
Flag Post

It’s just a stylistic choice, but personally I find this more readable than a backwards loop:

var length:int = array.length;
for (var i:int = 0; i < length; ++i)
{

}
 
Flag Post

Originally posted by Draco18s:

(Hint: string comparison has to compare every character to the other string, even if it finds a non-match, it doesn’t exit. Nor does it exit if the strings are of different lengths)

which is good security practice to prevent timing attacks (figuring out a hash like a safe cracker listening for the tumblers to click into position)

Originally posted by Draco18s:

++i and —i are slightly faster than their counterparts. But it’s on the order of 1 ms every million calls.

(Has to do with the fact that i++ and i— return their value twice, effectively: once before the increment/decrement which is what the expression evaluates to, and once after the increment/decrement, which is what the variable is equal to after the expression evaluates).

they perform identically: same operations in a different order

i++    | ++i
getloc | getloc
dup    | inc
inc    | dup
setloc | setloc

though if you have it in a statement by itself (not assigning it to a variable/indexing/in a comparison/etc.) they compile to exactly the same code: getloc inc setloc

Originally posted by oscarwilld:

It’s just a stylistic choice, but personally I find this more readable than a backwards loop:

var length:int = array.length;
for (var i:int = 0; i < length; ++i)
{

}

it’s just a stylistic choice on modern computers (usually1), but on older computers iterating forward is faster than iterating backwards because prefetching doesn’t work when looping backwards meaning the CPU is waiting for data to come in from RAM

1 though when flash is interpreted instead of compiled (at runtime), prefetching benefits for looping backwards are fully negated by nested looping tripping up the CPU, making forward looping once again faster


the fastest i can think of for your ints (use a Vector, too!) would be:

public function comp(a:Array, b:Array):Boolean {
	var i:int = a.length;
	if (i !== b.length)
		return false;

	var e:int = i, r:int = 1;
	for (i = 0; int(i !== e) & r; ++i)
		r = a[i] === b[i];

	return Boolean(r);
}
 
Flag Post

OK, I’m guessing my suggestion wasn’t a very good one, since it was completely ignored. I’m not surprised, but I don’t understand why — while the results wouldn’t be “reliable” (in that you may still need to do a deeper, element-by-element test), it still roughly doubles the effectiveness of the length test (which everyone happily does) for little cost (as far as I can see, anyway). Anyone care to explain?

 
Flag Post

for an easy way to do this, I’d forget arrays completely and use a string from the start, then use == to compare.

 
Flag Post
Originally posted by dragon_of_celts:

OK, I’m guessing my suggestion wasn’t a very good one, since it was completely ignored. I’m not surprised, but I don’t understand why — while the results wouldn’t be “reliable” (in that you may still need to do a deeper, element-by-element test), it still roughly doubles the effectiveness of the length test (which everyone happily does) for little cost (as far as I can see, anyway). Anyone care to explain?

It’s not a bad idea, and I read it even if I didn’t specifically reply about it. It could certainly cut some time in the comparison as long as the checksum in the first element is kept updated, and is designed in such a way that it’s reasonably unique to an array configuration.
However, creating the checksum is an O(N) operation with N the size of the array, and unless the checksum is very simple it has to be recomputed entirely every time the array is modified, and unless it’s sufficiently complex there is a non-negligible chance that checksum equality could occur coincidentally between different arrays and that the whole check would have to be performed anyway when the two checksums are equal. Still, it is worth a try.

 
Flag Post

there is a native as3 solution: Array.some() and Array.every(), so this

function sameArray(a:Array,b:Array):Boolean {
	   return a.every(function(element:*, index:int, object:*):Boolean {
	           return( element == b[index] );
	   }); 
}
//returns "true" if all elements are equal

should actually be faster.