How to store a bunch of objects?

21 posts

Flag Post

I need to store upwards of 2000 objects and be able to search for them in O(1) time. Each object already has a unique integer ID. Knowing this, should I store them in an Array with the ID as the key, or should I still store them in something like a Dictionary?

 
Flag Post

O(1) = Dictionary lookup by key
O(n) = scanning an Array for a value

Or you could go amazingly unnecessary and use a hash type deal with linked lists to manage collisions. :D

 
Flag Post

So you want to search for the objects by ID? If the IDs are contiguous, I’d use a Vector.

 
Flag Post

@UG

The array option was to store the object in the index equal to its ID. Locating the object in the Array would be O(1).
But tell me more about this linked-list pseudo-hash.

@BGM

The program will both create IDs in no particular order, and have to search for a random ID.

 
Flag Post

Well, I specifically wrote a program to solve the problem of bar codes and stores.

Bar codes have ‘random’ numbers, but we need to store the objects in a small data structure like a length-defined array. (Benefits of the array would be fast access, because elements are stored close together in memory, while maybe not so for a dictionary. And you won’t ‘lose’ objects like you might if you use a dictionary and forget the key.)
So you would start with a defined array, say 600 length, and perform a hash on the unique item ‘key’ and store it in a corresponding index in the array. (hash would have something like % 600 to stay in-bounds)

If you had a collision (existing object at calculated index) you’d use a Linked List (if your programming language only has defined length arrays, or you like whatever properties of a Linked List) or an array.

To search objects, you’d simply perform your hash on the key and locate the index. If the index had >1 element then you’d search through all elements there and find the one with the matching unhashed id.
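The fixed-array-plus-chaining scheme described above could be sketched roughly like this in AS3. This is a minimal sketch, not the poster’s actual program; the bucket count of 600 comes from the example, and the class and method names are placeholders:

    // Minimal chained hash table: a fixed-size bucket Vector, where each
    // bucket holds an Array of {id, obj} entries to absorb collisions.
    class IdHash {
        private const NUM_BUCKETS:int = 600;
        private var buckets:Vector.<Array> = new Vector.<Array>(NUM_BUCKETS, true);
    
        private function hash(id:uint):int {
            return id % NUM_BUCKETS; // % keeps the index in-bounds
        }
    
        public function put(id:uint, obj:Object):void {
            var i:int = hash(id);
            if (buckets[i] == null) buckets[i] = [];
            buckets[i].push({id: id, obj: obj});
        }
    
        public function find(id:uint):Object {
            var i:int = hash(id);
            if (buckets[i] == null) return null;
            // On a collision (>1 element in the bucket), scan for the
            // entry with the matching unhashed id.
            for each (var entry:Object in buckets[i]) {
                if (entry.id == id) return entry.obj;
            }
            return null;
        }
    }

The bucket here is a plain Array rather than a linked list; either works, as the post says, and an Array is simpler in AS3 since arrays are not fixed-length.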

 
Flag Post

I’d just use an Array and store the items with their unique ID as key. I think Arrays perform better than Dictionaries, however no guarantees :D . I’m not sure there would be much point in creating your own hashing method and fixed length vector, as everything like that will be done automatically for you.

Also, you can’t really lose items in a dictionary in my opinion, as you can just loop through items with a for..in loop.

 
Flag Post

@UG

That’s a good idea, thanks. But I need to decide if the memory saved by implementing that hash method is more important than the time spent searching through collisions. I’ll try to come up with a good hash algorithm. Anyway, it seems like the array strategy is better than the dictionary strategy in my situation.

 
Flag Post

Access from the dense portion of an array is about as fast as it gets. If the IDs are unique, how will you get a collision?

Edit: Missed where you said they aren’t sequential. I’d use some kind of hash; probably just a good ol’ Object.
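Using a plain Object as the hash, as suggested, might look like the following sketch (myObj and its id field are assumed from the thread, not defined here):

    var lookup:Object = {};        // Object property access is hashed internally
    lookup[myObj.id] = myObj;      // store by ID (note: Object keys become strings)
    var found:Object = lookup[someId]; // constant-time-ish retrieval, undefined if absent

One caveat: an Object converts integer keys to strings, whereas a Dictionary keeps them as typed keys, so use flash.utils.Dictionary if that distinction matters.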

 
Flag Post

How dense an Array are we talking about here? I’m assuming the unique ID could be stored in a uint (so a bit more than 4 billion possible IDs) and you say you have 2000 objects. An Array of 4 billion objects, with only 2000 of its indexes being something other than ‘undefined’ sounds like a huge waste to me. So in that case I’d go Dictionary. If you get to pick the ID and can restrict it to the range 0-2000, then populating an Array of size 2000 by having the ID be the index is likely the preferable solution.

And then again, if you could have at most 2000 objects but you will likely have only a few most of the time, then it’s back to Dictionary.

Edit: \/\/\/ “Why would you have an array of 4 billion items when you only have 2000 objects?”
Beats me. I was making the case that you shouldn’t.

 
Flag Post

In my opinion, hash tables are definitely not “amazingly unnecessary” in this situation.

 
Flag Post

If your unique IDs are largely sequential, i.e. 1,2,3,4,5….1998,1999,2000, then use a Vector. From my experience with something similar: I originally used a Dictionary to store around 20,000 class objects, but when I switched to a Vector, where the unique ID corresponded to the index of the Vector, I noticed a roughly 35% speed increase; where my function would originally take around 4.4 ms, it went down to around 2.9 ms.

The speed difference is not huge, but if you’re running some test quite often (like I was), in the grand scheme of things it’s quite a large reduction.

If your unique IDs were either very large or very spread apart, I would probably either try using a Dictionary (and see if it runs at an acceptable speed), or try using some sort of hashing or modulo function to get your values into acceptable ranges.

 
Flag Post
Originally posted by Ace_Blue:

How dense an Array are we talking about here? I’m assuming the unique ID could be stored in a uint (so a bit more than 4 billion possible IDs) and you say you have 2000 objects. An Array of 4 billion objects, with only 2000 of its indexes being something other than ‘undefined’ sounds like a huge waste to me. So in that case I’d go Dictionary. If you get to pick the ID and can restrict it to the range 0-2000, then populating an Array of size 2000 by having the ID be the index is likely the preferable solution.

And then again, if you could have at most 2000 objects but you will likely have only a few most of the time, then it’s back to Dictionary.

Why would you have an array of 4 billion items when you only have 2000 objects?

 
Flag Post

He means having [0,1,2,3...] instead of 0,,,,,,,,,1,,,,,,,,,,,,,,,,,2 as in… instead of storing elements by id from 0 to 1999, you’d store them based on something else, allowing them to have indexes as high as 4,294,967,294.
But actually, the used memory would be pretty much the same. An array with a length of thousands of millions, yet with only 2,000 of its indexes filled would use about as much memory as an array with length 2000, with every index filled.

That said, using the second strategy would force you to use an array (instead of a vector), which is horrible.

 
Flag Post
Originally posted by Senekis93:

He means having [0,1,2,3...] instead of 0,,,,,,,,,1,,,,,,,,,,,,,,,,,2 as in… instead of storing elements by id from 0 to 1999, you’d store them based on something else, allowing them to have indexes as high as 4,294,967,294.
But actually, the used memory would be pretty much the same. An array with a length of thousands of millions, yet with only 2,000 of its indexes filled would use about as much memory as an array with length 2000, with every index filled.

That was the point I was trying to make in my rhetoric.

 
Flag Post

@Ace Blue
Sorry, I forgot to mention that the IDs are also in the range of 0-2000.

Originally posted by BigJM:
Originally posted by Senekis93:

He means having [0,1,2,3...] instead of 0,,,,,,,,,1,,,,,,,,,,,,,,,,,2 as in… instead of storing elements by id from 0 to 1999, you’d store them based on something else, allowing them to have indexes as high as 4,294,967,294.
But actually, the used memory would be pretty much the same. An array with a length of thousands of millions, yet with only 2,000 of its indexes filled would use about as much memory as an array with length 2000, with every index filled.

That was the point I was trying to make in my rhetoric.

So you’re saying I should make it a vector instead of array, manually nulling out all the slots between the last entry and the current entry if necessary?

 
Flag Post

If your IDs are from 0 to 1999 and you have 2000 objects, you should most definitely use a vector to store them:

var myObjList:Vector.<MyClass> = new Vector.<MyClass>(2000, true);  //length of 2000, fixed length
myObjList[myObj.id] = myObj;

If your IDs are long random numbers, just use an object or an array:

var myObjList:Array = [];
myObjList[myObj.id] = myObj;
 
Flag Post

If you create a vector with a length of 2000 it will automatically create all the null references for you. However it will also allocate a big chunk of memory. The idea behind arrays and vectors is that the objects are stored in memory at equally spaced locations, so that an element’s location can be computed simply and quickly from its index. If your IDs are not contiguous then using a vector won’t make much sense, because vectors must be contiguous. Even if you use null to fill in the gaps, adding items will probably result in some costly resizing/reallocating, and you still have a lot of wasted memory. You can use an array, but you’ll lose any benefit from it, since objects outside the dense portion (the part that starts at index 0 and goes to the last contiguously defined index) do not get fast access. I’d imagine those objects are referenced through some kind of hash, just like the properties of an Object.

Is there a reason the IDs must be random?

 
Flag Post

A new Vector of 2000 objects does not use (2000 * object size) memory, just (2000 * reference size), so a few kb. Really nothing to worry about. If your IDs went up to a million maybe, but 2000 is not a problem.

As you populate the Vector with objects the memory is allocated for each object as it’s created, the same as when you have an Array of them, so the main memory cost is your objects. Though even with 2000 chances are they use less memory than a single graphics object or sound.

Vectors only do things differently for the primitive types – int, Number, uint, Boolean – where they allocate the memory all at once, which is far more efficient than the separate allocation for every number that an Array has to do.
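For example, the difference might be sketched like this (the sizes are illustrative):

    // One contiguous block of 2000 ints, allocated up front:
    var ints:Vector.<int> = new Vector.<int>(2000, true);
    
    // An Array holding the same numbers stores each value as a
    // separately allocated, boxed entry behind the scenes:
    var nums:Array = [];
    for (var i:int = 0; i < 2000; i++) nums[i] = i;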

 
Flag Post
Originally posted by JWBSoftware:

A new Vector of 2000 objects does not use (2000 * object size) memory, just (2000 * reference size), so a few kb. Really nothing to worry about. If your IDs went up to a million maybe, but 2000 is not a problem.

A vector with a length of 1M uses about 3.8 MB.

 
Flag Post

Can someone explain to me, or link to an explanation of, the O(1) and O(n) thing?

 
Flag Post

Good yummy tech details

tl;dr: A way to compare a method’s efficiency by looking at its growth rate (how its speed changes as you add more and more data).

From good to worse using the more common ones:
O(1) → O(logn) → O(n) → O(nlogn) → O(n^2) → O(2^n)
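For instance, looking an object up by key in a Dictionary touches one entry no matter how many are stored (O(1)), while scanning an Array for a matching id may touch every element (O(n)). A rough sketch; someObj and the id field are just illustrative:

    import flash.utils.Dictionary;
    
    // O(1): direct keyed lookup -- cost doesn't grow with the number of items.
    var someObj:Object = {id: 42};
    var byId:Dictionary = new Dictionary();
    byId[42] = someObj;
    var hit:Object = byId[42];
    
    // O(n): linear scan -- in the worst case every element is examined.
    function findById(items:Array, id:uint):Object {
        for each (var item:Object in items) {
            if (item.id == id) return item;
        }
        return null;
    }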