What is the optimal ID key size for minimizing data usage and avoiding overflow?

  • Thread starter volican
In summary, the conversation discusses the best way to assign unique identifying numbers to billions of objects. The main concern is minimizing the amount of data needed to store and transmit the IDs, and various methods are suggested such as using a central store or using different byte sizes for the IDs. The conversation also touches on the issue of overflow and the need to consider the maximal number of concurrent events.
  • #1
volican
Hi, say I have billions of things that I would like to name individually (give each a unique identifying number). What is the best way to go about that? I would like to minimise the amount of data needed to uniquely identify each object, as it then takes less memory to store and transmit. So far I have been thinking along the lines of concatenating the date and time of creation. This is fine, but it would not work if things are made concurrently. Any ideas?
 
  • #2
Just use numbers from 1 to N?

More context would help.
 
  • #3
If you have a central store that knows which IDs have already been issued (or how many), then, as mfb says, it is hard to do it more compactly than simply incrementing a counter. With a 4-byte (32-bit) counter you can identify about 4.3 billion unique IDs (2^32). Using 8 bytes (64 bits) this increases to about 1.8 × 10^19 (2^64).
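The counter sizes above can be checked with a few lines of Python. This is only a sketch: the function names `id_capacity` and `min_bytes_for` are made up for illustration, and it assumes IDs are plain unsigned integers stored in a whole number of bytes.

```python
def id_capacity(num_bytes: int) -> int:
    """Number of distinct values an unsigned counter of num_bytes bytes can hold."""
    return 2 ** (8 * num_bytes)


def min_bytes_for(count: int) -> int:
    """Smallest whole number of bytes whose counter can label `count` objects."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # IDs run from 0 to count - 1, so the largest ID decides the bit width.
    bits = max(1, (count - 1).bit_length())
    return (bits + 7) // 8  # round up to whole bytes


print(id_capacity(4))   # 4294967296 (about 4.3 billion)
print(id_capacity(8))   # 18446744073709551616 (about 1.8 x 10^19)
print(min_bytes_for(4_000_000_000))  # 4
```

So a few billion objects still fit in 4 bytes, but only just; one more object than 2^32 pushes a byte-aligned counter to 5 bytes.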

However, if by concurrent you mean that IDs will be registered in a distributed way (on several computers), then other schemes such as GUIDs, which use 16 bytes, may be more appropriate. While these are a practical solution for use by computer systems, you should be able to extract the "mathematical" methodology behind them if that is what you want.
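As an illustration of the 16-byte GUID approach, Python's standard `uuid` module can generate such IDs without any central store. The sketch below assumes random (version 4) UUIDs, which the thread does not specify, and the helper name `new_distributed_id` is hypothetical.

```python
import uuid


def new_distributed_id() -> bytes:
    """Return a 16-byte ID that can be generated independently on any machine."""
    # uuid4() draws the ID from random bits, so no coordination is needed;
    # uniqueness is overwhelmingly likely rather than strictly guaranteed.
    return uuid.uuid4().bytes


first = new_distributed_id()
second = new_distributed_id()
print(len(first))       # 16
print(first != second)  # True in practice
```

The trade-off is size: each ID costs 16 bytes, compared with 4 or 8 for a centrally issued counter.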

Since you are talking about storing and transmitting IDs in a forum for mathematical set theory it is really not clear what you are after. More context, as mfb said, would definitely increase the chances for others to give you relevant help.
 
  • #4
Some of this discussion of IDs came up in the early years of computing where we started with Baudot code (5-bits) then ASCII (7-bit) then EBCDIC (8-bit) and extended ASCII (8-bit) then WideChar and Unicode (16-bit) ... The complaint often was too much storage is needed when we want to save documents written in language X. At the time, the byte was popular as a unit of storage and so 8-bit extended ASCII was a natural encoding (actually there were many coding standards that dictated the sorting of characters in the locale) to cover all the European alphabets.

The notion of indexing comes into play when you assign a number to a thing like a letter in an alphabet. Chinese, Japanese and Korean challenged the dominance of ASCII since they needed 2 bytes to represent a character. This means that a text document must use 2-byte characters to represent Chinese and English (a zero byte plus the 8-bit ASCII code) in the same document in memory, and use a variable-length coding like UTF-8 when saving to a file (1 byte for ASCII and 2, 3 or more bytes for other characters such as Chinese; the multi-byte sequences never contain null or other ASCII bytes). Unicode further challenged the 2-byte notion when all the alphabets of the world were added into the mix.

The point here is that the byte size of your ID will determine how long it will be useful before it "overflows", i.e. runs out of number space. We have seen this happen with dates and timestamps: two-digit years caused the Y2K problem, and a signed 32-bit count of seconds since 1970 overflows in 2038. The finer the time unit (seconds, milliseconds, microseconds), the more bits the timestamp needs to cover the same span. You'll need to consider the maximal number of concurrent events that can occur in one time unit, and then determine the best ID key size that won't overflow too soon:

seconds + up to 2, 4, 8 ... 256 events
or
millisecs + up to 2, 4, 8 ... 256 events
or
microseconds + up to 2, 4, 8 ... 256 events
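
One way to read the "timestamp + up to N events" layouts above is as a single integer: the timestamp in the high bits and a small per-tick event counter in the low bits. The following is a minimal sketch under that assumption, not a definitive scheme. The class name `TimestampCounterIds`, the default of 8 counter bits (256 events per tick) and the millisecond clock are all illustrative choices, and it only covers a single generator process.

```python
import threading
import time


class TimestampCounterIds:
    """IDs of the form (timestamp << counter_bits) | event_number_within_tick."""

    def __init__(self, counter_bits: int = 8, clock=None):
        self.counter_bits = counter_bits
        self.max_events = 1 << counter_bits  # e.g. 8 bits -> 256 events per tick
        # Default clock: milliseconds since the Unix epoch (an assumption).
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_tick = -1
        self._count = 0

    def next_id(self) -> int:
        with self._lock:
            # Never go backwards, even if the underlying clock does.
            tick = max(self._clock(), self._last_tick)
            if tick == self._last_tick:
                self._count += 1
                if self._count >= self.max_events:
                    # More events in one tick than the counter bits allow.
                    raise OverflowError("per-tick event counter overflowed")
            else:
                self._last_tick = tick
                self._count = 0
            return (tick << self.counter_bits) | self._count


gen = TimestampCounterIds(counter_bits=8)
print(gen.next_id(), gen.next_id())  # two distinct, increasing IDs
```

Choosing `counter_bits` is exactly the sizing question raised above: too few bits and a burst of concurrent events overflows the counter, too many and the bits left for the timestamp run out sooner.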
 

FAQ: What is the optimal ID key size for minimizing data usage and avoiding overflow?

What is a unique identifier problem?

A unique identifier problem is a situation where there is no single, consistent way to identify a particular object or entity. This can occur when multiple systems or databases use different formats or methods for assigning and tracking unique identifiers.

Why is a unique identifier important?

A unique identifier is important because it allows for accurate and efficient identification and tracking of objects or entities. It ensures that each item has a distinct and consistent identifier that can be used across different systems and databases.

What are some common examples of unique identifier problems?

Some common examples of unique identifier problems include duplicate Social Security numbers, conflicting employee identification numbers within a company, and discrepancies in product or inventory codes across different systems or companies.

How can a unique identifier problem be solved?

A unique identifier problem can be solved by implementing a standardized system for assigning and tracking unique identifiers. This may involve creating a unique numbering system or using a universal identifier such as a barcode or RFID tag.

What are the potential consequences of not addressing a unique identifier problem?

The potential consequences of not addressing a unique identifier problem include data errors and inconsistencies, difficulties in data integration and analysis, and problems with accurately identifying and tracking objects or entities. This can lead to inefficiency, delays, and errors in various processes and systems.
