How does Python tell that an integer is actually an integer?

In summary: given a binary string such as '0b1100010', does it represent an integer or a string, and how does Python tell the difference?
  • #1
Arman777
Look at this example,

Code:
>>> a = 97
>>> type(a)
<class 'int'>
>>> bin(a)
'0b1100001'
>>> b = ord('a')
>>> b
97
>>> type(b)
<class 'int'>
>>> bin(b)
'0b1100001'

Does this mean that the string 'a' and the integer 97 are stored as the same binary in memory? If so, how can Python tell the difference? As far as I can remember, in C we have to declare the type of a variable before we use it. In Python we don't have to do that. In that case, how does it separate integers from strings?
 
  • #2
hmmm27 said:
No offence, but is that supposed to be confusing ?
The first example is. For instance, if I give you the binary '0b1100010', does it represent an integer or a string? How does Python understand the difference? The second example was obvious, so I have deleted it.
 
  • #3
We're getting some edit lag ... I deleted the post you just asked about because of a bit in the first post which I didn't see the first time, and don't see now. Ghosts, or something. [edit: which you then explained you deleted, so whatever].

I'm really not sure how you're confused, though I'm sure I am.

Who gave you that code snippet ? or did you make it up yourself.
 
  • #4
Python uses dynamic typing. When you assign a value to a variable it can determine the datatype at that time.

So while 'a' and 97 may look the same as raw bits in memory, Python sets the datatype of x to str (Python has no separate char type) when x = 'a' is used to assign the value.

It is possible to mess with the storage, but then you know the dangers and reap the whirlwind of disaster.
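A minimal sketch of what dynamic typing means in practice, assuming CPython 3 (the names `first_type` and `second_type` are only for illustration). The name `x` has no type of its own; the object it currently refers to does.

```python
# Names in Python are not typed; the objects they refer to are.
x = 97               # x now refers to an int object
first_type = type(x)

x = 'a'              # the same name now refers to a str object
second_type = type(x)

print(first_type)    # <class 'int'>
print(second_type)   # <class 'str'>
```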
 
  • #5
Something like C stores very little except the binary value of a variable and relies on type information provided by the programmer in a declaration to know what to do with the data. Thus you can simply cast 'a' (edit: or "\0a" anyway, I think) to integer and get 97.

On the other hand, everything in python is an object, so carries around type information as well as the basic binary data. It recognises 97 as an integer when it parses your input and stores it as an int object. At least, that's the way you can think of it. There may be a lot of gory implementation details I'm not aware of.
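As a rough illustration of "every object carries its type around", here is a small sketch (the variable names are hypothetical). It only shows the behaviour visible from Python code, not the internal memory layout.

```python
number = 97
letter = 'a'

# Each object knows its own type, regardless of which name points at it.
print(type(number).__name__)   # int
print(type(letter).__name__)   # str

# The integer and the character are related only through an explicit conversion.
print(number == letter)        # False
print(number == ord(letter))   # True
print(chr(number) == letter)   # True

# Mixing the two without converting is rejected at run time.
try:
    number + letter
except TypeError as exc:
    mixing_error = exc
    print("TypeError:", exc)
```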
 
  • #6
hmmm27 said:
Who gave you that code snippet ?
I wrote it myself. Look at this example.
Code:
Python 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 97
>>> b = 'a'
>>> bin(a)
'0b1100001'
>>> bin(ord(b))
'0b1100001'
>>>
My question is something like this. We are storing an integer, 97, in a variable called a. We are also storing a string, "a", to another variable b. As far as I know everything is turned binary in the process of calculations. So I am thinking that variable a is stored as '0b1100001'. But that is the same as variable b, which happens to be a string.
 
  • #7
Arman777 said:
So I am thinking that variable a is stored as '0b1100001'. But that is the same as variable b, which happens to be a string.
I see no problem with different variables having the same value, no matter what format it is stored in at the time.
 
  • #8
Since I did not know what "bin" does, I looked it up. You seem to assume it gives you the representation that Python uses internally to represent an object. I think it doesn't: "The bin() method converts and returns the binary equivalent string of a given integer. If the parameter isn't an integer, it has to implement __index__() method to return an integer." (from: https://www.programiz.com/python-programming/methods/built-in/bin). In other words: In case of variable a, it returns the "binary equivalent string of 97" (whatever that means), and in case of variable b it returns "the binary equivalent string of b.__index__()". Check if b.__index__() returns 97 :).

Bottom line: I don't think that bin() tells you how the Python interpreter stores the variable internally. The first google hit I found on it may be wrong, though.
 
  • #9
Timo said:
Since I did not know what "bin" does, I looked it up. You seem to assume it gives you the representation that Python uses internally to represent an object. I think it doesn't: "The bin() method converts and returns the binary equivalent string of a given integer. If the parameter isn't an integer, it has to implement __index__() method to return an integer." (from: https://www.programiz.com/python-programming/methods/built-in/bin). In other words: In case of variable a, it returns the "binary equivalent string of 97" (whatever that means), and in case of variable b it returns "the binary equivalent string of b.__index__()". Check if b.__index__() returns 97 :).

Bottom line: I don't think that bin() tells you how the Python interpreter stores the variable internally. The first google hit I found on it may be wrong, though.
this might be the reason
 
  • #10
jedishrfu said:
Python uses dynamic typing. When you assign a value to a variable it can determine the datatype at that time.
This and
Ibix said:
Something like C stores very little except the binary value of a variable and relies on type information provided by the programmer in a declaration to know what to do with the data. Thus you can simply cast 'a' (edit: or "\0a" anyway, I think) to integer and get 97.

On the other hand, everything in python is an object, so carries around type information as well as the basic binary data. It recognises 97 as an integer when it parses your input and stores it as an int object. At least, that's the way you can think of it. There may be a lot of gory implementation details I'm not aware of.
This also makes sense. So there is also other information that it stores or defines, I guess.
 
  • #11
It seems that when I call ord(), the string becomes an integer, which is clear from this:

Code:
>>> type(ord('a'))
<class 'int'>
>>>
which, it seems, I had already typed in the OP. So it's no longer a string; it's just a number. So strings and integers are stored differently, and I guess it has nothing to do with bin(), as pointed out.

Thanks for the help anyway. I don't know why, but I was confused.
 
  • #12
I'm still confused about what you're actually confused about.

I mean, you know there isn't a little gnome inside the memory chip painting itty-bitty lower-case a's and b's and zeros and ones onto the silicon, right ?

[edit: Characterizing the many things - like presence/absence of a magnetic field, punched hole in paper/cardboard, state of a transistor, etc. - as "ones" and "zeros", is purely for linguistic convenience. (Though yes, if the computer is doing a binary math operation, it's also convenient that the "1's" and "0's" can actually be used directly as 1's and 0's.]
 
  • #13
As you start investigating how things actually work, it's easy to get confused about this. When we read core dumps, we could separate areas of code from data, and the data itself into textual, integer or floating-point values, based on the bit patterns we saw.

Program opcodes had a denser, all-bits-used format, but good readers could spot common opcodes and recognize the memory addresses being used.

32-bit integers were typically stored on even word boundaries and had their high bits zeroed, i.e. integer data usually used the low range of possible integer values. 32-bit floats had a similar layout, but the high bits were used for the value, stored as power (exponent) and mantissa, and the low bits were zeros. Textual ASCII data had the high bit of every byte as zero, since ASCII codes are in the 0-127 range, with 128 to 255 unused.

Of course, nowadays a dump reader would have to contend with Unicode multi-byte textual data and 16-bit, 32-bit, 64-bit ... numeric data too. And then there's the heap and the stack, and big-endian and little-endian byte order, which wasn't a thing on older mainframe machines.

My dump-reading experience was with GE 635 and Honeywell 6000 mainframes, which would be considered big-endian, 36-bit word architectures. Folks today use debuggers, not dumps, to investigate program and data memory.
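To get a feel for the bit patterns described above, here is a small sketch using Python's struct module. It assumes 32-bit big-endian integers and IEEE 754 floats, which is not the format those 36-bit mainframes used; the helper `bits` is hypothetical and only there for printing.

```python
import struct

def bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return ' '.join(f'{byte:08b}' for byte in data)

small_int = struct.pack('>i', 97)      # 32-bit big-endian signed integer
simple_float = struct.pack('>f', 1.0)  # 32-bit big-endian IEEE 754 float
ascii_text = 'abcd'.encode('ascii')    # four 7-bit ASCII characters

print(bits(small_int))     # 00000000 00000000 00000000 01100001
print(bits(simple_float))  # 00111111 10000000 00000000 00000000
print(bits(ascii_text))    # 01100001 01100010 01100011 01100100
```

The small integer has its high bits clear, the float has its low bits clear, and every ASCII byte starts with a zero bit, which is the kind of visual cue a dump reader would rely on.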
 
  • #14
hmmm27 said:
I'm still confused about what you're actually confused about.

I mean, you know there isn't a little gnome inside the memory chip painting itty-bitty lower-case a's and b's and zeros and ones onto the silicon, right ?

[edit: Characterizing the many things - like presence/absence of a magnetic field, punched hole in paper/cardboard, state of a transistor, etc. - as "ones" and "zeros", is purely for linguistic convenience. (Though yes, if the computer is doing a binary math operation, it's also convenient that the "1's" and "0's" can actually be used directly as 1's and 0's.]
Yes indeed. I was misinterpreting some of the things/ideas, I guess.

This kind of confusion is coming from this.

https://www.physicsforums.com/threa...of-strings-and-integers.1005423/#post-6519298

I guess I asked the wrong question, but in any case, it does not matter. I was thinking something like this.

Someone gives you a piece of paper with '0b1100110' written on it.

When you are doing an XOR operation, you take every letter as a string and turn it into binary. In that case, '0b1100110' is just the letter 'f', but in general it's just the number 102. So what a binary value represents might depend on how you are using it. When you are trying to encrypt/decrypt a text with XOR, you cannot encrypt/decrypt 102 as '0b1100110' (or ['0b1', '0b0', '0b10'], from what I have learned), but you can do that for 'f'.

Similarly, in an XOR-type operation we need to represent '2' as '0b110010', not as '0b10'.

So the bottom line is, depending on the given situation, you can map the binary representation to an integer or a string, which leads us to the OP. But I have realized that this is just a case-dependent situation, and it has nothing to do with how you store integers or strings.
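The "piece of paper" idea can be sketched in a few lines. This is only an illustration with made-up variable names: the text '0b1100110' is a str, and what it means depends on which conversion you apply to it.

```python
text = '0b1100110'          # the "piece of paper": just a str of characters

as_number = int(text, 2)    # parse it as a base-2 integer
as_letter = chr(as_number)  # treat that integer as a character code

print(as_number)  # 102
print(as_letter)  # f

# The character '2' and the integer 2 have different bit patterns.
print(bin(ord('2')))  # 0b110010
print(bin(2))         # 0b10
```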
 
  • #15
Arman777 said:
Is this means that the string 'a' and the integer 97 stored as the same binary in the memory ? If so then how can python tell the difference ? As far as I can remember in C we have to define the type of the variable before we use it.
If Python works anything like C or C++, 'a' is not a string, but rather a character constant or literal. In C and C++ there's a big difference between the character constant 'a' and the string literal "a" -- the former evaluates to the character whose ASCII code is 97 (i.e., 'a'), and the latter evaluates to the address in memory where the character is stored.

Ibix said:
Thus you can simply cast 'a' (edit: or "\0a" anyway, I think) to integer and get 97.
No, 'a' and "\0a" are very different things. As already mentioned, 'a' is a character constant. "\0a" would be a null string, since its first character is the null byte. I don't think that the character 'a' would even be stored.
 
  • #16
Mark44 said:
If Python works anything like C or C++
It doesn't, the internals of Python are very different from C or C++. @Ibix quoted the key statement above:
Ibix said:
everything in python is an object
You can see this in action by using sys.getsizeof() which returns the number of bytes in an entity's internal representation in the REPL:
Python 3.9 REPL, 64 bit system:
import sys
sys.getsizeof(1) # 28 (bytes)
sys.getsizeof('a') # 50 (bytes)
sys.getsizeof('ab') # 51 (bytes)
How is it that a single character takes 50 bytes? As well as the ASCII (actually Latin-1) value, which takes 1 byte, Python also stores a pointer to the string class (8 bytes on a 64-bit system), the size of the string in its fixed-length internal representation and its length (the number of Unicode characters in the string), information about how the string is encoded (only 1 byte, I think), etc. The full detail is contained in PEP 393.
 
  • #17
Mark44 said:
"\0a" would be a null string, since its first character is the null byte. I don't think that the character 'a' would even be stored.
No, '\0' (ASCII NULL) does not act as a string terminator in Python: '\0a' has length of 2 characters.
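This is easy to check in the REPL; a minimal sketch:

```python
s = '\0a'          # a NUL character followed by the letter a

print(len(s))      # 2
print(ord(s[0]))   # 0
print(s[1])        # a
```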
 
  • #18
Arman777 said:
Yes indeed. I was misinterpreting some of the things/ideas, I guess.

This kind of confusion is coming from this.

https://www.physicsforums.com/threa...of-strings-and-integers.1005423/#post-6519298

I guess I asked the wrong question, but in any case, it does not matter. I was thinking something like this.

Someone gives you a piece of paper with '0b1100110' written on it.

When you are doing an XOR operation, you take every letter as a string and turn it into binary. In that case, '0b1100110' is just the letter 'f', but in general it's just the number 102. So what a binary value represents might depend on how you are using it. When you are trying to encrypt/decrypt a text with XOR, you cannot encrypt/decrypt 102 as '0b1100110' (or ['0b1', '0b0', '0b10'], from what I have learned), but you can do that for 'f'.

Similarly, in an XOR-type operation we need to represent '2' as '0b110010', not as '0b10'.

So the bottom line is, depending on the given situation, you can map the binary representation to an integer or a string, which leads us to the OP. But I have realized that this is just a case-dependent situation, and it has nothing to do with how you store integers or strings.
Great : so, are you at the point yet where you can see that basically all you're doing is mathematically subtracting the original-message : eg: "The quick brown fox jumped over the lazy dog." from the (shared) cipher-page : eg: "Wally's World : a great place for a summer vacation" to get the encrypted-message : <garbage, mostly : don't try to print or display> ; then to unencrypt just mathematically add the cipher page to the encrypted-message to get the original message.

[edit: which makes it not an "xor encryption" feel free to ignore ; usual royalties apply, otherwise :wink: ]
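For what it's worth, a rough sketch of that add/subtract scheme on bytes. It assumes the ciphertext is message minus pad, modulo 256, so that adding the pad back restores the message; the direction of the subtraction and the modulus are assumptions, and this is not XOR encryption.

```python
message = "The quick brown fox jumped over the lazy dog.".encode('ascii')
pad = "Wally's World : a great place for a summer vacation".encode('ascii')
assert len(pad) >= len(message)  # the pad must cover the whole message

# Encrypt: subtract the pad from the message byte by byte, wrapping at 256.
encrypted = bytes((m - p) % 256 for m, p in zip(message, pad))

# Decrypt: add the pad back, again wrapping at 256.
decrypted = bytes((c + p) % 256 for c, p in zip(encrypted, pad))

print(encrypted.hex())            # mostly unprintable bytes, so show them as hex
print(decrypted.decode('ascii'))  # the original message
```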
 
  • #19
hmmm27 said:
Great : so, are you at the point yet where you can see that basically all you're doing is mathematically subtracting the original-message : eg: "The quick brown fox jumped over the lazy dog." from the (shared) cipher-page : eg: "Wally's World : a great place for a summer vacation" to get the encrypted-message : <garbage, mostly : don't try to print or display> ; then to unencrypt just add the cipher page to the encrypted-message to get the original message.
I did not understand what you are trying to say, but I am just trying to implement XOR encryption.
 
  • #20
Arman777 said:
I did not understand what you are trying to say, but I am just trying to implement XOR encryption.
Yeah ; see what happens when you let your brain wander around off leash ? I just "invented" a "new" encryption method. (I should probably check in with a crypto forum to see what it's called)

[edit : It's a Vigenère cipher... first published in 1533.]

Meanwhile, here's an online XOR en/decryptor (the explanatory paragraph looks useful) and there's how Python does XOR's.
 
  • #21
Arman777 said:
As far as I know everything is turned binary in the process of calculations.
What do you mean by this?
 
  • #22
Timo said:
I don't think that bin() tells you how the Python interpreter stores the variable internally.
That's correct, it doesn't. Note that it doesn't even work on objects that aren't integers (try calling bin() on a one-character string, for example).
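A quick sketch of that behaviour (the name `bin_error` is only for illustration). Note also that bin() returns a str, not some raw "binary" value.

```python
print(bin(97))                  # 0b1100001
print(type(bin(97)).__name__)   # str

# bin() refuses a one-character string outright.
try:
    bin('a')
except TypeError as exc:
    bin_error = exc
    print("TypeError:", exc)

# The conversion to an integer has to be explicit.
print(bin(ord('a')))            # 0b1100001
```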
 
  • #23
Arman777 said:
When you are doing an XOR operation, we take every letter as a string and turn them into binary.
What does "turn them into binary" mean? (What data type do you want this "turn them into binary" operation to produce?)
 
  • #24
PeterDonis said:
What does "turn them into binary" mean? (What data type do you want this "turn them into binary" operation to produce?)

So for any given text message (it must be given as a string) and for a given key (it also must be a string), I am encrypting the message.

I will not use a binary key in my implementation since, for unbreakable XOR encryption, you need to generate a random binary with the same size as the len(message)*8. So for a ten-character password, you need to store 80 bits. Instead, I am taking the key as a string so that each string has one byte. It seems more reasonable if you want to store a key.

In this case, I am just turning any text message and key into binary arrays, then performing XOR operation, and then turning an encrypted binary message into an encrypted message, which is just a string.

Here the only problem is that sometimes the encrypted message takes really strange and, I guess, unprintable values.

I wanted to share the code, but the console is giving an error.
 
  • #25
@Arman777, none of what you posted answers the questions I asked. "Binary" is a meaningless term as far as Python is concerned; it's not a data type. What data type do you want these "binary" thingies to be?
 
  • #26
they are strings
 
  • #27
Arman777 said:
I will not use a binary key in my implementation since, for unbreakable XOR encryption, you need to generate a random binary with the same size as the len(message)*8.
You're thinking of the message and key as bits, but "bits" isn't a Python data type either. What Python data type would you use for the XOR operation on these "bits"?

Arman777 said:
So for a ten-character password, you need to store 80 bits.
Which is 10 bytes (8 bits per byte).

Arman777 said:
Instead, I am taking the key as a string so that each string has one byte. It seems more reasonable if you want to store a key.
This doesn't make sense. One byte is one byte, whether you call it a "string" or something else.

What does make a difference, though, is that if you store the key as a single string, you have one string object in Python, which will take storage equal to the number of bytes in the string plus the Python overhead for one string object. Whereas if you store the key as a bunch of one-byte strings, you will need storage equal to the number of bytes in the string, plus the Python overhead for the same number of Python string objects (in your case, ten) instead of just one string object. That can be a lot more storage.
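A rough way to see the overhead, assuming CPython: sys.getsizeof() reports the size of each object on its own, so the exact numbers vary by version and platform, and CPython may share some small strings behind the scenes. The key 'abcdefghij' is just a made-up example.

```python
import sys

key = 'abcdefghij'      # one ten-character str object
pieces = list(key)      # ten one-character str objects

single_size = sys.getsizeof(key)
pieces_size = sum(sys.getsizeof(piece) for piece in pieces)

print(single_size)                 # size of the single string object, in bytes
print(pieces_size)                 # combined size of the ten small string objects
print(pieces_size > single_size)   # the per-object overhead dominates
```

The list that holds the ten pieces needs memory too, which this sketch does not count.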

For a "fun" program this might not be an issue, but it certainly would be for any real-world program.

Also, since you can't do a Python XOR operation on strings, you will need to convert the key (and the message) to some other Python data type to do the XOR operation. What data type will that be?
 
  • #28
Arman777 said:
they are strings
You can't do a Python XOR operation on strings. So what data type will you need to convert the strings to in order to do the XOR operation?
 
  • #29
Arman777 said:
they are strings
In Python 3, the "string" data type (i.e., str) isn't bytes, it's Unicode. The Python 3 "byte string" data type is bytes.
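A small sketch of the difference, assuming UTF-8 as the encoding (the variable names are only for illustration). Iterating over a bytes object yields integers, which is what the XOR operator needs.

```python
message = 'password'
data = message.encode('utf-8')   # str -> bytes

print(type(message).__name__)    # str
print(type(data).__name__)       # bytes
print(list(data))                # [112, 97, 115, 115, 119, 111, 114, 100]

# One character is not always one byte.
accented = 'é'
print(len(accented))                   # 1
print(len(accented.encode('utf-8')))   # 2
```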
 
  • #30
PeterDonis said:
You can't do a Python XOR operation on strings. So what data type will you need to convert the strings to in order to do the XOR operation?
Yes, I know that. They must be integers, ofc. The implementation in my head is different than the usual XOR encryption, I guess.
 
  • #32
PeterDonis said:
In Python 3, the "string" data type (i.e., str) isn't bytes, it's Unicode. The Python 3 "byte string" data type is bytes.
I meant this

Code:
>>> bin(97)
'0b1100001'
>>> type(bin(97))
<class 'str'>
>>>

I mean I am doing the XOR operation by taking these and ofc there are intermediate steps.
 
  • #33
PeterDonis said:
This doesn't make sense. One byte is one byte, whether you call it a "string" or something else.
For instance you can create a XOR encryption that can take the key as bits (a binary key) or you can create an ASCII key (see this site https://www.dcode.fr/xor-cipher)

PeterDonis said:
What does make a difference, though, is that if you store the key as a single string, you have one string object in Python, which will take storage equal to the number of bytes in the string plus the Python overhead for one string object. Whereas if you store the key as a bunch of one-byte strings, you will need storage equal to the number of bytes in the string, plus the Python overhead for the same number of Python string objects (in your case, ten) instead of just one string object. That can be a lot more storage.
In my previous post, I was trying to say that you'll need a lot of storage if you store the key as binary. But an ASCII key will be more useful in terms of storage.

If your message is "password," and if you want to make the XOR encryption "unbreakable," you need to create a random binary key that is 64 bits long.

For instance, I have generated a random binary key. So for an unbreakable message you'll need to store this.

0000110011010000111111100100101110110010010011110110110000101011

but if you convert this into ASCII (or as ASCII key), you'll need to store just

♀ÐþK²Ol+

Here, the only problem is that the binary representation of the key can also represent the unprintable ASCII characters. So when you try to convert a random binary key into ASCII key, you'll see really random characters and most of the times unprintable ones. That also applies if you want decrypt them.

So even though creating a binary key is advantageous in terms of encryption (since you don't have to worry about whether it's printable or not), it's disadvantageous in terms of storage (ofc that is really personal, but that's true for me).
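For comparison, a sketch of how the 64-character string of 0s and 1s above maps onto 8 bytes. The 64 bits and the 8 bytes carry the same information; only the text representation differs in size. Big-endian byte order is an assumption here.

```python
key_bits = '0000110011010000111111100100101110110010010011110110110000101011'

# Parse the digits as one big integer, then split it into bytes.
key = int(key_bits, 2).to_bytes(len(key_bits) // 8, 'big')

print(len(key_bits))  # 64
print(len(key))       # 8
print(key.hex())      # 0cd0fe4bb24f6c2b
```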
 
  • #34
Arman777 said:
See this code please

https://codeshare.io/EBkJrO

Edit: I have fixed the Example in byteArray function
Wow, that is some pretty complicated code for doing something pretty simple. I think the best learning point you can take away from this is 'representing binary data as a string of 0's and 1's is a really bad idea'.

A string is a better way (and in Python we have an even better way, bytes, as @PeterDonis said), but as you have seen this can lead to unprintable characters. There are a number of ways of dealing with this: one of the most common is hexadecimal: any 8 bit value can be represented by two characters in the range [0..9,a..f].

So if you want a readable binary key and encoded message, you could display them in hex; however for internal working it makes a lot more sense to use bytes, or a string (using all values 0-255) if you must, or even a list or array of integers. Anything but a string of '1's and '0's.
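As a minimal sketch of that suggestion, assuming a repeating text key (the helper `xor_bytes` and the key 'key' are made up for illustration, and this is not the code from the linked share):

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

message = 'password'.encode('utf-8')
key = 'key'.encode('utf-8')

encrypted = xor_bytes(message, key)
encrypted_hex = encrypted.hex()    # printable form, two hex digits per byte
print(encrypted_hex)

# XOR is its own inverse, so the same function decrypts.
restored = xor_bytes(bytes.fromhex(encrypted_hex), key)
print(restored.decode('utf-8'))    # password
```

Working in bytes internally and converting to hex only for display avoids the unprintable-character problem entirely.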
 
  • #35
Well, I am not a mind reader, and no one was telling me to do something like this, which I was not aware of until now.

Instead of turning 67 and 45 into bytes and then doing the XOR bitwise, I could have just done

bin(67 ^ 45)

which gives the correct answer. This approach will definitely shorten my code.

You guys are giving hints but I cannot understand something that i don't know...

Hex is a good approach
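Spelled out as a short sketch, with the result shown in binary and in hex:

```python
a, b = 67, 45
c = a ^ b                  # XOR works directly on Python ints

print(c)                   # 110
print(bin(c))              # 0b1101110
print(format(c, '08b'))    # 01101110
print(format(c, '02x'))    # 6e

# Applying the same XOR again recovers the original value.
print(c ^ b)               # 67
```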
 
