Python Strings: Revising & Altering for AI Programming

Summary: Python strings are immutable, so the contents of a string cannot be changed in place. Methods such as str.replace, along with slicing and concatenation, produce a new string with the modified contents, and that new string can be assigned to the same variable name. One poster argues that Python is oversold and that some string operations are tedious; other posters dispute this.
  • #1
BiGyElLoWhAt
I'm revisiting Python because our physics department recently decided to adopt it as our computational language, and I missed the class. I've also been given the impression that it's a very good language for algorithms and whatnot.

According to https://docs.python.org/3/tutorial/introduction.html
Python strings cannot be changed: they are immutable. Therefore, assigning to an indexed position in the string results in an error:
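
The error the tutorial describes can be reproduced with a short sketch. The name `word` follows the tutorial; the try/except wrapper and the `message` variable are only there so the snippet runs to completion.

```python
word = "Python"
try:
    word[0] = "J"                  # item assignment is not supported for str
except TypeError as error:
    message = str(error)
    print(message)                 # 'str' object does not support item assignment

print(word)                        # Python (unchanged)
```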

This seems so useless!

I'm really wanting to try to screw around with some simple AI, via writing a program that rewrites itself with new variables generated via input (for later reference).

I'm anticipating the need to alter strings based on new input. Any workarounds? I went through the string methods library and couldn't find anything I thought would be of use:
https://docs.python.org/3/library/stdtypes.html#string-methods

Perhaps there's a way to generate a new string, copy the contents over, delete the old one, and copy them back to a string with the same name as the original but with the necessary alterations?

Would I be better off just going with something along the lines of string_2 = string_1[:5] + 'new information' + string_1[5:]?

Also, I haven't figured out exactly how I'm going to implement a lot of these things, so if you have other ideas, feel free to toss them my way.

For right now the goal is to come up with a program that takes an input as a series of tokens, parses them into a 'command' object (or array, I don't know if Python actually has objects), then executes the command, which "alters" the program itself.

Thanks!
 
  • #2
There are methods that allow the contents of the original string to be modified and placed in a new string. str.replace is one. Those methods should let you do whatever you need to do.
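
A minimal sketch of str.replace, assuming nothing beyond the built-in method (the variable names are illustrative). It shows that the method returns a new string and that the result can be assigned back to the same name:

```python
greeting = "hello world"

updated = greeting.replace("world", "there")
print(updated)     # hello there
print(greeting)    # hello world (the original is unchanged)

# Rebinding the same name gives the effect of "changing" the string.
greeting = greeting.replace("hello", "goodbye")
print(greeting)    # goodbye world
```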
 
  • #3
BiGyElLoWhAt said:
I'm revisiting Python because our physics department recently decided to adopt it as our computational language, and I missed the class. I've also been given the impression that it's a very good language for algorithms and whatnot.

According to https://docs.python.org/3/tutorial/introduction.html
Python strings cannot be changed: they are immutable. Therefore, assigning to an indexed position in the string results in an error:

This seems so useless!

The section immediately after that line says:
If you need a different string, you should create a new one:
Code:
>>> 'J' + word[1:]
'Jython'
>>> word[:2] + 'py'
'Pypy'
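
The tutorial lines above assume `word = 'Python'`. As a sketch, the same slicing idea covers the insertion asked about in the opening post; `insert_at` is a hypothetical helper name, not a built-in:

```python
def insert_at(text, index, addition):
    """Return a new string with `addition` spliced into `text` at `index`."""
    return text[:index] + addition + text[index:]

word = "Python"
print("J" + word[1:])          # Jython
print(word[:2] + "py")         # Pypy

string_1 = "hello world"
string_1 = insert_at(string_1, 5, ", new information,")
print(string_1)                # hello, new information, world
```

Assigning the result back to `string_1` keeps the same variable name while the old string object is simply left behind.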

Possibly enlightening...
http://stackoverflow.com/questions/...rings-immutable-best-practices-for-using-them
 
  • #4
BiGyElLoWhAt said:
This seems so useless!
To the contrary!

There's a lot, a whole lot, to be said in favor of immutability. You specifically mentioned AI in the opening post. Lisp, Prolog, Haskell, OCaml, and Erlang are very widely used languages in the AI community. One of the things these languages share is a very strong concept of immutable objects. Another is the concept of functional programming.

Python, C#, and Java (and now C++) just touch on immutability and functional programming. None of these are particularly powerful AI languages. Immutability and functional programming go hand-in-hand, and both are central concepts in powerful AI languages.
 
  • #5
@robphy I know what the next line said. I mentioned making a new variable in the opening post.

@D H I might have used AI a bit loosely. This is a very recent interest of mine, and what I have in mind for this program is more about mutating itself. I can see one argument for immutability, and that's the fact that some information should be immutable, but what I'm curious about is does it need to be immutable? Or could the program solve a recursive algorithm to decide what should be mutable or not (using the state/"experience" variables defined in the program to determine which variables to modify, keep and discard based on new input)

Something maybe medium term that I'd be interested in doing is (keeping along the lines of the op, programs writing programs) pattern recognition, but with the program "learning" about the problem solving process by referencing the thinking skills acquired through previous successful recognitions.

When the pattern-recognizing function gets called, the program would reference its algorithm bank, which would be regularly updated with new algorithms derived from the patterns. The thinking skills I'm talking about would essentially (in this basic program) be patterns between patterns, such as the relationship between the functions y=x and y=2x.

I think I have some relatively decent pseudocode (it seems great in my head haha), but there are some details I still need to work out (mostly specifics of the functions and how to implement them).
 
  • #6
BiGyElLoWhAt said:
@robphy I know what the next line said. I mentioned making a new variable in the opening post.
Ah. Sorry. I missed that. So your complaint is about the hassle of changing a string while keeping the string name unchanged. That may be true. I believe that Python is currently oversold. There are several things regarding strings that are tedious to do in Python.
 
  • #7
That's what I'm gathering, unfortunately. But C and Java aren't good enough according to the physics department; we need Python. I figured messing with something like this would be a fun way to relearn it from high school.
 
  • #8
I don't understand the problem here with immutable strings. Let's say you did something like
Code:
a = "hello"
b = "world"
a = a + b
print(a)

You'll see "helloworld"

Now if you have something that was like this
Code:
1. a = "hello"
2. b = a
3. a = "world"
4. print(b)
You'll see "hello" because 'b' was given a reference to the string 'a' had a reference to at line 2, and at line 3, 'a' was given a reference to a different string.

At no point are the original strings modified (at least that is the conceptual model, various optimizations could be happening under the covers), but you get the effect you want.
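
A minimal sketch of the same point, using the `is` operator to check whether two names refer to the same object. The variable names follow the examples above; `same_before` and `same_after` are illustrative.

```python
a = "hello"
b = a                  # b now refers to the same string object as a
same_before = a is b   # True: one object, two names

a = a + " world"       # builds a new string and rebinds a; "hello" itself is untouched
same_after = a is b    # False: a now names the new string

print(b)               # hello
print(a)               # hello world
```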

Note that I do most of my programming in C, and almost every string operation requires me to explicitly determine the length of the final string, allocate that memory (handling errors), and then copy the input strings into the space I just allocated, character by character (all of which is a huge potential source of random crashes and security vulnerabilities). While there are benefits to how C does it (because it lets you control so much of the process, you can really optimize building a large string composed from many different parameters), most of the time you just want what Python does.
 
  • #9
FactChecker said:
So your complaint is about the hassle of changing a string while keeping the string name unchanged. That may be true. I believe that Python is currently oversold. There are several things regarding strings that are tedious to do in Python.
BiGyElLoWhAt said:
That's what I'm gathering, unfortunately.

This is not correct.

It is the string that is immutable, not the variable. What about the variable to which the string is assigned? It's variable. In fact, it's very variable. You can assign a string to a python variable, then a tuple, then a list, then an integer, and finally a dict. Demo:
Code:
python
>>> a = "hello"
>>> type(a)
<type 'str'>
>>> a = (1,2)
>>> type(a)
<type 'tuple'>
>>> a = [1,2]
>>> type(a)
<type 'list'>
>>> a = 1
>>> type(a)
<type 'int'>
>>> a = {'this':'that','foo':'bar'}
>>> type(a)
<type 'dict'>
 
  • #10
That's good to know.
 
  • #11
Hmm... Assuming you ripped that from IDLE, I guess I'm confused as to what an immutable string is, then. What exactly is immutable? The data in the stack?

So if I define a string and assign a variable to it, say a = "hello", there are some bytes with a location and a binary value equivalent to hello. Now defining 'a' to a new value, a = "world", there are now some bytes with a new location and a binary value equivalent to world. Hello no longer has a reference, and therefore is inaccessible because the program no longer has a reference to those bytes. Am I understanding string immutability correctly? If so, how is this useful? Hello is just occupying space in the RAM, and for a large program, that seems like a bad idea.

This idea reminds me of the JASS language (for Warcraft III world editor); when you loop commands, you get these things that are called memory leaks, and for long duration games (~large programs for Python) you have to bypass the GUI and hard code the removal of these previous commands, otherwise the game will eventually use all your available memory and will cause a crash (either the program or the computer, I'm not really sure, I've never made that large of a map). Is this also an example of pseudo immutability? All it does in JASS is cause problems that you have to work around. I think they do it because in the GUI there are commands you can issue that reference previous commands (you have events, conditions, and actions, and conditions can be things such as "a unit = triggering unit"). Now this would be useful for a period of time, but after a while, i.e. after my program can no longer reference the triggering unit comparison and get the same unit, that information should be dropped.

Wouldn't it make sense to drop this string "hello" once the pointer's direction is relocated? What if I did something like this:
Code:
a = "hello"
print(a)
a = "world"
print(a)
a = "hello"
print(a)

the output should be
hello
world
hello

but the second hello, will it have the same location as the first? If not, what methods are available to handle this?
 
  • #12
The Python system automatically handles this and there won't be any memory leaks.

Some of the benefits of 'immutability' (not just for strings) are: functions can't modify the data itself (no side effects), data is thread safe and easier to cache, and it simplifies everything for the language user.
 
  • #13
If you want a mutable array, do this (Python 2 only; the 'c' typecode does not exist in Python 3):

Code:
import array
myarray = array.array('c', "hello")

If you later need a string with the same data as in the array, do this:
Code:
mystring = myarray.tostring()
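
The array snippet above depends on Python 2 behavior. As a hedged sketch of one way to get a similar effect in Python 3, a list of characters or a bytearray can serve as the mutable buffer. This swaps in different built-in types and is not the approach the poster described; the variable names are illustrative.

```python
# Option 1: a list of one-character strings, joined back when needed.
chars = list("hello")
chars[0] = "J"
mystring = "".join(chars)
print(mystring)                    # Jello

# Option 2: a bytearray, for ASCII or other byte-oriented data.
buffer = bytearray(b"hello")
buffer[0] = ord("J")
as_text = buffer.decode("ascii")
print(as_text)                     # Jello
```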
 
  • #14
BiGyElLoWhAt said:
Hmm... Assuming you ripped that from IDLE, I guess I'm confused as to what an immutable string is, then. What exactly is immutable? The data in the stack?

Forget about words like stack and heap. Python abstracts all that away, and you simply don't need to know. The interpreter does a lot of 'magic' and it's not at all close to the metal. Unless you count the VM itself, CPython doesn't even have a stack. Everything you create is an object (even integers) and is stored on the private heap.

Immutable means the value cannot be changed. To change it, you assign a new one with the value you require.

BiGyElLoWhAt said:
So if I define a string and assign a variable to it, say a = "hello", there are some bytes with a location and a binary value equivalent to hello. Now defining 'a' to a new value, a = "world", there are now some bytes with a new location and a binary value equivalent to world. Hello no longer has a reference, and therefore is inaccessible because the program no longer has a reference to those bytes. Am I understanding string immutability correctly? If so, how is this useful? Hello is just occupying space in the RAM, and for a large program, that seems like a bad idea.

This is nothing to do with immutability, which is simply that the value can't be changed.

Don't worry about cleaning up variables that are no longer used. The garbage collector will pick them up, so nothing leaks: Python automatically frees objects once it sees they are no longer used. You only really need to take care to close things such as file handles, which are much more critical to return (they are fewer in number, and an opened file may be locked and inaccessible to other programs). Python has the 'with' context manager for dealing with resources such as file handles.
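
A minimal sketch of the 'with' statement applied to a file handle. The temporary directory and the file name "notes.txt" are assumptions made only so the example is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")   # hypothetical file name

    with open(path, "w") as handle:
        handle.write("hello\n")
    closed_after_block = handle.closed          # True: closed on leaving the block

    with open(path) as handle:
        contents = handle.read()                # "hello\n"

print(closed_after_block)
print(contents, end="")
```

The file is closed when the inner block ends, even if an exception is raised inside it, so no explicit close() call is needed.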

Only worry about RAM if you start to run out (it's possible, the garbage collector isn't perfect).

BiGyElLoWhAt said:
Wouldn't it make sense to drop this string "hello" once the pointer's direction is relocated? What if I did something like this:
Code:
a = "hello"
print(a)
a = "world"
print(a)
a = "hello"
print(a)

the output should be
hello
world
hello

It is. You know you could try that in the interpreter, pretty trivially :-)

If you want to fully understand how assignment and variables work in python, see here: http://nedbatchelder.com/text/names.html

Be aware that it's not a straight value or reference like it is in C.

Idiomatically, the recommendation for programming in Python is to always assign, and never mutate, unless the cost of assignment is too high. The reason is that some things are mutable and some things aren't, and you can't always keep track of which is which due to duck typing. Therefore always assign, and your code will do what you think it does. I've seen too many bugs relating to mutating and returning from a function, so I second the idiomatic recommendation.

If you care about performance and memory usage, use Fortran for your scientific code. If you want to get the job done fast and go home early, use Python. Python does have a kickass set of mathematical and scientific libraries too (SciPy, NumPy, pandas etc).
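
A sketch of the kind of bug this recommendation is meant to avoid, using a list as the mutable object. Both function names are hypothetical and exist only to contrast the two styles.

```python
def add_item_mutating(items, item):
    items.append(item)          # changes the caller's list in place
    return items

def add_item_assigning(items, item):
    return items + [item]       # builds a new list; the caller's list is untouched

shared = [1, 2]
mutated_result = add_item_mutating(shared, 3)
print(shared)                   # [1, 2, 3] (the argument itself was changed)

kept = [1, 2]
new_result = add_item_assigning(kept, 3)
print(kept)                     # [1, 2]
print(new_result)               # [1, 2, 3]
```

With strings the second style is the only one available, which is part of why immutable values are easier to reason about.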
 
  • #15
Carno Raar said:
If you care about performance and memory usage, use Fortran for your scientific code. If you want to get the job done fast and go home early, use Python.
Python (and Matlab and Perl and Lua and Ruby and ...) let you think and work at a higher level. There's a cost to be paid: scripting languages are not exactly fast. What you wrote may be a bit bloated memory-wise and a bit sluggish time-wise, but sometimes that's no big deal. You got the job done, and it works. Time to move on to the next problem.

Even in the cases where those performance and memory penalties are too much to bear, it oftentimes still pays off to start in Python or some other scripting language. My first guess at a solution is typically completely wrong. I'll throw it out and start over again, and rinse and repeat until I get it right. If I had started off in a compiled language, I would still be working on bad_idea_number_1 in the time it took me to write bad_idea_number_1.py, bad_idea_number_2.py, ..., not_so_bad.py, getting_close.py, nailed_it.py. Now all that's left is translating that slow and bloated nailed_it.py to nailed_it.cpp or nailed_it.f.
BiGyElLoWhAt said:
Wouldn't it make sense to drop this string "hello" once the pointer's direction is relocated? What if I did something like this:
Code:
a = "hello"
print(a)
a = "world"
print(a)
a = "hello"
print(a)

the output should be
Code:
hello
world
hello

but the second hello, will it have the same location as the first? If not, what methods are available to handle this?

That is indeed the output. Try it. The answer to your question about whether the address of the second hello will be the same as the first is "maybe", but you shouldn't care.

Demo:
Code:
python
>>> a = "hello"
>>> id(a)
4412112304
>>> a = "world"
>>> id(a)
4412112400
>>> a = "hello"
>>> id(a)
4412112304
In this case, it's the same.

Code:
python
>>> a = "hello"
>>> a
'hello'
>>> id(a)
4541693360
>>> a = "world"
>>> a
'world'
>>> id(a)
4541693600
>>> a = "hello"
>>> a
'hello'
>>> id(a)
4541693696
And now it's different.

Perhaps the garbage collector kicked in somewhere between a="world" and the second a="hello" and collected the first "hello". Perhaps something else happened. It really doesn't matter which.
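
In practice this means comparing strings by value with `==` rather than by identity. A small sketch (the names `first` and `second` are illustrative, and whether the two ids match is an implementation detail, so no output is claimed for the last line):

```python
first = "hello"
second = "".join(["hel", "lo"])   # same contents, assembled at run time

print(first == second)            # True: the contents match
print(id(first) == id(second))    # may be True or False; do not rely on it
```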
 
  • #16
Yeah, that wasn't a typical example I was worried about, just a simplified example of what I was trying to understand. I'll check out that link soon, @Carno Raar.
 

FAQ: Python Strings: Revising & Altering for AI Programming

What is a string in Python?

A string in Python is a sequence of characters enclosed in either single quotes ('...') or double quotes ("..."). It is a data type used to store and manipulate text, such as words, sentences, or digits written as text.

How can I access individual characters in a string?

To access individual characters in a string, you can use indexing. Each character in a string has a specific index, starting from 0. So, to access the first character in a string, you would use index 0, and to access the last character, you would use index -1. For example, if the string is "hello", the first character 'h' would have an index of 0, and the last character 'o' would have an index of -1.
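
A short sketch of indexing, using the "hello" example from the answer above (the variable names are illustrative; the slice on the last line is an extra that the answer does not mention):

```python
word = "hello"

first = word[0]      # 'h'
last = word[-1]      # 'o'
middle = word[1:4]   # 'ell' (slicing returns a new string)

print(first, last, middle)
```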

Can I change or modify a string in Python?

Not in place. Strings are immutable, so any operation that appears to modify a string actually creates a new string and leaves the original untouched. Common ways to build the modified string include string methods, slicing, and string concatenation.

What is string concatenation?

String concatenation is the process of combining two or more strings together to create a new string. In Python, you can use the plus (+) operator to concatenate strings. For example, if you have two strings "Hello" and "World", concatenating them would result in a new string "HelloWorld".
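
A minimal sketch of concatenation with the "Hello" and "World" example above. The `str.join` line is an addition showing a common alternative when a separator is wanted; the variable names are illustrative.

```python
first = "Hello"
second = "World"

combined = first + second             # "HelloWorld"
spaced = " ".join([first, second])    # "Hello World"

print(combined)
print(spaced)
```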

How can I convert a string to uppercase or lowercase?

To convert a string to uppercase or lowercase in Python, you can use the string methods upper() and lower(), respectively. These methods will return a new string with all the characters converted to uppercase or lowercase, while leaving the original string unchanged.
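
A short sketch of upper() and lower(), showing that the original string is left unchanged (the variable names are illustrative):

```python
text = "Hello, World"

shouted = text.upper()    # "HELLO, WORLD"
quiet = text.lower()      # "hello, world"

print(shouted)
print(quiet)
print(text)               # Hello, World (unchanged)
```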
