From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 3 Apr 1999 05:29:16 GMT
Subject: string.join() vs % and + operators
In-Reply-To: <37054490.E79ADCC9@easystreet.com>
References: <37054490.E79ADCC9@easystreet.com>
Message-ID: <000001be7d92$ea8da080$879e2299@tim>
Content-Length: 3983
X-UID: 1268

[Al Christians]
> Whereas Python's strings are immutable, there is potentially a strong
> incentive to get them right the first time.

Not really much more than in a language with mutable strings -- if you
overwrite substrings in one of the latter, it's going to be s-l-o-w when the
length changes. OTOH, if your fields are 50's-style fixed-width <wink>, a
Python character array (array.array('c')) makes a fine mutable string.

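As a rough sketch of the fixed-width case in present-day Python: the 'c'
typecode used in this post no longer exists in Python 3, so the example below
assumes bytearray as the closest stand-in for a mutable character buffer. The
record layout (FIELDS) and the set_field helper are invented for illustration
and are not part of the original post.

```python
# Hypothetical fixed-width record: name (10 bytes), then city (8 bytes).
# bytearray is an assumed stand-in for the post's array.array('c').
FIELDS = {"name": (0, 10), "city": (10, 8)}


def set_field(record, field, value):
    """Overwrite one field in place, space-padding to the field width."""
    start, width = FIELDS[field]
    data = value.encode("ascii")
    if len(data) > width:
        raise ValueError("value too long for field %r" % field)
    # The replacement has exactly the field's width, so the buffer's
    # length never changes and nothing outside the field has to move.
    record[start:start + width] = data.ljust(width)


record = bytearray(b" " * 18)
set_field(record, "name", "Al")
set_field(record, "city", "Portland")
set_field(record, "name", "Tim")   # overwrite an existing field in place
print(record.decode("ascii"))
```

Because every write is the same width as the field it replaces, this stays in
the cheap case described above; a value wider than its field is rejected
rather than allowed to change the record's length.
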
> In my applications, I want to create strings that have many fields
> within them. I assume that it would be nice to be able to modify
> the fields without creating a new string any time its contents get
> changed, but I don't think that Python gives any nice way to do this.

Maybe see above? I'm not sure what you mean. Is there any language that
*does* give you "a nice way to do this", keeping in mind that you're worried
about efficiency too? E.g., use "substr" on the left-hand side of a Perl
string assignment, and under the covers it's going to copy the whole
thing -- even if the length doesn't change:

    $a = "gold you so";
    $b = $a;
    substr($a, 0, 1) = "t";
    print "$a\n$b\n";

prints

    told you so
    gold you so

You can do better than this in Python as-is, although you need a little more
typing:

    import array
    a = array.array('c', "gold you so")
    b = a
    a[0] = "t"
    print a.tostring()
    print b.tostring()

In return for letting you change a[0] in-place, this prints "told you so"
twice.

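The same aliasing can be sketched in present-day Python. This assumes
bytearray in place of array.array('c'), which Python 3 no longer provides,
and adds an explicit copy (the name c is introduced here) to show how to get
the two-independent-strings result of the Perl snippet when that is what you
want.

```python
a = bytearray(b"gold you so")
b = a                  # a second name for the same buffer; nothing is copied
c = bytearray(a)       # an explicit, independent copy

a[0:1] = b"t"          # same-length overwrite, done in place

print(a.decode("ascii"))   # told you so
print(b.decode("ascii"))   # told you so (b is the very same object as a)
print(c.decode("ascii"))   # gold you so (the copy is unaffected)
```
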
> So, I'm stuck with building the string from many little pieces.

Think of it instead as an opportunity to excel <wink>.

> The 'many' part of this gives me some worry about efficiency, which it
> is better not to worry about, so I did a brief test to see if there is
> ugly downside to this. I ran the following script:

[tries a long "%s%s%s..." format, string.join, and repeated catenation;
 discovers the 2nd is fastest, the first 2nd-fastest, and third much slower]

> ...
> Way 3, the way that one would expect to be bad, recreating the string
> with each concatenation, was much slower, but only took about 1 minute.
> Surprisingly swift as well.

It can be very much worse, of course -- it's a quadratic-time approach, and
you're helped here in that the final length of your string is only a few
hundred characters.

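To make "quadratic" concrete, here is a small cost model rather than a
measurement. It assumes every z = z + piece builds a brand-new string and so
copies everything accumulated so far, while a join-style approach copies each
character once. The function names are made up for this sketch, and real
interpreters (later CPython releases in particular) can sometimes avoid the
copy, so read the numbers as a model only.

```python
def chars_copied_by_concat(pieces):
    """Characters copied if each z = z + piece creates a whole new string."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        total += length      # the new string holds everything built so far
    return total


def chars_copied_by_join(pieces):
    """Characters copied if the result is sized once and filled once."""
    return sum(len(piece) for piece in pieces)


for n in (100, 1000, 10000):
    pieces = ["x"] * n
    print(n, chars_copied_by_concat(pieces), chars_copied_by_join(pieces))
```

For n one-character pieces the model gives n*(n+1)/2 copies for repeated
concatenation against n for a single join, so ten times as many pieces costs
roughly a hundred times as much copying.
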
> Anybody have anything to add to this?

Maybe the array module; maybe not.

> Are there any related pitfalls that I may have missed?

Not if you stick to string.join -- it's reliably good at this. I'll attach
a rewrite of your timing harness that avoids the common Python timing
pitfalls, and adds a fourth method showing that array.tostring() blows
everything else out of the water. But then doing a length-changing slice
assignment to a character array is like doing a length-changing assignment
to a Python list: under the covers, everything "to the right" is shifted
left or right as needed to keep the array contiguous; mutability can be
expensive in under-the-cover ways.

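A short sketch of what a length-changing slice assignment looks like in
present-day Python. It assumes bytearray behaves like the character array
discussed here; the comments restate the shifting described above and are not
measurements of any particular interpreter.

```python
buf = bytearray(b"told you so")

# Longer replacement in the middle: the trailing " so" has to move right.
buf[5:8] = b"everyone"
print(buf.decode("ascii"))   # told everyone so

# Shorter replacement: the tail moves back to the left.
buf[5:13] = b"me"
print(buf.decode("ascii"))   # told me so

# A plain list works the same way: inserting at the front moves every item.
items = list("old")
items[0:0] = ["t"]
print("".join(items))        # told
```

The cost of each such assignment grows with the amount of data to the right
of the slice, which is why same-width overwrites are the cheap case.
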
pay-now-pay-later-or-pay-all-the-time-ly y'rs - tim

import string
N = 100
S = []
for i in range(N):
    S.append(`i`)
F = "%s" * N

# for method 4 (character array)
import array
SARRAY = array.array('c')
for i in S:
    SARRAY.fromstring(i)

# if time.clock has good enough resolution (it does under Windows),
# no point to looping more often than this
indices = range(10000)

def f1(s=S, f=F):
    for i in indices:
        z = f % tuple(s)
    return z

def f2(s=S, join=string.join):
    for i in indices:
        z = join(s, '')
    return z

def f3(s=S):
    for i in indices:
        z = ''
        for j in s:
            z = z + j
    return z

def f4(s=SARRAY):
    for i in indices:
        z = s.tostring()
    return z

def timeit(f):
    from time import clock
    start = clock()
    result = f()
    finish = clock()
    print f.__name__, round(finish - start, 2)
    return result

z1 = timeit(f1)
z2 = timeit(f2)
z3 = timeit(f3)
z4 = timeit(f4)
assert z1 == z2 == z3 == z4
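
The harness above is written for the Python of 1999 (backquotes, the print
statement, string.join, time.clock and the 'c' typecode), none of which run
on current Python 3. Below is a hedged sketch of a comparable harness for
Python 3. It swaps in the standard timeit module for the hand-rolled timer,
str.join for string.join, and array('B') plus a decode step for the character
array, so its fourth method is not a like-for-like match for f4. The function
names and the run helper are invented here, and no timings are claimed.

```python
import timeit
from array import array

N = 100
S = [str(i) for i in range(N)]          # same pieces as the original harness
F = "%s" * N
# Assumption: array('B') of ASCII bytes stands in for array.array('c').
SARRAY = array("B", "".join(S).encode("ascii"))


def by_format(s=S, f=F):
    return f % tuple(s)


def by_join(s=S):
    return "".join(s)


def by_concat(s=S):
    z = ""
    for piece in s:
        z = z + piece
    return z


def by_array(a=SARRAY):
    # Includes a decode the 1999 version did not need.
    return a.tobytes().decode("ascii")


METHODS = (by_format, by_join, by_concat, by_array)


def run(repeat=5, number=1000):
    """Return {function name: best time in seconds for `number` calls}."""
    results = {}
    for func in METHODS:
        times = timeit.repeat(func, repeat=repeat, number=number)
        results[func.__name__] = min(times)
    return results


# All four methods must build the same string.
assert len({func() for func in METHODS}) == 1

if __name__ == "__main__":
    for name, seconds in run().items():
        print(name, round(seconds, 4))
```

Taking the minimum over several repeats follows the usual timeit advice for
reducing noise from other processes; the relative ordering of the methods may
differ from what was observed in 1999.
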