From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 3 Apr 1999 05:29:16 GMT
Subject: string.join() vs % and + operators
In-Reply-To: <37054490.E79ADCC9@easystreet.com>
References: <37054490.E79ADCC9@easystreet.com>
Message-ID: <000001be7d92$ea8da080$879e2299@tim>
Content-Length: 3983
X-UID: 1268

[Al Christians]
> Whereas Python's strings are immutable, there is potentially a strong
> incentive to get them right the first time.
Not really much more than in a language with mutable strings -- if you
overwrite substrings in one of the latter, it's going to be s-l-o-w when the
length changes. OTOH, if your fields are 50's-style fixed-width <wink>, a
Python character array (array.array('c')) makes a fine mutable string.
> In my applications, I want to create strings that have many fields
> within them. I assume that it would be nice to be able to modify
> the fields without creating a new string any time its contents get
> changed, but I don't think that Python gives any nice way to do this.
Maybe see above? I'm not sure what you mean. Is there any language that
*does* give you "a nice way to do this", keeping in mind that you're worried
about efficiency too? E.g., use "substr" on the left-hand side of a Perl
string assignment, and under the covers it's going to copy the whole
thing -- even if the length doesn't change:

    $a = "gold you so";
    $b = $a;
    substr($a, 0, 1) = "t";
    print "$a\n$b\n";

prints

    told you so
    gold you so

You can do better than this in Python as-is, although you need a little more
typing:

    import array
    a = array.array('c', "gold you so")
    b = a
    a[0] = "t"
    print a.tostring()
    print b.tostring()

In return for letting you change a[0] in-place, this prints "told you so"
twice.
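
A minimal sketch of the same aliasing behavior for a current Python 3 reader. This is an assumption beyond the 1999 message: the 'c' typecode no longer exists in Python 3, so the built-in bytearray stands in for the character array here, and the variable names are only illustrative.

```python
# bytearray is used here as a stand-in for array.array('c'),
# which is not available in Python 3 (assumption, not from the message).
a = bytearray(b"gold you so")
b = a                    # b is another name for the same object, not a copy

a[0] = ord("t")          # in-place change; bytearray items are small integers

first = a.decode("ascii")
second = b.decode("ascii")
print(first)
print(second)
```

Both names see the change because no copy was ever made, which is the point of the array example above.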
> So, I'm stuck with building the string from many little pieces.
Think of it instead as an opportunity to excel <wink>.
> The 'many' part of this gives me some worry about efficiency, which it
> is better not to worry about, so I did a brief test to see if there is
> ugly downside to this. I ran the following script:
[tries a long "%s%s%s..." format, string.join, and repeated catenation;
discovers the 2nd is fastest, the first 2nd-fastest, and third much slower]
> ...
> Way 3, the way that one would expect to be bad, recreating the string
> with each concatenation, was much slower, but only took about 1 minute.
> Surprisingly swift as well.
It can be very much worse, of course -- it's a quadratic-time approach, and
you're helped here in that the final length of your string is only a few
hundred characters.
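
The quadratic growth can be made concrete with a small counting model, written for a current Python 3. It is a sketch under a stated assumption, not a measurement: it assumes every `z = z + j` builds a brand-new string and so copies everything accumulated so far, while a join copies each piece once. Some interpreters can sometimes avoid part of that copying. The function names are hypothetical.

```python
def chars_copied_by_concat(pieces):
    """Characters copied if each z = z + piece builds a fresh string (model)."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)   # size of the new string after this step
        total += length        # the whole new string is written out again
    return total


def chars_copied_by_join(pieces):
    """Characters copied if every piece is copied exactly once (model)."""
    return sum(len(piece) for piece in pieces)


for n in (100, 1000, 10000):
    pieces = ["x"] * n
    print(n, chars_copied_by_join(pieces), chars_copied_by_concat(pieces))
```

For n one-character pieces the model gives n copies for the join and n*(n+1)/2 for repeated catenation, so ten times more pieces costs roughly a hundred times more copying.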
> Anybody have anything to add to this?
Maybe the array module; maybe not.
> Are there any related pitfalls that I may have missed?
Not if you stick to string.join -- it's reliably good at this. I'll attach
a rewrite of your timing harness that avoids the common Python timing
pitfalls, and adds a fourth method showing that array.tostring() blows
everything else out of the water. But then doing a length-changing slice
assignment to a character array is like doing a length-changing assignment
to a Python list: under the covers, everything "to the right" is shifted
left or right as needed to keep the array contiguous; mutability can be
expensive in under-the-cover ways.
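
A short sketch of the two kinds of slice assignment, for a current Python 3 and again using bytearray as a stand-in for the character array (an assumption; the record layout and names are made up for illustration). It shows only the visible behavior; the shifting cost described above happens out of sight.

```python
record = bytearray(b"name=al;city=pdx")
same_object = record          # keep a second reference to check identity

# Same-length replacement: two bytes overwritten, nothing else has to move.
record[5:7] = b"ed"
after_same_length = bytes(record)

# Length-changing replacement: two bytes become six, so every byte to the
# right of the slice must end up four positions later in the buffer.
record[5:7] = b"albert"
after_longer = bytes(record)

print(after_same_length)
print(after_longer)
print(same_object is record)
```

The object stays the same throughout; only the second assignment changes its length.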
pay-now-pay-later-or-pay-all-the-time-ly y'rs - tim

import string
N = 100
S = []
for i in range(N):
    S.append(`i`)
F = "%s" * N

# for method 4 (character array)
import array
SARRAY = array.array('c')
for i in S:
    SARRAY.fromstring(i)

# if time.clock has good enough resolution (it does under Windows),
# no point to looping more often than this
indices = range(10000)

def f1(s=S, f=F):
    for i in indices:
        z = f % tuple(s)
    return z

def f2(s=S, join=string.join):
    for i in indices:
        z = join(s, '')
    return z

def f3(s=S):
    for i in indices:
        z = ''
        for j in s:
            z = z + j
    return z

def f4(s=SARRAY):
    for i in indices:
        z = s.tostring()
    return z

def timeit(f):
    from time import clock
    start = clock()
    result = f()
    finish = clock()
    print f.__name__, round(finish - start, 2)
    return result

z1 = timeit(f1)
z2 = timeit(f2)
z3 = timeit(f3)
z4 = timeit(f4)
assert z1 == z2 == z3 == z4
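
For readers on a current Python 3, where backquotes, string.join, the 'c' typecode and time.clock are all gone, the same four-way comparison might be sketched as follows. The substitutions are assumptions, not part of the original message: repr() for backquotes, str.join for string.join, a bytearray decoded to str for the character array, and the standard timeit module for the hand-rolled timer. Function names and the repeat count are illustrative, and no timings are claimed.

```python
import timeit

N = 100
S = [repr(i) for i in range(N)]
F = "%s" * N
# Stand-in for the character array of method 4 (assumption).
SBYTES = bytearray("".join(S), "ascii")


def by_format(s=S, f=F):
    return f % tuple(s)


def by_join(s=S):
    return "".join(s)


def by_concat(s=S):
    z = ""
    for piece in s:
        z = z + piece
    return z


def by_bytearray(b=SBYTES):
    return b.decode("ascii")


def report(func, number=2000):
    # timeit.timeit calls func() `number` times and returns total seconds.
    seconds = timeit.timeit(func, number=number)
    print(func.__name__, round(seconds, 4))
    return func()


results = [report(f) for f in (by_format, by_join, by_concat, by_bytearray)]
assert all(r == results[0] for r in results)
```

Binding the inputs as default arguments mirrors the original harness, so each call does local lookups only and the loop overhead is the same for every method.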