From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 3 Apr 1999 05:29:16 GMT
Subject: string.join() vs % and + operators
In-Reply-To: <37054490.E79ADCC9@easystreet.com>
References: <37054490.E79ADCC9@easystreet.com>
Message-ID: <000001be7d92$ea8da080$879e2299@tim>

[Al Christians]
> Whereas Python's strings are immutable, there is potentially a strong
> incentive to get them right the first time.

Not really much more than in a language with mutable strings -- if you
overwrite substrings in one of the latter, it's going to be s-l-o-w when the
length changes.  OTOH, if your fields are 50's-style fixed-width <wink>, a
Python character array (array.array('c')) makes a fine mutable string.

> In my applications, I want to create strings that have many fields
> within them.  I assume that it would be nice to be able to modify
> the fields without creating a new string any time its contents get
> changed, but I don't think that Python gives any nice way to do this.

Maybe see above?  I'm not sure what you mean.  Is there any language that
*does* give you "a nice way to do this", keeping in mind that you're worried
about efficiency too?  E.g., use "substr" on the left-hand side of a Perl
string assignment, and under the covers it's going to copy the whole
thing -- even if the length doesn't change:

    $a = "gold you so";
    $b = $a;
    substr($a, 0, 1) = "t";
    print "$a\n$b\n";

prints

    told you so
    gold you so

You can do better than this in Python as-is, although you need a little more
typing:

    import array
    a = array.array('c', "gold you so")
    b = a
    a[0] = "t"
    print a.tostring()
    print b.tostring()

In return for letting you change a[0] in-place, this prints "told you so"
twice.
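
For readers on Python 3, here is a minimal sketch of the same aliasing behavior. It rests on an assumption that is not part of the original message: Python 3 no longer has the 'c' typecode, so the built-in bytearray type stands in for the character array. The variable c is added only for illustration.

```python
# Python 3 sketch: bytearray stands in for array.array('c'),
# which is an assumption of this example, not the original code.
a = bytearray(b"gold you so")
b = a                  # b is a second name for the same object, not a copy
a[0] = ord("t")        # in-place change; no new object is created

print(a.decode("ascii"))
print(b.decode("ascii"))

# If Perl-like independence is wanted, copy explicitly.
c = bytearray(a)
c[0] = ord("s")        # changes c only; a and b are untouched
print(c.decode("ascii"))
```

The trade is the one described above: mutation is cheap, and any copy has to be asked for explicitly.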

> So, I'm stuck with building the string from many little pieces.

Think of it instead as an opportunity to excel <wink>.

> The 'many' part of this gives me some worry about efficiency, which it
> is better not to worry about, so I did a  brief test to see if there is
> ugly downside to this.  I ran the following script:

[tries a long "%s%s%s..." format, string.join, and repeated catenation;
 discovers that string.join is fastest, the long format second-fastest,
 and repeated catenation much slower]

> ...
> Way 3, the way that one would expect to be bad, recreating the string
> with each concatenation, was much slower, but only took about 1 minute.
> Surprisingly swift as well.

It can be very much worse, of course -- it's a quadratic-time approach, and
you're helped here in that the final length of your string is only a few
hundred characters.
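
To make "quadratic" concrete, the sketch below counts how many characters get copied under a simple cost model: every z = z + piece builds a brand-new string holding everything so far. This is a model, not a measurement, and the function names are invented for the illustration. Some Python implementations can sometimes avoid the copy, so real timings may look better than the model suggests.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters copied if each z = z + piece allocates a fresh string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length      # the new string holds all earlier text plus the piece
    return copied


def chars_copied_by_join(pieces):
    """Characters copied if the result is sized once and filled once."""
    return sum(len(piece) for piece in pieces)


for n in (100, 1000, 10000):
    pieces = ["x"] * n
    print(n, chars_copied_by_repeated_concat(pieces), chars_copied_by_join(pieces))
```

With n one-character pieces the first count is n*(n+1)/2 while the second is n, so a tenfold increase in pieces costs roughly a hundredfold more copying for repeated catenation.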

> Anybody have anything to add to this?

Maybe the array module; maybe not.

> Are there any related pitfalls that I may have missed?

Not if you stick to string.join -- it's reliably good at this.  I'll attach
a rewrite of your timing harness that avoids the common Python timing
pitfalls, and adds a fourth method showing that array.tostring() blows
everything else out of the water.  But then doing a length-changing slice
assignment to a character array is like doing a length-changing assignment
to a Python list:  under the covers, everything "to the right" is shifted
left or right as needed to keep the array contiguous; mutability can be
expensive in under-the-cover ways.
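
A small Python 3 sketch of a length-changing slice assignment follows. As an assumption of this example, bytearray again stands in for the character array. The object stays the same while its length changes, which is why the tail has to be shifted to keep the storage contiguous.

```python
# Python 3 sketch; bytearray is an assumed stand-in for array.array('c').
buf = bytearray(b"gold you so")
same_buf = buf                 # second name for the same object

buf[0:4] = b"I warned"         # 4 bytes replaced by 8: the tail shifts right
grown = bytes(buf)
print(grown.decode("ascii"), len(buf))

buf[0:8] = b"told"             # 8 bytes replaced by 4: the tail shifts left
shrunk = bytes(buf)
print(shrunk.decode("ascii"), len(buf))

print(buf is same_buf)         # still one object, mutated in place
```

A same-length overwrite avoids the shifting entirely, which is the fixed-width-fields case mentioned at the top.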

pay-now-pay-later-or-pay-all-the-time-ly y'rs  - tim

import string
N = 100
S = []
for i in range(N):
    S.append(`i`)
F = "%s" * N

# for method 4 (character array)
import array
SARRAY = array.array('c')
for i in S:
    SARRAY.fromstring(i)

# if time.clock has good enough resolution (it does under Windows),
# no point to looping more often than this
indices = range(10000)

def f1(s=S, f=F):
    for i in indices:
        z = f % tuple(s)
    return z

def f2(s=S, join=string.join):
    for i in indices:
        z = join(s, '')
    return z

def f3(s=S):
    for i in indices:
        z = ''
        for j in s:
            z = z + j
    return z

def f4(s=SARRAY):
    for i in indices:
        z = s.tostring()
    return z

def timeit(f):
    from time import clock
    start = clock()
    result = f()
    finish = clock()
    print f.__name__, round(finish - start, 2)
    return result

z1 = timeit(f1)
z2 = timeit(f2)
z3 = timeit(f3)
z4 = timeit(f4)
assert z1 == z2 == z3 == z4
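
The script above targets the Python of its day (backquote repr, print statement, string.join, time.clock, array typecode 'c'). A hedged Python 3 sketch of the same four-way comparison follows. The function names, the use of the standard timeit module, and bytearray.decode as the stand-in for the fourth method are all choices made for this sketch rather than anything in the original harness. No timings are shown because they depend on the machine and the interpreter.

```python
import timeit

N = 100
S = [str(i) for i in range(N)]            # the pieces to glue together
F = "%s" * N                              # one long format string
SBYTES = bytearray("".join(S), "ascii")   # assumed stand-in for the character array


def by_format(s=S, f=F):
    return f % tuple(s)


def by_join(s=S):
    return "".join(s)


def by_concat(s=S):
    z = ""
    for piece in s:
        z = z + piece
    return z


def by_buffer(s=SBYTES):
    return s.decode("ascii")


METHODS = (by_format, by_join, by_concat, by_buffer)


def report(number=10000, repeat=3):
    """Print the best-of-repeat time for `number` calls of each method."""
    for func in METHODS:
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        print(func.__name__, round(best, 3))


if __name__ == "__main__":
    # All four methods must build the same string before timing means anything.
    assert by_format() == by_join() == by_concat() == by_buffer()
    report()
```

Taking the minimum over several repeats is a common way to reduce noise from other activity on the machine. As in the original, the fourth method starts from data that is already packed into a buffer, so it measures only the final conversion.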