From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Fri, 16 Apr 1999 15:40:41 +0300
Subject: Word Counting -- A Novell Approach
Message-ID: <Pine.SUN.3.95-heb-2.07.990416153530.2115A-100000@sunset.ma.huji.ac.il>
X-UID: 866

There was a thread here about counting words when the input is read in
arbitrary chunks instead of line by line.
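To make the chunking problem concrete, here is a small illustrative sketch (not code from the thread; the helper name naive_count() is hypothetical). It splits each chunk on whitespace independently, so any word that straddles a chunk boundary is counted as two fragments.

```python
def naive_count(chunks):
    """Count words chunk by chunk, ignoring words split across chunk boundaries."""
    counts = {}
    for chunk in chunks:
        for word in chunk.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# "hello world hello" delivered in two arbitrary chunks
print(naive_count(["hello wor", "ld hello"]))
# {'hello': 2, 'wor': 1, 'ld': 1}
```

The word "world" never appears in the result, which is why a counter needs to remember state between chunks.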

I have a friend who continually reminds me, "In Rome, do as the Romans do," so
it seems to me the right way is to count with an object you feed() data
into, like Python's other non-line-based parsers (XML, HTML, etc.).

So I wrote a small word-counting class with this interface:

* feed: Feed some data into the counter.
* flush: Force a word break; the next feed() always starts a new word.
  This is useful, for example, when counting words in several files,
  to make sure words are not concatenated across files.
* items: Return a list of (word, count) pairs.

(This is an excerpt from the documentation.)
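The class itself is not included in the post, so the following is only a minimal sketch of how an interface like this might be implemented, not the actual class. It assumes a "word" is a maximal run of non-whitespace characters. The class name WordCounter and the choice that items() reports only completed words (so flush() should be called at the end of the input) are assumptions made for illustration.

```python
class WordCounter:
    """Hypothetical feed()-style word counter (illustrative sketch only)."""

    def __init__(self):
        self._counts = {}
        # Trailing fragment that may continue in the next chunk.
        self._partial = ""

    def _add(self, word):
        self._counts[word] = self._counts.get(word, 0) + 1

    def feed(self, data):
        """Feed some data into the counter."""
        data = self._partial + data
        words = data.split()
        # If the data does not end in whitespace, the last word may be
        # incomplete, so hold it back until more data or flush() arrives.
        if words and not data[-1].isspace():
            self._partial = words.pop()
        else:
            self._partial = ""
        for word in words:
            self._add(word)

    def flush(self):
        """Force a word break: count any held-back fragment as a word."""
        if self._partial:
            self._add(self._partial)
            self._partial = ""

    def items(self):
        """Return a list of (word, count) pairs for completed words."""
        return list(self._counts.items())


counter = WordCounter()
for chunk in ["the qu", "ick fox ", "the"]:
    counter.feed(chunk)
counter.flush()
print(sorted(counter.items()))
# [('fox', 1), ('quick', 1), ('the', 2)]
```

Holding back only the last fragment keeps memory bounded by the longest word rather than by the chunk size. Calling flush() between files gives the behaviour described for flush above: feeding "abc", flushing, then feeding "def" counts two words instead of "abcdef".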

I will happily mail this class to anyone who wants it.

--
Moshe Zadka <mzadka at geocities.com>.
QOTD: What fun to me! I'm not signing permanent.