From: mbf2y at my-dejanews.com (mbf2y at my-dejanews.com)
Date: Mon, 19 Apr 1999 22:17:10 GMT
Subject: Can PyApache slow things down?!
Message-ID: <7fga0t$55l$1@nnrp1.dejanews.com>
Content-Length: 3422
X-UID: 963

Short version of question:

I have a project where I am using SGMLParser to parse an HTML document I fetch
from another site. The machine I'm using is slow - a DEC Alpha with (get
this) 20MB of RAM. (Someone must have cannibalized the memory to boost
another machine at some point. The lack of RAM makes the machine crawl.)
I'm also sharing this machine with other people. Bottom line is that my
queries take 5-7 seconds of wall-clock time for one query, and 3-5 seconds of
wall-clock time for the other. In an attempt to speed things up, I
downloaded and built Apache with PyApache included and added the "AddHandler"
line. I can tell that PyApache is running properly because using "top" I can
see that "httpd" is the process doing all the work, whereas previously
"myscript.py" was doing the work. Problem: I noticed a slowdown in
wall-clock time. After much pondering, I decided that since my machine is so
low on RAM and the httpd binary nearly doubled in size (to just over a meg),
maybe I'm doing more context switches. So instead of starting 5 httpd's, I
dropped to 2.

Still, even when I'm the only user on the webserver, the queries were slower
than the "normal" way. I ended up backing out the change: I reverted to the
old Apache binary and kept the number of httpd processes at 2; this did speed
things up a touch (should have thought of that sooner). Anyway, I'm
wondering if this is normal, and if not, what could I be doing wrong? I used
Python 1.5.2b2. However, I also tried this exact same setup with Python
1.5.2 on a machine with more RAM (128MB, but a slower chip - an old Sparc 5).
The scripts ran slower with PyApache than without... this makes no sense to
me, as at minimum I should be saving time by having fewer context switches...

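One way to narrow down where the wall-clock time goes is to time the fetch
and the parse separately instead of the request as a whole. The following is
only a minimal sketch in current Python (not 1.5.2): time_phases, fetch_page
and parse_hits are hypothetical names, and the two stand-in functions fake the
real work so the example runs on its own.

```python
import time


def time_phases(phases):
    """Run (name, callable) pairs in order; return (timings, final result).

    Each callable receives the previous phase's result (None for the first).
    timings maps each phase name to its elapsed wall-clock seconds.
    """
    timings = {}
    result = None
    for name, func in phases:
        start = time.perf_counter()
        result = func(result)
        timings[name] = time.perf_counter() - start
    return timings, result


# Hypothetical stand-ins for the real script's steps.
def fetch_page(_):
    time.sleep(0.05)  # pretend network latency to the remote site
    return "<html><a href='http://example.com/'>hit</a></html>"


def parse_hits(page):
    return page.count("<a ")  # crude placeholder for the real parser


if __name__ == "__main__":
    timings, hits = time_phases([("fetch", fetch_page), ("parse", parse_hits)])
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f}s")
```

If a breakdown like this showed most of the time going to the remote fetch,
that would suggest that embedding the interpreter in httpd can only remove a
small slice of each request, but that is a guess until it is measured on the
machines in question.
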
Thanks for any help (and if you have an extra second, could you read the
P.S.?) -Fred (I don't ever check dejanews mail... if you want to e-mail me
instead of posting here, my address is fred-at-cs-dot-umd-dot-edu)

P.S. I'm a graduate student working on a project analyzing search engine
usage. My project presents the user with a type-in box just like the "real"
search engines. I then take the query and pass it on to the "real" engine
that the user chose (either Hotbot or Altavista). I get the page back from
Altavista/Hotbot and then use a class derived from SGMLParser to parse the
page and extract the hits. I then present the hitlist to the user, free from
all the advertising junk present on the "real" search engine sites. I also
have it set up so that whenever the user clicks on a URL, I write something
like "CLICK-ON #47" to a file. My hope is to analyze usage patterns and try
to generate some sort of metric which can indicate a user's satisfaction with
the hitlist based on the user's behavior (which numbered hits they clicked
on, etc.). I am trying to collect as many query sessions as I can over the
next 2 weeks or so. If you ever use Altavista or Hotbot, could you please
travel to http://www.cs.umd.edu/~fred/search/ and bookmark my site? Then the
next time (or few times) you have to run a search engine query, could you use
the site? If you have privacy concerns or want a more detailed description
of the research goals, answers can be found at that site.

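For readers who want a concrete picture of the parse-and-log step described in
the P.S., here is a rough sketch. It is not the script from this post: it
swaps in html.parser.HTMLParser from current Python, because sgmllib (the
module that provided SGMLParser) no longer ships with Python 3. HitExtractor,
log_click and the sample page are made up for illustration, and it assumes
every link on the page counts as a hit, which a real result page would not
satisfy without extra filtering.

```python
from html.parser import HTMLParser


class HitExtractor(HTMLParser):
    """Collect (url, link text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.hits = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.hits.append((self._href, "".join(self._text).strip()))
            self._href = None


def log_click(log_path, hit_number):
    """Append a line such as 'CLICK-ON #47' to the log file."""
    with open(log_path, "a") as log:
        log.write(f"CLICK-ON #{hit_number}\n")


if __name__ == "__main__":
    # Made-up result page: one ad image and two links.
    sample = ('<html><body><img src="ad.gif">'
              '<a href="http://example.com/one">First hit</a><p>'
              '<a href="http://example.com/two">Second <b>hit</b></a>'
              '</body></html>')
    parser = HitExtractor()
    parser.feed(sample)
    parser.close()
    for number, (url, title) in enumerate(parser.hits, 1):
        print(number, url, title)
```

Numbering the hits at display time, as the loop does, is what would let a log
line like "CLICK-ON #47" be matched back to a position in the hitlist.
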
Thanks,
-Fred

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own