PUDICA


Introduction

Pudica is a Perl script for analysing Apache web logs. I wrote it because most log analysers report things like which files are being accessed, which is not the same as how many people have visited. So here's an analyser that tries to count visitors instead. It's really not optimised for speed.

Downloads

pudica-0.2.2.tar.gz Version 0.2.2
Much the same as version 0.2.1, but with one or two experimental features (search engine search terms and page references) and the odd bugfix.

pudica-0.2.1.tar.gz Version 0.2.1
This has options for the output format, can output user agent and referrer stats, and understands the combined log and agent log formats. Still a bit basic, and I think there's better out there (although perhaps less portable).

This is a Perl script, so you will need Perl, which you can get from www.perl.com.

Comments should be directed at pudica@mrsneeze.com.

If you're interested in other programs for analysing web logs, see my http logging page.

Methodology

I work on the theory that if the same IP visits again it's the same person, unless it's been 24 hours since their first look at the page. So an IP can only clock up a single count on any "independent" count within that period.
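The rule above can be sketched as follows. This is a minimal illustration in Python, not Pudica's actual Perl code: the function name `count_independent`, the sample data, and the choice to start a fresh window once exactly 24 hours have passed are all assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # taken from the "24 hours" rule in the text

def count_independent(requests):
    """Count independent visits from (ip, timestamp) pairs sorted by time."""
    first_seen = {}  # ip -> start of that IP's current 24-hour window
    count = 0
    for ip, when in requests:
        start = first_seen.get(ip)
        # Assumption: a new window opens once 24 hours or more have passed
        # since the IP's first request in the current window.
        if start is None or when - start >= WINDOW:
            first_seen[ip] = when
            count += 1
    return count

sample = [
    ("10.0.0.1", datetime(2002, 4, 20, 9, 0)),
    ("10.0.0.1", datetime(2002, 4, 20, 15, 0)),   # same window, not counted
    ("10.0.0.2", datetime(2002, 4, 20, 16, 0)),   # new IP, counted
    ("10.0.0.1", datetime(2002, 4, 21, 9, 30)),   # over 24 hours on, counted again
]
independent_total = count_independent(sample)
```

Here the four requests give three independent visits. The same idea can be applied per directory or per file by keying `first_seen` on an (IP, directory) or (IP, file) pair rather than on the IP alone.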

Firstly I count the total number of requests of any sort, and requests by an independent IP for any site in the log. I also count up the errors found and the number of those that are 404 (file not found) errors... :)
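As a rough sketch of those overall totals (in Python, not the script's own code), the following parses lines in Apache's Common Log Format and tallies requests, errors, and 404s. The regular expression, the function name `summarise`, and the decision to treat any status of 400 or above as an "error" are assumptions; the text does not say which status codes Pudica counts as errors.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

def summarise(lines):
    """Return total requests, error responses, and 404s for the given lines."""
    totals = {"requests": 0, "errors": 0, "not_found": 0}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not parse
        status = int(m.group(4))
        totals["requests"] += 1
        if status >= 400:   # assumption: 4xx and 5xx count as errors
            totals["errors"] += 1
        if status == 404:
            totals["not_found"] += 1
    return totals

sample_lines = [
    '10.0.0.1 - - [20/Apr/2002:09:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [20/Apr/2002:09:00:05 +0000] "GET /missing.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [20/Apr/2002:09:01:00 +0000] "GET /cgi-bin/x HTTP/1.0" 500 0',
]
summary = summarise(sample_lines)
```

The pattern also matches the start of combined-format lines, since those only append the referrer and user agent after the byte count.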

I work on several scales, and since this is built for me they're the ones I find useful in the order I find useful.

Independent requests by directory.
This is the number of independent IPs that have looked at a file in each directory. This is possibly the most real definition of "hits" (in my opinion). The count for a directory goes up the first time an independent IP makes any request for any file in it.

Independent File requests by directory / Independent File requests.
These are as above, but count each file separately: a file's count goes up when it is accessed by an independent IP. "By directory" bunches all the files into their directory for ease of use.
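The difference between the independent counts above is easiest to see with a small example. This Python sketch is illustrative only: the helper names are hypothetical, and for brevity it treats an IP as independent if it has not been seen before, leaving out the 24-hour window.

```python
import posixpath
from collections import defaultdict

def directory_of(path):
    """Return the directory part of a request path, with a trailing slash."""
    d = posixpath.dirname(path)
    return d if d.endswith("/") else d + "/"

def independent_counts(requests):
    """Compute the three independent counts from (ip, path) pairs."""
    ips_by_dir = defaultdict(set)
    ips_by_file = defaultdict(set)
    for ip, path in requests:
        ips_by_dir[directory_of(path)].add(ip)
        ips_by_file[path].add(ip)
    by_dir = {d: len(ips) for d, ips in ips_by_dir.items()}
    by_file = {f: len(ips) for f, ips in ips_by_file.items()}
    # "By directory" sums the per-file counts of the files in each directory.
    files_by_dir = defaultdict(int)
    for f, n in by_file.items():
        files_by_dir[directory_of(f)] += n
    return by_dir, by_file, dict(files_by_dir)

sample = [
    ("10.0.0.1", "/pics/a.jpg"),
    ("10.0.0.1", "/pics/b.jpg"),
    ("10.0.0.2", "/pics/a.jpg"),
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/pics/a.jpg"),  # repeat by the same IP, adds nothing
]
by_dir, by_file, files_by_dir = independent_counts(sample)
```

In this sample `/pics/` has 2 independent requests by directory (two distinct IPs) but 3 independent file requests by directory (two IPs for `a.jpg` plus one for `b.jpg`).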

Directory requests.
Shows how many times a file from the given directory has been accessed.

File requests.
Number of times the file has been accessed.

External referrers
Referring pages from outside the site.

Internal referrers
Referring pages within the site itself.
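One way to tell the two kinds of referrer apart is to compare the referrer's hostname with the site's own. The sketch below is a guess at the idea in Python, not Pudica's implementation: `classify_referrer` and the hostname `www.example.com` are hypothetical, and a real site may answer to several hostnames.

```python
from urllib.parse import urlsplit

def classify_referrer(referrer, site_host):
    """Return 'internal', 'external', or 'none' for a logged referrer."""
    if not referrer or referrer == "-":
        return "none"  # Apache logs "-" when no referrer was sent
    host = urlsplit(referrer).hostname
    if host is None:
        return "none"  # not an absolute URL, so nothing to compare
    return "internal" if host.lower() == site_host.lower() else "external"

SITE = "www.example.com"  # hypothetical site hostname
internal = classify_referrer("http://www.example.com/index.html", SITE)
external = classify_referrer("http://search.example.org/?q=pudica", SITE)
missing = classify_referrer("-", SITE)
```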

User Agents
User agents used by people.

Basic User Agents
User agents used by people, minus version information.
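Stripping version information lets different releases of the same browser be counted together. The following Python sketch shows one possible approach; the two regular expressions and the name `basic_agent` are assumptions, and Pudica's own rules may well differ.

```python
import re
from collections import Counter

def basic_agent(user_agent):
    """Return the user agent string with version numbers removed."""
    # Drop "/1.2.3"-style versions attached to a product name.
    stripped = re.sub(r'/\d[\w.\-]*', '', user_agent)
    # Drop standalone version numbers such as the "5.5" in "MSIE 5.5".
    stripped = re.sub(r'\s+\d[\w.]*', '', stripped)
    return stripped.strip()

agents = [
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Wget/1.8.1",
]
basic_counts = Counter(basic_agent(a) for a in agents)
```

With these rules the two MSIE entries collapse into a single "Mozilla (compatible; MSIE; Windows)" entry. Note that the second pattern also removes the "98" from "Windows 98", which may or may not be wanted.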

Note. I've not written this for quality of code, nor optimised it for speed. It's not designed to be read or used by anyone but me, but I think it's worth the effort to make it available, so that others like me don't need to write their own if they need it. If you want to change it and speed it up, feel free... if you mail me the changes I'll probably include them...

Name

For those that are interested, Pudica got its name from Mimosa pudica, which is the Latin name for the "sensitive plant". I've got a dozen or so in this room, so it's named after them. "Pudica" is also Latin for "modest" and "chaste", which has no relevance (intended, anyway) to the script.

License

It's under the GPL. See www.gnu.org for details; the license text is here.

Other

Pudica is the production of Nick Mann, who has some other pages here and other programs might be found here if he's written anything else.

Last update 21 April 2002