Generating static html files for crawler

Hi All

Currently I am using a wget/perl script to generate static HTML pages
so that my crawler can index those files. However, the `wget/perl`
script makes my MySQL jump from its usual 1% CPU to 27% CPU while the
script runs.

Is there a less expensive way (an RT way) to generate an exact static
replica of a default ticket page, look and feel included?

I am using a `for` loop and `rt show ticket/<id>` to generate a list of
valid ticket numbers, and the createstatic.pl script takes those
numbers as arguments and creates the static HTML files.

For example, assuming my latest ticket id is 400000 (I am not sure how
to get the latest ticket id otherwise), I run:

for i in $(seq 1 400000); do rt show ticket/$i | grep -q id && echo $i; done >> tickets
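Since the loop above probes all 400000 ids just to find the newest one, a cheaper sketch (assuming ticket ids are allocated sequentially with no trailing gaps) is to binary-search for the highest existing id. `ticket_exists` below is a mock stand-in; against a live RT it would be `rt show ticket/$1 | grep -q id`:

```shell
#!/bin/sh
# Binary-search for the highest existing ticket id instead of probing
# every id one by one (~40 probes instead of 400000 for id 400000).
ticket_exists() {
    # Mock stand-in: pretend the newest ticket is 312345.
    # Real version: rt show ticket/$1 | grep -q id
    [ "$1" -le 312345 ]
}

lo=1
hi=1
# Grow the upper bound by doubling until it passes the last ticket.
while ticket_exists "$hi"; do
    lo=$hi
    hi=$((hi * 2))
done
# Narrow down; invariant: ticket_exists(lo) and not ticket_exists(hi).
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if ticket_exists "$mid"; then
        lo=$mid
    else
        hi=$mid
    fi
done
echo "latest ticket id: $lo"
```

You would still scan 1..$lo to find which ids are valid, but the latest id no longer has to be guessed.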

Then I run the next `for` loop to generate the static HTML pages:

for t in $(cat tickets); do perl createstatic.pl $t > /var/apache/htdocs/tickets/${t}.html; sleep 2; done

So now my crawler can index the static pages.
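One small refinement to the page-writing loop (a sketch, not the poster's actual script): write each page to a temp file and `mv` it into place, so the crawler never indexes a half-written page. `generate_page` here is a placeholder standing in for `perl createstatic.pl`:

```shell
#!/bin/sh
# Render each ticket page to a temp file, then atomically rename it
# into the crawler's directory, so partially written pages are never
# visible under their final name.
outdir=./tickets-html        # would be /var/apache/htdocs/tickets
mkdir -p "$outdir"

generate_page() {
    # Placeholder for: perl createstatic.pl "$1"
    printf '<html><body>ticket %s</body></html>\n' "$1"
}

for t in $(seq 1 3); do      # would be: for t in $(cat tickets)
    tmp=$(mktemp "$outdir/.${t}.XXXXXX")
    generate_page "$t" > "$tmp" && mv "$tmp" "$outdir/${t}.html"
done
```

The rename is atomic on the same filesystem, so a crawler pass that races the generator sees either the old page or the new one, never a truncated file.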

My createstatic.pl is attached.

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

createstatic.pl (975 Bytes)

Does anyone know a better way (an RT way) to slurp a ticket's HTML
page besides using wget or curl? I collect them as HTML pages for my
crawler to index.

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

In case anyone missed my previous emails, I am reposting the question.

How do I generate a static HTML page of a ticket without using `wget`
or `curl`, which are pretty expensive resource-wise?

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu