Generating static html files for crawler

Hi All

Currently I am using a wget/perl script to generate static HTML pages
so that my crawler can index those files. However, the `wget/perl`
script makes my MySQL jump from its usual 1% CPU to 27% CPU while the
script runs.

Is there a less expensive way (an RT way) to generate an exact static
replica of a default ticket page, look and feel included?

I am using a `for` loop and `rt show ticket/<id>` to generate a list of
valid ticket numbers, and the createstatic.pl script takes those
numbers as arguments and creates the static HTML files.

For example, assuming my latest ticket id is 400000 (I am not sure how
to get the latest ticket id otherwise), I run:

for i in $(seq 1 400000); do rt show ticket/$i | grep -q id && echo $i; done >> tickets
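Since the loop above probes all 400000 ids just to find the newest one, a cheaper sketch (assuming ticket ids are allocated sequentially with no trailing gaps) is to binary-search for the highest existing id. `ticket_exists` below is a mock stand-in; against a live RT it would be `rt show ticket/$1 | grep -q id`:

```shell
#!/bin/sh
# Binary-search for the highest existing ticket id instead of probing
# every id one by one (~40 probes instead of 400000 for id 400000).
ticket_exists() {
    # Mock stand-in: pretend the newest ticket is 312345.
    # Real version: rt show ticket/$1 | grep -q id
    [ "$1" -le 312345 ]
}

lo=1
hi=1
# Grow the upper bound by doubling until it passes the last ticket.
while ticket_exists "$hi"; do
    lo=$hi
    hi=$((hi * 2))
done
# Narrow down; invariant: ticket_exists(lo) and not ticket_exists(hi).
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if ticket_exists "$mid"; then
        lo=$mid
    else
        hi=$mid
    fi
done
echo "latest ticket id: $lo"
```

You would still scan 1..$lo to find which ids are valid, but the latest id no longer has to be guessed.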

Then I run the next `for` loop to generate the static HTML pages:

for t in $(cat tickets); do perl createstatic.pl $t > /var/apache/htdocs/tickets/${t}.html; sleep 2; done

So now my crawler can index the static pages.
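One small refinement to the page-writing loop (a sketch, not the poster's actual script): write each page to a temp file and `mv` it into place, so the crawler never indexes a half-written page. `generate_page` here is a placeholder standing in for `perl createstatic.pl`:

```shell
#!/bin/sh
# Render each ticket page to a temp file, then atomically rename it
# into the crawler's directory, so partially written pages are never
# visible under their final name.
outdir=./tickets-html        # would be /var/apache/htdocs/tickets
mkdir -p "$outdir"

generate_page() {
    # Placeholder for: perl createstatic.pl "$1"
    printf '<html><body>ticket %s</body></html>\n' "$1"
}

for t in $(seq 1 3); do      # would be: for t in $(cat tickets)
    tmp=$(mktemp "$outdir/.${t}.XXXXXX")
    generate_page "$t" > "$tmp" && mv "$tmp" "$outdir/${t}.html"
done
```

The rename is atomic on the same filesystem, so a crawler pass that races the generator sees either the old page or the new one, never a truncated file.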

My createstatic.pl is attached.

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

createstatic.pl (975 Bytes)

Does anyone know a better way (an RT way) to slurp a ticket's HTML
page besides using wget or curl? I collect them as HTML pages for my
crawler to index.

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

In case anyone missed my previous emails, I am reposting the question.

How do I generate a static HTML page of a ticket without using `wget`
or `curl`, which are pretty expensive resource-wise?

Thanks

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu