Fulltext Indexing

Hi RT developers,

First, thanks for the Fulltext Indexing improvements in RT 4.2.11.
My initial index creation dropped from an estimated 13 hours (I killed the
indexing after 2.5 hours and extrapolated the total time) to 35 minutes.

While playing around with fulltext indexing, I noticed that the attachments
of EmailRecord and CommentEmailRecord transactions are also indexed. These
contain largely redundant information, as these attachments consist of the
content (Create, Correspond or Comment) plus the template text.
For example, in the default RT configuration with queue AdminCcs, creating
a ticket results in a Create transaction and two EmailRecord
transactions (one for the Requestor autoreply and one for the queue
AdminCcs). So the valuable information in the create attachments is
indexed three times.
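
To illustrate the duplication, here is a minimal sketch in Python. It is a toy in-memory model, not RT's schema and not the attached patch; the ticket text, the dictionary layout and the helper name indexable() are made up for this example.

```python
# Toy model of one ticket create in the default configuration:
# one Create transaction plus two EmailRecord transactions.
# All data and names here are hypothetical, for illustration only.
EXCLUDED_TYPES = {"EmailRecord", "CommentEmailRecord"}

transactions = [
    {"type": "Create", "content": "Printer is broken"},
    {"type": "EmailRecord", "content": "Autoreply template text\nPrinter is broken"},
    {"type": "EmailRecord", "content": "AdminCc template text\nPrinter is broken"},
]


def indexable(txns):
    """Keep only transactions whose attachments should reach the index."""
    return [t for t in txns if t["type"] not in EXCLUDED_TYPES]


def copies(txns, text):
    """Count how many transactions carry the given text."""
    return sum(text in t["content"] for t in txns)


copies_before = copies(transactions, "Printer is broken")
copies_after = copies(indexable(transactions), "Printer is broken")
print(copies_before, copies_after)
```

With the two record types filtered out, the ticket text would be indexed once instead of three times.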

Attached is a patch which excludes attachments from EmailRecord and
CommentEmailRecord transactions. Below are the indexing results (using
MySQL 5.5; AttachmentsIndex is a MyISAM table; 1333901 text/plain and
text/html attachments):

RT 4.2.11:
time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
36m39.340s

mysql -e 'select count(*) from rt4.AttachmentsIndex'
1333901

du -h /var/lib/mysql/rt4/AttachmentsIndex*
12K /var/lib/mysql/rt4/AttachmentsIndex.frm
1.3G /var/lib/mysql/rt4/AttachmentsIndex.MYD
653M /var/lib/mysql/rt4/AttachmentsIndex.MYI

RT 4.2.11 with the patch applied:
time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
26m43.715s

mysql -e 'select count(*) from rt4.AttachmentsIndex'
867423

du -h /var/lib/mysql/rt4/AttachmentsIndex*
12K /var/lib/mysql/rt4/AttachmentsIndex.frm
877M /var/lib/mysql/rt4/AttachmentsIndex.MYD
399M /var/lib/mysql/rt4/AttachmentsIndex.MYI
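
For a rough sense of the savings, the figures above can be compared with a few lines of Python. This is only a sketch of the arithmetic: the helper names are made up, and the file sizes come from du -h, which rounds, so the size percentage is approximate.

```python
def to_seconds(minutes, seconds):
    """Convert the m/s pair printed by `time` into seconds."""
    return minutes * 60 + seconds


def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100


# Row counts from the two `select count(*)` runs.
rows = reduction_percent(1333901, 867423)

# Indexing times: 36m39.340s without the patch, 26m43.715s with it.
runtime = reduction_percent(to_seconds(36, 39.340), to_seconds(26, 43.715))

# Index file (.MYI) sizes in MB as reported by du -h (rounded values).
myi = reduction_percent(653, 399)

print(f"indexed rows:  {rows:.1f}% fewer")
print(f"indexing time: {runtime:.1f}% shorter")
print(f"MYI file size: {myi:.1f}% smaller")
```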

I think these results make the patch worth considering for integration.

Thanks.
Chris

rt-fulltext-indexer.patch (816 Bytes)