Searches on attachments.content using Oracle

Hi All,

We’re using Oracle9i and I noticed that SearchBuilder is not optimised
to use Oracle Context searches when searching for a specific keyword in
the content of a email/attachment. The current situation where the
select looks something like: select * from attachments where content
like ‘sendmail%’ does work but it is taking ages to come up with a
result.
I have added the following index to RT3.2.1:
CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
Beware that if your attachments table is rather large that it is going
to take a while for the index to build. Ours is around 300MB.
Also have a look at the Oracle documentation to see where you can cut
down on creation and maintanance time of this index.
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/csql.htm#19446
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/toc.htm
Also needed as I said is regular updating of the index. I use a Job for
that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
day.:

DECLARE
X NUMBER;
BEGIN
SYS.DBMS_JOB.SUBMIT
( job => X
,what => ‘DECLARE
stmt VARCHAR2(200);
BEGIN
stmt := ‘‘ALTER INDEX cnt REBUILD ONLINE PARAMETERS(’’ || ‘’’‘’‘sync MEMORY 40M’‘’‘’’ || ‘’)‘’;
EXECUTE IMMEDIATE stmt;
END;

,next_date => TO_DATE(‘22-08-2004 06:00:00’,‘dd/mm/yyyy hh24:mi:ss’)
,INTERVAL => ‘TRUNC(SYSDATE+1)+6/24’
,no_parse => TRUE
);
END;
/

And most importantly the modification to Searchbuilder.pm (1.01)

— SearchBuilder.pm.orig 2004-06-28 04:39:00.000000000 +0200
+++ SearchBuilder.pm 2004-08-04 13:14:00.000000000 +0200
@@ -792,9 +792,20 @@
}

 }
  • my $clause = “($QualifiedField $args{‘OPERATOR’} $args{‘VALUE’})”;
  • my $clause;

  • if ($QualifiedField eq ‘lower(Attachments_2.Content)’ and $args{‘OPERATOR’} eq ‘LIKE’) {

  • $clause = ‘(contains(Attachments_2.Content,’ . $args{‘VALUE’} . ‘) >0)’;

  •    }
    
  • elsif ($QualifiedField eq ‘lower(Attachments_2.Content)’ and $args{‘OPERATOR’} eq ‘NOT LIKE’) {

  • $clause = ‘(not contains(Attachments_2.Content,’ . $args{‘VALUE’} . ‘) >0)’;

  •    }
    
  • else {

  •    $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})"; 
    
  • }
    +#my $clause = “($QualifiedField $args{‘OPERATOR’} $args{‘VALUE’})”;

    Juju because this should come AFTER the EA

    my $prefix = “”;
    if ( $self->{_open_parens}{$Clause} ) {

To put things into perspective:
select * from attachments where contains(content, ‘sendmail’)>0
12 rows in ~150msec, that is cached by the database, first time ~25sec.

select * from attachments where content like ‘%sendmail%’
12 rows in 2:23 first time, ~40 sec every next invocation.

That is just testing this query but it pushes so much data into Oracle
caches that it invalidates a lot of indices causing problems for other
queries. Example: after running the … like ‘%sendmail%’ the index
caches are almost clear because running my version will now take
~500msec for first invocation and ~150msec on subsequent invocations.
YMMV.

Anyone using Oracle is invited to have a go at it using a test
installation and comment/update this so that once it is stable it can be
incorporated into SearchBuilder and RT.

Thanks Jesse for RT3.2.1. I really, really like it.

Joop

Joop van de Wege JoopvandeWege@mococo.nl

Oh wow. Thanks! This is great!
I’m only in the netherlands for the next 12 hours. if I were here for
longer I’d show up to buy you a beer.

JesseOn Aug 22, 2004, at 3:33 PM, Joop van de Wege wrote:

Hi All,

We’re using Oracle9i and I noticed that SearchBuilder is not optimised
to use Oracle Context searches when searching for a specific keyword in
the content of a email/attachment. The current situation where the
select looks something like: select * from attachments where content
like ‘sendmail%’ does work but it is taking ages to come up with a
result.
I have added the following index to RT3.2.1:
CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
Beware that if your attachments table is rather large that it is going
to take a while for the index to build. Ours is around 300MB.
Also have a look at the Oracle documentation to see where you can cut
down on creation and maintanance time of this index.
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/
csql.htm#19446
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/
toc.htm
Also needed as I said is regular updating of the index. I use a Job for
that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
day.:

DECLARE
X NUMBER;
BEGIN
SYS.DBMS_JOB.SUBMIT
( job => X
,what => ‘DECLARE
stmt VARCHAR2(200);
BEGIN
stmt := ‘‘ALTER INDEX cnt REBUILD ONLINE PARAMETERS(’’ ||
‘’’‘’‘sync MEMORY 40M’‘’‘’’ || ‘’)‘’;
EXECUTE IMMEDIATE stmt;
END;

,next_date => TO_DATE(‘22-08-2004 06:00:00’,‘dd/mm/yyyy
hh24:mi:ss’)
,INTERVAL => ‘TRUNC(SYSDATE+1)+6/24’
,no_parse => TRUE
);
END;
/

And most importantly the modification to Searchbuilder.pm (1.01)

— SearchBuilder.pm.orig 2004-06-28 04:39:00.000000000 +0200
+++ SearchBuilder.pm 2004-08-04 13:14:00.000000000 +0200
@@ -792,9 +792,20 @@
}

 }
  • my $clause = “($QualifiedField $args{‘OPERATOR’} $args{‘VALUE’})”;
  • my $clause;
  • if ($QualifiedField eq ‘lower(Attachments_2.Content)’ and
    $args{‘OPERATOR’} eq ‘LIKE’) {
  • $clause = ‘(contains(Attachments_2.Content,’ . $args{‘VALUE’} . ')

0)';

  •    }
    
  • elsif ($QualifiedField eq ‘lower(Attachments_2.Content)’ and
    $args{‘OPERATOR’} eq ‘NOT LIKE’) {
  • $clause = ‘(not contains(Attachments_2.Content,’ . $args{‘VALUE’} .
    ‘) >0)’;
  •    }
    
  • else {
  •    $clause = "($QualifiedField $args{'OPERATOR'}  
    

$args{‘VALUE’})";

  • }

+#my $clause = “($QualifiedField $args{‘OPERATOR’} $args{‘VALUE’})”;

 # Juju because this should come _AFTER_ the EA
 my $prefix = "";
 if ( $self->{_open_parens}{$Clause} ) {

To put things into perspective:
select * from attachments where contains(content, ‘sendmail’)>0
12 rows in ~150msec, that is cached by the database, first time ~25sec.

select * from attachments where content like ‘%sendmail%’
12 rows in 2:23 first time, ~40 sec every next invocation.

That is just testing this query but it pushes so much data into Oracle
caches that it invalidates a lot of indices causing problems for other
queries. Example: after running the … like ‘%sendmail%’ the index
caches are almost clear because running my version will now take
~500msec for first invocation and ~150msec on subsequent invocations.
YMMV.

Anyone using Oracle is invited to have a go at it using a test
installation and comment/update this so that once it is stable it can
be
incorporated into SearchBuilder and RT.

Thanks Jesse for RT3.2.1. I really, really like it.

Joop

Joop van de Wege JoopvandeWege@mococo.nl


Rt-devel mailing list
Rt-devel@lists.bestpractical.com
The rt-devel Archives

To put things into perspective:
select * from attachments where contains(content, ‘sendmail’)>0
12 rows in ~150msec, that is cached by the database, first time ~25sec.

select * from attachments where content like ‘%sendmail%’
12 rows in 2:23 first time, ~40 sec every next invocation.

That’s impressive. Does anybody have a similar optimisation for
postgresql?

–apb (Alan Barrett)