Search results anomaly

Hi Jeff,

Yes, that sounds like the problem. What happens if you change
the SQL query to:

LOG: duration: 35.815 ms statement: EXECUTE [PREPARE:
SELECT
DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1
ON ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (Transactions_1.ObjectType = ‘RT::Ticket’)
AND (main.Status != ‘deleted’)
AND ( ( Attachments_2.Content LIKE ‘%members.linode%’ ) AND main.Owner = ‘66’)
AND (main.Type = ‘ticket’)
AND (main.EffectiveId = main.id)
ORDER BY main.id ASC ]

Using LIKE instead of ILIKE. It looks like there is no uppercase
period which is why it fails. It looks like you will need to use
a compound check for “members” and “linode”. You might also give
the fulltext indexing support a try since the case is basically
not included in the indexes so does not matter.

At least it is doing the right thing, not what you want. :slight_smile:

Regards,
KenOn Mon, May 17, 2010 at 04:31:57PM -0400, Jeff Blaine wrote:

  • The tickets failing to match “members.linode”, but matching
    “linode” have an HTML attachment containing the following:

    <META … content=“text/html; charset=windows-1252”>

  • The handful of tickets that match “members.linode” just so
    happen to have that troublesome attachment quoted in a
    reply (as text, not HTML).

So, that sounds like the problem. I wish that translated
to an obvious solution for me, but it doesn’t :slight_smile:

On 5/17/2010 3:50 PM, Kenneth Marshall wrote:

Hi Jeff,

Here is a link to some PostgreSQL documentation on setting
locales:

PostgreSQL: Documentation: 8.4: Locale Support

A crude example could be a locale that “sorted” a-i the
same and o-z the same, and nothing for h. Then the following

ahb
bha

would be equivalent to:

ab
ba

Or you could have a set of chars where the upper case
for one letter (c) did not exist. Then you could not match

upper(abc)

using lower(upper(abc)) == abc since it would produce
ab == abc and fail. I think that you may be seeing something
like that with your dot character.

Regards,
Ken

On Mon, May 17, 2010 at 03:30:27PM -0400, Jeff Blaine wrote:

On 5/17/2010 3:26 PM, Kenneth Marshall wrote:

Hi Jeff,

Those are just what I would have expected. I would take one
of the results from the second set of 77 and figure out why
it is not matching the first query. It does look more and more
like a collation problem. What is the collation of an attachment
that matches one query but not the other?

Thanks for the reply.

I’ll gladly provide any info I can, but you’re talking over
my head re: collation. Can you provide some more info? I’m
not a DB person.

Ken

On Mon, May 17, 2010 at 03:20:00PM -0400, Jeff Blaine wrote:

Here are the transactions as logged. FWIW, I find it curious
that all of these search failures for us so far are with
search terms that have a ‘.’ in the term…

LOG: duration: 35.815 ms statement: EXECUTE [PREPARE:
SELECT
DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1
ON ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (Transactions_1.ObjectType = ‘RT::Ticket’)
AND (main.Status != ‘deleted’)
AND ( ( Attachments_2.Content ILIKE ‘%members.linode%’ ) AND
main.Owner = ‘66’)
AND (main.Type = ‘ticket’)
AND (main.EffectiveId = main.id)
ORDER BY main.id ASC ]

LOG: duration: 38.362 ms statement: EXECUTE [PREPARE:
SELECT
DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1
ON ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (Transactions_1.ObjectType = ‘RT::Ticket’)
AND (main.Status != ‘deleted’)
AND ( ( Attachments_2.Content ILIKE ‘%linode%’ ) AND
main.Owner =
‘66’)
AND (main.Type = ‘ticket’)
AND (main.EffectiveId = main.id)
ORDER BY main.id ASC ]

On 5/17/2010 3:00 PM, Kenneth Marshall wrote:

And what queries actually hit the database backend for these
two? If it is the same SQL query, then the problem is somewhere
else. If the query differs, try running the query manually to
see why it is giving unexpected results. Again as others have
mentioned, collation/encoding could cause these types of differences.

Cheers,
Ken

On Mon, May 17, 2010 at 02:48:49PM -0400, Jeff Blaine wrote:

Another example of this failing:

rt list -t ticket “Content like members.linode AND Owner = jsmith”

14 results

rt list -t ticket “Content like linode AND Owner = jsmith”

77 results, all of which have “members.linode” in the content.

On 4/26/2010 5:32 PM, Jeff Blaine wrote:

On 4/26/2010 5:04 PM, Kenneth Marshall wrote:

I am not certain what to do. For myself, I would turn on
SQL statement logging within the database. Set:

log_min_duration_statement = 0

in your postgresql.conf for the database and run both
queries in RT. Then turn it back done and pick apart the
query results to see if the problem is a logic problem or
possibly an encoding issue which results in “equal” items
not showing as “equal”. Also, it looks like you are just

These were sequential in the log as a result of my simple
“Content matches foo.com” query via the web GUI. One is
a COUNT and the other is for getting data, apparently.

I formatted them here for easier reading.

LOG: duration: 101.115 ms statement: EXECUTE [PREPARE:
SELECT COUNT(DISTINCT main.id) FROM Tickets main
JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
main.id
)
JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
Transactions_1.id )
WHERE (Transactions_1.ObjectType = ‘RT::Ticket’)
AND (main.Status != ‘deleted’)
AND ( ( Attachments_2.Content ILIKE ‘%foo.com%’ ) )
AND (main.Type = ‘ticket’)
AND (main.EffectiveId = main.id)
]

LOG: duration: 105.350 ms statement: EXECUTE [PREPARE:
SELECT DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
main.id
)
JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
Transactions_1.id )
WHERE (Transactions_1.ObjectType = ‘RT::Ticket’)
AND (main.Status != ‘deleted’)
AND ( ( Attachments_2.Content ILIKE ‘%foo.com%’ ) )
AND (main.Type = ‘ticket’)
AND (main.EffectiveId = main.id)
ORDER BY main.id ASC LIMIT 50
]

What “both queries” are you referring to? The RT CLI and the
RT web GUI both fail the same way with my 1 test query. I’m
confused by what you mean there.

Thanks for the ideas, Ken

starting with such a low ticket number, but if you plan
on using full content searches for any large-ish amount
of data you should really plan on using the full-text index
support for PostgreSQL/RT that is described in the wiki.
Speaking as one whose database was brought to its knees by
some creative data mining. :slight_smile:

Good luck and let us know what you find to be the cause.
Regards,

Ken

On Mon, Apr 26, 2010 at 04:54:45PM -0400, Jeff Blaine wrote:

On 4/26/2010 4:52 PM, Kenneth Marshall wrote:

Jeff,

Are you using the Full-text index support from the wiki?

Nope.

Just RT 3.8.7 + RTFM 2.4.2. No add-ons/tweaks from the wiki.

There were some index bugs that may require you to re-index
to fix, specifically some rows were not reported correctly
as valid matches. I believe that the PostgreSQL release
notes mentioned that need. Maybe that is your problem. We
are using RT-3.8.5 and PostgreSQL 8.4.2 here with the full-text
support without an issue.

Regards,
Ken

On Mon, Apr 26, 2010 at 03:00:39PM -0400, Jeff Blaine wrote:

On 4/26/2010 2:19 PM, Kenneth Marshall wrote:

Well, that knocks out the ACL issue. Do you think that your
Mason cache is confused? Maybe stop RT, clear the cache, and
restart RT to see if that helps. What DB backend are you using
and which version of RT are you running?

RT 3.8.7
PostgreSQL as it comes with RHELv5 + updates from yum

Clearing the Mason cache didn’t help :expressionless:

Cheers,
Ken

On Mon, Apr 26, 2010 at 12:47:22PM -0400, Jeff Blaine wrote:

On 4/26/2010 12:29 PM, Raed El-Hames wrote:

Jeff;

Does your CLI user have permissions on the queue that ticket
39
is
in??
login to the web interface with the same cli user and see if
you
can
view the ticket.

Yes, it does.

Again, however, this is not really a report about an anomaly
in
the RT CLI.

The incorrect search results are returned via a web GUI search
of “Content matches foo.com

Here, maybe this makes it more clear, showing the same problem
when using the RT CLI:

[root@rtsrv1 etc]# /apps/rt/bin/rt list “Content like foo.com
Query:Content like ‘foo.com
Ticket Owner Queue Age Told Status Requestor Subject

23 mbs Incid 1 wk resolv enVision@ alert
-NICAlert-Secur
[root@rtsrv1 etc]#

[root@rtsrv1 etc]# /apps/rt/bin/rt show 39 | grep foo.com
foo.com blah blah… 1 line… not including in this email
[root@rtsrv1 etc]#

[root@rtsrv1 etc]# /apps/rt/bin/rt show 23 | grep foo.com
foo.com blah blah… not including in this email
foo.com matching lines 66 more times… not including in this
email
[root@rtsrv1 etc]#

Regards;
Roy

Jeff Blaine wrote:

On 4/26/2010 11:50 AM, Kenneth Marshall wrote:

Hi Jeff,

There is nothing here that indicates a problem. It looks
like an apples vs. oranges comparison by the time you
include
the actual parameters of the search from the web interface
and the rt commandline interface and possible privilege and
ACL differences. You can use DB query logging to figure out

I think my original post is being misinterpreted. The ‘rt’
CLI commands aren’t doing a search. They’re just showing
this list’s readers that ‘foo.com’ does show up in each of
the tickets when doing a simple ‘rt show’. It’s
not a comparison of “CLI search vs. web search”.

what SQL is being used in the web search or the commandline
rt and compare the output piece-wise to put yourself at
ease.
Maybe look at the individual components of each of the two
tickets, as well.

When viewing the tickets using 'Full headers" and then
“Ctrl-F” to examine every instance of ‘foo.com’ in each
ticket
shows that both tickets have the ‘foo.com’ in text/html
parts
(and only there).

Ticket 23 has 67 of those parts and is returned when RT
searching
for ‘foo.com

Ticket 39 has 1 of those parts and is not returned when RT
searching
for ‘foo.com

By “DB query logging” do you mean Set($StatementLog,
“DEBUG”);
or something?

Thanks for the reply, Ken

Jeff

Cheers,
Ken

On Mon, Apr 26, 2010 at 11:21:45AM -0400, Jeff Blaine wrote:

Does anyone have any suggestions for how to go about
figuring out what is wrong here?

On 4/22/2010 2:09 PM, Jeff Blaine wrote:

RT 3.8.7

A search for ‘Content matches foo.com’ is returning some
tickets
and missing others that clearly have foo.com in the
Content.

[root@rtsrv1 bin]# ./rt show 39 | grep foo.com | wc -l
1
[root@rtsrv1 bin]#
[root@rtsrv1 bin]# ./rt show 23 | grep foo.com | wc -l
67
[root@rtsrv1 bin]#
23 shows up in the web search results.

39 does not.

Any ideas?

Discover RT’s hidden secrets with RT Essentials from
O’Reilly
Media.
Buy a copy at http://rtbook.bestpractical.com

Discover RT’s hidden secrets with RT Essentials from
O’Reilly
Media.
Buy a copy at http://rtbook.bestpractical.com

Discover RT’s hidden secrets with RT Essentials from
O’Reilly
Media.
Buy a copy at http://rtbook.bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com