Full-Text Search: RT-4.2.11 MySQL 5.6.25 vs MariaDB 10.0.19+Sphinx

In short, trying to find the best mechanism for FTS, whether I can use MySQL 5.6.25 InnoDB, or MariaDB 10.0.19 + Sphinx on RHEL 7.

I’d prefer to use MySQL, since that seems to support FTS by itself. Otherwise, I opt for MariaDB, simply because that has the Sphinx engine built-in to the RPM I can get (saving me the hassle of build-from-source) although I’d still have to separately maintain the actual Sphinx/searchd daemon package.

Also, when running rt-fulltext-indexer for MySQL the notes say it does 200 tickets at a time. I take it the script remembers the last ticket it did, and continues where it left off? Does running this place any locks on any tables/rows, or can it be run at any time?

And to be clear, for Sphinx, the rt-setup-fulltext-indexer & indexer rt is a one-time thing, and subsequent is simply indexer rt --rotate? And similarly, running indexer rt --rotate can be done at any time even while RT is being used?

Oh, I do intend to “DontSearchFileATtachments = true”, although we might try it undef for a while, and see.

Thank you,
PH
Paul Hirose

In short, trying to find the best mechanism for FTS, whether I can
use MySQL 5.6.25 InnoDB, or MariaDB 10.0.19 + Sphinx on RHEL 7.

My answers below assume RT 4.2.11 – there were some order-of-magnitude
speed improvements to the indexer since 4.2.10. I’d recommend pure
MySQL FTS (either InnoDB or MyISAM), and not Sphinx, as there are a
large number of caveats with the Sphinx search (no-result responses and
false negatives, primarily). Interestingly the performance of InnoDB
FTS is slower than MyISAM’s: https://chmrr.net/fts-charts/query.html

I’d prefer to use MySQL, since that seems to support FTS by itself.
Otherwise, I opt for MariaDB, simply because that has the Sphinx
engine built-in to the RPM I can get (saving me the hassle of
build-from-source) although I’d still have to separately maintain the
actual Sphinx/searchd daemon package.

Also, when running rt-fulltext-indexer for MySQL the notes say it
does 200 tickets at a time. I take it the script remembers the last
ticket it did, and continues where it left off? Does running this
place any locks on any tables/rows, or can it be run at any time?

It indexes in batches, based on the high-water-mark of the last
successfully index attachment. It does not use database-level
locking, instead using taking a lock on the indexer script itself;
there is no impact on availability of RT for other operations. It can be
canceled and resumed at will.

And to be clear, for Sphinx, the rt-setup-fulltext-indexer & indexer
rt is a one-time thing, and subsequent is simply indexer rt
–rotate? And similarly, running indexer rt --rotate can be done at
any time even while RT is being used?

Yes – however, indexer rt re-indexes all attachments, since Sphinx
does not support incremental indexes. As such, having close-to-live
FTS indexes with Sphinx requires either a small database, or a main +
delta technique – see
http://www.sphinxconsultant.com/sphinx-search-delta-indexing/#delta-2
for an example of such.

Oh, I do intend to “DontSearchFileATtachments = true”, although we
might try it undef for a while, and see.

As a note, DontSearchFileAttachments is respected on search, not on
index, so it won’t impact indexing time.

  • Alex