Better full text search engine for rt

Arkadiusz_Miskiewicz · July 23, 2018, 10:51am

Hi.

Is anyone using some external full text search engine with rt but different than sphinx or mysql+fulltext indexes?

I was using sphinx and now switched to mysql+fulltext but both solutions are very slow and unreliable with 2.5 mln tickets (+ attachments). It takes few minutes for mysql+rt to return results for example.

Looking for better solutions. Any suggestions?

Petr_Hanousek · July 25, 2018, 11:37am

Hmm, mariadb (10.2) + sphinx (2.2.11) worked pretty good for me. Well the first run on each search was sometimes quite slow but it can be tuned with number of dedicated processors (cores). The more cores you give to RT the faster searching you have. Next running the same fulltext search was fast. It also depends on the database volume, mine was about 17GBs.

MariaDB on the same hardware is faster than Mysql.

But I think the ultimate solution of this is to migrate to PostgreSQL which I am trying right now and adding some cores to the machine. I’ll write some article to rt-wiki about it but it will take me some time due to vacancy.

Arkadiusz_Miskiewicz · July 25, 2018, 11:54am

My DB is a bit bigger - 234GB.

Does anyone have such big rt DB on postgresql and can confirm that fulltext search is fast there? And by fast I mean no more than 30s per query.

Barton_Chittenden · July 31, 2018, 7:01pm

I’ve been thinking about using ElasticSearch with RT. Creating the indexes is pretty straight forward; I think the trick would access control.

jr69ss · September 25, 2018, 12:30pm

I’ve been looking to integrate ElasticSearch specifically for fulltext search as well. Have you started working on this?

Arkadiusz_Miskiewicz · September 25, 2018, 12:42pm

Not yet unfortunately.

Barton_Chittenden · September 30, 2018, 6:35pm

I did some research about dumping SQL databases into ElasticSearch – there’s a fairly straightforward way to do it using logStash, a bit like this:

# file: contacts-index-logstash.conf
input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "pswd"
        schedule => "* * * * *"
        jdbc_validate_connection => true
        jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        statement => "SELECT * from contacts where updatedAt > :sql_last_value"
    }
}
output {
    elasticsearch {
        protocol => http
        index => "contacts"
        document_type => "contact"
        document_id => "%{id}"
        host => "ES_NODE_HOST"
    }
}
# "* * * * *" -> run every minute
# sql_last_value is a built in parameter whose value is set
# to Thursday, 1 January 1970, or 0 if use_column_value is
# true and tracking_column is set

So essentially, for each thing that you’re indexing using elastic, you write a query:

input {
    jdbc {
    ...
        statement => "SQL GOES HERE"
    }
}

Here’s an example of a query on the users table:

input {
    jdbc {
        ...
        statement => "SELECT
                 u.id as 'source_id',
                 'ticket.example.com' as 'source',
                 u.EmailAddress as 'email',
                 u.RealName as 'name',
                 g.name as 'organization'
             FROM
                 users u
                 INNER JOIN GroupMemebers m on u.id = m.MemberId
                 INNER JOIN Groups g on g.id = m.GroupId
        "
    }
}

What makes this especially interesting is that we can use this to integrate user data from our SuiteCRM instance as well:

SuiteCRM

input {
    jdbc {
        ...
        statement => "SELECT
                 c.id as 'source_id',
                 'suitecrm.example.com' as 'source',
                 e.email_address as 'email'
                 CONCAT( c.first_name, ' ', c.last_name ) as 'name',
                 a.name as 'organization'
             FROM
                 contacts c
                 INNER JOIN accounts_contacts on c.id = accounts_contacts.contact_id
                 INNER JOIN accounts a on a.id = accounts_contacts.account_id
                 LEFT JOIN email_addr_bean_rel email_bean on a.id = email_bean.bean_id
                 LEFT JOIN emails on e.id = email_bean.email_id
        "
    }
}
...

Both the RT and SuiteCRM configs would have the same output section:

output {
    elasticsearch {
        protocol => http
        index => "contacts"
        document_type => "contact"
        document_id => "%{id}"
        host => "ES_NODE_HOST"
    }
}

meaning that we would get both RT and SuiteCRM contacts with one search against elastic, the source and source_id columns would allow us to connect back to the original system.

This is all vaporware, mind you. I haven’t tested a bit of it.

ktm · October 2, 2018, 1:52pm

Hi,

Our DB is 104GB with PostgreSQL 9.6 and the full text searches are nice and fast. The key is to make sure that you can keep the index in RAM.

Regards,
Ken