I did some research about dumping SQL databases into ElasticSearch – there’s a fairly straightforward way to do it using logStash, a bit like this:
# file: contacts-index-logstash.conf
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "pswd"
schedule => "* * * * *"
jdbc_validate_connection => true
jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
statement => "SELECT * from contacts where updatedAt > :sql_last_value"
}
}
output {
elasticsearch {
protocol => http
index => "contacts"
document_type => "contact"
document_id => "%{id}"
host => "ES_NODE_HOST"
}
}
# "* * * * *" -> run every minute
# sql_last_value is a built in parameter whose value is set
# to Thursday, 1 January 1970, or 0 if use_column_value is
# true and tracking_column is set
So essentially, for each thing that you’re indexing using elastic, you write a query:
input {
jdbc {
...
statement => "SQL GOES HERE"
}
}
Here’s an example of a query on the users
table:
input {
jdbc {
...
statement => "SELECT
u.id as 'source_id',
'ticket.example.com' as 'source',
u.EmailAddress as 'email',
u.RealName as 'name',
g.name as 'organization'
FROM
users u
INNER JOIN GroupMemebers m on u.id = m.MemberId
INNER JOIN Groups g on g.id = m.GroupId
"
}
}
What makes this especially interesting is that we can use this to integrate user data from our SuiteCRM instance as well:
SuiteCRM
input {
jdbc {
...
statement => "SELECT
c.id as 'source_id',
'suitecrm.example.com' as 'source',
e.email_address as 'email'
CONCAT( c.first_name, ' ', c.last_name ) as 'name',
a.name as 'organization'
FROM
contacts c
INNER JOIN accounts_contacts on c.id = accounts_contacts.contact_id
INNER JOIN accounts a on a.id = accounts_contacts.account_id
LEFT JOIN email_addr_bean_rel email_bean on a.id = email_bean.bean_id
LEFT JOIN emails on e.id = email_bean.email_id
"
}
}
...
Both the RT and SuiteCRM configs would have the same output section:
output {
elasticsearch {
protocol => http
index => "contacts"
document_type => "contact"
document_id => "%{id}"
host => "ES_NODE_HOST"
}
}
meaning that we would get both RT and SuiteCRM contacts with one search against elastic, the source and source_id columns would allow us to connect back to the original system.
This is all vaporware, mind you. I haven’t tested a bit of it.