Full text indexing problem: Attachment cannot be indexed: Incorrect string value

I’m running into an issue with setting up full text indexing, which I believe is related to character set in MariaDB. Is there a way to skip or deal with these?

/opt/rt4/sbin/rt-fulltext-indexer

[6932] [Tue Jul 23 16:00:26 2019] [warning]: DBD::mysql::st execute failed: Incorrect string value: ‘\xB3\xF6\xCA\xDB\xC8\xAB…’ for column rt.AttachmentsIndex.Content at row 1 at /opt/rt4/sbin/rt-fulltext-indexer line 238. (/opt/rt4/sbin/rt-fulltext-indexer:238)
[6932] [Tue Jul 23 16:00:26 2019] [warning]: DBD::mysql::st execute failed: Incorrect string value: ‘\xB3\xF6\xCA\xDB\xC8\xAB…’ for column rt.AttachmentsIndex.Content at row 1 at /opt/rt4/sbin/rt-

Is there something I can do to skip tickets like this?

I get exactly the same problem. Debian + MariaDB.

All has been working fine for a few weeks, now the cron job starts to report these errors.

Interestingly, the upgrade/migration from 3.8.8 to 4.4.3 script fell over also with debian+Mariadb (defaults) so I had to do the migration on a fresh ubuntu install with MySQL, and then dump the database and import it to the Debian server.

I had these errors at install time when enabling indexing, so just ran the indexing on the ubuntu box, dumped the tables and then imported them again on the debian box. Errors went away until recently.

The errors did not occur on the ubuntu box, but do on the Debian install.

There’s difference in setting for the “locale” setup on the debian box:

# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Ubuntu:

$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=

I’ve tried all sorts of suggestions to change this, but it doesn’t have any effect so far…

I’m also running Debian (10). For locale, I have:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The database was converted to utf8mb4.

I was guessing that some of the tickets created with spam related text have extended characters like ü and å, and the indexing was unable to cope with these characters. Going to try on Ubuntu… Thanks.

Hi Frank,
had the same recently.

Converting the database to utf8mb4 is not enough, you also need to patch lib/RT/Handle.pm to use utf8mb4 as ‘connection encoding’:

if ( $db_type eq ‘mysql’ ) {
my $version = $self->DatabaseVersion;
($version) = $version =~ /^(\d+.\d+)/;
$self->dbh->do(“SET NAMES ‘utf8’”) if $version >= 4.1;
}

replace with
$self->dbh->do(“SET NAMES ‘utf8mb4’”)

Best regards,
Lars

2 Likes

Thanks, This patch was useful. Finally getting somewhere with this.
It seems to have corrected it in my case, well, the errors have gone away. (So far, though I thought I’d fixed it twice before but the errors came back, so I’ll have to see if the problem is really fixed this time!)

Taking some advice from: https://mathiasbynens.be/notes/mysql-utf8mb4
(Read that. It’s quite clear and explains the problem in more detail.)

Backup the database with mysqldump of course, in case something goes horribly wrong. :slight_smile:

Show existing charset to see if this is the problem. If it’s utf8, you need to convert to utf8mb4.
(replace ‘rtdb’ if needed with your database name.)

use rtdb;
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "rtdb";

Show structure of existing table:

MariaDB [rtdb]> describe AttachmentsIndex;
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Content | longtext         | YES  | MUL | NULL    |                |
+---------+------------------+------+-----+---------+----------------+
2 rows in set (0.003 sec)

Convert to utf8mb4:

ALTER DATABASE rtdb CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE AttachmentsIndex CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE AttachmentsIndex CHANGE Content Content longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
OPTIMIZE TABLE AttachmentsIndex;

(Also there is a REPAIR option suggested, but in my case, it isn’t supported for the database type.)

The link above also has suggestions to change your mysql client/server config if needed, i.e., how to set it to utf8mb4 if it’s utf8. I didn’t need to change this, only apply the patch above to RT.

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)