Full text indexing problem: Attachment cannot be indexed: Incorrect string value

I’m running into an issue setting up full-text indexing, which I believe is related to the character set in MariaDB. Is there a way to skip or deal with these attachments?

/opt/rt4/sbin/rt-fulltext-indexer

[6932] [Tue Jul 23 16:00:26 2019] [warning]: DBD::mysql::st execute failed: Incorrect string value: ‘\xB3\xF6\xCA\xDB\xC8\xAB…’ for column rt.AttachmentsIndex.Content at row 1 at /opt/rt4/sbin/rt-fulltext-indexer line 238. (/opt/rt4/sbin/rt-fulltext-indexer:238)
[6932] [Tue Jul 23 16:00:26 2019] [warning]: DBD::mysql::st execute failed: Incorrect string value: ‘\xB3\xF6\xCA\xDB\xC8\xAB…’ for column rt.AttachmentsIndex.Content at row 1 at /opt/rt4/sbin/rt-
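The bytes quoted in the warning are not valid UTF-8: 0xB3 falls in the range 0x80 to 0xBF, which UTF-8 only allows as a continuation byte, never as the first byte of a character. The minimal Python sketch below (not part of RT) shows roughly the validity check that makes MariaDB refuse such a value for a utf8/utf8mb4 column, using the byte values visible in the log. The helper name is made up for this example.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None if valid.

    Hypothetical helper for diagnosis only; it is not an RT or DBD::mysql function.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder could not accept
        return exc.start


# Byte values copied from the warning (the log itself truncates the rest with "…")
sample = b"\xB3\xF6\xCA\xDB\xC8\xAB"
print(first_invalid_utf8_offset(sample))                     # → 0
print(first_invalid_utf8_offset("ü and å".encode("utf-8")))  # → None
```

An offset of 0 means the very first byte already cannot start a UTF-8 character, so the text most likely arrived in some other encoding.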

Is there something I can do to skip tickets like this?
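Whether rt-fulltext-indexer itself has an option to skip such attachments is not established here. As a purely hypothetical illustration of the two usual workarounds, this sketch either skips content that is not valid UTF-8 or replaces the offending bytes with U+FFFD before the text would be sent to the database. The function name `prepare_for_index` and its `mode` parameter are invented for this example and are not RT settings.

```python
from typing import Optional


def prepare_for_index(raw: bytes, mode: str = "replace") -> Optional[str]:
    """Hypothetical pre-processing step before inserting text into a utf8mb4 column.

    mode="skip":    return None when the content is not valid UTF-8.
    mode="replace": substitute U+FFFD for each invalid byte sequence.
    """
    if mode not in ("skip", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if mode == "skip":
            return None
        # errors="replace" inserts U+FFFD wherever the decoder hits an invalid sequence
        return raw.decode("utf-8", errors="replace")


bad = b"spam \xB3\xF6 text"
print(prepare_for_index(bad, mode="skip"))  # → None
print(prepare_for_index(bad))               # invalid bytes shown as replacement characters
```

Skipping leaves the attachment unsearchable; replacing keeps the rest of the text searchable but loses the undecodable characters.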

I have exactly the same problem, on Debian with MariaDB.

Everything had been working fine for a few weeks, but now the cron job has started reporting these errors.

Interestingly, the upgrade/migration script from 3.8.8 to 4.4.3 also fell over on Debian with MariaDB (default settings), so I had to do the migration on a fresh Ubuntu install with MySQL, then dump the database and import it into the Debian server.

I had these errors at install time when enabling indexing, so I just ran the indexing on the Ubuntu box, dumped the tables, and imported them again on the Debian box. The errors went away until recently.

The errors did not occur on the Ubuntu box, but they do on the Debian install.

There’s a difference in the locale settings on the Debian box:

# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Ubuntu:

$ locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
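One quick way to compare the two machines is to list which locale variables are set to something that does not name a UTF-8 codeset. The sketch below parses output in the format printed above; the parsing rules are an assumption based on that format, and the function name is hypothetical. It only flags the difference; whether the POSIX locale actually matters to the indexer is not established here.

```python
from typing import List


def non_utf8_categories(locale_output: str) -> List[str]:
    """Return locale variables that are set but do not name a UTF-8 codeset."""
    flagged = []
    for line in locale_output.splitlines():
        if "=" not in line:
            continue  # skip the prompt line or blank lines
        name, _, value = line.partition("=")
        value = value.strip().strip('"')
        lowered = value.lower()
        # Empty values (e.g. LANGUAGE=, LC_ALL=) mean "unset" and are not flagged
        if value and "utf-8" not in lowered and "utf8" not in lowered:
            flagged.append(name.strip())
    return flagged


debian_sample = 'LANG=\nLANGUAGE=\nLC_CTYPE="POSIX"\nLC_COLLATE="POSIX"\nLC_ALL=\n'
ubuntu_sample = 'LANG=C.UTF-8\nLANGUAGE=\nLC_CTYPE="C.UTF-8"\nLC_ALL=\n'
print(non_utf8_categories(debian_sample))  # → ['LC_CTYPE', 'LC_COLLATE']
print(non_utf8_categories(ubuntu_sample))  # → []
```

On a live system, the input could come from `subprocess.run(["locale"], capture_output=True, text=True).stdout`.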

I’ve tried all sorts of suggestions to change this, but none of them has had any effect so far.

I’m also running Debian 10. My locale settings are:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The database was converted to utf8mb4.
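For context, the older MySQL/MariaDB `utf8` character set (utf8mb3) stores at most 3 bytes per character, so it only covers code points up to U+FFFF; `utf8mb4` adds the 4-byte range (emoji and other supplementary-plane characters). A minimal sketch for checking whether a piece of text would need utf8mb4 is below; the function name is hypothetical. Note that converting to utf8mb4 does not help with byte sequences that are invalid UTF-8 to begin with, like the ones in the warning above.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8, i.e. its code point is above U+FFFF."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("ü and å"))          # → False
print(needs_utf8mb4("spam \U0001F600"))  # → True
```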

I was guessing that some of the tickets created from spam contain extended characters like ü and å, and that the indexer was unable to cope with them. I’m going to try it on Ubuntu. Thanks.
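One way to test that guess: ü and å are valid in UTF-8 (two bytes each), so on their own they should not trigger the error. They would only cause it if the text reached the database in a legacy single-byte encoding, where each character becomes one byte that is not valid UTF-8. The sketch below uses ISO-8859-1 purely as an illustrative assumption; the helper name is hypothetical.

```python
def encodings_of(ch: str):
    """Return (UTF-8 bytes, ISO-8859-1 bytes, whether the ISO-8859-1 bytes decode as UTF-8)."""
    utf8 = ch.encode("utf-8")
    latin1 = ch.encode("iso-8859-1")
    try:
        latin1.decode("utf-8")
        latin1_valid = True
    except UnicodeDecodeError:
        latin1_valid = False
    return utf8, latin1, latin1_valid


for ch in ("ü", "å"):
    utf8, latin1, ok = encodings_of(ch)
    print(f"{ch}: UTF-8 {utf8!r}, ISO-8859-1 {latin1!r}, ISO-8859-1 bytes valid as UTF-8: {ok}")
```

If the same characters display correctly in RT but fail only during indexing, the stored attachment content may be in a different encoding than the one RT displays.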