FullTextSearch: searching for parts of a word

Hi,

I have installed RT 4.2 with PostgreSQL FullTextSearch. By default I can
only search for whole words. Is it possible to search for parts of a word?

Best regards,
Arkady Glazov
http://globster.ru

Hi Arkady,

What do you mean by “parts of a word”? You can alter what can be searched
for by changing the text search parsers and dictionaries (see the PostgreSQL
full-text search documentation).

But you will need to understand a LOT more about how it works to do that
successfully. Is there a particular problem you are trying to solve?

Regards,
Ken
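To illustrate Ken's point, here is a deliberately simplified Python model of why a full-text index answers whole-word queries but not arbitrary fragments. It is not PostgreSQL's parser (the real parsers and dictionaries also classify tokens, drop stop words and stem), and all function names are hypothetical. The prefix variant corresponds in spirit to PostgreSQL's tsquery prefix syntax such as 'glaz:*'; whether RT's search builds such queries is not assumed here.

```python
import re


def lexemes(text: str) -> set[str]:
    """Crude stand-in for a text search parser: lowercase runs of word characters.

    PostgreSQL's real parsers and dictionaries also classify tokens, drop stop
    words and stem; none of that is modeled here.
    """
    return set(re.findall(r"\w+", text.lower()))


def matches_word(text: str, term: str) -> bool:
    # Whole-lexeme match: the term must equal one of the indexed lexemes.
    return term.lower() in lexemes(text)


def matches_prefix(text: str, term: str) -> bool:
    # Prefix match, similar in spirit to a tsquery term such as 'glaz:*'.
    t = term.lower()
    return any(lex.startswith(t) for lex in lexemes(text))


def matches_substring(text: str, term: str) -> bool:
    # An arbitrary "part of a word": answering this means scanning every
    # lexeme, which an index of whole lexemes cannot do for you.
    t = term.lower()
    return any(t in lex for lex in lexemes(text))


if __name__ == "__main__":
    body = "Arkady Glazov wrote about quoted-printable bodies"
    print(matches_word(body, "glazov"))      # True
    print(matches_word(body, "glaz"))        # False
    print(matches_prefix(body, "glaz"))      # True
    print(matches_prefix(body, "lazov"))     # False
    print(matches_substring(body, "lazov"))  # True
```

Prefix search covers "beginning of a word"; matching in the middle of a word ("lazov" inside "Glazov") needs a different index, which is what the trigram approach later in this thread addresses.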

Hi,

On Wed, Apr 09, 2014 at 05:53:35PM +0400, Arkady Glazov wrote:

> I have installed RT 4.2 with PostgreSQL FullTextSearch. By default I can
> only search for whole words. Is it possible to search for parts of a word?

I did this in the past for RT 3.8.x, and I have a configuration ready to
use with 4.2.3 too. But this approach:

  • is a bit hacky
  • stores redundant information in the database

The wiki page is outdated:
http://requesttracker.wikia.com/wiki/PostgreSQLFullTextTrgm

Please give me some time to prepare up-to-date instructions.
Zito

Please look at the GitHub repository zito/rt-pgsql-fttrgm ("Request Tracker -
PostgreSQL - FullText searching using trigrams mod…").
I hope it works. Note that I didn't try running the rt-mysql2pg script on an
RT4 database (I did a simple upgrade of an RT3 database that already had the
indexes set up). Let me know if it works.
Thanks
Zito
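For readers new to the idea, here is a generic Python sketch of trigram-based substring search. It is not the code from zito/rt-pgsql-fttrgm, and the class and function names are hypothetical. Each document is indexed by the 3-character windows of its words; a query fragment is reduced to its trigrams, only documents containing all of them are candidates, and the candidates are rechecked with a real substring test. PostgreSQL's pg_trgm extension relies on the same principle, although it pads words with spaces before extracting trigrams.

```python
from collections import defaultdict


def trigrams(word: str) -> set[str]:
    """All 3-character windows of a lowercased word, without padding."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


class TrigramIndex:
    """Hypothetical in-memory inverted index: trigram -> ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for word in text.split():
            for gram in trigrams(word):
                self.postings[gram].add(doc_id)

    def search(self, fragment: str) -> set[int]:
        # Trigrams of each whitespace-separated piece of the fragment; every one
        # of them must occur in a matching document.
        grams = set().union(*(trigrams(w) for w in fragment.split()))
        if grams:
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            # Pieces shorter than 3 characters yield no trigrams: scan everything.
            candidates = set(self.docs)
        # Recheck, because sharing all trigrams does not prove the substring is there.
        needle = fragment.lower()
        return {d for d in candidates if needle in self.docs[d].lower()}


if __name__ == "__main__":
    idx = TrigramIndex()
    idx.add(1, "Quoted-printable test for RT")
    idx.add(2, "Diakritika v českých znacích")
    print(idx.search("printab"))  # {1}
    print(idx.search("česk"))     # {2}
    print(idx.search("xyz"))      # set()
```

The recheck step is what keeps results exact; the trigram lookup only narrows the set of rows that have to be examined, which is why such indexes help with "part of a word" queries at the cost of storing extra data.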

Hi Václav,
I will be waiting.

I looked at the database. All content is saved as 'quoted-printable'. I can send
an example if it helps.

Hi Arkady,

On Fri, Apr 11, 2014 at 09:38:26AM +0400, Arkady Glazov wrote:
> Hi Václav,
>
> I will send an example message after this email.
>
> In my RT database, the body of the email shows up as:

I can confirm this. I tried your message and my own message with latin2
characters, and both are QP-encoded :(.

-[ RECORD 1 ]---+--------------------------------------------------------------------------
id | 313295
transactionid | 411992
parent | 313294
messageid | E1WYUC2-0005Yb-35@skat-rt.seagroup.inc
subject | Quoted-printable test for RT
filename |
contenttype | text/plain
contentencoding | quoted-printable
content | =D0=9F=D1=80=D0=B8=D0=BC=D0=B5=D1=80 =D1=82=D0=B5=D0=BA=D1=81=D1=82=D0=B0 =
| =D1=81=D0=BE=D0=B4=D0=B5=D1=80=D0=B6=D0=B0=D1=89=D0=B5=D0=B3=D0=BE =D0=BA=
| =D0=B8=D1=80=D0=B8=D0=BB=D0=B8=D1=86=D1=83 =D0=B8 =D0=BB=D0=B0=D1=82=D0=B8=
| =D0=BD=D0=B8=D1=86=D1=83.
| This is example of cyrillic and latin text in th body.
| Encode as quoted-printable.
|
| --
| Arkady Glazov
|

nis=# \x
Expanded display is on.
nis=# select * from attachments where transactionid =411999;
-[ RECORD 1 ]---+--------------------------------------------------------------------------
id | 313298
transactionid | 411999
parent | 0
messageid | 20140411083151.GF8681@bobek.localdomain
subject | test latin2
filename |
contenttype | text/plain
contentencoding | quoted-printable
content | This is latin2 test:
| Diakritika v =C4=8Desk=C3=BDch znac=C3=ADch…
| =C5=BDlu=C5=A5ou=C4=8Dk=C3=BD k=C5=AF=C5=88 =C3=BAp=C4=9Bl =C4=8F=C3=A1bels=
| k=C3=A9 =C3=B3dy.
| --=20
| V=C3=A1clav Ovs=C3=ADk
| IIT-UNIX
| ICZ a.s.
| Pobo=C4=8Dka Plze=C5=88
| N=C3=A1m=C4=9Bst=C3=AD M=C3=ADru 10, 301 00 Plze=C5=88, CZ
| Tel. +420 222 275 511
| vaclav.ovsik@i.cz
| http://www.i.cz
|
|

Then this may be ready for a bug report. I will try to debug it a little.
I think previous versions of RT decoded MIME transfer encodings into raw
UTF-8 whenever possible, so that full-text search could work.

My current RT 3.8.16 has the following distribution of encodings:

nis=# select distinct contentencoding, count(contentencoding) from attachments group by contentencoding;
 contentencoding  | count
------------------+--------
 none             | 283405
 quoted-printable |    547
 base64           |   1711
                  |      0
(4 rows)

Maybe this is a regression or some ugly feature of RT 4.2.x.

Zito
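Why the encoding matters for search can be shown with Python's standard quopri module. The sketch below is an illustration, not RT code: it takes the Cyrillic line of the quoted-printable body that RT 4.2.3 stored earlier in this thread, decodes it back to UTF-8 text, and compares a naive substring search on the stored ASCII escapes with the same search on the decoded text. The helper names are hypothetical.

```python
import quopri

# Raw column value as stored by RT 4.2.3 in this thread
# (contentencoding = quoted-printable); only the Cyrillic line is reproduced.
RAW = (
    b"=D0=9F=D1=80=D0=B8=D0=BC=D0=B5=D1=80 =D1=82=D0=B5=D0=BA=D1=81=D1=82=D0=B0 =\n"
    b"=D1=81=D0=BE=D0=B4=D0=B5=D1=80=D0=B6=D0=B0=D1=89=D0=B5=D0=B3=D0=BE =D0=BA=\n"
    b"=D0=B8=D1=80=D0=B8=D0=BB=D0=B8=D1=86=D1=83 =D0=B8 =D0=BB=D0=B0=D1=82=D0=B8=\n"
    b"=D0=BD=D0=B8=D1=86=D1=83.\n"
)


def decoded_text(raw: bytes) -> str:
    # Undo the transfer encoding ("=XX" escapes, "=" soft line breaks),
    # then interpret the resulting octets as UTF-8 characters.
    return quopri.decodestring(raw).decode("utf-8")


def naive_search(haystack: str, term: str) -> bool:
    # Stand-in for any index built over the column text exactly as stored.
    return term.lower() in haystack.lower()


if __name__ == "__main__":
    stored = RAW.decode("ascii")  # what an indexer sees: pure ASCII escapes
    text = decoded_text(RAW)
    print(text.strip())
    print(naive_search(stored, "кирилицу"))  # False
    print(naive_search(text, "кирилицу"))    # True
```

Nothing is lost by the encoding itself; the content is just not searchable in the form in which it is stored, which matches the RT 3.8.16 behaviour of keeping such bodies as 'none'.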

I tried feeding the test message into our production RT 3.8.16 instance,
and it ends up in the database as:

interni=# select contenttype, contentencoding, content, trigrams from attachments where transactionid =254774;
-[ RECORD 1 ]---+--------------------------------------------------------------------------
contenttype | text/plain
contentencoding | none
content | Пример текста содержащего кирилицу и латиницу.
| This is example of cyrillic and latin text in th body.
| Encode as quoted-printable.
|
| --
| Arkady Glazov
|
trigrams | '-pr' 'abl' 'ady' 'amp' 'and' 'ark' 'ati' 'azo' 'ble' 'bod' 'cod' 'cyr' 'd-p' 'dy.' 'ed-' 'enc' 'est' 'exa' 'ext' 'for' 'gla' 'his' 'ill' 'int' 'kad' 'lat' 'laz' 'le.' 'lic' 'lli' 'mpl' 'nco' 'nta' 'ode' 'ody' 'ote' 'ple' 'pri' 'quo' 'ril' 'rin' 'rka' 'tab' 'ted' 'tes' 'tex' 'thi' 'tin' 'uot' 'xam' 'yri' 'zov' 'ати' 'аще' 'дер' 'его' 'екс' 'ерж' 'жащ' 'или' 'име' 'ини' 'ири' 'ицу' 'кир' 'кст' 'лат' 'лиц' 'мер' 'ниц' 'оде' 'при' 'ржа' 'рил' 'рим' 'сод' 'ста' 'тек' 'тин' 'цу.' 'щег'

So I think this really is a problem with RT 4.2.3 :(.
Zito
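The trigrams column in the RT 3.8.16 output can be reproduced by a very simple rule, sketched here in Python. The rule is inferred from that output, not read from the rt-pgsql-fttrgm sources, and including the subject line is an assumption ('tes', 'est' and 'for' occur only in the subject "Quoted-printable test for RT" seen in the RT 4.2.3 record, presumed identical here). Under these assumptions the sketch yields exactly the 81 lexemes listed, in the same order.

```python
# Hypothetical reconstruction of how the "trigrams" column seems to be derived.
MESSAGE = (
    "Quoted-printable test for RT\n"  # assumed: the subject is indexed too
    "Пример текста содержащего кирилицу и латиницу.\n"
    "This is example of cyrillic and latin text in th body.\n"
    "Encode as quoted-printable.\n"
    "\n"
    "--\n"
    "Arkady Glazov\n"
)


def word_trigrams(text: str) -> list[str]:
    """Every 3-character window of every whitespace-separated, lowercased token.

    Punctuation stays inside tokens ('d-p', 'dy.', 'цу.'), tokens shorter than
    three characters ('is', 'th', 'rt', '--') contribute nothing, duplicates are
    dropped and the result is sorted by code point.
    """
    grams: set[str] = set()
    for token in text.lower().split():
        grams.update(token[i:i + 3] for i in range(len(token) - 2))
    return sorted(grams)


def as_psql_lexemes(grams: list[str]) -> str:
    # Render the list the way psql prints a tsvector ('abc' 'abd' ...);
    # single quotes inside a lexeme would need escaping, ignored here.
    return " ".join(f"'{g}'" for g in grams)


if __name__ == "__main__":
    grams = word_trigrams(MESSAGE)
    print(len(grams))  # 81
    print(as_psql_lexemes(grams))
```

This also makes the "redundant information" concrete: every attachment body is stored once as content and again as its set of trigrams.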

I think I found the critical point: the problem is in the method
RT::Record::_EncodeLOB(). I ran a small script that feeds a message into RT
under the Perl debugger, set a breakpoint with "b RT::Record::_EncodeLOB",
and stepped through it:
DB<45> v
788: } elsif ( !$RT::Handle->BinarySafeBLOBs
789 && $Body =~ /\P{ASCII}/
790 && !Encode::is_utf8( $Body, 1 ) ) {
791==> $ContentEncoding = 'quoted-printable';
792 }
793
794 #if the attachment is larger than the maximum size
795: if ( ($MaxSize) and ( $MaxSize < length($Body) ) ) {
796
797 # if we’re supposed to truncate large attachments
DB<45> x $Body
0 'Пример текста содержащего кирилицу и латиницу.
This is example of cyrillic and latin text in th body.
Encode as quoted-printable.

Arkady Glazov

DB<46> p Encode::is_utf8( $Body, 1 ) ? "true" : "false"
false

For some reason Encode::is_utf8(…) returns false :(.

Maybe the problem is with libmime-tools-perl (I'm running Debian); I
have version 5.503-1.
Zito
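The branch the debugger stopped at can be mirrored in Python to show why a readable body still gets QP-encoded. This is a hypothetical analogue, not RT code: a Python str stands in for a Perl string with the UTF8 flag on, and bytes for an octet string without it. Since the debugger reached line 791, BinarySafeBLOBs was evidently false on this PostgreSQL setup. That "x $Body" prints Cyrillic while is_utf8 returns false is consistent with the body arriving as undecoded UTF-8 octets, though that reading is an interpretation.

```python
def choose_content_encoding(body, binary_safe_blobs=False):
    """Hypothetical mirror of lines 788-791 of RT::Record::_EncodeLOB shown above.

    str   ~ Perl string with the UTF8 flag on (Encode::is_utf8($Body, 1) true)
    bytes ~ Perl octet string without the flag (is_utf8 false)
    Earlier branches of the real method are not modeled.
    """
    if isinstance(body, bytes):
        has_non_ascii = any(b > 0x7F for b in body)  # like $Body =~ /\P{ASCII}/
        flagged_utf8 = False
    else:
        has_non_ascii = any(ord(ch) > 0x7F for ch in body)
        flagged_utf8 = True
    if not binary_safe_blobs and has_non_ascii and not flagged_utf8:
        return "quoted-printable"
    # Value RT 3.8.16 stored for this message (contentencoding = none).
    return "none"


if __name__ == "__main__":
    text = "Пример текста содержащего кирилицу и латиницу."
    print(choose_content_encoding(text.encode("utf-8")))  # quoted-printable
    print(choose_content_encoding(text))                  # none
    print(choose_content_encoding(b"plain ASCII body"))   # none
```

In this model, decoding the octets first (body.decode("utf-8")) flips the result to "none"; whether the attached utf8-valid.patch does something equivalent in Perl is not stated in the thread.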


Correction: I have version 5.505 installed locally:

zito@rt2:~/migration/rt$ make testdeps |fgrep -i mime
MIME::Entity >= 5.504 ...found
zito@rt2:~/migration/rt$ perl -MMIME::Entity -e 'print "$MIME::Entity::VERSION\n";'
5.505

Zito

FYI: there is now a ticket for this problem:
http://issues.bestpractical.com/Ticket/Display.html?id=29735
I found a temporary workaround; the patch is attached.
Zito

utf8-valid.patch (489 Bytes)