RT 4.4.2, Extract Custom Fields, problem extracting from HTML email body

Hi,

I’ve spun up a test environment to migrate our RT from v4.2.12 to v4.4.2. I followed the upgrade instructions to the letter and upgraded the database as directed.

We’ve been using Extract Custom Fields (RT-Extension-ExtractCustomFieldValues-3.13) with RT v4.2.12 successfully for some time.

After migrating the database to v4.4.2 with Extract Custom Fields (RT-Extension-ExtractCustomFieldValues-3.14), we’ve discovered that data is no longer being extracted from the body of HTML emails. It does still work for plain-text emails.

Below is an extract from the Extract Custom Fields template in use:-
Product|Body|Product->\s*(.+)$

Obviously, this extracts data after the following email body text:-
Product-> MyProductInfo

As you would expect, the above text is written differently in HTML, because the ‘>’ character is a meta character in HTML.
This is an example of how it appears in the HTML source:-
<span style="font-weight: bold;">Product-&gt; </span>MyProductInfo<br>

As you can see, our Product-> marker is rewritten as Product-&gt; in HTML, so it seems obvious why the pattern matches in plain text but not in HTML.
If I were setting this up from scratch I’d just use a more HTML-tolerant character than >.
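
To illustrate the mismatch outside RT, here is a minimal Python sketch. It only reuses the regex from the template line and the two sample lines above; it is not how the extension itself is implemented (that is Perl), and the variable names are mine. For a pattern this simple, Python’s regex behaviour should be the same.

```python
import html
import re

# Regex copied from the third field of the template line above.
PATTERN = re.compile(r"Product->\s*(.+)$", re.MULTILINE)

plain_line = "Product-> MyProductInfo"
html_line = '<span style="font-weight: bold;">Product-&gt; </span>MyProductInfo<br>'

# Plain text contains the literal marker, so the pattern matches.
plain_match = PATTERN.search(plain_line)

# The HTML source contains "Product-&gt;", so the literal "->" is never found.
html_match = PATTERN.search(html_line)

# Decoding entities first turns "&gt;" back into ">", and the marker reappears,
# although the captured text then still carries the surrounding markup.
decoded_match = PATTERN.search(html.unescape(html_line))

print(plain_match.group(1))
print(html_match)
print(decoded_match.group(1))
```

So the raw HTML part can only match if something decodes the entities before the regex is applied.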

BTW… if I change the Extract Custom Fields template to use this instead:-
Product|Body|Product--\s*(.+)$
Then the email body text Product-- MyProductInfo works as expected for Plain Text and HTML email bodies.

However, extraction with Product-> MyProductInfo has been working for years for both HTML and plain-text emails.

Has anyone else discovered this behaviour?

Regards,
Brett

After changing the Extract Custom Fields template to use -- instead of ->, I now have custom field values being extracted twice for each custom field.

It first extracts values from ‘Attachment 2’ (text/plain). The values extracted are correct.
After that, it extracts values from ‘Attachment 3’ (text/html). These values include HTML markup from the email and overwrite the correct values extracted previously.

An example of the data in the HTML-encoded email:-

Manufacturer-- Super Cali Fraji Listics
Product-- Phonecitcs discombobulator
PartNumber-- 6666-9999-6666-9999
MasterID-- 6969696969
SerialNumber-- 1234567890
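
The overwrite can be reproduced outside RT with a small Python sketch. This is only a model of what I am seeing, not the extension’s code: the `parts` list and both function names are hypothetical, and the HTML line is an assumption modelled on the markup of the earlier Product example.

```python
import re

# Regex in the style of the template, using the Manufacturer marker from the sample data.
PATTERN = re.compile(r"Manufacturer--\s*(.+)$", re.MULTILINE)

# Hypothetical stand-ins for the MIME parts of one email, in the order they are processed.
parts = [
    ("text/plain", "Manufacturer-- Super Cali Fraji Listics"),
    ("text/html",
     '<span style="font-weight: bold;">Manufacturer-- </span>Super Cali Fraji Listics<br>'),
]

def extract_last_match(parts):
    """Scan every part in order; a later match replaces an earlier one."""
    value = None
    for _content_type, body in parts:
        match = PATTERN.search(body)
        if match:
            value = match.group(1)
    return value

def extract_first_match(parts):
    """Stop at the first part that matches, so text/plain wins when it comes first."""
    for _content_type, body in parts:
        match = PATTERN.search(body)
        if match:
            return match.group(1)
    return None

last_value = extract_last_match(parts)
first_value = extract_first_match(parts)
print(last_value)
print(first_value)
```

The "last match wins" variant ends up with the markup-laden value, which is the behaviour I am seeing; the "first match wins" variant is what I had expected.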

Below is an extract from RT.log.
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Committing scrip #27 on txn #147782 of ticket #2311 (/opt/rt4/sbin/../lib/RT/Scrips.pm:293)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Looking to extract: CFName=Manufacturer Field=Body Match=Manufacturer-- \s*(.+)$ Options= PostEdit= (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:76)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Looking for CF Manufacturer (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:146)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Found CF id 51 (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:161)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Looking at attachment 1, content-type multipart/alternative (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:180)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Looking at attachment 2, content-type text/plain (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:180)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Examining content of body (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:191)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Found value for CF: Super Cali Fraji Listics (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:230)
… snip …
[3368] [Tue Oct 31 04:56:44 2017] [info]: CustomFieldValue (Manufacturer,Super Cali Fraji Listics) added: 133456 Manufacturer Super Cali Fraji Listics added (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:239)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Looking at attachment 3, content-type text/html (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:180)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Examining content of body (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:191)
[3368] [Tue Oct 31 04:56:44 2017] [debug]: Found value for CF: </span>Super Cali Fraji Listics<br> (/opt/rt4/local/plugins/RT-Extension-ExtractCustomFieldValues/lib/RT/Action/ExtractCustomFieldValues.pm:230)
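
For comparison, this is roughly the clean-up the HTML-derived value would need before it equals the plain-text one. It is a naive Python sketch under my own assumptions (`clean_value` is a hypothetical helper, and stripping tags with a regex is not robust for arbitrary HTML); it is not something the extension provides as far as I know.

```python
import html
import re

# Naive tag matcher: anything between "<" and the next ">".
TAG = re.compile(r"<[^>]+>")

def clean_value(raw):
    """Drop anything that looks like an HTML tag, decode entities, trim whitespace."""
    return html.unescape(TAG.sub("", raw)).strip()

# Value reported in the last log line above.
html_value = "</span>Super Cali Fraji Listics<br>"
cleaned = clean_value(html_value)
print(cleaned)
```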

Am I doing anything wrong here, or is this a new ‘feature’?

Cheers!