ExtractCustomFields template and dropping errant HTML tags

I have a scrip to assign CustomFields based on a template and it often ends up collecting junk like HTML tags trailing after the data I want to match.

I think I have made my regex as specific as I can, but now I’m concerned that I went about this the wrong way. I would love an opinion.

Emails that aren’t human-generated typically have a block of data in them that includes data like:

Room:Y10A
Building:ddd
IP:172.16.2.2,fe80::250:43ff:fe00:ed31
MAC:DE:CA:FB:AD:11:97
Port:ddd-1@4/40

And sometimes they’re handled by applications that generate them with HTML formatting, or are copy/pasted with HTML formatting, etc.

I have a CustomField called ‘Building’ and in my Template I have:

Building|Body|Building:([^<]+)\n||

a) Is this ([^<]) necessary, or is there a preferred way to simply ignore all HTML in incoming mail before it gets handed off to rt-mailgate?

b) Is there something about my Template use that is obviously wrong?

 Emory Lundberg, Security Friend, Information Security & Policy Office 
 University of Iowa, UCC, Campus Phone: 5-6174

Some additional information as I’m getting big blobs of text with markup in several fields on emails and forms that are submitted to my RT-4.0.10:

2013-05-20 21:37:32 The RT System itself - FQDN you’ve completed this form!  added

I’m not sure of the most appropriate way to proceed. I don’t know whether I should adjust my CustomField template to ignore HTML tags (anything starting with ‘<’), or strip HTML at the MTA and force plain text on everybody.
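
To make the second option concrete, here is a minimal sketch of stripping tags from a body before any matching happens. It is written in Python purely for illustration (RT and its extensions are Perl), and `TextOnly` and `strip_html` are hypothetical names, not part of RT or of any MTA.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content, turning a few block-level tags into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: these tags mark line boundaries in the generated mail.
        if tag in ("br", "p", "div", "tr"):
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the text of `markup` with all tags removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


print(repr(strip_html("Building:ddd<br>IP:172.16.2.2<br>")))
# 'Building:ddd\nIP:172.16.2.2\n'
```

With the tags gone, a line-oriented pattern no longer has to guard against ‘<’ at all.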

I’m not using Set($PreferRichText, 1); because I don’t know if that will even help with body parsing at all.

Anyone have an experience like this?

My Template for that field is written as:

FQDN|Body|FQDN:([^<].+)||

//emory

Original message below:

On May 15, 2013, at 10:39 AM, “Lundberg, Emory” <emory-lundberg@uiowa.edu> wrote:

I have a scrip to assign CustomFields based on a template and it often ends up collecting junk like HTML tags trailing after the data I want to match.

I think I have made my regex as specific as I can, but now I’m concerned that I went about this the wrong way. I would love an opinion.

Emails that aren’t human-generated typically have a block of data in them that includes data like:

Room:Y10A
Building:ddd
IP:172.16.2.2,fe80::250:43ff:fe00:ed31
MAC:DE:CA:FB:AD:11:97
Port:ddd-1@4/40

And sometimes they’re handled by applications that generate them with HTML formatting, or are copy/pasted with HTML formatting, etc.

I have a CustomField called ‘Building’ and in my Template I have:

Building|Body|Building:([^<]+)\n||

a) Is this ([^<]) necessary, or is there a preferred way to simply ignore all HTML in incoming mail before it gets handed off to rt-mailgate?
b) Is there something about my Template use that is obviously wrong?

Some additional information as I’m getting big blobs of text with
markup in several fields on emails and forms that are submitted to my
RT-4.0.10:

2013-05-20 21:37:32 The RT System itself - FQDN you’ve completed
this form!  added

I’m stuck on not being sure about the most appropriate way to
proceed. I don’t know if I should be trying to adjust my CustomField
template to ignore HTML tags designated with ‘<’, or if I should be
stripping HTML at the MTA and forcing plain text on everybody.

Perhaps ECFV should preferentially use an alternative text/plain part if
one exists, and only use text/html if necessary.
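
The part-selection idea above can be sketched as follows. This uses Python's standard `email` package as a stand-in, so it illustrates the logic only and says nothing about how the extension is actually implemented; `body_for_matching` is a hypothetical helper, not an RT or ECFV function.

```python
from email.message import EmailMessage


def body_for_matching(msg):
    """Prefer the text/plain part; fall back to text/html; else ''."""
    part = msg.get_body(preferencelist=("plain", "html"))
    return part.get_content() if part is not None else ""


# A multipart/alternative message carrying both renderings.
both = EmailMessage()
both.set_content("Building:ddd\n")
both.add_alternative("<p>Building:ddd<br></p>\n", subtype="html")

# A message that only has an HTML body.
html_only = EmailMessage()
html_only.set_content("<p>Building:ddd<br></p>\n", subtype="html")

print(repr(body_for_matching(both).strip()))       # 'Building:ddd'
print(repr(body_for_matching(html_only).strip()))  # '<p>Building:ddd<br></p>'
```

Only the HTML-only message would still need a tag-aware pattern.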

I’m not using Set($PreferRichText, 1); because I don’t know if that
will even help with body parsing at all.

It doesn’t affect this parsing, it only affects display in the ticket
history.

My Template for that field is written as:

FQDN|Body|FQDN:([^<].+)||

I don’t think your regex is matching what you expect.

As written, it matches the string “FQDN:”, then any single character
except “<”, then one or more of any character. Only the first captured
character is barred from being “<”; the “.+” that follows matches the
rest of the line, HTML tags included.

Perhaps you meant:

FQDN:\s*([^<]+)$

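which matches “FQDN:”, optional whitespace, then one or more characters
that aren’t “<”, up to the end of a line. Note that “[^<]” also matches
newlines, so “[^<\n]+” may be safer if the value must stay on one line.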
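
A quick way to compare these patterns is to run them against sample bodies. The sketch below uses Python's `re` module as a stand-in for Perl's regex engine (the constructs used here behave the same in both), and assumes multi-line matching so that “$” can match at a line end. The sample bodies and the `single_line` variant are illustrative assumptions, not taken from a real ticket.

```python
import re

# Hypothetical bodies modelled on the data block quoted in this thread.
plain_body = "Room:Y10A\nFQDN: host.example.edu\nIP:172.16.2.2\n"
html_body = "Room:Y10A<br>\nFQDN: host.example.edu<br>\nIP:172.16.2.2<br>\n"


def first_capture(pattern, body):
    """Return the first capture group, or None when nothing matches."""
    match = re.search(pattern, body, re.MULTILINE)
    return match.group(1) if match else None


as_written = r"FQDN:([^<].+)"        # pattern from the template
suggested = r"FQDN:\s*([^<]+)$"      # pattern proposed in the reply
single_line = r"FQDN:\s*([^<\n]+)"   # assumed variant: also stop at a newline

print(repr(first_capture(as_written, html_body)))
# ' host.example.edu<br>'
print(repr(first_capture(suggested, html_body)))
# None
print(repr(first_capture(suggested, plain_body)))
# 'host.example.edu\nIP:172.16.2.2\n'
print(repr(first_capture(single_line, html_body)))
# 'host.example.edu'
print(repr(first_capture(single_line, plain_body)))
# 'host.example.edu'
```

On the HTML body the template's pattern keeps the trailing tag, while the suggested pattern finds no match at all because “<br>” sits between the value and the line end; on the plain body its capture runs on across the following lines. Excluding newlines from the character class and dropping the “$” anchor gives the bare value in both cases.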