Running RT 5.0.0 on FreeBSD. System has been running without issue for several years, but has developed this issue as of this morning.
Dec 5 15:43:35 Varley-rt-01 RT:  Use of uninitialized value $tag in string eq at /usr/local/lib/perl5/site_perl/HTML/Element.pm line 2634.
The ticket number and the line number change.
Nothing has changed on this system, and no software has been updated.
Are you getting a lot of incoming mail traffic that is trying to create lots of tickets? This could be spam, a deliberate denial of service attack, or even something as simple as an old local mailing list with lots of out-of-date addresses getting a mail sent to it, resulting in lots of delivery failure messages from remote mail servers.
Doesn’t appear so, no. Thanks for the suggestion though.
Well the error above appears to be in the Perl HTML::Element module rather than in the RT code itself. Assuming the line number is for the latest stable version it appears to be whilst processing numbered lists in HTML. Not much help probably, but it might give you a clue of where to look possibly.
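To see whether the warnings cluster on one or two source lines, the log can be tallied by file, line number and variable. This is only a rough sketch: the regular expression is modelled on the single warning quoted above, and `tally_warnings` is a name made up for illustration.

```python
import re
from collections import Counter

# Modelled on the warning quoted in this thread:
#   Use of uninitialized value $tag in string eq at /path/Element.pm line 2634.
WARNING_RE = re.compile(
    r"Use of uninitialized value (?P<var>\$\w+) in (?P<op>.+?) "
    r"at (?P<file>\S+) line (?P<line>\d+)\."
)

def tally_warnings(lines):
    """Count matching warnings per (file, line number, variable)."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            key = (match["file"], int(match["line"]), match["var"])
            counts[key] += 1
    return counts
```

Passing it an open file handle for the syslog file and printing `tally_warnings(handle).most_common(10)` would list the most frequent warning locations first.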
I’m not sure what to make of that, but I do appreciate the reply.
So, it seems that every ticket number mentioned in the errors doesn’t exist in the database. Something trying to iterate tickets and erroring because they’re not there?
I guess if it is, you’ll need to track down the process that is doing that. I assume you’ve tried flushing the Mason cache and restarting the web server already, so I’d be looking at cron jobs, extensions/local code changes and then incoming mail contents next to see if I could find what was generating these non-existent ticket number references. It’s not something I’ve ever seen on our RT installations I’m afraid.
Wait wait, tell me more about flushing the Mason cache. I know naught of this.
Ok, googled the Mason cache thing, tried that, no effect.
No, I’ve never seen this either, in ~12 years of running RT3, RT4, and now RT5.
FWIW, we flush our Mason cache daily (in the wee hours using a cron job) as that means that any local tweaks, etc that we do to the Mason pages will magically appear to the users overnight whilst we’re tucked up in bed.
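For anyone who has not done this before, a flush amounts to emptying the Mason object cache directory so the pages get recompiled on the next request. A minimal sketch of a script a cron job could call follows. `flush_cache` is a made-up helper, and the cache path is deliberately left as an argument because its location depends on how RT was installed.

```python
import shutil
from pathlib import Path

def flush_cache(cache_dir):
    """Delete everything inside cache_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    root = Path(cache_dir)
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # compiled component subtrees
        else:
            entry.unlink()         # stray files or symlinks
        removed += 1
    return removed
```

Keeping the directory itself and removing only its contents leaves ownership and permissions as the web server expects them. A web server restart afterwards is the usual companion step.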
Do you have a process monitor such as top installed that might show what processes are consuming all the CPU? In other words, is the RT web UI handler, the database server or some other external process gobbling up the cycles? Or is it just Apache? When you restart the web server does it immediately pin up to 100% CPU or ramp up slowly to that?
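If an interactive monitor is not to hand, the same question can be answered from a `ps` listing. The sketch below assumes `ps -A -o pid,pcpu,comm` is available (those are the POSIX option and column names). `top_cpu` and `snapshot` are names invented for this example.

```python
import subprocess

def top_cpu(ps_output, limit=5):
    """Return the busiest processes as (cpu_percent, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((float(cpu), int(pid), command.strip()))
        except ValueError:
            continue  # ignore rows that do not parse as numbers
    return sorted(rows, reverse=True)[:limit]

def snapshot():
    """Capture one ps listing of all processes (assumes a POSIX-style ps)."""
    return subprocess.run(
        ["ps", "-A", "-o", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `top_cpu(snapshot())` every few seconds after a restart would show whether the load appears at once or builds up process by process.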
I’d never done it before. We don’t generally make any modifications to RT, we just use it.
The processes are all httpd processes. From a restart I get 1x httpd process at 99-100% within a few seconds. The second one a minute or two later, and eventually I think 5x httpd processes all at 99%. Then the swap usage starts ramping upwards from about 30% to 100%.
The weird thing about this is that it started happening suddenly, after 2+ years of running fine, with no changes, mods, upgrades or anything else.
Are there any hits in the web server logs? Either in the access log to show that the server is being hammered or in the error log to give a possible trace for what is being done?
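One quick way to check for hammering is to count requests per client and per URL in the access log. This sketch assumes the log is in Apache's Common or Combined Log Format, which may not match a customised `LogFormat`. `tally_requests` is a hypothetical helper name.

```python
import re
from collections import Counter

# host ident user [timestamp] "METHOD path protocol" ...
REQUEST_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')

def tally_requests(log_lines):
    """Return (requests per client, requests per path) as two Counters."""
    clients = Counter()
    paths = Counter()
    for line in log_lines:
        match = REQUEST_RE.match(line)
        if match:
            host, _method, path = match.groups()
            clients[host] += 1
            paths[path] += 1
    return clients, paths
```

A single address or a single ticket URL dominating `clients.most_common(10)` or `paths.most_common(10)` would point at where the load is coming from.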
Looks similar to “Are these log errors normal?”. Not sure if @Stephen_Switzer ever got to the bottom of it, but it was related to viewing a ticket (missing a tag? or with special chars that trip somewhere down the line, RT handles special/regional chars pretty badly). Are the tickets from around when it started all fine? Maybe viewing one of them generates more output?
The web server itself is functioning fine (other than being unresponsive when the swap fills up) and httpd-access.log seems to reflect that. httpd-error.log is full of the same errors as /var/log/messages but with less detail.
Yeah, I did see that thread. There doesn’t seem to be any mention of the underlying cause or what he did to resolve it though.
So I searched my system for ‘mason_cache’ and only found one directory so figured it must be it. The cache doesn’t seem to have regenerated at all, which makes me wonder if I’m looking in the wrong place. What causes the cache to re-populate?
Ok, so looks like I found the actual mason_cache folder, deleted the cache, and it’s re-populated it ok. Main issue remains, however.
Yeah, that’s a plague on this forum.
Do you have some accidental special character Subject Tag added to one of the queues maybe? Seems to be the only use of ‘tag’ from UI side, so worth a shot.
Nah nvm, $tag is from the HTML Element/Formatter number_lists so it’s most likely html tags
Nothing that I can see. I did an sql query in the database for around the time it started happening, looking for anything weird and didn’t find it. There was one ticket with a suspicious attachment, but I deleted that.
We don’t use gpg, and I had never even used shredder until yesterday. I’ll have a closer look at those other ones.
Considering updating from 5.0.0 to 5.0.3 in case it’s just a bug in the software.
Appreciate your efforts here guys.
Sorry, I went ahead looking for $tag, but it’s not in RT code, it’s in Perl’s HTML module. I edited the post out, but it seems I was too late. (Seeing how this is happening in 4.4 on Ubuntu and 5.0 on BSD, it seems like a pretty nasty bug, if it actually is just a malformed email that causes CPU spiking like that… Hopefully no one forwards it to our instance.)