"Use of uninitialized value $tag in string.." errors

So, it seems that none of the ticket numbers mentioned in the errors exist in the database. Is something trying to iterate over tickets and erroring because they’re not there?

I guess if it is, you’ll need to track down the process that is doing that. I assume you’ve tried flushing the Mason cache and restarting the web server already, so I’d be looking at cron jobs, extensions/local code changes and then incoming mail contents next, to see if I could find what was generating these non-existent ticket number references. It’s not something I’ve ever seen on our RT installations, I’m afraid.

Wait wait, tell me more about flushing the Mason cache. I know naught of this.

Ok, googled the Mason cache thing, tried that, no effect.

No, I’ve never seen this either, in ~12 years of running RT3, RT4, and now RT5.

FWIW, we flush our Mason cache daily (in the wee hours using a cron job) as that means that any local tweaks, etc that we do to the Mason pages will magically appear to the users overnight whilst we’re tucked up in bed. :wink:
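
For anyone who hasn't done this before, here is a minimal sketch of what a nightly flush job could look like. It simply empties the directory holding Mason's compiled page objects. The path in the usage note is an assumption (a typical source install); check where your own RT keeps its Mason data, and note that the function name `flush_mason_cache` is made up for this example.

```python
import shutil
from pathlib import Path


def flush_mason_cache(obj_dir: Path) -> int:
    """Delete every entry inside obj_dir and return how many were removed.

    The directory itself is kept so the web server can repopulate it.
    """
    removed = 0
    if not obj_dir.is_dir():
        # Nothing to do: wrong path or the cache was never created.
        return removed
    for entry in obj_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

Called from cron as something like `flush_mason_cache(Path("/opt/rt5/var/mason_data/obj"))` (assumed path), followed by a web server restart so the pages get recompiled. If the function returns 0 and nothing repopulates, you are probably pointing at the wrong directory.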

Do you have a process monitor such as top installed that might show what processes are consuming all the CPU? In other words, is the RT web UI handler, the database server or some other external process gobbling up the cycles? Or is it just Apache? When you restart the web server does it immediately pin up to 100% CPU or ramp up slowly to that?

I’d never done it before. We don’t generally make any modifications to RT, we just use it.

The processes are all httpd processes. From a restart I get 1x httpd process at 99-100% within a few seconds. The second one a minute or two later, and eventually I think 5x httpd processes all at 99%. Then the swap usage starts ramping upwards from about 30% to 100%.
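
If you want to watch that ramp-up without staring at top, a rough sketch like the one below can take a snapshot of the busiest httpd processes. It assumes your `ps` accepts `-A -o pid=,pcpu=,rss=,comm=` (true on Linux and the BSDs as far as I know, but verify on your system); the helper names are invented for this example.

```python
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    cpu_percent: float
    rss_kib: int
    command: str


def parse_ps(output: str) -> List[Proc]:
    """Parse header-less output of `ps -A -o pid=,pcpu=,rss=,comm=`."""
    procs = []
    for line in output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        pid, cpu, rss, command = fields
        procs.append(Proc(int(pid), float(cpu), int(rss), command.strip()))
    return procs


def busiest(procs: List[Proc], name_fragment: str = "httpd", limit: int = 5) -> List[Proc]:
    """Return up to `limit` processes whose command contains name_fragment, highest CPU first."""
    matching = [p for p in procs if name_fragment in p.command]
    return sorted(matching, key=lambda p: p.cpu_percent, reverse=True)[:limit]


def snapshot() -> List[Proc]:
    """Run ps once and parse the result (the ps flags are an assumption, see above)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=,pcpu=,rss=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Running `busiest(snapshot())` every minute or so after a restart and logging the result would show when each httpd child pins a core and how its memory grows before swap fills up.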

The weird thing about this is that it started happening suddenly, after 2+ years of running fine, with no changes, mods, upgrades or anything else.

Are there any hits in the web server logs? Either in the access log to show that the server is being hammered or in the error log to give a possible trace for what is being done?

Looks similar to the thread “Are these log errors normal?”. Not sure if @Stephen_Switzer ever got to the bottom of it, but it was related to viewing a ticket (missing a tag? or special chars that trip something down the line; RT handles special/regional chars pretty badly). Are the tickets from around when it started all fine? Maybe viewing one of them generates more output?

The web server itself is functioning fine (other than being unresponsive when the swap fills up) and httpd-access.log seems to reflect that. httpd-error.log is full of the same errors as /var/log/messages but with less detail.

Yeah, I did see that thread. There doesn’t seem to be any mention of the underlying cause or what he did to resolve it though.

So I searched my system for ‘mason_cache’ and only found one directory, so I figured it must be it. The cache doesn’t seem to have regenerated at all, which makes me wonder if I’m looking in the wrong place. What causes the cache to re-populate?

Ok, so looks like I found the actual mason_cache folder, deleted the cache, and it’s re-populated it ok. Main issue remains, however.

Yeah, that’s a plague on this forum.
Do you have some accidental special character Subject Tag added to one of the queues maybe? Seems to be the only use of ‘tag’ from UI side, so worth a shot.
Nah nvm, $tag is from the HTML Element/Formatter number_lists so it’s most likely html tags

Nothing that I can see. I ran an SQL query against the database for around the time it started happening, looking for anything weird, and didn’t find anything. There was one ticket with a suspicious attachment, but I deleted that.

We don’t use gpg, and I had never even used shredder until yesterday. I’ll have a closer look at those other ones.

Considering updating from 5.0.0 to 5.0.3 in case it’s just a bug in the software.

Appreciate your efforts here guys.

Sorry, I went looking for $tag, but it’s not in RT’s code; it’s in Perl’s HTML modules. I edited that out of my post, but it seems I was too late. Seeing how this is happening in 4.4 on Ubuntu and 5.0 on BSD, it seems like a pretty nasty bug, if it really is just a malformed email that can cause CPU spiking like that… Hopefully no one forwards it to our instance.

Can you check the inbox to see if there is an email hanging there? Maybe RT is trying to receive/parse/process it but failing. We had an issue with big emails failing to get processed by RT, while it still sent out the confirmation emails with ticket numbers that never got created in the DB. Long story short, we had to increase FcgidIOTimeout (the only thing pointing in that direction was a single-line warning in the Apache logs; the RT logs had nothing, even at debug level). If your logs are referring to non-existent ticket IDs, it is at least somewhat similar in that regard: maybe it is trying to process tickets that failed to be created in the first place.

I think I may be wrong about the log entries being ticket numbers. I took the “RT[xxxx]” to be a ticket number as it’s in the same format, but I think it may just be a log event number.

RT[92913]: [92913] Use of uninitialized value $tag in string eq at /usr/local/lib/perl5/site_perl/HTML/Element.pm line 2634.

Or possibly the process ID? I don’t know how BSD Unix does its logging.
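
In syslog-style output the number in square brackets after the program name is conventionally the PID of the logging process, so a quick way to test that theory is to count warnings per bracketed number and compare against the httpd PIDs shown by top or ps. A rough sketch, assuming lines shaped like the one quoted above (the regex and function names are only for illustration):

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches e.g. "RT[92913]: [92913] Use of uninitialized value ..."
# The second "[pid]" is optional in case your log format omits it.
LINE_RE = re.compile(
    r"(?P<tag>\w+)\[(?P<pid>\d+)\]:\s+(?:\[(?P=pid)\]\s+)?(?P<message>.*)"
)


def parse_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (program tag, bracketed number, message), or None if the line doesn't match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group("tag"), int(match.group("pid")), match.group("message")


def warnings_per_pid(lines: Iterable[str]) -> Counter:
    """Count matching log lines per bracketed number."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Feeding it `open("/var/log/messages")` would give a handful of distinct numbers with huge counts if they are PIDs of runaway httpd children, rather than an ever-changing list as you would expect from ticket IDs.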

Whelp, this is embarrassing. I resolved the issue. Turns out there was an email in the inbox that RT was somehow failing to retrieve.

Fetchmail was still working fine, and the email in question wasn’t blocking the mailbox for other emails. The email itself didn’t appear to be problematic or corrupt in any way, and nothing in any of the logs pointed to it being an email issue. The only reason I found it was because I had decided to make a copy of the VM to mess with. I copied it, edited the copy with a new IP address and name, and disabled fetchmail to avoid screwing with the production VM. In doing so, I discovered the copy didn’t have the issue.

I want to give a big thanks to both of you for continuing to help me with this issue.

Adam

Sounds very similar to an issue we had with wsgetmail getting stuck on an email (but still continued retrieving other mail). If I recall, adding the timeout option to the config file was the solution, as otherwise processes started stacking up.
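
To illustrate why a timeout stops the pile-up: without one, each fetch run can leave behind a delivery process stuck on the same message forever. The sketch below is not wsgetmail's code (that is Perl, and its option names are not reproduced here); it only shows the general idea, with a hypothetical `deliver` helper that pipes a message to a delivery command and gives up after a fixed number of seconds.

```python
import subprocess
from typing import Sequence


def deliver(message: bytes, command: Sequence[str], timeout_seconds: float) -> bool:
    """Pipe one message to a delivery command; return True only on a clean exit.

    If the command runs longer than timeout_seconds, subprocess.run kills it
    and we report failure instead of leaving a stuck process behind.
    """
    try:
        result = subprocess.run(
            list(command),
            input=message,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A caller could then log or set aside any message for which `deliver` returns False and carry on with the rest of the mailbox, which matches the behaviour described above where other mail kept flowing.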

Just figured I’d add this in case someone runs into a similar issue with wsgetmail & O365 mail retrieval. Anyways—glad you got it sorted.