• Punkie@lemmy.world
    9 months ago

    I got burned by a former admin who, instead of diagnosing why a mail service was failing, added an /etc/cron.d entry named “…” (three dots), which you’d never notice in a casual "ls" listing unless you were careful. The cron job ran a similarly named script once every 5 minutes. The script would launch the mail service, but multiple instances were not allowed to run on the same box, so if the service was already running, nothing would happen. That later explained the hundreds of “[program] service is already running” errors in our logs. It ran every 5 minutes because our SolarWinds check would only notice if the service had been down for 5 minutes. The reason the service was crashing was later fixed in a patch, but nobody knew about this little “helper” script for years.
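
    For anyone who wants to check their own boxes, here is a minimal Python sketch of the kind of sweep that would have caught it. It only lists directory entries whose names start with a dot, which a plain "ls" skips. The function name `hidden_entries` is made up for illustration, and whether your cron implementation actually runs such files is something to verify on your own system.

```python
import os


def hidden_entries(directory):
    """Return the sorted names of entries in `directory` that start with a dot.

    os.scandir() never yields the special "." and ".." entries, so anything
    returned here is a real file or directory that a plain `ls` would hide.
    """
    with os.scandir(directory) as entries:
        return sorted(e.name for e in entries if e.name.startswith("."))


if __name__ == "__main__":
    # /etc/cron.d is the directory from the story; adjust the path as needed.
    cron_dir = "/etc/cron.d"
    if os.path.isdir(cron_dir):
        for name in hidden_entries(cron_dir):
            print(f"hidden entry in {cron_dir}: {name!r}")
```

    Running "ls -la" by hand shows the same thing; the script is just easier to drop into a periodic audit.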

    Until one day, we had a service failover from primary to backup. Normally, we had two mail servers behind a load balancer, which would only send traffic to the IP that was reporting as up. Before, we manually disabled the other network port, but this time, that step was forgotten, so BOTH IPs were listening. We shut down the primary mail service, but after 5 minutes, it came back up. The mail software would sync all the mail from one server to the other (like primary to backup, or reversed, but one way only). With both up, the load balancer just sent traffic to a random one.

    So now, both IPs received and sent mail, along with the web interface that users could use. But with mail going to both, it created mass confusion, and the mailbox sync was copying from backup to primary. Mail would appear and disappear randomly, and if it disappeared, it was because backup was syncing to primary. It was slow, and the first people to notice were the scant IMAP customers over the next several days. Those customers were always complaining because they had old and cranky systems, and our weekend customer service just told them to wait until Monday. But then more and more POP3 customers started to notice, and after 5 days had passed, we figured out what had happened. And we only did Netbackups every week, so now thousands of legitimate emails were lost for good, affecting over 3000 customers. A lot of them were lawyers.
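
    A toy Python model of why the mail vanished, assuming the sync behaved like a plain one-way mirror (destination ends up as an exact copy of the source). That mirror behavior, the message names, and the function names are all assumptions for illustration, not how the actual mail software worked internally.

```python
def mirror(source):
    """One-way mirror: the destination becomes an exact copy of the source."""
    return set(source)


def simulate_failover():
    # Both servers start in sync.
    primary = {"msg-1"}
    backup = {"msg-1"}

    # With both IPs up, the load balancer delivers new mail to either server.
    primary.add("msg-2")
    backup.add("msg-3")

    # The sync runs backup -> primary only, so anything that landed
    # only on primary is dropped when primary is overwritten.
    before = set(primary)
    primary = mirror(backup)
    lost = before - primary
    return primary, lost


if __name__ == "__main__":
    primary, lost = simulate_failover()
    print("primary after sync:", sorted(primary))
    print("lost from primary:", sorted(lost))
```

    In this model, any message delivered to primary between sync runs is visible for a while and then gone after the next run, which matches the "appear and disappear" symptom.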

    Oof.

    • Kid_Thunder@kbin.social
      9 months ago

      I was shadow IT for a project and asked IT to design this special unconventional thing, which of course they wouldn’t. So I made this little embedded Linux device to take care of it. Gave them the design and the steps I took and all that. They were like “nah”, so I told them to give me admin on their file server and switch and I’d just do it myself. So they did (lol?).

      I needed a service account, because I figured just having the system account do it on their file server wouldn’t be OK. So I asked them how to properly get a service account approved, and they passed me to Cyber, who had me submit a user request. It got denied because it didn’t have a signed user agreement or a Sec+ or similar cert…

      So I created a Word doc that said “I am not a real person and therefore cannot sign any contracts. I am just software man.”, exported it to PDF, and gave it the same file name as the agreement. Did the same for the cert. They approved it.

      Then nobody ever created the account because IT’s helpdesk couldn’t figure out how to do it. I think it was more that they probably didn’t have an OU structure properly set up so they wanted some architect or something to weigh in.

      Anyway, I just let System do it because, well, I had been waiting months at that point. The service account probably still doesn’t exist in AD. They then took my admin privs away and got credit from upper management for solving this odd problem that my stuff took care of.

      Eventually they needed a more robust solution, and in a few more places too, since it worked well but they had started slamming it a bit too hard with data. They wanted to keep giving me specific rights and taking them away when I was done, with me submitting paperwork to them every single time.

      Apparently, I burnt bridges when I said “nah” as a Reply to All when they told me that. But who cares to have a bridge to nowhere anyway? As far as I know (since I still occasionally get a technical question about it) my little guy is still chugging away today, though I’ve moved on since then.