7

I had a backup mail server eat a bunch of messages and I want to just configure a simple server to send 450 for all messages.

However, I'm not sure this is a good idea: if sending servers do not try the higher-priority server again when they retry, the messages can never be delivered.

I failed to find anything in the RFCs on this: do servers retry on the higher-priority server, or do they keep trying the same server that responded with 450 forever?

Paul
  • 5
    Is there any reason to have a second MX sending 450s? Having nothing at all (i.e. no server answering on that port, or no second MX listed at all) should have the same result and would be less work... But be aware that whatever the solution, senders do not keep mail forever. While many will retry for days, some will have a much shorter timeout (and some will blacklist your address at the first error!) – jcaron Jan 26 '24 at 00:28
  • 1
    I have experienced MTAs that do not retry under any circumstances. Usually they're related to authentication mechanisms that send a token to your email address for you to "prove" you are who you claim to be. My theory is that the messages are sent by monolithic applications designed/written by people who don't really understand SMTP and who certainly have never checked the relevant RFCs. – Chris Davies Jan 26 '24 at 12:01

2 Answers

9

When SMTP delivery fails (without receiving a permanent 5xy negative completion SMTP response) and there's no alternate/backup MX defined, IIRC the RFCs leave it up to the sender implementation whether delivery is attempted again, how frequently, and for how long subsequent delivery attempts will be made. The standards also leave it up to the sender implementation whether and how soon temporary delivery delay notifications are returned to the original sender, and how quickly a permanent delivery failure notification is returned.

When more than one MX record is defined for the recipient domain, then for every delivery attempt a standards-compliant sender should attempt delivery to all MX records, one after another and honouring their priority, until either one MX accepts the message (the sender receives a 2xx positive completion SMTP response), or one MX rejects the message (with a permanent 5xy negative completion SMTP response), or until all have been contacted.
When none of the (primary and/or backup) MX servers has accepted or permanently rejected the mail message for your domain, then I would expect the same sender-implementation-specific failure path to be followed as when no alternate MX records exist. In other words: "the sender may queue the message for a later delivery attempt or give up".
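The behaviour described above can be illustrated with a small simulation. This is only a sketch of a single delivery attempt as I would expect a compliant sender to perform it, not code from any real MTA. The function name `attempt_delivery`, the `try_host` callback and the `example.org` hostnames are hypothetical, and it simplifies by not randomising hosts of equal preference.

```python
from typing import Callable, List, Optional, Tuple

def attempt_delivery(
    mx_records: List[Tuple[int, str]],
    try_host: Callable[[str], Optional[int]],
) -> Tuple[str, Optional[str]]:
    """Simulate ONE delivery attempt across all MX hosts.

    mx_records: (preference, hostname) pairs; a lower number is tried first.
    try_host:   hypothetical callback returning the final SMTP reply code,
                or None if the host could not be reached at all.
    """
    for _preference, host in sorted(mx_records):
        code = try_host(host)
        if code is not None and 200 <= code < 300:
            return ("delivered", host)   # 2xx: message accepted
        if code is not None and 500 <= code < 600:
            return ("bounced", host)     # 5xy: permanent rejection
        # 4xy or unreachable: move on to the next MX
    # Nobody accepted or permanently rejected: what happens next
    # (queue and retry, or give up) is up to the sender.
    return ("deferred", None)

mx = [(20, "backup.example.org"), (10, "mail.example.org")]

# Primary is down, backup answers 450 to everything.
replies = {"mail.example.org": None, "backup.example.org": 450}
print(attempt_delivery(mx, replies.get))  # ('deferred', None)

# Next attempt, primary is back: it is tried first again.
replies["mail.example.org"] = 250
print(attempt_delivery(mx, replies.get))  # ('delivered', 'mail.example.org')
```

Note that the second result depends on the sender starting each new attempt from the top of the MX list, which is the assumption baked into this sketch; a sender that does not behave that way would not recover as shown.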

Do or do not

The whole point of multiple MX records and a backup mail server under your own control is that you control it, so you don't have to rely on the queuing and retry policy of the original sender, or the absence thereof.

When you don't want that control, i.e. when you don't want your backup MX to queue your mail but want to leave queuing and delivery retries up to the sender, simply don't configure a backup MX record at all. That should have the same effect (if not a better one) as setting up a backup MX pointing to a server that only generates errors and in reality won't accept and queue your email.

IMHO the intended purpose of a backup MX

Is that when your primary server goes offline and/or becomes unavailable due to an outage or planned maintenance, the RFCs and a correct implementation of the SMTP protocol by the sender should ensure that e-mail messages addressed to your domain get sent to your backup MX, where the mail will be accepted and queued for as long as it takes for the (planned) outage to end.
Your backup mail server (and not the sender) will control if and when mail delivery delay/failure notifications are sent to the original sender.
Once the outage has ended and your primary mail server and mailboxes are online again, you can flush the queue on the backup MX and (almost) immediately receive all queued mail, for example with the ETRN SMTP command.
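As a rough illustration of that last step, here is a minimal sketch of issuing ETRN from Python's standard `smtplib`. It assumes the backup MX advertises the ETRN extension and is willing to honour it for your domain, which depends entirely on how that server is configured. The helper name `request_queue_flush` and the hostnames in the usage note are hypothetical.

```python
import smtplib
from typing import Tuple

def request_queue_flush(smtp: smtplib.SMTP, domain: str) -> Tuple[int, bytes]:
    """Ask an already-connected backup MX to start delivering mail queued for `domain`.

    Returns the (reply code, reply text) pair of the ETRN command.
    A 2xx code means the request was accepted; a 4xy or 5xy code means
    the server could not or would not start the queue run.
    """
    smtp.ehlo()
    # Only send ETRN if the server advertised it in its EHLO response.
    if not smtp.has_extn("etrn"):
        raise RuntimeError("server does not advertise ETRN")
    return smtp.docmd("ETRN", domain)
```

Usage would look something like `with smtplib.SMTP("backup.example.org", 25, timeout=30) as smtp: print(request_queue_flush(smtp, "example.org"))`, run from the primary once it is accepting mail again. Many setups don't need this at all, since the backup will retry delivery on its own schedule anyway.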

HBruijn
3

As suggested in the comment by jcaron:

Is there any reason to have a second MX sending 450s? Having nothing at all (i.e. no server answering on that port, or no second MX listed at all) should have the same result and would be less work... But be aware that whatever the solution, senders do not keep mail forever. While many will retry for days, some will have a much shorter timeout (and some will blacklist your address at the first error!)

The right solution to your XY problem is not to have a backup MX. There is never any need to have more than one MX. Every legitimate sender retries delivery for well over a day, often over a week. In the event that your server has downtime and cannot accept the mail immediately, leaving it as the responsibility of the sending party's mail system, rather than some sketchy third party queuing service, is the obvious right thing to do. Not only does it avoid your mail getting "eaten"; it also avoids trusting a third party with authority to intercept your mail, and it ensures the sender will be notified (by their own mail system) in the event that an extremely long outage renders their mail undeliverable.

As for your actual question, how retry works for SMTP is under-specified, and I'm not sure there is any guarantee that a conforming sender would loop back to trying your primary after finding it unreachable, rather than continuing to try to deliver to the fake secondary MX reporting temporary failures. So even without the above reasoning why you shouldn't have a secondary MX, I think it's a bad idea to try this particular hack.

In case it's relevant, my experience with this is 22+ years of continuously running a personal/small-site mail system, and never having used more than one MX.

  • 2
    It's 2024. Having to wait for half a day for an e-mail message to make its way through the tubes just because a mail server needed a reboot can be seen as unacceptable. So there is often a need to have more than one server performing the MX role for a domain. Of course you can use a single MX record and put the IP addresses of those servers under a single hostname, but that can cause trouble with reverse DNS, for example. So yes, there are valid reasons to have multiple MX entries. – TooTea Jan 26 '24 at 11:12
  • @TooTea: The amount of time you wait is exactly the same either way. One way, there's backpressure that keeps it on the sender's outgoing mail system side, where they'll see a "temporarily undeliverable" after a while and know it hasn't been successfully delivered yet, and they're guaranteed a "permanently undeliverable" if things go really bad. The other way, a third-party, usually low-quality queuing service takes responsibility for it, and does the exact same type of delayed retry process the sender's outgoing server would have done, but the sender has no visibility into this. – R.. GitHub STOP HELPING ICE Jan 26 '24 at 15:09
  • The general principle here is that backpressure is almost always a very desirable thing. You make systems more complex and less reliable by letting data move forward to get buffered indefinitely despite a blockage in the path. Avoiding acceptance until the data can actually move is the better thing to do. – R.. GitHub STOP HELPING ICE Jan 26 '24 at 15:11
  • I'm not advocating using a "third-party, low quality" service as a MX (backup or primary). That's indeed likely always a bad idea. I'm saying you can perfectly well run two MX servers under your own control so that when one goes down for whatever reason, the other can take over. Sort of a poor man's HA cluster, without having to actually handle clustering. Either you literally make them equivalent in priority and get some basic load balancing, or you keep one "primary" and the other just sitting there doing nothing (perhaps a cloud instance), but ready to take over when needed. – TooTea Jan 26 '24 at 15:42
  • @TooTea: Unless you actually have a need for "HA", HA in general greatly increases system complexity and decreases reliability. Yes, you can evaluate on a case by case basis whether it makes sense (i.e. in the event of an outage, will you be able to do anything useful to actually reach/read your mail on the secondary MX? or will it just be queued there waiting to be transferred to your primary one when it comes back up?) but just doing HA stuff by default is not a good policy. – R.. GitHub STOP HELPING ICE Jan 26 '24 at 16:45