What am I doing with mod_rewrite?

Question

I am aware of the canonical question and have read it, yet I seem to be unable to find some stuff there.

Here are my conditions and rules to drop www and force https:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L,NE]

RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]

I understand what I am trying to match. However, the substitution rules are a bit unclear to me. What I don't understand is:

How did my hostname (without www.) end up in %1?
Why isn't the query string lost when the second rule is applied?

The reason behind the second question is that the manual explicitly states (highlighted by me):

REQUEST_URI

The path component of the requested URI, such as "/index.html". This notably excludes the query string which is available as its own variable named QUERY_STRING.

MrWhite · Accepted Answer · 2017-04-22T15:17:52.313


I assume these directives are working OK and you are just after an explanation as to why?

How did my hostname (without www.) end up in %1?

%1 is a backreference to the first captured group in the last matched CondPattern. So, given the following condition:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

The regex (i.e. the CondPattern) ^www\.(.*)$ is matched against the HTTP_HOST server variable. The match succeeds when HTTP_HOST starts with www. followed by anything. That "anything" sits inside a captured group (a parenthesised subpattern), i.e. (.*) rather than simply .*. Whatever matches the (.*) group is saved in the %1 backreference and can be used later in the RewriteRule substitution. For example, given a request for www.example.com/something, the condition effectively becomes:

RewriteCond www.example.com ^www\.(.*)$ [NC]

%1 will therefore contain example.com.
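As a rough illustration, the first rule from the question can be annotated as follows. The request URL is an assumed example, and a .htaccess context is assumed too, which is why $1 has no leading slash here:

```apache
# Assumed request: http://www.example.com/something (rule placed in .htaccess)

# HTTP_HOST is "www.example.com"; the group (.*) captures "example.com" -> %1
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

# In .htaccess the pattern is matched against "something" (no leading slash) -> $1
# The substitution therefore becomes https://example.com/something
RewriteRule ^(.*)$ https://%1/$1 [R=301,L,NE]
```

If the condition does not match (the host has no www. prefix), the rule is skipped and %1 is never evaluated.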

Why isn't the query string lost when the second rule is applied?

Because if the RewriteRule substitution does not itself contain a query string, the query string from the request is automatically appended to the end of the resulting substitution.

However, if you include a query string on the end of the substitution, even just an empty query string (a ? followed by nothing), then the query string from the request is not appended. For example:

RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI}? [R=301,L,NE]

This will result in the query string being stripped from the request (note the trailing ?). Alternatively, on Apache 2.4+ you can use the QSD (Query String Discard) flag to prevent the query string being appended.
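A sketch of the QSD variant, assuming Apache 2.4 or later. It should behave like the trailing ? version, and the RewriteCond lines from the question would still precede it:

```apache
# Apache 2.4+ only: QSD discards the request's query string
# instead of appending it to the substitution.
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE,QSD]
```

The flag form is arguably easier to spot when scanning a rule set than a lone ? at the end of the substitution.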

Aside: I also removed the parentheses from the RewriteRule pattern. You don't need a captured group here, since you are using the REQUEST_URI server variable instead. (The captured value would be available in the $1 backreference - note the $ prefix. Storing backreferences when you don't need them is just a waste of resources and hampers readability.)

RewriteCond %{HTTP:X-Forwarded-Proto} !https

I assume your server is behind a proxy server that is setting the X-Forwarded-Proto header?

edited Apr 22 '17 at 15:17

answered Apr 22 '17 at 14:58


Thank you for your explanations! Yes, I patched the directives together from examples and they are working. I became suspicious because one rule consisted of %1 and $1 and the other included variables directly. It turned out useful, as you revealed to me that I store a redundant reference in the last rule. Do I understand correctly that %{REQUEST_URI} and $1 would behave the same in both rules and are different only because I got the parts from different examples? If so, how should one choose which to use? And to expand on the query string - is anything else "automatically appended" or only that? – Džuris Apr 22 '17 at 15:23

As to your last question - that document root is served over two domain names, let's say example.org and assets.example.com. The assets on example.org are included using assets.example.com domain name which points to a proxy that caches the assets. That's why I had to put two RewriteConds there. – Džuris Apr 22 '17 at 15:29

%1 is a backreference to the last matched RewriteCond directive and $1 is a backreference to the RewriteRule pattern. Using %{REQUEST_URI} or $1 in this instance is largely a matter of preference. However, they are not necessarily the same - it depends on context. In a directory (incl. .htaccess) context they are slightly different, whereas in a server config/virtual host context they are probably the same. – MrWhite Apr 22 '17 at 16:12

E.g. in .htaccess, a request for example.com/path/to/file would result in REQUEST_URI containing /path/to/file, but $1 would contain path/to/file (note the missing slash prefix). This is consistent with your code example and from that I would assume you are in a directory (or .htaccess) context? – MrWhite Apr 22 '17 at 16:13
Yes, these are directives in a .htaccess. – Džuris Apr 22 '17 at 16:16

Nothing else is "automatically appended". Btw, the same applies to Redirect and RedirectMatch (mod_alias) directives. As regards which to use... REQUEST_URI or $1... REQUEST_URI is always the same, regardless of whether you are using server config or .htaccess. But it's not always possible to use $1 like this (instead of REQUEST_URI), for example: RewriteRule !^foo$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE] - this only redirects when the request is not /foo. (It's not possible to have a captured group in a negated regex.) – MrWhite Apr 22 '17 at 16:29
