It is just a consequence of how the lazy quantifier in regular expressions works. An online test gives the same result.
You should understand that string expressions are first converted to regular expressions by Mathematica. You can see the result with StringPattern`PatternConvert:
StringPattern`PatternConvert[#][[1]] & /@ {"href=\"" ~~ Shortest[x__] ~~ y : ".css",
"=\"" ~~ Shortest[x__] ~~ y : ".css", "\"" ~~ Shortest[x__] ~~ y : ".css"}
{"(?ms)href=\"(.+?)(\\.css)", "(?ms)=\"(.+?)(\\.css)", "(?ms)\"(.+?)(\\.css)"}
Hence you shouldn't be fooled by the name Shortest: in string patterns it has no relation to the Shortest of Mathematica's own pattern matcher, which behaves differently.
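A minimal illustration of the point above. The input string "a1a2b" is made up for the demonstration and is not taken from the question. The regex engine commits to the leftmost possible start and only then looks for the nearest end, so the match need not be the globally shortest substring:

```mathematica
(* Hypothetical input: the globally shortest "a...b" substring is "a2b" *)
demo = "a1a2b";

(* String pattern with Shortest; internally converted to a lazy regex *)
viaPattern = StringCases[demo, "a" ~~ Shortest[__] ~~ "b"]

(* The same search written directly as a lazy regular expression *)
viaRegex = StringCases[demo, RegularExpression["a.+?b"]]
```

Both calls return {"a1a2b"}: the match starts at the first "a" and extends to the nearest "b" after it, even though "a2b" is shorter.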
Further reading:
UPDATE
On this page, several techniques to overcome this feature of the lazy quantifier are provided, including the most general (but not the most efficient) Tempered Greedy Token Solution. It can be applied as follows to build an equivalent of what could be called a true shortest BlankNullSequence string expression*:
Clear[shortest]
shortest[start_, end_, "IncludeBoundaries" -> True] :=
RegularExpression[
StringTemplate["`START`(?:(?!`START`)(?!`END`).)*`END`"][<|"START" -> start,
"END" -> end|>]];
shortest[start_, end_] := shortest[start, end, "IncludeBoundaries" -> True]
shortest[start_, end_, "IncludeBoundaries" -> False] :=
RegularExpression[
StringTemplate["(?<=`START`)(?:(?!`START`)(?!`END`).)*(?=`END`)"][<|"START" -> start,
"END" -> end|>]];
Testing:
front = "A";
back = "B";
str = "A---A--A____B-A_B-A---A______B---AAAAB";
StringCases[str, shortest[front, back, "IncludeBoundaries" -> False]]
{"____", "_", "______", ""}
front = "href=\"";
back = "\\.css";
str = "\"/><link rel=\"stylesheet\" type=\"text/css\" \
href=\"/some/path/to/css/name.min.css";
StringCases[str, shortest[front, back, "IncludeBoundaries" -> False]]
front = "=\"";
StringCases[str, shortest[front, back, "IncludeBoundaries" -> False]]
front = "\"";
StringCases[str, shortest[front, back, "IncludeBoundaries" -> False]]
{"/some/path/to/css/name.min"}
{"/some/path/to/css/name.min"}
{"/some/path/to/css/name.min"}
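To see what the tempered greedy token does on its own, here is a small sketch with the delimiters "A" and "B" written literally into the regex instead of going through the template. The shortened test string is an assumption made for brevity:

```mathematica
demo = "A---A--A____B-A_B";

(* Plain lazy quantifier: starts at the leftmost "A" and runs to the nearest "B" *)
lazy = StringCases[demo, RegularExpression["A.*?B"]]

(* Tempered greedy token: each consumed character must not begin "A" or "B" *)
tempered = StringCases[demo, RegularExpression["A(?:(?!A)(?!B).)*B"]]
```

Here lazy evaluates to {"A---A--A____B", "A_B"}, while tempered evaluates to {"A____B", "A_B"}, because the token refuses to step over a second "A".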
*As the OP shows in the comments, this method fails miserably in more complicated cases:
front = "tomato";
back = "iconic";
str = "gffghtomatomato12345iconiconictomatomatoiconiconic";
StringCases[str, shortest[front, back, "IncludeBoundaries" -> False]]
{"mato12345", "mato", ""}
This result is wrong. The expected result is {"12345",""}.
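One plausible reading of this failure (my interpretation, not something stated in the comments) is that the start delimiter overlaps with itself in "tomatomato". A quick check on a shortened copy of the test string shows the two overlapping occurrences:

```mathematica
(* Overlaps -> True is given explicitly so that overlapping occurrences are reported *)
overlapping = StringPosition["gffghtomatomato12345", "tomato", Overlaps -> True]
(* {{6, 11}, {10, 15}} *)
```

The lookbehind is already satisfied right after the first occurrence, and the leftover "mato" of the second occurrence does not itself begin a new "tomato", so the (?!START) guard never fires on it.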
Here is another version which gives the desired result:
Clear[shortest2]
shortest2[str_, start_, end_] :=
StringCases[str,
RegularExpression[
StringTemplate["(?!.{1,`len`}`START`)`START`((?:(?!`START`)(?!`END`).)*)`END`"][<|
"len" -> StringLength[start], "START" -> start, "END" -> end|>]] -> "$1"];
front = "tomato";
back = "iconic";
str = "gffghtomatomato12345iconiconictomatomatoiconiconic";
shortest2[str, front, back]
{"12345", ""}
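The extra prefix (?!.{1,len}START) in shortest2 can be examined in isolation. The sketch below hard-codes "tomato" and its length 6 and uses a shortened input, both chosen only for illustration. It asks which occurrence of the start delimiter survives the guard:

```mathematica
(* Only a "tomato" that is not followed by another "tomato" beginning
   within the next 6 characters is accepted *)
guarded = StringPosition["tomatomato12345iconic",
  RegularExpression["(?!.{1,6}tomato)tomato"]]
(* {{5, 10}} *)
```

The occurrence at positions 1 to 6 is rejected because another "tomato" starts 4 characters later, so only the second occurrence is accepted as a start.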
However, in some special cases this method also fails:
front = "NotEnd";
back = "End";
str = "NotEndNotEnd1234NotEnd";
shortest2[str, front, back]
{}
Hence the approach suggested by the OP should be preferred.
UPDATE 2
It seems that I managed to find a really universal solution through regular expressions:
Clear[ShortestStringBetween]
Options[ShortestStringBetween] = {"IncludeBoundaries" -> False,
"BoundaryOverlaps" -> False};
ShortestStringBetween[str_String, start_String, end_String, OptionsPattern[]] :=
Module[{bInclude = OptionValue["IncludeBoundaries"],
bOvelap = OptionValue["BoundaryOverlaps"]},
Which[
bInclude && Not[bOvelap],
StringCases[str, RegularExpression[
StringTemplate["`START`(?:(?!`END`).(?<!`START`))*`END`"][
<|"START" -> start, "END" -> end|>]]],
Not[bInclude] && Not[bOvelap],
StringCases[str, RegularExpression[
StringTemplate["`START`((?:(?!`END`).(?<!`START`))*)`END`"][
<|"START" -> start, "END" -> end|>]] -> "$1"],
Not[bInclude] && bOvelap,
StringCases[str, RegularExpression[
StringTemplate["(?<=`START`)(?:(?!`END`).(?<!`START`))*(?=`END`)"][
<|"START" -> start, "END" -> end|>]]],
bInclude && bOvelap,
StringCases[str, match : RegularExpression[
StringTemplate["(?<=`START`)(?:(?!`END`).(?<!`START`))*(?=`END`)"][
<|"START" -> start, "END" -> end|>]] :> StringJoin[start, match, end]]
]];
Note that the start and end parameters are directly inserted into RegularExpression and therefore must be regular expressions in the Mathematica format. And since PCRE (on which RegularExpression is based) doesn't support infinite repetition within a lookbehind, the start parameter must be a fixed-length regexp or contain alternations of different but pre-determined lengths (for example, "cat|raccoon"). The end parameter has no such restriction. But I haven't tested how this implementation behaves with non-fixed length parameters.
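A short sketch of the lookbehind restriction described above. The input string and the digit pattern are assumptions made for the demonstration. An alternation of two fixed-length branches of different lengths is accepted inside a lookbehind:

```mathematica
(* Lookbehind with fixed-length alternatives of different lengths (3 and 7) *)
lookbehindDemo = StringCases["cat12 raccoon345 dog6",
  RegularExpression["(?<=cat|raccoon)\\d+"]]
(* {"12", "345"} *)
```

By contrast, a start pattern with unbounded repetition, such as "a+", cannot be used inside (?<=...) for the reason given above.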
It works correctly in all the test cases:
front = "tomato";
back = "iconic";
str = "gffghtomatomato12345iconiconictomatomatoiconiconic";
ShortestStringBetween[str, front, back]
{"12345", ""}
front = "NotEnd";
back = "End";
str = "NotEndNotEnd1234NotEnd";
ShortestStringBetween[str, front, back]
ShortestStringBetween[str, front, back, "BoundaryOverlaps" -> True]
{"Not"}
{"Not", "1234Not"}
[...] Shortest when it does not represent the shortest possible string then? – azerbajdzan Aug 26 '22 at 14:11
[...] Shortest is dubious. Please read this answer for a discussion. – Alexey Popkov Aug 26 '22 at 14:14
[...] RegularExpression-based solution. – Alexey Popkov Aug 28 '22 at 04:05
[...] shortest["gffghtomatomato12345iconiconictomatomatoiconiconic","tomato","iconic"]=={"12345",""} while yours StringCases["gffghtomatomato12345iconiconictomatomatoiconiconic", shortest["tomato", "iconic", "IncludeBoundaries" -> False]]=={"mato12345","mato",""}. Your code did not even find "12345" and on the other hand returns "mato" which is longer than "". "mato" would be valid only if overlaps are allowed. – azerbajdzan Aug 28 '22 at 08:02
[...] "mato" there would be missing "icon" in your output. Furthermore, if overlaps were allowed then "shortest" loses its meaning because then all occurrences of start~~___~~end would be valid. – azerbajdzan Aug 28 '22 at 08:18
[...] Shortest. Because it is hard to test the "real shortest" in all circumstances of different possible nested strings. You can never be sure whether you overlooked some complexly nested string for which it might not work. – azerbajdzan Aug 29 '22 at 09:23