
I have thousands of downloaded text files, and the links in them all follow the same pattern. The pattern seemed to work in a parser (and in Notepad++), but when I try to match it on the console, with the goal of piping the results to wget for downloading, I get "grep: Invalid range end".

grep -E "\(https://foo.domain.com/([A-z])\w+.pdf\)" * > wget

I am unfamiliar with proper wildcarding. I tried .* and similar patterns, and escaping the forward slashes, all to no avail. I am sure it is something stupid.

Essentially every link is the same except for a random string of text between .com/ and .pdf (the zzz in .com/zzz.pdf).

Jonathan
  • Can you provide an example of the source text? Are the escaped parentheses required? – Gedweb Mar 31 '19 at 07:33
  • grep -oP "https://foo.domain.com/[A-z]+\w+.pdf" | wget -i - – sparse Mar 31 '19 at 08:33
  • @sparse Can you post this as an answer? I had to do some additional cleanup in vi (the output had prefixed items and duplicate lines), but that was trivial to fix, and it let me use wget -i with a file rather than piping. Thank you! – Jonathan Mar 31 '19 at 22:36
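The cleanup in vi can probably be avoided: when grep searches several files it prefixes each match with the file name (which -h suppresses), and sort -u drops duplicate URLs. The sketch below assumes the file names consist of letters, digits and underscores before .pdf; collect_urls is a hypothetical helper name and the demo files are made up.

```shell
#!/bin/sh
# collect_urls (hypothetical helper): print each matching URL exactly once.
#   -o  print only the matched URL, not the whole line
#   -h  omit the "filename:" prefix grep adds when searching several files
#   -E  use extended regular expressions
collect_urls() {
  grep -ohE 'https://foo\.domain\.com/[[:alnum:]_]+\.pdf' "$@" | sort -u
}

# Demo on two throwaway files that share one URL.
demo=$(mktemp -d)
printf 'see (https://foo.domain.com/AzRRjkL.pdf) here\n' > "$demo/a.txt"
printf 'again https://foo.domain.com/AzRRjkL.pdf and https://foo.domain.com/q7x.pdf\n' > "$demo/b.txt"
urls=$(collect_urls "$demo/a.txt" "$demo/b.txt")
rm -r "$demo"
printf '%s\n' "$urls"
```

Assuming the downloaded files end in .txt, the real run would be collect_urls *.txt > urls.list followed by wget -i urls.list. The list deliberately gets a different extension so that a second run does not read it back in as input.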

2 Answers


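By default grep matches case-sensitively, and character ranges in a bracket expression are interpreted in the current locale's collation order. A range must therefore end with a character that sorts after the range start; otherwise grep reports "Invalid range end".

  1. This is invalid: [A-z] (in the collation order that produced your error, lower case z sorts before upper case A)
  2. This is valid: [A-Z] (upper case Z comes after upper case A)
  3. This is valid: [a-z] (lower case z comes after lower case a)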
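A quick way to compare these ranges is to test them against a mixed-case name such as AzRRjkL.pdf (the example from the asker's comment). This sketch pins the C locale so that ranges follow ASCII order on any system; check is a hypothetical helper, and the demo only covers whole-name matches.

```shell
#!/bin/sh
# Pin the C locale so ranges are evaluated in plain ASCII order everywhere.
export LC_ALL=C
name='AzRRjkL.pdf'

# check (hypothetical helper): prints "yes" if the whole name matches the ERE in $1.
check() {
  if printf '%s\n' "$name" | grep -qE "$1"; then echo yes; else echo no; fi
}

upper=$(check '^[A-Z]+\.pdf$')   # fails: the name contains lower-case letters
lower=$(check '^[a-z]+\.pdf$')   # fails: the name contains upper-case letters
both=$(check '^[A-Za-z]+\.pdf$') # two valid ranges in one bracket cover both cases
echo "[A-Z]: $upper  [a-z]: $lower  [A-Za-z]: $both"
```

Combining the two valid ranges gives the mixed-case behaviour that [A-z] was presumably meant to provide, without relying on how the locale orders upper and lower case relative to each other.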

I suspect you meant the third one (which would mean that all of your matched file names start with a lower-case letter).

The pattern may have worked in a different environment because that environment matched case-insensitively or, more likely, used a different collation order (try LC_COLLATE=C grep '[A-z]', or LC_ALL=C if LC_ALL is set in your environment, since LC_ALL overrides LC_COLLATE).
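Even where [A-z] is accepted, it is wider than it looks. In the C locale ranges follow ASCII order, so [A-z] spans A (0x41) through z (0x7A), which includes the six punctuation characters between Z and a. This sketch tests a few arbitrarily chosen characters against it:

```shell
#!/bin/sh
# Collect every test character that [A-z] matches under the C locale.
matches=''
for ch in A z _ '^' '`' 0; do
  # Anchored so the single character must be matched by the bracket expression.
  if printf '%s\n' "$ch" | LC_ALL=C grep -q '^[A-z]$'; then
    matches="$matches$ch"
  fi
done
echo "characters matched by [A-z] in the C locale: $matches"
```

The underscore, caret and backtick all match, while the digit does not, so [A-z] is rarely what is intended.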

anx
  • The file names are all randomly cased, e.g. AzRRjkL.pdf, and their length isn't fixed. I just need to pass whatever grep finds to wget for downloading. – Jonathan Mar 31 '19 at 21:23
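For randomly cased names of varying length, one option is to skip ranges altogether and use POSIX character classes, which have no endpoints whose validity depends on collation. This is a sketch on a made-up sample line, assuming each name starts with a letter followed by letters, digits or underscores. In a UTF-8 locale, [[:alpha:]] also matches non-ASCII letters.

```shell
#!/bin/sh
# Hypothetical sample line with a mixed-case name and a short name.
line='see (https://foo.domain.com/AzRRjkL.pdf) and https://foo.domain.com/b2.pdf'

# [[:alpha:]] = first character must be a letter of either case;
# [[:alnum:]_]* = any number of letters, digits or underscores after it.
# -o prints each match on its own line, without the surrounding text.
found=$(printf '%s\n' "$line" |
  grep -oE 'https://foo\.domain\.com/[[:alpha:]][[:alnum:]_]*\.pdf')
printf '%s\n' "$found"
```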

grep -ohP "https://foo\.domain\.com/[A-Za-z]+\w+\.pdf" * | wget -i -

sparse