Using sed get substring between two double quotes

Question

I have a file

xyz... rsync: "/home/path/to/file": Permission denied (13) rsync:
"/home/path/to/file1": Permission denied (13) rsync:
"/home/path/to/file2": Permission denied (13) rsync:
"/home/path/to/file3": Permission denied (13)

Now I want to extract only the file paths and store them in another file. The output file should look like:

/home/path/to/file 
/home/path/to/file1 
/home/path/to/file2
/home/path/to/file3

Using sed or awk how can I do this?

I have tried sed -n '/"/,/"/p' myfile but it's not working.

To those voting to close — How can this possibly be off-topic? It is about shell programming!! That's PROGRAMMING which is ON TOPIC for Stack Overflow! — Jonathan Leffler, Dec 03 '12 at 17:01
Welcome to Stack Overflow. As you can see, we occasionally have problems with people having itchy trigger fingers closing perfectly good questions (such as this one) with bad reasons for closure. It doesn't happen all that often (or, I don't get to see the problem in time all that often), but it does happen. Don't forget to read the [FAQ] before too long. — Jonathan Leffler, Dec 03 '12 at 17:04

score 24 · Accepted Answer · answered Dec 03 '12 at 13:55


You can pipe stderr of your rsync command to an awk script:

awk -F '"' '{print $2}'

Or to a cut command like this:

cut -d'"' -f2
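A minimal sketch of the plumbing, assuming a POSIX shell: with the redirections written as 2>&1 >/dev/null, stderr goes into the pipe and stdout is discarded. The function fake_rsync is a hypothetical stand-in that only prints sample messages in the question's format; substitute your real rsync command.

```shell
# Hypothetical stand-in for rsync: one normal line on stdout, errors on stderr.
fake_rsync() {
    echo 'sending incremental file list'
    echo 'rsync: "/home/path/to/file1": Permission denied (13)' >&2
    echo 'rsync: "/home/path/to/file2": Permission denied (13)' >&2
}

# 2>&1 points stderr at the pipe first, then >/dev/null discards stdout,
# so only the error messages reach awk.
paths=$(fake_rsync 2>&1 >/dev/null | awk -F '"' '{print $2}')
printf '%s\n' "$paths"
```

The order of the two redirections matters: writing >/dev/null 2>&1 instead would send both streams to /dev/null and leave awk with no input.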

anubhava


Or, shorter: cut -d\" -f2 – Dec 03 '12 at 13:58
@AndersJohansson: Thanks I added your cut command to answer as well. – anubhava Dec 03 '12 at 13:59
I think this is not going to work. As you can see, the field number of the file path is not fixed at $2 or f2. Thanks! – Dec 04 '12 at 05:47
Actually rsync will always write filepath first between " and " on stderr. – anubhava Dec 04 '12 at 06:31

@Jam88: Actually, it will work because of the way anubhava has written it. The field delimiter is set to double quote. That means that everything up to the first double quote (possibly an empty string) is $1; everything between the first and second double quotes is $2; and everything after the second double quote is in $3 ($4, ...). The file name is (apparently) always between the first two double quotes, so this solution should work (and did when I tested it). – Jonathan Leffler Dec 04 '12 at 06:32
@JonathanLeffler: Thanks a lot for your detailed comment, I couldn't have explained it better. – anubhava Dec 04 '12 at 06:33
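
To make the field splitting discussed in these comments visible, here is a small sketch that prints the first two fields awk sees, once for a line with text before the first quote and once for a line that starts with the quote. The sample lines come from the question; the variable name is only for illustration.

```shell
# With -F '"', $1 is whatever precedes the first double quote (possibly
# empty) and $2 is the text between the first and second double quotes.
fields=$(printf '%s\n' \
    'xyz... rsync: "/home/path/to/file": Permission denied (13) rsync:' \
    '"/home/path/to/file1": Permission denied (13) rsync:' |
    awk -F '"' '{ printf "1=[%s] 2=[%s]\n", $1, $2 }')
printf '%s\n' "$fields"
```

In both cases the path lands in $2, regardless of how much text comes before the first quote.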

Jonathan Leffler · Answer 2 · 2014-06-10T18:07:44.563

Using sed:

sed 's/^[^"]*"\([^"]*\)".*/\1/'

That looks for: beginning of line, a series of non-quotes, a double quote, captures a series of non-quotes, a double quote and anything else on the line, and replaces it by the captured material.

$ sed 's/^[^"]*"\([^"]*\)".*/\1/' <<'EOF'
> xyz... rsync: "/home/path/to/file": Permission denied (13) rsync:
> "/home/path/to/file1": Permission denied (13) rsync:
> "/home/path/to/file2": Permission denied (13) rsync:
> "/home/path/to/file3": Permission denied (13)
> EOF
/home/path/to/file
/home/path/to/file1
/home/path/to/file2
/home/path/to/file3
$

Tested on RHEL 5 Linux with GNU sed, but only using features that would have worked in the 7th Edition UNIX™ version of sed.

Incidentally, a slightly simpler way to do it is with two substitute commands; change everything up to and including the first double quote to an empty string (that's a sequence of zero or more non quotes followed by a double quote); change everything after what is now the first double quote to nothing:

sed 's/^[^"]*"//; s/".*//'
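
A small sketch of the two substitutions run separately on one sample line from the question, so the intermediate result is visible. The variable names are illustrative only.

```shell
line='xyz... rsync: "/home/path/to/file": Permission denied (13) rsync:'

# Step 1: delete everything up to and including the first double quote.
step1=$(printf '%s\n' "$line" | sed 's/^[^"]*"//')

# Step 2: delete from what is now the first double quote to the end of the line.
step2=$(printf '%s\n' "$step1" | sed 's/".*//')

printf '%s\n%s\n' "$step1" "$step2"
```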

Incidentally, the command you tried (sed -n '/"/,/"/p') prints from one line containing a double quote to the next line containing a double quote, without editing the lines at all. That is why it didn't seem to work for you: it did what you asked, but what you asked for wasn't what you intended.

Efficiency-wise, there's unlikely to be a measurable difference in the performance. In terms of ease of maintenance, I suspect the latter is less taxing on the brain cells.
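
To illustrate the behaviour of the range form /start/,/end/ used in the original attempt: it is a line-range address, so it selects whole lines and never looks inside a line. The sample text in this sketch is made up for illustration.

```shell
# The range opens on the first line containing a double quote and closes on
# the next later line containing one; every line in between is printed whole.
ranged=$(printf '%s\n' \
    'no quotes here' \
    'first "quoted" line' \
    'plain line' \
    'second "quoted" line' \
    'trailing line' |
    sed -n '/"/,/"/p')
printf '%s\n' "$ranged"
```

With the question's input, where every line contains a double quote, this kind of range ends up printing the file unchanged.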

score 1 · Answer 3 · answered Dec 03 '12 at 21:37


If your version of grep supports Perl-compatible regular expressions (the -P option):

grep -oP '(?<=")/home/.*?(?=")' file >> anotherfile

Results:

/home/path/to/file
/home/path/to/file1
/home/path/to/file2
/home/path/to/file3

You could also make this less strict, to match anything between the double quotes if you desire. The trailing lookahead keeps the text after the closing quote out of the output:

grep -oP '(?<=")[^"]*(?=")' file >> anotherfile


Steve


Do you need to make the .* non-greedy with .*? just in case there's an extra double quote later in the line? Or use [^"]* in place of .*? – Jonathan Leffler Dec 04 '12 at 06:33
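
A quick sketch of the case raised in this comment, assuming GNU grep built with PCRE support (-P). The sample line, with a second quoted string at the end, is invented for illustration.

```shell
line='rsync: "/home/path/to/file": Permission denied (13) "extra"'

# Greedy .* runs on to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | grep -oP '(?<=")/home/.*(?=")')

# Non-greedy .*? stops at the first closing double quote.
lazy=$(printf '%s\n' "$line" | grep -oP '(?<=")/home/.*?(?=")')

# [^"]* cannot cross a double quote, so it also stops at the first one.
negated=$(printf '%s\n' "$line" | grep -oP '(?<=")/home/[^"]*(?=")')

printf '%s\n' "$greedy" "$lazy" "$negated"
```

Only the greedy form picks up the extra text; the non-greedy and negated-class forms both return just the path.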

score -1 · Answer 4 · answered Dec 03 '12 at 13:52


Use the >> operator to append a command's output to a file.

For example:

grep -r "pattern" * >> file.txt

So just change that for your specific scenario using sed by appending

>> filename

to the command
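
As a sketch of how the redirection combines with an extraction command, assuming a POSIX-style shell with mktemp available: > replaces the file's contents, while >> adds to them. The temporary files stand in for your real input and output files, and the sed command is borrowed from another answer on this page.

```shell
input=$(mktemp)
output=$(mktemp)
printf '%s\n' 'rsync: "/home/path/to/file": Permission denied (13)' > "$input"

# > truncates the output file before writing.
sed 's/^[^"]*"//; s/".*//' "$input" > "$output"

# >> keeps what is already there and writes at the end.
sed 's/^[^"]*"//; s/".*//' "$input" >> "$output"

# The file now holds the same path twice: once from >, once from >>.
count=$(wc -l < "$output" | tr -d ' ')
contents=$(cat "$output")
printf '%s\n' "$count" "$contents"
rm -f "$input" "$output"
```

If the output file should contain only the result of the latest run, use > instead of >>.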


AStupidNoob


The grep -r does a recursive search through any directories listed in the arguments (*). It's not clear what pattern you have in mind, but grep will pick up the whole line. The purpose of the exercise is to collect information from part of a line. If you're using GNU grep, there are ways to do that (-o); these are non-standard (except to the extent that GNU defines a de facto standard). Similarly with the use of PCRE regular expressions; those are another GNU extension. They're fine if you have GNU grep and no plans to work on platforms where GNU grep is not available by default. – Jonathan Leffler Dec 04 '12 at 06:40
Sorry I missed that, I thought he wanted to know in general what to do to put output into a file, and grep was just an example. – Dec 04 '12 at 07:12
