17

I need something similar to grep -A and grep -B but for characters. In other words, I have a file with incredibly long lines, e.g.:

[thousands of characters] mytext [thousands of characters]

If I do grep mytext file, I don't want the full lines because it will become way too difficult to read and result in a huge file if I pipe it out to a file. grep -o doesn't work for me because it only returns mytext and I need to see X characters around the match. So imagine a fake option -Y:

$ grep -Y mytext file
Pz8mytextgxe
sd@mytext.com

How do I do this?

3 Answers

14

If you know Y up front, then you can do e.g.

grep -o '...mytext...' file

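where each ... is Y characters long; the example above is for Y=3. The . character in a regular expression matches any single character, so this only matches where at least Y characters surround mytext on both sides within the same line.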
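As a quick check of the fixed-width approach, here is a minimal sketch on made-up sample data (the sample text and the variable names sample, fixed, Y and interval are only for illustration). It also shows the same pattern written as an interval, .{Y}, which needs -E, in case Y is held in a shell variable:

```shell
# Hypothetical sample: line 1 has plenty of context around mytext,
# line 2 has only 2 characters before it.
sample='xxxxPz8mytextgxexxxx
abmytextcdefgh'

# Exactly 3 characters on each side (Y=3). Line 2 is skipped because
# fewer than 3 characters precede mytext there.
fixed=$(printf '%s\n' "$sample" | grep -o '...mytext...')
printf '%s\n' "$fixed"      # prints: Pz8mytextgxe

# Same idea with Y in a variable, using an ERE interval instead of literal dots.
Y=3
interval=$(printf '%s\n' "$sample" | grep -oE ".{$Y}mytext.{$Y}")
printf '%s\n' "$interval"   # prints: Pz8mytextgxe
```

If matches close to the start or end of a line should still be reported, a range such as .{0,3} is probably the better fit.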

4

If you want to find, let's say, from 0 to 10 chars before or after your intended search string, mytext, do this:

grep -rnioE '.{0,10}mytext.{0,10}'

Grep Options Explanation:

  1. The -r says to recursively search down the file and folder tree
  2. -n says to show line numbers
  3. -i says to be case insensitive
  4. -o says to only show the matching parts, not the whole line
  5. -E says to use Extended regular expressions

Regex Explanation:

See: https://regex101.com/r/BUpUdp/2.

  1. . matches any character except newlines
  2. {0,10} matches 0 to 10 instances of whatever is before it, which is ., or any character except newlines
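To see the {0,N} bounds in action, here is a small sketch on made-up input, using 3 instead of 10 to keep the output short (the sample lines and variable names are hypothetical; -r and -n are left out because it reads from standard input):

```shell
# Hypothetical sample: line 1 has lots of context, line 2 has very little
# and uses different letter case.
sample='0123456789mytextABCDEFGHIJ
abMYTEXTc'

# Up to 3 characters of context on each side, case-insensitive.
ctx=$(printf '%s\n' "$sample" | grep -ioE '.{0,3}mytext.{0,3}')
printf '%s\n' "$ctx"
# prints:
#   789mytextABC
#   abMYTEXTc
```

The second line still matches even though only 2 characters precede the search string and 1 follows it, because the lower bound is 0.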

Example usage:

I'd like to find any instances of this web page color code (#005cc5) to figure out if it's being used. But the CSS is 5000 lines' worth of code all on a single line with no line breaks, so I need to capture only a few of the surrounding chars for context, say up to 20 before and after. So I search with grep -rnioE '.{0,20}#005cc5.{0,20}':

$ grep -rnioE '.{0,20}#005cc5.{0,20}'
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:.highlight .l{color:#005cc5}.highlight .n{color
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:.highlight .m{color:#005cc5}.highlight .s{color
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:highlight .mf{color:#005cc5}.highlight .mh{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:r:#005cc5}.highlight .mi{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:r:#005cc5}.highlight .mo{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:r:#005cc5}.highlight .sb{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:highlight .se{color:#005cc5}.highlight .sh{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:highlight .si{color:#005cc5}.highlight .sx{colo
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:highlight .il{color:#005cc5}.gist th,.gist td{b
Test Syntax Highlighting _ GabrielStaples.com home_files/main.css:5:highlight .nb{color:#005cc5}.highlight .nc{colo

[Screenshot of the same output with match coloring; image not included]

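The second match above shows that this color applies to the .m CSS class, for instance, so I can now search the code for uses of this "m" class, which may show up in some *.html files. (As originally written with [\s"]?, this next search was imprecise: inside a bracket expression \s is not a whitespace shorthand, and the trailing ? made the whole class optional, so class="main" would also match. The version below uses [[:space:]] instead. It may still not find everything I want, e.g. m listed after another class, but you get the idea! The search above works fine.)

grep -rniE 'class="m[[:space:]"]'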
0

You can try applying line wrapping (e.g. to 80 characters) before actually searching:

fmt -w 80 file | grep mytext

This has the drawbacks that whitespace (e.g. space vs. tab) is not always preserved in its exact form, and strings previously on the same line may now be on adjacent lines.

This doesn’t use fold because that command may break within (long) non-whitespace sequences (e.g. very long words).
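A minimal sketch of that difference, on a made-up line and with a width of 12 instead of 80 so the effect is visible (the sample text and the variable names are hypothetical):

```shell
# Hypothetical sample line; at width 12, mytext straddles a fold boundary.
line='aaaa bbbb mytext cccc dddd'

# fold cuts at exactly 12 columns, splitting the word into "my" + "text",
# so grep no longer sees it on any single line.
fold_hits=$(printf '%s\n' "$line" | fold -w 12 | grep -c mytext || true)

# fmt only breaks at whitespace, so mytext stays intact on one output line.
fmt_hits=$(printf '%s\n' "$line" | fmt -w 12 | grep -c mytext || true)

printf 'fold: %s, fmt: %s\n' "$fold_hits" "$fmt_hits"   # prints: fold: 0, fmt: 1
```

The || true is only there because grep exits non-zero when it finds no matching lines.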

caw