I have a file containing a list of URLs (one per line).
The script below extracts the host (server) names from it correctly,
but any host name that occurs more than once in the input
also appears more than once in the output.
I want each name to appear only once.
I tried uniq and sort -u, but they didn't help.
Here is the code I used to extract the hosts:
function extract_parts {
    if [ -f "wget-list" ]; then
        while read a; do
            a=${a:8}
            host=$(echo -e "$a" | awk -F '/' '{print $1}' | sort -u)
            # host=$(echo -e "$a" | awk -F '/' '{print $1}' | uniq -iu)
            echo -e ${host}
        done <<< $(cat ./wget-list)
    fi
}
where wget-list contains (truncated example):
https://downloads.sourceforge.net/tcl/tcl8.6.12-html.tar.gz
https://downloads.sourceforge.net/tcl/tcl8.6.12-src.tar.gz
https://files.pythonhosted.org/packages/source/J/Jinja2/Jinja2-3.1.2.tar.gz
https://files.pythonhosted.org/packages/source/M/MarkupSafe/MarkupSafe-2.1.1.tar.gz
https://ftp.gnu.org/gnu/autoconf/autoconf-2.71.tar.xz
https://ftp.gnu.org/gnu/automake/automake-1.16.5.tar.xz
Output of the script
(only the hosts, without the https:// and path parts):
downloads.sourceforge.net
downloads.sourceforge.net
files.pythonhosted.org
files.pythonhosted.org
ftp.gnu.org
ftp.gnu.org
Desired output (the above, but with no duplicates):
downloads.sourceforge.net
files.pythonhosted.org
ftp.gnu.org
(1) The a=${a:8} statement strips the first eight characters off $a. This will give undesired results if you ever get a URL beginning with http:// (or ftp://, etc.) instead of https://. (2) You should always quote all shell variable references (e.g., "$host") unless you have a good reason not to, and you’re sure you know what you’re doing. (3) Why are you using the -e option of echo? P.S. printf is better than echo. … (Cont’d) – G-Man Says 'Reinstate Monica' Dec 21 '22 at 22:53
(Cont’d) … (4) <<< $(cat ./wget-list): < ./wget-list is better. (5) See "Why is using a shell loop to process text considered bad practice?" – G-Man Says 'Reinstate Monica' Dec 21 '22 at 22:53
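The duplicates survive in the question's script because sort -u runs inside the loop, once per line, so each invocation only ever sees a single host and has nothing to deduplicate. One possible way to act on the comments above is to drop the shell loop and let a single awk process read the whole file. This is only a sketch, not necessarily what the commenter had in mind: the function name extract_hosts and its optional file argument are hypothetical, and it assumes every line has the form scheme://host/path.

```shell
#!/usr/bin/env bash
# Sketch: print each host name from a URL list once, in first-seen order.
# Assumes one URL per line, each of the form scheme://host/path.
# extract_hosts and its optional argument are hypothetical names.
extract_hosts() {
    local list=${1:-wget-list}   # default: the file name used in the question
    [ -f "$list" ] || return 1
    # Splitting on "/" turns "https://host/path" into the fields
    # "https:", "", "host", "path", so field 3 is the host for any scheme.
    # seen[$3]++ evaluates to 0 (false) only the first time a host appears,
    # so !seen[$3]++ lets each host through exactly once.
    awk -F/ '$3 != "" && !seen[$3]++ { print $3 }' "$list"
}
```

Called as extract_hosts wget-list on the sample list above, this should print downloads.sourceforge.net, files.pythonhosted.org and ftp.gnu.org, one per line. If the original loop is kept instead, moving the pipe to the end of the loop (done < wget-list | sort -u) lets sort see all the hosts at once, at the cost of sorted rather than first-seen order.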