Mac OSX Terminal – Lynx HTML Parsing

Lynx is useful for many things, but in the example below I’m simply going to extract the URLs from a bookmarks HTML file exported from Chrome. Lynx can be installed with Homebrew.

brew install lynx

lynx -dump -nonumbers -listonly bookmarks_11_18_16.html > links1.txt
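The list that lynx writes may contain lines that are not web links, such as a heading, blank lines, or javascript: bookmarklets saved as bookmarks. I have not checked exactly what every lynx version emits, so the sample input below is a made-up stand-in rather than real lynx output, and the file names raw_links.txt and filtered_links.txt are only for illustration. A minimal sketch of a filter that keeps only http and https URLs:

```shell
# Hypothetical stand-in for a lynx link dump: a heading, a blank line,
# indented URLs and one bookmarklet. Real output may look different.
printf '%s\n' 'References' '' '   https://a.example/' \
  '   javascript:void(0)' 'http://b.example/x?y=1' > raw_links.txt

# Strip leading whitespace, then keep only lines that start with
# http:// or https://.
sed 's/^[[:space:]]*//' raw_links.txt | grep -E '^https?://' > filtered_links.txt

cat filtered_links.txt
```

If your own links1.txt already contains nothing but URLs, this step changes nothing and can be skipped.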

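If you have created folders and links all over the place in Chrome or Firefox, the steps below will be helpful. They merge two bookmark exports into one list, remove duplicate URLs, sort what is left, and build a clean bookmarks file that can be imported again.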

lynx -dump -nonumbers -listonly bookmarks_11_18_16.html > links1.txt
lynx -dump -nonumbers -listonly bookmarks_11_23_16.html > links2.txt
cat links1.txt links2.txt > links3.txt && rm links1.txt links2.txt

awk '!seen[$0]++' links3.txt > linksclean.txt && rm links3.txt
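The awk one-liner removes repeated lines while keeping the first occurrence of each in its original order. Here is a small sketch of that behaviour on made-up data (the sample_*.txt file names and example URLs are mine, not part of the workflow):

```shell
# Hypothetical sample: three lines, one URL repeated.
printf '%s\n' 'https://b.example/' 'https://a.example/' \
  'https://b.example/' > sample_links.txt

# seen[$0]++ evaluates to 0 the first time a line is read, so
# !seen[$0]++ is true only for first occurrences. awk prints those
# lines and silently skips every later repeat.
awk '!seen[$0]++' sample_links.txt > sample_dedup.txt

cat sample_dedup.txt
```

The result has two lines, with b.example still ahead of a.example, because nothing has been sorted yet.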

sort -u linksclean.txt > linkscleanUnique.txt && rm linksclean.txt
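One thing to watch at the sort step: sort -u and sort | uniq -u are not interchangeable. sort -u keeps one copy of every line, while uniq -u keeps only the lines that were never repeated and drops every copy of the ones that were. On a list that has already been deduplicated the two give the same result, but on a list that still has duplicates they do not. A quick sketch on made-up data (file names are for illustration only):

```shell
# Hypothetical sample that still contains a duplicate.
printf '%s\n' 'https://b.example/' 'https://a.example/' \
  'https://b.example/' > sample_dupes.txt

# uniq -u: b.example appears twice, so both copies are dropped.
sort sample_dupes.txt | uniq -u > only_singletons.txt

# sort -u: one copy of each URL survives, in sorted order.
sort -u sample_dupes.txt > one_of_each.txt

echo "uniq -u:"; cat only_singletons.txt
echo "sort -u:"; cat one_of_each.txt
```

So if the awk step is ever skipped, uniq -u would silently throw away every bookmark that appears in both exports.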

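# Wrap each URL in a bookmark entry, escaping & so the HTML stays valid
while IFS= read -r line
do
  escaped=$(printf '%s' "$line" | sed 's/&/\&amp;/g')
  printf '<DT><A HREF="%s">%s</A></DT>\n' "$escaped" "$escaped"
done < linkscleanUnique.txt > linkslist.txt
# header.txt and footer.txt must already exist; they supply the lines that open and close the bookmarks file
cat header.txt linkslist.txt footer.txt > "Clean_Bookmarks_$(date +%Y-%m-%d_%H.%M.%S).html"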
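The final cat expects header.txt and footer.txt to exist already, and their contents are not shown here. A minimal sketch of what they could contain, assuming the usual Netscape bookmark file layout that, as far as I know, browser exports use. Treat the exact lines as an assumption and compare them with the top and bottom of your own exported file. Note that running this overwrites any header.txt and footer.txt in the current directory.

```shell
# Assumed opening lines of a Netscape-style bookmarks file.
# Check them against the first lines of your own export.
cat > header.txt <<'EOF'
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
EOF

# Assumed closing line: it ends the list opened by <DL><p> above.
cat > footer.txt <<'EOF'
</DL><p>
EOF
```

With these in place, the generated Clean_Bookmarks file is a flat list with no folders, which is the point of the cleanup.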

All information on this site is shared with the intention to help. Before any source code or program is run on a production (non-development) system, I suggest you test it and fully understand what it is actually doing, not just what it appears to be doing. I accept no responsibility for any damage you may do with this code.
