Filter All URLs From Standard Input

Put the following in a file named
#!/usr/bin/env python
'''Prints a list of URLs that are found in standard input.

It only finds URLs enclosed in quotes ("" or '') that start with http://
'''

import re
import sys

# Pattern for fully-qualified URLs: an opening quote, http://, then
# everything up to the next quote character
url_pattern = re.compile(r'''["']http://[^"']*?['"]''')

# Build a list of all URLs found in standard input
s = sys.stdin.read()
urls = url_pattern.findall(s)

# Output all the URLs, without their surrounding quotes
for url in urls:
    print(url.strip('"').strip("'"))

Example Usage:
wget -O - | ./ | sort | uniq
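
The same filtering can also be tried on an in-memory string, which is handy for checking the pattern before piping a real page through it. The following is only a minimal sketch: the names `URL_PATTERN` and `extract_urls` and the example.com addresses are made up for illustration, the character class that stops at the next quote is an assumption about the intent, and a capture group is used here in place of the script's strip() calls. Sorting and de-duplicating are done in Python, roughly mirroring the `sort | uniq` step of the pipeline.

```python
import re

# Quoted strings that start with http://; the group captures the URL
# itself so the surrounding quotes never need to be stripped.
URL_PATTERN = re.compile(r'''["'](http://[^"']*)["']''')


def extract_urls(text):
    """Return the sorted, de-duplicated http:// URLs found between quotes."""
    return sorted(set(URL_PATTERN.findall(text)))


if __name__ == "__main__":
    # Hypothetical sample input with one duplicated URL
    sample = (
        '<a href="http://example.com/b">b</a> '
        "<img src='http://example.com/a.png'> "
        '<a href="http://example.com/b">again</a>'
    )
    for url in extract_urls(sample):
        print(url)
```

Like the script, this sketch ignores unquoted URLs and anything that does not start with http://, so https:// links are skipped.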