Help Python3: Regex for extracting URL from file

0 Kristine Birkeland · April 14, 2015
Hi all!

Does anyone have a good, simple regex for extracting a URL (or, even better, just the domain part of the URL) from a text file? When I run my code, it only returns "None". I've googled the * out of this and can't find a better regex.


urls = re.findall(r'^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$', myfile )
print(', '.join(urls))

I've also tried re.search, no luck.


A little desperate :)

Replies

0 Kristine Birkeland · April 14, 2015
I found one mistake, fixed it like this:

with open(myfile, "r") as f:
    data = f.read()

    urls = re.search(r'^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$', data)
    IPs = re.search(r'^(?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$', data)

    print(urls)
    print(IPs)


But I'm still getting None, even though there are a lot of URLs in the file.
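
One possible reason for the None, offered as a guess rather than a confirmed diagnosis: both patterns start with ^ and end with $, and without re.MULTILINE those anchors only match at the very start and end of the whole text, so a file with more than one line of content can never match. The sketch below drops the anchors and scans the text instead. The pattern, the extract_domains helper and the sample text are made up for illustration and are not a complete URL grammar.

```python
import re

# Illustrative pattern (an assumption, not a complete URL grammar):
# scheme, host captured as group 1, optional path up to the next whitespace.
URL_RE = re.compile(r'https?://([\da-z.-]+\.[a-z]{2,6})(/\S*)?', re.IGNORECASE)

def extract_domains(text):
    """Return the host part of every http(s) URL found anywhere in text."""
    return [m.group(1) for m in URL_RE.finditer(text)]

# Hypothetical stand-in for the result of f.read().
data = "see http://example.com/page and\nhttps://docs.python.org/3/library/re.html for details\n"

# Anchored, without re.MULTILINE: the pattern has to cover the entire text, so this is None.
print(re.search(r'^https?://\S+$', data))

# Unanchored: every URL in the text is found.
print(extract_domains(data))
```

If each URL sits on its own line, keeping ^ and $ and passing flags=re.MULTILINE would be another option.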
+1 David Blake · April 16, 2015
Hi,

For regexes I use this site:

http://regexlib.com/Search.aspx?k=url&c=-1&m=-1&ps=20

Much easier than figuring it out myself. For each pattern they list strings it is known to match and strings it is known not to match. If your program fails to match a string that they say the pattern matches, then the problem must be elsewhere in your code.
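
A rough sketch of that kind of check in Python, in case it helps. The check_pattern helper, the pattern and the sample strings are invented for illustration and are not taken from regexlib.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return the sample strings on which the pattern behaves unexpectedly."""
    regex = re.compile(pattern)
    failures = []
    for s in should_match:
        # fullmatch requires the whole sample to match the pattern.
        if regex.fullmatch(s) is None:
            failures.append(s)
    for s in should_not_match:
        if regex.fullmatch(s) is not None:
            failures.append(s)
    return failures

# Hypothetical pattern and samples.
pattern = r'https?://[\da-z.-]+\.[a-z]{2,6}(/\S*)?'
good = ["http://example.com", "https://example.org/a/b"]
bad = ["not a url", "ftp://example.com"]

# An empty list means the pattern behaves as expected on these samples.
print(check_pattern(pattern, good, bad))
```

If the list comes back empty but the real program still prints None, the pattern itself is probably fine and the way it is applied to the file contents is the next thing to look at.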

HTH

DB
0 Kristine Birkeland · April 20, 2015
Sweet! Thanks, will look at it right away! :D