Using Gevent to fetch website URLs with Python in a non-blocking way


I have a lot of URL links and need to check which ones return 404 (Not Found) or have their server down. With Python, everything looks possible and easy: we can use the simple urllib2 module to fetch web pages from the given links. But a problem comes up when we are IO bound: when we iterate over the links and fetch them one by one, each request waits for the previous one to finish before it can start.
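To see the problem, here is a minimal sketch of that blocking approach, using the same three test URLs as the gevent script below; since each urlopen() call must finish before the next one starts, the total runtime is roughly the sum of every request's wait:

import urllib2

urls = ["http://yoodey.com", "http://yodi.me", "http://yodi.biz"]

for url in urls:
    try:
        # blocks here until the response (or the 5 second timeout) arrives
        data = urllib2.urlopen(url, timeout=5)
        print("%s => Status %s" % (url, data.code))
    except urllib2.HTTPError as e:
        print("%s => Status %s" % (url, e.code))      # e.g. 404
    except urllib2.URLError as e:
        print("%s => Failed: %s" % (url, e.reason))   # server down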

GIL: yes, I am sure you have heard about the Global Interpreter Lock, which prevents CPython threads from running Python bytecode in parallel, even on a multi-core system. It exists because CPython's memory management is not thread-safe ( http://wiki.python.org/moin/GlobalInterpreterLock ). Multiprocessing does sidestep the GIL, but spawning a whole OS process per URL is heavyweight when all we do is wait on the network.

But don't worry, we have an alternative here called Gevent (if you have heard about Eventlet, Gevent is its little brother). Here is the full script:

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()

import urllib2

def print_head(url):
    print("Starting %s" % url)
    try:
        data = urllib2.urlopen(url, timeout=5)
        print("%s => Status %s" % (url, data.code))
    except urllib2.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx responses, e.g. 404
        print("%s => Status %s" % (url, e.code))
    except urllib2.URLError as e:
        # connection refused, DNS failure, timeout: the server is down
        print("%s => Failed: %s" % (url, e.reason))

def main(urls):
    jobs = [gevent.spawn(print_head, url) for url in urls]
    gevent.joinall(jobs)

if __name__ == '__main__':
    urls = ["http://yoodey.com", "http://yodi.me", "http://yodi.biz"]
    main(urls)
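Spawning one greenlet per URL is fine for a handful of links, but with thousands of them you can run out of sockets or file descriptors. Gevent also ships a Pool that caps how many greenlets run at once; here is a sketch of main() rewritten with it (the pool size of 20 is an arbitrary number I picked, tune it for your machine):

from gevent.pool import Pool

def main(urls):
    # a variant of main() above: at most 20 print_head() greenlets
    # run at any moment; map() blocks until every url has been checked
    pool = Pool(20)
    pool.map(print_head, urls)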
