How to get the href value of each <a> tag from HTML using Python Beautiful Soup

July 26, 2016

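import requests
from bs4 import BeautifulSoup

# Flipkart mobiles listing page used in this example
url = "http://www.flipkart.com/mobiles?otracker=hp_header_nmenu_sub_Electronics_0_Mobiles"
doc = requests.get(url)
soup = BeautifulSoup(doc.text, 'html.parser')

# The links are in the second <div> found inside the element with id "list-tagcloud"
main_div = soup.find(id="list-tagcloud")
div2 = main_div.find_all('div')[1]
links = div2.find_all('a')

# Print the href attribute of each <a> tag (None if the tag has no href)
for link in links:
    print(link.attrs.get('href'))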

OUTPUT
======
/mobiles/motorola~brand/pr?sid=tyy,4io
/mobiles/lenovo~brand/pr?sid=tyy,4io
/mobiles/samsung~brand/pr?sid=tyy,4io
/mobiles/leeco~brand/pr?sid=tyy,4io
/yu-yunicorn/p/itmejeuf7egdedar?pid=MOBEJ3MF23Q9MGMH
/mobiles/honor~brand/pr?sid=tyy,4io
/mobiles/mi~brand/pr?sid=tyy,4io
/mobiles/asus~brand/pr?sid=tyy,4io
/mobiles/apple~brand/pr?sid=tyy,4io
/mobiles/intex~brand/pr?sid=tyy,4io
/mobiles/sony~brand/pr?sid=tyy,4io
/mobiles/alcatel~brand/pr?sid=tyy,4io
/mobiles/lava~brand/pr?sid=tyy,4io
/gionee-store
/mobiles/pr?sid=tyy,4io
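
The hrefs printed above are relative paths, so they usually need to be joined to the site address before they can be requested. The following is a minimal sketch of one way to do that with the standard library's urljoin, assuming Python 3 and assuming the base address is the host from the URL fetched above. The names BASE_URL, to_absolute and sample are hypothetical and only serve the illustration; the sample list reuses two paths from the output plus a None, which is what link.attrs.get('href') returns for an <a> tag without an href.

```python
from urllib.parse import urljoin

# Assumed base: the scheme and host of the page fetched in the post.
BASE_URL = "http://www.flipkart.com/"


def to_absolute(hrefs, base=BASE_URL):
    """Resolve each relative href against base, skipping missing values."""
    return [urljoin(base, href) for href in hrefs if href]


# Two relative paths from the output above, plus a None such as
# link.attrs.get('href') returns when an <a> tag has no href.
sample = [
    "/mobiles/motorola~brand/pr?sid=tyy,4io",
    "/gionee-store",
    None,
]

for url in to_absolute(sample):
    print(url)
```

In the scraping loop, the same helper could be fed with the values collected from links instead of the hard-coded sample list.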
