Web crawler doubt in Python

0 manish khilnani · February 1, 2015
Hello people,

I am trying to learn Python from this website.
I was trying to build a web crawler to crawl through the Amazon search results at the URL below:

http://www.amazon.com/s/ref=sr_pg_2?fst=as%3Aon&rh=n%3A7141123011%2Ck%3Ajordan+shoes&page=2&keywords=jordan+shoes&ie=UTF8&qid=1422739789

I am trying to print the names of all the shoes listed on the page, but when I run the code all I get is "Process finished with exit code 0".

I have the code below:


import requests
from bs4 import BeautifulSoup

def amazon_spider(max_pages):
    page = 2
    while page <= max_pages:
        url = 'http://www.amazon.com/s/ref=sr_pg_2?fst=as%3Aon&rh=n%3A7141123011%2Ck%3Ajordan+shoes&page=' + str(page) + '&keywords=jordan+shoes&ie=UTF8&qid=1422738678'
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so bs4 does not have to guess one
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.find_all('a', {'class': 'a-size-base s-inline s-access-title a-text-normal'}):
            title = link.string
            print(title)
        # Advance to the next results page once per iteration of the while loop
        page = page + 1

amazon_spider(2)

It would be great if someone could go through the code and point out the error. Thanks.

Replies

0 Tanner Hoke · February 1, 2015
Not sure if you just pasted it wrong, but the code inside your for loop is not properly indented. Secondly, you are searching for all links that have the class 'a-size-base s-inline s-access-title a-text-normal', but you should be searching for all h2 tags instead. Those are the tags that hold the titles on Amazon, not the anchor (link) tags.
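
To illustrate the h2 suggestion, here is a minimal sketch that runs against a small hand-written HTML snippet instead of the live Amazon page. SAMPLE_HTML and extract_titles are made-up names for this example, and the markup is only an assumption about how a results page might be laid out, so the real page may need a different selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page (an assumption,
# not the real Amazon HTML).
SAMPLE_HTML = """
<div><a href="#"><h2 class="s-access-title">Jordan Shoe A</h2></a></div>
<div><a href="#"><h2>Jordan Shoe B</h2></a></div>
"""

def extract_titles(html):
    """Return the text of every h2 tag found in the given HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    # get_text() collects all text inside the tag, whereas .string returns
    # None when a tag has more than one child.
    return [h2.get_text(strip=True) for h2 in soup.find_all('h2')]

for title in extract_titles(SAMPLE_HTML):
    print(title)
```

In the original spider, the same idea would mean passing source_code.text to a function like this in place of the loop over anchor tags.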