Bucky's Web Crawler - Error

Probuddha N · April 17, 2015

I followed Bucky's web crawler tutorials and coded a crawler for TrueLocal.com.au.

Here is the entire code:

import requests
from bs4 import BeautifulSoup

def tru_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.truelocal.com.au/find/car-rental/' + str(page)
        code = requests.get(url)
        text = code.text
        soup = BeautifulSoup(text)
        for link in soup.findAll('a', {'class':'name'}):
            href = 'http://www.truelocal.com.au' + link.get('href')
            title = link.string
            # print (href)
            # print (title)
        page += 1

def get_single_item_data(item_url):
    code = requests.get(item_url)
    text = code.text
    soup = BeautifulSoup(text)
    for item_name in soup.findAll('h1',{'itemprop':'name'}):
        print item_name.string
    for link in soup.findAll('a'):
        href = 'http://www.truelocal.com.au' + link.get('href')
        print (href)


However, when I run this, it gets the first page's title and the links from that page, and then throws this error:

Traceback (most recent call last):
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 30, in <module>
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 16, in tru_crawler
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 26, in get_single_item_data
    href = 'http://www.truelocal.com.au' + link.get('href')
TypeError: cannot concatenate 'str' and 'NoneType' objects

Please advise on what's going wrong here.


Probuddha N · April 28, 2015
Thanks a lot, Timmy. Adding str() worked, but it led to even more confusion for me :angel:

I am new to Python, so could you please tell me why I have to convert it to a string before concatenating?

As far as I understand, I need str() when I concatenate a string and a numeric value. But in this case, href isn't a numeric value.

Also, I have coded similar crawlers where I didn't need to convert the href to a string in order to concatenate it with the URL.

Could you please shed some light on this?

Many thanks
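
For what it's worth, the traceback suggests the value being concatenated is None rather than a number. BeautifulSoup's link.get('href') behaves like a dictionary lookup with .get(): when an <a> tag has no href attribute, it returns None instead of a string. That would also explain why similar crawlers worked without str(), if every link they matched happened to have an href. The sketch below reproduces the behaviour with a plain dict standing in for a tag's attributes (the attrs data is a made-up example, not taken from TrueLocal):

```python
# A plain dict stands in for a tag's attributes, e.g. <a name="top">Top</a>,
# which has no href. (Hypothetical example data.)
attrs = {'name': 'top'}

href = attrs.get('href')   # missing key, so .get() returns None
print(href is None)

try:
    url = 'http://www.truelocal.com.au' + href
except TypeError as err:
    # Same kind of error as in the traceback: a str cannot be joined to None.
    print('TypeError:', err)

# str() silences the error, but the result is a bogus URL ending in "None".
url = 'http://www.truelocal.com.au' + str(href)
print(url)
```

So str() makes the error go away, but any link without an href ends up as a URL ending in "None", which is probably not what you want to crawl.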
Timmy Cafe · April 25, 2015
This is the part of the code where you are getting the error:

href = 'http://www.truelocal.com.au' + link.get('href')

Change the line to this:

href = 'http://www.truelocal.com.au' + str(link.get('href'))

The reason for this is because you were not concatenating the
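
As an alternative to wrapping the value in str(), one option is to skip links that have no href and to build the absolute URL with urljoin from the standard library. This is only a sketch: absolute_links, BASE_URL and the sample values are hypothetical names and data introduced for illustration, not part of the original crawler.

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

BASE_URL = 'http://www.truelocal.com.au'

def absolute_links(hrefs, base=BASE_URL):
    """Return absolute URLs, skipping entries with no href (None or empty)."""
    links = []
    for href in hrefs:
        if not href:
            # link.get('href') returned None (or an empty string): nothing to join.
            continue
        # urljoin resolves relative paths against base and leaves
        # already-absolute URLs unchanged.
        links.append(urljoin(base, href))
    return links

# Hypothetical href values, as link.get('href') might return them.
sample = ['/find/car-rental/2', None, 'http://example.com/page', '']
print(absolute_links(sample))
```

In the crawler this could be called as absolute_links(link.get('href') for link in soup.findAll('a')). BeautifulSoup can also do the filtering itself: soup.findAll('a', href=True) returns only the <a> tags that actually have an href attribute.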
