Bucky's Web Crawler - Error

0 Probuddha N · April 17, 2015
Hello

I followed Bucky's web crawler tutorials and coded a crawler for TrueLocal.com.au.

The entire code follows:


import requests
from bs4 import BeautifulSoup

def tru_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.truelocal.com.au/find/car-rental/' + str(page)
        code = requests.get(url)
        text = code.text
        soup = BeautifulSoup(text)
        for link in soup.findAll('a', {'class':'name'}):
            href = 'http://www.truelocal.com.au' + link.get('href')
            title = link.string
            # print (href)
            # print (title)
            get_single_item_data(href)
        page += 1

def get_single_item_data(item_url):
    code = requests.get(item_url)
    text = code.text
    soup = BeautifulSoup(text)
    for item_name in soup.findAll('h1',{'itemprop':'name'}):
        print item_name.string
    for link in soup.findAll('a'):
        href = 'http://www.truelocal.com.au' + link.get('href')
        print (href)


tru_crawler(2)



However, when I run this, it prints the title of the first listing and some of the links from that page, and then throws this error:


Traceback (most recent call last):
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 30, in <module>
    tru_crawler(2)
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 16, in tru_crawler
    get_single_item_data(href)
  File "C:/Users/PB/PycharmProjects/crawler/new-crawler.py", line 26, in get_single_item_data
    href = 'http://www.truelocal.com.au' + link.get('href')
TypeError: cannot concatenate 'str' and 'NoneType' objects





Please advise on what's going wrong here.


Thanks

Replies

0 Timmy Cafe · April 25, 2015
The error comes from this line of your code:


href = 'http://www.truelocal.com.au' + link.get('href')



change the line to this:


href = 'http://www.truelocal.com.au' + str(link.get('href'))



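The reason is that link.get('href') does not always return a string. For an <a> tag that has no href attribute it returns None, and Python cannot concatenate a str with None. Wrapping the call in str() turns None into the text 'None', which avoids the error.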
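To make the cause concrete, here is a minimal sketch of what happens on that line. It uses plain dicts as stand-ins for the attributes of two <a> tags (an assumption for illustration, not part of the original crawler), because Tag.get() in BeautifulSoup behaves like dict.get() and returns None when the attribute is missing. It also shows that str() only hides the problem: the result is a URL ending in the text "None".

```python
# Plain dicts stand in for the attributes of two <a> tags (illustrative
# assumption); like Tag.get(), dict.get() returns None for a missing key.
anchor_with_href = {'href': '/find/car-rental/2'}
anchor_without_href = {'name': 'top'}  # e.g. <a name="top"> has no href

base = 'http://www.truelocal.com.au'

good = anchor_with_href.get('href')         # a str
missing = anchor_without_href.get('href')   # None, which is not a str

url_ok = base + good                        # str + str works

try:
    base + missing                          # str + None raises TypeError
    failed = False
except TypeError:
    failed = True

masked = base + str(missing)                # str(None) is the text 'None'

print(url_ok)   # http://www.truelocal.com.au/find/car-rental/2
print(failed)   # True
print(masked)   # http://www.truelocal.com.auNone
```

So the str() change stops the crash, but every anchor without an href now prints a bogus link.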
0 Probuddha N · April 28, 2015
Thanks a lot Timmy. Adding the str worked and led to even more confusion for me :angel:

I am new to Python, so could you please tell me why I have to convert it to a string before concatenating?

As per my limited understanding so far, I need to use str() when I concatenate a string and a numeric value. But in this case, href isn't a numeric value.

Also, I have coded similar crawlers where I didn't need to convert the href to a string in order to concatenate it with the URL.

Could you please shed some light on this?

Many thanks
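On the question above: str() is needed whenever the other operand is not a string, and that includes None as well as numbers. Other crawlers presumably worked without it because every anchor they matched happened to have an href. Rather than converting None to text, a safer pattern is to skip anchors that have no href and let urljoin build the absolute URL. The sketch below is one way to do that; the helper name absolute_links and the sample list are hypothetical, and the list stands in for the values that link.get('href') would return.

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

BASE_URL = 'http://www.truelocal.com.au'

def absolute_links(hrefs, base=BASE_URL):
    """Return absolute URLs for the given href values.

    Hypothetical helper: skips None and empty values instead of
    turning them into the text 'None'.
    """
    links = []
    for href in hrefs:
        if not href:
            continue                       # anchor had no usable href
        links.append(urljoin(base, href))  # absolute hrefs are kept as-is
    return links

# Stand-in for [link.get('href') for link in soup.findAll('a')]
sample = ['/find/car-rental/2', None, 'http://example.com/page', '']
result = absolute_links(sample)
print(result)
```

In the crawler itself, the same effect can be had by asking BeautifulSoup only for anchors that carry the attribute, with soup.findAll('a', href=True). Using urljoin also avoids prepending the site address to links that are already absolute.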