Unable to complete tutorial 25 - trading page showing no items.

+6 josh montes · March 30, 2015
I've been going through the Python video tutorials and have been really excited to get onto the web crawler videos to start making some interesting programs.

However, when I try to go onto the trading page, there are no items listed. I am a member and am signed in so this isn't the issue.

I'm posting this in the hope that Bucky will see it, or that someone can suggest another site that can be used without breaching any terms of service.

Thanks in advance.


Replies

0 sfolje 0 · September 30, 2015

But why did you have to use .encode('ascii', 'replace') in assigning the plain_text?


As I said in the original post, sometimes when I run the first example I end up with an error. More precisely:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 265-266: character maps to <undefined>

So I googled it: .encode('ascii', 'replace') replaces unknown characters with "?", so it doesn't raise errors.
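
As a quick illustration of that behavior (a minimal sketch, not part of the original posts; the sample string is made up):

```python
# 'é' is not an ASCII character, so the 'replace' handler swaps it for '?'
text = "café"
encoded = text.encode('ascii', 'replace')
print(encoded)  # b'caf?'

# with the default 'strict' handler the same call raises UnicodeEncodeError
try:
    text.encode('ascii')
except UnicodeEncodeError as err:
    print(err.reason)  # ordinal not in range(128)
```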
0 sfolje 0 · September 30, 2015
try "html.parser" instead of "html_parser" in the line:
 soup = BeautifulSoup(plain_text, "html.parser")
(also pass plain_text by itself, not wrapped in a list)
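
To make the difference concrete, here is a small self-contained sketch (the HTML string is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<a class="link" href="/shop">Shop</a>'  # made-up sample HTML
# "html.parser" (with a dot) is the name of Python's built-in parser;
# "html_parser" raises bs4.FeatureNotFound. Pass the string itself,
# not a list wrapped around it.
soup = BeautifulSoup(html, "html.parser")
print(soup.find('a', {'class': 'link'}).get('href'))  # /shop
```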
0 viv vyas · September 29, 2015
Thanks sfolje 0! That worked perfectly! But why did you have to use .encode('ascii', 'replace') in assigning the plain_text?

I tried scraping yellow pages for coffee shops like such:

import requests
from bs4 import BeautifulSoup


def coffee_spider():
    url = 'http://www.yellowpages.ca/search/si/1/coffee/Vancouver%2C%20BC'
    source_code = requests.get(url)
    plain_text = source_code.text

    soup = BeautifulSoup([plain_text], "html_parser")  # this is basically all the source code from the website; we need to store it in a BeautifulSoup object, hence we create soup

    for link in soup.findAll('a', {'class': 'listing placement'}):
        href = link.get('href')
        print(href)


coffee_spider()



But I end up with the following error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html_parser. Do you need to install a parser library?

Not sure what I'm doing wrong. Any help in the right direction would be greatly appreciated.
+1 sfolje 0 · September 27, 2015
In the example below I crawled thenewboston.com (the page you are on right now :)) in the 'popular' posts section. Instead of entering an item's own page, I crawled into a post's own page.

The first example is for educational purposes, because it sometimes returns errors.
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/tops.php?type=text&period=this-month&page=" + str(page) #this is page of popular posts
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be searched through easily
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'index_singleListingTitles'}): # all links that have class='index_singleListingTitles' in them
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1
trade_spider(1)

If you try it, you can sometimes get errors because of BeautifulSoup or unicode problems.

The full example, which returns no errors for me:
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.thenewboston.com/tops.php?type=text&period=this-month&page=" + str(page)
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text.encode('ascii', 'replace')
        # BeautifulSoup objects can be searched through easily
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'index_singleListingTitles'}):
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1
trade_spider(1)

Tell me your questions, errors, and likes, and try to make your own get_single_item_data function, which I have not written.
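
For anyone who wants a starting point, here is one possible sketch of that function (the 'post-title' selector and the helper name parse_item_page are made-up placeholders; inspect the real page source to find the right selector):

```python
import requests
from bs4 import BeautifulSoup

def parse_item_page(html):
    # pull out the text of every matching tag; 'post-title' is a
    # hypothetical class name - check the actual page for the real one
    soup = BeautifulSoup(html, "html.parser")
    return [tag.string for tag in soup.findAll('h1', {'class': 'post-title'})]

def get_single_item_data(item_url):
    # fetch the single post's page and hand its HTML to the parser
    source_code = requests.get(item_url)
    return parse_item_page(source_code.text)

# offline example with inline HTML instead of a live request:
print(parse_item_page('<h1 class="post-title">Hello</h1>'))  # ['Hello']
```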
0 viv vyas · September 27, 2015
Is there a workaround for this? I really would like to try the tutorial.
0 Kreso Nurnberger · April 3, 2015
The site BuckysRoom doesn't exist anymore, Josh.
0 Kreso Nurnberger · April 2, 2015
I have the same problem, Josh Montes.
0 Alejandro Karma · April 2, 2015
Bruh, got the same problem. I also tried another website but it didn't work...
0 Abhinav Ankur · March 31, 2015
Anyone, please suggest a site with the 'vulnerable' HTML code still exposed, so that we can try out the things we learnt.
