Python: Web Crawler giving too many errors

+2 Sumit Tyagi · February 6, 2016
import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://buckysroom.org/trade/search.php?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'item-name'}):
            href = "https://buckysroom.org" + link.get('href')
            title = link.string
            # print(href)
            # print(title)
            get_single_item_data(href)
        page += 1


def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for item_name in soup.findAll('div', {'class': 'i-name'}):
        print(item_name.string)
    for link in soup.findAll('a'):
        href = "https://buckysroom.org" + link.get('href')
        print(href)

trade_spider(3)


This is the same code as given by Bucky.



These are the errors:

C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\python.exe C:/Users/win-7/PycharmProjects/sts/first
Traceback (most recent call last):
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connection.py", line 137, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\util\connection.py", line 91, in create_connection
    raise err
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\util\connection.py", line 81, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connectionpool.py", line 784, in _validate_conn
    conn.connect()
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connection.py", line 217, in connect
    conn = self._new_conn()
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connection.py", line 146, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0378B1F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\adapters.py", line 376, in send
    timeout=timeout
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\connectionpool.py", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\packages\urllib3\util\retry.py", line 273, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='buckysroom.org', port=443): Max retries exceeded with url: /trade/search.php?page=1 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0378B1F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/win-7/PycharmProjects/sts/first", line 31, in <module>
    trade_spider(3)
  File "C:/Users/win-7/PycharmProjects/sts/first", line 9, in trade_spider
    source_code = requests.get(url)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\win-7\AppData\Local\Programs\Python\Python35-32\lib\requests\adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='buckysroom.org', port=443): Max retries exceeded with url: /trade/search.php?page=1 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0378B1F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))

Process finished with exit code 1

Replies

+1 sfolje 0 · February 7, 2016
It is giving errors because the website buckysroom.org no longer exists. Type the URL into a browser or Google it; it is not working any more.

The solution, which I posted in these forum threads:

https://www.thenewboston.com/forum/topic.php?id=11145

or scroll down on page 4:

https://www.thenewboston.com/forum/topic.php?id=1610&orderby=oldest&page=4
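
Separately from switching to a working URL, the crawler can be made to fail more gracefully when a host does not respond. The following is only a minimal sketch, assuming the same requests library as the code above; the name fetch_page, the 10-second timeout and the get parameter are illustrative choices, not part of Bucky's tutorial.

```python
import requests


def fetch_page(url, timeout=10, get=requests.get):
    """Return the page text, or None if the request fails.

    fetch_page, the default timeout and the `get` parameter are
    hypothetical names chosen for this sketch.
    """
    try:
        # A timeout stops the call from hanging on an unresponsive host.
        response = get(url, timeout=timeout)
        # Treat HTTP 4xx/5xx status codes as failures as well.
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # ConnectionError, Timeout and HTTPError all derive from
        # RequestException, so one handler covers them.
        print("Could not fetch", url, "->", type(error).__name__)
        return None
    return response.text
```

In trade_spider, the two lines that call requests.get(url) and read .text could then be replaced by plain_text = fetch_page(url), skipping the page when the result is None instead of ending with a long traceback.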
0 Sumit Tyagi · February 7, 2016
Thanks, brother, it worked!
