"RecursionError: maximum recursion depth exceeded in comparison" when calling str(code)

+2 Arjun Naidu · September 15, 2015
While trying to download all the source files using a web crawler, I'm hitting the following exception:


Traceback (most recent call last):
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 59, in
    source_spider(474)
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 54, in source_spider
    fw.write(get_code(href))
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 11, in get_code
    code = str(code)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1035, in __unicode__
    return self.decode()
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1191, in decode_contents
    formatter))
...

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1188, in decode_contents
    text = c.output_ready(formatter)

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 712, in output_ready
    output = self.format_string(self, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 156, in format_string
    if not isinstance(formatter, collections.Callable):
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\abc.py", line 182, in __instancecheck__
    if subclass in cls._abc_cache:
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison



The code is as follows:

import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('<br>', '\n', code)
    # line = replace_br_newln(code)

    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script

'''
def replace_br_newln(in_str):
    new_string = ''
    while 1:
        index = in_str.find(r'<br>')
        if index == -1:
            new_string += in_str
            break
        else:
            new_string += in_str[:index] + '\n'
            in_str = in_str[index+4:]
    return new_string
'''



def source_spider(max_pages):
    page = 13
    compare_str = r'[source code]'
    while page <= max_pages:
        print('#####  ', page, '  #####')
        url = "https://thenewboston.com/forum/category.php?id=15&page=" + str(page)
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easily
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = "https://thenewboston.com" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(title)
            if compare_str in title:
                title = 'source\\' + title[14:] + '.txt'
                fw = open(title, 'w')
                fw.write(get_code(href))
                fw.close()
        page += 1


source_spider(474)





Why is recursion used for converting to a string?
What is the workaround for this issue?
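
From the traceback it looks like bs4's Tag.decode() and Tag.decode_contents() call each other once per level of nesting, so str(code) needs a stack frame for every level of the HTML tree under the tag. A minimal sketch of what I think is going on, using synthetic HTML rather than the actual forum page (the depth value is an arbitrary choice, well past the default limit of 1000):

import sys
from bs4 import BeautifulSoup

# Build a deeply nested document; 5000 is an arbitrary depth for the demo.
depth = 5000
html = "<div>" * depth + "hello" + "</div>" * depth

soup = BeautifulSoup(html, "html.parser")

print("recursion limit:", sys.getrecursionlimit())  # 1000 by default

try:
    str(soup)  # same call path as str(code): decode() -> decode_contents() per level
except RecursionError as err:
    print("RecursionError:", err)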


Replies

0 sfolje 0 · September 18, 2015
http://stackoverflow.com/questions/3323001/maximum-recursion-depth says:

"You can change the recursion limit with sys.setrecursionlimit, but doing so is dangerous -- the standard limit is a little conservative."

You can see the current recursion limit (1000) with sys.getrecursionlimit().
The second URL needs sys.setrecursionlimit(1194).
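
For what it's worth, a minimal sketch of that workaround; safe_str is just a name made up for this sketch, and 1194 is the value quoted above, so treat both as placeholders. Raising the limit only around the str() call and restoring it afterwards keeps the danger mentioned in the quote contained:

import sys

def safe_str(tag, limit=1194):
    # str(tag) with a temporarily raised recursion limit; 1194 is the value
    # quoted above for the second URL -- the right number depends on how
    # deeply that particular page is nested.
    old_limit = sys.getrecursionlimit()          # 1000 by default
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        return str(tag)
    finally:
        sys.setrecursionlimit(old_limit)         # always restore the old limit

# inside get_code():  code = safe_str(soup.find('code'))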
0 Arjun Naidu · September 18, 2015
Hi Halcyon,

get_code() is still giving me the recursion error for url2. Also, it is not returning individual characters for me; I've printed it for url1.


import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('<br>', '\n', code)
    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script


print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))  # url1
get_code(r'https://thenewboston.com/forum/topic.php?id=2653')  # url2
0 Halcyon Abraham Ramirez · September 15, 2015
I tried out your code; no recursion error for me.

btw, your get_code function returns individual letters, like these:



<
b
r
>
f
o
r
 
k
,
 
v
 
i
n
 
w
e
i
g
h
t
s
.
i
t
e
m
s
(
)
:
<
b
r

Is that what you wanted? Also, put some time between each request or you'll overload Bucky's server :/

I know because I've overloaded servers before too!
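
One hedged guess about the one-character-per-line output: if the '<br>' pattern in get_code()'s re.sub call gets lost when the code is copied out of the rendered forum page (the forum swallows the tag), the call becomes re.sub('', '\n', code), and an empty pattern matches at every position, so a newline is inserted between every character:

import re

# An empty pattern matches between every character, so this prints one
# character per line -- which looks exactly like the output above.
print(re.sub('', '\n', 'for k, v'))

And for the "put some time between each request" advice, one way to do it is a small wrapper around requests.get (polite_get and the 2-second delay are just placeholders for this sketch):

import time
import requests

def polite_get(url, delay=2):
    # Fetch the page, then pause so the crawler doesn't hammer the server.
    response = requests.get(url)
    time.sleep(delay)
    return response

# use polite_get(url) in place of requests.get(url) in source_spider() and get_code()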