"RecursionError: maximum recursion depth exceeded in comparison" when calling str(code)

+2 Arjun Naidu · September 15, 2015
While trying to download all the source files with a web crawler, I'm hitting the following exception:


Traceback (most recent call last):
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 59, in <module>
    source_spider(474)
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 54, in source_spider
    fw.write(get_code(href))
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 11, in get_code
    code = str(code)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1035, in __unicode__
    return self.decode()
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1191, in decode_contents
    formatter))
.
.
.
.
.

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1188, in decode_contents
    text = c.output_ready(formatter)

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 712, in output_ready
    output = self.format_string(self, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 156, in format_string
    if not isinstance(formatter, collections.Callable):
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\abc.py", line 182, in __instancecheck__
    if subclass in cls._abc_cache:
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison



Code is as follows:

import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('', '\n', code)
    # line = replace_br_newln(code)

    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script

'''
def replace_br_newln(in_str):
    new_string = ''
    while 1:
        index = in_str.find(r'')
        if index == -1:
            new_string += in_str
            break
        else:
            new_string += in_str[:index] + '\n'
            in_str = in_str[index+4:]
    return new_string
'''



def source_spider(max_pages):
    page = 13
    compare_str = r'[source code]'
    while page <= max_pages:
        print('#####  ', page, '  #####')
        url = "https://thenewboston.com/forum/category.php?id=15&page=" + str(page)
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easy
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = "https://thenewboston.com" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(title)
            if compare_str in title:
                title = 'source\\' + title[14:] + '.txt'
                fw = open(title, 'w')
                fw.write(get_code(href))
                fw.close()
        page += 1


source_spider(474)





Why is recursion used when converting to a string?
What is the workaround for this issue?

Replies

0 sfolje 0 · September 18, 2015
http://stackoverflow.com/questions/3323001/maximum-recursion-depth
says:
"You can change the recursion limit with sys.setrecursionlimit, but doing so is dangerous -- the standard limit is a little conservative".
You can see the current recursion limit (1000) with sys.getrecursionlimit().

The second URL needs sys.setrecursionlimit(1194).
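
A minimal sketch of checking the limit and raising it only for the call that needs it, using just the standard library. The helper name recursion_limit, the stand-in function depth and the value 3000 are illustrative assumptions, not anything from this thread. Setting the limit far too high can crash the interpreter instead of raising a clean RecursionError, so restoring the old value afterwards seems the safer habit.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, then restore the old value."""
    old_limit = sys.getrecursionlimit()
    # Never lower the limit below what is already configured.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)


def depth(n):
    """Recurse n levels deep; stands in for str() on a deeply nested tag."""
    return 0 if n == 0 else 1 + depth(n - 1)


limit_before = sys.getrecursionlimit()
with recursion_limit(3000):
    result = depth(2000)
limit_after = sys.getrecursionlimit()

print(result, limit_before, limit_after)
```

In the crawler, the same idea would mean wrapping only the str(code) call in the with block rather than changing the limit for the whole program.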
0 Halcyon Abraham Ramirez · September 19, 2015
Yes, it still gives you individual letters.

It's because of this line:


line = re.sub('', '\n', code)

Try this out:

a = "hello"

print(re.sub("","\n",a))
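
For reference, an empty pattern matches at every position in the string, so the substitution puts a newline before each character and one at the end. A small sketch of that, followed by a contrast case. The <br> pattern in the second call is only an assumption about what the original line was meant to replace, based on the name of the commented-out replace_br_newln helper.

```python
import re

a = "hello"

# An empty pattern matches at the start, between every pair of
# characters, and at the end of the string.
split_up = re.sub("", "\n", a)
print(repr(split_up))

# Assumed intent: turn literal <br> tags into newlines.
html = "line one<br>line two"
fixed = re.sub("<br>", "\n", html)
print(repr(fixed))
```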

I fixed your get_code function:

import re

import requests
from bs4 import BeautifulSoup


def get_code(url):
    # Name the parser explicitly, as the original script does.
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    # .text gives the text of the <code> element without its markup,
    # so there is no need to call str() on the tag.
    return re.sub(r"\?", " ", soup.find('code').text)

What is url2? Can you show me the exact URL?
Try running this with the function above:

print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))
0 sfolje 0 · September 19, 2015
I think url2 means 'https://thenewboston.com/forum/topic.php?id=2653'. No error with your solution, though.
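
A likely reason the .text version avoids the error: the traceback shows str() going through decode and decode_contents once per nesting level, so deeply nested markup uses up the call stack, whereas text can be collected by walking the tree in a loop. The toy model below illustrates that difference. The Node class and build_chain helper are hypothetical and are not Beautiful Soup's actual code.

```python
class Node:
    """Toy stand-in for a parsed tag: a name, optional text, child nodes."""

    def __init__(self, name, text="", children=None):
        self.name = name
        self.text = text
        self.children = children or []

    def to_markup(self):
        # Recursive: every nesting level adds Python call frames,
        # like the decode -> decode_contents chain in the traceback.
        inner = self.text + "".join(c.to_markup() for c in self.children)
        return "<{0}>{1}</{0}>".format(self.name, inner)

    def get_text(self):
        # Iterative: an explicit stack, so nesting depth costs no call frames.
        parts, stack = [], [self]
        while stack:
            node = stack.pop()
            parts.append(node.text)
            stack.extend(reversed(node.children))
        return "".join(parts)


def build_chain(depth):
    """Build `depth` nested <span> nodes with text in the innermost one."""
    node = Node("span", text="x = 1")
    for _ in range(depth - 1):
        node = Node("span", children=[node])
    return node


deep = build_chain(5000)
print(deep.get_text())  # the loop-based walk is not limited by nesting depth

try:
    deep.to_markup()
except RecursionError as exc:
    print("RecursionError:", exc)
```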