"RecursionError: maximum recursion depth exceeded in comparison" when calling str(code)

+2 Arjun Naidu · September 15, 2015
While trying to download all the source files using a web crawler, I'm hitting the following exception:


Traceback (most recent call last):
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 59, in <module>
    source_spider(474)
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 54, in source_spider
    fw.write(get_code(href))
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 11, in get_code
    code = str(code)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1035, in __unicode__
    return self.decode()
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1191, in decode_contents
    formatter))
.
.
.
.
.

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1188, in decode_contents
    text = c.output_ready(formatter)

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 712, in output_ready
    output = self.format_string(self, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 156, in format_string
    if not isinstance(formatter, collections.Callable):
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\abc.py", line 182, in __instancecheck__
    if subclass in cls._abc_cache:
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison



Code is as follows:

import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('<br>', '\n', code)
    # line = replace_br_newln(code)

    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script

'''
def replace_br_newln(in_str):
    new_string = ''
    while 1:
        index = in_str.find(r'<br>')
        if index == -1:
            new_string += in_str
            break
        else:
            new_string += in_str[:index] + '\n'
            in_str = in_str[index+4:]
    return new_string
'''



def source_spider(max_pages):
    page = 13
    compare_str = r'[source code]'
    while page <= max_pages:
        print('#####  ', page, '  #####')
        url = "https://thenewboston.com/forum/category.php?id=15&page=" + str(page)
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easy
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = "https://thenewboston.com" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(title)
            if compare_str in title:
                title = 'source\\' + title[14:] + '.txt'
                fw = open(title, 'w')
                fw.write(get_code(href))
                fw.close()
        page += 1


source_spider(474)





Why is recursion used when converting to a string?
What is the workaround for this issue?

Replies

+1 Halcyon Abraham Ramirez · September 24, 2015
@sfolje 0

Be prepared: most people here ask when they have a question, but as soon as you give them an answer and solve their problem, boom, they're gone. No thank-you whatsoever.
0 Arjun Naidu · September 19, 2015
Hi Halcyon.
I am sorry. The '<br>' inside re.sub is being removed when I post. I am trying to replace the HTML line break tag with a newline.
The actual code is: line = re.sub('<br>', '\n', code)

I am not sure if this is being printed properly even now. (The first 5 posts are moderated, and I cannot see what I posted until a moderator approves it.)
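
For reference, a minimal sketch of the `<br>`-to-newline replacement described above, using only the standard `re` module. The names `BR_TAG` and `br_to_newlines` are made up for illustration. The pattern is an assumption that the page may also contain `<br/>`, `<br />` or uppercase variants; it does not handle `<br>` tags that carry attributes.

```python
import re

# Hypothetical helper: matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newlines(html):
    """Return html with every line-break tag replaced by a newline."""
    return BR_TAG.sub("\n", html)


print(br_to_newlines("for k, v in weights.items():<br>    print(k, v)"))
```

Running this replacement before extracting the text keeps the line structure of the posted code, since the tags would otherwise be dropped along with the line breaks they stood for.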
0 Halcyon Abraham Ramirez · September 20, 2015
have you run this?


def get_code(url):
    # Parse with an explicit parser, then swap each '?' for a space
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    return re.sub(r"\?", " ", soup.find('code').text)

print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))


was it your desired output?
0 Arjun Naidu · September 21, 2015
I've tried your code, but the entire result comes out on a single line.
I've increased the recursion depth limit as @sfolje 0 suggested, and my code now works without any error.
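
For anyone landing here later, a minimal sketch of that workaround, raising the limit with `sys.setrecursionlimit`. The helper names `recursion_limit`, `build_nested` and `depth`, and the value 5000, are made up for illustration and are not from the script above. Keep in mind that a limit set too high can crash the interpreter instead of raising RecursionError, so raise it only as far as needed.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, then restore the old value."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))  # never lower an existing limit
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


def build_nested(levels):
    """Build a list nested `levels` deep, e.g. [[[]]] for levels=2."""
    node = []
    for _ in range(levels):
        node = [node]
    return node


def depth(nested):
    """Recursively count nesting levels; uses one stack frame per level."""
    if not nested:
        return 1
    return 1 + depth(nested[0])


# 2000 levels exceeds the usual default limit of 1000,
# so the call is wrapped in the temporary higher limit.
with recursion_limit(5000):
    print(depth(build_nested(2000)))
```

In the crawler, the same `with recursion_limit(...)` block could wrap the `str(code)` call, so the higher limit applies only where the deep nesting occurs.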
0 sfolje 0 · October 23, 2015
Totally agree with you, man.
0 Halcyon Abraham Ramirez · October 24, 2015
I know that feel, man. I used to be pretty active here until, you know. We get no incentive for helping people out; the least we could get is a thank-you. You'd occasionally get the rare person who says thank you, but it's about 1 in a million.

Check out Reddit and look up the learnpython subreddit. They have a better community there. I mean, Bucky's tutorials are great; it's the community that's meh.
0 Halcyon Abraham Ramirez · September 15, 2015
tried out your code. no recursion error for me

btw

your get_code function returns individual letters

like these:



b
r
>
f
o
r
 
k
,
 
v
 
i
n
 
w
e
i
g
h
t
s
.
i
t
e
m
s
(
)
:
<
b
r

is that what you wanted? also put some time between each request or you'll overload bucky's server :/

I know because I've overloaded servers before too!
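
A minimal sketch of spacing out requests, as suggested above. The name `fetch_politely` and its parameters are hypothetical; `fetch` stands for whatever function downloads a page (for example a small wrapper around `requests.get`), and the one-second default is an arbitrary assumption, not a value the site asks for.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Stand-in fetch function so the sketch runs without network access.
pages = fetch_politely(["page13", "page14"], fetch=str.upper, delay_seconds=0.1)
print(pages)
```

The `sleep` parameter exists only so the pause can be swapped out when trying the function; in normal use the default `time.sleep` is what does the waiting.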
0 Halcyon Abraham Ramirez · September 21, 2015
This is interesting.

This is the first time I've encountered this error. Apparently this is a bug in BeautifulSoup.

No wonder we're getting the error.

Well, I guess the workaround is what @sfolje0 said.
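
On the original question of why recursion shows up at all: the traceback alternates between decode and decode_contents, which suggests each nesting level of the HTML costs extra stack frames. Below is a rough sketch of that idea. It is not BeautifulSoup's actual code; `Node`, `render_recursive` and `render_iterative` are made-up names. The iterative version shows how an explicit stack avoids the depth limit.

```python
class Node:
    """Hypothetical stand-in for a parsed tag; not a bs4 class."""

    def __init__(self, name, children=None, text=""):
        self.name = name
        self.children = children or []
        self.text = text


def render_recursive(node):
    """Render a node by recursing into its children (depth-limited)."""
    inner = node.text + "".join(render_recursive(c) for c in node.children)
    return "<{0}>{1}</{0}>".format(node.name, inner)


def render_iterative(root):
    """Render the same markup using an explicit stack instead of recursion."""
    out = []
    stack = [("open", root)]
    while stack:
        action, node = stack.pop()
        if action == "open":
            out.append("<%s>" % node.name)
            out.append(node.text)
            stack.append(("close", node))
            # Push children reversed so the first child is rendered first.
            for child in reversed(node.children):
                stack.append(("open", child))
        else:
            out.append("</%s>" % node.name)
    return "".join(out)


small = Node("code", [Node("b", text="hi"), Node("i", text="yo")], text="a")
print(render_recursive(small) == render_iterative(small))
```

With a tree nested a few thousand levels deep, the recursive version would be expected to exceed the default recursion limit, while the iterative one only grows a list.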
0 Arjun Naidu · September 18, 2015
Hi Halcyon

get_code() is still giving me the recursion error for url2.
Also, it is not returning individual characters. I've printed the output for url1.


import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('<br>', '\n', code)
    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script


print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))  #url1
get_code(r'https://thenewboston.com/forum/topic.php?id=2653')   #url2
0 sfolje 0 · September 24, 2015
Please like my post if you think it deserves to be liked ;).