"RecursionError: maximum recursion depth exceeded in comparison" when calling str(code)

+2 Arjun Naidu · September 15, 2015
While trying to download all the source files using a web crawler, I'm hitting the following exception:


Traceback (most recent call last):
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 59, in <module>
    source_spider(474)
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 54, in source_spider
    fw.write(get_code(href))
  File "C:/Users/arjunn/PycharmProjects/learn/test1.py", line 11, in get_code
    code = str(code)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1035, in __unicode__
    return self.decode()
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1191, in decode_contents
    formatter))
.
.
.
.
.

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1122, in decode
    indent_contents, eventual_encoding, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 1188, in decode_contents
    text = c.output_ready(formatter)

  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 712, in output_ready
    output = self.format_string(self, formatter)
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\element.py", line 156, in format_string
    if not isinstance(formatter, collections.Callable):
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\abc.py", line 182, in __instancecheck__
    if subclass in cls._abc_cache:
  File "C:\Users\arjunn\AppData\Local\Programs\Python\Python35-32\lib\_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison



Code is as follows:

import requests
import re
from bs4 import BeautifulSoup


def get_code(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, "html.parser")
    code = soup.find('code')
    code = str(code)
    line = re.sub('', '\n', code)
    # line = replace_br_newln(code)

    soup2 = BeautifulSoup(line, "html.parser")
    script = soup2.get_text()
    script = script.replace('?', ' ')
    return script

'''
def replace_br_newln(in_str):
    new_string = ''
    while 1:
        index = in_str.find(r'')
        if index == -1:
            new_string += in_str
            break
        else:
            new_string += in_str[:index] + '\n'
            in_str = in_str[index+4:]
    return new_string
'''



def source_spider(max_pages):
    page = 13
    compare_str = r'[source code]'
    while page <= max_pages:
        print('#####  ', page, '  #####')
        url = "https://thenewboston.com/forum/category.php?id=15&page=" + str(page)
        source_code = requests.get(url)
        # just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easy
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'post-title'}):
            href = "https://thenewboston.com" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(title)
            if compare_str in title:
                title = 'source\\' + title[14:] + '.txt'
                fw = open(title, 'w')
                fw.write(get_code(href))
                fw.close()
        page += 1


source_spider(474)





Why is recursion used when converting to a string?
What is the workaround for this issue?


Replies

0 Halcyon Abraham Ramirez · October 24, 2015
I know that feeling, man. I used to be pretty active here until, well, you know. We get no incentive for helping people out; the least we could get is a thank you. Occasionally you do get the rare person who says thank you, but it's about one in a million.

Check out Reddit and look up the learnpython subreddit; they have a better community there. Bucky's tutorials are great. It's the community that's meh.
0 sfolje 0 · October 23, 2015
Totally agree with you, man.
+1 Halcyon Abraham Ramirez · September 24, 2015
@sfolje 0

Be prepared: most people here ask when they have a question, but as soon as you give them an answer and solve their problem, boom, gone. No thank you whatsoever.
0 sfolje 0 · September 24, 2015
Please like my post if you think it deserves to be liked ;).
0 Halcyon Abraham Ramirez · September 21, 2015
This is interesting.

This is the first time I've encountered this error. Apparently this is a bug in BeautifulSoup.

No wonder we're getting the error.

Well, I guess the workaround is what @sfolje 0 said.
0 Arjun Naidu · September 21, 2015
I've tried your code, but the entire result comes out on a single line.
I've increased the recursion depth limit as @sfolje 0 suggested, and my code now works without any error.
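
For readers who land here with the same traceback, here is a minimal sketch of the "raise the recursion limit" workaround using only the standard library. The names build_chain, nested_depth and depth_with_limit are hypothetical, and the dict chain is only a stand-in for a deeply nested BeautifulSoup tree, where str() calls decode() once per nesting level. The values 3000 and 10000 are guesses for illustration, not numbers taken from this thread.

```python
import sys


def build_chain(levels):
    """Build a structure nested `levels` deep without using recursion."""
    node = None
    for _ in range(levels):
        node = {"child": node}
    return node


def nested_depth(node):
    """Walk the chain recursively: one Python stack frame per nesting level."""
    if node is None:
        return 0
    return 1 + nested_depth(node["child"])


def depth_with_limit(node, limit):
    """Run the recursive walk under a temporarily raised recursion limit."""
    old_limit = sys.getrecursionlimit()
    # Never lower the limit, in case the caller already raised it.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        return nested_depth(node)
    finally:
        # Restore the previous limit so the rest of the program is unaffected.
        sys.setrecursionlimit(old_limit)


chain = build_chain(3000)

try:
    nested_depth(chain)
    print("walk succeeded under the default limit")
except RecursionError:
    print("RecursionError under the default limit")

deep_result = depth_with_limit(chain, 10000)
print(deep_result)
```

In the script from the question, the equivalent would presumably be a single sys.setrecursionlimit(...) call before str(code). Setting the limit much higher than needed is risky, because a runaway recursion can then overflow the C stack and crash the interpreter instead of raising RecursionError.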
0 Halcyon Abraham Ramirez · September 20, 2015
Have you run this?


def get_code(url):
    soup = BeautifulSoup(requests.get(url).content)
    # .text drops the tags; the regex swaps literal '?' characters for spaces
    return re.sub(r"\?"," ",soup.find('code').text)

print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))


Was it your desired output?
0 Arjun Naidu · September 19, 2015
Hi Halcyon.
I am sorry. The '<br>' inside re.sub is being removed when the post is displayed. I am trying to replace the HTML line break tag with a newline.
The actual code is     line = re.sub('<br>', '\n', code)

I am not sure if this is being printed properly even now. (First 5 posts are moderated and I cannot see what I posted until a moderator approves it.)
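
A minimal sketch of that replacement on a plain string, in case the tag is stripped from the post again. The sample HTML and the names BR_TAG and br_to_newline are made up for illustration; the pattern is slightly broader than the one in the post so that it also catches <br/> and <BR />.

```python
import re

# Matches <br>, <br/>, <br /> and upper-case variants.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newline(html):
    """Replace every HTML line-break tag in `html` with a newline character."""
    return BR_TAG.sub("\n", html)


# Hypothetical input, not taken from the forum pages.
sample = "<code>import re<br>print('hi')<br/>x = 1<BR /></code>"
converted = br_to_newline(sample)
print(converted)
```

The remaining <code> tags would still need to be stripped afterwards, for example with get_text() as in the original script.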
0 sfolje 0 · September 19, 2015
I think url2 means 'https://thenewboston.com/forum/topic.php?id=2653'. No error with your solution, though.
0 Halcyon Abraham Ramirez · September 19, 2015
Yes, it still gives you individual letters.

It's because of this line:


line = re.sub('', '\n', code)

Try this out:

a = "hello"

print(re.sub("","\n",a))
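
To spell out what that snippet shows: an empty pattern matches at every position in the string, including before the first and after the last character, so a newline gets inserted around every letter. A small sketch that prints the result with repr() so the newlines are visible (the name split_up is just for illustration):

```python
import re

a = "hello"

# The empty pattern matches between every pair of characters and at both ends.
split_up = re.sub("", "\n", a)

print(repr(split_up))  # '\nh\ne\nl\nl\no\n'
```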

I fixed your get_code:



def get_code(url):
    soup = BeautifulSoup(requests.get(url).content)
    return re.sub(r"\?"," ",soup.find('code').text)

What is url2? Can you show me the exact URL?
Try running this:

def get_code(url):
    soup = BeautifulSoup(requests.get(url).content)
    return re.sub(r"\?"," ",soup.find('code').text)

print(get_code(r'https://thenewboston.com/forum/topic.php?id=2342'))