Simple Web Crawler Not Working

YASH AGRAWAL · January 10, 2016
import requests
from bs4 import BeautifulSoup

def trace(maxpages):
    url = 'https://www.thenewboston.com'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.find_all('a', {'class': 'mm-text mmc-dropdown-delay animated fadeIn'}):
        href = link.get('href')
        print(href)


trace(1)

It's printing nothing.

Replies

Sjoerd van den Belt · January 10, 2016
There are no links with the classes you specified. There are, however, <span> elements with the class mm-text, and the parent of each of these spans is indeed an <a> element. So what you want is a selector that finds every <a> element that is the parent of a <span class="mm-text">. This can be done like this:

import requests
from bs4 import BeautifulSoup

def trace(maxpages):
    url = 'https://www.thenewboston.com'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Find every <span class="mm-text"> and step up to its parent <a>
    links = [span.parent for span in soup.find_all('span', {'class': 'mm-text'})
             if span.parent.name == 'a']
    for link in links:
        href = link.get('href')
        print(href)


trace(1)
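To see why the first version prints nothing while the parent-based version works, here is a small offline sketch. It parses a hypothetical snippet of HTML (SAMPLE_HTML is an assumption about what the menu markup looked like, not copied from the real site) and compares three lookups: the class string from the question, the span-then-parent approach, and an equivalent CSS selector using :has().

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the site's menu:
# the <a> carries no class, and the class lives on the inner <span>.
SAMPLE_HTML = """
<nav>
  <a href="/forum/"><span class="mm-text">Forum</span></a>
  <a href="/videos/"><span class="mm-text">Videos</span></a>
  <a href="/about/">About</a>
</nav>
"""


def original_lookup(soup):
    # The question's approach: <a> tags carrying this exact class string.
    # None of the <a> tags here have a class attribute, so nothing matches.
    return [a.get("href") for a in
            soup.find_all("a", {"class": "mm-text mmc-dropdown-delay animated fadeIn"})]


def parent_lookup(soup):
    # The reply's approach: find the spans, then step up to their parent <a>.
    return [span.parent.get("href")
            for span in soup.find_all("span", {"class": "mm-text"})
            if span.parent.name == "a"]


def selector_lookup(soup):
    # Same result with a CSS selector; :has() comes from soupsieve,
    # which BeautifulSoup uses for select() in bs4 4.7.0 and later.
    return [a.get("href") for a in soup.select("a:has(> span.mm-text)")]


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(original_lookup(soup))  # []
    print(parent_lookup(soup))    # ['/forum/', '/videos/']
    print(selector_lookup(soup))  # ['/forum/', '/videos/']
```

The helper names (original_lookup, parent_lookup, selector_lookup) are hypothetical. Because the check runs against a local string, it needs no network access; to try it on a live page, pass requests.get(url).text to BeautifulSoup instead, keeping in mind that the real markup may differ from this sample.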


Let me know whether it works for you, or if you have any questions :)
