Facebook – https://www.facebook.com/TheNewBoston-464114846956315/
GitHub – https://github.com/buckyroberts
Google+ – https://plus.google.com/+BuckyRoberts
LinkedIn – https://www.linkedin.com/in/buckyroberts
reddit – https://www.reddit.com/r/thenewboston/
Support – https://www.patreon.com/thenewboston
thenewboston – https://thenewboston.com/
Twitter – https://twitter.com/bucky_roberts
I installed beautifulsoup4 but the code won't run. I get a red underline on my "import requests" line.
Hi, my code executes without any error but I'm not getting any output. Can somebody help?
Yo guys, request can only be found inside urllib, so try "from urllib import request".
Hey Bucky, I think your page doesn't exist anymore.
I made a web crawler with BeautifulSoup (20 lines). The logic is: harvest links, append links, iterate through the list and append forever. I plan on screening the websites for some things in the AM. I'll post what I do with them on my blog. I just wanted to share this function.
import time
import urllib2
from bs4 import BeautifulSoup

print '\n'
print '-' * 25 + 'Web Crawler' + '-' * 25
print '\n'

places = []

def crawl(url):
    # harvest and store links
    html_page = urllib2.urlopen(url)
    soup = BeautifulSoup(html_page, "lxml")
    for link in soup.findAll('a'):
        places.append(link.get('href'))
    # harvest and iterate forever
    while 1:
        for a in places:
            try:
                html2 = urllib2.urlopen(a)
                soup2 = BeautifulSoup(html2, "lxml")
                for b in soup2.findAll('a'):
                    places.append(b.get('href'))  # append the href, not the whole tag
                    print b.get('href')
            except KeyboardInterrupt:
                print 'Crawler Stopped...'
                time.sleep(2)
                print 'Shutting Down'
                return
            except:
                continue
Hi Bucky,
Your website doesn't exist any more, and any website I tried didn't work; I tried every piece of code I could, but the output was always an error. I need some advice here. Whoever reads this comment, please don't just say "try another website", because on every other website I tried, the inspect-element structure was different from yours.
Thank you, any advice will be appreciated.
What is the difference between:
1) urllib.request.urlretrieve()
2) urllib.request.urlopen()
3) requests.get()
In three tutorials you have used three different ways to access the URL or webpage. Why can't we use the same request each time?
Thank you.
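All three fetch a URL; they just hand the result back differently. Here's a sketch of the three calls side by side (the tiny local server is only there so the example runs without internet; the comparison is the three calls at the bottom):

```python
import http.server
import threading
import urllib.request

import requests  # third-party: pip install requests

# Tiny throwaway server so the demo needs no internet connection.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><a href='/next'>link</a></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# 1) urlretrieve() downloads the body straight into a local file
filename, headers = urllib.request.urlretrieve(url, "page.html")

# 2) urlopen() returns a file-like response object; .read() gives bytes
with urllib.request.urlopen(url) as resp:
    raw_bytes = resp.read()

# 3) requests.get() (third-party) returns a Response with a decoded .text
text = requests.get(url).text

server.shutdown()
print(raw_bytes == text.encode())  # same body, three interfaces
```

urlretrieve is handy when you want a file on disk, urlopen is the stdlib way to read a response, and requests is usually the most comfortable for scraping, which is probably why the tutorials mix them.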
If you need to build your own crawler, refer to this paper:
https://doi.org/10.1016/j.softx.2017.04.004
I downloaded beautifulsoup4 but it just won't show up when I import it. Is there anyone who has the same problem?
Question: at 3:26 Bucky right-clicks the mouse, gets a menu, and selects View Page Source. My computer does not do that. BeautifulSoup4 is installed on my computer, and when I copy-paste Dark Seid's code (below in one of the other replies) everything works.
My question basically is: how can I view a page's source? (Right-clicking the mouse does not work.)
Hopefully somebody can help me! Thanks in advance!
How did you create and upload your website?
What is BeautifulSoup again?
When I go to Settings / Interpreter / Add New, it gives me a blank panel with no options to choose from. What's wrong?
I’ve been lied to!!! THE CAKE IS A LIE!!! There is no buckysroom…
Error loading package list: pypi.python.org. Please help.
When I try to import BeautifulSoup, this error message comes up. Here is my code:
import requests
from bs4 import BeautifulSoup
The error message:
Traceback (most recent call last):
File "C:\Users\Shaan\Desktop\cps.py", line 2, in <module>
from bs4 import BeautifulSoup
File "C:\Users\Shaan\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\Users\Shaan\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\builder\__init__.py", line 308, in <module>
from . import _htmlparser
File "C:\Users\Shaan\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\builder\_htmlparser.py", line 7, in <module>
from html.parser import (
ImportError: cannot import name 'HTMLParseError'
Do I have to reinstall bs4?
Could you please tell me how I can get that interface to program in Python?
Shouldn't the last line inside the while loop be "page += 1"? That would actually increase the page value by 1 up to the max page value, and then the url line would pick up the new page number each time through.
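Right: without a page += 1 at the end of the loop body, page stays at 1 and the while loop never ends. A sketch of the pattern (the URL is a placeholder; substitute any site that paginates with a page number):

```python
import requests  # third-party: pip install requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def spider(max_pages):
    page = 1
    while page <= max_pages:
        # hypothetical paginated URL; the page number changes each pass
        url = "https://example.com/items?page=" + str(page)
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        for link in soup.find_all("a"):
            print(link.get("href"))
        page += 1  # without this line the loop re-fetches page 1 forever
```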
Haven’t finished watching your series yet, but you are using pycharm. So I like you already.
How can we know what we need to import to do a certain thing? And also how to use it?
Question: do I need PyCharm to use the BeautifulSoup module?
1. Does anyone know of an acceptable website to crawl? The classes on, say, eBay are not as self-evident as class="item", and/or they are blocking me from crawling them.
2. Is "max_pages" a built-in parameter? Does Python know what you mean without defining it further? Does it think page 20 is max_pages, or 50? I had this same question a few tutorials back with "csv_url" when writing the reader: how does the program know which CSV-containing URL you want to open? We passed csv_url into the function without ever saying csv_url = goog.fgf.csv etc. Did it just automatically assume you wanted to open the link above the user-created function because it was there?
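On question 2: max_pages (like csv_url earlier) is not built in; it is just a parameter name, and it only gets a value when the caller passes one in. A minimal sketch:

```python
def spider(max_pages):
    # max_pages has no value of its own; it is bound at call time
    return list(range(1, max_pages + 1))

print(spider(3))   # caller supplies 3 -> [1, 2, 3]
print(spider(5))   # same function, different value -> [1, 2, 3, 4, 5]
```

So Python never "assumes" a value; if you call the function without an argument, you get a TypeError instead.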
Someone please help, I just started learning Python.
How can I get the beautifulsoup4 package in the Atom IDE?
Can someone help me with downloading modules from the internet? My PyCharm is not downloading modules.
Please, any help is appreciated.
+thenewboston You should maybe give us alternative websites that are pretty much the same as your trade page, since it doesn't exist anymore, just to make it easier for others and let them focus on the important things rather than hunting for a website.
If you can't install beautifulsoup4, try running PyCharm as an administrator on Windows.
Hello, thank you for the course. How can I calculate the number of tags in an HTML page (counting each kind of tag)? Thanks.
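Counting tags is one line with BeautifulSoup: find_all with a tag name counts that tag, and find_all(True) matches every tag. A small sketch:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = "<html><body><p>one</p><p>two</p><a href='#'>x</a></body></html>"
soup = BeautifulSoup(html, "html.parser")

p_count = len(soup.find_all("p"))      # just the <p> tags
all_count = len(soup.find_all(True))   # every tag: html, body, p, p, a

print(p_count, all_count)  # 2 5
```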
Can multiple threads make multiple HTTP requests at the same time, or do these connections interfere with each other?
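Yes: each thread opens its own socket, so simultaneous requests don't interfere (just don't share a single response object across threads). A sketch against a throwaway local server, so it runs without internet:

```python
import http.server
import threading
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party: pip install requests

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

# ThreadingHTTPServer also answers each request in its own thread
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# ten requests spread over five worker threads, each with its own connection
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(lambda _: requests.get(url).status_code, range(10)))

server.shutdown()
print(codes)  # [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
```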
9 pages…
BeautifulSoup cannot be installed?
Help.
I can't seem to add packages in PyCharm. I've got 3.0.3 on an older OS,
and don't have that + button to add packages off the web.
Any help, guys?
I can't find your webpage buckysroom/trade.
Hey Bucky! What do I do when, on selecting another page of the website, the URL is not altered? Do you understand the question?
What about game bots? Is it decently hard to create them or not?
Etsy, for anyone else looking for a good URL.
The text is always small; can you enlarge it?
Do you have any idea how hard it is to find a website with page= at the end of the URL? Bucky, fix this video.
I can't find the website. What should I do?
You can also use the Rcrawler package; it's much easier and can crawl & scrape all the web pages of a website automatically: https://CRAN.R-project.org/package=Rcrawler
http://oceanofgames.com/page/2/
For all the people who are asking nicely for a site, and all those who are abusing the comments.
Need a URL to crawl.
YouTube needs people like you. Seriously, there are very few people sharing their knowledge with the world in such a beautiful manner.
Will the web crawler work on the Community edition of the PyCharm IDE?
Do you have Django tutorials? If you do, why don't you post them?
Install BeautifulSoup and Requests from the Windows command line:
>python -m pip install BeautifulSoup4
>python -m pip install Requests
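After those two commands, a quick way to confirm both imports resolve (run it with the same python that did the install):

```python
import bs4
import requests

print(bs4.__version__, requests.__version__)
```

If this prints two version numbers, PyCharm's red underline is usually just an interpreter mismatch: point the project interpreter at the python you installed into.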
Tried a few changes, still won't work (28/04/17):
import requests
from bs4 import BeautifulSoup
def motor_spider(max_pages):
    page = 1
    while page <= max_pages:
        # note: this just appends the number ("...motorbikes1"); the site
        # most likely expects something like "?page=" + str(page) instead
        url = "https://www.donedeal.ie/motorbikes" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('a', {'class': 'body_title'}):
            href = link.get('href')
            print(href)
        page += 1

motor_spider(2)