Creating a project
Create a project in PyCharm called scrapy_learn.
scrapy startproject quote  # Scrapy project name, created inside scrapy_learn
Create a spider; it must be placed inside the spiders folder.
import scrapy
Running the spider
cd quote  # name of the Scrapy project
scrapy crawl quotes # name of the spider
Using the Scrapy shell (CSS selectors)
scrapy shell "http://quotes.toscrape.com/"
response.css("title::text").extract()
response.css("title::text")[0].extract()  # a single item rather than the whole list
response.css("span.text::text").extract()  # "." selects by class, "#" selects by ID
Using XPath selectors
response.xpath("//title").extract()
response.xpath("//title/text()").extract()
response.xpath("//span[@class='text']/text()").extract()  # all quote texts
response.xpath("//span[@class='text']/text()")[1].extract()  # second quote; for an ID instead of a class, use @id=
response.css("li.next a").extract()  # CSS again: the <a> tag nested inside the <li class="next"> tag
Extracted data -> temporary containers (items) -> storing in database
In items.py
import scrapy

class QuoteItem(scrapy.Item):
    # define the fields for your item here like:
In the spider (quotes_spider.py)
import scrapy
Storing the data as JSON, XML or CSV
scrapy crawl quotes -o items.json # -o: output
scraped data -> item containers -> JSON/CSV
scraped data -> item containers -> pipeline -> MySQL database
Go to settings.py and uncomment the "Configure item pipelines" block.
ITEM_PIPELINES = {
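The setting is truncated here. For reference, an uncommented block typically looks like the fragment below. The path quote.pipelines.QuotePipeline is an assumption based on the default class name generated for a project called quote; adjust it if the pipeline class was renamed.

```python
# settings.py (fragment)
ITEM_PIPELINES = {
    # Assumed default path for a project named "quote".
    # The number sets the run order when several pipelines are enabled:
    # lower values run first, and values are conventionally 0-1000.
    "quote.pipelines.QuotePipeline": 300,
}
```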
Install mysql-connector-python (pip install mysql-connector-python).
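With the connector installed, the pipeline in pipelines.py can write each item to MySQL. The notes do not show that file, so this is only a rough sketch under assumptions: the credentials, the database name quotes_db, the table and column names, and the text / author fields are all placeholders.

```python
class QuotePipeline:
    """Sketch of an item pipeline that inserts each item into a MySQL table."""

    def open_spider(self, spider):
        # Imported here so the module still loads where the driver is missing
        import mysql.connector

        # Placeholder connection details; replace with your own
        self.conn = mysql.connector.connect(
            host="localhost",
            user="root",
            password="secret",
            database="quotes_db",
        )
        self.cur = self.conn.cursor()
        self.cur.execute(
            "CREATE TABLE IF NOT EXISTS quotes "
            "(quote_text TEXT, author VARCHAR(255))"
        )

    def process_item(self, item, spider):
        # Parameterized query: the driver handles quoting of the values
        self.cur.execute(
            "INSERT INTO quotes (quote_text, author) VALUES (%s, %s)",
            (item.get("text"), item.get("author")),
        )
        self.conn.commit()
        # Return the item so any later pipeline still receives it
        return item

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()
```

Scrapy calls open_spider once at start, process_item for every yielded item, and close_spider at the end, so the connection is opened and closed only once per crawl.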