Scrapy is an open-source Python framework used for web scraping. In this tutorial, we’re going to show you how to install, set up, and start using Scrapy on Ubuntu.
Requirements
For this tutorial, you’ll need:
- An Ubuntu system. This can be your desktop or an Ubuntu VPS. We’ll be using Ubuntu 22.04 for this tutorial, but these instructions should work on other recent versions as well.
- Sudo access to a terminal (locally or over SSH)
Step 1: Update Ubuntu
The first step is always to update your system. You can do that with the following commands:
sudo apt-get update && sudo apt-get upgrade
Step 2: Install Python, Pip, and dependencies
The second step is to install Python, Pip, and some needed dependencies. Python may already be installed on your system; if so, you can skip it and install only the remaining packages in this step. Note that Scrapy requires Python 3.8+.
We have detailed tutorials on how to install Python on Ubuntu and how to install Pip on Ubuntu, but for this tutorial you’ll just need to run this command:
sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev virtualenv
Step 3: Create and set up a Python virtual environment
It’s recommended to use Scrapy in a virtual environment. This isn’t strictly required, but it keeps Scrapy and its dependencies isolated from your system’s Python packages.
To create a virtual environment, first, create a directory for virtual environments:
mkdir ~/python-environments && cd ~/python-environments
Once you’re in the new directory, create a new environment for Scrapy:
virtualenv --python=python3 scrapyenv
Next, activate the virtual environment:
source scrapyenv/bin/activate
And that’s it. Now you can start using the virtual environment.
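To confirm the environment is really active, you can check which interpreter is on your PATH. The exact path shown depends on where you created the environment; with the layout above it should point inside ~/python-environments/scrapyenv:

```shell
# With the environment active, python3 should resolve inside scrapyenv
# rather than /usr/bin, and the version should be 3.8 or newer.
which python3
python3 --version
```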
Step 4: Install Scrapy with Pip
Now that you’re in the virtual environment, you can start installing Scrapy with pip:
pip install scrapy
Step 5: Create and run a new Scrapy project
Now that Scrapy is installed, you can start creating a new project:
scrapy startproject example
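The startproject command generates a small directory skeleton for you. It should look roughly like this (file names may vary slightly between Scrapy versions):

```
example/
    scrapy.cfg            # deploy/configuration file
    example/              # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # your spiders go here
            __init__.py
```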
And you can start using Scrapy. For a more detailed beginner tutorial, check the Scrapy tutorial in the official documentation.
There’s also an example project that you can use to try things out.
To try the example project, clone the GitHub repo:
git clone https://github.com/scrapy/quotesbot.git
Then change into the project directory and run one of its crawlers:
cd quotesbot
scrapy crawl toscrape-css -o quotes.json
This saves all the quotes to a quotes.json file, which you can later inspect with nano:
nano quotes.json
Once you’re finished testing and working with Scrapy, you can exit the virtual environment by running:
deactivate
And that’s it! You’ve now installed, set up, and used Scrapy on Ubuntu. You can read their official documentation to learn more about Scrapy and how to use it.