How to Install, Set Up and Start Using Scrapy on Ubuntu

Scrapy is an open-source Python framework used for web scraping. In this tutorial, we’re going to show you how to install, set up, and start using Scrapy on Ubuntu.


For this tutorial, you’ll need:

  • An Ubuntu system. This can either be your desktop or an Ubuntu VPS. We’ll be using Ubuntu 22.04 for this tutorial, but these instructions should also work on other recent versions.
  • Sudo access to a terminal (either locally or over SSH)

Step 1: Update Ubuntu

The first step is always to update your system. You can do that with the following commands:

sudo apt-get update
sudo apt-get upgrade

Step 2: Install Python, Pip, and dependencies

The second step is to install Python, Pip, and a few required dependencies. Python may already be installed on your system; if so, the command below will simply skip it. Note that Scrapy requires Python 3.8 or newer.

We have detailed tutorials on how to install Python on Ubuntu and how to install Pip on Ubuntu here, but for this setup you just need to run this command:

sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev virtualenv
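Since Scrapy requires Python 3.8 or newer, it’s worth confirming the interpreter version before going further. This is a small stand-alone check written for this tutorial, not something Scrapy itself ships:

```python
import sys

def python_is_supported(min_version=(3, 8)) -> bool:
    """Return True if the running interpreter meets Scrapy's minimum version."""
    return sys.version_info >= min_version

if __name__ == "__main__":
    status = "meets" if python_is_supported() else "does not meet"
    print(f"Python {sys.version.split()[0]} {status} Scrapy's 3.8+ requirement")
```

You can also get the same information from the shell with `python3 --version`.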

Step 3: Create and set up a Python virtual environment

It’s recommended to run Scrapy in a virtual environment. This isn’t strictly required, but it keeps Scrapy and its dependencies isolated from your system-wide Python packages.

To create a virtual environment, first, create a directory for virtual environments:

mkdir ~/python-environments && cd ~/python-environments

Once you’re in the new directory, create a new environment for Scrapy:

virtualenv --python=python3 scrapyenv

Next, activate the virtual environment:

source scrapyenv/bin/activate

And that’s it. The environment is now active, and your shell prompt should show a (scrapyenv) prefix.
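If you ever want to confirm from within Python that a virtual environment is active, the interpreter exposes enough information to check. This is a hypothetical helper written for illustration, not part of virtualenv or Scrapy:

```python
import sys

def in_virtualenv() -> bool:
    # Inside a virtual environment, sys.prefix points at the environment's
    # directory, while sys.base_prefix still points at the system Python.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("virtualenv active" if in_virtualenv() else "using system Python")
```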

Step 4: Install Scrapy with Pip

Now that you’re in the virtual environment, you can install Scrapy with pip:

pip install scrapy

Step 5: Create and run a new Scrapy project

Now that Scrapy is installed, you can create a new project:

scrapy startproject example

And you can start using Scrapy. For a more detailed beginner tutorial, check the one in their official documentation.

There’s also an example project that you can use to try things out.

To try the example project, clone the GitHub repo:

git clone

Then, from inside the cloned project’s directory, run the crawler:

scrapy crawl toscrape-css -o quotes.json

This will save all the quotes to a quotes.json file that you can later check with nano:

nano quotes.json
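If you’d rather inspect the output programmatically, the export can be loaded with Python’s standard json module. The sample data below is a stand-in for a real crawl result:

```python
import json
from pathlib import Path

# Stand-in for the file a real crawl would produce with `-o quotes.json`.
sample = [
    {"text": "A witty saying proves nothing.", "author": "Voltaire"},
]
Path("quotes.json").write_text(json.dumps(sample))

# Load the export and summarize it.
quotes = json.loads(Path("quotes.json").read_text())
print(f"{len(quotes)} quote(s) scraped; first author: {quotes[0]['author']}")
```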

Once you’re finished testing and working with Scrapy, you can get out of the virtual environment by running:

deactivate

And that’s it! You’ve now installed, set up, and used Scrapy on Ubuntu. You can read their official documentation to learn more about Scrapy and how to use it.
