Overview

This guide covers two Python scripts designed to scrape job listings from LinkedIn and Dice based on specific search criteria. Both scripts use Selenium for web scraping and store the data in a MongoDB database. The LinkedIn scraper targets Oracle-related remote and location-based jobs, while the Dice scraper focuses on Oracle functional roles, distinguishing between remote and non-remote positions.

Prerequisites

  • Python 3.6+ installed on your system
  • MongoDB Atlas account and a database set up (or a local MongoDB instance)
  • Chrome browser installed (used by both scrapers)
  • Required Python Libraries:
    • selenium
    • pymongo
    • webdriver_manager (for Dice scraper)
  • ChromeDriver: downloaded manually for the LinkedIn scraper, managed automatically by webdriver_manager for the Dice scraper
  • LinkedIn Account (for LinkedIn scraper, as login may be required)

1. LinkedIn Job Scraper

Script Contents

The LinkedIn scraper script (linkedin.py) is broken down below into setup, usage, and customization steps.

Setup

  1. Install required libraries:
    pip install selenium pymongo
  2. Download Chromedriver compatible with your Chrome version and update the DRIVER_PATH:
    DRIVER_PATH = r"C:\path\to\chromedriver.exe"
  3. Update the Chrome user data directory (optional, for auto-login):
    chrome_options.add_argument(r"--user-data-dir=C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data")
  4. Ensure your MongoDB URI is correct (these settings are combined in the sketch after this list):
    mongo_uri = "mongodb+srv://yourusername:yourpassword@cluster0.bpujz.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"
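Putting these settings together, the top of linkedin.py might look like the following. This is a minimal sketch only: DRIVER_PATH, the user-data-dir option, and mongo_uri come from the steps above, the jobs_db/jobs names from the Output section, and the rest is illustrative.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from pymongo import MongoClient

    # Placeholders from the setup steps above.
    DRIVER_PATH = r"C:\path\to\chromedriver.exe"
    mongo_uri = "mongodb+srv://yourusername:yourpassword@cluster0.bpujz.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

    # Reuse an existing Chrome profile so LinkedIn can log in automatically (optional).
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument(r"--user-data-dir=C:\Users\YourUsername\AppData\Local\Google\Chrome\User Data")

    driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=chrome_options)

    # Database and collection used by both scrapers (see Output below).
    client = MongoClient(mongo_uri)
    collection = client["jobs_db"]["jobs"]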

Usage

  1. Save the script as linkedin.py.
  2. Run the script:
    python linkedin.py
  3. If a login screen appears, log in manually and press Enter in the console (see the sketch after this list).
  4. The script will scrape remote and location-based Oracle jobs and store them in MongoDB.
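Step 3 is usually just a console pause so you can finish the login in the browser window. Continuing from the setup sketch above (the search URL here is only a placeholder):

    # Open the first search page; pause so a manual login can be completed if needed.
    remote_search_url = "https://www.linkedin.com/jobs/search/?keywords=Oracle&f_WT=2"  # placeholder
    driver.get(remote_search_url)
    input("If a login screen appears, log in manually in the browser, then press Enter...")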

Customization

  • Modify remote_search_url and location_search_url for different search criteria:
    keywords=("New Keywords" OR "Other Terms")
  • Adjust scroll_step or the time.sleep() pauses to change scrolling speed.
  • Add fields to the jobs dictionary in scrape_jobs() (both are shown in the sketch after this list).
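Here is a sketch of how scroll_step, the time.sleep() pauses, and the jobs dictionary fit together inside scrape_jobs(). The function signature, CSS selectors, and field names are assumptions, not the script's exact code.

    import time
    from selenium.webdriver.common.by import By

    def scrape_jobs(driver, search_url, scroll_step=800, scroll_rounds=10):
        driver.get(search_url)

        # Scroll in increments so LinkedIn lazy-loads more job cards.
        for _ in range(scroll_rounds):
            driver.execute_script(f"window.scrollBy(0, {scroll_step});")
            time.sleep(2)  # raise or lower to change scrolling speed

        jobs = []
        for card in driver.find_elements(By.CSS_SELECTOR, "div.job-card-container"):
            link = card.find_element(By.CSS_SELECTOR, "a")
            jobs.append({
                "title": link.text,
                "url": link.get_attribute("href"),
                # add more fields here, e.g. "company", "location", "posted_date"
            })
        return jobs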

Note: LinkedIn may require manual login. Ensure URLs are valid job search links.

2. Dice Job Scraper

Script Contents

The Dice scraper script (dice.py) is broken down below into setup, usage, and customization steps.

Setup

  1. Install required libraries:
    pip install selenium pymongo webdriver_manager
  2. No manual Chromedriver download needed; webdriver_manager handles it.
  3. Verify your MongoDB URI (see the sketch after this list):
    mongo_uri = "mongodb+srv://yourusername:yourpassword@cluster0.bpujz.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"
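With webdriver_manager the driver setup reduces to something like this sketch. The headless flag matches the Customization note below; everything else follows the same pattern as the LinkedIn setup and is illustrative.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    from pymongo import MongoClient

    mongo_uri = "mongodb+srv://yourusername:yourpassword@cluster0.bpujz.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--headless")  # remove to watch the browser

    # webdriver_manager downloads and caches a matching ChromeDriver automatically.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                              options=chrome_options)

    client = MongoClient(mongo_uri)
    collection = client["jobs_db"]["jobs"]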

Usage

  1. Save the script as dice.py.
  2. Run the script:
    python dice.py
  3. The script will scrape remote and non-remote Oracle functional jobs and store them in MongoDB.

Customization

  • Update remote_url and non_remote_url for different search criteria:
    q=((NEW AND TERMS) AND (NOT (EXCLUDED)))
  • Adjust the time.sleep(5) page-load wait.
  • Modify job_data in scrape_dice_jobs() to capture additional fields (both are shown in the sketch after this list).
  • Remove --headless to see the browser during scraping.
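Here is a sketch of the scraping loop, showing the time.sleep(5) wait, the job_data dictionary, and the update-based duplicate handling noted under Tips. The function signature, card selector, and field names are assumptions.

    import time
    from selenium.webdriver.common.by import By

    def scrape_dice_jobs(driver, collection, search_url):
        driver.get(search_url)
        time.sleep(5)  # page-load wait; adjust if results render slowly

        for card in driver.find_elements(By.CSS_SELECTOR, "div.card"):
            link = card.find_element(By.CSS_SELECTOR, "a")
            job_data = {
                "title": link.text,
                "url": link.get_attribute("href"),
                # extend with extra fields here, e.g. "company", "location"
            }
            # Update existing entries instead of inserting duplicates.
            collection.update_one({"url": job_data["url"]},
                                  {"$set": job_data},
                                  upsert=True)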

Note: The non_remote_url is a placeholder. Replace it with a valid Dice search URL for non-remote jobs.

Output

Both scrapers store job data in the jobs collection within the jobs_db database in MongoDB. Each entry includes fields like _id, title, company, location, url, and more.
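A stored document might look like this; the values, and any fields beyond those listed above, are purely illustrative:

    {
        "_id": "dice-7891234",                  # unique job identifier
        "title": "Oracle EBS Functional Consultant",
        "company": "Example Corp",
        "location": "Remote",
        "url": "https://www.dice.com/job-detail/example",
        "scraped_at": "2025-03-25"              # example of an extra field
    }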

Tips

  • Test with a smaller pageSize (e.g., 10) in Dice URLs initially.
  • Monitor MongoDB for duplicates: the LinkedIn scraper skips jobs it has already stored, while the Dice scraper updates them (see the sketch after this list).
  • Respect website terms of service to avoid bans.
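Continuing from the earlier sketches, the LinkedIn-side skip typically looks like the check below, while the Dice side is the update_one(..., upsert=True) call sketched above; the url match key is an assumption.

    # LinkedIn scraper: only insert jobs that are not already stored.
    for job in scrape_jobs(driver, remote_search_url):
        if collection.find_one({"url": job["url"]}) is None:
            collection.insert_one(job)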

Created on March 25, 2025
