The online job market is massive and growing rapidly. According to a report from Burning Glass Technologies, there were over 22 million online job postings in the U.S. alone in 2020. That number is projected to keep climbing as more companies embrace digital hiring.
For job seekers, this abundance of online listings can be both a blessing and a curse. On one hand, you have access to far more opportunities than you would through offline channels. On the other, sifting through this overwhelming volume of postings to find the best fits is incredibly time-consuming.
A 2017 survey by DHI Group found that job seekers spend an average of 11 hours per week searching online job boards. For recruiters and HR teams tasked with filling multiple positions, the time commitment is even greater.
But what if there was a way to automatically gather and structure all this valuable job market data? To almost instantly pull key information like job titles, company names, locations, salaries, skill requirements, and more from thousands of listings?
Enter web scraping.
What is Web Scraping?
Web scraping is the process of programmatically extracting data from websites. Rather than manually copying and pasting, web scraping tools can automatically load webpages, parse the HTML, and extract specific data points into a structured format like JSON or CSV.
Here's a simplified overview of how it works:
- The scraper sends an HTTP request to the target webpage (e.g. an Indeed job search results page)
- The server responds with the page HTML
- The scraper parses the HTML to find and extract the desired data based on unique identifiers like element IDs and classes
- The extracted data is saved into a structured format and outputted
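The request-parse-extract loop above can be sketched in a few lines of Python using only the standard library. The HTML snippet and class names here are illustrative stand-ins, not Indeed's actual markup:

```python
from html.parser import HTMLParser

# Illustrative HTML, standing in for a server's response to our request
PAGE = """
<ul>
  <li class="job"><h2>Data Scientist</h2><span class="company">Acme</span></li>
  <li class="job"><h2>ML Engineer</h2><span class="company">Globex</span></li>
</ul>
"""

class JobParser(HTMLParser):
    """Collects {"title": ..., "company": ...} dicts from <li class="job"> items."""
    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "job") in attrs:
            self.jobs.append({})       # start a new record
        elif tag == "h2":
            self._field = "title"
        elif tag == "span":
            self._field = "company"

    def handle_data(self, data):
        if self._field and self.jobs:
            self.jobs[-1][self._field] = data
            self._field = None

parser = JobParser()
parser.feed(PAGE)
print(parser.jobs)
# [{'title': 'Data Scientist', 'company': 'Acme'},
#  {'title': 'ML Engineer', 'company': 'Globex'}]
```

Real scrapers typically swap the hard-coded string for an HTTP fetch and use a CSS-selector library instead of a hand-rolled parser, but the shape of the pipeline is the same.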
Web scraping is commonly used for price monitoring, lead generation, competitive research, sentiment analysis, and more. It allows you to gather large amounts of data from across the web in an automated and efficient way.
Benefits of Scraping Job Listings
So what kind of data can you extract from online job listings? And what are the benefits of doing so at scale?
Let's use Indeed, the world's largest job site, as an example. For any given job listing, key data points you could scrape include:
- Job title
- Company name
- Location
- Salary
- Job description
- Qualifications
- Required skills
- Experience level
- Industry
- Employee benefits
- Remote or on-site
- Full-time or part-time
- Date posted
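One simple way to hold these fields in a structured record is a dataclass. The field names and defaults below are our own choices for illustration, not a standard schema:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class JobListing:
    title: str
    company: str
    location: str
    salary: Optional[str] = None       # often missing from listings
    description: str = ""
    skills: tuple = ()
    remote: bool = False
    full_time: bool = True
    date_posted: Optional[str] = None  # e.g. an ISO date string

job = JobListing(
    title="Data Scientist",
    company="Acme",
    location="Remote",
    remote=True,
)
print(asdict(job)["title"])  # Data Scientist
```

Converting each record to a dict with `asdict` makes it trivial to dump the whole collection to JSON or CSV later.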
By extracting and structuring these details for thousands or even millions of listings, you unlock a trove of valuable data and insights:
For job seekers: Set up alerts for new openings matching your specific criteria. Analyze trends in required skills and qualifications. Compare salaries and benefits across companies and locations.
For recruiters: Track hiring demand in different industries and regions. Identify the most in-demand roles and skills. Monitor competitors' hiring activities. Optimize job titles and descriptions.
For employers: Benchmark salaries and benefits against industry averages. Evaluate the competitive landscape for talent. Measure your hiring velocity and bottlenecks.
For analysts: Gauge the health of the job market. Map skills gaps and training needs. Predict future hiring trends. Quantify the economic impact of events like recessions or pandemics.
The applications are endless. With access to large-scale, real-time job market data, you can make smarter, more informed decisions – whether you're looking for a new role, trying to fill one, or zooming out to understand the bigger picture.
Scraping Indeed with ScrapingBee and Make
So how can you start extracting data from Indeed at scale? Building your own web scraper from scratch requires significant coding skills and effort.
Fortunately, there are powerful no-code tools that make it easy to scrape job listings without writing a line of code:
- ScrapingBee – A web scraping API that handles the fetching, rendering, and parsing of webpages, outputting structured JSON data
- Make – An integration platform for connecting APIs and automating workflows
By connecting ScrapingBee and Make, you can set up an automated pipeline to scrape job listings, save the extracted data, and route it to other apps. Here's how:
Step 1: Create a ScrapingBee API Token
First, sign up for a free ScrapingBee account and grab your API token from the dashboard. This will allow you to use ScrapingBee's scraping functionality within your Make workflows.
Step 2: Set Up a Make Scenario
Next, create a new scenario in Make. A scenario is a workflow composed of modules that perform actions across different apps.
Add the ScrapingBee module and configure it with:
- The URL of the Indeed search results you want to scrape (e.g. https://www.indeed.com/jobs?q=data+scientist)
- Your ScrapingBee API token
- A JSON object specifying the data you want to extract:
{
  "jobs": {
    "selector": "ul.jobsearch-ResultsList > li",
    "type": "list",
    "output": {
      "title": "h2.jobTitle",
      "company": "span.companyName",
      "location": ".companyLocation",
      "salary": ".salary-snippet",
      "description": ".job-snippet"
    }
  }
}
This tells ScrapingBee to select all the <li> elements containing job listings, and extract the title, company, location, salary, and description from each one.
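If you later outgrow the no-code setup, the same extraction rules can be sent to ScrapingBee's HTTP API directly. The sketch below only builds the request URL (the token is a placeholder, and you should verify the parameter names against ScrapingBee's current API reference before relying on them):

```python
import json
from urllib.parse import urlencode

API_KEY = "YOUR_SCRAPINGBEE_TOKEN"  # placeholder -- use your real token
TARGET = "https://www.indeed.com/jobs?q=data+scientist"

# Same extraction rules as the Make module above
extract_rules = {
    "jobs": {
        "selector": "ul.jobsearch-ResultsList > li",
        "type": "list",
        "output": {
            "title": "h2.jobTitle",
            "company": "span.companyName",
            "location": ".companyLocation",
            "salary": ".salary-snippet",
            "description": ".job-snippet",
        },
    }
}

# Build the request URL; fetching it (e.g. with urllib.request) would
# return the extracted listings as JSON.
query = urlencode({
    "api_key": API_KEY,
    "url": TARGET,
    "extract_rules": json.dumps(extract_rules),
})
request_url = f"https://app.scrapingbee.com/api/v1/?{query}"
print(request_url[:60])
```

No request is made here; the point is that the JSON rules you configure in Make are the same payload a scripted client would send.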
Step 3: Add an Iterator Module
The ScrapingBee module will output an array of job listings. To process each one individually, add an Iterator module to your scenario. This will split the array into separate bundles that can be mapped to other modules.
Step 4: Save the Extracted Data
Now that you have the individual job listings, you'll want to save the extracted data somewhere. Make has a built-in data store that can act as a simple database.
Add a "Create Record" module for Make's data store. Create a new data store with fields matching the scraped data (title, company, location, etc). Map the outputs from the Iterator module to their corresponding fields.
This will create a new record in your data store for each scraped job listing, allowing you to build up a structured database of job postings over time.
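Outside of Make, this "one record per listing" pattern maps naturally onto a small SQLite table. A minimal sketch, with illustrative sample data in the shape the extract rules return:

```python
import sqlite3

# Example payload in the shape the scraper returns (values are made up)
jobs = [
    {"title": "Data Scientist", "company": "Acme", "location": "New York, NY",
     "salary": "$120,000 a year", "description": "Build ML models..."},
    {"title": "ML Engineer", "company": "Globex", "location": "Remote",
     "salary": None, "description": "Deploy models to production..."},
]

conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
conn.execute("""CREATE TABLE IF NOT EXISTS listings (
    title TEXT, company TEXT, location TEXT, salary TEXT, description TEXT)""")

# One row per scraped listing, mirroring the Iterator -> Create Record flow
conn.executemany(
    "INSERT INTO listings VALUES (:title, :company, :location, :salary, :description)",
    jobs,
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0])  # 2
```

Running the insert on each scrape builds up the same kind of structured job database over time.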
Step 5: Connect Other Apps
With your job data flowing into Make's data store, you can now route it to any other app in the Make ecosystem. Some examples:
- Send an automated email digest or Slack message with new listings
- Save the data to Google Sheets or Airtable for further analysis and visualization
- Trigger an SMS alert when a high-priority listing is found
- Post listings to your company's internal job board or ATS
The possibilities are endless. Make supports integrations with hundreds of popular tools across marketing, sales, HR, project management, and more. Simply add the relevant module to your scenario and map the data from the previous steps.
Step 6: Schedule the Scenario
Finally, configure your Make scenario to run automatically on a set schedule (e.g. every morning at 8am). This will kick off the entire workflow – scraping fresh listings from Indeed, extracting the key details, saving them to your data store, and syncing the data to any connected apps.
Just like that, you've created a fully automated data pipeline to extract valuable data from Indeed listings. No coding required!
Best Practices for Ethical Scraping
Before you start scraping job listings from Indeed or any other site, it's important to be aware of the legal and ethical implications.
While web scraping itself is legal in most jurisdictions, some websites prohibit or restrict the practice in their terms of service. For example, here is Indeed's policy on scraping:
You are welcome to access and use our Sites, APIs and the data and content you find on or through our Sites and APIs…
However, you may not:
- Aggregate, scrape, index, copy, transmit, retransmit, reverse engineer or reproduce any of the data or content you find on or through our Site or APIs without our prior written permission.
In other words, Indeed does not allow unauthorized scraping of their platform. Violating these terms could result in your IP address being blocked or even legal action.
So what's the right way to access Indeed's job data? The best approach is to go through Indeed's official API, which provides structured access to listings with clear terms and pricing. However, getting API access requires an Indeed publisher account and approval.
If you do decide to scrape job listings from Indeed or other sites without approval, here are some best practices to follow:
Respect robots.txt – This file specifies which parts of a site can be scraped. Always check it before scraping and follow its instructions.
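Python's standard library can check robots.txt rules for you via urllib.robotparser. The rules below are made up for illustration; a real scraper would load the live file with set_url() and read():

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules -- not any real site's policy
rules = """
User-agent: *
Disallow: /account/
Allow: /jobs
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Check before fetching each URL
print(rp.can_fetch("*", "https://example.com/jobs?q=data+scientist"))  # True
print(rp.can_fetch("*", "https://example.com/account/settings"))       # False
```

Calling can_fetch() before every request is a cheap way to bake this best practice into a scraper.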
Limit your request rate – Sending too many requests too quickly can overload servers and get your IP address blocked. Throttle your scraper to mimic human browsing behavior.
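A minimal throttling sketch: pause a randomized interval between requests so traffic looks less like a burst. The fetch function is a stand-in, and the sub-second delays only keep the example quick; for a real site, several seconds between requests is a more considerate starting point:

```python
import random
import time

def fetch(url):
    """Stand-in for a real HTTP request."""
    return f"<html>response for {url}</html>"

def scrape_politely(urls, min_delay=0.2, max_delay=0.5):
    """Fetch each URL with a randomized pause in between."""
    pages = []
    for url in urls:
        pages.append(fetch(url))
        # Randomized jitter reads as more human than a fixed interval
        time.sleep(random.uniform(min_delay, max_delay))
    return pages

urls = [f"https://example.com/jobs?start={i * 10}" for i in range(3)]
pages = scrape_politely(urls)
print(len(pages))  # 3
```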
Don't scrape personal data – Avoid collecting personal information like names, email addresses, or phone numbers in your scraping. Stick to public, non-sensitive data points.
Use the data responsibly – Don't use scraped data for spam, fraud, or other malicious purposes. Respect intellectual property rights and give credit to the original source.
Comply with GDPR and CCPA – If you're scraping personal data of EU or California residents, make sure you're meeting these regulations' strict data privacy and processing requirements.
At the end of the day, web scraping is a powerful tool – but it's up to you to wield it ethically and responsibly.
Conclusion
The job market is undergoing a rapid digital transformation. According to research from Gartner, 86% of organizations were conducting virtual interviews to hire candidates in the midst of the COVID-19 pandemic. A staggering 80% of recruitment is expected to be virtual going forward.
With this shift to online hiring, the importance of job listing data will only continue to grow. Those who can efficiently collect, analyze, and act on this data will have a major advantage – whether they're job seekers looking for their next role, recruiters trying to find the perfect candidate, or leaders striving to make smarter talent decisions.
Web scraping makes it possible to extract massive amounts of job listing data from sites like Indeed quickly and cost-effectively. Using no-code tools like ScrapingBee and Make, even non-technical users can automate the data collection process from start to finish.
With a consistent flow of structured job data, you can:
- Track hiring trends and patterns over time
- Benchmark salaries and benefits by role, industry, and location
- Identify the most in-demand skills and qualifications
- Optimize job titles and descriptions for better visibility and fit
- Measure and forecast key talent metrics
- And much more
Of course, with great data comes great responsibility. When scraping job listings or any other web data, it's crucial to do so ethically, legally, and in compliance with relevant regulations.
By following scraping best practices and using data for good, you can unlock powerful insights from job listings to help people find meaningful work, help companies find the right talent, and ultimately move the job market forward in a positive direction.
The treasure trove of insights is out there – now you know how to tap into it.