Python Web Scraping Beginner Tutorial 30: Scraping Job Listing Data from Lagou (拉勾网)

import csv
import requests

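# Open the CSV in append mode; newline='' avoids blank rows on Windows.
f = open('data.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    '标题',      # job title
    '城市',      # city
    '公司名字',  # company name
    '学历',      # education
    '经验',      # experience
    '薪资',      # salary
    '公司福利',  # benefits
    '详情页',    # detail page URL
])
# Write the header only when the file is empty, so reruns do not repeat it.
if f.tell() == 0:
    csv_writer.writeheader()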
# Lagou's job search endpoint, queried here with a POST request and form data.
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
data = {
    'first': 'true',
    'pn': '1',       # page number
    'kd': 'python'   # search keyword
}
headers = {
    # Placeholder: replace 'cookie' with the cookie string copied from your browser.
    'cookie': 'cookie',
    'referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'
}
response = requests.post(url=url, data=data, headers=headers)
# Each element of `result` is a dict describing one job posting.
result = response.json()['content']['positionResult']['result']
for index in result:
    title = index['positionName']  # job title
    city = index['city']  # city
    area = index['district']  # district
    city_area = city + '-' + area
    company_name = index['companyFullName']  # company name
    edu = index['education']  # education requirement
    money = index['salary']  # salary
    exp = index['workYear']  # required experience
    boon = index['positionAdvantage']  # benefits
    href = f'https://www.lagou.com/jobs/{index["positionId"]}.html'  # detail page URL
    # Strip the <br> tags from the job description.
    job_info = index['positionDetail'].replace('<br>\n', '').replace('<br>', '')
    dit = {
        '标题': title,
        '城市': city_area,
        '公司名字': company_name,
        '学历': edu,
        '经验': exp,
        '薪资': money,
        '公司福利': boon,
        '详情页': href,
    }
    csv_writer.writerow(dit)
    txt_name = company_name + '-' + title + '.txt'
    # Use a separate name here so the CSV file handle `f` is not shadowed.
    with open(txt_name, mode='w', encoding='utf-8') as txt_file:
        txt_file.write(job_info)
    print(dit)

f.close()

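One thing to watch in the loop above: the text file name is built directly from the company name and the job title, and job titles often contain characters such as `/` that are not allowed in file names. Below is a minimal sketch of one way to handle that. The helper `safe_filename` and its replacement rule are my own assumptions for illustration and are not part of the original script.

```python
import re

# Characters that Windows forbids in file names ('/' is also invalid on Linux/macOS),
# plus line breaks and tabs. This set is an assumption chosen for illustration.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(company_name, title, suffix='.txt'):
    """Hypothetical helper: join company and title, replacing unsafe characters with '_'."""
    raw = f'{company_name}-{title}'
    cleaned = _INVALID_CHARS.sub('_', raw).strip(' .')
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or 'untitled') + suffix
```

In the loop, `txt_name = safe_filename(company_name, title)` could then stand in for the plain string concatenation.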
💥Scraped Data Preview

[Image: screenshot of the scraped data]
