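Scraping rental listings with Python
How to scrape rental listings with Python and show them on a map. I am a Python beginner, so please go easy on the rough edges.
The scraper uses the fairly simple urllib.parse and requests (plus BeautifulSoup for parsing), and the scraped data is then displayed on a map. Without further ado, here is the code:
1. Install a Python environment and an editor (search online for instructions).
2. As an example, I scrape the 58 brand apartments section (58品牌公寓) for apartments in the Hangzhou area priced between 2000 and 4000.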
|
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
import csv
import time
|
These are the modules we need to import.
|
url = "http://hz.58.com/pinpaigongyu/pn/{page}/?minprice=2000_4000"
# Number of pages already fetched; starts at 0
page = 0
|
These are the global variables.
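To make the role of the `{page}` placeholder concrete, here is a minimal sketch that fills it in with `str.format`. The helper name `page_url` is my own and exists only for this illustration.

```python
# Same URL template as above; {page} is filled in for each request.
url = "http://hz.58.com/pinpaigongyu/pn/{page}/?minprice=2000_4000"


def page_url(page):
    # page_url is a hypothetical helper, not part of the scraper itself.
    return url.format(page=page)


# Print the URLs for the first three result pages.
for page in range(1, 4):
    print(page_url(page))
```

The scraper does the same substitution inline with `url.format(page=page)` each time it moves to the next page.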
|
csv_file = open(r"c:\users\****\desktop\houosenew.csv", "a+", newline='')
csv_writer = csv.writer(csv_file, delimiter=',')
|
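Pick a location of your own for saving the scraped data. I save it as CSV because that is easy to edit. ("a+" opens the file in append mode, so repeated runs add new rows at the end; I recommend not using "wb". newline='' prevents blank lines between rows.)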
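The effect of "a+" and newline='' can be checked with a small sketch. It writes to a temporary file rather than the real path, and the helper `append_row` and the sample rows are made up for the demonstration.

```python
import csv
import os
import tempfile

# Hypothetical file in a temp directory, standing in for the real CSV path.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")


def append_row(path, row):
    # "a+" appends at the end of the file; newline='' lets the csv module
    # control line endings, which avoids blank lines between rows on Windows.
    with open(path, "a+", newline='', encoding="utf-8") as f:
        csv.writer(f, delimiter=',').writerow(row)


append_row(path, ["title 1", "location 1", "2500"])
append_row(path, ["title 2", "location 2", "3200"])  # a second "run" adds a row

# Read the file back to confirm both rows are there, with no blank rows.
with open(path, newline='', encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Because the file is opened in append mode, running the scraper twice against the same file will also append duplicate rows, so delete or rename the file if you want a fresh result.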
|
while True:
    # To keep the site from blocking our IP, wait 5 seconds between requests.
    # By the time you finish a round of Honor of Kings, the scrape is about done.
    time.sleep(5)
    page += 1
    # Substitute the page number into the URL
    print(url.format(page=page) + "ok")
    response = requests.get(url.format(page=page))
    html = BeautifulSoup(response.text, "html.parser")
    # Find the li listing nodes in the HTML DOM
    house_list = html.select(".list > li")
    # Stop looping once no new listings can be read
    if not house_list:
        break
    for house in house_list:
        # Pull the fields we need from the listing's DOM nodes
        house_title = house.select("h2")[0].string
        house_url = urljoin(url, house.select("a")[0]["href"])
        house_pic = urljoin(url, house.select("img")[0]["lazy_src"])
        house_info_list = house_title.split()
        # If the first field is an apartment or youth-community name, use it
        # as the location; otherwise use the second field
        if "公寓" in house_info_list[0] or "青年社区" in house_info_list[0]:
            house_location = house_info_list[0]
        else:
            house_location = house_info_list[1]
        house_money = house.select(".money")[0].select("b")[0].string
        csv_writer.writerow([house_title, house_location, house_money, house_pic, house_url])
# Finally, don't forget to close the file stream
csv_file.close()
|
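The location rule inside the loop is easier to see in isolation. Below is a minimal sketch of that branch as a standalone function; the name `pick_location` and the two sample titles are made up for illustration and do not come from the site.

```python
def pick_location(title):
    # Hypothetical helper mirroring the branch in the loop above:
    # split the title on whitespace and choose the first or second field.
    fields = title.split()
    if "公寓" in fields[0] or "青年社区" in fields[0]:
        return fields[0]
    return fields[1]


# Made-up titles, for illustration only.
print(pick_location("某某公寓 2室1厅"))
print(pick_location("合租 西湖区 主卧"))
```

Note that a title with only one whitespace-separated field that matches neither keyword would raise an IndexError in the fallback branch, so real data may need a guard.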
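If the site does block your IP, one option is to keep a list of IP addresses and rotate through them in your requests (for example via proxies, along with varied HTTP headers); search online for the details.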
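One way to read the suggestion above is to cycle through a list of proxies and User-Agent headers. The sketch below only builds the keyword arguments and makes no network calls; the addresses and agent strings are placeholders, and `next_request_kwargs` is a hypothetical helper. `headers`, `proxies` and `timeout` are existing parameters of `requests.get`.

```python
import itertools

# Placeholder values; replace them with proxies and headers you may use.
USER_AGENTS = ["example-agent/1.0", "example-agent/2.0"]
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]

_agents = itertools.cycle(USER_AGENTS)
_proxies = itertools.cycle(PROXIES)


def next_request_kwargs():
    # Build keyword arguments intended for requests.get(url, **kwargs).
    # Each call moves on to the next proxy and User-Agent in the lists.
    proxy = next(_proxies)
    return {
        "headers": {"User-Agent": next(_agents)},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


first = next_request_kwargs()
second = next_request_kwargs()
third = next_request_kwargs()
print(first["proxies"]["http"], second["proxies"]["http"], third["proxies"]["http"])
```

In the scraper loop this would be used as `requests.get(url.format(page=page), **next_request_kwargs())`. Whether it helps depends on how the site detects scrapers, so treat it as a starting point.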
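Next, we write the HTML.
I only put together a simple page, so please excuse its roughness. It uses AMap (高德地图); the JS API details are on the AMap developer site.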
|
<body>
<div id="container"></div>
<div class="control-panel">
    <div class="control-entry">
        <label>Work location:</label>
        <div class="control-input">
            <input id="work-location" type="text">
        </div>
    </div>
    <div class="control-entry">
        <label>Commute mode:</label>
        <div class="control-input">
            <input type="radio" name="vehicle" value="subway,bus" onclick="takebus(this)" checked/> Bus + subway
            <input type="radio" name="vehicle" value="subway" onclick="takesubway(this)"/> Subway
            <input type="radio" name="vehicle" value="walk" onclick="takewalk(this)"/> Walk
            <input type="radio" name="vehicle" value="bike" onclick="takebike(this)"/> Bike
        </div>
    </div>
    <div class="control-entry">
        <label>Import listings file:</label>
        <div class="control-input">
            <input type="file" name="file" id="filecsv"/>
            <button style="margin-top: 10px;width: 50%;" onclick="changecsv()">Start</button>
        </div>
    </div>
</div>
<div id="transfer-panel"></div>
<script>
var map = new AMap.Map("container", {
    resizeEnable: true,
    zoomEnable: true,
    center: [120.1256856402492, 30.27289264553506],
    zoom: 12
});
// Add a scale control
var scale = new AMap.Scale();
map.addControl(scale);
// Transit arrival-range object
var arrivalrange = new AMap.ArrivalRange();
// Longitude, latitude, time (unused), commute mode (defaults to subway + bus)
var x, y, t, vehicle = "subway,bus";
// Work address and work marker
var workaddress, workmarker;
// Queue of listing markers
var rentmarkerarray = [];
// Queue of polygons holding the arrival-range results
var polygonarray = [];
// Route planning
var amaptransfer;
// Info window object
var infowindow = new AMap.InfoWindow({
    offset: new AMap.Pixel(0, -30)
});
// Address autocompletion
var auto = new AMap.Autocomplete({
    // Specify the input element by id
    input: "work-location"
});
// Add an event listener: call worklocationselected once a completed address is chosen
AMap.event.addListener(auto, "select", worklocationselected);

function takebus(radio) {
    vehicle = radio.value;
    loadworklocation()
}

function takesubway(radio) {
    vehicle = radio.value;
    loadworklocation()
}

function takewalk(radio) {
    vehicle = radio.value;
    loadworklocation()
}

function takebike(radio) {
    vehicle = radio.value;
    loadworklocation()
}

// Read the selected file
function changecsv() {
    $("#filecsv").csv2arr(function (res) {
        $.each(res, function (k, p) {
            if (res[k][1]) {
                // addmarkerbyaddress(address, price, picture to show)
                addmarkerbyaddress(res[k][1], res[k][2], res[k][3])
            }
        })
    });
}

function worklocationselected(e) {
    workaddress = e.poi.name;
    loadworklocation();
}

function loadworkmarker(x, y, locationname) {
    workmarker = new AMap.Marker({
        map: map,
        title: locationname,
        icon: 'http://webapi.amap.com/theme/v1.3/markers/n/mark_r.png',
        position: [x, y]
    });
}

function loadworkrange(x, y, t, color, v) {
    arrivalrange.search([x, y], t, function (status, result) {
        if (result.bounds) {
            for (var i = 0; i < result.bounds.length; i++) {
                // Create a new polygon object
                var polygon = new AMap.Polygon({
                    map: map,
                    fillColor: color,
                    fillOpacity: "0.4",
                    strokeColor: color,
                    strokeOpacity: "0.8",
                    strokeWeight: 1
                });
                // Set the polygon path to the arrival-range boundary
                polygon.setPath(result.bounds[i]);
                polygonarray.push(polygon);
            }
        }
    }, {
        policy: v
    });
}

function addmarkerbyaddress(address, money, imgurl) {
    var geocoder = new AMap.Geocoder({
        city: "杭州",
        radius:
|
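Before importing the CSV into the page, it can help to check which rows `changecsv()` would actually plot. The page reads columns 1, 2 and 3 of each row (location, price, picture URL) and skips rows whose location is empty. The Python sketch below mirrors that filter; `rows_to_plot` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io


def rows_to_plot(csv_text):
    # Mirrors the filter in changecsv(): keep rows whose column 1 (location)
    # is non-empty and return (location, price, picture URL).
    result = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > 3 and row[1]:
            result.append((row[1], row[2], row[3]))
    return result


# Made-up rows in the column order the scraper writes:
# title, location, price, picture URL, listing URL
sample = (
    "t1,loc1,2500,http://example.com/a.jpg,http://example.com/1\n"
    "t2,,3000,http://example.com/b.jpg,http://example.com/2\n"
)
print(rows_to_plot(sample))
```

The second sample row is dropped because its location column is empty, which is the same reason a listing would be missing from the map.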