In transbigdata, a grid is defined by the following parameters:
params=(lonStart,latStart,deltaLon,deltaLat,theta)
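Choosing suitable grid parameters matters, because the choice can strongly affect the final analysis results.
The right parameters depend on both the data and the purpose of the analysis. transbigdata provides three methods for optimizing them.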
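To make the role of the five parameters concrete, here is a minimal sketch of how a point could be assigned to a grid cell. It assumes that cell (0, 0) is centred on (lonStart, latStart) and that theta rotates the grid axes counter-clockwise in plain longitude/latitude space. The function name point_to_cell is hypothetical, and transbigdata's own conversion may differ in detail.

```python
import math

def point_to_cell(lon, lat, params):
    """Return the (col, row) index of the cell containing a point (illustrative only)."""
    lon_start, lat_start, delta_lon, delta_lat, theta = params
    t = math.radians(theta)
    dx, dy = lon - lon_start, lat - lat_start
    # Express the offset in the grid's own (rotated) axes.
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    # Assumption: cell (0, 0) is centred on the origin, hence the +0.5.
    return math.floor(u / delta_lon + 0.5), math.floor(v / delta_lat + 0.5)

# The same point, indexed with an unrotated grid and with a grid rotated by 90 degrees.
cell_unrotated = point_to_cell(113.012, 22.0, (113.0, 22.0, 0.01, 0.01, 0))
cell_rotated = point_to_cell(113.012, 22.0, (113.0, 22.0, 0.01, 0.01, 90))
```

The same point falls into a different cell once theta changes, which is why every grid-based statistic depends on these parameters.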
1 Preparation
1.1 Import libraries
import pandas as pd
import geopandas as gpd
import transbigdata as tbd
1.2 Read the data
1.2.1 Trajectory data
data=pd.read_csv('Downloads/TaxiData-Sample.csv',names= ['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus', 'Speed'])
data
1.2.2 Area data
area = gpd.read_file('Downloads/szarea1.json')
area
area.plot()
1.3 Keep only the records inside the study area
data=tbd.clean_outofshape(data,area)
data
1.4 Create the initial grid
grid,initialparams=tbd.area_to_grid(area)
initialparams
'''
{'slon': 113.87256817484639,
'slat': 22.55155183165019,
'deltalon': 0.004869410314514816,
'deltalat': 0.004496605206422906,
'theta': 0,
'method': 'rect',
'gridsize': 500}
'''
grid.plot()
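The deltalon and deltalat values in initialparams are the 500 m gridsize expressed in degrees. The sketch below shows one way to do that conversion on a spherical Earth. The radius of 6371004 m is an assumption, chosen because it reproduces the printed deltalat closely. The function name grid_steps and the reference latitude of 22.55 are also assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed spherical Earth radius in metres

def grid_steps(gridsize_m, ref_lat_deg):
    """Convert a cell size in metres to (delta_lon, delta_lat) in degrees."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    delta_lat = gridsize_m / metres_per_degree
    # A degree of longitude shrinks with the cosine of the latitude.
    delta_lon = gridsize_m / (metres_per_degree * math.cos(math.radians(ref_lat_deg)))
    return delta_lon, delta_lat

delta_lon, delta_lat = grid_steps(500, 22.55)
```

Because the longitude step depends on the reference latitude, delta_lon here will differ slightly from the printed deltalon unless the same latitude is used.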
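2 Optimization method 1: centerdist, minimizing the distance between grid centers and GPS data
- When a batch of closely spaced records lies along a grid edge, GPS positioning error can cause them to be matched to different grid cells.
- One remedy is therefore to minimize the distance between the grid centers and the GPS data.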
params_op=tbd.grid_params_optimize(data,
initialparams,
col=['VehicleNum','Lng','Lat'],
optmethod='centerdist',
sample=0, #not sampling
printlog=True)
'''
Optimized index centerdist: 167.56608905526596
Optimized gridding params: {'slon': 113.87374968010685, 'slat': 22.553664777307173, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 44.131419745260644, 'method': 'rect'}
'''
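The centerdist criterion can be illustrated on a toy example. The sketch below assumes an unrotated grid (theta = 0), measures distances in degrees rather than metres, and uses a hypothetical helper named mean_center_distance. It is not the library's implementation.

```python
import math

def mean_center_distance(points, lon0, lat0, dlon, dlat):
    """Mean distance (in degrees) between each point and the centre of its cell."""
    total = 0.0
    for lon, lat in points:
        # Assumption: cell (0, 0) is centred on (lon0, lat0).
        col = math.floor((lon - lon0) / dlon + 0.5)
        row = math.floor((lat - lat0) / dlat + 0.5)
        center_lon = lon0 + col * dlon
        center_lat = lat0 + row * dlat
        total += math.hypot(lon - center_lon, lat - center_lat)
    return total / len(points)

# Three nearby points that straddle a cell edge when the origin is at 113.0.
points = [(113.0049, 22.0), (113.0051, 22.0), (113.0052, 22.0001)]
on_edge = mean_center_distance(points, 113.0, 22.0, 0.01, 0.01)
shifted = mean_center_distance(points, 113.005, 22.0, 0.01, 0.01)
```

Shifting the origin by half a cell moves the cluster from a cell edge to a cell centre, so the mean distance drops. The optimizer searches for the origin and rotation that make this kind of index small.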
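3 Optimization method 2: gini, maximizing the Gini index
A higher Gini index indicates that the data are distributed more unevenly across the grid, that is, concentrated in fewer cells.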
params_op=tbd.grid_params_optimize(data,
initialparams,
col=['VehicleNum','Lng','Lat'],
optmethod='gini',
sample=0, #not sampling
printlog=True)
'''
Optimized index gini: -0.07232170907948476
Optimized gridding params: {'slon': 113.87460338641485, 'slat': 22.554558793623986, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 45.108548092477754, 'method': 'rect'}
'''
In the plot of the resulting grid, the diagonal strip at the far upper left contains 4 cells here, compared with 3 under centerdist.
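For reference, a Gini index over per-cell record counts can be computed as in the following sketch. It uses the standard rank-based formula. Whether empty cells are included in the counts is an assumption that changes the value, and the library may treat this differently.

```python
from collections import Counter

def gini(counts):
    """Gini index of a list of non-negative counts (0 = perfectly even)."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

even = gini([1, 1, 1, 1])          # records spread evenly over four cells
concentrated = gini([0, 0, 0, 4])  # all records in one of four cells

# Counting records per cell from (col, row) assignments; empty cells are ignored here.
cells = [(0, 0), (0, 0), (0, 0), (1, 0)]
from_cells = gini(Counter(cells).values())
```

The more the records pile up in a few cells, the closer the index gets to its maximum, which is the situation this optimization method favours.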
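4 Optimization method 3: gridscount, minimizing the average number of grid cells per individual
Each individual (here, each vehicle identified by VehicleNum) should appear in as few grid cells as possible.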
params_op=tbd.grid_params_optimize(data,
initialparams,
col=['VehicleNum','Lng','Lat'],
optmethod='gridscount',
sample=0, #not sampling
printlog=True)
'''
Optimized index gridscount: 9.0
Optimized gridding params: {'slon': 113.87506228430335, 'slat': 22.55319001399235, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 40.90195581089501, 'method': 'rect'}
'''
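The gridscount idea can be sketched as follows. The example assumes an unrotated grid and averages the number of distinct cells per individual, following the heading. The helper name mean_cells_per_individual is hypothetical, and the exact statistic the library reports is not confirmed here.

```python
import math
from collections import defaultdict

def mean_cells_per_individual(records, lon0, lat0, dlon, dlat):
    """Average number of distinct cells visited per individual."""
    visited = defaultdict(set)
    for uid, lon, lat in records:
        # Assumption: cell (0, 0) is centred on (lon0, lat0).
        col = math.floor((lon - lon0) / dlon + 0.5)
        row = math.floor((lat - lat0) / dlat + 0.5)
        visited[uid].add((col, row))
    return sum(len(cells) for cells in visited.values()) / len(visited)

# Vehicle A stays near one spot that happens to straddle a cell edge.
records = [("A", 113.0049, 22.0), ("A", 113.0051, 22.0), ("B", 113.0200, 22.0)]
before = mean_cells_per_individual(records, 113.0, 22.0, 0.01, 0.01)
after = mean_cells_per_individual(records, 113.005, 22.0, 0.01, 0.01)
```

With the original origin, vehicle A is split across two cells although it barely moves. After shifting the origin by half a cell, each vehicle occupies a single cell and the index falls from 1.5 to 1.0.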
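5 The same applies to tri and hexa grids
The optimization also works for triangular and hexagonal grids once initialparams['method'] is set to 'tri' or 'hexa'. With the 'tri' method and optmethod='gridscount', the output was:
'''
Optimized index gridscount: 13.0
Optimized gridding params: {'slon': 113.87524021241177, 'slat': 22.55350036557914, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 23.722083498424578, 'method': 'tri'}
'''
For hexagonal grids: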
initialparams['method']='hexa'
params_op=tbd.grid_params_optimize(data,
initialparams,
col=['VehicleNum','Lng','Lat'],
optmethod='gridscount',
sample=0, #not sampling
printlog=True)