Problem
You are given two DataFrames. One contains address information; the other maps cities to their states.
Example:
df_addresses
address |
---|
4860 Sunset Boulevard, San Francisco, 94105 |
3055 Paradise Lane, Salt Lake City, 84103 |
682 Main Street, Detroit, 48204 |
9001 Cascade Road, Kansas City, 64102 |
5853 Leon Street, Tampa, 33605 |
df_cities
city | state |
---|---|
Salt Lake City | Utah |
Kansas City | Missouri |
Detroit | Michigan |
Tampa | Florida |
San Francisco | California |
Write a function complete_address that creates a single DataFrame containing the complete address in the format street, city, state, zip code.
Input:
import pandas as pd
addresses = {"address": ["4860 Sunset Boulevard, San Francisco, 94105", "3055 Paradise Lane, Salt Lake City, 84103", "682 Main Street, Detroit, 48204", "9001 Cascade Road, Kansas City, 64102", "5853 Leon Street, Tampa, 33605"]}
cities = {"city": ["Salt Lake City", "Kansas City", "Detroit", "Tampa", "San Francisco"], "state": ["Utah", "Missouri", "Michigan", "Florida", "California"]}
df_addresses = pd.DataFrame(addresses)
df_cities = pd.DataFrame(cities)
Output:
def complete_address(df_addresses,df_cities) ->
address |
---|
4860 Sunset Boulevard, San Francisco, California, 94105 |
3055 Paradise Lane, Salt Lake City, Utah, 84103 |
682 Main Street, Detroit, Michigan, 48204 |
9001 Cascade Road, Kansas City, Missouri, 64102 |
5853 Leon Street, Tampa, Florida, 33605 |
Solution
Solution code
def complete_address(df_addresses, df_cities):
    # Work on a copy so the caller's DataFrame is not modified
    df_addresses = df_addresses.copy()
    # Split address column into street, city, and zip code
    df_addresses[['street', 'city', 'zip_code']] = df_addresses['address'].str.split(', ', expand=True)
    # Merge with df_cities to get state information
    df_complete = pd.merge(df_addresses, df_cities, on='city', how='left')
    # Rearrange columns and drop unnecessary columns (copy so later assignments do not act on a slice)
    df_complete = df_complete[['street', 'city', 'state', 'zip_code']].copy()
    # Concatenate state with the city in the city column
    df_complete['city'] = df_complete['city'] + ', ' + df_complete['state']
    # Drop the state column as it's redundant
    df_complete.drop('state', axis=1, inplace=True)
    # Concatenate street, city, and zip code to get complete address
    df_complete['address'] = df_complete['street'] + ', ' + df_complete['city'] + ', ' + df_complete['zip_code']
    # Drop intermediate columns
    df_complete.drop(['street', 'city', 'zip_code'], axis=1, inplace=True)
    return df_complete
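The intermediate result of the split and merge steps can be inspected with a short sketch like the one below. It uses a two-row subset of the sample data, and the names parts and merged are illustrative only; they do not appear in the solution above.

```python
import pandas as pd

# Two rows taken from the sample data in the problem statement
df_addresses = pd.DataFrame({"address": [
    "682 Main Street, Detroit, 48204",
    "5853 Leon Street, Tampa, 33605",
]})
df_cities = pd.DataFrame({"city": ["Tampa", "Detroit"],
                          "state": ["Florida", "Michigan"]})

# str.split with expand=True returns one column per comma-separated part
parts = df_addresses["address"].str.split(", ", expand=True)
parts.columns = ["street", "city", "zip_code"]  # illustrative column names

# A left merge keeps every address row in its original order
# and adds the matching state from df_cities
merged = pd.merge(parts, df_cities, on="city", how="left")

print(merged)
```

The merged frame has the columns street, city, zip_code, and state, which is why the solution reorders them before building the final string.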
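Code explanation
The code uses several common pandas features:
- String operations: str.split() with expand=True splits the address column into street, city, and zip code; the parts are joined back together with the + operator on string columns.
- Data merging: pd.merge() combines the two DataFrames to look up the state for each city.
- Column operations: df[column_name] selects a single column and df[[column_list]] selects several; df.drop() removes columns that are no longer needed.
- Column reordering: selecting the columns in a given list order arranges them as street, city, state, zip code.
- Cleanup: the intermediate columns and the redundant state column are dropped.
Together, these operations let the function take the address data and the city-to-state data and return a DataFrame with the complete addresses.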
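One case the sample data does not exercise is a city that is missing from df_cities. The sketch below, which is an illustration rather than part of the solution, shows that a left merge leaves the state as NaN for such a row and that string concatenation then makes the whole address NaN. The "Springfield" row is made up for this example, and indicator=True is used here only to list the unmatched cities.

```python
import pandas as pd

# Already-split address parts; the second row is a hypothetical city
# that has no entry in df_cities
parts = pd.DataFrame({
    "street": ["682 Main Street", "1 Example Road"],
    "city": ["Detroit", "Springfield"],
    "zip_code": ["48204", "00000"],
})
df_cities = pd.DataFrame({"city": ["Detroit"], "state": ["Michigan"]})

# indicator=True adds a "_merge" column saying where each row was found
merged = pd.merge(parts, df_cities, on="city", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "city"].tolist()

# Concatenating with a missing state yields a missing address for that row
address = (merged["street"] + ", " + merged["city"] + ", "
           + merged["state"] + ", " + merged["zip_code"])

print(unmatched)
print(address)
```

Whether such rows should be dropped, filled with a placeholder, or reported as an error depends on the requirements; the problem statement does not say.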