由于借助 AI 工具学习编程已经变得非常容易了,因此之后的课程就不再默认进行视频讲解了,如果特别需要视频讲解也可以联系李老师预约讲解~讲义材料学习过程中遇到的问题也可以及时与李老师联系。
在之前「使用 ggplot2 绘制地图——以中国地图为例」课程的基础上,我们今天再来学习下如何使用 Python 的 matplotlib 把邻国的区域也添加到中国地图上:
使用 reticulate 创建与管理 Python 虚拟环境
在 R 中通过 reticulate 包来调用 Python,最好的实践是为项目创建一个专属的 Python 虚拟环境,将所需依赖隔离到独立空间,避免与系统 Python(如 Anaconda)发生版本冲突。
重要说明(避免"已初始化"报错):reticulate 在 R 会话中只能绑定一次 Python——一旦某个 {python} 代码块运行,Python 解释器就被锁定,之后再调用 use_virtualenv() 会报错:
ERROR: The requested version of Python cannot be used, as another version has already been initialized.因此,虚拟环境的激活必须在所有 {python} 代码块之前完成。本文档的解决方案是在 setup chunk 中通过 Sys.setenv(RETICULATE_PYTHON = ...) 提前锁定 Python 路径,这是 reticulate 选取 Python 的最高优先级入口。
安装 reticulate(仅首次)
options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))if (!requireNamespace("reticulate", quietly = TRUE)) { install.packages("reticulate") message("reticulate 安装完成!") message("reticulate 已安装,版本:", packageVersion("reticulate"))虚拟环境初始化原理(已在 setup chunk 中完成)
本文档的 setup chunk(隐藏运行)包含如下逻辑:
.venv_python <- virtualenv_python(.venv_name)if(!file.exists(.venv_python)){ virtualenv_create(.venv_name) .venv_python <- virtualenv_python(.venv_name)Sys.setenv(RETICULATE_PYTHON = .venv_python)use_virtualenv(.venv_name, required =TRUE)这样做的关键在于:knitr 在处理第一个 {python} chunk 时,reticulate 已经通过 RETICULATE_PYTHON 环境变量知道要使用 .venv,不会再去碰 Anaconda。
在虚拟环境中安装 Python 包(仅首次)
"numpy", "pandas", "geopandas", "matplotlib",installed <- py_list_packages(".venv")$packageneed_install <- setdiff(py_pkgs, installed)if (length(need_install) > 0) { virtualenv_install(".venv", packages = need_install) message("已安装缺失的包:", paste(need_install, collapse = ", ")) message("所有 Python 包已就绪,无需安装")验证激活状态
查看已安装的包
pkgs <- py_list_packages(".venv")key_pkgs <- c("numpy", "pandas", "geopandas", "matplotlib", "shapely", "openpyxl")pkgs[pkgs$package %in% key_pkgs, c("package", "version")]虚拟环境管理常用命令
# virtualenv_remove(".venv") # 删除虚拟环境# virtualenv_install(".venv", packages = "geopandas", ignore_installed = TRUE) # 升级某个包
小地图版本
1. 在地图上添加散点
首先加载所需 Python 包,设置字体和坐标系:
import matplotlib.pyplot as pltimport matplotlib.colors as mcolorsimport matplotlib.font_manager as fmfrom matplotlib.patches import Rectanglefrom shapely.geometry import box, Pointwarnings.filterwarnings("ignore")font_path = "LXGWWenKai-Regular.ttf"font_prop = fm.FontProperties(fname=font_path)fm.fontManager.addfont(font_path)plt.rcParams["font.family"] = font_prop.get_name()plt.rcParams["axes.unicode_minus"] = False"+proj=aea +lat_0=0 +lon_0=105 +lat_1=25 +lat_2=47 ""+x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"然后读取地图数据和散点数据:
plot_bbox = box(-2725586, 1800655, 2982768, 6000000)plot_bbox_gdf = gpd.GeoDataFrame(geometry=[plot_bbox], crs=MYCRS)citymap = gpd.read_file("chinacity2019mini/chinacity2019mini.shp")citymap = citymap[citymap["省代码"].notna()].copy()citymap = citymap.to_crs(MYCRS)citylinemap_all = gpd.read_file("chinacity2019mini/chinacity2019mini_line.shp")citylinemap_all = citylinemap_all.to_crs(MYCRS)citylinemap = citylinemap_all[ citylinemap_all["class"].isin(["九段线", "海岸线", "小地图框格"])在之前课程的基础上,我们还需要读取一份邻国地图数据:
china_neighboring = gpd.read_file("china_neighboring/china_neighboring.shp")china_neighboring = china_neighboring.to_crs(MYCRS)读取企业散点数据,并将经纬度坐标转换为投影坐标:
pointdf = pd.read_excel("瞪羚、独角兽、创新型企业经纬度数据.xlsx")pointdf = pointdf[pointdf["经度"].notna()].copy()print(f"数据量:{len(pointdf)} 条")# 转换成 GeoDataFrame(WGS84 → 投影坐标)pointdfsf = gpd.GeoDataFrame( geometry=gpd.points_from_xy(pointdf["经度"], pointdf["纬度"]),pointdfsf = pointdfsf.to_crs(MYCRS)接下来处理小地图框格内的散点——先提取小地图范围内的点,然后将其缩放并平移到小地图框格的位置:
small_bbox = box(120000, 320000, 1766004.1, 2557786.0)pointdfsf_small = pointdfsf[pointdfsf.geometry.within(small_bbox)].copy()print(f"小地图内散点数:{len(pointdfsf_small)}")# 缩放 0.5 倍 + 平移(等价于 R 中 geometry * 0.5 + c(2100000, 1665139))for geom in pointdfsf_small.geometry: new_geoms.append(Point(geom.x * 0.5 + 2100000, geom.y * 0.5 + 1665139))pointdfsf_small = pointdfsf_small.copy()pointdfsf_small = pointdfsf_small.set_geometry(new_geoms)pointdfsf_small = pointdfsf_small.set_crs(MYCRS)pointdfsfall = pd.concat([pointdfsf, pointdfsf_small], ignore_index=True)pointdfsfall = gpd.GeoDataFrame(pointdfsfall, geometry="geometry", crs=MYCRS)设置省份颜色表(对应 R 中 ggsci::default_igv)和线条样式:
# 省份颜色表(32 色,对应 ggsci::default_igv)province_list = sorted(pointdfsfall["省"].dropna().unique())color_map = {prov: IGV_COLORS[i % len(IGV_COLORS)] for i, prov inenumerate(province_list)}pointdfsfall["color"] = pointdfsfall["省"].map(color_map)LINE_WIDTH = {"九段线": 0.6, "海岸线": 0.3, "小地图框格": 0.3}然后就可以绘制散点地图了:
defadd_scale_bar(ax, ref_gdf, location="bl"): bar_x = xlim[0] + ax_w * 0.03 bar_y = ylim[0] + ax_h * 0.05 fc = "black"if i % 2 == 0else"white" (bar_x + i * seg_w, bar_y), seg_w, bar_h, facecolor=fc, edgecolor="black", linewidth=0.6, ax.text(bar_x, bar_y - bar_h * 0.5, "0", fontsize=9, ha="center", va="top", fontproperties=font_prop, zorder=11) ax.text(bar_x + bar_w, bar_y - bar_h * 0.5, f"{bar_km} km", fontsize=9, ha="center", va="top", fontproperties=font_prop, zorder=11) x_center = xlim[1] - ax_w * 0.05 y_base = ylim[1] - ax_h * 0.08"", xy=(x_center, y_base + arrow_len), xytext=(x_center, y_base), arrowprops=dict(arrowstyle="-|>", color="black", lw=2), zorder=11 ax.text(x_center, y_base + arrow_len + ax_h * 0.015, "N", fontsize=12, ha="center", va="bottom", fontweight="bold", fontproperties=font_prop, zorder=11)2. 填充地图
下面我们再演示下填充地图的绘制。
首先统计每个城市的公司数量,并与地图数据合并:
citydf = pointdf.groupby(["市", "市代码"]).size().reset_index(name="n")citymap2 = citymap.merge(citydf, on=["市", "市代码"], how="left")citymap2["n"] = citymap2["n"].fillna(0).astype(int)print(citymap2["n"].describe())然后绘制连续填充地图:
n = n + 1 是为了对数化之后 0 变成 -Inf,在图上会反映为缺失值。
也可以使用分段填色:
cutlist_raw = np.quantile(citymap2["n"].values, np.arange(1, 11) / 10)cutlist = sorted(set(cutlist_raw.astype(int).tolist()))labels_c = ["<= 1", "1~3", "3~9", "9~18", "19~39", "39~92", ">= 92"]citymap3 = citymap2.copy()citymap3["group"] = pd.cut( citymap3["n"], bins=cutlist, labels=labels_c[: len(cutlist) - 1],print(citymap3["group"].value_counts().sort_index())# 离散色阶(使用 inferno_r 的 0~0.95 范围)acton_colors = [CMAP_ACTON(i / (n_cats - 1) * 0.95) for i inrange(n_cats)]cat_color_map = dict(zip(labels_c, acton_colors))fig, ax = plt.subplots(figsize=(11.5, 8.5))plot_bbox_gdf.plot(ax=ax, color="#BBD1EB", zorder=1) subset = citymap3[citymap3["group"] == label] ax=ax, facecolor=cat_color_map[label], edgecolor="gray", linewidth=0.01, zorder=2citymap3[citymap3["group"].isna()].plot( ax=ax, facecolor="white", edgecolor="gray", linewidth=0.01, zorder=2china_neighboring.plot(ax=ax, facecolor="#EDEDED", edgecolor="none", zorder=3)for _, row in china_neighboring.iterrows(): centroid = row.geometry.centroid centroid.x, centroid.y, row["country_cn"], fontsize=8, color="gray", ha="center", va="center", fontproperties=font_prop, zorder=6for cls in ["九段线", "海岸线", "小地图框格"]: subset = citylinemap[citylinemap["class"] == cls] subset.plot(ax=ax, color=LINE_COLOR[cls], linewidth=LINE_WIDTH[cls], zorder=4) pointdfsfall.geometry.x, pointdfsfall.geometry.y, c=pointdfsfall["color"], s=0.3, linewidths=0, zorder=5ax.set_xlim(plot_bbox_gdf.total_bounds[0], plot_bbox_gdf.total_bounds[2])ax.set_ylim(plot_bbox_gdf.total_bounds[1], plot_bbox_gdf.total_bounds[3])# 手动绘制分类图例(左下角,两列排列,从上到下、从左到右)legend_x = xlim[0] + ax_w * 0.03legend_y = ylim[0] + ax_h * 0.10n_rows_per_col = (len(labels_c) + n_cols - 1) // n_cols
长版地图
长版地图的绘制更简单一些,这里演示省级分段填充地图的绘制。
首先读取长版省级地图、线条和邻国数据:
long_bbox = box(-2925762.0, 377031.1, 2507277.8, 6221888.6)long_bbox_gdf = gpd.GeoDataFrame(geometry=[long_bbox], crs=MYCRS)provmap = gpd.read_file("chinaprov2021long/chinaprov2021long.shp")provmap = provmap[provmap["省代码"].notna()].copy()provmap = provmap.to_crs(MYCRS)provlinemap_all = gpd.read_file("chinaprov2021long/chinaprov2021long_line.shp")provlinemap_all = provlinemap_all.to_crs(MYCRS)provlinemap = provlinemap_all[ provlinemap_all["class"].isin(["九段线", "海岸线"])china_neighboring_long = gpd.read_file("china_neighboring_long/china_neighboring_long.shp")china_neighboring_long = china_neighboring_long.to_crs(MYCRS)读取散点数据并汇总统计每个省份的企业数量:
pointdf = pd.read_excel("瞪羚、独角兽、创新型企业经纬度数据.xlsx")pointdf = pointdf[pointdf["经度"].notna()].copy()pointdfsf = gpd.GeoDataFrame( geometry=gpd.points_from_xy(pointdf["经度"], pointdf["纬度"]),pointdfsf = pointdfsf.to_crs(MYCRS)province_list_long = sorted(pointdfsf["省"].dropna().unique())color_map_long = {prov: IGV_COLORS[i % len(IGV_COLORS)] for i, prov inenumerate(province_list_long)}pointdfsf["color"] = pointdfsf["省"].map(color_map_long)provdf = pointdfsf.drop(columns="geometry").groupby(["省", "省代码"]).size().reset_index(name="n")provdf2 = provmap.merge(provdf, on=["省", "省代码"], how="left")provdf2["n"] = provdf2["n"].fillna(0).astype(int)按分位数分组:
cutlist_raw = np.quantile(provdf2["n"].values, np.arange(1, 9) / 8)cutlist = [0] + sorted(set(cutlist_raw.astype(int).tolist()))labels_p = ["<= 8", "8~28", "28~58", "58~159","159~513", "513~870", "870~2459", "> 2450"]provdf3["group"] = pd.cut( provdf3["n"], bins=cutlist, labels=labels_p[: len(cutlist) - 1],print(provdf3["group"].value_counts().sort_index())然后绘制长版省级填充地图:
LINE_WIDTH_LONG = {"九段线": 0.6, "海岸线": 0.3}acton_colors_long = [CMAP_ACTON(i / (n_cats - 1) * 0.95) for i inrange(n_cats)]cat_color_map_long = dict(zip(labels_p, acton_colors_long))fig, ax = plt.subplots(figsize=(8, 9))long_bbox_gdf.plot(ax=ax, color="#BBD1EB", zorder=1) subset = provdf3[provdf3["group"] == label] ax=ax, facecolor=cat_color_map_long[label], edgecolor="gray", linewidth=0.01, zorder=2provdf3[provdf3["group"].isna()].plot( ax=ax, facecolor="white", edgecolor="gray", linewidth=0.01, zorder=2china_neighboring_long.plot(ax=ax, facecolor="#EDEDED", edgecolor="none", zorder=3)for _, row in china_neighboring_long.iterrows(): centroid = row.geometry.centroid centroid.x, centroid.y, row["country_cn"], fontsize=7, color="gray", ha="center", va="center", fontproperties=font_prop, zorder=6for cls in ["九段线", "海岸线"]: subset = provlinemap[provlinemap["class"] == cls] subset.plot(ax=ax, color=LINE_COLOR_LONG[cls], linewidth=LINE_WIDTH_LONG[cls], zorder=4) pointdfsf.geometry.x, pointdfsf.geometry.y, c=pointdfsf["color"], s=0.3, linewidths=0, zorder=5ax.set_xlim(long_bbox_gdf.total_bounds[0], long_bbox_gdf.total_bounds[2])ax.set_ylim(long_bbox_gdf.total_bounds[1], long_bbox_gdf.total_bounds[3])# 分类图例(左侧,下移到 0.12 位置,从上到下排列)
关键函数对照表
| Python(matplotlib + geopandas) | |
|---|
st_bbox() %>% st_as_sfc() | shapely.geometry.box() | |
read_sf() | gpd.read_file() | |
st_as_sf(coords, crs) | gpd.points_from_xy() + GeoDataFrame(crs) | |
st_transform(crs) | gdf.to_crs(crs) | |
st_intersection() | gdf.geometry.within() | |
bind_rows() | pd.concat() | |
geom_sf() | gdf.plot() | |
annotation_scale() | | |
annotation_north_arrow() | ax.annotate(arrowprops) | |
scale_fill_scico() | plt.cm.get_cmap() | |
scale_fill_scico_d() | pd.cut() | |
ggsave() | plt.savefig() | |
如何参加课程?
是不是感觉很硬核!欢迎报名 RStata 培训班获取全部课程和以会员价获取数据资料(10元/份)详情可阅读这篇推文:数据处理、图表绘制、效率分析与计量经济学如何学习~
详情可点击阅读原文进入 RStata 学院了解(从首页的会员卡专区即可查看和购买会员卡)。
更多关于 RStata 培训班的信息可添加微信号 r_stata2 咨询:

附件下载(点击文末的阅读原文即可跳转):
https://rstata.duanshu.com/#/brief/course/1295d878ae3442e7b38c67ce1c0be493






