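Violin plots show the shape of a distribution's probability density (a kernel density estimate, KDE) together with its key quantiles. This makes them good at revealing complex features such as multimodality and skewness, and they are used mainly in exploratory data analysis. Box plots focus on the median, the interquartile range (IQR) and outliers, and emphasize robust comparison of summary statistics. They suit small samples, or cases where differences between several groups must be shown compactly. The core difference is how much each one shows: a box plot conveys only the location and spread of a distribution, whereas a violin plot also reveals its shape. In practice, it helps to start with violin plots to spot latent structure in the data, then use box plots to compare summary statistics between groups. Seaborn's violinplot draws a miniature box plot inside each violin by default, so a single figure can do both.

The original figure comes from a paper published in Nature Communications: Genome-scale community modelling reveals conserved metabolic cross-feedings in epipelagic bacterioplankton communities. The complete Python workflow for drawing this kind of figure is shown below for reference.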
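A small numerical sketch shows why the density shape matters. It uses only numpy and two deterministic, hypothetical samples. The first is unimodal: evenly spaced quantiles of a logistic distribution. The second is bimodal: the same quantiles shifted to -6 and +6. A Gaussian KDE with a Scott's-rule bandwidth finds one and two modes respectively. Scott's rule is also seaborn's default bandwidth for violin outlines. Meanwhile, the box-plot median of the bimodal sample sits at 0, in the trough where the estimated density is only a small fraction of the peak. The function names, the sample construction and the 10% peak-height threshold for counting a mode are illustrative assumptions, not part of the original workflow.

```python
import numpy as np


def scott_kde(values, grid):
    """Gaussian KDE evaluated on `grid`; bandwidth from Scott's rule."""
    values = np.asarray(values, dtype=float)
    n = values.size
    h = values.std(ddof=1) * n ** (-1 / 5)  # Scott's rule for 1-D data
    z = (grid[:, None] - values[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


def count_modes(values, rel_height=0.1, grid_size=513):
    """Count KDE peaks at least `rel_height` times as tall as the highest one."""
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_size)
    d = scott_kde(values, grid)
    inner = d[1:-1]
    # Local maximum: not lower than the left neighbour, higher than the right one
    is_peak = (inner >= d[:-2]) & (inner > d[2:])
    return int(np.count_nonzero(is_peak & (inner >= rel_height * d.max())))


def box_summary(values):
    """The numbers a box plot is built from (whiskers/outliers omitted)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


def median_density_ratio(values, grid_size=513):
    """KDE density at the median divided by the highest density on the grid."""
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_size)
    peak = scott_kde(values, grid).max()
    at_median = scott_kde(values, np.array([np.median(values)]))[0]
    return float(at_median / peak)


def logistic_quantiles(n=999):
    """Hypothetical deterministic 'sample': evenly spaced logistic quantiles."""
    p = np.linspace(0.001, 0.999, n)
    return np.log(p / (1 - p))


unimodal = logistic_quantiles()
bimodal = np.concatenate([unimodal - 6, unimodal + 6])

for name, x in [("unimodal", unimodal), ("bimodal", bimodal)]:
    s = box_summary(x)
    print(f"{name}: modes={count_modes(x)}, median={s['median']:.3f}, "
          f"IQR={s['iqr']:.2f}, density at median / peak={median_density_ratio(x):.2f}")
```

Both samples have a median of essentially 0, so both boxes are drawn around the same centre line. Only the density reveals that the bimodal sample has few observations near that value.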
Figure from the original paper: Fig. 3


Python code
```python
import itertools
import os

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from statannotations.Annotator import Annotator

# Output directory for the saved figures
funvsphyl_dir = "./Fig/"

# %matplotlib inline  # uncomment when running in a Jupyter notebook
mpl.rcParams["figure.dpi"] = 300
mpl.rcParams["savefig.dpi"] = 300
plt.rcParams["svg.fonttype"] = "none"  # keep text editable in SVG output

# "Cell"-style colors (ColorBrewer Set1 blue / red / green)
CELL_COLORS = {
    "negative assoc.": "#377EB8",  # blue
    "unlinked": "#E41A1C",         # red
    "positive assoc.": "#4DAF4A",  # green
}

df = pd.read_csv("./dataset.csv")

new_categories = ("unlinked", "positive assoc.", "negative assoc.")
# Plot order from top to bottom: negative, unlinked, positive
order = [new_categories[2], new_categories[0], new_categories[1]]
# One color per category, listed in the same order as `order`
palette_colors = [CELL_COLORS[cat] for cat in order]

# Long format: one row per (genome pair, metric); each metric becomes a facet row
long_df = (
    df.set_index(["genome_a", "genome_b", "category"])
    .drop(columns="edge")
    .melt(ignore_index=False)
    .reset_index()
)

g_r = sns.catplot(
    data=long_df,
    kind="violin",
    row="variable",
    x="value",
    y="category",
    palette=palette_colors,  # Cell-style colors
    order=order,
    height=4.5 / 2,
    aspect=2,
    sharex=False,
    cut=0,  # do not extend the density beyond the observed data range
)

graphtype = "dRep95_MHQ"  # tag used in the output file names

# Significance annotations (Mann-Whitney U test) for every pair of categories
pairs = list(itertools.combinations(order, 2))
for i, ax in enumerate(g_r.axes.flatten()):
    # Facet titles read "variable = <metric>"; keep only the metric name
    metric = ax.get_title().split("= ", 1)[1]
    ax.set_title(metric, fontsize=10)
    ax.set_ylabel(None)
    print(metric)

    # Data/layout arguments statannotations uses to locate each group
    annot_args = dict(
        data=df,
        x=metric,
        y="category",
        palette=palette_colors,
        order=order,
    )
    annotator = Annotator(ax=ax, pairs=pairs, orient="h", **annot_args)
    annotator.configure(
        test="Mann-Whitney",
        text_format="star",
        loc="inside",
        verbose=1,
        comparisons_correction="Bonferroni",
    )
    annotator.apply_and_annotate()

    if i == 0:
        ax.set_xlim(0, 6)  # fixed x-range for the first panel

# Save the figure as SVG, PDF and PNG, one sub-folder per format
for ext in ("svg", "pdf", "png"):
    os.makedirs(os.path.join(funvsphyl_dir, ext), exist_ok=True)
    g_r.figure.savefig(
        os.path.join(
            funvsphyl_dir,
            ext,
            f"dRep95_MHQ_fundist-vs-phyldist-tests_{graphtype}.{ext}",
        ),
        bbox_inches="tight",
        dpi=300,
    )

plt.show()
```
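The post does not include ./dataset.csv. The script implies its layout: columns genome_a, genome_b, category and edge, plus one numeric column per metric, each of which becomes a facet row. The category labels must match the keys of CELL_COLORS. The sketch below builds a hypothetical stand-in with that layout so that the reshape and the plot can be tried. The metric names metric_a and metric_b, the identifiers and the random values are all invented for illustration and carry no biological meaning.

```python
import numpy as np
import pandas as pd

# Category labels must match the keys of CELL_COLORS in the plotting script
CATEGORIES = ["negative assoc.", "unlinked", "positive assoc."]


def make_synthetic_pairs(n_per_category=50, seed=0):
    """Build a HYPOTHETICAL stand-in for ./dataset.csv.

    genome_a, genome_b, category and edge follow the script; metric_a and
    metric_b are invented numeric columns (one facet row each).
    """
    rng = np.random.default_rng(seed)
    rows = []
    for k, cat in enumerate(CATEGORIES):
        for i in range(n_per_category):
            rows.append(
                {
                    "genome_a": f"genome_{k}_{i}_a",
                    "genome_b": f"genome_{k}_{i}_b",
                    "category": cat,
                    "edge": f"edge_{k}_{i}",
                    "metric_a": rng.gamma(shape=2.0 + k, scale=1.0),
                    "metric_b": rng.normal(loc=float(k), scale=1.0),
                }
            )
    return pd.DataFrame(rows)


def to_long(df):
    """Same reshape as the plotting script: one row per (pair, metric)."""
    return (
        df.set_index(["genome_a", "genome_b", "category"])
        .drop(columns="edge")
        .melt(ignore_index=False)
        .reset_index()
    )


wide = make_synthetic_pairs()
long_df = to_long(wide)
print(long_df.head())
```

Saving it with wide.to_csv("./dataset.csv", index=False) before running the script should yield a two-row figure, one row per invented metric.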
The reproduced figure is shown below:

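The significance stars in each panel are computed by statannotations from annotator.configure(test="Mann-Whitney", comparisons_correction="Bonferroni", text_format="star"). To make that step transparent, here is an independent sketch of the same idea using scipy.stats.mannwhitneyu. It runs a two-sided Mann-Whitney U test for each pair of categories, then applies a Bonferroni adjustment: each p-value is multiplied by the number of comparisons and capped at 1. Finally, each adjusted p-value is converted to stars. The star cut-offs (0.05, 0.01, 0.001, 0.0001) are my assumption of statannotations' defaults, and the toy groups are hypothetical. statannotations' own implementation may differ in details.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

# ASSUMED star cut-offs; compare with the statannotations version you use
STAR_THRESHOLDS = [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]


def p_to_stars(p):
    """Map an (adjusted) p-value to a star label; 'ns' above 0.05."""
    for cutoff, label in STAR_THRESHOLDS:
        if p <= cutoff:
            return label
    return "ns"


def bonferroni(pvalues):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def pairwise_mwu(groups, order):
    """Two-sided Mann-Whitney U test for every pair of groups in `order`."""
    pairs = list(combinations(order, 2))
    raw = [
        mannwhitneyu(groups[a], groups[b], alternative="two-sided").pvalue
        for a, b in pairs
    ]
    return [
        (a, b, p_adj, p_to_stars(p_adj))
        for (a, b), p_adj in zip(pairs, bonferroni(raw))
    ]


# Toy, deterministic groups (hypothetical values, not from the paper)
ORDER = ["negative assoc.", "unlinked", "positive assoc."]
groups = {
    "negative assoc.": np.arange(20, dtype=float),
    "unlinked": np.arange(20, dtype=float) + 0.5,
    "positive assoc.": np.arange(100, 120, dtype=float),
}

for a, b, p_adj, stars in pairwise_mwu(groups, ORDER):
    print(f"{a} vs {b}: adjusted p = {p_adj:.2e} -> {stars}")
```

The script creates one Annotator per facet, so the Bonferroni factor there is presumably the three comparisons within a single panel, matching the factor of 3 used here.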
Reference: Giordano, N., Gaudin, M., Trottier, C. et al. Genome-scale community modelling reveals conserved metabolic cross-feedings in epipelagic bacterioplankton communities. Nat Commun 15, 2721 (2024). https://doi.org/10.1038/s41467-024-46374-w
The content above is original; please credit the source when reposting.
