[R] How to communicate with your data? - ggplot2

We have covered the basics of cleaning and processing your data before analysis.

How to communicate with your data?

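R offers many ways to produce all kinds of graphics.

Not every graphics function is necessary for beginners, and complex graphics require long code.

We will start with simple graphical elements and then gradually customize more complex graphics.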

Which package do we need? ggplot2

library(ggplot2)

What can we do?

For continuous variables:

Creating, editing and coloring histograms

For categorical variables:

Creating, editing and coloring bar plots

# Load the ggplot2 package
library(ggplot2)

# Create a data frame
data <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 3, 4, 5, 6)
)

# Use ggplot() to create a scatter plot
ggplot(data, aes(x = x, y = y)) +
  geom_point()


Separate parts or layers

In ggplot2, a plot can be subdivided into separate parts or layers, each of which contributes to the final appearance of the plot. This layering system allows you to add different elements to the plot, such as data points, lines, text, and annotations, in a flexible and customizable way.

Here's a brief explanation of the key components of a ggplot2 plot:

  1. Data: The data you want to visualize, typically in the form of a data frame.

  2. Aesthetic Mapping (aes): Aesthetic mappings define how variables in the data are mapped to visual properties of the plot, such as x and y positions, colors, shapes, and sizes.

  3. Geoms (Geometric Objects): Geoms are the visual elements that represent the data in the plot, such as points, lines, bars, and polygons. Each geom function adds a new layer to the plot.

  4. Facets: Facets allow you to create multiple plots, each showing a different subset of the data. You can facet by one or more variables to create small multiples.

  5. Stats (Statistical Transformations): Stats are used to calculate summary statistics or perform transformations on the data before plotting. Each stat function can be thought of as a new dataset that is plotted using a geom.

  6. Scales: Scales control how the data values are mapped to the visual properties of the plot, such as axes, colors, and sizes. You can customize scales to change the appearance of the plot.

  7. Coordinate Systems: Coordinate systems determine how the plot is spatially arranged. The default is Cartesian coordinates, but ggplot2 also supports polar coordinates and other specialized coordinate systems.

By combining these components and adding them in layers, you can create complex and informative visualizations that effectively communicate insights from your data.
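To see several of these components working together, here is a sketch using the built-in mtcars dataset (introduced below). It is only an illustration. The choice of variables, the linear-model smoother and the y-axis limits are my own assumptions and are not part of the original walkthrough.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + # data + aesthetic mapping
  geom_point() +                                   # geom: one point per car
  geom_smooth(method = "lm", formula = y ~ x,      # stat: fit a straight line per group
              se = FALSE) +
  facet_wrap(~ am) +                               # facets: one panel per transmission type
  scale_colour_brewer(palette = "Set1") +          # scale: how cyl levels map to colours
  coord_cartesian(ylim = c(10, 35))                # coordinate system: zoom the y-axis
print(p)
```

Each `+` adds one component. Removing a line, for example `facet_wrap()`, changes only that part of the plot.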

Using the mtcars dataset to explore:

The mtcars dataset in R contains information about various features of 32 different automobiles from the early 1970s. Here are the meanings of the variables in the mtcars dataset:

  1. mpg: Miles per gallon (fuel efficiency).
  2. cyl: Number of cylinders.
  3. disp: Displacement (engine size) in cubic inches.
  4. hp: Gross horsepower.
  5. drat: Rear axle ratio.
  6. wt: Weight (in 1000 lbs).
  7. qsec: 1/4 mile time (in seconds).
  8. vs: Engine type, where 0 = V-shaped and 1 = straight.
  9. am: Transmission type, where 0 = automatic and 1 = manual.
  10. gear: Number of forward gears.
  11. carb: Number of carburetors.
# Load mtcars and ggplot2
data("mtcars")
library(ggplot2)
str(mtcars)

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

It describes the performance of cars in the US.

ggplot(mtcars,aes(x=mpg))+geom_histogram()

ggplot(mtcars,aes(x=cyl))+geom_histogram()

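These look poor. With the default of 30 bins, the mpg histogram is ragged. The cyl variable only takes the values 4, 6 and 8, so a histogram is not a good fit for it.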
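A histogram treats x as continuous, which is why cyl looks odd. A common alternative is to treat cyl as a category. This sketch anticipates geom_bar(), which is covered further down. The axis labels are my own choice.

```r
library(ggplot2)

# cyl has only three distinct values, so count them directly
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# converting cyl to a factor makes ggplot2 treat it as categorical
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Number of cars")
print(p)
```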

ggplot(mtcars,aes(x=mpg))+geom_dotplot()

The resulting image is a dot plot where each dot represents a car from the mtcars dataset, and the position of the dot on the x-axis represents its miles per gallon value. The dot plot can give you an idea of the distribution of miles per gallon values in the dataset and can help identify any patterns or outliers.

ggplot(mtcars,aes(x=qsec))+geom_area(stat="bin")

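This creates an area plot of the qsec variable (quarter-mile time). No variable is mapped to the y-axis, so stat = "bin" tells geom_area() to count the observations in each bin, as a histogram does.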
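geom_area(stat = "bin") uses the same binned counts as a histogram and only draws them differently. One way to check this is to compare the computed layer data of both plots. The sketch assumes the same binwidth for both; the value 1 is an arbitrary choice.

```r
library(ggplot2)

# Same statistic (binning), two different geoms
hist_plot <- ggplot(mtcars, aes(x = qsec)) + geom_histogram(binwidth = 1)
area_plot <- ggplot(mtcars, aes(x = qsec)) + geom_area(stat = "bin", binwidth = 1)

# layer_data() returns the data ggplot2 computed for the first layer
hist_counts <- layer_data(hist_plot)$count
area_counts <- layer_data(area_plot)$count

# The counts per bin agree; only the drawing differs
print(all.equal(hist_counts, area_counts))
```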

ggplot(mtcars,aes(x=disp))+geom_density()

#or, equivalently (the Gaussian kernel is already the default):

ggplot(mtcars,aes(x=disp))+geom_density(kernel ="gaussian")

The code creates a density plot using the disp (displacement) variable from the mtcars dataset. Here's a breakdown of the code:

  • ggplot(mtcars, aes(x = disp)): This sets up the basic plot using the mtcars dataset and specifies that the x-axis of the plot should represent the disp variable.

  • geom_density(): This adds a layer to the plot, specifying that the data should be displayed as a density plot.

Density plots are useful for visualizing the distribution of a continuous variable and can help identify patterns such as peaks, valleys, and skewness in the data.

In a density plot created using geom_density(), the y-axis represents the density of the data at each point along the x-axis. Density is a way of representing the distribution of data values. It is calculated using kernel density estimation, which estimates the probability density function of the underlying variable.
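A quick numerical check of this uses base R's density() on the same variable. As far as I know, geom_density() relies on the same kind of estimate by default (Gaussian kernel, "nrd0" bandwidth rule). Treat this snippet as an illustration of the idea rather than of ggplot2's internals. The grid-spacing approximation of the area is my own addition.

```r
# Kernel density estimate of engine displacement
d <- density(mtcars$disp, kernel = "gaussian")

# The y values are densities, not counts: the area under the curve is close to 1.
# Approximate the integral with a Riemann sum over the evenly spaced grid.
step <- diff(d$x[1:2])
area <- sum(d$y) * step
print(area)

# The bandwidth controls smoothness; this is the value chosen by bw.nrd0()
print(d$bw)
```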

Graphing

The default plots are too plain for publication. Things to improve:

1. binwidth

2. colour

3. title and labels

4. Gaussian curve: whether the data come from a normal distribution or not

Four parameters of the bars can be changed, all inside the geom function:

binwidth = number: changes the width of the bars

fill = "colour name": changes the colour the bars are filled with

colour = "colour name": changes the outline colour of the bars

alpha = number: changes the transparency of the colour (0 = fully transparent, 1 = opaque)

ggplot(mtcars,aes(x=mpg))+geom_histogram(binwidth = 5)
ggplot(mtcars,aes(x=mpg))+geom_histogram(fill="blue",binwidth=5)
ggplot(mtcars,aes(x=mpg))+geom_histogram(fill="skyblue",alpha=0.7,binwidth=5,colour="grey")
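You can see what binwidth does by inspecting the data ggplot2 computes for the layer. layer_data() from ggplot2 returns this data, with one row per bar here. A small sketch:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, fill = "skyblue", colour = "grey", alpha = 0.7)

bins <- layer_data(p)   # computed data for the first (and only) layer
print(bins[, c("xmin", "xmax", "count")])

# every car falls in exactly one bar
print(sum(bins$count))
```

Each bar spans exactly 5 mpg. Changing binwidth changes the number of rows, not the total count.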

#Let's practice: a histogram of BMI in purple
#after importing the Excel file with File -> Import Dataset -> From Excel...
ggplot(SEE_students_data_2,aes(x=BMI))+geom_histogram(binwidth = 1, fill="purple",colour="black",alpha=0.5)

  • ggplot(SEE_students_data_2, aes(x = BMI)): This sets up the basic plot using the SEE_students_data_2 dataset and maps the BMI variable to the x-axis.

  • geom_histogram(binwidth = 1, fill = "purple", colour = "black", alpha = 0.5): This adds a histogram layer to the plot. binwidth = 1 specifies the width of each histogram bin as 1 (i.e., each unit). fill = "purple" sets the fill color of the histogram bars to purple, colour = "black" sets the border color to black, and alpha = 0.5 sets the transparency to 0.5, giving the histogram bars some transparency.

Tips:

1. The split into male and female comes from the variable Gender, so fill must be specified inside the aesthetics, aes(), not as a fixed value in the geom.

2. geom_area() requires the option stat = "bin" when no variable is mapped to the y-axis.
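Here is tip 1 in practice. SEE_students_data_2 is the author's own file, so this sketch uses a small simulated stand-in. The data frame name `students` and its values are hypothetical. A fill given inside the geom is fixed for the whole layer. A fill mapped inside aes() splits the data into one group per level of Gender.

```r
library(ggplot2)

# Hypothetical stand-in for SEE_students_data_2 (simulated values, not real students)
set.seed(42)
students <- data.frame(
  Gender = rep(c("Female", "Male"), each = 40),
  BMI    = c(rnorm(40, mean = 21, sd = 2), rnorm(40, mean = 23, sd = 2))
)

# Fixed colour: set in the geom, applies to the whole layer (one curve)
fixed <- ggplot(students, aes(x = BMI)) +
  geom_density(fill = "purple", colour = "black", alpha = 0.5)

# Mapped colour: set in aes(), one density curve per Gender
mapped <- ggplot(students, aes(x = BMI, fill = Gender)) +
  geom_density(colour = "black", alpha = 0.5)

print(mapped)
```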

ggplot(SEE_students_data_2,aes(x=BMI, fill=Gender))+geom_density(colour="black",alpha=0.5)

  • ggplot(SEE_students_data_2, aes(x = BMI, fill = Gender)): Sets up the basic plot using the SEE_students_data_2 dataset. The aes() function maps the BMI variable to the x-axis and uses the Gender variable to fill the density curves by gender.

  • geom_density(colour = "black", alpha = 0.5): Adds a density plot layer to the plot. The colour = "black" argument sets the color of the density curve outlines to black, and the alpha = 0.5 argument sets the transparency of the density curves to 0.5, making them partially transparent.

ggplot(SEE_students_data_2,aes(x=BMI, fill=Gender)) + geom_area(stat="bin", colour="black",alpha=0.5,binwidth=1)

#add a title and axis titles to the BMI geom_density graph
ggplot(SEE_students_data_2,aes(x=BMI, fill=Gender))+geom_density(colour="black",alpha=0.5)+labs(title="Body Mass Index per Gender\nSEE Students", y="Density",x="Body Mass Index")

Univariate categorical data

#Graphing a factor variable using geom_bar()

ggplot(SEE_students_data_2,aes(x=Gender))+geom_bar()

#adding colour to the bars: a qualitative palette (Set1), a sequential palette (Blues), or manually defined colours
ggplot(SEE_students_data_2,aes(x=Gender, fill=Gender))+geom_bar(alpha=0.5)+scale_fill_brewer(palette="Set1")
ggplot(SEE_students_data_2,aes(x=Gender, fill=Gender))+geom_bar()+scale_fill_brewer(palette = "Blues")
ggplot(SEE_students_data_2,aes(x=Gender,fill=Gender))+geom_bar(alpha=0.75)+scale_fill_manual(values=c("pink","blue"))

  1. ggplot(SEE_students_data_2, aes(x = Gender, fill = Gender)) + geom_bar(alpha = 0.5) + scale_fill_brewer(palette = "Set1"): This code creates a bar plot where each bar is filled with a color from the "Set1" color palette, which is part of the RColorBrewer package. The alpha = 0.5 argument sets the transparency of the bars to 0.5, making them partially transparent.

  2. ggplot(SEE_students_data_2, aes(x = Gender, fill = Gender)) + geom_bar() + scale_fill_brewer(palette = "Blues"): This code creates a bar plot with bars filled with shades of blue from the "Blues" color palette. The bars are fully opaque by default.

  3. Manually defined color: ggplot(SEE_students_data_2, aes(x = Gender, fill = Gender)) + geom_bar(alpha = 0.75) + scale_fill_manual(values = c("pink", "blue")): This code creates a bar plot with bars filled with the colors "pink" and "blue", using the scale_fill_manual() function to manually specify the colors. The alpha = 0.75 argument sets the transparency of the bars to 0.75, making them partially transparent.
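With scale_fill_manual(), unnamed colours are assigned to the levels in order. Gender's levels are alphabetical, so c("pink", "blue") gives pink to "Female" and blue to "Male". Naming the vector makes that mapping explicit and keeps it correct if the levels are reordered. The sketch uses a small hypothetical data frame; the name `students` and its counts are made up.

```r
library(ggplot2)

# Hypothetical data: 3 female and 5 male students
students <- data.frame(Gender = c(rep("Female", 3), rep("Male", 5)))

p <- ggplot(students, aes(x = Gender, fill = Gender)) +
  geom_bar(alpha = 0.75) +
  # a named vector ties each colour to a specific level
  scale_fill_manual(values = c(Female = "pink", Male = "blue"))

bars <- layer_data(p)   # one row per bar, with its count and fill colour
print(bars[, c("count", "fill")])
```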

Ordering the bars by frequency:

# Install (only needed once) and load the forcats package
install.packages("forcats")
library(forcats)

# Create the plot with the factor levels reordered by frequency
ggplot(CUHKSZ_employment_survey_1, aes(x = fct_infreq(Occupation), fill = Occupation)) +
  geom_bar(alpha = 0.75) +
  scale_fill_brewer(palette = "Blues")

  • ggplot(CUHKSZ_employment_survey_1, aes(x = fct_infreq(Occupation), fill = Occupation)): This sets up the basic plot using the CUHKSZ_employment_survey_1 dataset. The x aesthetic uses the fct_infreq() function from the forcats package to reorder the Occupation variable based on frequency. The fill aesthetic fills the bars based on the Occupation variable.

  • geom_bar(alpha = 0.75): This adds a bar plot layer to the plot. The alpha parameter sets the transparency of the bars to 0.75, making them partially transparent.

  • scale_fill_brewer(palette = "Blues"): This sets the fill color of the bars using the "Blues" color palette from the RColorBrewer package.

  • The fill = Occupation aesthetic is used to fill the bars of the bar plot based on the levels of the Occupation variable. Each unique level of the Occupation variable will be represented by a different color in the plot, which can be helpful for distinguishing between different categories or groups in the data.

  • Additional resources: STHDA, http://www.sthda.com/english/
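fct_infreq() reorders factor levels from most to least frequent, which is why the bars come out sorted. Below is a base-R sketch of the same idea, not forcats' own implementation. The occupations are invented for illustration. Ties are avoided on purpose, since tie-breaking may differ from forcats.

```r
# Hypothetical survey answers
occupation <- c("Teacher", "Engineer", "Engineer", "Nurse", "Engineer", "Nurse")

# Count each value, sort the counts in decreasing order, use the names as level order
level_order <- names(sort(table(occupation), decreasing = TRUE))
occupation_f <- factor(occupation, levels = level_order)

print(levels(occupation_f))
```

A factor built this way can be passed to aes(x = ...), and geom_bar() will draw the bars in that level order.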
