#### Identifying outliers in R with boxplots

One of the easiest ways to identify outliers in R is to visualize them in a boxplot. Boxplot outliers are observations that lie more than 1.5 times the interquartile range (IQR = Q3 - Q1) beyond the edges of the box, where Q1 and Q3 are the first and third quartiles, respectively. There are two categories of outlier: (1) outliers and (2) extreme points. Values above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR are considered outliers. Values above Q3 + 3 × IQR or below Q1 - 3 × IQR are considered extreme points (or extreme outliers). A function that flags outliers numerically can use the same criteria as the box plot.

The outliers package provides a number of useful functions to systematically extract outliers. The outlier() and scores() functions are especially convenient.

One of the first steps when working with a fresh data set is to plot its values to identify patterns and outliers. One of the most important tasks in data analysis is therefore to identify and, if necessary, remove the outliers. To better understand the implications of outliers, I am going to compare the fit of a simple linear regression model on the cars dataset with and without outliers.

In ggplot2, boxplots are drawn with geom_boxplot(); the outliers can be hidden by setting outlier.shape = NA. Boxplot() (uppercase B!), from the car package, is built on the base boxplot() function but has more options, specifically the possibility to label outliers. For comparison, Minitab uses an asterisk (*) symbol to mark outliers on boxplots.
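The 1.5 × IQR and 3 × IQR rules described above can be checked by hand in base R. The following is a minimal sketch, not a definitive implementation: the vector x is made-up illustrative data, and the variable names (lower_out, upper_ext, and so on) are assumptions introduced only for this example.

```r
# Illustrative data (made up for this sketch): typical values plus two high ones
x <- c(10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 20, 40)

# Quartiles and interquartile range (quantile() uses its default type 7 method)
q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences for outliers (1.5 x IQR) and extreme points (3 x IQR)
lower_out <- q1 - 1.5 * iqr
upper_out <- q3 + 1.5 * iqr
lower_ext <- q1 - 3 * iqr
upper_ext <- q3 + 3 * iqr

# Logical flags for each observation
is_outlier <- x < lower_out | x > upper_out
is_extreme <- x < lower_ext | x > upper_ext

x[is_outlier]  # 20 40
x[is_extreme]  # 40

# boxplot.stats() returns the points a base boxplot() would draw individually.
# It uses hinges rather than quantile(), so its fences can differ slightly.
boxplot.stats(x)$out  # 20 40
```

Here 20 falls beyond the 1.5 × IQR fence but inside the 3 × IQR fence, so it counts as an outlier without being an extreme point, while 40 is both. Calling boxplot(x) on the same vector would draw those two values as individual points.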