JEREMYYYYYYY

From Erica, 2 Months ago, written in R.

---
title: "HW2"
author: "Erica Slogar"
date: "02-27-24"
output: pdf_document
---

# Set up

Move this script and the `cnty_COVID_cases.csv` dataset into a folder and then create a new RProject in that folder. This step sets the working directory to the correct location to make loading data easier.

Load the tidyverse package here. Remember to hide the package loading messages.

```{r,message=FALSE}
library(tidyverse)
```

Now load the `cnty_COVID_cases.csv` dataset. Remember to hide the data loading messages. This is a dataset from August 2020 about the number of new COVID cases in the past 90 days by county. It is an excerpt from my colleague's paper about the [effect of altitude on COVID prevalence](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0245055). Reading the paper is not required for this assignment.

```{r,message=FALSE}
cov_cnty <- read_csv("cnty_COVID_cases.csv")
```

# 1.

## 1A. Which State has the fewest rows (counties)?

> Hint: use `count()` and `arrange()`

```{r}
#summary(cov_cnty)
cov_cnty %>%
  count(State) %>%
  arrange(n) %>%
  head(n = 1)
```

## 1B. Which State has the most rows (counties)?

```{r}
cov_cnty %>%
  count(State) %>%
  arrange(desc(n)) %>%
  head(n = 1)
```
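
To make the `count()` and `arrange()` pattern from 1A and 1B concrete, here is a minimal sketch on a made-up tibble. The `toy` data, its state codes, and the helper names `state_counts`, `fewest`, and `most` are hypothetical and are not taken from `cnty_COVID_cases.csv`.

```r
library(dplyr)

# Hypothetical toy data: one row per county (not from the real dataset)
toy <- tibble(
  State  = c("AA", "AA", "AA", "BB", "CC", "CC"),
  County = c("a1", "a2", "a3", "b1", "c1", "c2")
)

# count() returns one row per State, with the number of rows in column n
state_counts <- toy %>%
  count(State)

# arrange(n) sorts ascending, so the first row has the fewest counties
fewest <- state_counts %>%
  arrange(n) %>%
  slice_head(n = 1)

# arrange(desc(n)) sorts descending, so the first row has the most counties
most <- state_counts %>%
  arrange(desc(n)) %>%
  slice_head(n = 1)
```

Taking only the first row hides ties. If several states could share the smallest or largest count, `slice_min(n, n = 1)` or `slice_max(n, n = 1)` keeps all tied rows by default.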

# 2. Show the 12 Counties that have Population over 2 million

> Hint: use `filter()`

```{r}
cov_cnty %>%
  filter(Population > 2000000) %>%
  select(County_name, Population) %>%
  head(n = 12)
```

# 3. Which States have Counties with Population over 2 million

```{r}
cov_cnty %>%
  filter(Population > 2000000) %>%
  select(State, County_name, Population) %>%
  distinct(State)
```

# 4.

## 4A. Create a plot showing Population on X and Confirmed_90d on Y.

```{r}
cov_cnty %>%
  ggplot(aes(x = Population, y = Confirmed_90d)) +
  geom_point(aes(color = State))
```

## 4B. Describe what you see.

```{r}
# I see a scatter plot where each point represents a county, with its
# population on x and its confirmed COVID cases in the past 90 days on y.
# Most of the points are concentrated near the origin.
```

# 5.

## 5A. Create the same plot as #4, but using log(Population) as X and log(Confirmed_90d + 1) as Y.

```{r}
cov_cnty %>%
  ggplot(aes(x = log(Population), y = log(Confirmed_90d + 1))) +
  geom_point(aes(color = State))
```

## 5B. Describe what you see. What was the effect of taking the log?

```{r}
# I see a scatter plot of the same counties as in #4, but taking the log of
# both axes spreads out the points that were bunched near the origin and
# makes the relationship look roughly linear.
```
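
One possible reading of why the log-log plot looks linear, offered as an assumption rather than a result from the dataset: if cases grow roughly as a power of population, with hypothetical constants $a$ and $b$, then taking logs of both sides turns the curve into a straight line with slope $b$ and intercept $\log a$.

$$
\text{Confirmed\_90d} \approx a \cdot \text{Population}^{\,b}
\quad\Longrightarrow\quad
\log(\text{Confirmed\_90d}) \approx \log a + b \,\log(\text{Population})
$$

The $+1$ inside `log(Confirmed_90d + 1)` shifts small counts slightly, so the straight-line reading is only approximate for counties with very few cases.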

## 5C. Why did we +1 in log(Confirmed_90d + 1)?

```{r}
# log(0) is undefined (R returns -Inf), so we add 1 to keep counties with
# zero confirmed cases at a finite value: log(0 + 1) = 0.
```
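
The behavior at zero can be checked directly in base R. This is a small sketch on a made-up vector; `cases`, `raw`, and `shifted` are hypothetical names used only for illustration.

```r
# Hypothetical case counts, including one county with zero cases
cases <- c(0, 1, 9, 99)

# Without the shift, the zero count becomes -Inf
raw <- log(cases)

# With the shift, the zero count becomes log(1) = 0 and every value is finite
shifted <- log(cases + 1)

# log1p(x) is the base R shortcut for log(1 + x)
shortcut <- log1p(cases)
```

The shift has almost no effect on large counts, since `log(99 + 1)` and `log(99)` differ by about 0.01, but it changes small counts noticeably.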

# 6. Create two new variables that show:

## 6A. New confirmed covid cases in the past 90 days per 1000 people (cases_1k) and

```{r}
cov_cnty %>%
  mutate(cases_1k = (Confirmed_90d / Population) * 1000,
         log_cases_1k = log(cases_1k + 0.1))

covid <- cov_cnty %>%
  mutate(cases_1k = (Confirmed_90d / Population) * 1000,
         log_cases_1k = log(cases_1k + 0.1))
```

## 6B. Log-transformed cases_1k

> In case we did not get to this in class, the code is here for you. Run it to see what it does.

```{r}
covid %>%
  mutate(cases_1k = (Confirmed_90d / Population) * 1000,
         log_cases_1k = log(cases_1k + 0.1))

# save the new variables onto the dataset
covid <- covid %>%
  mutate(cases_1k = (Confirmed_90d / Population) * 1000,
         log_cases_1k = log(cases_1k + 0.1))
```

# 7.

## 7A. The RUCC_2013 variable is the Rural-Urban Continuum Code, in which more urban counties have lower numbers and more rural counties have higher numbers. Plot a boxplot for each RUCC_2013 category on x and the log-transformed new cases of COVID per 1000 people on y (`log_cases_1k`).

> Hint: use `aes(group = RUCC_2013)` inside `geom_boxplot()` to plot RUCC_2013 as groups.

```{r}
covid %>%
  ggplot(aes(x = RUCC_2013, y = log_cases_1k)) +
  geom_boxplot(aes(group = RUCC_2013))
```
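
The `group` aesthetic matters because `RUCC_2013` is stored as a number, so `geom_boxplot()` has no discrete categories to split on unless it is told which rows belong together. The sketch below uses a made-up data frame (`toy`, `code`, and `value` are hypothetical stand-ins for the real columns) and also shows `factor()` as an alternative way to get one box per code.

```r
library(ggplot2)

# Hypothetical toy data: a numeric code (like RUCC_2013) and a numeric outcome
toy <- data.frame(
  code  = rep(1:3, each = 5),
  value = c(1, 2, 3, 4, 5,  2, 3, 4, 5, 6,  3, 4, 5, 6, 7)
)

# group = code tells geom_boxplot() to draw one box per code value
p_grouped <- ggplot(toy, aes(x = code, y = value)) +
  geom_boxplot(aes(group = code))

# Alternative: convert the code to a factor so the x axis is discrete
p_factor <- ggplot(toy, aes(x = factor(code), y = value)) +
  geom_boxplot()

# layer_data() returns one row per box, so both plots should have three
n_boxes_grouped <- nrow(layer_data(p_grouped))
n_boxes_factor  <- nrow(layer_data(p_factor))
```

The grouped version keeps a continuous x axis, while the factor version labels each category explicitly. Which one reads better is a matter of taste.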

## 7B. Describe what you see

```{r}
# I see 9 box plots, one per RUCC_2013 category, where each median of
# log_cases_1k remains under 2.5.
```

# 8. When we divided the new cases by the population, that was a form of normalization. What are the benefits and the detriments of this step?

```{r}
# asdf
```
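
As a starting point for thinking about this question, here is a small worked example of the per-1000 calculation on two made-up counties. All names and numbers (`toy`, "Big", "Small", and their counts) are hypothetical and chosen only to make the arithmetic easy to follow.

```r
library(dplyr)

# Hypothetical counties (made-up numbers, not from the real dataset)
toy <- tibble(
  County        = c("Big", "Small"),
  Population    = c(2000000, 500),
  Confirmed_90d = c(10000, 5)
)

# Same normalization as cases_1k in question 6
toy_rates <- toy %>%
  mutate(cases_1k = Confirmed_90d / Population * 1000)
# Big:   10000 / 2000000 * 1000 = 5 cases per 1000
# Small: 5 / 500 * 1000         = 10 cases per 1000

# Add one extra case to each county and recompute the rate
bump <- toy %>%
  mutate(cases_1k_plus1 = (Confirmed_90d + 1) / Population * 1000)
# Big:   5.0005 cases per 1000
# Small: 12 cases per 1000
```

In this sketch the rate makes counties of very different sizes comparable, and the small county ranks higher even though it has far fewer cases. The same example shows a cost: a single extra case moves the small county's rate from 10 to 12, so rates for small populations are much noisier, and the rate alone no longer shows the absolute number of cases.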
