2018-01-10
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.3.4 v dplyr 0.7.4
## v tidyr 0.7.2 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
[Figure: plot of chunk data]
[Figure: plot of chunk plot1]
[Figure: plot of chunk plot1_reduced]
popdensity
by state
.county
by state
.
ggplot(data = midwest)
from above.
[Figure: plot of chunk aesthetic]
[Figure: plot of chunk global_aes]
alpha
instead.
colors()
.
[Figure: plot of chunk smooth]
[Figure: plot of chunk smooth_states]
[Figure: plot of chunk combine_geoms]
[Figure: plot of chunk two_geoms]
[Figure: plot of chunk breaks_x2]
## [1] 0
## [1] 35
## [1] 2
## [1] 2
## [1] 4
## [1] 4
## [1] 12
## Error in eval(expr, envir, enclos): object 'Case_sensitive' not found
## [1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
## [1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
## [1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
Use ?function_name to explore the details of the function. The examples at the bottom of every R help page can be especially helpful.
dplyr for data manipulation
The dplyr package uses verbs for common data manipulation tasks. These include:
filter()
count()
arrange()
select()
mutate()
summarise()
https://fivethirtyeight.com/features/both-republicans-and-democrats-have-an-age-problem/
[Figure: plot of chunk congress]
filter
## # A tibble: 555 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 545 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
>
<
>=
<=
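As a minimal sketch of how these comparison operators are used inside filter(), the example below builds a small stand-in for the congress data. The toy_congress name and its values are hypothetical and invented for illustration; they are not taken from the real dataset.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 80, 81, 113),
  chamber  = c("house", "senate", "house", "house"),
  age      = c(85.9, 81.5, 60.2, 89.7),
  stringsAsFactors = FALSE
)

# Equality test: keep only rows from the 80th congress.
congress_80 <- filter(toy_congress, congress == 80)

# Comparison operators: members aged 85 or older, and members under 70.
age_85_plus <- filter(toy_congress, age >= 85)
under_70    <- filter(toy_congress, age < 70)
```

Each call returns a new data frame and leaves toy_congress unchanged.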
Use the is.na function to identify congress members that have missing middle names.
## # A tibble: 102 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 senate C000133 Arthur <NA> Capper <NA>
## 2 80 senate G000418 Theodore Francis Green <NA>
## 3 80 senate M000499 Kenneth Douglas McKellar <NA>
## 4 80 senate R000112 Clyde Martin Reed <NA>
## 5 80 senate M000895 Edward Hall Moore <NA>
## 6 80 senate O000146 John Holmes Overton <NA>
## 7 80 senate M001108 James Edward Murray <NA>
## 8 80 senate M000308 Patrick Anthony McCarran <NA>
## 9 80 senate T000165 Elmer <NA> Thomas <NA>
## 10 80 senate W000021 Robert Ferdinand Wagner <NA>
## # ... with 92 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
## # A tibble: 102 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 senate C000133 Arthur <NA> Capper <NA>
## 2 80 senate G000418 Theodore Francis Green <NA>
## 3 80 senate M000499 Kenneth Douglas McKellar <NA>
## 4 80 senate R000112 Clyde Martin Reed <NA>
## 5 80 senate M000895 Edward Hall Moore <NA>
## 6 80 senate O000146 John Holmes Overton <NA>
## 7 80 senate M001108 James Edward Murray <NA>
## 8 80 senate M000308 Patrick Anthony McCarran <NA>
## 9 80 senate T000165 Elmer <NA> Thomas <NA>
## 10 80 senate W000021 Robert Ferdinand Wagner <NA>
## # ... with 92 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
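One way two filter() calls can return identical tables is by writing the same compound condition in two forms. The sketch below, on hypothetical toy data, separates conditions with a comma and then joins them with &; both forms keep only rows where every condition holds.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 80, 81, 81),
  chamber  = c("house", "senate", "house", "senate"),
  stringsAsFactors = FALSE
)

# Conditions separated by commas are combined with a logical AND.
with_comma <- filter(toy_congress, congress == 80, chamber == "senate")

# The same condition written explicitly with &.
with_and <- filter(toy_congress, congress == 80 & chamber == "senate")

# Compare the two results.
identical(with_comma, with_and)
```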
## # A tibble: 1,112 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 1,102 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
%in%
## # A tibble: 1,112 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 1,102 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
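A sketch of how %in% can stand in for a chain of "or" conditions, again on hypothetical toy data. The data frame and its values are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 81, 82, 81),
  lastname = c("Mansfield", "Doughton", "Sabath", "Eaton"),
  stringsAsFactors = FALSE
)

# "Or" written out with |.
either <- filter(toy_congress, congress == 80 | congress == 81)

# The same rows selected with %in%, which scales better to many values.
with_in <- filter(toy_congress, congress %in% c(80, 81))
```

Both calls keep the same three rows; %in% tends to be easier to read once the set of allowed values grows.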
## # A tibble: 18,080 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 81 house D000448 Robert Lee Doughton <NA>
## 2 81 house S000001 Adolph Joachim Sabath <NA>
## 3 81 house E000023 Charles Aubrey Eaton <NA>
## 4 81 house W000265 Richard Joseph Welch <NA>
## 5 81 house B000565 Sol <NA> Bloom <NA>
## 6 81 house H000943 Merlin <NA> Hull <NA>
## 7 81 house B000545 Schuyler Otis Bland <NA>
## 8 81 house K000138 John Hosea Kerr <NA>
## 9 81 house C000932 Robert <NA> Crosser <NA>
## 10 81 house K000039 John <NA> Kee <NA>
## # ... with 18,070 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
## # A tibble: 453 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 443 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
## # A tibble: 15,698 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house W000265 Richard Joseph Welch <NA>
## 6 80 house B000565 Sol <NA> Bloom <NA>
## 7 80 house H000943 Merlin <NA> Hull <NA>
## 8 80 house G000169 Charles Laceille Gifford <NA>
## 9 80 house B000545 Schuyler Otis Bland <NA>
## 10 80 house R000358 John Marshall Robsion <NA>
## # ... with 15,688 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
count
## # A tibble: 6 x 2
## party n
## <chr> <int>
## 1 AL 3
## 2 D 10290
## 3 I 63
## 4 ID 4
## 5 L 1
## 6 R 8274
## # A tibble: 2 x 2
## incumbent n
## <lgl> <int>
## 1 FALSE 2937
## 2 TRUE 15698
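To illustrate the kind of call that produces frequency tables like the two above, here is a minimal count() sketch on hypothetical toy data. The toy_members data frame is invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_members <- data.frame(
  party     = c("D", "R", "D", "I", "D"),
  incumbent = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# One row per party, with the number of rows stored in column n.
party_counts <- count(toy_members, party)

# Counting a logical column works the same way.
incumbent_counts <- count(toy_members, incumbent)
```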
arrange
## # A tibble: 18,635 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house B000201 Edward Lewis Bartlett <NA>
## 2 81 house B000201 Edward Lewis Bartlett <NA>
## 3 82 house B000201 Edward Lewis Bartlett <NA>
## 4 83 house B000201 Edward Lewis Bartlett <NA>
## 5 84 house B000201 Edward Lewis Bartlett <NA>
## 6 85 house B000201 Edward Lewis Bartlett <NA>
## 7 86 house R000282 Ralph Julian Rivers <NA>
## 8 86 senate G000508 Ernest <NA> Gruening <NA>
## 9 86 senate B000201 Edward Lewis Bartlett <NA>
## 10 87 house R000282 Ralph Julian Rivers <NA>
## # ... with 18,625 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
## # A tibble: 18,635 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house B000201 Edward Lewis Bartlett <NA>
## 2 81 house B000201 Edward Lewis Bartlett <NA>
## 3 82 house B000201 Edward Lewis Bartlett <NA>
## 4 83 house B000201 Edward Lewis Bartlett <NA>
## 5 84 house B000201 Edward Lewis Bartlett <NA>
## 6 85 house B000201 Edward Lewis Bartlett <NA>
## 7 86 house R000282 Ralph Julian Rivers <NA>
## 8 86 senate G000508 Ernest <NA> Gruening <NA>
## 9 86 senate B000201 Edward Lewis Bartlett <NA>
## 10 87 house R000282 Ralph Julian Rivers <NA>
## # ... with 18,625 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
## # A tibble: 18,635 x 13
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 113 house H000067 Ralph M. Hall <NA>
## 2 113 house D000355 John D. Dingell <NA>
## 3 113 house C000714 John <NA> Conyers Jr.
## 4 113 house S000480 Louise McIntosh Slaughter <NA>
## 5 113 house R000053 Charles B. Rangel <NA>
## 6 113 house J000174 Sam Robert Johnson <NA>
## 7 113 house Y000031 C. W. Bill Young <NA>
## 8 113 house C000556 Howard <NA> Coble <NA>
## 9 113 house L000263 Sander M. Levin <NA>
## 10 113 house Y000033 Don E. Young <NA>
## # ... with 18,625 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
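A minimal sketch of arrange() with several sort keys and with desc() for descending order. The toy data and the choice of sort columns are assumptions for illustration and may differ from the calls that produced the tables above.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  state    = c("TX", "AK", "AK", "NC"),
  congress = c(80, 81, 80, 113),
  age      = c(85.9, 44.7, 42.7, 60.1),
  stringsAsFactors = FALSE
)

# Sort by state first, then by congress within each state.
by_state <- arrange(toy_congress, state, congress)

# desc() reverses the order, so the oldest member comes first.
oldest_first <- arrange(toy_congress, desc(age))
```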
select
## # A tibble: 18,635 x 4
## congress chamber party age
## <int> <chr> <chr> <dbl>
## 1 80 house D 85.9
## 2 80 house D 83.2
## 3 80 house D 80.7
## 4 80 house R 78.8
## 5 80 house R 78.3
## 6 80 house R 78.0
## 7 80 house R 77.9
## 8 80 house D 76.8
## 9 80 house R 76.0
## 10 80 house R 75.8
## # ... with 18,625 more rows
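The four-column table above is the kind of result select() gives when columns are named directly. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress  = c(80, 80),
  chamber   = c("house", "house"),
  firstname = c("Joseph", "Robert"),
  party     = c("D", "D"),
  age       = c(85.9, 83.2),
  stringsAsFactors = FALSE
)

# Keep only the named columns, in the order given.
small <- select(toy_congress, congress, chamber, party, age)

# A minus sign drops a column instead of keeping it.
no_firstname <- select(toy_congress, -firstname)
```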
starts_with()
ends_with()
contains()
matches()
num_range()
:
everything()
starts_with helper
## # A tibble: 18,635 x 2
## suffix state
## <chr> <chr>
## 1 <NA> TX
## 2 <NA> NC
## 3 <NA> IL
## 4 <NA> NJ
## 5 <NA> KY
## 6 <NA> PA
## 7 <NA> CA
## 8 <NA> NY
## 9 <NA> WI
## 10 <NA> MA
## # ... with 18,625 more rows
## # A tibble: 18,635 x 3
## firstname middlename lastname
## <chr> <chr> <chr>
## 1 Joseph Jefferson Mansfield
## 2 Robert Lee Doughton
## 3 Adolph Joachim Sabath
## 4 Charles Aubrey Eaton
## 5 William <NA> Lewis
## 6 James A. Gallagher
## 7 Richard Joseph Welch
## 8 Sol <NA> Bloom
## 9 Merlin <NA> Hull
## 10 Charles Laceille Gifford
## # ... with 18,625 more rows
## # A tibble: 18,635 x 8
## congress chamber bioguide firstname middlename lastname suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 18,625 more rows, and 1 more variables: birthday <date>
## # A tibble: 18,635 x 8
## congress bioguide middlename lastname suffix birthday termstart
## <int> <chr> <chr> <chr> <chr> <date> <date>
## 1 80 M000112 Jefferson Mansfield <NA> 1861-02-09 1947-01-03
## 2 80 D000448 Lee Doughton <NA> 1863-11-07 1947-01-03
## 3 80 S000001 Joachim Sabath <NA> 1866-04-04 1947-01-03
## 4 80 E000023 Aubrey Eaton <NA> 1868-03-29 1947-01-03
## 5 80 L000296 <NA> Lewis <NA> 1868-09-22 1947-01-03
## 6 80 G000017 A. Gallagher <NA> 1869-01-16 1947-01-03
## 7 80 W000265 Joseph Welch <NA> 1869-02-13 1947-01-03
## 8 80 B000565 <NA> Bloom <NA> 1870-03-09 1947-01-03
## 9 80 H000943 <NA> Hull <NA> 1870-12-18 1947-01-03
## 10 80 G000169 Laceille Gifford <NA> 1871-03-15 1947-01-03
## # ... with 18,625 more rows, and 1 more variables: age <dbl>
everything
## # A tibble: 18,635 x 13
## congress chamber incumbent age bioguide firstname middlename
## <int> <chr> <lgl> <dbl> <chr> <chr> <chr>
## 1 80 house TRUE 85.9 M000112 Joseph Jefferson
## 2 80 house TRUE 83.2 D000448 Robert Lee
## 3 80 house TRUE 80.7 S000001 Adolph Joachim
## 4 80 house TRUE 78.8 E000023 Charles Aubrey
## 5 80 house FALSE 78.3 L000296 William <NA>
## 6 80 house FALSE 78.0 G000017 James A.
## 7 80 house TRUE 77.9 W000265 Richard Joseph
## 8 80 house TRUE 76.8 B000565 Sol <NA>
## 9 80 house TRUE 76.0 H000943 Merlin <NA>
## 10 80 house TRUE 75.8 G000169 Charles Laceille
## # ... with 18,625 more rows, and 6 more variables: lastname <chr>,
## # suffix <chr>, birthday <date>, state <chr>, party <chr>,
## # termstart <date>
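The select helpers listed earlier can be sketched as follows. The toy data frame only mimics the column names of the congress data, and the specific helper calls are assumptions that may differ from the ones behind the tables above.

```r
library(dplyr)

# Hypothetical toy data with column names modelled on the congress data.
toy_congress <- data.frame(
  congress   = c(80, 80),
  chamber    = c("house", "house"),
  firstname  = c("Joseph", "Robert"),
  middlename = c("Jefferson", "Lee"),
  lastname   = c("Mansfield", "Doughton"),
  suffix     = c(NA, NA),
  state      = c("TX", "NC"),
  age        = c(85.9, 83.2),
  stringsAsFactors = FALSE
)

# Columns whose names start with "s".
s_cols <- select(toy_congress, starts_with("s"))

# Columns whose names end with "name".
name_cols <- select(toy_congress, ends_with("name"))

# A contiguous range of columns using the : operator.
range_cols <- select(toy_congress, congress:firstname)

# Move age to the front and keep every other column after it.
reordered <- select(toy_congress, age, everything())
```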
rename function
## # A tibble: 18,635 x 13
## congress chamber bioguide first_name middlename last_name suffix
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 18,625 more rows, and 6 more variables: birthday <date>,
## # state <chr>, party <chr>, incumbent <lgl>, termstart <date>, age <dbl>
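A sketch of a rename() call that would produce first_name and last_name columns like those in the table above, using hypothetical toy data. The syntax is new_name = old_name.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  firstname  = c("Joseph", "Robert"),
  middlename = c("Jefferson", "Lee"),
  lastname   = c("Mansfield", "Doughton"),
  stringsAsFactors = FALSE
)

# rename() keeps every column and only changes the names listed.
renamed <- rename(toy_congress, first_name = firstname, last_name = lastname)
```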
Using the dplyr helper functions, select all the variables that start with the letter ‘c’.
num_range function.
mutate
## # A tibble: 18,635 x 6
## congress chamber state party democrat num_democrat
## <int> <chr> <chr> <chr> <dbl> <dbl>
## 1 80 house TX D 1 10290
## 2 80 house NC D 1 10290
## 3 80 house IL D 1 10290
## 4 80 house NJ R 0 10290
## 5 80 house KY R 0 10290
## 6 80 house PA R 0 10290
## 7 80 house CA R 0 10290
## 8 80 house NY D 1 10290
## 9 80 house WI R 0 10290
## 10 80 house MA R 0 10290
## # ... with 18,625 more rows
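The democrat and num_democrat columns in the table above can be reproduced in spirit with mutate(). This is a sketch on hypothetical toy data, not necessarily the exact call used for the table.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  state = c("TX", "NC", "NJ", "VT"),
  party = c("D", "D", "R", "I"),
  stringsAsFactors = FALSE
)

# mutate() adds columns; a later column can use one created just before it.
with_flags <- mutate(
  toy_congress,
  democrat     = ifelse(party == "D", 1, 0),  # 1 for Democrats, 0 otherwise
  num_democrat = sum(democrat)                # same total repeated on every row
)
```

Because sum() returns a single value, mutate() recycles it down the whole column, which is why num_democrat is constant in the table above.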
Using the diamonds data (use ?diamonds for more information on the data), use the mutate function to calculate the price per carat. Hint: this operation would involve standardizing the price variable so that all are comparable at 1 carat.
Using mutate, calculate the rank of the original price variable and the new price variable calculated above using the min_rank function. Are there differences in the ranking of the prices?
summarise
## # A tibble: 1 x 1
## num_democrat
## <dbl>
## 1 10290
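In contrast to mutate(), summarise() collapses the data to a single row. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  party = c("D", "D", "R", "I"),
  stringsAsFactors = FALSE
)

# Create the 0/1 indicator first, then collapse to one summary row.
flagged <- mutate(toy_congress, democrat = ifelse(party == "D", 1, 0))
totals  <- summarise(flagged, num_democrat = sum(democrat))
```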
group_by
## # A tibble: 34 x 4
## congress num_democrat total prop_democrat
## <int> <dbl> <int> <dbl>
## 1 80 247 555 0.4450450
## 2 81 330 557 0.5924596
## 3 82 292 555 0.5261261
## 4 83 274 557 0.4919210
## 5 84 288 544 0.5294118
## 6 85 295 547 0.5393053
## 7 86 356 554 0.6425993
## 8 87 339 559 0.6064401
## 9 88 332 552 0.6014493
## 10 89 371 548 0.6770073
## # ... with 24 more rows
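The per-congress table above has one row per group, which is what summarise() returns after group_by(). The sketch below uses hypothetical toy data and mirrors the column names in that table.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 80, 80, 81, 81),
  party    = c("D", "R", "R", "D", "D"),
  stringsAsFactors = FALSE
)
toy_congress <- mutate(toy_congress, democrat = ifelse(party == "D", 1, 0))

# One output row per congress; n() counts the rows in each group.
by_congress <- summarise(
  group_by(toy_congress, congress),
  num_democrat  = sum(democrat),
  total         = n(),
  prop_democrat = num_democrat / total
)
```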
[Figure: plot of chunk trend2]
summarise command above to calculate these values.
If, instead of sum(democrat) above, we used mean(democrat), what does this value return? Why does it return this value?
group_by with mutate
group_by with mutate output
## # A tibble: 18,635 x 8
## # Groups: congress [34]
## congress chamber state party democrat num_democrat total prop_democrat
## <int> <chr> <chr> <chr> <dbl> <dbl> <int> <dbl>
## 1 80 house TX D 1 247 555 0.445045
## 2 80 house NC D 1 247 555 0.445045
## 3 80 house IL D 1 247 555 0.445045
## 4 80 house NJ R 0 247 555 0.445045
## 5 80 house KY R 0 247 555 0.445045
## 6 80 house PA R 0 247 555 0.445045
## 7 80 house CA R 0 247 555 0.445045
## 8 80 house NY D 1 247 555 0.445045
## 9 80 house WI R 0 247 555 0.445045
## 10 80 house MA R 0 247 555 0.445045
## # ... with 18,625 more rows
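The table above keeps all rows while repeating the group totals, which is how mutate() behaves on grouped data. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 80, 80, 81, 81),
  party    = c("D", "R", "R", "D", "D"),
  stringsAsFactors = FALSE
)
toy_congress <- mutate(toy_congress, democrat = ifelse(party == "D", 1, 0))

# Grouped mutate(): summaries are computed per group but every row is kept.
with_group_stats <- mutate(
  group_by(toy_congress, congress),
  num_democrat  = sum(democrat),
  total         = n(),
  prop_democrat = num_democrat / total
)
```

The result still has five rows; the summary columns are constant within each congress.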
%>% is the answer
pipe_congress <- congress_age %>%
  filter(congress >= 100) %>%
  mutate(democrat = ifelse(party == 'D', 1, 0)) %>%
  group_by(congress, chamber) %>%
  summarise(
    num_democrat = sum(democrat),
    total = n(),
    prop_democrat = num_democrat / total
  )
nested_congress <- summarise(
  group_by(
    mutate(
      filter(
        congress_age, congress >= 100
      ),
      democrat = ifelse(party == 'D', 1, 0)
    ),
    congress, chamber
  ),
  num_democrat = sum(democrat),
  total = n(),
  prop_democrat = num_democrat / total
)
identical(pipe_congress, nested_congress)
## [1] TRUE
summarise(
  group_by(
    mutate(
      filter(
        diamonds,
        color %in% c('D', 'E', 'F') & cut %in% c('Fair', 'Good', 'Very Good')
      ),
      f_color = ifelse(color == 'F', 1, 0),
      vg_cut = ifelse(cut == 'Very Good', 1, 0)
    ),
    clarity
  ),
  avg = mean(carat),
  sd = sd(carat),
  avg_p = mean(price),
  num = n(),
  summary_f_color = mean(f_color),
  summary_vg_cut = mean(vg_cut)
)
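One possible way to rewrite the nested call above with the pipe is sketched here. It assumes the ggplot2 package is installed so that ggplot2::diamonds is available; the piped_diamonds name is hypothetical.

```r
library(dplyr)

# Same steps as the nested version, read top to bottom instead of inside out.
piped_diamonds <- ggplot2::diamonds %>%
  filter(color %in% c('D', 'E', 'F') & cut %in% c('Fair', 'Good', 'Very Good')) %>%
  mutate(
    f_color = ifelse(color == 'F', 1, 0),
    vg_cut  = ifelse(cut == 'Very Good', 1, 0)
  ) %>%
  group_by(clarity) %>%
  summarise(
    avg = mean(carat),
    sd = sd(carat),
    avg_p = mean(price),
    num = n(),
    summary_f_color = mean(f_color),
    summary_vg_cut = mean(vg_cut)
  )
```

Each verb receives the result of the previous line as its first argument, so the order of the code matches the order in which the operations run.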
There is a set of functions that can greatly simplify data operations. These functions end with:
*_if
*_each
*_all
*_at
## # A tibble: 18,635 x 13
## congress CHAMBER BIOGUIDE FIRSTNAME MIDDLENAME LASTNAME SUFFIX
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 80 house M000112 Joseph Jefferson Mansfield <NA>
## 2 80 house D000448 Robert Lee Doughton <NA>
## 3 80 house S000001 Adolph Joachim Sabath <NA>
## 4 80 house E000023 Charles Aubrey Eaton <NA>
## 5 80 house L000296 William <NA> Lewis <NA>
## 6 80 house G000017 James A. Gallagher <NA>
## 7 80 house W000265 Richard Joseph Welch <NA>
## 8 80 house B000565 Sol <NA> Bloom <NA>
## 9 80 house H000943 Merlin <NA> Hull <NA>
## 10 80 house G000169 Charles Laceille Gifford <NA>
## # ... with 18,625 more rows, and 6 more variables: birthday <date>,
## # STATE <chr>, PARTY <chr>, incumbent <lgl>, termstart <date>, age <dbl>
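The upper-cased names of the character columns in the table above are consistent with a scoped verb such as rename_if(). The sketch below shows that idea on hypothetical toy data; it is an assumption about the call, not a record of it. In dplyr 1.0 and later these scoped verbs still work but are superseded by across().

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_congress <- data.frame(
  congress = c(80, 81),
  chamber  = c("house", "senate"),
  state    = c("TX", "NC"),
  age      = c(85.9, 83.2),
  stringsAsFactors = FALSE
)

# *_if: apply a function only to columns that satisfy a predicate.
upper_chr <- rename_if(toy_congress, is.character, toupper)

# The same pattern with mutate_if(): round every numeric column.
rounded <- mutate_if(toy_congress, is.numeric, round)
```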
## # A tibble: 43 x 5
## # Groups: species [?]
## species gender height mass birth_year
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Aleena male 79 15.00000 NaN
## 2 Besalisk male 198 102.00000 NaN
## 3 Cerean male 198 82.00000 92.0
## 4 Chagrian male 196 NaN NaN
## 5 Clawdite female 168 55.00000 NaN
## 6 Droid none 200 140.00000 15.0
## 7 Droid <NA> 120 46.33333 72.5
## 8 Dug male 112 40.00000 NaN
## 9 Ewok male 88 20.00000 8.0
## 10 Geonosian male 183 80.00000 NaN
## # ... with 33 more rows
## # A tibble: 43 x 9
## # Groups: species [38]
## species gender height_mean mass_mean birth_year_mean height_median
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Aleena male 79 15.00000 NaN 79
## 2 Besalisk male 198 102.00000 NaN 198
## 3 Cerean male 198 82.00000 92.0 198
## 4 Chagrian male 196 NaN NaN 196
## 5 Clawdite female 168 55.00000 NaN 168
## 6 Droid none 200 140.00000 15.0 200
## 7 Droid <NA> 120 46.33333 72.5 97
## 8 Dug male 112 40.00000 NaN 112
## 9 Ewok male 88 20.00000 8.0 88
## 10 Geonosian male 183 80.00000 NaN 183
## # ... with 33 more rows, and 3 more variables: mass_median <dbl>,
## # birth_year_median <dbl>, n <int>
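Grouped summaries over all numeric columns, like the two tables above, can be sketched with summarise_if(). The toy data is hypothetical. Passing a named list of functions assumes dplyr 0.8 or later; the funs() helper that older versions used has since been deprecated.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toy_species <- data.frame(
  species = c("Droid", "Droid", "Ewok", "Human", "Human"),
  height  = c(167, 96, 88, 172, 150),
  mass    = c(75, 32, 20, 77, NA),
  stringsAsFactors = FALSE
)

# Mean of every numeric column per species; na.rm is passed on to mean().
species_means <- toy_species %>%
  group_by(species) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

# Several functions at once: the list names become column suffixes.
species_stats <- toy_species %>%
  group_by(species) %>%
  summarise_if(is.numeric, list(mean = mean, median = median), na.rm = TRUE)
```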
## Parsed with column specification:
## cols(
## `Date / Time` = col_character(),
## City = col_character(),
## State = col_character(),
## Shape = col_character(),
## Duration = col_character(),
## Summary = col_character(),
## Posted = col_character()
## )
## # A tibble: 8,031 x 7
## `Date / Time` City State Shape Duration
## <chr> <chr> <chr> <chr> <chr>
## 1 12/12/14 17:30 North Wales PA Triangle 5 minutes
## 2 12/12/14 12:40 Cartersville GA Unknown 3.6 minutes
## 3 12/12/14 06:30 Isle of Man (UK/England) <NA> Light 2 seconds
## 4 12/12/14 01:00 Miamisburg OH Changing <NA>
## 5 12/12/14 00:00 Spotsylvania VA Unknown 1 minute
## 6 12/11/14 23:25 Kenner LA Chevron ~1 minute
## 7 12/11/14 23:15 Eugene OR Disk 2 minutes
## 8 12/11/14 20:04 Phoenix AZ Chevron 3 minutes
## 9 12/11/14 20:00 Franklin NC Disk 5 minutes
## 10 12/11/14 18:30 Longview WA Cylinder 10 seconds
## # ... with 8,021 more rows, and 2 more variables: Summary <chr>,
## # Posted <chr>
## Parsed with column specification:
## cols(
## `Date/Time` = col_character(),
## City = col_character(),
## State = col_character(),
## Shape = col_character(),
## Duration = col_character(),
## Summary = col_character(),
## Posted = col_character()
## )
## # A tibble: 8,031 x 7
## `Date/Time` City State Shape Duration
## <chr> <chr> <chr> <chr> <chr>
## 1 12/12/14 17:30 North Wales PA Triangle 5 minutes
## 2 12/12/14 12:40 Cartersville GA Unknown 3.6 minutes
## 3 12/12/14 06:30 Isle of Man (UK/England) <NA> Light 2 seconds
## 4 12/12/14 01:00 Miamisburg OH Changing <NA>
## 5 12/12/14 00:00 Spotsylvania VA Unknown 1 minute
## 6 12/11/14 23:25 Kenner LA Chevron ~1 minute
## 7 12/11/14 23:15 Eugene OR Disk 2 minutes
## 8 12/11/14 20:04 Phoenix AZ Chevron 3 minutes
## 9 12/11/14 20:00 Franklin NC Disk 5 minutes
## 10 12/11/14 18:30 Longview WA Cylinder 10 seconds
## # ... with 8,021 more rows, and 2 more variables: Summary <chr>,
## # Posted <chr>
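A self-contained sketch of reading a CSV file with readr. To avoid depending on the tutorial's data file, it first writes a two-row temporary file whose rows are copied from the output above; the csv_path and ufo_toy names are hypothetical.

```r
library(readr)

# Write a small temporary CSV so the example does not need an external file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date / Time,City,State,Shape",
  "12/12/14 17:30,North Wales,PA,Triangle",
  "12/12/14 12:40,Cartersville,GA,Unknown"
), csv_path)

# read_csv() guesses a type for each column and reports its guesses.
ufo_toy <- read_csv(csv_path)
```

Column names that are not valid R names, such as `Date / Time`, are kept as they are and need backticks when referenced in code.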
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 56 parsing failures.
## row col         expected                  actual   file
## 119 Date / Time date like %m/%d/%y %H:%M  12/1/14  'Data/ufo.csv'
## 194 Date / Time date like %m/%d/%y %H:%M  11/27/14 'Data/ufo.csv'
## 236 Date / Time date like %m/%d/%y %H:%M  11/24/14 'Data/ufo.csv'
## 407 Date / Time date like %m/%d/%y %H:%M  11/15/14 'Data/ufo.csv'
## 665 Date / Time date like %m/%d/%y %H:%M  10/31/14 'Data/ufo.csv'
## ...
## See problems(...) for more details.
## # A tibble: 8,031 x 7
## `Date / Time` City State Shape Duration
## <dttm> <chr> <chr> <chr> <chr>
## 1 2014-12-12 17:30:00 North Wales PA Triangle 5 minutes
## 2 2014-12-12 12:40:00 Cartersville GA Unknown 3.6 minutes
## 3 2014-12-12 06:30:00 Isle of Man (UK/England) <NA> Light 2 seconds
## 4 2014-12-12 01:00:00 Miamisburg OH Changing <NA>
## 5 2014-12-12 00:00:00 Spotsylvania VA Unknown 1 minute
## 6 2014-12-11 23:25:00 Kenner LA Chevron ~1 minute
## 7 2014-12-11 23:15:00 Eugene OR Disk 2 minutes
## 8 2014-12-11 20:04:00 Phoenix AZ Chevron 3 minutes
## 9 2014-12-11 20:00:00 Franklin NC Disk 5 minutes
## 10 2014-12-11 18:30:00 Longview WA Cylinder 10 seconds
## # ... with 8,021 more rows, and 2 more variables: Summary <chr>,
## # Posted <chr>
## # A tibble: 56 x 5
## row col expected actual file
## <int> <chr> <chr> <chr> <chr>
## 1 119 Date / Time date like %m/%d/%y %H:%M 12/1/14 'Data/ufo.csv'
## 2 194 Date / Time date like %m/%d/%y %H:%M 11/27/14 'Data/ufo.csv'
## 3 236 Date / Time date like %m/%d/%y %H:%M 11/24/14 'Data/ufo.csv'
## 4 407 Date / Time date like %m/%d/%y %H:%M 11/15/14 'Data/ufo.csv'
## 5 665 Date / Time date like %m/%d/%y %H:%M 10/31/14 'Data/ufo.csv'
## 6 797 Date / Time date like %m/%d/%y %H:%M 10/25/14 'Data/ufo.csv'
## 7 946 Date / Time date like %m/%d/%y %H:%M 10/19/14 'Data/ufo.csv'
## 8 1081 Date / Time date like %m/%d/%y %H:%M 10/14/14 'Data/ufo.csv'
## 9 1122 Date / Time date like %m/%d/%y %H:%M 10/12/14 'Data/ufo.csv'
## 10 1123 Date / Time date like %m/%d/%y %H:%M 10/12/14 'Data/ufo.csv'
## # ... with 46 more rows
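The parsing failures above arise when a value lacks the time part that the datetime format expects. The sketch below reproduces that situation on a small temporary file. The file contents and object names are hypothetical, and the warning is suppressed only to keep the example quiet.

```r
library(readr)

# Temporary CSV: the second row has a date but no time.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date / Time,City",
  "12/12/14 17:30,North Wales",
  "12/1/14,Cartersville"
), csv_path)

# Ask readr to parse the first column as a datetime with an explicit format.
ufo_toy <- suppressWarnings(read_csv(
  csv_path,
  col_types = cols(
    `Date / Time` = col_datetime(format = "%m/%d/%y %H:%M"),
    City = col_character()
  )
))

# Rows that did not match the format become NA and are listed here.
parse_issues <- problems(ufo_toy)
```

Values that fail to parse are stored as NA rather than stopping the import, so it is worth checking problems() after reading.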
read_tsv
read_fwf
read_table
read_delim
file.choose(). For example, read_tsv(file.choose()).
read_excel
## # A tibble: 891 x 12
## PassengerId Survived Pclass
## <dbl> <dbl> <dbl>
## 1 1 0 3
## 2 2 1 1
## 3 3 1 3
## 4 4 1 1
## 5 5 0 3
## 6 6 0 3
## 7 7 0 1
## 8 8 0 3
## 9 9 1 3
## 10 10 1 2
## # ... with 881 more rows, and 9 more variables: Name <chr>, Sex <chr>,
## # Age <dbl>, SibSp <dbl>, Parch <dbl>, Ticket <chr>, Fare <dbl>,
## # Cabin <chr>, Embarked <chr>