This function takes a tibble and a specific column. Each value in the column is evaluated in turn, and the function returns the best-matching date format for the column as a whole. The best match is chosen among seven different formats provided by the lubridate library. The output tibble reports the format together with the percentage of values that match it. The best-matching format can then be used to convert the column with as_any_date(). The default format is yyyy-mm-dd.

guess_date_format(tbl, col = NULL)

Arguments

tbl

R object (data frame or tibble) to be evaluated

col

A character string naming the column of interest

Value

A tibble describing the best-matching date format for the evaluated object.

Details

Unlike lubridate or as.Date(), the function evaluates the column as a whole and does not cast the column if the values remain ambiguous. For example, in ('19-07-1983', '02-03-1982') the first element can only be read as day-month-year, because 19 cannot be a month. Applying that same order to the second element means that 02 refers to the day and 03 to the month.
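The whole-column reasoning described above can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the `candidates` vector and the `pct_match` name are hypothetical, and only two formats are tested rather than the seven the function considers.

```r
# The first value is only valid as day-month-year; the second is ambiguous
# when read on its own.
x <- c("19-07-1983", "02-03-1982")

# Hypothetical candidate formats for this sketch (base R strptime syntax).
candidates <- c(dmy = "%d-%m-%Y", mdy = "%m-%d-%Y")

# Percentage of values that each candidate format parses successfully.
# as.Date() returns NA when a value does not fit the format.
pct_match <- sapply(candidates, function(fmt) {
  100 * mean(!is.na(as.Date(x, format = fmt)))
})

pct_match

# Keep only the formats that fit every value in the column.
names(pct_match)[pct_match == 100]
```

Here only the day-month-year order fits both values, so the ambiguity in the second element is resolved by the first.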

Examples

{

library(tidyr)

##### Example 1 -------------------------------------------------------------
# Non-ambiguous dates ----------------------------------------------------
time <-
  tibble(time = c(
  "1983-07-19",
  "2003-01-14",
  "2010-09-29",
  "2023-12-12",
  "2009-09-03",
  "1509-11-30",
  "1809-01-01"))
guess_date_format(time)

##### Example 2 -------------------------------------------------------------
# Ambiguous dates ----------------------------------------------------
time <-
  tibble(time = c(
  "1983-19-07",
  "1983-10-13",
  "2009-09-03",
  "1509-11-30"))
guess_date_format(time)


##### Example 3 -------------------------------------------------------------
# Values that are not in a date format ------------------------------------
time <-
  tibble(time = c(
  "198-07-19",
  "200-01-14",
  "201-09-29",
  "202-12-12",
  "2000-09-03",
  "150-11-3d0",
  "180-01-01"))
guess_date_format(time)

}
#> # A tibble: 1 × 4
#>   name_var `Date format` `% values formated` `Date match`   
#>   <chr>    <chr>                       <dbl> <chr>          
#> 1 time     ymd, ydm                     14.3 Ambiguous match