class: center

<br>

# {Sknifedatar} 📦

**A package for modeling and visualizing multiple time series**

<br>

<img src="images/logo_latinr.png" width="15%" height="15%" style="display: block; margin: auto;" />

<br>

Rafael Zambrano & Karina Bartolomé

October 2022

---

<style type="text/css">
/* Table width = 100% max-width */
.remark-slide table{width: 100%;}

/* Change the background color to white for shaded rows (even rows) */
.remark-slide thead, .remark-slide tr:nth-child(2n) {
  background-color: white;
  .tfoot .td {background-color: white}
}

.bold-last-item > ul > li:last-of-type,
.bold-last-item > ol > li:last-of-type {font-weight: bold;}
</style>

# Who are we?

.pull-left[
### Rafael Zambrano

- **Actuary / Data Scientist**
]

.pull-right[
<br>
<br>
<img src="images/imagen_b.jpeg" width="35%" style="display: block; margin: auto;" />
]

.pull-left[
### Karina Bartolomé

- **Economist / Data Scientist**
]

.pull-right[
<br>
<br>
<img src="images/imagen_a.jpeg" width="35%" style="display: block; margin: auto;" />
]

---

## The {tidymodels-modeltime} ecosystem

{modeltime} was developed by **Matt Dancho** to analyze time series with a tidy approach built on {tidymodels} 📦.

<img src="images/modeltime.png" width="90%" height="70%" style="display: block; margin: auto;" />

---

# {sknifedatar} 📦

How did sknifedatar come about?
--

#### An extension of **{modeltime}**

```r
install.packages('sknifedatar')
```

<img src="images/sknifedatar.png" width="25%" height="35%" style="display: block; margin: auto;" />

---

# {sknifedatar} 📦

### Features

- **Multiple fits**: {modeltime} for multiple time series (without panel data)

- **Workflowsets adapted to time series modeling**: multiple models and preprocessing recipes with {tidymodels}

- **Automagic tabs**: automatic generation of tabs to present results (rmarkdown)

---

# Goal of the talk

Present the **{sknifedatar}** 📦 package for modeling and visualizing time series, using public transport trips in Argentina (2021-2022) as a use case.

We show its compatibility with several components of the **{tidymodels}** ecosystem, such as **{modeltime}** and **{workflowsets}**.

---

# Agenda

#### ✅ Use case

#### ✅ Multiple models on one series

#### ✅ Multiple models on multiple series

#### ✅ Bonus track

---

class: chapter-slide

# Use case

---

# Use case

.pull-left[
<br>
<br>
<br>
<br>
<br>
<br>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/14/SUBE_frente.svg/413px-SUBE_frente.svg.png" width="35%" style="display: block; margin: auto;" />
]

.pull-right[
<br>

- The **Sistema Único de Boleto Electrónico (SUBE)** is a card used to pay for public transport in Argentina.

- Every trip paid with it is recorded and published on an **open data portal**.

- The goal is to **forecast** how much several lines will be used over the coming days or months.
]

---

# Use case

### Data 📊

We use the [SUBE - Cantidad de transacciones (usos) por fecha ](https://datos.gob.ar/dataset/transporte-sube---cantidad-transacciones-usos-por-fecha) dataset (number of transactions, i.e. uses, per date).
--

```r
library(tidyverse)  # read_csv(), rename() and the %>% pipe
data <- read_csv('data/df_sube.csv') %>%
  rename(date = fecha, value = n)
```

--

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> date </th>
   <th style="text-align:left;"> linea </th>
   <th style="text-align:right;"> value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:left;"> BSAS_LINEA_203 </td>
   <td style="text-align:right;"> 20189 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:left;"> BSAS_LINEA_501G </td>
   <td style="text-align:right;"> 25760 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:left;"> FFCC ROCA </td>
   <td style="text-align:right;"> 25576 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:left;"> FFCC SAR </td>
   <td style="text-align:right;"> 4049 </td>
  </tr>
</tbody>
</table>

---

# Use case

* **How the series evolve 📈**

```r
library(timetk)  # plot_time_series()
data %>%
  group_by(linea) %>%
  plot_time_series(date, value)
```

![](slides_files/figure-html/unnamed-chunk-16-1.png)

---

class: chapter-slide

# Multiple models on one series

<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>

<span style="color:white">Introduction to modeltime</span>

---

# Multiple models on one series

* **Data selection**: we keep only the **Ferrocarril Roca** (`FFCC ROCA`) railway line

```r
data_ffcc_roca <- data %>%
  filter(linea == 'FFCC ROCA') %>%
  select(-linea) %>%
  ungroup()
```

```r
data_ffcc_roca %>%
  head(5) %>%
  kableExtra::kable(format = "html")
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> date </th>
   <th style="text-align:right;"> value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:right;"> 25576 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-02 </td>
   <td style="text-align:right;"> 112401 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-03 </td>
   <td style="text-align:right;"> 58265 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-04 </td>
   <td style="text-align:right;"> 231781 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2021-01-05 </td>
   <td style="text-align:right;"> 233965 </td>
  </tr>
</tbody>
</table>

---

## Multiple models on one series

* **Data split**: the dataset is split into train and test sets ✂️

```r
library(tidymodels)  # rsample::initial_time_split()
splits <- data_ffcc_roca %>%
  initial_time_split(prop = 0.9)
```

```r
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .title = 'Time-based split')
```
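---

`initial_time_split(prop = 0.9)` keeps the series in time order: the first 90% of the rows become the training set and the most recent 10% the test set, with no shuffling. Below is a minimal base-R sketch of that idea, not rsample's code. It assumes the training size is `floor(n * prop)`, sorts by date first (rsample instead assumes the rows already arrive in time order), and uses a hypothetical toy data frame in place of `data_ffcc_roca`.

```r
# Hypothetical daily series standing in for data_ffcc_roca (synthetic values)
toy <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "day", length.out = 100),
  value = seq_len(100)
)

# Chronological split: no shuffling, the most recent rows go to the test set
time_split <- function(df, prop = 0.9) {
  df <- df[order(df$date), ]          # sort by date (rsample assumes this is already done)
  n_train <- floor(nrow(df) * prop)   # assumption: training size = floor(n * prop)
  in_train <- seq_len(nrow(df)) <= n_train
  list(training = df[in_train, ], testing = df[!in_train, ])
}

splits_toy <- time_split(toy, prop = 0.9)
nrow(splits_toy$training)  # 90 rows: 2021-01-01 to 2021-03-31
nrow(splits_toy$testing)   # 10 rows: 2021-04-01 to 2021-04-10
```

Keeping the test set strictly after the training set mimics the real forecasting situation, where the model never sees the future it is evaluated on.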
---

## Multiple models on one series

* **Preprocessing / Recipes 🧁**: we create a preprocessing recipe that holds the formula to estimate and an extra step that adds date-based features (week, month, year, quarter and semester).

```r
receta <- recipe(value ~ date, data = training(splits)) %>%
  step_date(date, features = c('week', 'month', 'year', 'quarter', 'semester'))
```
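<table>
 <thead>
  <tr>
   <th style="text-align:left;"> date </th>
   <th style="text-align:right;"> value </th>
   <th style="text-align:right;"> date_week </th>
   <th style="text-align:left;"> date_month </th>
   <th style="text-align:right;"> date_year </th>
   <th style="text-align:right;"> date_quarter </th>
   <th style="text-align:right;"> date_semester </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 2021-01-01 </td>
   <td style="text-align:right;"> 25576 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> Jan </td>
   <td style="text-align:right;"> 2021 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>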
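---

To make the new columns concrete, here is a base-R approximation of the five features requested from `step_date()`. It is a sketch, not the recipes implementation: it assumes `week` means the number of complete seven-day periods since 1 January plus one (lubridate's `week()` definition), and it returns month abbreviations as a plain factor.

```r
# Base-R sketch of the date features requested from step_date():
# week, month, year, quarter, semester
date_features <- function(date) {
  lt <- as.POSIXlt(date)
  m  <- lt$mon + 1L                         # month number 1..12
  data.frame(
    date          = date,
    date_week     = lt$yday %/% 7L + 1L,    # assumption: complete 7-day periods since Jan 1, plus one
    date_month    = factor(month.abb[m], levels = month.abb),  # "Jan", "Feb", ...
    date_year     = lt$year + 1900L,
    date_quarter  = (m - 1L) %/% 3L + 1L,   # Jan-Mar = 1, ..., Oct-Dec = 4
    date_semester = ifelse(m <= 6L, 1L, 2L) # Jan-Jun = 1, Jul-Dec = 2
  )
}

feats <- date_features(as.Date(c("2021-01-01", "2021-07-15")))
feats
```

The first row reproduces the values shown in the table for 2021-01-01 (week 1, Jan, 2021, quarter 1, semester 1); the second falls in week 28, quarter 3 and semester 2.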
---

## Multiple models on one series

* **Models**: defining the models and fitting them on the training set

```r
# Model: Auto-ARIMA
library(modeltime)  # arima_reg(), prophet_boost(), modeltime_table()
m_autoarima <- arima_reg() %>%
  set_engine('auto_arima') %>%
  fit(value ~ date, data = training(splits))
```

```r
# Model: linear regression on a trend plus month and weekday effects
library(lubridate)  # month(), wday()
m_reg_lineal <- linear_reg() %>%
  set_engine("lm") %>%
  fit(value ~ as.numeric(date) +
        factor(month(date, label = TRUE), ordered = FALSE) +
        factor(wday(date, label = TRUE), ordered = FALSE),
      data = training(splits))
```

```r
# Workflow: boosted Prophet (Prophet with XGBoost fitted on its errors), using the recipe
m_prophet_boost <- workflow() %>%
  add_recipe(receta) %>%
  add_model(prophet_boost(mode = 'regression') %>%
              set_engine("prophet_xgboost")) %>%
  fit(data = training(splits))
```

---

## Multiple models on one series

* **Modeltime Table**

The central object of the **{modeltime}** 📦 ecosystem is the **modeltime_table**, which holds all the trained models so they can be compared.

```r
modelos <- modeltime_table(
  m_autoarima,
  m_reg_lineal,
  m_prophet_boost
)
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[".model_id"],"name":[1],"type":["int"],"align":["right"]},{"label":[".model"],"name":[2],"type":["list"],"align":["right"]},{"label":[".model_desc"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"1","2":"<S3: _auto_arima_fit_impl>","3":"ARIMA(2,0,1)(0,1,2)[7] WITH DRIFT"},{"1":"2","2":"<S3: _lm>","3":"LM"},{"1":"3","2":"<S3: workflow>","3":"PROPHET W/ XGBOOST ERRORS"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[6],"max":[6]},"pages":{}}}
  </script>
</div>

---

## Multiple models on one series

* **Test-set metrics**: each model is calibrated on the test partition and its errors are summarized

```r
calibration_table <- modelos %>%
  modeltime_calibrate(new_data = testing(splits))
```

--

```r
calibration_table %>%
  modeltime_accuracy()
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[".model_id"],"name":[1],"type":["int"],"align":["right"]},{"label":[".model_desc"],"name":[2],"type":["chr"],"align":["left"]},{"label":["mae"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["mape"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["mase"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["smape"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["rmse"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["rsq"],"name":[8],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"ARIMA(2,0,1)(0,1,2)[7] WITH DRIFT","3":"43069","4":"19847","5":"0.34","6":"16","7":"83130","8":"0.74"},{"1":"2","2":"LM","3":"49918","4":"19166","5":"0.39","6":"20","7":"88685","8":"0.73"},{"1":"3","2":"PROPHET W/ XGBOOST ERRORS","3":"56434","4":"19147","5":"0.45","6":"22","7":"93702","8":"0.75"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[6],"max":[6]},"pages":{}}}
  </script>
</div>

---

## Multiple models on one series

* **Forecast on the test set**

```r
forecast_series <- calibration_table %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = data_ffcc_roca
  )
```

---

## Multiple models on one series

* **Forecast on the test set**

```r
forecast_series %>%
  plot_modeltime_forecast()
```
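---

`modeltime_accuracy()` reports several error metrics at once. As a reference, the sketch below writes four of them with their standard textbook definitions in base R and then keeps the model with the lowest MAE. yardstick's implementations may differ in details (for example, how zeros are handled), and the toy values and model names are hypothetical. Because MAPE divides by the observed value, observations close to zero can dominate it.

```r
# Standard point-forecast error metrics (percentage metrics scaled to 0-100)
mae   <- function(truth, estimate) mean(abs(truth - estimate))
rmse  <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
mape  <- function(truth, estimate) mean(abs((truth - estimate) / truth)) * 100
smape <- function(truth, estimate) {
  mean(abs(estimate - truth) / ((abs(truth) + abs(estimate)) / 2)) * 100
}

# Toy test-set values and two hypothetical forecasts
truth <- c(100, 200, 300, 400)
fc_a  <- c(110, 190, 310, 390)   # four errors of 10
fc_b  <- c(100, 250, 300, 350)   # errors of 0, 50, 0, 50

scores <- data.frame(
  model = c("A", "B"),
  mae   = c(mae(truth, fc_a),  mae(truth, fc_b)),
  rmse  = c(rmse(truth, fc_a), rmse(truth, fc_b))
)
scores
scores$model[which.min(scores$mae)]   # model with the lowest MAE: "A"

# A single near-zero actual value dominates MAPE:
mape(c(1, 100), c(11, 100))           # 500
```

RMSE squares the errors before averaging, so model B, with two large misses, is penalized more under RMSE (about 35.4) than under MAE (25).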
---

## Multiple models on one series

* **Refit and forecast (next 2 months)**

```r
refit_tbl <- calibration_table %>%
  filter(.model_id %in% c(1)) %>%  # keep model 1 (Auto-ARIMA), which has the lowest test MAE and RMSE
  modeltime_refit(data = data_ffcc_roca)
```

```r
forecast_final <- refit_tbl %>%
  modeltime_forecast(
    actual_data = data_ffcc_roca,
    h = '2 months'
  )
```

---

## Multiple models on one series

* **Visualizing the 2-month forecast**

```r
forecast_final %>%
  plot_modeltime_forecast()
```

![](slides_files/figure-html/unnamed-chunk-38-1.png)

---

class: chapter-slide

# Multiple models on multiple series

<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>

<span style="color:white">Introduction to sknifedatar</span>

---

## Multiple models - multiple series

* **Data selection**: one nested data frame per line (`linea`)

```r
nest_data <- data %>%
  ungroup() %>%
  nest(nested_column = -linea)
```
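---

With the data nested, each line can be processed on its own. The base-R sketch below mimics the overall pattern that `modeltime_multifit()` and `modeltime_multibestmodel()` automate: split each series in time order, score every model on its test part, and keep the lowest-MAE model per series. It is not sknifedatar's implementation; the data, the line names and the two deliberately simple forecasters are hypothetical stand-ins for real models.

```r
# Toy long-format data standing in for `data`: two hypothetical series
long <- data.frame(
  linea = rep(c("LINE_A", "LINE_B"), each = 20),
  date  = rep(seq(as.Date("2021-01-01"), by = "day", length.out = 20), times = 2),
  value = c(100 + 1:20,                           # LINE_A: steady upward trend
            rep(c(50, 60, 70), length.out = 20))  # LINE_B: 3-day cycle
)

# Base-R analogue of nest(nested_column = -linea): one data frame per series
nested <- split(long[, c("date", "value")], long$linea)

# Two simple forecasters (hypothetical stand-ins for real models)
forecasters <- list(
  naive = function(train, h) rep(tail(train$value, 1), h),  # repeat last value
  mean  = function(train, h) rep(mean(train$value), h)      # repeat training mean
)

# For each series: chronological 90/10 split, test MAE per model, keep the best
best_per_series <- sapply(nested, function(df) {
  df <- df[order(df$date), ]
  n_train <- floor(nrow(df) * 0.9)
  train <- df[seq_len(n_train), ]
  test  <- df[seq_len(nrow(df)) > n_train, ]
  maes <- sapply(forecasters, function(f) mean(abs(test$value - f(train, nrow(test)))))
  names(which.min(maes))
})

best_per_series   # LINE_A = "naive", LINE_B = "mean"
```

The trending toy series favours the naive forecaster and the cyclic one favours the mean, which illustrates why choosing a model separately for each series can pay off.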
---

## Multiple models - multiple series

* **Preprocessing recipe**

```r
receta <- recipe(value ~ date, data = data %>% select(-linea)) %>%
  step_date(
    date,
    features = c('week', 'month', 'year', 'quarter', 'semester'))
```

---

## Multiple models - multiple series

* **Model definitions**

```r
# Model: TBATS
m_tbats <- seasonal_reg() %>%
  set_engine("tbats")
```

```r
# Model: STLM + ARIMA (seasonal decomposition followed by ARIMA)
m_seasonal <- seasonal_reg() %>%
  set_engine("stlm_arima")
```

```r
# Workflow: boosted Prophet, using the recipe
m_prophet_boost <- workflow() %>%
  add_recipe(receta) %>%
  add_model(
    prophet_boost(mode = 'regression') %>%
      set_engine("prophet_xgboost")
  )
```

---

## Multiple models - multiple series

* **Modeltime Table Multifit**

```r
library(sknifedatar)  # modeltime_multifit() and the other multi-series helpers
model_table <- modeltime_multifit(
  serie = nest_data,
  .prop = 0.9,
  m_tbats,
  m_seasonal,
  m_prophet_boost
)
```

---

## Multiple models - multiple series

* **Metrics on the evaluation (test) partition**

```r
model_table$models_accuracy
```

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":["name_serie"],"name":[1],"type":["chr"],"align":["left"]},{"label":[".model_id"],"name":[2],"type":["int"],"align":["right"]},{"label":[".model_desc"],"name":[3],"type":["chr"],"align":["left"]},{"label":[".type"],"name":[4],"type":["chr"],"align":["left"]},{"label":["mae"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["mape"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["mase"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["smape"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["rmse"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["rsq"],"name":[10],"type":["dbl"],"align":["right"]}],"data":[{"1":"BSAS_LINEA_203","2":"1","3":"TBATS(1, {0,0}, -, {<7,3>})","4":"Test","5":"8877","6":"10.0","7":"0.53","8":"9.0","9":"12399","10":"0.73"},{"1":"BSAS_LINEA_203","2":"2","3":"SEASONAL DECOMP: ARIMA(2,1,1)","4":"Test","5":"8897","6":"9.4","7":"0.54","8":"8.5","9":"12450","10":"0.73"},{"1":"BSAS_LINEA_203","2":"3","3":"PROPHET W/ XGBOOST ERRORS","4":"Test","5":"16928","6":"18.2","7":"1.02","8":"15.9","9":"19774","10":"0.65"},{"1":"BSAS_LINEA_501G","2":"1","3":"TBATS(1, {0,0}, -, {<7,2>})","4":"Test","5":"14809","6":"15.7","7":"0.52","8":"12.9","9":"21342","10":"0.68"},{"1":"BSAS_LINEA_501G","2":"2","3":"SEASONAL DECOMP: ARIMA(1,1,2)","4":"Test","5":"12353","6":"11.8","7":"0.43","8":"9.7","9":"19708","10":"0.73"},{"1":"BSAS_LINEA_501G","2":"3","3":"PROPHET W/ XGBOOST ERRORS","4":"Test","5":"22197","6":"21.7","7":"0.78","8":"17.5","9":"28177","10":"0.69"},{"1":"FFCC ROCA","2":"1","3":"TBATS(1, {2,0}, -, {<7,3>})","4":"Test","5":"49290","6":"18685.4","7":"0.39","8":"19.7","9":"83561","10":"0.74"},{"1":"FFCC ROCA","2":"2","3":"SEASONAL DECOMP: ARIMA(1,1,1)","4":"Test","5":"42667","6":"18311.3","7":"0.34","8":"13.8","9":"77579","10":"0.73"},{"1":"FFCC ROCA","2":"3","3":"PROPHET W/ XGBOOST ERRORS","4":"Test","5":"56434","6":"19147.2","7":"0.45","8":"22.1","9":"93702","10":"0.75"},{"1":"FFCC SAR","2":"1","3":"TBATS(0.26, {0,0}, -, {<7,3>})","4":"Test","5":"23446","6":"22340.4","7":"0.28","8":"16.4","9":"48855","10":"0.71"},{"1":"FFCC SAR","2":"2","3":"SEASONAL DECOMP: ARIMA(3,1,3)","4":"Test","5":"24818","6":"22542.7","7":"0.30","8":"17.2","9":"49441","10":"0.71"},{"1":"FFCC SAR","2":"3","3":"PROPHET W/ XGBOOST ERRORS","4":"Test","5":"37468","6":"23751.8","7":"0.45","8":"30.9","9":"61321","10":"0.69"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[6],"max":[6]},"pages":{}}}
  </script>
</div>

---

## Multiple models - multiple series

* **Forecast on the evaluation (test) partition**

```r
forecast_series <- modeltime_multiforecast(
  models_table = model_table$table_time,
  .prop = 0.9
)
```

---

```r
forecast_series %>%
  select(linea, nested_forecast) %>%
  unnest(nested_forecast) %>%
  group_by(linea) %>%
  plot_modeltime_forecast()
```
![](slides_files/figure-html/unnamed-chunk-51-1.png)

---

## Multiple models - multiple series

* **Selecting the best model for each series**

```r
best_models <- modeltime_multibestmodel(
  .table  = forecast_series,
  .metric = "mae"
)
```

---

```r
best_models %>%
  select(linea, nested_forecast) %>%
  unnest(nested_forecast) %>%
  group_by(linea) %>%
  plot_modeltime_forecast()
```

![](slides_files/figure-html/unnamed-chunk-54-1.png)

---

## Multiple models - multiple series

* **Refit and forecast (next 2 months)**

```r
models_refit <- modeltime_multirefit(best_models)
```

```r
forecast_final <- models_refit %>%
  modeltime_multiforecast(.h = "2 months")
```

```r
forecast_final %>%
  select(linea, nested_forecast) %>%
  unnest(nested_forecast) %>%
  group_by(linea) %>%
  plot_modeltime_forecast(.interactive = FALSE)
```

---

## Multiple models - multiple series

* **Refit and forecast (next 2 months)**

![](slides_files/figure-html/unnamed-chunk-58-1.png)

---

class: chapter-slide

# Bonus track: workflowsets

---

## {workflowsets} on time series

<br>
<br>

<img src="images/diagrama_wfs.png" width="6233" style="display: block; margin: auto;" />

---

class: chapter-slide

# Bonus track: automagic tabs

---

# Why use tabs? 🤔

Showing many plots 📈 or model results 🤖 together can be confusing. Organizing the results into tabs keeps attention on one aspect at a time and avoids information overload.

.panelset[
.panel[.panel-name[👋 Hey!]
This is the first tab 🌟

Click on the tabs for some unsolicited advice 🌟 👆
]

.panel[.panel-name[Tip 1]
<img src="https://media.tenor.com/images/be8a87467b75e9deaa6cfe8ad0b739a0/tenor.gif" width="50%" style="display: block; margin: auto;" />
]

.panel[.panel-name[Tip 2]
<img src="https://media.tenor.com/images/6a2cca305dfacae61c5668dd1687ad55/tenor.gif" width="50%" style="display: block; margin: auto;" />
]

.panel[.panel-name[Tip 3]
<img src="https://media.tenor.com/images/bfde5ad652b71fc9ded82c6ed760355b/tenor.gif" width="50%" style="display: block; margin: auto;" />
]
]

---

## How are tabs created manually?

<img src="https://karbartolome-blog.netlify.app/posts/automagictabs/data/tabs.png" width="70%" style="display: block; margin: auto;" />

---

## Automatic tab generation 🙌

👉 **Inline code**, using a nested data frame that contains a column with the result to show in each tab (`.plot`) and a grouping column (`Species`).

![](images/automagic_tabs_gif.gif)

---

## Contact ✉

Karina Bartolome

[![Twitter Badge](https://img.shields.io/badge/-@karbartolome-1ca0f1?style=flat&labelColor=1ca0f1&logo=twitter&logoColor=white&link=https://twitter.com/karbartolome)](https://twitter.com/karbartolome)
[![Linkedin Badge](https://img.shields.io/badge/-karina%20bartolome-blue?style=flat&logo=Linkedin&logoColor=white&link=https://www.linkedin.com/in/karinabartolome/)](https://www.linkedin.com/in/karinabartolome/)
[![Github Badge](https://img.shields.io/badge/-karbartolome-black?style=flat&logo=Github&logoColor=white&link=https://github.com/karbartolome)](https://github.com/karbartolome)
[![Website Badge](https://img.shields.io/badge/-Personal%20blog-47CCCC?style=flat&logo=Google-Chrome&logoColor=white&link=https://karbartolome-blog.netlify.app/)](https://karbartolome-blog.netlify.app/)

Rafael Zambrano

[![Twitter Badge](https://img.shields.io/badge/-@rafa_zamr-1ca0f1?style=flat&labelColor=1ca0f1&logo=twitter&logoColor=white&link=https://twitter.com/rafa_zamr)](https://twitter.com/rafa_zamr)
[![Linkedin Badge](https://img.shields.io/badge/-rafael%20zambrano-blue?style=flat&logo=Linkedin&logoColor=white&link=https://www.linkedin.com/in/rafael-zambrano/)](https://www.linkedin.com/in/rafael-zambrano/)
[![Github Badge](https://img.shields.io/badge/-rafzamb-black?style=flat&logo=Github&logoColor=white&link=https://github.com/rafzamb)](https://github.com/rafzamb)
[![Website Badge](https://img.shields.io/badge/-Personal%20blog-47CCCC?style=flat&logo=Google-Chrome&logoColor=white&link=https://rafael-zambrano-blog-ds.netlify.app/)](https://rafael-zambrano-blog-ds.netlify.app/)

## {Sknifedatar} 📦

[Gitpage](https://rafzamb.github.io/sknifedatar/)

[Workflowsets on multiple series](https://rafzamb.github.io/sknifedatar/articles/workflowsets_multi_times.html)

[Automagic tabs](https://rafzamb.github.io/sknifedatar/articles/automatic_tabs.html)

---

class: chapter-slide

# Thank you very much!!!