Title: | Implementation of Flag Aggregation |
---|---|
Description: | Three methods are implemented in R to facilitate the aggregation of flags in official statistics. From the underlying flags, the one that is highest in the hierarchy, the most frequent, or has the highest total weight is propagated to the flag(s) of EU or other aggregates. Reference documents on the topic: <https://sdmx.org/wp-content/uploads/CL_OBS_STATUS_v2_1.docx>, <https://sdmx.org/wp-content/uploads/CL_CONF_STATUS_1_2_2018.docx>, <http://ec.europa.eu/eurostat/data/database/information>, <http://www.oecd.org/sdd/33869551.pdf>, <https://sdmx.org/wp-content/uploads/CL_OBS_STATUS_implementation_20-10-2014.pdf>. |
Authors: | Mátyás Mészáros [aut, cre], Matteo Salvati [aut] |
Maintainer: | Mátyás Mészáros <[email protected]> |
License: | EUPL-1.1 |
Version: | 0.3.2 |
Built: | 2024-10-27 05:50:30 UTC |
Source: | https://github.com/cran/flagr |
This function is used when a single value has multiple flags. The same weight is repeated for each single character.
flag_divide(x)
x |
A vector with two items. The first item is a string of flags with several characters, the second is a single numerical value of the weight. |
flag_divide returns a character matrix with the flags as single characters in the first column and the weight repeated in the second column. The number of rows is equal to the length of the string of flags.
flags <- tidyr::spread(test_data[, c(1:3)], key = time, value = flags)
weights <- tidyr::spread(test_data[, c(1, 3:4)], key = time, value = values)
input <- as.data.frame(cbind(flags[, 5], weights[, 5]), stringsAsFactors = FALSE)[!is.na(flags[, 5]), ]
do.call(rbind, apply(input, 1, flag_divide))
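The splitting described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `divide_sketch` is a hypothetical name, and the input is assumed to be a two-item character vector (flag string, weight), as in the argument description.

```r
# Hypothetical base-R sketch; not the flagr implementation.
divide_sketch <- function(x) {
  # Split the flag string into single characters.
  flag_chars <- strsplit(x[1], "")[[1]]
  # Repeat the same weight once per flag character.
  cbind(flag = flag_chars, weight = rep(x[2], length(flag_chars)))
}

# A value flagged "pe" with weight 10 yields one row per flag character.
divided <- divide_sketch(c("pe", "10"))
print(divided)
```

Each flag character carries the full weight of the original value, which is what lets a weighted aggregation count a multi-flag value once for every flag it holds.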
Flag aggregation by the frequency count method
flag_frequency(f)
f |
A vector of flags containing the flags of a series for a given period. |
flag_frequency returns a character string: a single flag if the highest frequency count is unique, or several characters if several flags share the highest frequency count.
flag_frequency(c("pe","b","p","p","u","e","d"))
flag_frequency(c("pe","b","p","p","eu","e","d"))
flags <- tidyr::spread(test_data[, c(1:3)], key = time, value = flags)
flag_frequency(flags[, 5])
apply(flags[, c(2:ncol(flags))], 2, flag_frequency)
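To make the counting rule concrete, here is a minimal base-R sketch of a frequency count over single flag characters. It is not the package's code: `frequency_sketch` is a hypothetical name, and dropping `NA` values and ordering tied flags alphabetically are assumptions of this sketch.

```r
# Hypothetical base-R sketch; not the flagr implementation.
frequency_sketch <- function(f) {
  f <- f[!is.na(f)]                      # assumption: missing flags are ignored
  chars <- unlist(strsplit(f, ""))       # count each flag character separately
  if (length(chars) == 0) return(NA_character_)
  counts <- table(chars)
  winners <- names(counts)[counts == max(counts)]
  paste(sort(winners), collapse = "")    # assumption: ties joined alphabetically
}

unique_winner <- frequency_sketch(c("pe", "b", "p", "p", "u", "e", "d"))
tied_winners  <- frequency_sketch(c("pe", "b", "p", "p", "eu", "e", "d"))
```

In the first call "p" occurs three times and wins alone; in the second both "p" and "e" occur three times, so the sketch returns both characters.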
Flag aggregation by the hierarchical inheritance method
flag_hierarchy(f, flag_list)
f |
A vector of flags containing the flags of a series for a given set of flags. |
flag_list |
The predefined hierarchy of allowed flags as a vector of single characters. |
flag_hierarchy returns, as a single character, the flag that ranks highest in the predefined hierarchy order among the given set of flags.
flag_hierarchy(c("p","b","s","b","u","e","b"), flag_list = c("e","s","t"))
flag_hierarchy(c("p","b","s","c","u","d"), flag_list = c("e","s","t"))
flags <- tidyr::spread(test_data[, c(1:3)], key = time, value = flags)
flag_hierarchy(flags[, 4], flag_list = c("p","b","s","c","u","e","d"))
apply(flags[, c(2:ncol(flags))], 2, flag_hierarchy, flag_list = c("p","b","s","c","u","e","d"))
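The hierarchical inheritance rule can be illustrated with a short base-R sketch. It is not the package's code: `hierarchy_sketch` is a hypothetical name, and the sketch assumes that `flag_list` is ordered from highest to lowest priority, that flags outside the list are ignored, and that `NA` is returned when no listed flag is present.

```r
# Hypothetical base-R sketch; not the flagr implementation.
hierarchy_sketch <- function(f, flag_list) {
  # Collect the distinct flag characters that occur in the series.
  present <- unique(unlist(strsplit(f[!is.na(f)], "")))
  # Keep the listed flags that occur, in hierarchy order (assumed: first = highest).
  hits <- flag_list[flag_list %in% present]
  if (length(hits) == 0) NA_character_ else hits[1]
}

top_e <- hierarchy_sketch(c("p", "b", "s", "b", "u", "e", "b"), c("e", "s", "t"))
top_s <- hierarchy_sketch(c("p", "b", "s", "c", "u", "d"), c("e", "s", "t"))
```

Under these assumptions the first series yields "e" because "e" precedes "s" in the list, and the second yields "s" because "e" does not occur in it.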
This method can be used when you want to derive the flag of an aggregate that is a weighted average, index, quantile, etc.
flag_weighted(i, f, w)
i |
An integer column index into the data.frame or matrix containing the flags and weights used to derive the flag for the aggregates. |
f |
A data.frame or a matrix containing the flags of the series (one column per period) |
w |
A data.frame or a matrix with the same size and dimension as f, containing the corresponding weights. |
flag_weighted returns a character vector for the columns of f and w selected by the parameter i. The first value is the flag with the highest weighted frequency, or several flags in alphabetical order if more than one flag shares the highest weight; the second value is the sum of weights for the given flag(s).
flag_weighted(1, data.frame(f = c("pe","b","p","p","u","e","d"), stringsAsFactors = FALSE),
              data.frame(w = c(10,3,7,12,31,9,54)))
flag_weighted(1, data.frame(f = c("pe","b","p","p","up","e","d"), stringsAsFactors = FALSE),
              data.frame(w = c(10,3,7,12,31,9,54)))
flag_weighted(1, data.frame(f = c("pe",NA,"pe",NA,NA,"d"), stringsAsFactors = FALSE),
              data.frame(w = c(10,3,7,12,31,9)))
flags <- tidyr::spread(test_data[, c(1:3)], key = time, value = flags)
weights <- tidyr::spread(test_data[, c(1, 3:4)], key = time, value = values)
flag_weighted(7, flags[, c(2:ncol(flags))], weights[, c(2:ncol(weights))])
weights <- apply(weights[, c(2:ncol(weights))], 2, function(x) x / sum(x, na.rm = TRUE))
weights[is.na(weights)] <- 0
flags <- flags[, c(2:ncol(flags))]
sapply(1:ncol(flags), flag_weighted, f = flags, w = weights)
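The arithmetic behind the weighted method can be followed in a small base-R sketch for a single period. It is not the package's code: `weighted_sketch` is a hypothetical name that takes one flag vector and one weight vector instead of the `i`, `f`, `w` interface, and it assumes that `NA` flags are skipped and that each character of a multi-character flag receives the full weight.

```r
# Hypothetical base-R sketch for one period; not the flagr implementation.
weighted_sketch <- function(flags, weights) {
  keep <- !is.na(flags)                       # assumption: unflagged values are skipped
  if (!any(keep)) return(NA_character_)
  chars <- strsplit(flags[keep], "")
  # Repeat each weight once per flag character, then sum the weights per flag.
  totals <- tapply(rep(weights[keep], lengths(chars)), unlist(chars), sum)
  top <- names(totals)[totals == max(totals)]
  c(paste(sort(top), collapse = ""), as.character(max(totals)))
}

w <- c(10, 3, 7, 12, 31, 9, 54)
single <- weighted_sketch(c("pe", "b", "p", "p", "u", "e", "d"), w)
shifted <- weighted_sketch(c("pe", "b", "p", "p", "up", "e", "d"), w)
tied <- weighted_sketch(c("pe", NA, "pe", NA, NA, "d"), c(10, 3, 7, 12, 31, 9))
```

In the first call "d" collects 54 against 29 for "p" (10 + 7 + 12). Changing "u" to "up" adds 31 to "p", which then leads with 60. In the third call "p" and "e" both total 17, so the sketch reports them together in alphabetical order.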
A wrapper function that applies the chosen method and provides a structured return value regardless of the method used.
propagate_flag(flags, method = "", codelist = NULL, flag_weights = 0, threshold = 0.5)
flags |
A data.frame or a matrix containing the flags of the series (one column per period) without row identifiers (e.g. country code). |
method |
A string containing the method used to derive the flag for the aggregate. It can take the value "hierarchy", "frequency" or "weighted". |
codelist |
A string or character vector defining the list of acceptable flags when the method "hierarchy" is chosen. If the string equals "estat" or "sdmx", the predefined standard Eurostat or SDMX codelist is used; otherwise the characters in the string define the hierarchical order. |
flag_weights |
A data.frame or a matrix containing the corresponding weights of the series (one column per period) without row identifiers (e.g. country code). It has the same size and dimension as the flags parameter. |
threshold |
The threshold that the sum of weights must exceed for the aggregate to receive a flag. The default value is 0.5, but it can be changed to any value. |
propagate_flag returns a list with the same length as the number of periods (columns) in the flags parameter. If the method is "hierarchy" or "frequency", only the derived flag(s) are returned. If the method is "weighted", it returns the flag(s) and the sum of weights where that sum is above the threshold; where the sum of weights is below the threshold, the list contains NA.
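The threshold rule for the weighted method can be sketched as follows. This is an illustration only: `threshold_sketch` is a hypothetical helper, and the sketch assumes that the weight sum is expressed as a share of the period total, which the text above does not state explicitly.

```r
# Hypothetical sketch of the threshold rule; not the flagr implementation.
threshold_sketch <- function(flag, weight_sum, threshold = 0.5) {
  # Keep the flag only if its weight sum is above the threshold.
  if (weight_sum > threshold) list(flag = flag, weight = weight_sum) else NA
}

w <- c(10, 3, 7, 12, 31, 9, 54)
share_d <- 54 / sum(w)          # assumed: weights expressed as shares of the total

default_result <- threshold_sketch("d", share_d)                  # share is about 0.43
relaxed_result <- threshold_sketch("d", share_d, threshold = 0.1)
```

With the default threshold of 0.5 the flag is dropped because its share is below one half; lowering the threshold to 0.1 keeps it.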
See also: flag_hierarchy, flag_frequency, flag_weighted
flags <- tidyr::spread(test_data[, c(1:3)], key = time, value = flags)
weights <- tidyr::spread(test_data[, c(1, 3:4)], key = time, value = values)
propagate_flag(flags[, c(2:ncol(flags))], "hierarchy", "puebscd")
propagate_flag(flags[, c(2:ncol(flags))], "hierarchy", "estat")
propagate_flag(flags[, c(2:ncol(flags))], "frequency")
flags <- flags[, c(2:ncol(flags))]
weights <- weights[, c(2:ncol(weights))]
propagate_flag(flags, "weighted", flag_weights = weights)
propagate_flag(flags, "weighted", flag_weights = weights, threshold = 0.1)
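As a rough picture of how a codelist string such as "puebscd" can define a hierarchy and produce one result per period, consider the base-R sketch below. It is not the package's code: the small `flags_demo` table and the helper `pick_first` are made up for illustration, the first character of the string is assumed to rank highest, and the predefined "estat" and "sdmx" lists are not reproduced here.

```r
# Hypothetical sketch; not the flagr implementation.
codelist <- "puebscd"
flag_order <- strsplit(codelist, "")[[1]]   # one character per hierarchy level

# Made-up flags: one column per period, no row identifiers.
flags_demo <- data.frame(
  `2019` = c("p", "b", NA),
  `2020` = c("e", "b", "s"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

pick_first <- function(col) {
  present <- unlist(strsplit(col[!is.na(col)], ""))
  flag_order[flag_order %in% present][1]    # assumed: first listed flag wins
}

# One list element per period, mirroring the structure of the return value.
per_period <- lapply(flags_demo, pick_first)
```

The result is a named list with one entry per column, which matches the shape described for the return value (a list as long as the number of periods).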
This is a fictitious data set with fictitious values and flags, intended for testing purposes.
test_data
A data frame with 195 rows and 4 variables:
2 digit country code
flag of the value
date of observation
value of the element
The source data is also available in the package in *.csv* format.
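The examples in this manual reshape this long table into one column per period before aggregating. The base-R sketch below shows the same idea on a tiny made-up table. The column names `flags`, `time` and `values` follow the example code, while `country`, the sample rows, and the use of matrix indexing instead of `tidyr::spread` are assumptions of this sketch.

```r
# Hypothetical miniature of a long-format table; the rows are made up.
long <- data.frame(
  country = c("AT", "AT", "BE", "BE"),   # hypothetical column name
  flags   = c("p", NA, "e", "b"),
  time    = c("2019", "2020", "2019", "2020"),
  values  = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

rows <- unique(long$country)
cols <- unique(long$time)

# Wide layout: one row per country, one column per period.
wide_flags <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
                     dimnames = list(rows, cols))
wide_flags[cbind(long$country, long$time)] <- long$flags
```

Each column of `wide_flags` then holds the flags of all countries for one period, which is the input shape expected by the aggregation functions.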