The R community tinyverse
movement is a tryout of creating new package development standards. The
clue is the TINY part here. The last years of R package development were
full of many dependencies temptations. tinyverse
means as
few dependencies as R package dependencies matter. Every dependency you
add to your project is an invitation to break your project.
More information is available on tinyverse
website.
tinyverse
badges
The tinyverse
advocates wants to help and motivate R
developers. Thus they created a rest-API which generates a dependencies
badge for each CRAN package. The badge contains 2 numbers; the first
number is a direct dependency and the second one recursive ones. The R
base packages are not counted. The tinyverse
badge could
have one of 4 colors: bright green, green, orange, or red. To get a
green badge package have to have less than 5 packages (<5) in the
Depends
/Imports
/LinkingTo
fields
(check the Dependencies subsection for more description). To have a
bright green a, zero dependencies are needed. The orange badge is from 5
to 9 dependencies (>=5 and <=9). And the last one red when there
are more than 9 dependencies (>= 10). Of course, the base packages
are not counted as a dependency, pacs::pacs_base()
.
Summing up, each badge constraint:
https://tinyverse.netlify.app/
https://tinyverse.netlify.app/badge/<package>
Examples:
The tidyverse
is an opinionated collection of R packages
designed for data science. All packages share an underlying design
philosophy, grammar, and data structures. On the other hand, the
tinyvere
is only a R community movement that is trying to
make a new programming standard. There is no tinyverse
package collection; any package which has less than 5 direct
dependencies (in the
Depends
/Imports
/LinkingTo
fields)
are treated as a decent one. The best is to have zero dependencies. Even
tidyverse
looks to go toward tinyverse
if we
check their lower-level packages like purrr
,
forcats
, renv
or rlang
.
The core tidyverse
packages:
ggplot2
: tibble
: tidyr
: readr
: purrr
: dplyr
: stringr
: forcats
:
Examples of random tinyverse
packages, bright green or
green badges:
TL;DR
install.packages
requires
Depends/Imports/LinkingTo
DESCRIPTION fields dependencies, recursively.
pacs::pac_deps_user
could be used to get them.R CMD check
requires
Depends/Imports/LinkingTo/Suggests
DESCRIPTION fields dependencies, and for them
Depends/Imports/LinkingTo
fields recursively. pacs::pac_deps_dev
could be used to get
them.Now you might think of what preciously these R package dependencies mean. The R DESCRIPTION file is the place where we could explore the number and nature of dependencies; the 5 fields represent different types of dependencies: Depends/Imports/LinkingTo/Suggests/Enhances.
DESCRIPTION file dependencies:
Package: NAME
...
Depends:
R (>= 3.6)
Imports:
dplyr
data.table
LinkingTo:
Rcpp
Suggests:
testthat
ca2cat
Enhances:
Hmisc
...
We could get any installed package description file with the
packageDescription
function. More than that, the
pacs::pac_description
could get any, even not installed
package description file and for any version you want.
packageDescription("pacs")
When we run install.packages
(and other install
functions like remotes::install_github
) only 3 fields are
installed
Depends/Imports/LinkingTo.
We could easily confirm that by checking its help page and the
dependencies
argument definition:
?install.packages
...
Dependencies:
...
The default, 'NA',
means 'c("Depends", "Imports", "LinkingTo")'.
Depends are packages library
(attached), before the main package is library
(attached).
So when we library()
the main package
Depends dependencies functions are available to the end
user in the R console. This could be more convenient for the end user if
the main package offers additional functionality over the dependency
one.
The Imports field lists packages whose namespaces
are imported from (as specified in the NAMESPACE file or when sb is
using ::
/:::
inside the package) but which do
not need to be attached (library
). When we use the
library()
call, Imports dependencies
functions are unavailable to the user in the R console. Namespaces
accessed by the ::
and :::
operators
(e.g. ggplot2::ggplot
) must be listed in the
Imports field, or in Suggests (when
used only for tests or examples).
A package that wishes to use header files in other packages to compile its C/C++ code needs to declare them as a comma-separated list in the field LinkingTo. Specifying a package in LinkingTo suffices if these are C/C++ headers containing source code or static linking is done at installation: the packages do not need to be (and usually should not be) listed in the Depends or Imports fields.
So what about the rest? Suggests are installed when
we need to run R CMD CHECK
(or higher level like
devtools::check()
), they are used for tests (e.g. testthat)
or for examples (roxygen2
@examples). Enhances is used
rarely as these are packages which could extent the usage and are NOT
needed for running examples and tests. If your tests/examples use e.g. a
dataset from another package, it should be in Suggests
and not Enhances.
So now we see that a Imports dependency is not equal
to a Suggests dependency. From the end user’s
perspective, we focus on
Depends/Imports/LinkingTo
dependencies which they will downlaod with
install.packages
.
It’s common for packages to be listed in Imports in DESCRIPTION, but
not in NAMESPACE. The DESCRIPTION file Imports field has nothing to do
with functions imported into the namespace. The DESCRIPTION file Imports
is mainly used by install.packages
. On the other hand,
NAMESPACE is a place where we defining what we need to build our package
and what we want to expose to the end users (export). Nowadays the
NAMESPACE file is even more mysterious as it is built automatically
e.g. by roxygen2
package. A package has to be listed in the
Imports in DESCRIPTION file, but not in NAMESPACE if we
will call the dependencies to function with ::
in the main
package. These explicit calls to dependencies are preferred.
If you are interested “How-R-Searches-And-Finds-Stuff” I recommend a great blog post which has more than 10 years and still is one of the most valuable R sources.
tinytest
vs testthat
This subsection will be a subjective view on the difference between
tinytest
and testthat
packages. A package
could have many dependencies, nevertheless not exposed to the end user
(these dependencies are not installed with install.packages
call), as is in Suggests
field of the DESCRIPTION file.
tinytest
was created to offer similar functionality to
testthat
package nevertheless, tinytest
has
zero dependencies. For me, tinytest
is an interesting
alternative compared to testthat
nevertheless not so
obvious replacement. I do not care how many dependencies have the
testthat
package as it is located in Suggests
field of DESCRIPTION file. testthat
will not be delayed
loaded with requireNampese
too. This means that the higher
number of dependencies from the testthat
package is only my
problem (developer one, not the end user) when e.g. I am checking a
package (e.g. with R CMD check
). How many additional
packages must be downloaded by a developer (e.g. for
R CMD check
) when comparing tinytest
and
testthat
? In the case of tinytest
it is zero
packages and for testthat
80 packages now. Please use
pacs::pac_deps_dev("tinytest")
and
pacs::pac_deps_dev("testthat")
to confirm that. When
tinytest
and testthat
are in the Suggests
field of another package (e.g. pacs
), then the end user
needs additional 0 packages for tinytest
and 30 packages
for testthat
(pacs::pac_deps_user("testthat")
). Remember that these
dependencies might overlap with other packages and their
dependencies.
Dependencies from the end user perspective:
yagni
(XP) - do not include unnecessary featuresmodularization
- divide your package into a few smaller
and more specialized onesOne of the methods of reducing the number of dependencies (exposed to end users) is to transfer the package from Imports to Suggests and load it in a delayed manner or not include it at all. So we have to identify package functions that will be used optionally or rarely (are not a core of the package). Then we have to apply conditional execution if the package is installed (available), if not, then ask the user to install it. If a function with the delayed loaded package is used in examples or tests, then the package must be in the Suggests field.
func <- function() {
if (requireNamespace("PACKAGE", quietly = TRUE)) {
# regular code
} else {
stop("Please install the PACKAGE to use the func function")
}
}
parsnip
and caret
packages are examples
that apply this strategy. It could be quickly confirmed by looking for
requireNamespace
phrase with github search, from each
repo.
pacs
package
One functionality of the pacs
package is to check a
package complexity. We could check the number of dependencies
(recursively or not) and even check how many MB are allocated for a
package and all its dependencies.
Weight Case Study: devtools
Consider that package sizes are appropriate for your local system
(Sys.info()
). Installation with
install.packages
and some devtools
functions
might result in different packages sizes.
If you do not want to install anything in your current library
(.libPaths()
) and still inspect a package size, then using
the withr
package is recommended.
withr::with_temp_libpaths
is recommended to isolate the
download process.
# restart of R session could be needed
withr::with_temp_libpaths({install.packages("devtools"); cat(pacs::pac_true_size("devtools") / 10**6, "MB", "\n")})
Installation in your main library.
# if not have
install.packages("devtools")
Size of the devtools
package:
The actual size of the devtools
package is
113MB
for devtools
with all dependencies and
without base packages (Mac OS arm64
).
cat(pacs::pac_true_size("devtools") / 10**6, "MB", "\n")
A reasonable assumption might be to count only dependencies not used
by any other package. Then we could use exclude_joint
argument to limit them. However hard to assume if your local
installation is a reasonable proxy for an average user.
# exclude packages if at least one other package uses it too
cat(pacs::pac_true_size("devtools", exclude_joint = 1L) / 10**6, "MB", "\n")
It is crucial to check the number of dependencies too:
# 70 recursive dependencies
pacs::pac_deps("devtools", local = TRUE)$Package
# 20 direct dependencies
pacs::pac_deps("devtools", local = TRUE, recursive = FALSE)$Package
We could check out which of the direct dependencies are the heaviest ones:
pac_deps_heavy("devtools")