ineptr2 includes a file-based caching system that avoids redundant API calls, supports resuming interrupted downloads, and lets you download data without holding it all in memory. This article explains how it works.
Enabling the cache
Caching is off by default. Enable it at construction or at any point during a session:
# At construction
ine <- INEClient$new(use_cache = TRUE)
# Or later
ine$use_cache <- TRUE
The default cache directory is the system’s user cache folder for R packages (tools::R_user_dir("ineptr2", "cache")). You can override it:
ine <- INEClient$new(use_cache = TRUE, cache_dir = "my_cache")
What gets cached
The cache has multiple layers, each serving a different purpose:
| Layer | File pattern | Format | Purpose |
|---|---|---|---|
| Chunks | ine_{indicator}_{lang}_chunks/chunk_0001.json | JSON | Raw API responses, one file per request |
| Processed data | ine_{indicator}_{lang}_data.rds | RDS | Tidy data frame ready to use |
| Metadata | ine_{indicator}_{lang}_meta.json | JSON | Indicator properties (name, dimensions, dates) |
| Catalog | ine_catalog_{lang}.xml | XML | Full INE indicator catalog |
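To make the file patterns in the table concrete, here is a minimal sketch that builds the corresponding paths for one indicator. The helper name cache_paths() is hypothetical and not part of ineptr2; it only mirrors the naming patterns listed above.

```r
# Hypothetical helper (not an ineptr2 function): build the cache paths
# that follow the file patterns listed in the table.
cache_paths <- function(cache_dir, indicator, lang) {
  prefix <- sprintf("ine_%s_%s", indicator, lang)
  list(
    chunks   = file.path(cache_dir, paste0(prefix, "_chunks")),
    data     = file.path(cache_dir, paste0(prefix, "_data.rds")),
    metadata = file.path(cache_dir, paste0(prefix, "_meta.json")),
    # The catalog is per language only, so it has no indicator code
    catalog  = file.path(cache_dir, sprintf("ine_catalog_%s.xml", lang))
  )
}

paths <- cache_paths("my_cache", "0008273", "EN")
print(paths$data)
#> [1] "my_cache/ine_0008273_EN_data.rds"
```

Calling the helper with lang = "PT" yields a disjoint set of paths, which is why the two languages never overwrite each other.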
Every indicator-level file is tagged with the indicator code and language (the catalog carries only the language), so switching ine$lang between "PT" and "EN" keeps separate caches.
Chunk cache and manifests
When downloading data, the API response is split into chunks (one per HTTP request). Each chunk is saved as an individual JSON file inside a directory:
my_cache/
  ine_0008273_EN_chunks/
    chunk_0001.json
    chunk_0002.json
    chunk_0003.json
  ine_0008273_EN_manifest.json
The manifest is a JSON file that tracks the download state:
{
  "indicator": "0008273",
  "lang": "EN",
  "total_chunks": 3,
  "urls": ["https://...chunk1", "https://...chunk2", "https://...chunk3"],
  "complete": false
}
The complete flag is set to true only after every chunk has been downloaded and validated. This is the mechanism that enables resume support.
Resuming interrupted downloads
If a download is interrupted — network timeout, session crash, or you
simply close R — the manifest and any completed chunks remain on disk.
When you call download_data() again, ineptr2:
- Reads the existing manifest
- Checks which chunks are already cached and valid (non-empty, valid JSON)
- Skips those and downloads only the remaining chunks
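The check described in these steps can be sketched roughly as follows. This is an illustration under stated assumptions, not ineptr2's actual code: pending_chunks() is a hypothetical helper, the total chunk count is assumed to come from the manifest, and JSON validity is tested with jsonlite::validate(), which may differ from what the package uses internally.

```r
# Hypothetical helper: return the indices of chunks that still need downloading.
# A chunk counts as cached only if its file exists, is non-empty, and is valid JSON.
pending_chunks <- function(chunk_dir, total_chunks) {
  is_cached <- vapply(seq_len(total_chunks), function(i) {
    path <- file.path(chunk_dir, sprintf("chunk_%04d.json", i))
    if (!file.exists(path) || file.size(path) == 0) return(FALSE)
    txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
    isTRUE(as.logical(jsonlite::validate(txt)))
  }, logical(1))
  which(!is_cached)
}

# Demo in a temporary directory:
# chunk 1 is valid, chunk 2 is truncated JSON, chunk 3 is missing
chunk_dir <- tempfile("ine_0008273_EN_chunks_")
dir.create(chunk_dir)
writeLines('[{"value": 1}]', file.path(chunk_dir, "chunk_0001.json"))
writeLines('[{"value": ',    file.path(chunk_dir, "chunk_0002.json"))

todo <- pending_chunks(chunk_dir, total_chunks = 3)
print(todo)
#> [1] 2 3
```

Only the indices returned here would need a new HTTP request; everything else is read from disk.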
# Session 1: starts downloading, gets interrupted at chunk 40 of 120
ine$download_data("0008206")
# Session 2 (later): resumes from chunk 41
ine$download_data("0008206")
#> Resuming download: 40/120 chunks cached
Each chunk is first written to a temporary .part file and only renamed to its final .json name after the JSON is validated. This prevents half-written files from corrupting the cache.
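The write-then-rename pattern can be sketched like this. The helper write_chunk() and the exact .part naming (here the final name with ".part" appended) are assumptions made for illustration, and jsonlite::validate() stands in for whatever validation ineptr2 performs.

```r
# Hypothetical helper: write a chunk to a temporary .part file and move it
# to its final name only if the content on disk is valid JSON.
write_chunk <- function(json_text, final_path) {
  part_path <- paste0(final_path, ".part")   # assumed naming for the temp file
  writeLines(json_text, part_path)

  # Validate what actually landed on disk, not the in-memory string
  on_disk <- paste(readLines(part_path, warn = FALSE), collapse = "\n")
  if (!isTRUE(as.logical(jsonlite::validate(on_disk)))) {
    unlink(part_path)                        # discard; final_path is never created
    return(FALSE)
  }
  file.rename(part_path, final_path)         # TRUE on success
}

chunk_dir <- tempfile("chunks_")
dir.create(chunk_dir)
good_path <- file.path(chunk_dir, "chunk_0001.json")
bad_path  <- file.path(chunk_dir, "chunk_0002.json")

ok  <- write_chunk('{"a": 1}', good_path)    # complete JSON
bad <- write_chunk('{"a": ',   bad_path)     # truncated JSON
```

Because a reader only ever looks for the final .json name, a crash between the write and the rename leaves at most a stray .part file, never a truncated chunk.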
Processed data cache
After chunks are downloaded and assembled, get_data()
processes them into a tidy data frame and caches the result as an
.rds file. This cache also stores which dimension filters
were used.
On subsequent calls, the cache is reused only if the new request is equal to or a subset of what was previously cached. For example:
# First call: fetches and caches data for three regions
ine$get_data("0008273", dim2 = c("11", "15", "17"))
# Second call: served from cache (equal to the cached filters)
ine$get_data("0008273", dim2 = c("11", "15", "17"))
# Third call: served from cache (subset of the cached filters)
ine$get_data("0008273", dim2 = c("11", "17"))
# Fourth call: cache miss — "20" was not in the original request
ine$get_data("0008273", dim2 = c("11", "20"))
This avoids the problem of silently returning incomplete data when filters change.
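The "equal to or a subset of" rule can be written as a small predicate. This is a sketch, not the package's implementation: filters_covered() is a hypothetical name, filters are represented as named lists, and a dimension that is absent from a list is assumed to mean "no filter on that dimension".

```r
# Hypothetical check: can a request be served from previously cached filters?
# Both arguments are named lists such as list(dim2 = c("11", "15")).
# Assumption: a dimension missing from a list means "all values".
filters_covered <- function(cached, requested) {
  for (d in names(cached)) {
    # The cache is restricted on this dimension, so the request must be
    # restricted too, and only to values that are already cached.
    if (is.null(requested[[d]])) return(FALSE)
    if (!all(requested[[d]] %in% cached[[d]])) return(FALSE)
  }
  TRUE
}

cached <- list(dim2 = c("11", "15", "17"))

hit_equal  <- filters_covered(cached, list(dim2 = c("11", "15", "17")))  # TRUE
hit_subset <- filters_covered(cached, list(dim2 = c("11", "17")))        # TRUE
miss_extra <- filters_covered(cached, list(dim2 = c("11", "20")))        # FALSE
miss_all   <- filters_covered(cached, list())                            # FALSE
```

The last case shows why an unfiltered request cannot be answered from a filtered cache: the cache simply does not hold the other regions.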
Cache invalidation
The chunk cache is automatically invalidated when dimension filters change between downloads. ineptr2 detects this by comparing the API URLs in the manifest against the URLs generated by the new request — different filters produce different URLs.
# Downloads with one set of filters
ine$download_data("0008273", dim2 = c("11", "15"))
# Different filters: chunk cache is cleared and download starts from scratch
ine$download_data("0008273", dim2 = c("11", "15", "17"))
#> Dimension filters changed. Clearing chunk cache.
As a general rule, if you are unsure which dimensions you will need and disk space is not a concern, download the full indicator once and then work from the cache.
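The URL comparison behind this invalidation might look something like the sketch below. The function name same_request() and the example.org URLs are placeholders, and whether ineptr2 compares URLs in order or as a set is not stated here, so the strict, order-sensitive comparison is an assumption.

```r
# Hypothetical check: the chunk cache is reusable only if the new request
# would fetch exactly the URLs recorded in the manifest.
same_request <- function(manifest_urls, new_urls) {
  identical(as.character(manifest_urls), as.character(new_urls))
}

# Placeholder URLs, not real API endpoints
manifest_urls <- c("https://example.org/api?dim2=11", "https://example.org/api?dim2=15")
new_urls      <- c(manifest_urls, "https://example.org/api?dim2=17")

reuse_same    <- same_request(manifest_urls, manifest_urls)  # TRUE: keep the chunks
reuse_changed <- same_request(manifest_urls, new_urls)       # FALSE: clear and restart
```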
You can also manually clear the cache:
# Clear cache for one indicator
ine$clear_cache("0008273")
# Clear everything
ine$clear_cache()
Inspecting the cache
Use list_cached() to see what’s currently stored:
ine$list_cached()
#>   indicator has_metadata has_data chunks_downloaded chunks_total download_complete
#> 1   0008273         TRUE     TRUE                 3            3              TRUE
#> 2   0008206         TRUE    FALSE                40          120             FALSE
This is useful to check the state of partial downloads or to decide what to clear.
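As an illustration of working with this output, the sketch below picks out the unfinished downloads. It assumes list_cached() returns an ordinary data frame with the columns printed above; the data frame is rebuilt by hand here so the example runs without the package, and the pct_done column is an addition for the example only.

```r
# Stand-in for ine$list_cached(), mirroring the columns shown above
cached <- data.frame(
  indicator         = c("0008273", "0008206"),
  has_metadata      = c(TRUE, TRUE),
  has_data          = c(TRUE, FALSE),
  chunks_downloaded = c(3L, 40L),
  chunks_total      = c(3L, 120L),
  download_complete = c(TRUE, FALSE),
  stringsAsFactors  = FALSE
)

# Keep only downloads that have not finished and report their progress
incomplete <- cached[!cached$download_complete, ]
incomplete$pct_done <- round(100 * incomplete$chunks_downloaded / incomplete$chunks_total)
print(incomplete[, c("indicator", "pct_done")])
```

The indicator codes in the result are the ones to either resume with download_data() or remove with clear_cache().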
download_data() vs get_data()
download_data() always writes to the file cache, even
when use_cache = FALSE. This is by design — its purpose is
to populate the cache for later use via
load_raw_data():
ine <- INEClient$new() # use_cache defaults to FALSE
# Downloads to cache without loading into memory
ine$download_data("0008206")
# Later: load the raw cached data
raw <- ine$load_raw_data("0008206")
get_data() respects the use_cache setting: when enabled, it checks the processed data cache before fetching; when disabled, it always hits the API.
