globaldatatoolkit.dataloaders.bycountry
Country-specific data loading functionality.
This module provides tools for loading and processing country-specific datasets from CSV files.
DataLoader
Interface for generating per-country csv files and loading them into runtime memory.
DataLoader wraps the Generator class, which splits a csv containing data from multiple countries into individual per-country csv files. It then loads those files into runtime memory and returns them as a dictionary of pandas.DataFrame objects.
Attributes:
| Name | Type | Description |
|---|---|---|
| storage_dir | str | path of directory in which csv files of individual countries are stored |
Examples:
>>> from globaldatatoolkit.dataloaders import bycountry
>>> per_country_data_loader = bycountry.DataLoader("./climate_data.csv")
>>> country_data = per_country_data_loader.load()
>>> type(country_data)  # a Dict[str, pandas.DataFrame]
<class 'dict'>
Source code in src/globaldatatoolkit/dataloaders/bycountry.py
__init__(csv_path, data_generation_dir='./data')
Generate per-country csv files for loading.
load()
Load per-country data into runtime memory as pandas.DataFrame objects.
Returns:
| Type | Description |
|---|---|
| dict[str, DataFrame] | dictionary mapping each country to its data |
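As a rough illustration of the shape of the returned mapping, the sketch below reads every csv file in a directory into a dictionary keyed by file name. It is not the library's implementation: `load_country_files` is a hypothetical helper, the `<country>.csv` naming is an assumption, and plain lists of row dictionaries stand in for pandas.DataFrame objects so that the example runs with the standard library alone.

```python
import csv
from pathlib import Path


def load_country_files(storage_dir):
    """Hypothetical stand-in for load(): map each country to its rows."""
    country_data = {}
    # Assumption: each file is named <country>.csv.
    for path in sorted(Path(storage_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            # With pandas installed, pandas.read_csv(path) would yield a
            # DataFrame here instead of a list of dictionaries.
            country_data[path.stem] = list(csv.DictReader(handle))
    return country_data
```

The keys of the result are country names, so a single country's records can be retrieved by indexing the dictionary with that name.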
Generator
Generates per-country csv files from combined data csv.
Generates a csv file for each country's records present in the csv data file, and stores intermediate variables such as the path of the csv file and a pandas DataFrame containing the csv data.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | DataFrame | a DataFrame containing the unfiltered, raw csv data |
| bycountry_data_dir | str | the directory where generated per-country csv files should be stored, defaults to "./data/bycountry/" when a path is not provided upon class init |
Methods:
| Name | Description |
|---|---|
| _files_already_exist | returns True if csv files for countries in the main data csv seem to exist, and False otherwise. |
__init__(csv_file_path, data_dir='./data')
Validate params and initialize Generator with a pandas.DataFrame and directory string.
Initialize the Generator class once the following checks have passed:
1. csv_file_path refers to a csv file
2. the file at csv_file_path exists
3. the directory at data_dir exists

Upon validation, self.data is populated with a pandas.DataFrame version of the csv, and self.data_dir is set to an absolute path for storing the per-country csv files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| csv_file_path | str | path of the main csv file with climate data of different countries. | required |
| data_dir | str \| PathLike[str] | directory where the bycountry/ folder of csvs should be created, defaults to "./data". | './data' |
Raises:
| Type | Description |
|---|---|
| ValueError | if csv_file_path doesn't specify a .csv file. |
| FileNotFoundError | if the file indicated by csv_file_path doesn't exist, or the directory indicated by data_dir doesn't exist. |
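The checks and exceptions documented for this constructor can be sketched as a standalone function. This is only an illustration under stated assumptions: `validate_inputs` is a hypothetical name, the extension test is assumed to be a simple suffix comparison, and the actual order and wording of the checks in the library may differ.

```python
import os


def validate_inputs(csv_file_path, data_dir="./data"):
    """Hypothetical helper mirroring the three documented checks."""
    # Check 1: the path must name a .csv file (assumed suffix comparison).
    if not str(csv_file_path).lower().endswith(".csv"):
        raise ValueError(f"{csv_file_path!r} is not a .csv file")
    # Check 2: the csv file itself must exist.
    if not os.path.isfile(csv_file_path):
        raise FileNotFoundError(f"csv file not found: {csv_file_path!r}")
    # Check 3: the storage directory must already exist.
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"directory not found: {data_dir!r}")
    # The docs describe the stored directory as an absolute path.
    return os.path.abspath(data_dir)
```

Running the cheap string check first means a wrongly typed path is rejected before any file system access takes place.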
_clean_storage()
Clean the per-country csv storage directory.
Remove any files inside the directory.
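A minimal sketch of this kind of cleanup is shown below. It is not the library's code: `clean_directory` is a hypothetical helper, and it assumes that only regular files directly inside the directory are removed while subdirectories are left untouched.

```python
from pathlib import Path


def clean_directory(directory):
    """Hypothetical helper: delete regular files directly inside `directory`."""
    removed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for entry in list(Path(directory).iterdir()):
        if entry.is_file():
            entry.unlink()
            removed += 1
    # Return how many files were deleted, which is handy for logging.
    return removed
```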
_files_already_exist()
Check if per-country csvs already exist in the data_dir.
Performs a shallow check to determine whether the number and names of the csv files in the bycountry directory match the number and names of the countries in the main csv file.
Note: this doesn't check the contents or lengths of the csv files.
Returns:
| Type | Description |
|---|---|
| bool | True if {data_dir}/bycountry/ exists and contains exactly the same countries as in the main csv, False otherwise. |
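The shallow check described above amounts to comparing two sets of names. The sketch below shows one way to express that idea; `files_match_countries` is a hypothetical helper, and the `<country>.csv` file naming is an assumption rather than something the documentation states.

```python
from pathlib import Path


def files_match_countries(bycountry_dir, countries):
    """Hypothetical helper: compare csv file names against country names."""
    directory = Path(bycountry_dir)
    # A missing directory means nothing has been generated yet.
    if not directory.is_dir():
        return False
    # Assumption: each per-country file is named <country>.csv.
    existing = {path.stem for path in directory.glob("*.csv")}
    # Set equality covers both the number and the names of the files.
    return existing == set(countries)
```

Because only names are compared, an empty or truncated per-country file still passes, which is the limitation the note above points out.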
generate()
Generate per-country csv files, if they don't already exist. Store them in {data_dir}/bycountry/.
Returns:
| Type | Description |
|---|---|
| str | directory where countries' individual csv files are stored |
| list[str] | list of countries for which csv files are present |
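To make the splitting step concrete, here is a standard-library sketch that writes one csv per country and returns the same pair of values as documented above. It is not the library's implementation: `split_by_country` is a hypothetical function, the `country` column name and `<country>.csv` file naming are assumptions, and unlike the documented method this sketch always rewrites the files instead of skipping existing ones.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_country(csv_file_path, data_dir, country_column="country"):
    """Hypothetical sketch: write one csv per country under data_dir/bycountry."""
    out_dir = Path(data_dir) / "bycountry"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows of the combined csv by the assumed country column.
    rows_by_country = defaultdict(list)
    with open(csv_file_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        for row in reader:
            rows_by_country[row[country_column]].append(row)

    # Write each group to its own file, keeping the original header.
    for country, rows in rows_by_country.items():
        out_path = out_dir / f"{country}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    # Mirror the documented return values: storage directory and countries.
    return str(out_dir), sorted(rows_by_country)
```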