Allow the script to generate more realistic, country-specific data.

Overview

The primary purpose of the script is to generate realistic-looking fake/test data. So when it comes to human-centric geographical information, it needs the actual raw data - city and region names - in order to do its job. That's where the Country plugins come in: they let you provide the following information about any country:

  1. High-level geographical political groupings: regions / provinces / states / territories, etc.
  2. City / town names for those regions
  3. Extended Data - as of 3.0.6, Country plugins may now contain an optional extended data section that contains whatever additional information may be needed by any Data Types. This currently includes zip codes and phone numbers. You can choose to make that data generic for the country as a whole, or make it overridable per region. For example, some Country plugins define custom zip code formats for each region; others define custom area codes for phone numbers. But the idea is that it's a generic data structure that can be appended to over time without changing the structure of the Country class.

Limitations

The Country plugins are currently pretty basic. Right now, all they're used for is to try to keep the data across a single generated row looking as consistent as possible. So if the generated row contains "Canada" for the Country field, it will pick a Canadian province for any Region fields, and any cities within that region for any City fields. A few more interesting caveats:

  • If the user didn't select the "Limit countries to those selected above" option for a Country row, the script will randomly pick any country from the list (200 or so). If the country being output for a row doesn't have a corresponding Country plugin, it will arbitrarily pick any region, city and postal/zip code (since it won't know any better!).
  • If the data set being generated doesn't contain a Country or Region field, the cities will just be arbitrarily chosen.
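
To make the consistency idea concrete, here is a minimal sketch of picking a region and then a city within it for a given country. The $countries array and the pickConsistentRow() function are hypothetical and only illustrate the behaviour described above; they are not the script's actual code.

```php
<?php

// Hypothetical, simplified stand-in for the data that Country plugins provide.
$countries = array(
	"Canada" => array(
		"British Columbia" => array("Vancouver", "Victoria"),
		"Ontario" => array("Toronto", "Ottawa")
	)
);

// Picks a region for the country, then a city inside that region,
// so that the three values in the row agree with each other.
function pickConsistentRow(array $countries, $countryName) {
	$regions = $countries[$countryName];
	$regionName = array_rand($regions);      // random region key
	$cities = $regions[$regionName];
	$city = $cities[array_rand($cities)];    // random city within that region
	return array(
		"country" => $countryName,
		"region" => $regionName,
		"city" => $city
	);
}

$row = pickConsistentRow($countries, "Canada");
```

Whatever region comes back, the city is always drawn from that region's own list.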

Add your own

Adding your own country-plugin is very simple. Knowing a little PHP would help a lot, but with common sense and a bit of patience, you can probably get by just fine. But before we get into the details, remember this:

Important: the purpose of a Country plugin isn't to provide a 100% accurate, 100% complete list of regions and cities for a country: it's to provide enough information so that the generated data looks valid.

If you were to add in every region and every city/town within a country, the data set could get extremely large, which could slow down the data generation.

Now that's over with, here's how to create a Country plugin:

  1. In the /plugins/countries folder, create a new folder for your country. The folder name should be the country name with no spaces, and camel-case - i.e. an upper case letter for each word in the country name, like PapuaNewGuinea.
  2. Create a single file in that folder called PapuaNewGuinea.class.php (where PapuaNewGuinea is the name of the folder you just created) and add in the following PHP.
<?php

/**
 * @package Countries
 */
class Country_PapuaNewGuinea extends CountryPlugin {
	protected $countryName = "Papua New Guinea";
	protected $countrySlug = "papuanewguinea";
	protected $regionNames = "Papua New Guinean Provinces";
	protected $continent = "oceania";
	protected $countryData = array(
		array(
			"regionName" => "Province Name 1",
			"regionShort" => "PN1",
			"regionSlug" => "province_name_1",
			"weight" => 1,
			"cities" => array(
				"City Name 1", "City Name 2"
			)
		),
		array(
			"regionName" => "Province Name 2",
			"regionShort" => "PN2",
			"regionSlug" => "province_name_2",
			"weight" => 1,
			"cities" => array(
				"City Name 3", "City Name 4"
			)
		)
	);


	public function install() {
		return CountryPluginHelper::populateDB(
			$this->countryName,
			$this->countrySlug,
			$this->countryData
		);
	}
}
  3. Now edit that file for your own country data. Here's the important stuff.
    • Class declaration. On the line starting with "class", all you need to do is change the class name to end with _YourCountry, e.g.
      class Country_YourCountry extends CountryPlugin {
    • $countryName. This is your country name.
    • $countrySlug. This is the country name without any spaces or characters other than letters (a-Z).
    • $regionNames. Different countries subdivide their political geographic regions in different ways. For example, Canada has provinces, the US has states, the UK has counties and so on. Just enter a string like "UK Counties"; this is used in the interface of the Region Data Type to let users know what data they're choosing to generate.
    • $continent. This is the name of the continent. The following options are available (note: these must be entered exactly as written, otherwise your plugin won't show up): africa, asia, europe, central_america, north_america, oceania, south_america.
    • $countryData. The regions and cities/towns are all stored in a single data structure, grouped by region. Hopefully it's pretty self-explanatory from looking at the example above, but there are a couple of things to note:
      • regionShort. This is whatever abbreviation is used for the region, e.g. US states and Canadian provinces each have a two-letter code. If your country doesn't use abbreviations for its regions, just enter the full region name again.
      • weight. This field lets you optionally weight the region to increase / decrease the likelihood of random data being pulled from this region. If, say, one of your regions contained 90% of the population, you could enter "90" for this value, then have the rest of the regions add up to 10. Note: the weights don't need to add up to any particular value. They simply reflect the relative weights.
  4. Lastly, to get your Country plugin to show up, go to the Settings tab in the generator and click the "Reset Plugins" button.

And that's it!
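
If you're curious how relative weights can translate into a random choice, here is a minimal sketch. The pickWeightedRegion() function is a hypothetical helper written for this example, assuming non-negative integer weights; the script's own selection code may work differently.

```php
<?php

// Hypothetical helper: returns one region, chosen in proportion to its "weight".
// Assumes weights are non-negative integers and at least one is greater than zero.
function pickWeightedRegion(array $regions) {
	$total = 0;
	foreach ($regions as $region) {
		$total += $region["weight"];
	}

	// Random integer between 1 and the sum of all weights.
	$target = mt_rand(1, $total);

	// Walk the regions, subtracting each weight until the target is used up.
	foreach ($regions as $region) {
		$target -= $region["weight"];
		if ($target <= 0) {
			return $region;
		}
	}
	return end($regions); // fallback; not reached under the assumptions above
}

$regions = array(
	array("regionName" => "Province Name 1", "weight" => 90),
	array("regionName" => "Province Name 2", "weight" => 10)
);
$picked = pickWeightedRegion($regions);
```

With these weights, "Province Name 1" is returned roughly nine times out of ten. Entering 9 and 1 instead would give the same proportions, which is why the weights don't need to add up to any particular value.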


Extended Data

You may have noticed that in the PapuaNewGuinea example above, there was no Zip / Postal code data or Phone Number formats added for the country. Country plugins are designed to be flexible enough to add any country- or region-specific format.

The basic pattern for adding extended data involves three things:

  1. An $extendedData protected member variable in the class that contains the default values for the extended data.
  2. Inside each region inside $countryData, define whatever region-specific data is needed.
  3. Inside the Data Type, parse and interpret that data.

Here's a couple of existing Data Types that use this feature.

Postal/Zip & Phone-Regional Data Types

At the time of writing, the only two Data Types that make use of country extended data are the Postal/Zip and Phone-Regional Data Types. Both generate the most appropriate value they can, based on the selected countries and the values of the Country and Region fields in the data set.

Here are the first few lines of the Costa Rica Country plugin. Take a look at the $extendedData variable.

<?php

/**
 * @package Countries
 */
class Country_CostaRica extends CountryPlugin {
	protected $continent   = "central_america";
	protected $countryName = "Costa Rica";
	protected $countrySlug = "CR";
	protected $regionNames = "Costa Rican Provinces";

	protected $extendedData = array(
		"zipFormat" => array(
			"format" => "ZYxYx",
			"replacements" => array(
				"Z" => "1234567",
				"Y" => "01",
				"x" => "0123456789"
			)
		),
		"phoneFormat" => array(
			"displayFormats" => array(
				"xxxxxxxx",
				"xxxx-xxxx"
			)
		)
	);

	// ...

The $extendedData variable can store whatever information is needed. For the zip format it stores a general zip format for the whole country and a list of replacement values that are used to generate the zip. The phone number needs a list (one is fine) of possible display formats for the phone number. These are selectable via the UI.
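
As an illustration of how a "format" string and its "replacements" could be turned into a value, here is a minimal sketch. The generateFromFormat() function is hypothetical and assumes that each character in the format with an entry in "replacements" is swapped for one random character from that entry, and that all other characters are copied as-is. The Postal/Zip Data Type's real parsing code may differ.

```php
<?php

// Hypothetical helper: expands a format string using a replacements map.
// Characters with no entry in the map are copied through unchanged.
function generateFromFormat($format, array $replacements) {
	$result = "";
	$length = strlen($format);
	for ($i = 0; $i < $length; $i++) {
		$char = $format[$i];
		if (isset($replacements[$char])) {
			// Pick one random character from the allowed set for this placeholder.
			$options = $replacements[$char];
			$result .= $options[mt_rand(0, strlen($options) - 1)];
		} else {
			$result .= $char;
		}
	}
	return $result;
}

// The country-wide zip format from the Costa Rica example above.
$zipFormat = array(
	"format" => "ZYxYx",
	"replacements" => array(
		"Z" => "1234567",
		"Y" => "01",
		"x" => "0123456789"
	)
);
$zip = generateFromFormat($zipFormat["format"], $zipFormat["replacements"]);
```

Under these assumptions, every generated value is five digits long: the first digit is 1 to 7, the second and fourth are 0 or 1, and the third and fifth are any digit.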

Note: the Data Types handle all the actual data generation. The developers of those plugins decide the structure of their section of the extended data and what info needs to be supplied. As a Country plugin developer, you just need to follow the pattern set out in other Country plugins.

To provide region-specific data, you'll need to include an extendedData key in the region's data section, as follows:

protected $countryData = array(
	array(
		"regionName" => "Alajuela",
		"regionShort" => "A",
		"regionSlug" => "alajuela",
		"weight" => 20,
		"cities" => array(
			"Alajuela", "Quesada", "San José de Alajuela", "San Rafael"
		),
		"extendedData" => array(
			"zipFormat" => array(
				"format" => "2zxYx",
				"replacements" => array(
					"z" => "01",
					"Y" => "01",
					"x" => "0123456789"
				)
			),
			"phoneFormat" => array(
				"format" => "24xxxxxx"
			)
		)
	),
	array(
		// ...

That will be used by the various data types to override the default values and provide more realistic data for the country.
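
One way a Data Type could combine the two levels is sketched below. The resolveExtendedData() function is hypothetical, and the merge strategy (region values layered over the country defaults, key by key) is an assumption made for this example; each Data Type decides for itself how to interpret the data.

```php
<?php

// Hypothetical helper: layers a region's extended data over the country defaults.
// Keys defined by the region win; keys it doesn't define keep the country value.
function resolveExtendedData(array $countryDefaults, array $region) {
	$regionData = isset($region["extendedData"]) ? $region["extendedData"] : array();
	return array_replace_recursive($countryDefaults, $regionData);
}

// Trimmed-down versions of the Costa Rica data shown above.
$countryDefaults = array(
	"zipFormat" => array("format" => "ZYxYx"),
	"phoneFormat" => array("displayFormats" => array("xxxxxxxx", "xxxx-xxxx"))
);
$alajuela = array(
	"regionName" => "Alajuela",
	"extendedData" => array(
		"zipFormat" => array("format" => "2zxYx"),
		"phoneFormat" => array("format" => "24xxxxxx")
	)
);

$resolved = resolveExtendedData($countryDefaults, $alajuela);
// $resolved["zipFormat"]["format"] is now the region's "2zxYx", while
// $resolved["phoneFormat"] holds both the country's "displayFormats"
// and the region's "format".
```

A region with no extendedData key simply gets the country defaults back unchanged.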

To keep your Country plugin up to date with whatever extended data is generally used, I'd suggest looking through the existing Country plugins and seeing what they define. Extended data should be optional, but naturally you'll want to make your plugin compatible with as many Data Types as possible.


Contribute your plugin

Sharing is much appreciated! To contribute your plugin, please fork the project on GitHub and submit your changes via a pull request. This is certainly the preferred way to contribute code, but if you don't think you're up for it, you can always email me and I'll manually add it in. Please note, all contributions will be expected to be available under the GPL license and released along with the rest of the code. I'll be sure to add in your name as a contributor.