Provide new types of data for generation.


This page explains how to add your own data types so you can use the Data Generator to generate pretty much whatever crazy stuff you want.

Data Types are self-contained plugins that generate a single random data item, like a name, email address, country name, country code, image, picture, URL, barcode image, binary string - really anything you want. Data Types can offer basic functionality, like the Email Address Data Type which has no options, examples or help doc, or they can be more advanced, like the Date Data Type, which contains examples of date formats for easy generation, and contains a date picker dialog (jQuery UI). Data Types can be standalone and generate data that has no bearing on other fields - like the Alpha Numeric Data Type - or they can make decisions about their content based on other fields in the data set, like Region, which intelligently generates a region within whatever country has been randomly generated for that row. Finally, if you want to get really fancy, you can even create Data Types that generate content based on previously generated row data, like the Tree Data Type that creates a tree-like data structure by mapping the ID of each row to a single parent row ID.

Data Types have both a PHP and (optional) JS component. The PHP is used to do the actual generation; the JS is used for creating the UI and saving/loading the Data Type data.

When creating your new Data Type, you can add anything you need from client-side validation to custom dynamic JS/DOM manipulation. You can also generate different content based on the selected Export Type (SQL, XML etc). It's a pretty flexible system, so hopefully you won't run into any brick walls. And if you do, you can just drop me a line and explain the shortcomings.

Lastly, I tried to make the process of adding Data Types as simple and as sandboxed as possible. The Core script does an awful lot for you: all you really need to do is follow the instructions below and maybe look at the existing Data Types for inspiration. Once you wrap your head around how it all fits together, developing new Data Types should be pretty straightforward.

Alrighty! Let's start with looking at the actual files and folders that go into a Data Type.

Anatomy of a Data Type

Now let's do a high-level view of what goes into a module: the files and folders, the JS + PHP components and how the translations / internationalization works. We'll get into the details about the code in the following sections.

Files and Folders

All Data Types are found in the /resources/plugins/dataTypes/ folder. Each Data Type has its own folder, which acts as the namespace for the JS and PHP code. What I mean is that the exact string you choose for the folder (like AlphaNumeric or StreetAddress) has to be used in your JS module creation and PHP class definition. I'll explain all that below.

A Data Type is made up of the following files (the JS file is optional, as explained later). Let's assume the folder name is MyNewDataType.

  • /resources/plugins/dataTypes/MyNewDataType/MyNewDataType.js: this file can actually be called whatever you want, but for consistency and to keep the Web Inspector / Firebug net panel easy to read, I'd name it like this. You can have as many JS files as you want, but one is almost certainly enough.
  • /resources/plugins/dataTypes/MyNewDataType/MyNewDataType.class.php: this contains your DataType_MyNewDataType class, which handles all necessary server-side code: the data generation and any markup you want available in the generator webpage. More info about all that below.
  • /resources/plugins/dataTypes/MyNewDataType/lang/en.php: a PHP file containing a single array (hash) that lists all strings used in your module.

You can also include any custom CSS files you want. See the PHP class definition below for more information.


The JS module for your Data Type does the following:

  • Registers itself with the Manager JS component, to allow it to publish and subscribe to messages; i.e. to interact with the Core script and detect when certain user interface events happen.
  • Saves and loads data for each row that has your Data Type selected.
  • Performs whatever validation is required to ensure the user fills in the Data Type row properly.
  • Performs any additional UI frills, like hiding/showing/disabling/enabling content based on information entered by the user in the page.


The PHP class for your Data Type handles the following functionality:

  • Initial installation of the module, if it needs to do anything special.
  • Specifies in which section and what order in the Data Types dropdown your Data Type should appear.
  • Specifies what JS and CSS files should be included for the Data Type when the generator is loaded.
  • Creates whatever HTML is needed for the Example and Options columns in the generator table.
  • Creates whatever HTML should be included in the Help section of the dialog window.
  • Actually generates the random data for that Data Type.
  • Specifies the process order of the Data Type. The random data is generated row by row and, within each row, the Data Types are generated in waves. The first wave consists of fields that have no dependencies on other fields in the row; the second and later waves may depend on any previous wave. That way, a Data Type that needs to know if another field has a particular value can be sure that that value has actually been generated, and can use that information in generating the random snippet for that column and row. For example, a Region field can check to see if a Country field has been included, and if so, generate a random region within the country for that row.
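The wave-based ordering described in the last point can be sketched roughly as follows. This is an illustration only, written in JavaScript even though the real generation happens in PHP inside the Core: the generateRow function, the processOrder property on plain objects and the two sample generators are hypothetical stand-ins, and the values are fixed rather than random to keep the example predictable.

```javascript
// Hypothetical stand-ins for two Data Types. Each has a processOrder (its
// wave) and a generate function that receives the data generated so far
// for the current row.
var country = {
	name: "Country",
	processOrder: 1,
	generate: function(existingRowData) {
		// extra keys beyond "display" are metadata for later waves
		return { display: "Canada", country_slug: "canada" };
	}
};

var region = {
	name: "Region",
	processOrder: 2, // runs after Country so it can read country_slug
	generate: function(existingRowData) {
		var regionsByCountry = { canada: "British Columbia" };
		var countryData = existingRowData.Country;
		var slug = countryData ? countryData.country_slug : null;
		return { display: regionsByCountry[slug] || "Unknown" };
	}
};

// Generates a single row: lower processOrder values (earlier waves) run first.
function generateRow(dataTypes) {
	var ordered = dataTypes.slice().sort(function(a, b) {
		return a.processOrder - b.processOrder;
	});
	var rowData = {};
	ordered.forEach(function(dataType) {
		rowData[dataType.name] = dataType.generate(rowData);
	});
	return rowData;
}

// Region is listed first, but is still generated after Country.
var row = generateRow([region, country]);
console.log(row.Region.display);
```

The point of the sketch is just that ordering by wave guarantees Country's data exists by the time Region asks for it, regardless of the order of the columns in the table.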

Language Files

All text strings that appear in your module should be pulled from a language file. It's very simple. Just create a file called en.php in your /resources/plugins/dataTypes/[data type folder]/lang/ folder. That file should contain a single $L hash, like so:


$L = array();
$L["DATA_TYPE_NAME"] = "Alphanumeric";
$L["example_CanPostalCode"] = "(Can. Postal code)";
$L["example_Password"] = "(Password)";

// ...

Once you do that, the Data Generator automatically makes that information accessible to your PHP and JS code. I'll explain how that works in the following sections.

The PHP Class

All plugins (Data Types, Export Types and Country plugins) have to extend a base, abstract class defined by the core code. Hopefully you know what this means, but if not - time for some Googling! Simply put, abstract classes are a mechanism to help ensure that the class being defined has a proper footprint and contains all the functionality that's expected and required.

For Data Types, take a look at this file: /resources/classes/DataTypePlugin.abstract.class.php. That's the class you'll need to extend.

Example: GUID Data Type

Now rather than blather on about your Data Type PHP class in the abstract, let's look at an actual implementation first. If you want to see the complete list of available variables and methods, check out the source code of the Data Type abstract class (/resources/classes/DataTypePlugin.abstract.class.php). It's well documented.

This is the PHP class for the GUID Data Type. It's a simple Data Type that generates a random GUID string. Maybe first try it out in the script to see what it does.


<?php

/**
 * @package DataTypes
 */
class DataType_GUID extends DataTypePlugin {
	protected $isEnabled = true;
	protected $dataTypeName = "GUID";
	protected $dataTypeFieldGroup = "numeric";
	protected $dataTypeFieldGroupOrder = 50;
	private $generatedGUIDs = array();

	public function generate($generator, $generationContextData) {
		// Assumption: this listing doesn't show where $placeholderStr is defined.
		// The format string below (H taken to mean one hex digit) is a guess at
		// a 36-character GUID layout, matching the varchar(36) field size.
		$placeholderStr = "HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH";
		$guid = Utils::generateRandomAlphanumericStr($placeholderStr);

		// pretty sodding unlikely, but just in case!
		while (in_array($guid, $this->generatedGUIDs)) {
			$guid = Utils::generateRandomAlphanumericStr($placeholderStr);
		}
		$this->generatedGUIDs[] = $guid;
		return array(
			"display" => $guid
		);
	}

	public function getHelpHTML() {
		return "<p>{$this->L["help"]}</p>";
	}

	public function getDataTypeMetadata() {
		return array(
			"SQLField" => "varchar(36) NOT NULL",
			"SQLField_Oracle" => "varchar2(36) NOT NULL"
		);
	}
}
Let's look at each line in turn.

  • class DataType_GUID extends DataTypePlugin: our class definition. All Data Type class names must be of the following format: DataType_[folder] - where folder is the name of the Data Type folder. Pretty straightforward. Also, note that it extends the DataTypePlugin base class. That's required.
  • $isEnabled: this var explicitly enables/disables the module. In case you're tinkering around with a new Data Type, sometimes you may not want it to show up in the UI - so you'd just set this to false.
  • $dataTypeName: this is the human-readable name of your module. It can be in whatever language you want, but we prefer English as the default language string. The value you enter in this variable is automatically overridden if the current selected language has the following value in the language file: $L["DATA_TYPE_NAME"] = "New Name"; This provides a simple mechanism to provide alternative translations of your Data Type names.
  • $dataTypeFieldGroup: in the Data Type dropdowns in the generator, you'll notice that the Data Types are all grouped. This variable determines which group your Data Type should appear in. You can choose any of the following strings: human_data, geo, text, numeric, math, other. If you feel that you need a new group for your Data Type, drop me a line.
  • $dataTypeFieldGroupOrder: this determines where in the list your Data Type should appear. Look at the values for other Data Types to figure out what value to enter. I spaced them all out with 10 in between to allow you to insert your Data Type at any point in the list.

So far so good. The next line, $generatedGUIDs, is a custom private var for use by this Data Type only. Don't worry about it.

Now let's look at the methods:

  • public function generate($generator, $generationContextData): this is the main generation function for the Data Type. It's passed two parameters:
    1. The current Generator instance. Behind the scenes, the data generation is all managed by the Generator class, found here: /resources/classes/Generator.class.php. This is a very helpful class - it contains various utility methods for finding out about the current data set being generated. However, the GUID class doesn't need it.
    2. The generation context data. The Generator generates the data sets row by row. Each row contains one or more Data Types. This variable contains the data for all the Data Types generated so far for the current row. Any Data Type can choose to return additional meta data along with a generated value - e.g. a Region could choose to return the Country to which it belongs. This second function param contains all that information. Lastly, if a Data Type has dependencies on previous Data Types in the row, it needs to set the protected $processOrder = X; class variable. See the Data Type plugin abstract class for more information about that advanced feature - or look at the Region plugin for an example of how it's used.
  • public function getHelpHTML(): this optional function is used to return whatever help text you want for your Data Type. Note that the returned string references a $L class variable: $this->L["help"]. The $L variable is populated with the current language file automatically when the Data Type is instantiated. This mechanism is taken care of for you - you can safely refer to $this->L throughout your own class.
  • public function getDataTypeMetadata(): this optional function returns additional meta information about your Data Type. Right now it's really only used for the SQL Export Type. When the user selects SQL, the code needs to know how large a database field should be created for the data. As such, this function returns that information - for both generic SQL and Oracle SQL - so the Export Type can do its job. As mentioned, this is not a required function. If it isn't supplied, the SQL Export Type just provides its best guess.

And that's it for our example. The following sections go into greater depth regarding the class member vars and methods. There's a lot more you can do.

Class Variable List

Alright! Here's the full list of class vars that have special meaning.

Each entry lists the var, whether it's required or optional, its type, and an explanation.

  • $dataTypeName (required, string): The human-readable name of the Data Type used in the UI. Note: the $L["DATA_TYPE_NAME"] defined in a language file will override this value.
  • $dataTypeFieldGroup (required, string): Data Types are grouped together in the Data Type dropdowns in the UI. This variable lets the system know to which group your Data Type should belong. Possible values are: human_data, geo, text, numeric, math, other. If you feel that you need a new group for your Data Type, drop me a line.
  • $dataTypeFieldGroupOrder (required, integer): The order in which the Data Type should appear within the group specified by the previous field.
  • $isEnabled (optional, boolean): Hides / shows the module from the interface. Note, you'll need to refresh the list of plugins after changing this value.
  • $jsModules (optional, array): An array of JS filenames, all found in the Data Type folder.
  • $cssFiles (optional, array): An array of CSS filenames, all found in the Data Type folder.
  • $L (auto-generated, array): Do NOT define this variable. When the Data Type is instantiated, this variable is auto-generated and populated with the appropriate language file.

Class Method List


generate($generator, $generationContextData)
Req/Opt: required
Params:
  1. $generator: the Generator object, through which a Data Type can call the various available public methods. See /resources/classes/Generator.class.php.
  2. $generationContextData: a hash of information relating to the generation context. Namely:
    • rowNum: the row number in the generated content (indexed from 1).
    • generationOptions: whatever options were passed for this particular row and Data Type; i.e. whatever information was returned by getRowGenerationOptions(). This data can be empty or contain anything needed - in whatever format. By default, this is set to null.
    • existingRowData: data already generated for the row.
Explanation: This does the work of actually generating a random data snippet. Data Types have to return a hash with at least one key: "display". They can also load up the hash with whatever else they want, if they want to provide additional meta data to other Data Types that are being generated on that row (e.g. Country, passing its country_slug info to Region).


__construct($runtimeContext)
Req/Opt: optional
Params: $runtimeContext: Data Type classes are instantiated at different times in the code. This parameter is a string that describes the context in which the class is being instantiated: ui or generation.
Explanation: An optional constructor. Note: this should always call parent::__construct($runtimeContext);.


install()
Req/Opt: optional
Params: None
Explanation: This is called once during the initial installation of the script, or when the installation is reset (which is effectively a fresh install). It is called AFTER the Core tables are installed, and you can rely on Core::$db having been initialized and the database connection having been set up.


getExampleColumnHTML()
Req/Opt: optional
Params: None
Explanation: If the Data Type wants to include something in the Example column, it should return the raw HTML via this function. If this function isn't defined (or it returns an empty string), the string "No examples available." will be output in the cell. This is used for inserting static content into the appropriate spot in the table; if the Data Type needs something more dynamic, it should subscribe to the appropriate event.


getOptionsColumnHTML()
Req/Opt: optional
Params: None
Explanation: If the Data Type wants to include something in the Options column, it must return the HTML via this function. If this function isn't defined (or it returns an empty string), the string "No options available." will be output in the cell. This is used for inserting static content into the appropriate spot in the table; if the Data Type needs something more dynamic, it should subscribe to the appropriate event.


getHelpHTML()
Req/Opt: optional
Params: None
Explanation: Returns the help content for this Data Type (HTML / string).


getRowGenerationOptions($generator, $post, $colNum, $numCols)
Req/Opt: optional
Params:
  1. $generator (object): the instance of the Generator object, containing assorted public methods
  2. $post (array): the entire contents of $_POST
  3. $colNum (integer): the column number (row in the UI...!) of the item
  4. $numCols (integer): the number of columns in the data set
Returns:
  • false, if the Data Type doesn't have sufficient information to generate the row (i.e. things weren't filled in in the UI and the Data Type didn't add proper validation)
  • anything else. This can be any data structure needed by the Data Type. It'll be passed as-is to the generate() function, in the generationOptions key of its second parameter.
Explanation: Called during data generation. This determines what options the user selected in the user interface; it's used to figure out what settings to pass to each Data Type so that generate() has the information it needs to produce that particular data item. Note: if this function determines that the values entered by the user in the Options column are invalid (most likely just incomplete), it can explicitly return false to tell the core script to ignore this row.


getDataTypeMetadata()
Req/Opt: optional
Returns: array
Explanation: Used for providing additional metadata about the Data Type for use during generation. Right now this is only used to pass additional data to the SQL Export Type so it can intelligently create a CREATE TABLE statement with database column types and sizes that are appropriate to each field type.

Non-overridable Methods

The following methods are defined on the DataTypePlugin abstract class. You can call them when developing your Data Type.

  • getName(): returns the Data Type name.
  • getIncludedFiles(): returns list (array) of included files.
  • getDataTypeFieldGroup(): returns the field type group to which this Data Type belongs.
  • getDataTypeFieldGroupOrder(): returns the order of the field type group.
  • getProcessOrder(): returns the Data Type process order.
  • getPath(): returns the path to the Data Type file.
  • getJSModules(): returns the array of JS modules.
  • getCSSFiles(): returns the array of CSS files for the Data Type.
  • isEnabled(): returns whether or not the Data Type is enabled.

The JS Module

Each Data Type may choose to have an optional JS component: a JavaScript module that performs certain functionality like saving/loading the Data Type data, running client-side validation on the user inputs (if required) and triggering whatever additional JS code is necessary.

Optional or required?

The JS module is optional. The Core script handles saving and loading the Column Title and Data Type for all Data Types, so if you don't need anything in the Example or Options columns, you don't need to include a JS module.

Explaining how the JS module works can be a little abstract, so let's start with an example.

Example: Alphanumeric Data Type

The following is the JS module for the Alphanumeric Data Type. Give it a look over, then we'll pull it apart and explain each bit below.

/*global $:false*/
define([
	"manager",
	"constants",
	"lang",
	"generator"
], function(manager, C, L, generator) {

	"use strict";

	/**
	 * @name AlphaNumeric
	 * @description JS code for the AlphaNumeric Data Type.
	 * @see DataType
	 * @namespace
	 */

	var MODULE_ID = "data-type-AlphaNumeric";
	var LANG = L.dataTypePlugins.AlphaNumeric;
	var subscriptions = {};

	var _init = function() {
		subscriptions[C.EVENT.DATA_TABLE.ROW.EXAMPLE_CHANGE + "__" + MODULE_ID] = _exampleChange;
		manager.subscribe(MODULE_ID, subscriptions);
	};

	var _saveRow = function(rowNum) {
		return {
			"example": $("#dtExample_" + rowNum).val(),
			"option":  $("#dtOption_" + rowNum).val()
		};
	};

	var _loadRow = function(rowNum, data) {
		return {
			execute: function() {
				$("#dtExample_" + rowNum).val(data.example);
				$("#dtOption_" + rowNum).val(data.option);
			},
			isComplete: function() { return $("#dtOption_" + rowNum).length > 0; }
		};
	};

	var _exampleChange = function(msg) {
		$("#dtOption_" + msg.rowID).val(msg.value);
	};

	var _validate = function(rows) {
		var visibleProblemRows = [];
		var problemFields      = [];
		for (var i=0; i<rows.length; i++) {
			var currEl = $("#dtOption_" + rows[i]);
			if ($.trim(currEl.val()) === "") {
				// report the row number the user actually sees in the UI
				var visibleRowNum = generator.getVisibleRowOrderByRowNum(rows[i]);
				visibleProblemRows.push(visibleRowNum);
				problemFields.push(currEl);
			}
		}
		var errors = [];
		if (visibleProblemRows.length) {
			errors.push({ els: problemFields, error: LANG.incomplete_fields + " <b>" + visibleProblemRows.join(", ") + "</b>"});
		}
		return errors;
	};

	manager.registerDataType(MODULE_ID, {
		init: _init,
		validate: _validate,
		saveRow: _saveRow,
		loadRow: _loadRow
	});
});

Now let's go line by line.

  • /*global $:false*/ this first line is for jshint/jslint. In my local environment, I use jshint with strict mode to catch problems. This line just tells the linter not to complain about the dollar sign: it's a read-only global, defined by jQuery.
  • define([
    	"manager",
    	"constants",
    	"lang",
    	"generator"
    ], function(manager, C, L, generator) {

    The entire JS module is wrapped in a call to requireJS's define function. This ensures the code is defined as an AMD (Asynchronous Module Definition) module for consumption by other code. The important thing to understand here is the parameters. The first param is an array of string labels for other modules: they all map to specific JS files - you can find the mapping in /resources/scripts/requireConfig.js. Each of those modules is in turn passed to the Data Type module as an argument of the anonymous function that forms the second param to define(). Whatever public API those modules reveal is now accessible via the four params: manager, C (constants), L (lang) and generator.

    When defining your own Data Type module JS file, you'll want to include all four of those params. They all contain useful functionality and data that you'll need.

  • "use strict"; - do it! JS strict mode is never a bad idea. :D
  • Here we're going to skip ahead to the very end of the code, to these lines:
    	manager.registerDataType(MODULE_ID, {
    		init: _init,
    		validate: _validate,
    		saveRow: _saveRow,
    		loadRow: _loadRow
    	});

    This chunk of code is required for your Data Type. What it does is register your Data Type with the core. That allows it to listen to published events, publish its own events for other code to listen to, tie into the validation functionality and so on. It's pretty straightforward. The manager.registerDataType() function takes two parameters: the unique MODULE_ID constant (explained below) and an object containing certain required and optional functions, whose property names have special meanings. Again, more on that below. Now let's go back to the top of the code again.

  • 	/**
    	 * @name AlphaNumeric
    	 * @description JS code for the AlphaNumeric Data Type.
    	 * @see DataType
    	 * @namespace
    	 */
    	var MODULE_ID = "data-type-AlphaNumeric";
    	var LANG = L.dataTypePlugins.AlphaNumeric;
    • The comment follows the format understood by JSDoc. For more information on that, see the JSDoc project.
    • The MODULE_ID variable is special. It must always be of the form data-type-[FOLDER NAME]. That acts as a unique identifier within the client-side code so the Manager can keep track of who's who.
    • As with the PHP code, the language strings for your Data Type are automatically accessible: you don't have to do any extra work to get access to them. The L function param fed to your Data Type contains all language strings in the system - in whatever language is currently selected. To locate the strings for your own module, just reference it by your Data Type folder name, again: L.dataTypePlugins.[FOLDER NAME]
  • The following lines all define special functions. Rather than explain the implementation details of each of these for the Alphanumeric type, we'll discuss these in a more abstract sense in the next section.

Registration Functions

As explained above, the second parameter of the manager.registerDataType() function is an object containing various predefined functions. This section lists the properties of that object and what they're used for. Note: all properties are optional, but you'll almost certainly need one or more.

Each entry lists the property, its params and return value (if any), and an explanation.

  • init (no params): If this is defined for your Data Type, it gets called on page load prior to any events being published. By "event" I mean a custom published event, which I'll explain more thoroughly in the Pub/Sub section below.
  • run (no params): The run() function gets called for all Data Types and Export Types after their init()'s are called. As such, run() can rely on all subscriptions being in place so events published at this juncture will have an audience.
  • saveRow (params: rowNum, int; returns: object): When the user saves a Data Set, the Data Generator examines the table and calls the appropriate Data Type's saveRow() method. This method is responsible for determining what information it wants to save for the row. Generally all it does is examine the DOM and extract whatever values the user entered in custom fields that the Data Type uses. It then returns an object of simple property-value pairs. The row number being passed to this function is the unique row number for the row - it may not be the visual row number seen in the UI, because rows can be re-ordered after they're created. The row number passed to this function can be used for DOM element identification.
  • loadRow (params: rowNum, int; data, object): When the user loads a saved data set, the script calls each Data Type's loadRow() function, passing the appropriate row number and whatever data was originally returned by its saveRow() function. The row number should be sufficient information to identify the appropriate elements in the DOM and re-enter the saved information.
  • validate (params: rows, array; returns: array): When the user clicks on the Generate button, the core first validates the information they've entered. If a Data Type defines this function, it means it wants to check the user input for one or more of its custom fields - most likely appearing in the Options column. The rows parameter is an array of row numbers that have this Data Type selected. As mentioned above, the row numbers may not be the visual row numbers, because rows may have been added / removed / re-sorted. However, they can be used to identify the appropriate DOM elements.

    This function needs to return an array of errors to display - or an empty array if there are no errors. Each array item is an object of the following form: { els: [], error: "error message here" }. els is an array of DOM elements that have problems with them; error is the error message that will be displayed.

    Check out the Alphanumeric Data Type's validate() function above for an example of how this function can work.
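To make that contract a bit more concrete, here's a rough, self-contained sketch of a registration object that runs outside the Data Generator. Everything standing in for the real environment is hypothetical: stubManager replaces the real manager module, the fields object replaces the DOM (so els holds element IDs here rather than DOM elements), and MyNewDataType is just the example folder name used earlier.

```javascript
// Hypothetical stand-in for the DOM: maps element IDs to their values.
var fields = { "dtOption_1": "LLLxxx", "dtOption_2": "   " };

// Hypothetical stand-in for the real manager module; it only stores
// whatever gets registered.
var stubManager = {
	dataTypes: {},
	registerDataType: function(moduleId, module) {
		this.dataTypes[moduleId] = module;
	}
};

var MODULE_ID = "data-type-MyNewDataType";

// Returns simple property-value pairs describing the row.
var _saveRow = function(rowNum) {
	return { option: fields["dtOption_" + rowNum] };
};

// Returns an array of { els, error } objects; empty if all rows are fine.
var _validate = function(rows) {
	var problemRows = [];
	var problemFields = [];
	rows.forEach(function(rowNum) {
		var id = "dtOption_" + rowNum;
		if ((fields[id] || "").trim() === "") {
			problemRows.push(rowNum);
			problemFields.push(id);
		}
	});
	var errors = [];
	if (problemRows.length) {
		errors.push({
			els: problemFields,
			error: "Incomplete fields: " + problemRows.join(", ")
		});
	}
	return errors;
};

stubManager.registerDataType(MODULE_ID, {
	saveRow: _saveRow,
	validate: _validate
});

var registered = stubManager.dataTypes[MODULE_ID];
console.log(registered.saveRow(1));
console.log(registered.validate([1, 2]));
```

Note that a real validate() would report the visible row number rather than the internal one, as the Alphanumeric example does.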

Pub/Sub & Event List

As mentioned elsewhere, the client-side code revolves around the idea of publish/subscribe - or pub/sub. Different parts of the script can publish arbitrary events with arbitrary information associated with them, and any module can choose to listen out for particular events and run code when they occur. This is a very elegant pattern: it allows us to keep our modules loosely coupled and reduces the likelihood of introducing dependencies that can break things.

The core script publishes events at certain points in the lifetime of the page. They're all defined in /resources/scripts/constants.php (returned as JS). You can refer to them in your code via the C parameter, which maps to the constants module. The names are pretty descriptive so I won't bother explaining them any further.


How to subscribe to an event

Generally you'll want to set up your subscriptions in your module's init() function. Here's how it works:


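var _init = function() {
	var subscriptions = {};
	subscriptions[C.EVENT.COUNTRIES.CHANGE] = _onChangeCountries;
	manager.subscribe(MODULE_ID, subscriptions);
};

var _onChangeCountries = function(msg) {
	// react to the change here; msg carries the event's data
};

manager.registerDataType(MODULE_ID, {
	init: _init
});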

That would subscribe to the C.EVENT.COUNTRIES.CHANGE event (which is published when the user adds/removes a country from the Country List section in the UI) and attach a callback function - _onChangeCountries(). The manager.subscribe() function can be called at any time in any of your functions, so you can subscribe to events on the fly.
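If the pub/sub idea is new to you, the following toy version may help. It is not the Manager's actual implementation: the miniManager object, its publish() method and the "countries-change" event name are simplified assumptions made for illustration. Only the subscribe(moduleId, subscriptions) call shape mirrors the example above.

```javascript
// A toy pub/sub hub, for illustration only.
var miniManager = {
	subscribers: {}, // event name -> array of { moduleId, callback }

	// Registers every callback in the subscriptions object under its event name.
	subscribe: function(moduleId, subscriptions) {
		var self = this;
		Object.keys(subscriptions).forEach(function(eventName) {
			if (!self.subscribers[eventName]) {
				self.subscribers[eventName] = [];
			}
			self.subscribers[eventName].push({
				moduleId: moduleId,
				callback: subscriptions[eventName]
			});
		});
	},

	// Calls every callback subscribed to the event, passing along the message.
	publish: function(eventName, msg) {
		(this.subscribers[eventName] || []).forEach(function(subscriber) {
			subscriber.callback(msg);
		});
	}
};

var EVENT_COUNTRIES_CHANGE = "countries-change"; // hypothetical event name
var received = [];

var subscriptions = {};
subscriptions[EVENT_COUNTRIES_CHANGE] = function(msg) {
	received.push(msg.countries);
};
miniManager.subscribe("data-type-MyNewDataType", subscriptions);

// Only the first publish reaches our callback; nobody listens to the second.
miniManager.publish(EVENT_COUNTRIES_CHANGE, { countries: ["canada", "france"] });
miniManager.publish("some-other-event", { ignored: true });

console.log(received);
```

The publisher never needs to know who is listening, which is what keeps the modules loosely coupled.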

Practical Tips

I thought maybe I'd include this section on how to achieve a few practical things. Let me know if you're stuck on something and maybe I'll expand this section to explain how to do it.

Populating "Example" and "Options" columns

If your Data Type is non-trivial, you'll probably want to include some custom HTML to appear in the Example and Options columns in the generator table. Here's how that works.

First, your PHP class needs to define the getExampleColumnHTML() and getOptionsColumnHTML() methods. They should return a block of generic markup that the client-side Core code will automatically insert into any row where the user selects your Data Type. Since that same block will be inserted for every row of your Data Type, include the %ROW% placeholder in anything that needs to be unique, e.g. input field names and IDs. When the HTML is inserted into the appropriate locations in the DOM, those placeholders will be replaced by the appropriate row number, thus allowing you to uniquely pinpoint those fields.
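The placeholder substitution amounts to something like the following. This is a sketch of the idea only: the insertRowHTML function name and the sample markup are made up for illustration, and the Core's own replacement code may well differ.

```javascript
// Example markup, as a getOptionsColumnHTML() method might return it.
var optionsTemplate = '<input type="text" name="dtOption_%ROW%" id="dtOption_%ROW%" />';

// Replaces every %ROW% placeholder in the template with the given row number.
function insertRowHTML(template, rowNum) {
	return template.replace(/%ROW%/g, String(rowNum));
}

console.log(insertRowHTML(optionsTemplate, 3));
// prints: <input type="text" name="dtOption_3" id="dtOption_3" />
```

Because the resulting IDs follow a predictable pattern, your JS module can find the field for any row with a selector like "#dtOption_" + rowNum, as the Alphanumeric example does.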

Available Resources

There are several client-side code libraries already available in the page that can be used in your Data Type:

  • jQuery ($)
  • jQuery UI
  • MomentJS - date/time formatting script
  • Chosen - dropdown enhancement

You can always include additional libraries should you wish, but do try to namespace them.

Adding your Data Type

When you add a new Data Type, just creating the new files and folders won't get it to show up in the UI. First, you'll need to follow the steps above to make sure your PHP class and JS module have been created properly, and afterwards you'll need to refresh the UI.

To update the list of available Data Types in the UI, go to the second Settings tab. There, click the Reset Plugins button. A dialog will appear which resets all the available plugins (don't worry, this won't cause any problems with saved content or anything like that). After refreshing the page, you should see your Data Type appear in the Data Type dropdowns in the generator.

How to Contribute

If you feel that your Data Type could be of use to other people, send it our way! I'd love to take a look at it, and maybe even include it in the core script for others to download. Read the How to Contribute page.