Glossary

A

API, Application Programming Interface: A set of public types, methods, properties and other elements exposed by a library intended to be used by external software.
Arbitrary data: One of the data kinds natively supported by Data Conveyer. A kind of text data where elements (fields) are "cherry-picked" using regular expressions. This may be useful in situations where row structure is complex requiring custom rules to identify fields. Arbitrary data belongs to a textual data category.
See Also: Delimited data, Fixed width data, Keyword data, Raw data, Textual data

C

Canonical data

Canonical data (or Uniform data) is the internal data format used by Data Conveyer during transformations. During intake and output processing Data Conveyer translates data rows of a given kind (such as Delimited or Keyword data) to/from canonical data.

Canonical data is structured in a 3 tier hierarchy: cluster, record and item. One cluster contains a collection of (zero to many) records and one record contains a collection of (zero to many) items.

See Also: Cluster, Record, Item, Uniform data

Cluster

A collection of related records. For example, records related to a family, such as individual members, address records, employment data, etc. can be grouped into a cluster.

A cluster is the top tier in the canonical data hierarchy.

See Also: Canonical data, Record, Item, Clusterbound transformation

Clusterbound transformation

A scenario where transformation is performed one cluster at a time. So, a clustrerbound transformer function accepts a cluster as an input parameter and returns a cluster upon return.

D

Data row: A data row (or simply row) refers to a single line of data on intake or output. This is in contrast to a record, which is a part of canonical data.
See Also: Record, Canonical data
Delimited data, Separated values: One of the data kinds natively supported by Data Conveyer. A kind of text data where element values (fields) are separated by a given character, such as a comma. For example, dimensions of a 120 by 45 rectangle can be expressed as "120,45" (without the quotes). Delimited data belongs to a textual data category.
See Also: Arbitrary data, Fixed width data, Keyword data, Raw data, Textual data
DLL, Dynamic link library: An industry standard file type that contains class libraries, for example DataConveyer.dll.

E

External line: A a unit of data sent to or received from Data Conveyer. Due to a variety of supported data kinds, Data Conveyer employs different mechanisms to read and write data. External line allows these mechanisms to be treated using a common pattern. In functional programming lingo, such common pattern is called a discriminated union.
See Also: Record-centric data, Textual data

F

Filter transformation

A scenario where transformation does not change the contents of a transformed entity (cluster or record), but instead evaluates the entity and possibly removes it. So, a filter predicate function accepts an entity (cluster or record) as an input parameter and returns true (to retain the entity) or false (to remove the entity from the output).

Note that there are two separate filter predicate functions: ClusterFilterPredicate and RecordFilterPredicate.

Fixed width data, Flat file

One of the data kinds natively supported by Data Conveyer. A kind of text data where each element (field) is assigned a fixed set of character positions in the record. For example, if both length and width fields occupy 3 character positions, then dimensions of a 120 by 45 rectangle can be expressed as "120 45" (without the quotes). Fixed width data belongs to a textual data category.

See Also: Arbitrary data, Delimited data, Keyword data, Raw data, Textual data

Foot cluster

A special empty cluster that can optionally be added during intake after the last cluster created from intake records. Data Conveyer guarantees that in transformation phase the foot cluster gets processed after all other clusters (regardless of ConcurrencyLevel setting).

The foot cluster can be used during the transformation phase to add records containing data aggregated during processing of prior clusters.

G

Global cache

Global cache is a central, thread-safe repository of arbitrary key value pairs. Unlike trace bin or property bin objects, which are attached to individual records (or clusters), there is only a single global cache object. Elements of global cache are available throughout all phases of Data Conveyer processing.

In addition, global cache allows signaling to synchronize threads of Data Conveyer processing.

For example, global cache may be used to aggregate data during transformation processing, such as counting records that meet certain conditions. Due to a multi-threaded nature of Data Conveyer, special care needs to be taken to assure thread-safety when manipulating global cache elements.

H

Head cluster: A special empty cluster that can optionally be added during intake before the first cluster created from intake records.
See Also: Foot cluster

I

Item

An item is the bottom tier of the canonical data hierarchy. It represents an element (field) within a record. For example, a member record can contain items such as FirstName, LastName or DateOfBirth.

Items in Data Conveyer are typed, i.e. they have Type property; for example, DateOfBirth property might be of DateTime type. The default item type is string.

See Also: Void item, Canonical data, Cluster, Record

J

JSON data: One of the data kinds natively supported by Data Conveyer. JavaScript Object Notation (JSON) is a common data interchange format defined in ECMA-404 and related standards. JSON data kind, like XML, requires data to be in a "tabular" format with explict record and cluster definitions. Alternatively, unbound JSON data kind can be used to translate JSON data into canonical format. JSON data belongs to a record-centric data category.
See Also: Unbound JSON data, XML data, Record-centric data, Canonical data

K

Key value pairs, Keyword data: One of the data kinds natively supported by Data Conveyer. A kind of text data where each element (field) consists of 2 parts separated by an equal sign: a key (field name) and a value; elements in turn are separated by commas. For example, dimensions of a 120 by 45 rectangle can be expressed as "length=120,width=45" (without the quotes). Keyword data belongs to a textual data category.
See Also: Arbitrary data, Delimited data, Fixed width data, Raw data, Textual data

O

Orchestrator: A key concept and the main component in processing data by Data Conveyer. Upon creation, the orchestrator collects settings defining details of the entire process. The settings also involve functions, i.e. executable code. During execution, the orchestrator manages component invocations, effectively combining the native and custom code into a seamless and unique transformation process.
See Also: Global cache, Trace bin

P

Property bin

Property bin is a container of key value pairs that can be attached to records and clusters throughout Data Conveyer processing. It allows passing of arbitrary data from one event (function) to another.

For example, there may be a common calculation formula required as a part of both transformer as well as router processing. In the event that no result of such calculation is kept in the record or cluster, a property bin may be used to carry such data across the processing phases and thus avoiding repetitive calculation.

R

Raw data: One of the data kinds natively supported by Data Conveyer. A kind of text data which is not parsed by Data Conveyer and every record is considered to have a single field with the entire contents of a data row. Raw data belongs to a textual data category.
See Also: Arbitrary data, Delimited data, Fixed width data, Keyword data, Textual data
Record: A record is a defined as a middle tier in the canonical data hierarchy. For example, there may be multiple records in a family cluster, where each member represents an individual family member.
See Also: Canonical data, Cluster, Item, Recordbound transformation
Recordbound transformation: A scenario where transformation is performed one record at a time. So, a recordbound transformer function accepts a record as an input parameter and returns a record upon return.
See Also: Clusterbound transformation, Filter transformation
Record-centric data: A category of data kinds, where a unit of data sent to or received from Data Conveyer is a record expressed by a sequence of key-value pairs. Examples of this category are XML or JSON data.
See Also: Textual data, External line, XML data

T

Textual data

A category of data kinds, where a unit of data sent to or received from Data Conveyer is a line of text terminated by a line terminator, such as CR/LF on Windows. Most, but not all, data kinds supported by Data Conveyer belong to this category.

Trace bin

Trace bin is a container that can be attached to each record on intake, thus allowing to pass arbitrary data elements from one record to another. In other words, it is a Data Conveyer's mechanism to manage state on intake, making it possible for subsequent records to refer to data from prior records.

For example, when processing X12 data, some elements of the interchange envelope (such as ISA06 - Submitter ID) may be required in handling subsequent segments. Trace bin allows such elements to be available when needed.

Trace bin elements can only be set during intake processing (using RecordInitiator function), but they are accessible in read-only mode throughout the Data Conveyer processing.

U

Unbound JSON data: One of the data kinds natively supported by Data Conveyer. Like JSON data kind, it is used to read and write data in the standard JavaScript Object Notation (JSON) format. Unlike JSON data kind, it does not require data to be tabular. Instead, unbound JSON converts hierarchical JSON data to and from canonical data by using a special convention where field names define nesting of corresponding JSON values via dot notation. Specifically, field names reflect the Path property JsonReader and JsonWriter as defined by Newtonsoft's Json.NET. JSON data belongs to a record-centric data category.
See Also: JSON data, Record-centric data, Canonical data
Uniform data: Same as canonical data.
See Also: Canonical data
Untyped record: A record that is unaware of types of the items it contains. Properties of untyped records that return item values (e.g. indexers), always return string values. This is unlike (regular) records, where such properties return strongly typed values. Untyped records can be casted to and from (regular) records.
See Also: Record, Item

V

Void item: A void item is an abstraction used by Data Conveyer to express "non-existing" items. In general, attempts to obtain an item for an absent key in the record result in void items. Void items hold default values for their respective types (such as null in case of a string). Therefore, their behavior is typically intuitive requiring no special handling.
See Also: Item

X

X12 data: One of the data kinds natively supported by Data Conveyer. A kind of text data used for electronic data interchange according to the ANSI ASC X12 standards. Data Conveyer parses X12 data so that each X12 segment becomes a record and each X12 element becomes an item, while X12 transaction typically becomes a cluster. At present, X12 data belongs to a textual data category.
See Also: Textual data, Record, Cluster
XML data: One of the data kinds natively supported by Data Conveyer. Extensible Markup Language (XML) is defined in W3C's XML 1.0 specification. To facilitate translation into canonical data, XML data needs to be in a "tabular" format where elements, such as records and clusters, are unambiguously defined. XML data belongs to a record-centric data category.
See Also: JSON data, Record-centric data, Canonical data

Glossary

Terms related to Data Conveyer

A

C

D

E

F

G

H

I

J

K

O

P

R

T

U

V

X