
OrchestratorConfig Properties

The OrchestratorConfig type exposes the following members.

Properties
Public propertyActionOnDuplicateKey
Action to take in case the key is repeated on a single record: IgnoreItem, ExcludeRecord or AssignDefaultKey. Default action is IgnoreItem.
Public propertyAllowOnTheFlyInputFields
False (default) means no new fields beyond those defined in InputFields (or 1st row if applicable) are allowed. Any additional fields encountered during intake processing will be excluded. True means that new fields can be added on the fly in subsequent data records; their names and types are determined according to standard rules. In case of Keyword data, default setting (false) requires all fields (keys) to be listed in the InputFields setting (keys not listed there will be excluded, so in absence of InputFields no fields will be present). True setting allows inclusion of all fields for Keyword data. This setting is ignored in case of the following data kinds on intake:
  • Flat - Flat data fields are determined up-front from either InputFields setting or the first row.
  • Arbitrary - Arbitrary data can only be determined from the ArbitraryInputDefs setting.
  • X12 - In case of X12 data, fields are added dynamically from X12 segments, so that this setting is assumed to be true.
Public propertyAllowTransformToAlterFields
If true, fields (items) can be added to and removed from records during transformations. If false (default), no fields can be added/removed during transformations (although field values can still be updated and records/clusters can be cloned).
Public propertyAppendFootCluster
True causes Data Conveyer to add an extra empty cluster (so-called foot cluster) after the last cluster formed from intake records. False (default) means no foot cluster will be added at the end of intake. The foot cluster always has a StartRecNo of -1 (FootClusterRecNo) and StartSourceNo of 1. If trailer contents is present, it will follow the foot cluster on output.
Public propertyArbitraryInputDefs
Array of elements that define fields to be extracted from input lines. Each element is a string consisting of 2 parts separated by a space: a field name followed by a regular expression containing a formula to extract a value from the input line. Data Conveyer will extract the fields in the order they are specified in this setting. This setting is mandatory for InputDataKind of Arbitrary; if specified for other data kinds, it is ignored.
Public propertyArbitraryOutputDefs
Array of strings that define data to be placed on output lines. Each string contains a fragment of the output line and may contain a single token in the form of {field}; such a token will be substituted by the actual value of the field. In order to include brace characters in the output, they need to be escaped (preceded by a backslash), like so: \{ and \}. Note that backslash characters inside C# string literals may need to be escaped themselves. For example: "\\{ \"Format\":\"JSON\" \\}" (as well as @"\{ ""Format"":""JSON"" \}") will output { "Format": "JSON" }. Data fragments defined by array elements are spliced together to form output lines. This setting is mandatory for OutputDataKind of Arbitrary; if specified for other data kinds, it is ignored.
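To illustrate, here is a minimal sketch of this setting assigned in C# (config stands for an OrchestratorConfig instance; the ID and NAME field names are hypothetical):
// Fragments are spliced together; {ID} and {NAME} are replaced by field values,
// while \{ and \} (escaped braces) are output as literal braces.
config.ArbitraryOutputDefs = new[]
{
   "\\{ \"Id\":{ID},",           // contributes: { "Id":<ID value>,
   " \"Name\":\"{NAME}\" \\}"    // contributes:  "Name":"<NAME value>" }
};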
Public propertyAsyncIntake
True will cause Data Conveyer to perform intake operations asynchronously (e.g. it will call AsyncTextIntakeSupplier or AsyncIntakeSupplier function if provided, not the TextIntakeSupplier or IntakeSupplier function). False (default) will cause Data Conveyer to perform intake operations synchronously, in which case the TextIntakeSupplier or IntakeSupplier function is called (if provided). Note that Data Conveyer performs such synchronous intake on a dedicated thread to prevent a possible deadlock condition.
Public propertyAsyncIntakeSupplier
An asynchronous function that supplies a task with (a promise of) a tuple containing an ExternalLine object (an input line to be fed into Data Conveyer) and the source number (1-based) (Func<IGlobalCache, Task<Tuple<ExternalLine, int>>>). Data Conveyer calls this function when AsyncIntake is true (providing a single parameter - IGlobalCache) in succession until null is received. When end of data is reached (i.e. intake sources are depleted), the function must return a task with a null result (end of data mark), which will initiate processing shutdown with a status of IntakeDepleted. Note the difference between null tuples (end of data marks) and tuples containing null ExternalLine objects: the latter are ignored by Data Conveyer; to send an empty line, a tuple containing an empty object (e.g. an empty string) needs to be returned. If not defined, Data Conveyer assumes a default AsyncIntakeSupplier function that returns no data, but a task with an end of data result, i.e.:
gc => Task.FromResult<Tuple<ExternalLine, int>>(null)
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
Public propertyAsyncOutput
True will cause Data Conveyer to perform output operations asynchronously (i.e. it will call AsyncTextOutputConsumer or AsyncOutputConsumer function if provided, not the TextOutputConsumer or OutputConsumer function). False (default) will cause Data Conveyer to perform output operations synchronously, in which case the TextOutputConsumer or OutputConsumer function is called (if provided).
Public propertyAsyncOutputConsumer
An asynchronous action (Task-returning function) that consumes a tuple containing an ExternalLine object (a single output line received from Data Conveyer) and the target number (Func<Tuple<ExternalLine, int>, IGlobalCache, Task>). Reference to the global cache (IGlobalCache) is passed as the second parameter. The function is intended for use in case of a long-running operation to consume the output line, e.g. await LongRunningOperation(..). Data Conveyer calls this action (when AsyncOutput is true) in succession passing (tuples with) the output lines one at a time. The last tuple sent by Data Conveyer is always null (end of data mark). If not defined, Data Conveyer assumes a default AsyncOutputConsumer action that ignores data passed to it, i.e.:
(tpl, gc) => Task.CompletedTask
Any exception thrown by this action will cause the process shutdown with a CompletionStatus of Failed.
Public propertyAsyncTextIntakeSupplier
An asynchronous function that supplies a task promising contents of a single text line to be fed into Data Conveyer (Func<Task<string>>). When end of data is reached, the function must return a task with a null result (end of data mark). The AsyncTextIntakeSupplier function is a simplified version of the AsyncIntakeSupplier function where the following restrictions apply:
  • Input data kind is textual such as Delimited, Flat, etc. (and not for example XML).
  • All lines are associated with a single source.
  • No access to global cache.
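A minimal sketch (assuming config is an OrchestratorConfig instance and input.csv is a hypothetical file): the supplier returns each line in turn, and ReadLineAsync returns null at the end of the stream, which doubles as the end of data mark:
var reader = new StreamReader("input.csv");  // hypothetical input file; disposal omitted for brevity
config.AsyncIntake = true;                   // so that the async supplier is used
config.AsyncTextIntakeSupplier = () => reader.ReadLineAsync();  // null at end of stream = end of data mark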
Public propertyAsyncTextOutputConsumer
An asynchronous action (Task-returning function) that consumes a single line of text received from Data Conveyer (Func<string, Task>). The last line sent by Data Conveyer is always null (end of data mark). The AsyncTextOutputConsumer action is a simplified version of the AsyncOutputConsumer action where the following restrictions apply:
  • Output data kind is textual such as Delimited, Flat, etc. (and not for example XML).
  • All lines are sent to a single target (regardless of the target determined by the ClusterRouter function).
  • No access to global cache.
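A matching minimal sketch (config and output.csv are assumptions; resource cleanup is omitted for brevity):
var writer = new StreamWriter("output.csv");  // hypothetical output file
config.AsyncOutput = true;                    // so that the async consumer is used
config.AsyncTextOutputConsumer = async line =>
{
   if (line == null) { await writer.FlushAsync(); return; }  // null marks end of data
   await writer.WriteLineAsync(line);
};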
Public propertyBufferSize
Capacity of a transformer buffer defined as a maximum number of clusters to be held unprocessed by Data Conveyer. This setting can be used to control memory consumption of Data Conveyer. Default value is -1, which means no limit.
Public propertyCloseLoggerOnDispose

True (default) causes the logger to close (stop logging) when the orchestrator object is disposed. False keeps the logger open beyond the lifespan of the orchestrator object, which may be helpful in certain troubleshooting scenarios (e.g. to avoid System.ObjectDisposedException: Cannot write to a closed TextWriter when using LogFile logger). It is recommended that the CloseLoggerOnDispose remains at its default value of true, except for troubleshooting scenarios.

The CloseLoggerOnDispose setting also affects the behavior of the SaveConfig(String) method in case an error occurs: if true, then the error gets logged and the logger is closed (stops logging); if false, then the error gets logged and the logger remains open.

Public propertyClusterboundTransformer
A function that takes a single cluster and returns a single cluster; specific to Clusterbound transformer type. In case the function returns null (Nothing in Visual Basic), the cluster will be filtered out. This makes the ClusterFilterPredicate a special case of the ClusterboundTransformer, where unfiltered clusters are passed through. If not supplied, a default pass-through function is used that passes input cluster to output, i.e.:
clstr => clstr
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
Public propertyClusterFilterPredicate
A predicate (boolean function) that takes a single cluster and returns true to accept the cluster or false to reject it; specific to ClusterFilter transformer type. If not supplied, a default pass-through predicate is used that always returns true, i.e.:
clstr => true
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
Public propertyClusterMarker
A predicate (boolean function) to identify records that cause cluster splits (either starting a new cluster or ending a cluster depending on the MarkerStartsCluster setting). It accepts 3 parameters: the current record, the previous record and the counter of records accumulated so far in the current cluster. Note that the previous record is null for the first record. If the predicate returns true, it causes a cluster split (before or after the current record depending on the MarkerStartsCluster value); returning false causes continuation of record accumulation into the current cluster. In case this function marks the first record of a cluster (i.e. MarkerStartsCluster is true), then the current record starts a new set of records accumulated for a cluster; otherwise (MarkerStartsCluster is false, i.e. the function marks the last record of a cluster), the current record ends the set of records accumulated for a cluster, and a new empty set of accumulated records is created. If not specified, the default function will split clusters on every record, with one notable exception: in case of XML/JSON intake (XML, JSON or UnboundJSON) with the XmlJsonIntakeSettings setting directing Data Conveyer to automatically detect clusters (ClusterNode or DetectClusters, respectively), Data Conveyer will respect the clusters detected from the XML/JSON intake. Here is the default ClusterMarker function:
(r, pr, i) => r.ClstrNo == 0 || pr == null || r.ClstrNo != pr.ClstrNo;
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
Note

An astute reader may be asking what will happen when the automatic cluster detection during XML/JSON intake coexists with explicit (non-default) ClusterMarker and MarkerStartsCluster settings. In this advanced scenario, the ClusterMarker function can override the cluster assignment made by the XML/JSON intake. The ClstrNo property of the IRecord (either current or previous) can be used in the logic to decide whether to return true (new cluster) or false (continue with the old cluster).
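To illustrate, here is a sketch of a ClusterMarker that starts a new cluster whenever the value of a hypothetical CustNo field changes between consecutive records (field access via the record indexer is an assumption made for this example):
config.MarkerStartsCluster = true;  // the marked record starts a cluster
config.ClusterMarker = (rec, prevRec, cnt) =>
   prevRec == null                                  // the first record always starts a cluster
   || !Equals(rec["CustNo"], prevRec["CustNo"]);    // split whenever CustNo changes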

Public propertyClusterRouter
A function to determine the output target for every record in a given cluster. It receives an output cluster and returns TargetNo to be assigned to every record of the cluster (Func<ICluster, int>). This function is specific to PerCluster router type; it is ignored for other router types. If not supplied, a default function that returns 1 for every cluster is assumed, i.e.:
clstr => 1
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
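For example, a sketch that routes each cluster to one of two targets based on a field of its first record (the Records collection, the record indexer and the Region field are assumptions made for this example):
config.RouterType = RouterType.PerCluster;
config.ClusterRouter = clstr =>
   "East".Equals(clstr.Records[0]["Region"]) ? 1 : 2;  // TargetNo 1 for East, 2 for all others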
Public propertyClusterSyncInterval
This advanced setting specifies the number of milliseconds Data Conveyer waits when synchronizing processing of head and foot clusters. Note that regardless of the ConcurrencyLevel setting, Data Conveyer guarantees that the head cluster (if present) will be processed before all other clusters and that the foot cluster (if present) will be processed after all other clusters. The order of processing the remaining ("regular") clusters is not guaranteed. The default setting of 40 ms is suitable for most scenarios.
Public propertyConcurrencyLevel
Degree of parallelism during transformation phase; defined as a maximum number of engines performing transformation. If not supplied, a default value of 1 is used.
Public propertyConfigName
Name of this configuration in case it was created from a .cfg (and optional .dll) file(s) via RestoreConfig method; otherwise null.
Public propertyDataConveyerInfo
Information on current version of Data Conveyer.
Public propertyDefaultInputFieldWidth
Default width of fields in flat input data; applicable only if InputDataKind is Flat, ignored otherwise. If not specified, a default width of 10 is assumed.
Public propertyDefaultOutputFieldWidth
Default width of fields in the flat output data; applicable only if OutputDataKind is Flat, ignored otherwise. If not specified, a default width of 10 is assumed.
Public propertyDefaultX12FieldDelimiter
Default value of the field delimiter - applicable only to X12 data. On X12 intake, field delimiter is always determined from ISA segment, so this setting only applies in case ISA segment is absent (incomplete X12 envelope). On X12 output, in absence of this setting, field delimiter is determined from intake if also X12 (specifically, from the first ISA segment encountered), or a default value of '*' is assumed.
Public propertyDefaultX12SegmentDelimiter
Default value of the segment delimiter - applicable only to X12 data. On X12 intake, segment delimiter is always determined from ISA segment, so this setting only applies in case ISA segment is absent (incomplete X12 envelope). On X12 output, in absence of this setting, segment delimiter is determined from intake if also X12 (specifically, from the first ISA segment encountered), or a default value of '~' is assumed. May contain multiple characters to make output easier to read, such as "~\r\n".
Public propertyDeferOutput
Defines when Data Conveyer is allowed to start the Output phase. This setting should be left at its default value of Auto. The value of Indefinitely is restricted for use in specialized tests only!
Public propertyDeferTransformation
Defines when Data Conveyer is allowed to start the Transformation phase; for example, the UntilRecordInitiation value defers the start of the Transformation phase until triggered by the RecordInitiator function. Note that after all records are read, the Intake phase still continues execution by performing record clustering. However, during the clustering process Data Conveyer has no ability to trigger the start of the Transformation phase. Similarly, the ClusterMarker function (unlike the RecordInitiator function) has no ability to set the trace bin contents.
Public propertyEagerInitialization
If true, all initializations are executed regardless of whether prior initializations failed. So, for example, OutputInitializer gets executed even after a failure of IntakeInitializer. This may result in an unwanted reset of output (such as erasure of prior data) even if the process could not start due to a problem initializing intake. If false (default), any initialization failure (e.g. in IntakeInitializer) will prevent execution of subsequent initializations (such as OutputInitializer). In this case, troubleshooting of output initialization is only possible after successful intake initialization.
Public propertyErrorOccurredHandler
Handler of the ErrorOccurred event. Data Conveyer calls this function (regardless of the ReportProgress setting) when an exception thrown during processing (for example in the caller supplied code) is unhandled. This handler is intended for troubleshooting purposes. The event occurs immediately before the process completes with the status of Failed and provides the last chance to identify the reason for (but not recover from) the failure. Also note that any exceptions thrown by event handlers are discarded by Data Conveyer.
Public propertyExcludeExtraneousFields

In case of Delimited or Flat output data, this setting, when true, causes Data Conveyer to remove trailing, insignificant fields, i.e. fields with empty values (in addition, trailing spaces on the last field are removed); if false (default), then all fields are always included on output (in their entirety).

For Keyword output data, the setting is only applicable when OutputFields setting is specified. If true, only those fields that are both: specified in OutputFields and also present in the actual records will be included on output. If false (default), all fields will always be included on output (empty values assumed for fields absent from the actual records).

Public propertyExcludeItemsMissingPrefix
True will cause exclusion of items (fields) with keys not matching the InputKeyPrefix; false (default) will include such fields (with keys in their entirety). Use caution when assigning true to this setting, as it may result in records with no contents (when none of the fields match prefix specified in InputKeyPrefix setting).
Public propertyExplicitTypeDefinitions

This setting defines data types used in internal representations of record fields. It contains a set of keys (field names) along with their corresponding data types and optional formats.

The setting is in the form of a comma delimited list of items, each item being a pipe delimited triplet: fldName|type|format, where type is one of: S=String, M=Decimal, D=DateTime, I=Integer, B=Boolean; and format is the format string used when formatting output. The format, along with its preceding pipe, can be omitted, in which case no formatting will take place. Format has no relevance on intake. In addition, with one exception, format is ignored in JSON output, because JSON supports writing elements in their native data types. The exception is for fields of DateTime type, which are not supported in JSON. DateTime fields are converted to strings (respecting format) before being submitted to JSON output.

Example:"AMOUNT|M,BIRTH_DATE|D|M/d/yyyy,AGE|I"

Sequence of the elements in this setting is irrelevant.

Types of the fields not specified in ExplicitTypeDefinitions are determined by the TypeDefiner function, which by default assumes string type and no format for all fields.

Note

In case of JSON or UnboundJSON intake, the elements are parsed according to JSON specifications, which involves automatic data type determination. However, in order to preserve this type, the corresponding field must be of a matching type. Otherwise, the type conversion will be performed.

For example, a JSON element "ID": 5 will be parsed as number 5. So, in order to avoid type conversion, a setting of ID|I is needed (otherwise, number 5 will be converted to string "5"). Similarly, an element "ID": "5" will be parsed as string "5". Therefore, no type conversion will take place in absence of a type definition for field "ID" (fields are strings by default); if however the setting ID|I is present, then string "5" will be converted into number 5.
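Restating the example above as a C# assignment (config stands for an OrchestratorConfig instance):
// AMOUNT held as Decimal, BIRTH_DATE as DateTime formatted M/d/yyyy on output,
// AGE as Integer; all remaining fields default to String with no format.
config.ExplicitTypeDefinitions = "AMOUNT|M,BIRTH_DATE|D|M/d/yyyy,AGE|I";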

Public propertyGlobalCacheElements

A list of definitions of the elements to be held in the global cache (IGlobalCache). Global cache is a central repository of arbitrary key value pairs maintained throughout the Data Conveyer process. Each array element defines a single global cache element. The definition consists of the key optionally followed by a pipe symbol (|) and the initial value to be placed in global cache. Element keys are of string type (only letters and numbers are recommended). The type of each element depends on the element value: if specified after the pipe symbol, it can be one of: int, DateTime, decimal or string. In case no pipe symbol is present in the element definition, the element will be of type object with a null value. Note that element values (and types as well) can be changed by the ReplaceValue<TIn, TOut>(String, Func<TIn, TOut>) method.

Rules for determining the element type depend on the value placed after the pipe symbol (|):

  • Any unquoted number without decimal point or commas will be of an int type. Int examples: "Int|0", "Val2|-15".
  • An unquoted number with decimal point will be of a decimal type. Decimal examples: "Dec|0.", "AnotherDec|-3.5".
  • A string in a valid date format will be of a DateTime type. DateTime examples: "Date|12/31/2012", "Val2|12-DEC-12".
  • Any quoted string or string not fitting other types will be of a string type. Strings can contain any characters including pipe symbols and quotes if quoted, i.e. starting with a quote (the ending quote is optional). String examples (note escaped inner quotes): "Str|abc", "Val2|\"0\"", "Val3|\"0", "Val4|\"\"" (empty string), "Val5|" (empty string).
  • In absence of the pipe symbol, a null object is assumed, e.g. "Val1" (null).

Note that ExplicitTypeDefinitions or TypeDefiner settings have no meaning in determining the type of the global cache elements.

Also note that this setting only defines the elements of the global cache, and not the signals, which are simply referred to in the RaiseSignal(String), AwaitSignal(String) and AwaitSignalAsync(String) methods.
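A sketch combining the rules above (config stands for an OrchestratorConfig instance):
config.GlobalCacheElements = new[]
{
   "Count|0",          // int, initially 0
   "Total|0.",         // decimal, initially 0
   "AsOf|12/31/2012",  // DateTime
   "Label|abc",        // string
   "Scratch"           // no pipe symbol: object, initially null
};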

Public propertyHeadersInFirstInputRow
True means 1st input line contains field names and data starts from the 2nd line; false (default) means data starts from the 1st line (default field names are assigned in this case). This setting applies only to data kinds that support header rows, such as Delimited or Flat; otherwise, it is ignored, in which case data always starts from the 1st line.
Public propertyHeadersInFirstOutputRow
True means 1st line sent to output will contain field names and data will start on the 2nd line; false (default) means no field names are sent to output and data starts on the 1st line. This setting applies only to data kinds that support header rows, such as Delimited or Flat; otherwise, it is ignored, in which case data always starts on the 1st line.
Public propertyInputDataKind
Type (format) of input data: Raw, Keyword, Delimited, Flat, Arbitrary, XML, JSON, UnboundJSON or X12. Default is Raw. The HL7 and Ultimate values are designated for future use.
Public propertyInputFields

Comma delimited list of fields as they appear on intake lines. Each field is defined by a name and width, separated by a pipe (|) symbol. Field widths are only applicable to Flat (fixed width) data; they are ignored for other kinds of input data. Where omitted or invalid, a default width (DefaultInputFieldWidth) is assumed. Field names specified in this setting take precedence over those in the first row headers (Delimited and Flat data), so in case of HeadersInFirstInputRow=true, the 1st row data may get discarded.

If a field name is omitted, a default name will be used (either from the header row or from a formula, which yields Fldnnn, where nnn is the field sequence number; the exception is X12 data, where fields are named Segment, Elem001, Elem002, ...). This setting, when accompanied by AllowOnTheFlyInputFields of false, can be used to accept only those fields specified and exclude all other fields from intake (e.g. in case of Keyword data).

Examples:

  • |10,|4,|12 - (flat data, records 26 characters long, field names in header row or default names)
  • Seq#|5, First Name|12,Mid Init|1, Last Name or Company|20, Zip Code|5 - (flat data, records 43 characters long)
  • Seq#,Name,Description - (keyword data, exclude all fields from intake except for the 3 specified)
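The last two examples above, expressed as C# assignments (config stands for an OrchestratorConfig instance):
// Flat data, records 43 characters long:
config.InputFields = "Seq#|5, First Name|12,Mid Init|1, Last Name or Company|20, Zip Code|5";
// Keyword data; with AllowOnTheFlyInputFields left at false, only these 3 fields are accepted:
config.InputFields = "Seq#,Name,Description";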
Public propertyInputFieldSeparator
Character that separates fields on intake lines. Only applicable to Delimited and Keyword data, ignored in case of other data kinds. If not specified, a comma is assumed.
Public propertyInputFileName

A synonym for the InputFileNames setting (to make it more intuitive in case of a single input file).

InputFileName and InputFileNames settings should not be used both at the same time.

Public propertyInputFileNames

Name(s) of (i.e. path(s) to) the input file(s). This setting is the same as (a synonym for) the InputFileName setting. If multiple files are specified, the names are separated by pipe symbols (|). Each name can be surrounded by double quotes. The first file is assigned SourceNo=1, the second SourceNo=2, etc. The files will be read one after another (in SourceNo order) either synchronously or asynchronously, depending on the AsyncIntake setting. Ignored if the TextIntakeSupplier or IntakeReaders setting (or one of their equivalent settings) is also submitted.

InputFileName and InputFileNames settings should not be used both at the same time.

Public propertyInputHeadersRepeated
Relevant only if multiple intake sources are present and HeadersInFirstInputRow is true. True (default) means that all intake sources contain the header row; in this case Data Conveyer will read the headers from the first source that supplied intake data and will ignore the header rows from the remaining sources. False means that only the source that supplies the intake data first contains the header row (typically the source with SourceNo=1, but it can actually be any source); the remaining sources are assumed to only contain data rows (regardless of the HeadersInFirstInputRow setting). In either case, the only header row actually considered is the one from the source that supplies intake first. Note that all intake sources are subjected to the same set of processing rules, which implies that the same header row is applicable to all sources (hence the header row should be identical for all sources).
Public propertyInputKeyPrefix
Prefix to be trimmed (removed) from every key on input, if any (e.g. @p). Default value is null (no prefix to be removed).
Public propertyIntakeBufferFactor
Ratio between sizes of the intake buffer and the transformer's input buffer (the latter is defined by the BufferSize setting). This advanced setting allows fine-tuning of memory consumption by Data Conveyer. For example, if an average cluster on intake is expected to be created from 4 intake records, then it may be sensible to set intake buffer size to 400 records and transformer's input buffer size to 100 clusters. In this case, BufferSize = 100 and IntakeBufferFactor = 4.0. Default value for IntakeBufferFactor is 1.5. This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
Public propertyIntakeDisposer
An action intended to dispose any intake sources and related resources that were opened by the IntakeInitializer function. Data Conveyer calls this action a single time after completing orchestrator processing (when the orchestrator is disposed). However, if neither the TextIntakeSupplier nor the AsyncTextIntakeSupplier function is defined, this action is not called. The IntakeDisposer action accepts a single parameter (IGlobalCache) and returns void (Action<IGlobalCache>). If not defined, Data Conveyer assumes an empty action, i.e.:
gc => { }
Any exception thrown by this action will be logged, but otherwise ignored.
Public propertyIntakeInitializer
A function intended to initialize any intake sources (such as open files or database connections) and related resources that may be needed by the TextIntakeSupplier or AsyncTextIntakeSupplier function. Data Conveyer calls this function a single time before starting the actual orchestrator processing, such as invocations of the TextIntakeSupplier function, if one is defined. However, if neither the TextIntakeSupplier nor the AsyncTextIntakeSupplier function is defined, this function is not called. The IntakeInitializer function accepts a single parameter (IGlobalCache) and returns a string (Func<IGlobalCache, string>). A null returned value means successful initialization; otherwise, an error message indicating the reason for the failure is expected to be returned (Data Conveyer will log this message). In case of failure, the processing will end in the InitializationError status. If not defined, the default IntakeInitializer simply returns null, i.e.:
gc => null
Public propertyIntakeReader
A function returning a text reader object to supply data into the Data Conveyer intake process. This setting is equivalent to IntakeReaders setting containing a single reader. IntakeReader and IntakeReaders settings should not be used both at the same time.
Public propertyIntakeReaders
A function returning a collection of text reader objects to supply data into the Data Conveyer intake process. Each reader corresponds to a single intake source; the first reader is assigned SourceNo=1, the second SourceNo=2, etc. Data will be read from one reader after another, in SourceNo order, either synchronously or asynchronously, depending on AsyncIntake setting. Ignored if the TextIntakeSupplier function (or applicable equivalent function) is also submitted.
Public propertyIntakeRecordLimit
Maximum number of intake records allowed for the process execution. Default value is -1, i.e.:
Unlimited
Public propertyIntakeSupplier
A function that supplies a tuple containing an ExternalLine object (an input line to be fed into Data Conveyer) and the source number (1-based) (Func<IGlobalCache, Tuple<ExternalLine, int>>). Data Conveyer calls this function when AsyncIntake is false (providing a single parameter - IGlobalCache) in succession until null is received. When end of data is reached (i.e. all intake sources are depleted), the function must return null (end of data mark), which will initiate processing shutdown with a status of IntakeDepleted. Note the difference between null tuples (end of data marks) and tuples containing null ExternalLine objects: the latter are ignored by Data Conveyer; to send an empty line, a tuple containing an empty object (e.g. an empty string) needs to be returned. If not defined, Data Conveyer assumes a default IntakeSupplier function that returns no data, but only the end of data mark, i.e.:
gc => null
This function is called on a dedicated thread to prevent possible deadlocks in situations, such as UI updates made during progress changes. Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
Public propertyLeaderContents
Data, if any, to be sent to output before the first output line; may be multi-line. In case the header row is also present, then it will follow the leader contents.
Public propertyMarkerStartsCluster
True (default) means that ClusterMarker predicate matches the first record of a cluster (i.e. starts a cluster); false means that predicate matches the last record of a cluster (i.e. ends a cluster). As a general rule, true value is recommended (marker starts a cluster), especially in cases where ClusterMarker predicate relies on the previous record contents or the number of records accumulated so far; false value (marker ends a cluster) should only be used in cases where ClusterMarker predicate relies solely on the current record contents.
Public propertyOutputBufferFactor
Ratio between sizes of the output buffer and the transformer's input buffer (the latter is defined by the BufferSize setting). This advanced setting allows fine-tuning of memory consumption by Data Conveyer. For example, if an average cluster on output is expected to produce 5 output records (and, on average, every other cluster to be removed by the transformer), it may be sensible to set the transformer's input buffer size to 100 clusters (transformer's output buffer size to 50 clusters) and the output buffer size to 250 clusters. In this case, BufferSize = 100 and OutputBufferFactor = 2.5. Default value for OutputBufferFactor is 1.5. This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
Public propertyOutputConsumer
An action (void function) that consumes a tuple containing an ExternalLine object (a single output line received from Data Conveyer) and the target number (1-based) (Action<Tuple<ExternalLine, int>, IGlobalCache>). Reference to the global cache (IGlobalCache) is passed as the second parameter. Data Conveyer calls this action (when AsyncOutput is false) in succession passing (tuples with) the output lines one at a time. The last tuple sent by Data Conveyer is always null (end of data mark). If not defined, Data Conveyer assumes a default OutputConsumer action that ignores data passed to it, i.e.:
(tpl, gc) => { }
Any exception thrown by this action will cause the process shutdown with a CompletionStatus of Failed.
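A minimal sketch of an OutputConsumer that writes each output line to the console along with its target number (config is an assumed OrchestratorConfig instance; rendering the ExternalLine via its default ToString is an assumption):
config.OutputConsumer = (tpl, gc) =>
{
   if (tpl == null) return;  // null tuple = end of data mark
   Console.WriteLine($"target {tpl.Item2}: {tpl.Item1}");  // Item1 = ExternalLine, Item2 = TargetNo
};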
Public propertyOutputDataKind
Type (format) of output data: Raw, Keyword, Delimited, Flat, Arbitrary, XML, JSON, UnboundJSON or X12. Default is Raw. HL7 and Ultimate types are designated for future use.
Public propertyOutputDisposer
An action intended to dispose any output targets and related resources that were opened by the OutputInitializer function. Data Conveyer calls this action a single time after completing orchestrator processing (when the orchestrator is disposed). However, if neither the TextOutputConsumer nor the AsyncTextOutputConsumer action is defined, this action is not called. The OutputDisposer action accepts a single parameter (IGlobalCache) and returns void (Action<IGlobalCache>). If not supplied, Data Conveyer assumes an empty action, i.e.:
gc => { }
Any exception thrown by this function will be logged, but otherwise ignored.
Public propertyOutputFields

Comma delimited list of fields to appear on output lines. Each field is defined by a name and width, separated by a pipe (|) symbol. Once this setting is specified, only the fields specified can be included on output.

This setting is optional. However, it can only be omitted in its entirety, in which case all fields produced by transformation will be included on output. If the setting is present, then each field name must be specified (no default names can be assumed, unlike InputFields). If a name of a non-existing field is specified, then empty contents is sent to output for such a field.

Field widths are only applicable to Flat (fixed width) data output; they are ignored for other kinds of output data. Where omitted or invalid, a default width (DefaultOutputFieldWidth) is assumed.

Public propertyOutputFieldSeparator
Character that separates fields on output lines. Only applicable to Delimited and Keyword data, ignored in case of other data kinds. If not specified, a comma is assumed.
Public propertyOutputFileName

A synonym for the OutputFileNames setting (to make it more intuitive in case of a single output file).

OutputFileName and OutputFileNames settings should not be used both at the same time.

Public propertyOutputFileNames

Name(s) of (path(s) to) the output file(s). This setting is the same as (synonym for) the OutputFileName setting.

If multiple files are specified, the names are separated by pipe symbols (|). Each name can be surrounded by double quotes. The first file is assigned TargetNo=1, the second TargetNo=2, etc. The number of files specified here must equal the highest TargetNo returned by the ClusterRouter function. Ignored if the TextOutputConsumer or OutputWriters setting (or one of their equivalent settings) is also submitted.

OutputFileName and OutputFileNames settings should not be used both at the same time.

Public propertyOutputInitializer
A function intended to initialize any output targets (such as open files or database connections) and related resources that may be needed by the TextOutputConsumer or AsyncTextOutputConsumer action. Data Conveyer calls this function a single time before starting the actual orchestrator processing, such as invocations of the TextOutputConsumer function, if one is defined. However, if neither the TextOutputConsumer nor the AsyncTextOutputConsumer action is defined, this function is not called. The OutputInitializer function accepts a single parameter (IGlobalCache) and returns a string (Func<IGlobalCache, string>). A null returned value means successful initialization; otherwise, an error message indicating the reason for the failure is expected to be returned (Data Conveyer will log this message). This function may not get called in case of a failure of the prior initializer (IntakeInitializer), in which case the processing will result in the InitializationError status. If not defined, the default OutputInitializer function simply returns null, i.e.:
gc => null
Public propertyOutputKeyPrefix
Prefix to be prepended to every key on Keyword output (e.g. @p).
Public propertyOutputWriter
A function returning a text writer object to consume data produced by the Data Conveyer output. This setting is equivalent to OutputWriters setting containing a single writer. OutputWriter and OutputWriters settings should not be used both at the same time.
Public propertyOutputWriters
A function returning a collection of text writer objects to consume data produced by the Data Conveyer output. Each writer corresponds to a single output target; the first writer is assigned TargetNo=1, the second TargetNo=2, etc. The number of writers returned must equal the highest TargetNo returned by the ClusterRouter function. Ignored if the TextOutputConsumer function (or applicable equivalent function) is also submitted.
Public propertyPhaseFinishedHandler
Handler of the PhaseFinished event. Data Conveyer calls this function at the end of a processing phase. Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
Public propertyPhaseStartingHandler
Handler of the PhaseStarting event. Data Conveyer calls this function at start of a processing phase. Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
Public propertyPrependHeadCluster
True causes Data Conveyer to add an extra empty cluster (so-called head cluster) before the first cluster formed from intake records. False (default) means no head cluster will be added at the beginning of intake. The head cluster always has a StartRecNo of 0 (HeadClusterRecNo) and StartSourceNo of 1. Header and/or leader contents, if present, will precede the head cluster on output.
Public propertyProgressChangedHandler
Handler of the ProgressChanged event. Data Conveyer calls this function at specified intervals during processing. Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
Public propertyProgressInterval
Frequency of raising the ProgressChanged event:
  • 0 - Never (default)
  • 1 - Every cluster
  • 2 - Every other cluster
  • ... - Etc.
This setting is ignored if ReportProgress is false.
Public propertyPropertyBinEntities
Entities (such as records and/or clusters) that will have property bin objects attached to them during Data Conveyer processing. Default value is Nothing, i.e. no property bins attached. This feature should be used judiciously due to its impact on performance.
Public propertyQuotationMode
Specifies which values are to be surrounded with quotes on output. One of:
  • OnlyIfNeeded - Output values are not quoted, except for those that contain commas and/or quotes (default).
  • StringsAndDates - String and date values are quoted on output, while decimal or integer values are not (except if formatted to contain commas).
  • Always - All values are surrounded with quotes on output.
Public propertyRecordboundTransformer
A function that takes a single record and returns a single record; specific to Recordbound transformer type. In case the function returns null (Nothing in Visual Basic), the record will be filtered out. This makes the RecordFilterPredicate a special case of the RecordboundTransformer, where unfiltered records are passed through. If not supplied, a default pass-through function is used that passes input record to output, i.e.:
rec => rec
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
Public propertyRecordFilterPredicate
A predicate (boolean function) that takes a single record and returns true to accept the record or false to reject it; specific to RecordFilter transformer type. Note that in case all records for a cluster are rejected, then the cluster will be rejected. If not supplied, a default pass-through predicate is used that always returns true, i.e.:
rec => true
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
Public propertyRecordInitiator

Introduces records into the processing pipeline of Data Conveyer.

This function is called once per record immediately after the record is read from intake and parsed, before being forwarded to the clustering process. The function plays 2 roles: setting up the trace bin and triggering the start of the Transformation phase. Any data collected from the current record can be stored in a trace bin for access during processing of subsequent records (as well as during subsequent phases of processing). The start of the Transformation phase can be triggered by this function only in case of DeferTransformation set to UntilRecordInitiation.

The function accepts 2 parameters: the current record and a trace bin object (dictionary). The function returns a boolean value to trigger the start of the Transformation phase in case of DeferTransformation set to UntilRecordInitiation. Note that records are processed sequentially on Intake; after the first true is returned (or if the DeferTransformation setting is other than UntilRecordInitiation), the values returned by this function are inconsequential. In case RecordInitiator returns false for all records and DeferTransformation is UntilRecordInitiation, the transformation starts after all input records have been read.

Deferral of Transformation may be useful in cases where transformation of initial clusters (e.g. head cluster) requires data that is read from some records during Intake and saved in global cache. Note though that such deferral affects performance as well as memory footprint and, in case of limited buffer sizes, may lead to deadlocks.

If not supplied, Data Conveyer assumes default function that does not make any updates and returns true, i.e.:
(rec, tb) => true
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
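As an illustration, a sketch that stashes a value from the first record in the trace bin and triggers the Transformation phase right away (the REC_TYPE field and the record indexer are hypothetical; the DeferTransformation enum type name is inferred from the property of the same name):
config.DeferTransformation = DeferTransformation.UntilRecordInitiation;
config.RecordInitiator = (rec, traceBin) =>
{
   if (!traceBin.ContainsKey("firstRecType"))
      traceBin["firstRecType"] = rec["REC_TYPE"];  // available to later records and phases
   return true;  // trigger the start of the Transformation phase
};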
Public propertyRecordRouter
A function to determine the output target for a given record. It receives an output record and a cluster that contains the record and returns TargetNo to be assigned to the record (Func<IRecord, ICluster, int>). This function is specific to PerRecord router type; it is ignored for other router types. If not supplied, a default function that returns 1 for every record is assumed, i.e.:
(rec, clstr) => 1
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
Public propertyRepeatLeaders
Relevant only if multiple output targets are present and LeaderContents is not null. If true (default), then the leader rows will be sent to all output targets before sending data rows. If false, then the leader rows will only be sent to the first output target (typically the one with TargetNo=1, but it can actually be any target).
Public propertyRepeatOutputHeaders
Relevant only if multiple output targets are present and HeadersInFirstOutputRow is true. If true (default), then the header row will be sent to all output targets before sending data rows. If false, then the header row will only be sent to the first output target (typically the one with TargetNo=1, but it can actually be any target). Note that all output targets are subjected to the same set of processing rules, which implies that the same header row applies to all targets regardless of whether it is repeated for all targets or not.
Public propertyRepeatTrailers
Relevant only if multiple output targets are present and TrailerContents is not null. If true (default), then the trailer rows will be sent to all output targets after sending data rows. If false, then the trailer rows will only be sent to the last output target (typically the one with the highest TargetNo, but it can actually be any target).
Public propertyReportProgress
True causes Data Conveyer to raise progress events, i.e. PhaseStarting, PhaseFinished and (if ProgressInterval is other than 0) ProgressChanged. If false (default), then no progress events occur.
Public propertyRetainQuotes
True will keep double quotes surrounding values, if any; false (default) will strip surrounding quotes. Note that quotes are stripped before possible trimming; so unless a quote is the very 1st character of the value, the quotes will be retained regardless of this setting.
Public propertyRouterType
Type of router that determines the output target. One of: SingleTarget (default), SourceToTarget, PerCluster or PerRecord.
Public propertyTextIntakeSupplier
A function that supplies contents of a single text line to be fed into Data Conveyer (Func<string>). When end of data is reached, the function must return null (end of data mark). The TextIntakeSupplier function is a simplified version of the IntakeSupplier function where the following restrictions apply:
  • Input data kind is textual such as Delimited, Flat, etc. (and not for example XML).
  • All lines are associated with a single source.
  • No access to global cache.
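A minimal sketch that feeds a few in-memory lines (config stands for an OrchestratorConfig instance):
var lines = new Queue<string>(new[] { "ID,NAME", "1,Mary", "2,John" });
config.TextIntakeSupplier = () => lines.Count > 0 ? lines.Dequeue() : null;  // null = end of data mark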
Public propertyTextOutputConsumer
An action (void function) that consumes a single line of text received from Data Conveyer (Action<string>). The last line sent by Data Conveyer is always null (end of data mark). The TextOutputConsumer action is a simplified version of the OutputConsumer action where the following restrictions apply:
  • Output data kind is textual such as Delimited, Flat, etc. (and not for example XML).
  • All lines are sent to a single target (regardless of the target determined by the ClusterRouter function).
  • No access to global cache.
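A matching minimal sketch that collects output lines into a list (config stands for an OrchestratorConfig instance):
var results = new List<string>();
config.TextOutputConsumer = line =>
{
   if (line != null) results.Add(line);  // skip the null end of data mark
};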
Public propertyTimeLimit
Maximum amount of time allowed for the process execution. Default value is -1 ms, i.e.:
Timeout.InfiniteTimeSpan
Public propertyTrailerContents
Data, if any, to be sent to output after the last output line; may be multi-line.
Public propertyTransformBufferFactor
Ratio between sizes of the transformer's output and input buffers (transformer's input buffer size is defined by the BufferSize setting). This advanced setting allows fine-tuning of memory consumption by Data Conveyer. For example, if the transformation process is expected to remove (filter out) on average every other cluster, then it may be sensible to set the transformer's input buffer size to 100 clusters and the transformer's output buffer size to 50 clusters. In this case, BufferSize = 100 and TransformBufferFactor = 0.5. Default value for TransformBufferFactor is 1.0. This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
Public propertyTransformerType
Type of the transformer: Clusterbound, Recordbound, ClusterFilter, RecordFilter or Universal. Default transformer type is Recordbound. The Aggregator type is designated for future use.
Public propertyTrimInputValues
True means leading and trailing spaces will be trimmed from values encountered during intake; default is false meaning all spaces are preserved. Note that RetainQuotes = true will prevent trimming of quoted values.
Public propertyTrimOutputValues
True means that leading and trailing spaces will be trimmed from output values. False (default) means that all spaces are preserved. Values get trimmed before surrounding in quotes (per QuotationMode setting).
Public propertyTypeDefiner
A function to determine data types for those fields that are not listed in ExplicitTypeDefinitions. The function takes a field name and returns an ItemDef consisting of a type and format for the field. Default function assumes every field is of string type and has no format, i.e.:
fn => new ItemDef(ItemType.String, null)
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
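For example, a sketch that treats any field whose name ends in _DATE as a DateTime (the ItemType.DateTime value is inferred from the type list under ExplicitTypeDefinitions and is an assumption):
config.TypeDefiner = fn => fn.EndsWith("_DATE")
   ? new ItemDef(ItemType.DateTime, "M/d/yyyy")  // date fields, formatted on output
   : new ItemDef(ItemType.String, null);         // everything else remains a string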
Public propertyUniversalTransformer
A function that takes a single cluster and returns a sequence of clusters; specific to Universal transformer type. If not supplied, a default pass-through function is used that passes input cluster to output as a single element enumerable, i.e.:
clstr => Enumerable.Repeat(clstr, 1)
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
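As an illustration, a sketch that removes empty clusters and passes all others through unchanged (the Count property of ICluster and the TransformerType enum type name are assumptions made for this example):
config.TransformerType = TransformerType.Universal;
config.UniversalTransformer = clstr =>
   clstr.Count == 0
      ? Enumerable.Empty<ICluster>()  // an empty sequence removes the cluster
      : Enumerable.Repeat(clstr, 1);  // a single-element sequence passes it through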
Public propertyXmlJsonIntakeSettings
A string containing comma-separated parameters for parsing XML or JSON or UnboundJSON data on intake. Each such parameter is a key-value pair with a pipe symbol (|) separating the key and the value. The parameters reflect the shape of data to parse and define the elements to be extracted. There are some differences (explained below) in their interpretation in case of XML vs JSON vs UnboundJSON. The following keys can be used:
  • CollectionNode - "xpath" to the collection of clusters (or records if the ClusterNode parameter is absent). If this parameter is absent for XML, then intake is expected to contain XML fragment where each root constitutes a record or a cluster. In case of JSON, absent or empty value generally means an array instead of an object. Ignored in case of UnboundJSON.
  • ClusterNode - "xpath" to the cluster node within the collection node. Records within a cluster node will be assigned the same cluster number (sequential). If this parameter is absent, then all records will be assigned a cluster number of 0 (undetermined) and the clusters will be determined solely by the ClusterMarker function. Ignored in case of UnboundJSON.
  • RecordNode - "xpath" to the record node within the cluster node (or the collection node if the cluster node is empty). The RecordNode parameter cannot be absent, although in case of JSON, it can be empty. Ignored in case of UnboundJSON.
  • IncludeExplicitText - XML only, ignored in case of JSON or UnboundJSON. true to include explicit text inside record nodes in XML data; false (default) to ignore explicit text. Explicit text is the text contained directly inside an XML element that represents the record node; this is not typically expected. Data Conveyer assigns a special key of "__explicitText__" to explicit text in the record node.
  • IncludeAttributes - XML only, ignored in case of JSON or UnboundJSON. true to include attributes (the key of the corresponding item will be prefixed by @); truePlain to include attributes (without prefix in item key); false (default) to ignore attributes. It is generally recommended that the keys (names) of items that originate from XML attributes be prepended with @; therefore a setting of true is preferred over truePlain. In the latter case, name conflicts are possible between items originating from attributes and inner nodes.
  • AddClusterDataToTraceBin - XML only, ignored in case of JSON or UnboundJSON. Also ignored if ClusterNode value is absent. true to include attributes of the cluster node(s) as key value pairs in the trace bins attached to the records of the cluster. false to exclude cluster data from intake processing. The keys of the elements placed in the trace bin reflect nodes traversed to reach the cluster node (including the attribute node itself) separated by periods. For example, if ClusterNode value is Region/State and the actual nodes contain <Region name="East"><State abbrev="NJ">..., then there will be 2 elements placed in the trace bin with keys of Region.name and Region.State.abbrev and the corresponding values of East and NJ.
  • DetectClusters - UnboundJSON only, ignored in case of XML or JSON. If present, Data Conveyer will recognize clusters on intake by tracking nesting levels of JSON objects (intake records). In case the level is not the same as on the previous record, a new cluster is detected. If absent, array nesting of JSON objects is ignored and records remain unclustered (although clusters can still be assigned via the ClusterMarker function).
Note

"xpath" in any of the node parameters is a simplified version of the xpath syntax. It is always relative to the containing node (no need for ./).

In case of JSON, it contains names of nodes (nested in the node hierarchy) separated by /. Empty nodes are allowed, in which case the intake is expected to contain an array at a given hierarchy level.

In case of XML, each node can contain attributes. For example, "xpath" of "Department[@name=\"QA\"]/Employees" can be used to identify employees of the QA department. Empty nodes are not allowed and the // syntax is not supported.

Example 1:"CollectionNode|Members,RecordNode|Member"(XML or JSON)

Example 2:"CollectionNode|Root/Members,ClusterNode|Group/FamilyRecordNode|Member"(XML or JSON)

Example 3:"RecordNode|row"(XML fragment or JSON object with an array)

Example 4:"CollectionNode|,RecordNode|"(JSON only - an array of objects=records)

Example 5:"ClusterNode|,RecordNode|"(JSON only - an array of arrays=clusters of objects=records)

Example 6:"RecordNode|"(JSON only - multiple objects containing records)

Example 7:"CollectionNode|Root/Members[@region=\"North\"],ClusterNode|Group[@id=2][@zone=\"\"]/Family,RecordNode|Data/Member[@class],IncludeExplicitText|true"(XML only)

Example 8:"DetectClusters"(UnboundJSON only)

This configuration setting is only applicable when InputDataKind value is XML or JSON or UnboundJSON.

Public propertyXmlJsonOutputSettings
A string containing comma-separated parameters for writing XML or JSON or UnboundJSON data on output. Each such parameter is a key-value pair with a pipe symbol (|) separating the key and the value. The parameters define the shape of data to write. There are differences (explained below) in their interpretation in case of XML vs JSON vs UnboundJSON. The following keys can be used:
  • CollectionNode - "xpath" defining the collection of clusters (or records if the ClusterNode parameter is absent). If this parameter is absent for XML, then output will contain XML fragment where each root constitutes a record or a cluster. In case of JSON, absent or empty value generally means an array instead of an object. Ignored in case of UnboundJSON.
  • ClusterNode - "xpath" to the cluster node within the collection node. Records with the same cluster number will be placed inside the same cluster node. If this parameter is absent, then all records will be placed directly in the collection node regardless of their cluster number. Ignored in case of UnboundJSON.
  • RecordNode - "xpath" to the record node within the cluster node (or the collection node if the cluster node is empty). In case of XML this parameter is expected; if absent or empty, a default value of "__record__" will be assumed. In case of JSON this parameter can be empty (or even absent if CollectionNode and ClusterNode are also empty - a special case to output multiple objects containing records). Ignored in case of UnboundJSON.
  • AttributeFields - XML only, ignored in case of JSON or UnboundJSON. A semi-colon separated list of field names (item keys) to be projected as attributes of the record node (and not inner nodes). In addition, all fields with names starting with @ will be projected as attributes of the record node (the @ will not be removed from the attribute name).
  • IndentChars - A string to use when indenting, e.g. "\t" or " ". This parameter allows "pretty-printing" of the output. When absent, no indenting takes place. Note that in case of JSON or UnboundJSON, the only sensible value for this parameter is a string containing a single character (possibly repeated), e.g. " " or "\t"; if different characters are used (e.g. " \t", i.e. a space and a tab), then the output may not be as expected.
  • NewLineChars - XML only, ignored in case of JSON or UnboundJSON. Characters to use for line breaks (to "pretty-print" the XML output).
  • ProduceStandaloneObjects - Only applicable to UnboundJSON output: if present, causes Data Conveyer to produce successive, standalone objects (records) or, if ProduceClusters is also present, arrays (clusters) - technically not valid JSON, but commonly used; if absent, a JSON array is produced where each record (or cluster, if ProduceClusters is also present) becomes an array element. Ignored in case of XML or JSON.
  • ProduceClusters - Only applicable to UnboundJSON output: if present, causes all records of a cluster to be enclosed in arrays; if absent, clusters are ignored on output. Ignored in case of XML or JSON.
  • SkipColumnPresorting - Only applicable to UnboundJSON output: if present, the fields are processed in order they appear in the records (better performance, but JSON hierarchy may be off); if absent, fields are grouped by segments (dot notation) to force proper JSON nesting hierarchy. Ignored in case of XML or JSON.
Note

"xpath" in any of the node parameters is a simplified version of the xpath syntax. It is always relative to the containing node (no need for ./).

In case of JSON, it contains names of the nodes (nested in the node hierarchy) separated by /. Empty nodes are allowed, in which case the output at a given hierarchy level consists of an array instead of an object containing an array. Special case: when CollectionNode, ClusterNode and RecordNode are all absent, then output will consist of multiple objects containing records.

In case of XML, empty nodes are not allowed. In addition, node names must be valid XML names (note that certain characters, such as spaces, are not allowed in XML even though they are allowed in JSON). Attributes can be specified, but must include both name and value. In such cases, an attribute will be added to the node on output. Care needs to be taken in case of attributes of the record node to avoid name duplicates with the record's fields (when the AttributeFields parameter is used).

Example 1:"CollectionNode|Members,RecordNode|Member,IndentChars| "(XML or JSON)

Example 2:"RecordNode|Member,IndentChars|\t"(XML or JSON)

Example 3:"CollectionNode|Root/Members,ClusterNode|Group/Subgroup/Family,RecordNode|Data/Member"(XML or JSON)

Example 4:"RecordNode|,IndentChars| "(JSON only - an array of object containing records)

Example 5:"ClusterNode|,RecordNode|"(JSON only - an array of arrays=clusters of objects=records)

Example 6:""(JSON only - all node parameters absent is a special case that results in multiple objects containing records)

Example 7:"CollectionNode|Root/Members[@region=North],ClusterNode|Group[@id=2][@zone=\"\"]/Family,RecordNode|Data/Member[@class=\"main\"],AttributeFields|ID;zone"(XML only)

Example 8:"ProduceStandaloneObjects,SkipColumnPresorting,IndentChars| "(UnboundJSON only)

This configuration setting is only applicable when OutputDataKind value is XML, JSON or UnboundJSON.

See Also