OrchestratorConfig Properties
The OrchestratorConfig type exposes the following members.
Name | Description
---|---
ActionOnDuplicateKey |
Action to take in case the key is repeated on a single record: IgnoreItem, ExcludeRecord or AssignDefaultKey. Default action is IgnoreItem.
| |||
AllowOnTheFlyInputFields |
False (default) means no new fields beyond those defined in InputFields (or 1st row if applicable) are allowed. Any additional fields encountered during intake processing will be excluded.
True means that new fields can be added on the fly in subsequent data records; their names and types are determined according to standard rules.
In case of Keyword data, default setting (false) requires all fields (keys) to be listed in the InputFields setting (keys not listed there will be excluded, so in absence of InputFields no fields will be present).
True setting allows inclusion of all fields for Keyword data.
This setting is ignored in case of the following data kinds on intake:
| |||
AllowTransformToAlterFields |
If true, fields (items) can be added to and removed from records during transformations.
If false (default), no fields can be added/removed during transformations (although field values can still be updated and records/clusters can be cloned).
| |||
AppendFootCluster |
True causes Data Conveyer to add an extra empty cluster (a so-called foot cluster) after the last cluster formed from intake records.
False (default) means no foot cluster will be added at the end of intake.
The foot cluster always has a StartRecNo of -1 (FootClusterRecNo) and StartSourceNo of 1. If trailer contents are present, they will follow the foot cluster on output.
| |||
ArbitraryInputDefs |
Array of elements that define fields to be extracted from input lines.
Each element is a string consisting of 2 parts separated by a space: a field name followed by a regular expression formula that extracts the field's value from the input line.
Data Conveyer will extract the fields in the order they are specified in this setting.
This setting is mandatory for InputDataKind of Arbitrary; if specified for other data kinds, it is ignored.
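For illustration, a hypothetical configuration following the format described above (the field names and regex formulas below are invented, not part of Data Conveyer; `config` denotes an OrchestratorConfig instance):

```csharp
// Hypothetical: extract two fields from input lines like "ID=123 NAME=John".
config.ArbitraryInputDefs = new string[]
{
   "ID (?<=ID=)\\d+",      // field "ID": digits following "ID="
   "NAME (?<=NAME=)\\w+"   // field "NAME": word characters following "NAME="
};
```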
| |||
ArbitraryOutputDefs |
Array of strings that define data to be placed on output lines.
Each string contains a fragment of the output line and may contain a single token in the form of {field}; such a token will be substituted by the actual value of the field.
In order to include brace characters in the output, they need to be escaped (preceded by a backslash), like so: \{ and \}.
Note that backslash characters inside C# string literals may need to be escaped themselves.
For example: "\\{ "Format":"JSON" \\}" (as well as @"\{ ""Format"":""JSON"" \}") will output { "Format": "JSON" }.
Data fragments defined by array elements are spliced together to form output lines.
This setting is mandatory for OutputDataKind of Arbitrary; if specified for other data kinds, it is ignored.
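A hypothetical sketch that splices two fragments into a JSON-like output line (the field names NAME and AGE are invented; `config` denotes an OrchestratorConfig instance):

```csharp
config.ArbitraryOutputDefs = new string[]
{
   "\\{ \"name\": \"{NAME}\",",   // \\{ in the C# literal yields \{, which outputs {
   " \"age\": {AGE} \\}"          // fragments are spliced to form a single output line
};
```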
| |||
AsyncIntake |
True will cause Data Conveyer to perform intake operations asynchronously (i.e. it will call the AsyncTextIntakeSupplier or AsyncIntakeSupplier function if provided, not the TextIntakeSupplier or IntakeSupplier function).
False (default) will cause Data Conveyer to perform intake operations synchronously, in which case the TextIntakeSupplier or IntakeSupplier function is called (if provided).
Note that Data Conveyer performs such synchronous intake on a dedicated thread to prevent a possible deadlock condition.
| |||
AsyncIntakeSupplier |
An asynchronous function that supplies a task with (a promise of) a tuple containing an ExternalLine object (an input line to be fed into Data Conveyer) and the source number (1-based) (Func&lt;IGlobalCache, Task&lt;Tuple&lt;ExternalLine, int&gt;&gt;&gt;).
Data Conveyer calls this function when AsyncIntake is true (providing a single parameter - IGlobalCache) in succession until null is received.
When end of data is reached (i.e. intake sources are depleted), the function must return a task with null result (end of data mark), which will initiate processing shutdown with a status of IntakeDepleted.
Note the difference between null tuples (end of data marks) and tuples containing null ExternalLine objects (which are ignored by Data Conveyer).
Any returned tuples that contain null ExternalLine objects are ignored by Data Conveyer; to send an empty line, a tuple containing an empty object (e.g. an empty string) needs to be returned.
If not defined, Data Conveyer assumes default AsyncIntakeSupplier function that returns no data, but a task with an end of data result, i.e.:
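A sketch of that default, assuming the Tuple&lt;ExternalLine, int&gt; shape described above (`config` denotes an OrchestratorConfig instance):

```csharp
// Supply no data; immediately promise the end-of-data mark (a null tuple).
config.AsyncIntakeSupplier = gc => Task.FromResult<Tuple<ExternalLine, int>>(null);
```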
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
AsyncOutput |
True will cause Data Conveyer to perform output operations asynchronously (i.e. it will call AsyncTextOutputConsumer or AsyncOutputConsumer function if provided, not the TextOutputConsumer or OutputConsumer function).
False (default) will cause Data Conveyer to perform output operations synchronously, in which case the TextOutputConsumer or OutputConsumer function is called (if provided).
| |||
AsyncOutputConsumer |
An asynchronous action (Task returning function) that consumes a tuple containing an ExternalLine object (a single output line received from Data Conveyer) and the target number (Func&lt;Tuple&lt;ExternalLine, int&gt;, IGlobalCache, Task&gt;).
A reference to the global cache (IGlobalCache) is passed as the second parameter.
The function is intended for use in case of a long-running operation to consume the output line, e.g. await LongRunningOperation(..).
Data Conveyer calls this action (when AsyncOutput is true) in succession passing (tuples with) the output lines one at a time.
The last tuple sent by Data Conveyer is always null (end of data mark).
If not defined, Data Conveyer assumes default AsyncOutputConsumer action that ignores data passed to it, i.e.:
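A sketch of that default, assuming the delegate shape described above (`config` denotes an OrchestratorConfig instance):

```csharp
// Accept and ignore each tuple; return a completed task.
config.AsyncOutputConsumer = (tpl, gc) => Task.CompletedTask;
```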
Any exception thrown by this action will cause the process shutdown with a CompletionStatus of Failed.
| |||
AsyncTextIntakeSupplier |
An asynchronous function that supplies a task promising the contents of a single text line to be fed into Data Conveyer (Func&lt;Task&lt;string&gt;&gt;).
When end of data is reached, the function must return null (end of data mark).
The AsyncTextIntakeSupplier function is a simplified version of the AsyncIntakeSupplier function where the following restrictions apply:
| |||
AsyncTextOutputConsumer |
An asynchronous action (Task returning function) that consumes a single line of text received from Data Conveyer (Func&lt;string, Task&gt;).
The last line sent by Data Conveyer is always null (end of data mark).
The AsyncTextOutputConsumer action is a simplified version of the AsyncOutputConsumer action where the following restrictions apply:
| |||
BufferSize |
Capacity of a transformer buffer defined as a maximum number of clusters to be held unprocessed by Data Conveyer.
This setting can be used to control memory consumption of Data Conveyer.
Default value is -1, which means no limit.
| |||
CloseLoggerOnDispose | True (default) causes the logger to close (stop logging) when the orchestrator object is disposed. False keeps the logger open beyond the lifespan of the orchestrator object, which may be helpful in certain troubleshooting scenarios (e.g. to avoid System.ObjectDisposedException: Cannot write to a closed TextWriter when using LogFile logger). It is recommended that the CloseLoggerOnDispose remains at its default value of true, except for troubleshooting scenarios. The CloseLoggerOnDispose setting also affects the behavior of the SaveConfig(String) method in case an error occurs: if true, then the error gets logged and the logger is closed (stops logging); if false, then the error gets logged and the logger remains open. | |||
ClusterboundTransformer |
A function that takes a single cluster and returns a single cluster; specific to Clusterbound transformer type.
In case the function returns null (Nothing in Visual Basic), the cluster will be filtered out.
This makes the ClusterFilterPredicate a special case of the ClusterboundTransformer, where unfiltered clusters are passed through.
If not supplied, a default pass-through function is used that passes input cluster to output, i.e.:
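A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Pass the input cluster through unchanged.
config.ClusterboundTransformer = clstr => clstr;
```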
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
| |||
ClusterFilterPredicate |
A predicate (boolean function) that takes a single cluster and returns true to accept the cluster or false to reject it; specific to ClusterFilter transformer type.
If not supplied, a default pass-through predicate is used that always returns true, i.e.:
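A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Accept every cluster.
config.ClusterFilterPredicate = clstr => true;
```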
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
| |||
ClusterMarker |
A predicate (boolean function) to identify records that cause cluster splits (either starting a new cluster or ending a cluster depending on the MarkerStartsCluster setting).
It accepts 3 parameters: the current record, the previous record and the counter of records accumulated so far in the current cluster. Note that the previous record is null for the first record.
If the predicate returns true, it causes cluster split (before or after the current record depending on MarkerStartsCluster value);
returning false causes continuation of record accumulation into the current cluster.
In case this function marks the first record of a cluster (i.e. MarkerStartsCluster is true), then the current record starts a new set of records accumulated for a cluster;
otherwise (MarkerStartsCluster is false, i.e. the function marks the last record of a cluster), the current record ends the set of records accumulated for a cluster, and a new empty
set of accumulated records is created.
If not specified, the default function will split clusters on every record, with one notable exception as follows:
In case of XML/JSON intake (XML, JSON or UnboundJSON) where the XmlJsonIntakeSettings setting directs Data Conveyer to automatically detect clusters (ClusterNode or DetectClusters respectively), Data Conveyer will respect the clusters detected from the XML/JSON intake.
Here is the default ClusterMarker function:
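A sketch of that default, which starts a new cluster on every record; the commented variant with the "ID" field is hypothetical (`config` denotes an OrchestratorConfig instance):

```csharp
// Default: every record causes a cluster split.
config.ClusterMarker = (rec, prevRec, recCnt) => true;

// A hypothetical custom marker: split whenever an (invented) "ID" field changes.
// config.ClusterMarker = (rec, prevRec, recCnt) =>
//    prevRec == null || !Equals(rec["ID"], prevRec["ID"]);
```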
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
ClusterRouter |
A function to determine the output target for every record in a given cluster. It receives an output cluster and returns TargetNo to be assigned to every record of the cluster (Func<ICluster, int>).
This function is specific to PerCluster router type; it is ignored for other router types. If not supplied, a default function that returns 1 for every cluster is assumed, i.e.:
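A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Route every cluster to target number 1.
config.ClusterRouter = clstr => 1;
```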
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
ClusterSyncInterval |
This advanced setting specifies the number of milliseconds Data Conveyer waits when synchronizing processing of the head and foot clusters.
Note that regardless of the ConcurrencyLevel setting, Data Conveyer guarantees that the head cluster (if present) will be processed before
all other clusters and that the foot cluster (if present) will be processed after all other clusters. The order of processing the remaining ("regular")
clusters is not guaranteed. The default setting of 40 ms is suitable for most scenarios.
| |||
ConcurrencyLevel |
Degree of parallelism during transformation phase; defined as a maximum number of engines performing transformation.
If not supplied, a default value of 1 is used.
| |||
ConfigName |
Name of this configuration in case it was created from a .cfg (and optional .dll) file(s) via RestoreConfig method; otherwise null.
| |||
DataConveyerInfo |
Information on current version of Data Conveyer.
| |||
DefaultInputFieldWidth |
Default width of fields in flat input data; applicable only if InputDataKind is Flat, ignored otherwise.
If not specified, a default width of 10 is assumed.
| |||
DefaultOutputFieldWidth |
Default width of fields in the flat output data; applicable only if OutputDataKind is Flat, ignored otherwise.
If not specified, a default width of 10 is assumed.
| |||
DefaultX12FieldDelimiter |
Default value of the field delimiter - applicable only to X12 data.
On X12 intake, field delimiter is always determined from ISA segment, so this setting only applies in case ISA segment is absent (incomplete X12 envelope).
On X12 output, in absence of this setting, field delimiter is determined from intake if also X12 (specifically, from the first ISA segment encountered), or a default value of '*' is assumed.
| |||
DefaultX12SegmentDelimiter |
Default value of the segment delimiter - applicable only to X12 data.
On X12 intake, segment delimiter is always determined from ISA segment, so this setting only applies in case ISA segment is absent (incomplete X12 envelope).
On X12 output, in absence of this setting, segment delimiter is determined from intake if also X12 (specifically, from the first ISA segment encountered), or a default value of '~' is assumed.
May contain multiple characters to make output easier to read, such as "~\r\n".
| |||
DeferOutput |
Defines when Data Conveyer is allowed to start the Output phase.
This setting should be left at its default value of Auto.
The value of Indefinitely is restricted for use in specialized tests only!
| |||
DeferTransformation |
Defines when Data Conveyer is allowed to start the Transformation phase. One of:
| |||
EagerInitialization |
If true, all initializations are executed regardless of whether prior initializations failed. For example, OutputInitializer gets executed even after a failure of IntakeInitializer.
This may result in an unwanted reset of output (such as erasure of prior data) even if the process could not start due to a problem initializing intake.
If false (default), any initialization failure (e.g. in IntakeInitializer) will prevent execution of subsequent initializations (such as OutputInitializer).
In this case, troubleshooting of output initialization is only possible after successful intake initialization.
| |||
ErrorOccurredHandler |
Handler of the ErrorOccurred event. Data Conveyer calls this function (regardless of the ReportProgress setting)
when an exception thrown during processing (for example in the caller supplied code) is unhandled.
This handler is intended for troubleshooting purposes. The event occurs immediately before the process completes with the
status of Failed and provides the last chance to identify the reason for (but not recover from) the failure.
Also note that any exceptions thrown by event handlers are discarded by Data Conveyer.
| |||
ExcludeExtraneousFields | In case of Delimited or Flat output data, this setting, when true, causes Data Conveyer to remove trailing, insignificant fields, i.e. fields with empty values (in addition, trailing spaces on the last field are removed); if false (default), then all fields are always included on output (in their entirety). For Keyword output data, the setting is only applicable when OutputFields setting is specified. If true, only those fields that are both: specified in OutputFields and also present in the actual records will be included on output. If false (default), all fields will always be included on output (empty values assumed for fields absent from the actual records). | |||
ExcludeItemsMissingPrefix |
True will cause exclusion of items (fields) with keys not matching the InputKeyPrefix; false (default) will include such fields (with keys in their entirety).
Use caution when assigning true to this setting, as it may result in records with no contents (when none of the fields match prefix specified in InputKeyPrefix setting).
| |||
ExplicitTypeDefinitions | This setting defines data types used in internal representations of record fields. It contains a set of keys (field names) along with their corresponding data types and optional formats. The setting is in the form of a comma delimited list of items, each item being a pipe delimited triplet: fldName|type|format, where type is one of: S=String, M=Decimal, D=DateTime, I=Integer, B=Boolean; and format is the format string used when formatting output. Format can be omitted along with the preceding pipe, in which case no formatting will take place. Format has no relevance on intake. In addition, with one exception, format is ignored in JSON output, because JSON supports writing elements in their native data types. The exception is for fields of DateTime type, which are not supported in JSON. DateTime fields are converted to strings (respecting format) before being submitted to JSON output. Example: "AMOUNT|M,BIRTH_DATE|D|M/d/yyyy,AGE|I". Sequence of the elements in this setting is irrelevant. Types of the fields not specified in ExplicitTypeDefinitions are determined by the TypeDefiner function, which by default assumes string type and no format for all fields.
| |||
GlobalCacheElements | A list of definitions of the elements to be held in the global cache (IGlobalCache). Global cache is a central repository of arbitrary key value pairs maintained throughout the Data Conveyer process. Each array element defines a single global cache element. The definition consists of the key optionally followed by a pipe symbol (|) and the initial value to be placed in global cache. Element keys are of string type (only letters and numbers are recommended). The type of each element depends on the element value: if specified after the pipe symbol, it can be one of: int, DateTime, decimal or string. In case no pipe symbol is present in element definition, the element will be of type object with a null value. Note that element values (and types as well) can be changed by the ReplaceValue&lt;TIn, TOut&gt;(String, Func&lt;TIn, TOut&gt;) method. Rules for determining the element type depend on the value placed after the pipe symbol (|):
Note that the ExplicitTypeDefinitions or TypeDefiner settings have no meaning in determining the type of the global cache elements. Also note that this setting only defines the elements of the global cache, and not the signals, which are simply referred to in the RaiseSignal(String), AwaitSignal(String) and AwaitSignalAsync(String) methods. | |||
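As an illustration of the GlobalCacheElements format (the keys and values below are invented, and the per-element type notes in the comments are assumptions based on the description above; `config` denotes an OrchestratorConfig instance):

```csharp
config.GlobalCacheElements = new string[]
{
   "recCount|0",    // initial value 0 (presumably an int)
   "total|0.0",     // initial value 0.0 (presumably a decimal)
   "label|none",    // initial value "none" (presumably a string)
   "token"          // no pipe symbol: element of type object with a null value
};
```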
HeadersInFirstInputRow |
True means 1st input line contains field names and data starts from the 2nd line; false (default) means data starts from the 1st line (default field names are assigned in this case).
This setting applies only to data kinds that support header rows, such as Delimited or Flat; otherwise, it is ignored, in which case data always starts from the 1st line.
| |||
HeadersInFirstOutputRow |
True means 1st line sent to output will contain field names and data will start on the 2nd line; false (default) means no field names are sent to output and data starts on the 1st line.
This setting applies only to data kinds that support header rows, such as Delimited or Flat; otherwise, it is ignored, in which case data always starts on the 1st line.
| |||
InputDataKind |
Type (format) of input data: Raw, Keyword, Delimited, Flat, Arbitrary, XML, JSON, UnboundJSON or X12. Default is Raw.
The HL7 and Ultimate values are intended for future use.
| |||
InputFields | Comma delimited list of fields as they appear on intake lines. Each field is defined by a name and width, separated by a pipe (|) symbol. Field widths are only applicable to Flat (fixed width) data; they are ignored for other kinds of input data. Where omitted or invalid, a default width (DefaultInputFieldWidth) is assumed. Field names specified in this setting take precedence over those in the first row headers (Delimited and Flat data), so in case of HeadersInFirstInputRow=true, the 1st row data may get discarded. If a field name is omitted, a default name will be used (either from the header row or from a formula, which yields Fldnnn, where nnn is the field sequence number; exception is X12 data, where fields are named Segment, Elem001, Elem002, ...). This setting, when accompanied by AllowOnTheFlyInputFields of false, can be used to only accept those fields specified and exclude all other fields from intake (e.g. in case of Keyword data). Examples:
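For instance, following the format described above (field names and widths are illustrative only; `config` denotes an OrchestratorConfig instance):

```csharp
// Three fixed-width fields for Flat data; widths follow the pipe symbols.
config.InputFields = "NAME|15,ADDRESS|25,ZIP|10";
```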
| |||
InputFieldSeparator |
Character that separates fields on intake lines. Only applicable to Delimited and Keyword data, ignored in case of other data kinds.
If not specified, a comma is assumed.
| |||
InputFileName | A synonym for the InputFileNames setting (to make it more intuitive in case of a single input file). InputFileName and InputFileNames settings should not be used both at the same time. | |||
InputFileNames | Name(s) of (i.e. path(s) to) the input file(s). This setting is the same as (synonym for) the InputFileName setting. If multiple files are specified, the names are separated by pipe symbols (|). Each name can be surrounded by double quotes. The first file is assigned SourceNo=1, the second SourceNo=2, etc. The files will be read one after another (in SourceNo order) either synchronously or asynchronously, depending on AsyncIntake setting. Ignored if TextIntakeSupplier or IntakeReaders setting (or one of their equivalent settings) is also submitted. InputFileName and InputFileNames settings should not be used both at the same time. | |||
InputHeadersRepeated |
Relevant only if multiple intake sources are present and HeadersInFirstInputRow is true.
True (default) means that all intake sources contain the header row; in this case Data Conveyer will read the headers from the first source that supplied intake data and will ignore the header rows from the remaining sources.
False means that only the source that supplies the intake data first contains the header row (typically the source with SourceNo=1, but it can actually be any source); the remaining sources are assumed to only contain data rows (regardless of the HeadersInFirstInputRow setting).
In either case, the only header row actually considered is the one from the source that supplies intake first. Note that all intake sources are subjected to the same set of processing rules, which implies that the same header row is applicable to all sources (hence the header row should be identical for all sources).
| |||
InputKeyPrefix |
Prefix to be trimmed (removed) from every key on input, if any (e.g. @p).
Default value is null (no prefix to be removed).
| |||
IntakeBufferFactor |
Ratio between sizes of the intake buffer and the transformer's input buffer (the latter is defined by the BufferSize setting).
This advanced setting allows fine-tuning of memory consumption by Data Conveyer.
For example, if an average cluster on intake is expected to be created from 4 intake records, then it may be sensible to set intake buffer size to 400 records
and transformer's input buffer size to 100 clusters. In this case, BufferSize = 100 and IntakeBufferFactor = 4.0.
Default value for IntakeBufferFactor is 1.5.
This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
| |||
IntakeDisposer |
An action intended to dispose any intake sources and related resources that were opened by the IntakeInitializer function.
Data Conveyer calls this action a single time after completing orchestrator processing (when the orchestrator is disposed).
However, if neither the TextIntakeSupplier nor the AsyncTextIntakeSupplier function is defined, this action is not called.
The IntakeDisposer action accepts a single parameter (IGlobalCache) and returns void (Action<IGlobalCache>).
If not defined, Data Conveyer assumes empty action, i.e.:
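A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Do nothing on disposal.
config.IntakeDisposer = gc => { };
```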
Any exception thrown by this action will be logged, but otherwise ignored.
| |||
IntakeInitializer |
A function intended to initialize any intake sources (such as open files or database connections) and related resources that may be needed by the TextIntakeSupplier or AsyncTextIntakeSupplier function.
Data Conveyer calls this function a single time before starting the actual orchestrator processing, such as invocations of TextIntakeSupplier function, if one is defined.
However, if neither the TextIntakeSupplier nor the AsyncTextIntakeSupplier function is defined, this function is not called.
The IntakeInitializer function accepts a single parameter (IGlobalCache) and returns a string (Func<IGlobalCache, string>).
Null returned value means successful initialization; otherwise, an error message indicating reason for the failure is expected to be returned (Data Conveyer will log this message).
In case of failure, the processing will end in the InitializationError status.
If not defined, default IntakeInitializer simply returns null, i.e.:
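A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Report successful initialization (null means no error).
config.IntakeInitializer = gc => null;
```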
| |||
IntakeReader |
A function returning a text reader object to supply data into the Data Conveyer intake process.
This setting is equivalent to IntakeReaders setting containing a single reader.
IntakeReader and IntakeReaders settings should not be used both at the same time.
| |||
IntakeReaders |
A function returning a collection of text reader objects to supply data into the Data Conveyer intake process. Each reader corresponds to a single intake source;
the first reader is assigned SourceNo=1, the second SourceNo=2, etc.
Data will be read from one reader after another, in SourceNo order, either synchronously or asynchronously, depending on AsyncIntake setting.
Ignored if the TextIntakeSupplier function (or applicable equivalent function) is also submitted.
| |||
IntakeRecordLimit | ||||
IntakeSupplier |
A function that supplies a tuple containing an ExternalLine object (an input line to be fed into Data Conveyer) and the source number (1-based) (Func&lt;IGlobalCache, Tuple&lt;ExternalLine, int&gt;&gt;).
Data Conveyer calls this function when AsyncIntake is false (providing a single parameter - IGlobalCache) in succession until null is received.
When end of data is reached (i.e. all intake sources are depleted), the function must return null (end of data mark), which will initiate processing shutdown with a status of IntakeDepleted.
Note the difference between null tuples (end of data marks) and tuples containing null ExternalLine objects (ignored by Data Conveyer).
Any returned tuples that contain null ExternalLine objects are ignored by Data Conveyer; to send an empty line, a tuple containing an empty object (e.g. an empty string) needs to be returned.
If not defined, Data Conveyer assumes default IntakeSupplier function that returns no data, but only the end of data mark, i.e.:
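A sketch of that default, assuming the Tuple&lt;ExternalLine, int&gt; shape described above (`config` denotes an OrchestratorConfig instance):

```csharp
// Supply no data; immediately return the end-of-data mark (a null tuple).
config.IntakeSupplier = gc => null;
```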
This function is called on a dedicated thread to prevent possible deadlocks in situations, such as UI updates made during progress changes.
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
LeaderContents |
Data, if any, to be sent to output before the first output line; may be multi-line.
In case the header row is also present, then it will follow the leader contents.
| |||
MarkerStartsCluster |
True (default) means that ClusterMarker predicate matches the first record of a cluster (i.e. starts a cluster); false means that predicate matches the last record of a cluster (i.e. ends a cluster).
As a general rule, true value is recommended (marker starts a cluster), especially in cases where ClusterMarker predicate relies on the previous record contents or the number of records accumulated so far;
false value (marker ends a cluster) should only be used in cases where ClusterMarker predicate relies solely on the current record contents.
| |||
OutputBufferFactor |
Ratio between sizes of the output buffer and the transformer's input buffer (transformer's input buffer size is defined by the BufferSize setting).
This advanced setting allows fine-tuning of memory consumption by Data Conveyer.
For example, if an average cluster on output is expected to produce 5 output records (and also on average every other cluster to be removed by the transformer),
it may be sensible to set the transformer's input buffer size to 100 clusters (transformer's output buffer size to 50 clusters)
and the output buffer size to 250 clusters. In this case, BufferSize = 100 and OutputBufferFactor = 2.5.
Default value for OutputBufferFactor is 1.5.
This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
| |||
OutputConsumer |
An action (void function) that consumes a tuple containing an ExternalLine object (a single output line received from Data Conveyer) and the target number (1-based) (Action&lt;Tuple&lt;ExternalLine, int&gt;, IGlobalCache&gt;).
A reference to the global cache (IGlobalCache) is passed as the second parameter.
Data Conveyer calls this action (when AsyncOutput is false) in succession passing (tuples with) the output lines one at a time.
The last tuple sent by Data Conveyer is always null (end of data mark).
If not defined, Data Conveyer assumes default OutputConsumer action that ignores data passed to it, i.e.:
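A sketch of that default, assuming the delegate shape described above (`config` denotes an OrchestratorConfig instance):

```csharp
// Accept and ignore each tuple.
config.OutputConsumer = (tpl, gc) => { };
```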
Any exception thrown by this action will cause the process shutdown with a CompletionStatus of Failed.
| |||
OutputDataKind |
Type (format) of output data: Raw, Keyword, Delimited, Flat, Arbitrary, XML, JSON, UnboundJSON or X12. Default is Raw.
HL7 and Ultimate types are designated for future use.
| |||
OutputDisposer |
An action intended to dispose any output targets and related resources that were opened by the OutputInitializer function.
Data Conveyer calls this action a single time after completing orchestrator processing (when the orchestrator is disposed).
However, if neither the TextOutputConsumer nor the AsyncTextOutputConsumer action is defined, this action is not called.
The OutputDisposer action accepts a single parameter (IGlobalCache) and returns void (Action<IGlobalCache>).
If not supplied, Data Conveyer assumes empty action, i.e.:
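A sketch of that default (`config` denotes an OrchestratorConfig instance):

```csharp
// Do nothing on disposal.
config.OutputDisposer = gc => { };
```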
Any exception thrown by this function will be logged, but otherwise ignored.
| |||
OutputFields | Comma delimited list of fields to appear on output lines. Each field is defined by a name and width, separated by a pipe (|) symbol. Once this setting is specified, only those fields specified can be included in output. This setting is optional. However, it can only be omitted in its entirety, in which case all fields produced by transformation will be included in output. If the setting is present, then each field name must be specified (no default names can be assumed [unlike InputFields]). If a name of a non-existing field is specified, then empty contents is sent to output for such field. Field widths are only applicable to Flat (fixed width) data output; they are ignored for other kinds of output data. Where omitted or invalid, a default width (DefaultOutputFieldWidth) is assumed. | |||
OutputFieldSeparator |
Character that separates fields on output lines. Only applicable to Delimited and Keyword data, ignored in case of other data kinds.
If not specified, a comma is assumed.
| |||
OutputFileName | A synonym for the OutputFileNames setting (to make it more intuitive in case of a single output file). OutputFileName and OutputFileNames settings should not be used both at the same time. | |||
OutputFileNames | Name(s) of (path(s) to) the output file(s). This setting is the same as (synonym for) the OutputFileName setting. If multiple files are specified, the names are separated by pipe symbols (|). Each name can be surrounded by double quotes. The first file is assigned TargetNo=1, the second TargetNo=2, etc. The number of files specified here must equal the highest TargetNo returned by the ClusterRouter function. Ignored if TextOutputConsumer or OutputWriters setting (or one of their equivalent settings) is also submitted. OutputFileName and OutputFileNames settings should not be used both at the same time. | |||
OutputInitializer |
A function intended to initialize any output targets (such as open files or database connections) and related resources that may be needed by the TextOutputConsumer or AsyncTextOutputConsumer action.
Data Conveyer calls this function a single time before starting the actual orchestrator processing, such as invocations of TextOutputConsumer function if one is defined.
However, if neither the TextOutputConsumer nor the AsyncTextOutputConsumer action is defined, this function is not called.
The OutputInitializer function accepts a single parameter (IGlobalCache) and returns a string (Func<string>).
Null returned value means successful initialization; otherwise, an error message indicating reason for the failure is expected to be returned (Data Conveyer will log this message).
This function may remain not called in case of failure of prior initializer (IntakeInitializer), in which case the processing will result in the InitializationError status.
If not defined, default OutputInitializer function simply returns null, i.e.:
| |||
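The default behavior described above can be sketched as a lambda assignment; `config` is an assumed OrchestratorConfig instance, used here only for illustration:

```csharp
// Minimal sketch (config is an assumed OrchestratorConfig instance).
// The default OutputInitializer ignores the global cache and reports
// success by returning null:
config.OutputInitializer = gc => null;
```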
OutputKeyPrefix |
Prefix to be prepended to every key on Keyword output (e.g. @p).
| |||
OutputWriter |
A function returning a text writer object to consume data produced by the Data Conveyer output.
This setting is equivalent to OutputWriters setting containing a single writer.
OutputWriter and OutputWriters settings should not be used both at the same time.
| |||
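A minimal sketch of supplying a single writer; `config` is an assumed OrchestratorConfig instance and the file path is purely hypothetical:

```csharp
// Hypothetical sketch: direct all output to a single text writer.
// The path is illustrative, not a real requirement:
config.OutputWriter = () => new StreamWriter(@"C:\data\output.csv");
```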
OutputWriters |
A function returning a collection of text writer objects to consume data produced by the Data Conveyer output. Each writer corresponds to a single output target;
the first writer is assigned TargetNo=1, the second TargetNo=2, etc.
The number of writers specified here must equal the highest TargetNo returned by the ClusterRouter function.
Ignored if the TextOutputConsumer function (or applicable equivalent function) is also submitted.
| |||
PhaseFinishedHandler |
Handler of the PhaseFinished event. Data Conveyer calls this function at the end of a processing phase.
Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
| |||
PhaseStartingHandler |
Handler of the PhaseStarting event. Data Conveyer calls this function at start of a processing phase.
Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
| |||
PrependHeadCluster |
True causes Data Conveyer to add an extra empty cluster (so called head cluster) before the first cluster formed from intake records.
False (default) means no head cluster will be added at the beginning of intake.
The head cluster always has a StartRecNo of 0 (HeadClusterRecNo) and StartSourceNo of 1. Header and/or leader contents, if present, will precede the head cluster on output.
| |||
ProgressChangedHandler |
Handler of the ProgressChanged event. Data Conveyer calls this function at specified intervals during processing.
Note that any exceptions thrown by event handlers are discarded by Data Conveyer.
| |||
ProgressInterval |
Frequency of raising the ProgressChanged event:
| |||
PropertyBinEntities |
Entities (such as records and/or clusters) that will have property bin objects attached to during Data Conveyer processing.
Default value is null (Nothing in Visual Basic), i.e. no property bins attached.
This feature should be used judiciously due to its impact on performance.
| |||
QuotationMode |
Specifies which values are to be surrounded with quotes on output. One of:
OnlyIfNeeded - Output values are not quoted, except for those that contain commas and/or quotes (default).
StringsAndDates - String and date values are quoted on output, while decimal or integer values are not (except if formatted to contain commas).
Always - All values are surrounded with quotes on output.
| |||
RecordboundTransformer |
A function that takes a single record and returns a single record; specific to Recordbound transformer type.
In case the function returns null (Nothing in Visual Basic), the record will be filtered out.
This makes RecordFilterPredicate a special case of RecordboundTransformer where unfiltered records are passed through.
If not supplied, a default pass-through function is used that passes input record to output, i.e.:
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
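The default pass-through behavior described above can be sketched as follows, assuming `config` is an OrchestratorConfig instance:

```csharp
// Sketch of the default Recordbound transformer: each input record
// is passed to output unchanged (config is an assumed instance):
config.RecordboundTransformer = rec => rec;
```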
RecordFilterPredicate |
A predicate (boolean function) that takes a single record and returns true to accept the record or false to reject it; specific to RecordFilter transformer type.
Note that in case all records for a cluster are rejected, then the cluster will be rejected.
If not supplied, a default pass-through predicate is used that always returns true, i.e.:
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
| |||
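The default pass-through predicate described above can be sketched as follows, assuming `config` is an OrchestratorConfig instance:

```csharp
// Sketch of the default predicate: every record is accepted:
config.RecordFilterPredicate = rec => true;
```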
RecordInitiator | Introduces records into the processing pipeline of Data Conveyer. This function is called once per record, immediately after the record is read from intake and parsed, before being forwarded to the clustering process. The function serves two roles: setting up the trace bin and triggering the start of the Transformation phase. Any data collected from the current record can be stored in the trace bin for access during processing of subsequent records (as well as during subsequent phases of processing). The start of the Transformation phase can be triggered by this function only when DeferTransformation is set to UntilRecordInitiation. The function accepts 2 parameters: the current record and a trace bin object (a dictionary). It returns a boolean value that triggers the start of the Transformation phase when DeferTransformation is set to UntilRecordInitiation. Note that records are processed sequentially on Intake; after the first true is returned (or if the DeferTransformation setting is other than UntilRecordInitiation), the values returned by this function are inconsequential. If RecordInitiator returns false for all records and DeferTransformation is UntilRecordInitiation, the transformation starts after all input records have been read. Deferral of Transformation may be useful in cases where transformation of initial clusters (e.g. the head cluster) requires data that is read from some records during Intake and saved in the global cache. Note though that such deferral affects performance as well as memory footprint, and in case of limited buffer sizes may lead to deadlocks. If not supplied, Data Conveyer assumes a default function that does not make any updates and returns true, i.e.: Any exception thrown by this function will cause the process shutdown with a completion status of Failed. | |||
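The default function described above (no trace bin updates, always returns true) can be sketched as follows, assuming `config` is an OrchestratorConfig instance:

```csharp
// Sketch of the default RecordInitiator: makes no updates to the
// trace bin and returns true (which, when DeferTransformation is
// UntilRecordInitiation, triggers the Transformation phase):
config.RecordInitiator = (rec, traceBin) => true;
```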
RecordRouter |
A function to determine the output target for a given record. It receives an output record and a cluster that contains the record and returns TargetNo to be assigned to the record (Func<IRecord, ICluster, int>).
This function is specific to PerRecord router type; it is ignored for other router types. If not supplied, a default function that returns 1 for every record is assumed, i.e.:
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
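The default router described above can be sketched as follows, assuming `config` is an OrchestratorConfig instance:

```csharp
// Sketch of the default PerRecord router: every record is sent to
// the output target with TargetNo=1:
config.RecordRouter = (rec, clstr) => 1;
```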
RepeatLeaders |
Relevant only if multiple output targets are present and LeaderContents is not null.
If true (default), then the leader rows will be sent to all output targets before sending data rows.
If false, then the leader rows will only be sent to the first output target (typically the one with TargetNo=1, but it can actually be any target).
| |||
RepeatOutputHeaders |
Relevant only if multiple output targets are present and HeadersInFirstOutputRow is true.
If true (default), then the header row will be sent to all output targets before sending data rows.
If false, then the header row will only be sent the first output target (typically the one with TargetNo=1, but it can actually be any target).
Note that all output targets are subjected to the same set of processing rules, which implies that the same header row applies to all targets regardless of whether it is repeated for all targets or not. | |||
RepeatTrailers |
Relevant only if multiple output targets are present and TrailerContents is not null.
If true (default), then the trailer rows will be sent to all output targets after sending data rows.
If false, then the trailer rows will only be sent to the last output target (typically the one with the highest TargetNo, but it can actually be any target).
| |||
ReportProgress |
True causes Data Conveyer to raise progress events, i.e. PhaseStarting, PhaseFinished and (if ProgressInterval is other than 0) ProgressChanged.
If false (default), then no progress events occur.
| |||
RetainQuotes |
True will keep double quotes surrounding values if any; false (default) will strip surrounding quotes.
Note that quotes are stripped before possible trimming; so unless a quote is the very first character of the value, quotes will be retained regardless of this setting.
| |||
RouterType |
Type of router that determines the output target. One of: SingleTarget (default), SourceToTarget, PerCluster or PerRecord.
| |||
TextIntakeSupplier |
A function that supplies the contents of a single text line to be fed into Data Conveyer (Func<string>).
When the end of data is reached, the function must return null (the end of data mark).
The TextIntakeSupplier function is a simplified version of the IntakeSupplier function where the following restrictions apply:
| |||
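A minimal sketch of such a supplier; `config` is an assumed OrchestratorConfig instance and the in-memory queue of sample lines is purely hypothetical:

```csharp
// Hypothetical sketch: supply lines from an in-memory queue and
// return null (end of data mark) once the queue is exhausted:
var lines = new Queue<string>(new[] { "first line", "second line" });
config.TextIntakeSupplier = () => lines.Count > 0 ? lines.Dequeue() : null;
```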
TextOutputConsumer |
An action (void function) that consumes a single line of text received from Data Conveyer (Action<string>).
The last line sent by Data Conveyer is always null (the end of data mark).
The TextOutputConsumer action is a simplified version of the OutputConsumer action where the following restrictions apply:
| |||
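A minimal sketch of such a consumer; `config` is an assumed OrchestratorConfig instance and collecting lines into a list is purely illustrative:

```csharp
// Hypothetical sketch: collect output lines in memory, ignoring the
// final null line that marks end of data:
var outputLines = new List<string>();
config.TextOutputConsumer = line => { if (line != null) outputLines.Add(line); };
```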
TimeLimit | ||||
TrailerContents |
Data, if any, to be sent to output after the last output line; may be multi-line.
| |||
TransformBufferFactor |
Ratio between sizes of the transformer's output and input buffers (transformer's input buffer size is defined by the BufferSize setting).
This advanced setting allows fine-tuning of memory consumption by Data Conveyer.
For example, if the transformation process is expected to remove (filter out) on average every other cluster, then it may be sensible to set the transformer's input buffer size to 100 clusters
and the transformer's output buffer size to 50 clusters. In this case, BufferSize = 100 and TransformBufferFactor = 0.5.
The default value for TransformBufferFactor is 1.0.
This setting is respected only if BufferSize is set to a positive value (i.e. ignored in case of BufferSize's default value of -1 (Unlimited)).
| |||
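The example above expressed as settings, assuming `config` is an OrchestratorConfig instance:

```csharp
// Input buffer of 100 clusters; output buffer of 100 * 0.5 = 50 clusters:
config.BufferSize = 100;
config.TransformBufferFactor = 0.5;
```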
TransformerType |
Type of the transformer: Clusterbound, Recordbound, ClusterFilter, RecordFilter or Universal. Default transformer type is Recordbound.
The Aggregator type is designated for future use.
| |||
TrimInputValues |
True means leading and trailing spaces will be trimmed from values encountered during intake; default is false, meaning all spaces are preserved.
Note that RetainQuotes = true will prevent trimming of quoted values.
| |||
TrimOutputValues |
True means that leading and trailing spaces will be trimmed from output values. False (default) means that all spaces are preserved.
Values get trimmed before being surrounded with quotes (per the QuotationMode setting).
| |||
TypeDefiner |
A function to determine data types for those fields that are not listed in ExplicitTypeDefinitions.
The function takes a field name and returns an ItemDef consisting of a type and format for the field.
Default function assumes every field is of string type and has no format, i.e.:
Any exception thrown by this function will cause the process shutdown with a completion status of Failed.
| |||
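The default described above can be sketched as follows; `config` is an assumed OrchestratorConfig instance, and the ItemDef constructor shown (taking an item type and a format) is an assumption inferred from the description, not a verified signature:

```csharp
// Sketch of the default TypeDefiner: every field is of string type
// with no format. The ItemDef constructor signature is assumed:
config.TypeDefiner = fieldName => new ItemDef(ItemType.String, null);
```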
UniversalTransformer |
A function that takes a single cluster and returns a sequence of clusters; specific to Universal transformer type.
If not supplied, a default pass-through function is used that passes input cluster to output as a single element enumerable, i.e.:
Any exception thrown by this function will cause the process shutdown (CompletionStatus of Failed).
| |||
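The default pass-through described above can be sketched as follows, assuming `config` is an OrchestratorConfig instance:

```csharp
// Sketch of the default Universal transformer: pass the input cluster
// through as a single-element sequence (requires System.Linq):
config.UniversalTransformer = clstr => Enumerable.Repeat(clstr, 1);
```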
XmlJsonIntakeSettings |
A string containing comma-separated parameters for parsing XML or JSON or UnboundJSON data on intake.
Each such parameter is a key-value pair with a pipe symbol (|) separating the key and the value.
The parameters reflect the shape of data to parse and define the elements to be extracted.
There are some differences (explained below) in their interpretation in case of XML vs JSON vs UnboundJSON.
The following keys can be used:
Example 1: "CollectionNode|Members,RecordNode|Member" (XML or JSON)
Example 2: "CollectionNode|Root/Members,ClusterNode|Group/Family,RecordNode|Member" (XML or JSON)
Example 3: "RecordNode|row" (XML fragment or JSON object with an array)
Example 4: "CollectionNode|,RecordNode|" (JSON only - an array of objects=records)
Example 5: "ClusterNode|,RecordNode|" (JSON only - an array of arrays=clusters of objects=records)
Example 6: "RecordNode|" (JSON only - multiple objects containing records)
Example 7: "CollectionNode|Root/Members[@region=\"North\"],ClusterNode|Group[@id=2][@zone=\"\"]/Family,RecordNode|Data/Member[@class],IncludeExplicitText|true" (XML only)
Example 8: "DetectClusters" (UnboundJSON only)
This configuration setting is only applicable when the InputDataKind value is XML, JSON or UnboundJSON. | |||
XmlJsonOutputSettings |
A string containing comma-separated parameters for writing XML, JSON or UnboundJSON data on output.
Each such parameter is a key-value pair with a pipe symbol (|) separating the key and the value.
The parameters define the shape of data to write.
There are differences (explained below) in their interpretation in case of XML vs JSON vs UnboundJSON.
The following keys can be used:
Example 1: "CollectionNode|Members,RecordNode|Member,IndentChars| " (XML or JSON)
Example 2: "RecordNode|Member,IndentChars|\t" (XML or JSON)
Example 3: "CollectionNode|Root/Members,ClusterNode|Group/Subgroup/Family,RecordNode|Data/Member" (XML or JSON)
Example 4: "RecordNode|,IndentChars| " (JSON only - an array of objects containing records)
Example 5: "ClusterNode|,RecordNode|" (JSON only - an array of arrays=clusters of objects=records)
Example 6: "" (JSON only - all node parameters absent is a special case that results in multiple objects containing records)
Example 7: "CollectionNode|Root/Members[@region=North],ClusterNode|Group[@id=2][@zone=\"\"]/Family,RecordNode|Data/Member[@class=\"main\"],AttributeFields|ID;zone" (XML only)
Example 8: "ProduceStandaloneObjects,SkipColumnPresorting,IndentChars| " (UnboundJSON only)
This configuration setting is only applicable when the OutputDataKind value is XML, JSON or UnboundJSON. |