Overview

Mergemill merges the static contents of a template with data feeds to generate your desired output. You embed special tags in the template to direct the insertion of contents from the feeds you specify. Content-insertion fields are embedded in the template in the form of the basic tag <?[FieldName]?>. All fields placed between a <?Loop?> tag and an <?EndLoop?> tag are in-loop fields, and all those outside the loop tag pair are out-loop fields. Feed-insertion fields insert data values directly from feeds. Other fields insert values generated on the fly.

When you set up a merge job in Mergemill, you specify an associated template and push a button to parse it. For each new field found in the template, Mergemill automatically adds a task to the job definition. In the task settings for each field, you then specify either the data column to be fetched or how the values are to be generated dynamically.

When you run a job, the template is executed repeatedly until all primary feeds are exhausted. In each run of the template, Mergemill copies all static contents of your template to the output and replaces each field tag with the appropriate data value from its specified source to generate the full output. Each time Mergemill reaches the end of the template, one of several things happens, depending on what you set the job to do: the merged text is spoken, the merged email is sent, the merged file is saved, the data are used to update an SQL data store, or the data are exported to TSV, CSV or XML files.
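As a rough illustration of the run described above, the following Python sketch copies static text through and replaces each basic field tag with a value from the current row. The names merge_once and run_job, the regular expression, and the sample data are all hypothetical. The sketch handles only the basic tag with a single feed and no loops or attributes, and it is not how Mergemill itself is implemented.

```python
import re

# Matches only the basic tag form <?[FieldName]?> (no attributes, columns, or lookups).
FIELD_TAG = re.compile(r"<\?\[(\w+)\]\?>")

def merge_once(template, row):
    """Copy static text as-is and replace each basic field tag with its value."""
    return FIELD_TAG.sub(lambda m: str(row.get(m.group(1), "")), template)

def run_job(template, rows):
    """Run the template once per row until the (single, hypothetical) feed is exhausted."""
    return [merge_once(template, row) for row in rows]

template = "Dear <?[Name]?>, your order <?[OrderNo]?> has shipped."
rows = [
    {"Name": "Ann", "OrderNo": "A17"},
    {"Name": "Bo", "OrderNo": "B42"},
]
outputs = run_job(template, rows)
for text in outputs:
    print(text)
```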
A data source provides data feeds for feed-insertion fields in the template. The number of fields specified to fetch data from the same feed determines the number of data columns it contains: one column of data in the feed for each field.

A data feed may consist of many consecutive data streams, which may come from the files in a folder, the emails in an inbox, or the web pages on a URL list. For example, a folder of ten files provides a feed comprising ten streams. If you set a task to obtain data from a folder, an email account, or a URL list as a single stream, then the entire feed will be treated as one stream. If the data source is an SQL data store, there is always only one stream in the data feed.

All data streams in a feed contain the same number of data columns, each of which provides a series of data values for one field. Shorter columns in a data stream are always padded with empty values to match the data count of the longest column, so that a data stream always provides the same number of data values in all its columns.

Streams are important in two ways. First, the scope of data sorting is limited to the data stream. You may set multi-level sorting for any data feed. All fields obtaining data from the same feed are sorted together by rows, one stream at a time. Second, loops always break on stream breaks. This is because Mergemill fetches data from feeds one stream at a time, to produce a set of outputs that together exhaust the current data streams for all the in-loop fields in the template. The section on loops explains this process in detail.

When Mergemill parses a template, it creates only one task for each unique field name, and so identically named fields in different loops all use the same column of data values from the same stream. Their uses of the same data column, however, need not be in step, because Mergemill manages each loop separately when running the template.
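The padding rule for streams can be pictured with a short Python sketch. The function name pad_stream and the dictionary-of-lists layout are assumptions made for illustration only; they say nothing about how Mergemill stores its data internally.

```python
def pad_stream(columns):
    """Pad shorter columns with empty values so every column has the same data count."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: values + [""] * (longest - len(values))
        for name, values in columns.items()
    }

# One stream with two columns of unequal length (sample data).
stream = {"Year": ["2001", "2002", "2003"], "Sales": ["120"]}
padded = pad_stream(stream)
print(padded)
```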
Different fields requesting data from the same column of the same source do not necessarily obtain the same series of data values, because in setting a task you may specify filters to fetch specific contents in the data values. You may also specify the conditions each data value must satisfy to be included.
Mergemill accepts data feeds from diverse sources. Some fields in a job may obtain their data from an ODBC-compliant SQL database, and other fields in the same job may get their data from a local REAL SQL database, emails, a webpage, webpages on a URL list in a text file, a local text file, a remote text file via FTP, a local folder of text files, or a remote folder of text files via FTP.

A data source may provide structured or non-structured data. Sources providing structured data are delimited text files (commonly tab-separated or comma-separated), XML files, and SQL data stores. Non-structured data come from emails, plain text files, and HTML files. A plain text file or an email message may provide its entire text content as a single data value, or it may provide several non-structured data values, each bracketed with Mergemill data markers of the form [FieldName]...[/]. You may easily extract these wrapped data values using Mergemill's fetch filters. In this case, text outside the markers is ignored by Mergemill.
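To show what the [FieldName]...[/] marker convention looks like in practice, here is a small Python sketch that pulls marked values out of a block of text. It is a stand-in written with a regular expression, not Mergemill's fetch filters; the function name extract_marked and the sample message are hypothetical.

```python
import re

def extract_marked(text, field_name):
    """Return every value wrapped as [field_name]...[/]; text outside markers is ignored."""
    pattern = re.compile(r"\[" + re.escape(field_name) + r"\](.*?)\[/\]", re.DOTALL)
    return pattern.findall(text)

message = "Hello, [City]Oslo[/] and some filler, then [City]Lima[/] plus [Zip]0150[/]."
cities = extract_marked(message, "City")
zips = extract_marked(message, "Zip")
print(cities, zips)
```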
The default is no filter actions: all data values are included as they are. If filters are specified, data values are fed through the filters as they are read to decide if they should be included. If a value meets the include conditions, the pieces of text within the data value satisfying the fetch filters will be captured. If only one field obtains data from a source, or a source provides non-structured data, the include filters specified for a field only affect the data values provided for it. If a source provides structured data for several fields, then a data value rejected for any one of those fields will exclude the entire row of data. If the fetch filters grow the data count of some columns of structured data, then for the other columns of the same stream you may opt to copy values down to the inserted cells or leave the inserted cells empty. Columns in other feeds are not affected. If the fetch filters grow the data count of some columns of non-structured data, all other columns are not affected regardless of whether they are in the same feed. All Mergemill does is to append empty values to shorter columns so that the data count is uniform across all columns of the same feed.
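The row-exclusion rule for structured data can be sketched as follows. This Python example assumes a hypothetical representation in which each row is a dictionary and each field's include condition is a plain predicate function; Mergemill's actual filter settings are configured in the task and are not shown here.

```python
def filter_rows(rows, include):
    """Keep a row only if every field's include condition accepts its value.

    A value rejected for any one field excludes the entire row, as described
    above for sources that provide structured data to several fields.
    """
    kept = []
    for row in rows:
        if all(accepts(row[field]) for field, accepts in include.items()):
            kept.append(row)
    return kept

rows = [
    {"Year": "2001", "Sales": "120"},
    {"Year": "2002", "Sales": ""},
    {"Year": "1999", "Sales": "80"},
]
# Hypothetical include conditions: non-empty Sales, Year from 2000 onward.
include = {
    "Sales": lambda value: value != "",
    "Year": lambda value: int(value) >= 2000,
}
kept = filter_rows(rows, include)
print(kept)
```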
All search-replace data processing is done AFTER the filter actions and BEFORE the Datafeed End-Processing operations. Below are the twelve Datafeed End-Processing operations you may apply to the data read. They are set in the task associated with each field.
The last step of feed processing is data sorting. As mentioned earlier, Mergemill allows multi-level sorting, restricts each sort group to fields getting data from the same feed, and limits the scope of data sorting to the current data stream.
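The scoping rule for sorting can be illustrated with a brief Python sketch. The function sort_streams and the list-of-rows layout are assumptions for illustration; the point is only that rows are sorted together on several keys and that sorting never crosses a stream boundary.

```python
def sort_streams(streams, keys):
    """Sort the rows of each stream on several keys; stream order itself is untouched."""
    return [
        sorted(stream, key=lambda row: tuple(row[key] for key in keys))
        for stream in streams
    ]

# A feed of two streams (sample data); each row holds the values of all fields in the feed.
feed = [
    [{"Year": 2002, "Quarter": 1}, {"Year": 2001, "Quarter": 2}, {"Year": 2001, "Quarter": 1}],
    [{"Year": 1999, "Quarter": 1}],
]
sorted_feed = sort_streams(feed, ["Year", "Quarter"])
print(sorted_feed)
```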
AFTER all the data are read, filtered, processed and sorted, Mergemill runs the template to generate the output. When a field tag is encountered, the appropriate data value is fetched. Multiple occurrences of an in-loop field in the current iteration of a loop get the same data value. The same holds for all occurrences of each out-loop field in the current output. The actual text inserted at each place depends on the attributes inside the field tag.

When Mergemill encounters a field tag for which you specified dynamically generated data, i.e. autotext or RBScript, the data value to be inserted is generated on the fly for EACH occurrence of the field. In the task associated with the field, you specify the RBScript to run or how the autotext is generated. For optimum speed, the RBScript is compiled only once for all occurrences of the field.
Mergemill lets you specify how the output path and output file name are built for a merge job, and you may include feed-insertion fields as components. For each such field, Mergemill uses the field's starting data value for the page being generated.
You may specify a source filename or source filename extension as a component in building output path, output file name, or autotext. Mergemill always uses the first out-loop feed-insertion field found in the template to determine the source filename and source extension for the current page. If you have multiple sources for the fields, and you want a certain field to set the source filename and extension while other out-loop fields appear earlier in your template, you may put the chosen field FIRST, but between <?HD?> and <?/HD?> to hide it from the output.
A sequence number component may be included in generating autotext values for a field, in which case you need to provide parameters in the form of Start;Increment in the task settings of the field, where Start is the beginning number and Increment is the step size between successive numbers in the series. For example, the parameters 012;2 give you the sequence 012, 014, 016, and so on till the end of the job. Mergemill maintains the sequence numbers independently for each loop. If, for instance, <?[SeqNum]?> is an autotext field that is used outside and inside loops, Mergemill will begin them all at the Start value at the beginning of the job. As the loops are run, the sets of SeqNum will be out of step with each other. The sequence number components in the output file name and output path are similar. The number begins at the Start value when the job begins, and the Increment is added for each subsequent new page, till the job ends. There is an additional option here: You may choose to skip the Start number in the file name or path. This is done by including the Suppress First Number switch (1 or 0) in the parameters: Start;Increment;Suppress. For example, 000;1;1 in the output file name may give you output.htm, output001.htm, output002.htm, and so on.
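The two parameter forms above can be reproduced with a short Python sketch. The function names sequence and file_names are hypothetical, and the zero-padding width is inferred from the number of digits in Start, which is an assumption based on the examples 012;2 and 000;1;1 rather than a documented rule.

```python
def sequence(params, count):
    """Generate `count` numbers from 'Start;Increment', padded to the width of Start."""
    start_text, step_text = params.split(";")[:2]
    width = len(start_text)  # assumption: padding width follows the digits in Start
    start, step = int(start_text), int(step_text)
    return [str(start + i * step).zfill(width) for i in range(count)]

def file_names(stem, extension, params, count):
    """Build file names from 'Start;Increment;Suppress', dropping the first number if Suppress is 1."""
    parts = params.split(";")
    suppress_first = len(parts) > 2 and parts[2] == "1"
    numbers = sequence(params, count)
    if suppress_first and numbers:
        numbers[0] = ""
    return [f"{stem}{number}{extension}" for number in numbers]

autotext = sequence("012;2", 3)
names = file_names("output", ".htm", "000;1;1", 3)
print(autotext)
print(names)
```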
The RBScript dynamic data source allows you to execute RealBasic code within Mergemill and assign the result to a Mergemill field tag.
Please note that the parameter to be passed could either be a static value or a feed-insertion field to fetch data from. If you use a field as a parameter, its current first-column data value (i.e., <?[Field]?> or <?[Field]{1}?>) will be passed to the script.
The RealBasic language is very similar to Microsoft Visual Basic, so you may use your favourite BASIC development tool to develop and test your scripts. Mergemill does not provide code development facilities such as an IDE or a debugger. It provides only an edit field in which to insert and save your script, and a compiler to translate it into fast machine code. If you use Real Studio to develop your RBScript, we have included a simple project file, RBScript.rbp, in the Mergemill software package for you. It contains the recommended script structure in the Open event of the App class. You may develop your script there, and test it by selecting Run on the Project menu.
If you know BASIC well and your code is simple, you may develop and test it in Mergemill. Below is a simple example to calculate compound growth. It shows the four parts of a typical RBScript. Part 1 is the variable declaration:
<?[Field]?> Attributes you may include in the basic field tag are listed below. The field name and the attribute list are separated by a colon, and the attributes are separated by semicolons.
If you need to place an out-loop field inside a loop body, add the OutLoop attribute. The attributes are applied in order. So AWords,3;ZWords,1 inserts the third word of the data text into the output, and ZWords,3;AWords,1 inserts the third-last word. Also, Position;Count is effectively just Count. If the attribute parameter nn is missing or less than 1, AWords, ZWords, Left, and Right return the whole field data text unaltered, and Mid includes all characters from the starting position ss to the end of the field data text. If the attribute parameter ss in Mid is missing or less than 1, Mid starts at the first character of the field data text. If both parameters in Mid are missing or less than 1, Mid returns the whole field data text unaltered. Please note that Position and Count nullify all other attributes EXCEPT OutLoop.
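The effect of attribute order can be checked with a small Python sketch. It assumes that AWords,n keeps the first n words and ZWords,n keeps the last n words, which is inferred from the two examples above; the function apply_attributes is hypothetical and covers only these two attributes.

```python
def apply_attributes(text, attributes):
    """Apply 'Name,n' attributes in order, left to right (AWords and ZWords only)."""
    for item in attributes.split(";"):
        name, _, arg = item.partition(",")
        n = int(arg) if arg.strip().isdigit() else 0
        if n < 1:
            continue  # missing or < 1: the text is returned unaltered by this attribute
        words = text.split()
        if name == "AWords":    # assumption: keep the first n words
            text = " ".join(words[:n])
        elif name == "ZWords":  # assumption: keep the last n words
            text = " ".join(words[-n:])
    return text

data = "one two three four five six"
third_word = apply_attributes(data, "AWords,3;ZWords,1")
third_last_word = apply_attributes(data, "ZWords,3;AWords,1")
print(third_word, third_last_word)
```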
<?[Field]@NumFormat?> If the value for a field is numerical, you may wish to control its format in the output. The NumFormat specification enables you to do that. It can be made up of up to three formats separated by semicolons: positive format; negative format; zero format. Each format is a string of special characters to control how the number will be formatted:
<?[Field]{Column}?> Mergemill makes it easy for you to generate multi-column lists on the output page. Start by adding the loop tag pair to your template to mark out the part to be repeated. Then add the column number to the in-loop field tags. When you run the job, Mergemill manages putting the series of data values from your feeds into the columns of fields. All field tags have a default column number of 1 if not specified. This is the minimum allowed value. For dynamic fields, the column number, if any, is ignored because each data value is generated on-the-fly. It is important to remember that the data value for the first-column field is always considered used for the current run of the template or loop. However, for field tags with a data value position offset (i.e. a column number greater than 1), a data value is considered used only when Mergemill acts on the request of such a field tag (or a field operand in an expression tag) to fetch the value. No such action is taken when the field tag appears within a hide section, or is skipped in a branching structure, such as <?If(...)?>...<?Else?><?[Field]{2}?><?EndIf?> where the IF condition returns true. A value is also not considered used if the actions to fetch values are only for carrying out comparisons in the branching tags <?IF(expression...)?>, <?IF(SAME[...)?>, and <?CASE(expression...)?>. For each subsequent run of the template or loop, Mergemill will continue on with the data value immediately after the one last USED. Consider the following HTML code segment as an example:
The template segment above generates a list on a web page that compares the first and second quarter sales figures across several years. The feed is from a CSV text file with contents partly shown below.
The column number 4 in the Year tag instructs Mergemill to skip three values in each iteration of the loop. To keep the columns in step, we need to add a 4th column for Sales as well. Since this value does not need to be inserted but needs to be considered used, we put it in a variable assignment block. It is important to note that Mergemill treats all out-loop template segments together as one special loop body. Column numbers are handled exactly as in any other loop, and multiple occurrences of a field use the same data value.
<?[LookupValueField]([LookupKeyField]=[Field])?> The lookup extension of the field tag instructs Mergemill to take the current value for Field, locate the same value under the LookupKeyField data column, get the corresponding value for LookupValueField, and insert it into the output. All attributes are ignored in the lookup key field and the lookup value field. A good way to learn how the above tags work is to see them in action. We've included a simple case study in the Mergemill Pro software package for this purpose: Mergemill Pro > Mergemill Resources > Examples > Simple Case Study. The files are also available here: download for Windows, download for Mac OS X.
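The lookup described above amounts to a keyed search across two parallel data columns. The Python sketch below illustrates the idea with a hypothetical function lookup and made-up sample columns. Returning an empty string when no key matches is an assumption; the text above does not say what Mergemill does in that case.

```python
def lookup(key_column, value_column, current_value):
    """Find current_value in key_column and return the value on the same row of value_column."""
    for key, value in zip(key_column, value_column):
        if key == current_value:
            return value
    return ""  # assumption: nothing is inserted when the key is not found

# Parallel columns standing in for LookupKeyField and LookupValueField (sample data).
country_codes = ["NO", "PE", "JP"]
country_names = ["Norway", "Peru", "Japan"]

found = lookup(country_codes, country_names, "PE")
missing = lookup(country_codes, country_names, "XX")
print(found, repr(missing))
```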