
Xponent's Mostly XML Blog


Split XML And Repeat Header Element

This article shows how to use the XmlSplit program to split an XML document and include the header element from the source document in each of the split files. It also demonstrates how to skip content up to the point where splitting should start, and how to split on elements deep in the XML hierarchy.

"Header element" is not a technical term in the XML specification, but developers widely use it to refer to the first element under the root. A header element is typically a unique element that contains identity or description information pertaining to the XML document. When a document needs to be split into smaller files, it is often a requirement to insert the header element into each of the split files. The header element in the file below is TransportHeader.

ImportData.xml

[Image: contents of ImportData.xml]
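
Because the header element is simply the first element under the root, it can be located programmatically. The following is a minimal Python sketch of that idea, not part of XmlSplit. The miniature document, the root name ImportData, and the Id and Body elements are assumptions made for illustration; only the name TransportHeader comes from the sample file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; only TransportHeader is taken from the article.
SAMPLE = """<ImportData>
  <TransportHeader><Id>42</Id></TransportHeader>
  <Body/>
</ImportData>"""

def header_element(xml_text):
    """Return the first child element of the root, or None if the root has no children."""
    root = ET.fromstring(xml_text)
    return next(iter(root), None)

print(header_element(SAMPLE).tag)  # prints "TransportHeader"
```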

The objective is to split this file into five files, each containing the header element and one of the "R" elements. The first split file is shown below. The other four split files differ only in the "R" element that appears in the file.

test1.xml

[Image: contents of the first split file, test1.xml]

It is common for the content to be split to sit within a descendant element, with the content before it needing to be skipped. In our sample file, the content to be split consists of the "R" elements. With the exception of the header element, the elements before the first "R" element should be skipped.
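
To make the skipping rule concrete, here is a small Python sketch that lists which elements would be passed over before the first "R" element is reached, leaving the header element out of the list. It is an illustration only and does not reflect how XmlSplit is implemented. The nesting and the element names other than TransportHeader and R are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure; only TransportHeader and R come from the article.
SAMPLE = """<ImportData>
  <TransportHeader><Id>42</Id></TransportHeader>
  <Batch>
    <Info>skipped</Info>
    <Rows>
      <Set>
        <R n="1"/>
        <R n="2"/>
      </Set>
    </Rows>
  </Batch>
</ImportData>"""

def skipped_before(xml_text, threshold):
    """List the tags met in document order before the first `threshold` element,
    excluding the root and the header element (with its descendants)."""
    root = ET.fromstring(xml_text)
    header = next(iter(root), None)
    kept = set() if header is None else set(header.iter())
    skipped = []
    for elem in root.iter():  # depth-first, document order
        if elem.tag == threshold:
            break
        if elem is not root and elem not in kept:
            skipped.append(elem.tag)
    return skipped

print(skipped_before(SAMPLE, "R"))  # prints ['Batch', 'Info', 'Rows', 'Set']
```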

XmlSplit has command line arguments for handling all of these requirements. The XmlSplit Script Wizard was used to automatically generate the script by selecting the necessary arguments and setting their values with a simple dialog.

The script below uses the /H (Header) argument to write the header element to each split file. Each split file is to contain one row item, the R element, in addition to the header element. The /S (Split method) argument is set to 1, meaning the first split method, splitNthElement, is to be used, and the /F (Frequency) argument is set to 1 to split after each R element. Since the R elements are at a depth of 5, the /D (Depth) argument is set to 5. To exclude all nodes up to the first R element, the /T (Threshold) argument is set to /T=R, which tells XmlSplit to skip all nodes until an element named "R" is reached. The /R (Root) argument is set to the name of the root element in ImportData.xml and is used to encapsulate each split file so that each is a well-formed XML document.

[Image: the XmlSplit script generated by the Script Wizard]
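
As a rough illustration of what these arguments accomplish together, the following Python sketch produces the same kind of output in memory: one document per R element, each wrapped in the root name and preceded by a copy of the header element. It is not the XmlSplit script and says nothing about how XmlSplit works internally. The sample document, the function name split_with_header, its parameters, and the convention that the root counts as depth 1 are all assumptions. Selecting only the R elements has the side effect of skipping everything before the first one, which stands in for the threshold.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical document with five R elements at depth 5 (root counted as depth 1).
SAMPLE = """<ImportData>
  <TransportHeader><Id>42</Id></TransportHeader>
  <Batch>
    <Info>skipped</Info>
    <Rows>
      <Set>
        <R n="1"/><R n="2"/><R n="3"/><R n="4"/><R n="5"/>
      </Set>
    </Rows>
  </Batch>
</ImportData>"""

def split_with_header(xml_text, row_tag, depth, root_name, frequency=1):
    """Return XML strings, each holding the header plus up to `frequency` rows."""
    source_root = ET.fromstring(xml_text)
    header = next(iter(source_root), None)  # first element under the root

    # Collect elements named `row_tag` that sit exactly at `depth`.
    rows = []
    def walk(elem, level):
        if level == depth and elem.tag == row_tag:
            rows.append(elem)
        for child in elem:
            walk(child, level + 1)
    walk(source_root, 1)

    pieces = []
    for start in range(0, len(rows), frequency):
        new_root = ET.Element(root_name)  # wrapper keeps each piece well-formed
        if header is not None:
            new_root.append(copy.deepcopy(header))
        for row in rows[start:start + frequency]:
            new_root.append(copy.deepcopy(row))
        pieces.append(ET.tostring(new_root, encoding="unicode"))
    return pieces

pieces = split_with_header(SAMPLE, row_tag="R", depth=5, root_name="ImportData")
print(len(pieces))  # prints 5
```

Each string in pieces could then be written to its own file. For documents too large to hold in memory, a streaming parser would be needed instead of the in-memory tree used in this sketch.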

Submitted by Bill Conniff, Founder of Xponent, on April 23, 2012

copyright © 2008-2014. Xponent LLC. All rights reserved.