Automated Text File Format and Character Encoding Converter

Easily and quickly convert text file character encoding, and text file formats between TSV, CSV, XML, and more

Conversion Between Text File Formats

There are many ways to structure data in a text file. Among them, the CSV and tab-delimited formats are in widespread use and can be opened by many kinds of applications, such as spreadsheet and database programs. You should avoid constructing or editing such files by hand. One problem with tab-delimited text files is that tabs are whitespace characters, so it is easy to break the structure by accidentally replacing a tab with a space. In the case of CSV, the comma is such a common character that the specification provides conventions for avoiding delimiter collision, typically by enclosing the field in double quotes, so that a comma intended as part of the data is not interpreted as a delimiter. It is therefore far better to convert file formats using software like Mergemill Pro.
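
To illustrate the delimiter collision issue described above, here is a minimal sketch in Python using the standard csv module. It is not Mergemill Pro's own method; the function name csv_to_tsv and the sample data are assumptions made for illustration. The quoted field containing a comma is read as a single value and written out as a single tab-delimited field.

```python
import csv
import io

def csv_to_tsv(csv_text: str) -> str:
    """Convert CSV text to tab-delimited text (hypothetical helper)."""
    # csv.reader understands the quoting convention, so a quoted comma
    # stays inside its field instead of splitting it.
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Illustrative data: the second row has a comma inside the first field.
sample = 'name,city\n"Smith, John",Boston\n'
tsv = csv_to_tsv(sample)
print(tsv)
```

Splitting each line on commas by hand would break "Smith, John" into two fields, which is the kind of error a dedicated parser avoids.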

XML, or Extensible Markup Language, is one of the most widely used machine-readable formats. For compatibility between database applications, it is often best to convert tab-delimited and CSV files to XML. One important advantage, among many, of using XML is that the file itself can declare the character encoding of its content. This makes it much easier to migrate multilingual data.
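
As a rough sketch of such a conversion, the Python example below turns CSV text into an XML document that declares its encoding. The element names records and record, the function name csv_to_xml, and the sample data are assumptions for illustration, and the sketch assumes the CSV column names are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text: str, encoding: str = "utf-8") -> bytes:
    """Build an XML document from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("records")
    for row in reader:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            # Assumes each column name is a valid XML element name.
            ET.SubElement(record, field).text = value
    # xml_declaration=True writes the <?xml ...?> line that names the encoding.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)

sample = "name,city\nAnna,Zürich\n"
xml_bytes = csv_to_xml(sample)
print(xml_bytes.decode("utf-8"))
```

Because the encoding is stated in the first line of the output, an XML parser can decode the rest of the file without guessing.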


Conversion Between Text File Character Encodings

To represent textual characters in a file, some sort of mapping must be used to assign numeric values to the characters. The mapping varies depending on the character set, which depends on the language being used and other factors. Larger character sets, such as those that include Japanese kanji, need more than one byte to represent most of their members.
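
The short Python sketch below makes the mapping concrete. For a few sample characters it reports the numeric value assigned by Unicode and the number of bytes the character occupies in a given encoding. The helper name describe and the choice of sample characters are assumptions for illustration.

```python
def describe(ch: str, encoding: str) -> tuple:
    """Return (numeric code point, byte length in the given encoding)."""
    return ord(ch), len(ch.encode(encoding))

samples = [
    ("A", "ascii"),           # Latin capital letter A
    ("\u00e9", "latin-1"),    # e with acute accent, single-byte encoding
    ("\u00e9", "utf-8"),      # the same character in UTF-8
    ("\u6f22", "shift_jis"),  # a kanji in a Japanese encoding
    ("\u6f22", "utf-8"),      # the same kanji in UTF-8
]

for ch, encoding in samples:
    code_point, size = describe(ch, encoding)
    print(f"{ch!r} U+{code_point:04X} {encoding}: {size} byte(s)")
```

The same character can take a different number of bytes depending on the encoding, which is why a reader must know which mapping was used.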

Interpretive problems may occur if a computer attempts to read data encoded with a mapping different from the one it expects. An example is a Mac OS application reading a text file created on a Windows computer. The Mac OS application may expect text to use the Mac OS Roman character set, while the Windows file may use the Windows Latin-1 (Windows-1252) character set. To handle text correctly, therefore, some method of identifying the various mappings and converting between them is necessary.
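
The mismatch described above can be reproduced in a few lines of Python. This is only a sketch: the sample word is an assumption, and cp1252 and mac_roman are the Python codec names for Windows Latin-1 and Mac OS Roman.

```python
original = "caf\u00e9"  # sample word ending in e with acute accent

# Bytes as a Windows application might save them.
windows_bytes = original.encode("cp1252")

# Reading those bytes with the wrong mapping produces a different character.
misread = windows_bytes.decode("mac_roman")

# Reading them with the mapping that was actually used recovers the text.
correct = windows_bytes.decode("cp1252")

print("misread:", misread)
print("correct:", correct)
```

No error is raised in the misread case, so this kind of corruption can go unnoticed until someone looks at the text.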

Most character sets and character encoding schemes developed in the past are limited in their coverage, usually supporting just one language or a small set of languages. Multilingual software has traditionally had to implement methods for supporting and identifying multiple character encodings. A simpler solution is to combine the characters for all commonly used languages and symbols into a single universal coded character set. Unicode is such a universal coded character set, and offers the simplest solution to the problem of text representation in multilingual systems. Because Unicode includes the character repertoires of most common character encodings, it facilitates data interchange with other platforms. Using Unicode, text shared across applications and platforms can be encoded in a single coded character set.
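
As a small illustration of this coverage difference, the Python sketch below checks whether one multilingual string can be encoded in several encodings. The helper name can_encode and the sample string are assumptions for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

multilingual = "English, Français, 日本語, Русский"

for encoding in ("utf-8", "utf-16", "latin-1", "cp1251"):
    print(encoding, can_encode(multilingual, encoding))
```

The Unicode encodings accept the whole string, while the legacy encodings each lack some of its characters.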


The Mergemill Pro Advantage

Converting between common text file formats is easy with Mergemill Pro. You simply choose to export data in CSV, XML, or tab-delimited text format. You may also create a custom output format with no more than a few lines of script.

Converting between text encodings is even easier. Mergemill Pro lets you specify the datafeed encoding and the output encoding, and it performs the character encoding conversion when it generates the output. The Mergemill Pro interface elements, internal data storage, and intermediate files created while running jobs all use UTF-8 Unicode.
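
For readers who want to see what an encoding conversion involves in general terms, here is a minimal Python sketch. It does not show Mergemill Pro's interface or internals; the function convert_encoding, its parameter names, and the file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def convert_encoding(src, dst, source_encoding, output_encoding):
    """Re-encode a text file (hypothetical helper, not a Mergemill Pro API)."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=source_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding=output_encoding, newline="") as outfile:
        outfile.write(text)

# Demonstration with a temporary file saved as Windows-1252.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "feed_cp1252.txt"
    dst = Path(tmp) / "feed_utf8.txt"
    src.write_bytes("caf\u00e9\r\n".encode("cp1252"))
    convert_encoding(src, dst, "cp1252", "utf-8")
    converted = dst.read_bytes()

print(converted)
```

The conversion is a decode with the source encoding followed by an encode with the output encoding, so it only works when the source encoding is stated correctly.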

The biggest benefits of using Mergemill Pro are its automation features and its powerful processing capabilities, which let you do far more than simple conversion. You may set up a drop-in folder so that Mergemill Pro automatically processes the files in that folder at scheduled times.
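
The drop-in folder idea can be sketched generically in Python as follows. This is an illustration of the concept only, not Mergemill Pro's implementation; the function process_drop_folder, the folder names, the .txt filter, and the default encodings are all assumptions. Scheduling is left to an external tool such as cron or Windows Task Scheduler.

```python
import tempfile
from pathlib import Path

def process_drop_folder(inbox: Path, outbox: Path,
                        source_encoding: str = "cp1252",
                        output_encoding: str = "utf-8") -> list:
    """Convert every .txt file in inbox and return the names processed."""
    outbox.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(inbox.glob("*.txt")):
        text = src.read_bytes().decode(source_encoding)
        (outbox / src.name).write_bytes(text.encode(output_encoding))
        src.unlink()  # remove the input so it is not processed twice
        processed.append(src.name)
    return processed

# Demonstration with temporary folders and two sample files.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    outbox = Path(tmp) / "outbox"
    inbox.mkdir()
    (inbox / "a.txt").write_bytes("caf\u00e9".encode("cp1252"))
    (inbox / "b.txt").write_bytes(b"plain")
    processed = process_drop_folder(inbox, outbox)
    remaining = sorted(p.name for p in inbox.iterdir())
    result_a = (outbox / "a.txt").read_bytes()

print(processed)
```

Running such a function on a schedule gives the basic effect of a drop-in folder: files placed in the inbox are converted on the next run without manual steps.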

Copyright © 2001-2011 Cross Culture Ltd. All Rights Reserved.