Mohsin Kalam’s Blog

Encodings

Goal:

To describe the different encodings available for both X12 and EDIFACT, the characters allowed in each encoding, and how Engine Receive and Send use these encodings during processing.

X12:

For the X12 standard, R2 supports three encoding types: Basic, Extended, and UTF8.

Basic allows:

  • A to Z
  • 0 to 9
  • ! " & ' ( ) * + , - . / : ; ? = (space)

Extended allows:

  • All of Basic
  • a to z
  • % @ [ ] _ { } \ | < > ~ # $

UTF8 allows:

  • All of Extended
  • All UTF8 characters
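
The three X12 sets above nest inside each other, so a character check can be sketched in a few lines of Python. This is only an illustration built from the lists in this post: the names `X12_BASIC`, `X12_EXTENDED`, and `invalid_chars` are made up for the example and are not part of the product.

```python
import string

# Character sets copied from the lists above; the names are illustrative only.
X12_BASIC = set(string.ascii_uppercase + string.digits + "!\"&'()*+,-./:;?= ")
X12_EXTENDED = X12_BASIC | set(string.ascii_lowercase + "%@[]_{}\\|<>~#$")


def invalid_chars(text, charset):
    """Return the characters in text that fall outside the named X12 set."""
    if charset == "UTF8":
        return []  # every character is accepted
    allowed = {"Basic": X12_BASIC, "Extended": X12_EXTENDED}[charset]
    return [ch for ch in text if ch not in allowed]


print(invalid_chars("ACME corp #42", "Basic"))     # ['c', 'o', 'r', 'p', '#']
print(invalid_chars("ACME corp #42", "Extended"))  # []
```

The same text passes under Extended and fails under Basic, which is why the selected setting matters for validation.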

EDIFACT:

For the EDIFACT standard, R2 supports the following encodings: UNOA, UNOB, UNOC, UNOD-UNOK, UNOX, UNOY, and KECA.

UNOA allows:

  • A to Z
  • 0 to 9
  • . , - ( ) / = (space)

UNOB allows:

  • All of UNOA
  • a to z
  • ' + : ? ! " % & * ; < >
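
As a rough sketch, the UNOA and UNOB lists above can be used to find the narrowest level that covers a piece of text. The sets are taken from this post as written, and `narrowest_level` is a hypothetical helper name, not an engine API.

```python
import string

# Sets copied from the UNOA and UNOB lists above; names are illustrative only.
UNOA = set(string.ascii_uppercase + string.digits + ".,-()/= ")
UNOB = UNOA | set(string.ascii_lowercase + "'+:?!\"%&*;<>")


def narrowest_level(text):
    """Return 'UNOA' or 'UNOB' if one of them covers text, else None."""
    for name, allowed in (("UNOA", UNOA), ("UNOB", UNOB)):
        if all(ch in allowed for ch in text):
            return name
    return None


print(narrowest_level("INVOICE 42"))  # UNOA
print(narrowest_level("Invoice 42"))  # UNOB
print(narrowest_level("Café"))        # None
```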

UNOC allows:

  • ISO 8859-1 (Latin-1) character set

UNOD-UNOK allows:

  • All of UNOA-UNOC
  • All UTF8 characters

UNOX allows:

  • Code page 50220 (ISO 2022 Japanese) character set
    • This code page allows escape techniques in accordance with ISO 2375. The text starts in ASCII and switches to Japanese characters through an escape sequence. Each character following the escape sequence is encoded in two bytes.

KECA allows:

  • A to Z
  • 0 to 9
  • . , - ( ) / = ! " % & * ; < >
  • Windows 949 code page
    • Korean Syllables (2350 characters)
    • Korean Hanja (4888 characters)
    • Korean Alphabets
    • Characters and numbers enclosed in a circle
    • String length is counted in bytes, not characters. A Korean character takes two bytes, so a data element of length 3 can hold 3 Latin characters, 1 Korean character, or 1 Korean and 1 Latin character.
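
The byte-based length rule for KECA is easy to check with Python's `cp949` codec. The helper names `keca_length` and `fits` are invented for this sketch; the engine's own length check is not shown in this post.

```python
def keca_length(value):
    """Length of value in bytes when encoded with Windows code page 949."""
    return len(value.encode("cp949"))


def fits(value, max_len):
    """True if value fits a data element whose maximum length is max_len bytes."""
    return keca_length(value) <= max_len


print(keca_length("ABC"))  # 3
print(keca_length("한"))    # 2
print(keca_length("한A"))   # 3
print(fits("한국", 3))      # False, two Korean characters need 4 bytes
```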

Engine Receive:

During parsing, the EDI Engine validates the incoming document against the encoding type selected in the settings. If a character in the instance falls outside the set allowed by that setting, the instance is suspended and an event entry is logged.

X12: 

The X12 interchange is not self-describing, and it is not required to carry a Byte Order Mark (BOM) from which the encoding could be determined. Reading the ISA segment without knowing the encoding type may therefore yield different values for party resolution. Bottom line: the encoding must be known before party lookup.
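
To see why the encoding has to be known first, here is a small Python sketch. The sender ID is a made-up value, and Latin-1 is just one example of a wrong guess; the point is only that the same bytes produce different strings.

```python
sender_id = "MÜLLER GMBH"          # hypothetical party identifier
raw = sender_id.encode("utf-8")    # bytes as they might arrive, with no BOM

as_utf8 = raw.decode("utf-8")      # correct guess
as_latin1 = raw.decode("latin-1")  # wrong guess at the encoding

print(as_utf8 == sender_id)          # True
print(as_latin1 == sender_id)        # False
print(len(as_utf8), len(as_latin1))  # 11 12
```

With the wrong guess, both the value and its length change, so a lookup keyed on the sender ID would miss the party.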

For this purpose, the EDI Disassembler component exposes the pipeline property “X12ApplicableCharSet”, which accepts any of the three encoding types above. The value of this property is applied to the message being parsed.

EDIFACT:

The EDIFACT interchange is self-describing: UNB1.1 names the encoding type. During parsing, the value of this field is applied when processing the document.
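
As a minimal sketch of what "self-describing" means here, the following Python function reads UNB1.1 from the start of a raw interchange. It assumes the envelope header itself is ASCII-compatible and handles only the optional UNA service string; `edifact_syntax_identifier` is a hypothetical name, and this is not how the engine is implemented.

```python
def edifact_syntax_identifier(interchange):
    """Return UNB1.1 (for example 'UNOC') from the start of a raw interchange."""
    head = interchange[:64].decode("latin-1")  # header assumed ASCII-compatible
    element_sep, component_sep = "+", ":"      # EDIFACT default separators
    if head.startswith("UNA"):
        # UNA is followed by six service characters:
        # component separator first, element separator second.
        component_sep, element_sep = head[3], head[4]
        head = head[9:].lstrip("\r\n")
    if not head.startswith("UNB" + element_sep):
        raise ValueError("UNB segment not found")
    unb1 = head.split(element_sep)[1]          # first data element of UNB
    return unb1.split(component_sep)[0]        # first component is UNB1.1


print(edifact_syntax_identifier(b"UNB+UNOC:3+SENDER+RECEIVER+080206:0800+1'"))  # UNOC
```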

Engine Send:

During serialization, the EDI Engine applies the encoding type selected in the settings to the outgoing message. If a character in the instance falls outside the set allowed by that setting, the instance is suspended and an event entry is logged.

X12:

To keep the design symmetric with the receive side, the EDI Assembler component exposes a similar pipeline property, “X12ApplicableCharSet”, which accepts any of the three encoding types above. The value of this property is applied during serialization.

In addition to this property, there is a property in PAM under Party as Interchange Receiver/X12 Interchange Envelope Generation/X12 Character Set. The value of this property is used only to validate the PAM fields and is not used by the runtime engine.

EDIFACT:

The setting applied during serialization of an EDIFACT document comes from UNB1.1 under PAM/Party as Interchange Receiver/EDIFACT Interchange Envelope Generation/UNB Segment Definition.

Unlike X12, where the PAM setting only validates PAM fields, this setting validates both PAM fields and instances.

The table below summarizes where the validation setting is taken from in each case:

| Direction | X12: Instance | X12: PAM | EDIFACT: Instance | EDIFACT: PAM |
|-----------|---------------|----------|-------------------|--------------|
| Receive | Pipeline property in EDI Disassembler | X12 Character Set in PAM under Receiver | UNB1.1 in instance | UNB1.1 in PAM under Receiver |
| Send | Pipeline property in EDI Assembler | X12 Character Set in PAM under Receiver | UNB1.1 in PAM | UNB1.1 in PAM under Receiver |

End Note:

I hope this post gives you useful information about encoding types and how they are used in R2. Your comments and questions are always welcome.

Thanks

Mohsin Kalam

1 Comment »

  1. Hello Mohsin,

    Thank you for all the articles on your site. The information is very clear and to the point. I am still novice to Biztalk but your articles have been very helpful and a great resource for me. Thanks again and looking forward for more.

    – Jyothsna

    Comment by Jyothsna — February 6, 2008 @ 8:00 am

