University of South Carolina, Arnold School of Public Health, Dept. of Health Services Policy and Management, HSPM J716

XML

"XML" stands for Extensible Markup Language ("eXtensible Markup Language").

XML extends HTML

XML looks like HTML with different tags. (Strictly speaking, both are descended from an older standard called SGML.) XML is doing for databases what HTML does for word processing -- providing a universal format that anyone with any computer can read and write.

Here's an example that shows the basic idea:

Imagine you have a medical record, with this notation:

History

...

Patient reports having had an allergic reaction to penicillin at age 1.
The doctor would want a system to remind him or her of this piece of information on a future visit, if the patient comes in with an ear infection, for example.

One way to do this would use HTML to display the important text in a way that would catch the doctor's attention.

<h1>History</h1>

Patient reports having had an <b>allergic reaction to penicillin</b> at age 1.
The doctor could then pull this up in a web browser:

History

Patient reports having had an allergic reaction to penicillin at age 1.
This is better than no emphasis, but the computer couldn't help the doctor know why the boldface words are important. It would be better if the computer could know the nature of the information. That would help it help the doctor by emphasizing the information that the doctor most needed to notice.

Tags like <h1> and <b> only tell how something should be displayed.  XML allows you to put in tags that tell what something means, like this:

<History>Patient reports having had an allergic reaction
to <allergy>penicillin</allergy> at age 1.</History>
This flags the penicillin allergy in a way that a computer program could recognize. 

With tags like this, the computer could be programmed to alert the doctor when the computer is told that the patient has a condition for which penicillin might be prescribed, or when the doctor enters an order to prescribe penicillin. At the same time, if the doctor did want to look through the entire medical record, to get a whole picture of the patient, it would all be there and all be readable. <Joke>Cluttered with tags, but readable.</Joke>

Suppose another patient has penicillin prescribed.  That could be notated as

<Prescribed>penicillin</Prescribed>
This would make it easy for the computer, which is not as smart as we are at judging meaning from context, to tell that this mention of penicillin refers to a prescription, while the first one refers to an allergy.
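
As a minimal sketch of how a program might use these tags, the Python below parses a small record with the standard library's xml.etree.ElementTree and compares allergies against prescriptions. The RECORD and Orders wrapper elements and the function names tagged_names and find_conflicts are assumptions invented for illustration. Only the allergy and Prescribed tags come from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the RECORD and Orders wrappers are invented for this
# sketch; only the allergy and Prescribed tags come from the text above.
RECORD = """<RECORD>
<History>Patient reports having had an allergic reaction
to <allergy>penicillin</allergy> at age 1.</History>
<Orders><Prescribed>penicillin</Prescribed></Orders>
</RECORD>"""


def tagged_names(root, tag):
    """Collect the lower-cased text of every element with the given tag."""
    return {e.text.strip().lower() for e in root.iter(tag) if e.text and e.text.strip()}


def find_conflicts(xml_text):
    """Return the sorted drug names that are both an allergy and a prescription."""
    root = ET.fromstring(xml_text)
    return sorted(tagged_names(root, "allergy") & tagged_names(root, "Prescribed"))


conflicts = find_conflicts(RECORD)
for drug in conflicts:
    print(f"ALERT: patient is allergic to {drug}")
```

The program never has to guess from context. It only compares the text inside the two kinds of tags.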

There are database programs that doctors can buy for their offices that can take data like this and keep it organized.  The problem with existing database programs is that each has its own proprietary file format.  This makes it hard to communicate with other offices or institutions, unless everyone standardizes on the same database.  Different file formats create major problems when offices or institutions merge. XML offers a way to standardize the data file format.

Element -- A New Jargon Term for XML

In HTML, we call <B> a "tag."  The tag turns on boldface, meaning that text after the <B> is displayed in bold.  The <B> tag has a corresponding </B> tag, which turns off boldface, so that text after the </B> is unbold.  An element is the start tag, the text between, and the end tag.  The next line shows a complete element:
<B>This text is bold.</B>
That's an element.  HTML has a number of tags that don't have a corresponding / tag, so they have no text between a start tag and an end tag.  Common HTML tags of this kind are <BR> and <IMG ...>.  XML uses that kind of stand-alone tag only at the beginning of the XML document.  The body of an XML document is made up entirely of elements.  (An XML element with nothing inside it can be abbreviated as a single tag with a trailing slash, such as <BR/>.)
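
To see the same idea from a program's point of view, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The variable name element is just an illustration. The parser hands back one object for the whole element, with the tag name and the enclosed text as separate pieces.

```python
import xml.etree.ElementTree as ET

# Parse the complete element: start tag, text between, and end tag.
element = ET.fromstring("<B>This text is bold.</B>")

print(element.tag)   # the name shared by the start and end tags
print(element.text)  # the text between the tags
```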

An XML Example

Here's an example, a record of a home health visit.  I'll invent the tags as I go along. Preformatted text shows what is in the XML file. The text in your browser's regular font is my explanation. (For a plain copy of the XML file, without all the comments, see homevisit.txt. For teaching purposes, I'm starting here with what winds up being the second half of homevisit.txt. Later, we'll explain the first half.)

So, here is our XML example:

<HOMEVISIT>
This starts the document.  There will be a corresponding </HOMEVISIT> tag at the end.  These function like the <HTML> and </HTML> tags in an HTML document.  They show the general type of the document, and where it starts and ends.
<PATIENT>
This starts a section about the patient.
<NAME>
    <FIRSTNAME>Mary</FIRSTNAME><LASTNAME>Jones</LASTNAME>
</NAME>
<ADDRESS>
    <STREET>123 Fourth St.</STREET>
    <CITY>West Columbia</CITY><STATE>SC</STATE><ZIP>29001</ZIP>
</ADDRESS>
<PHONE><AREA>803</AREA><NUMBER>555-1515</NUMBER></PHONE>
<NUMBER>135792468</NUMBER>
Each piece of information about the patient is put between tags that indicate what the information means, forming elements of categorized information.  Some of the elements contain elements.  NAME, ADDRESS, and PHONE do that in this example.  Having nested elements lets me use the <NUMBER> tag for the patient number and also for the phone number.  The phone number NUMBER element is inside the PHONE element, so the computer will know it is a phone number.

The list of services rendered might be in another element that contains elements:

<SERVICES>
    <DATE><MONTH>4</MONTH><DAY>20</DAY><YEAR>1998</YEAR></DATE>
    <SERVICE>
        <NAME>Nurse Visit</NAME>
        <TIME>00:30</TIME>
        <PRICE>$60.00</PRICE>
    </SERVICE>
I decided I needed one type of element, which I call SERVICE, for services that you sell by the hour, and another type of element, which I call PRODUCT, for things you sell by the piece.
    <PRODUCT>
        <NAME>Nebulizer Cup</NAME>
        <QUANTITY>1</QUANTITY>
        <PRICE>$1</PRICE>
    </PRODUCT>
    <PRODUCT>
        <NAME>Atropine-Terbutaline Mix</NAME>
        <QUANTITY>6</QUANTITY>
        <PRICE>$12</PRICE>
    </PRODUCT>
</SERVICES>
</PATIENT>
</HOMEVISIT>
That might be a record of a home visit to an asthma patient by a nurse who leaves a couple of days' worth of inhaler medication along with a device to put it in.
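
Here is a rough sketch of how a program could read such a record, using Python's standard xml.etree.ElementTree. The record is abbreviated (ADDRESS and the second PRODUCT are left out), and the helper name dollars is made up for this example. It also assumes that each PRICE is the total for its line, which the record itself does not say.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated copy of the home visit record (ADDRESS and one PRODUCT omitted).
HOMEVISIT = """<HOMEVISIT>
<PATIENT>
<NAME><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>Jones</LASTNAME></NAME>
<PHONE><AREA>803</AREA><NUMBER>555-1515</NUMBER></PHONE>
<NUMBER>135792468</NUMBER>
<SERVICES>
    <DATE><MONTH>4</MONTH><DAY>20</DAY><YEAR>1998</YEAR></DATE>
    <SERVICE><NAME>Nurse Visit</NAME><TIME>00:30</TIME><PRICE>$60.00</PRICE></SERVICE>
    <PRODUCT><NAME>Nebulizer Cup</NAME><QUANTITY>1</QUANTITY><PRICE>$1</PRICE></PRODUCT>
</SERVICES>
</PATIENT>
</HOMEVISIT>"""

patient = ET.fromstring(HOMEVISIT).find("PATIENT")

# The path tells the two NUMBER elements apart.
patient_number = patient.findtext("NUMBER")       # direct child of PATIENT
phone_number = patient.findtext("PHONE/NUMBER")   # NUMBER nested inside PHONE


def dollars(text):
    """Turn a price such as '$60.00' into an exact decimal amount."""
    return Decimal(text.strip().lstrip("$"))


# Assumption: each PRICE is a line total, so the bill is the sum of all PRICEs.
total = sum((dollars(p.text) for p in patient.iter("PRICE")), Decimal("0"))

print("Patient number:", patient_number)
print("Phone number:", phone_number)
print("Total billed:", total)
```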

After the nurse makes a second visit, the file could have another big <PATIENT>...</PATIENT>, with all the stuff in between, after the </PATIENT> tag above and before the final </HOMEVISIT> tag.  Lots of visits could be added that way.

We wind up with a database.  This example is highly simplified, of course.  Even so, you can see how you could keep any kind of data you wanted in this format.  All you need is enough types of elements.

Document Type Definition (DTD)

I just made up those tags and their corresponding elements.  How do I know that the people to whom I give the data will know what they mean?  The answer is that we have to agree in advance on what tags to use.

The XML standard lets an XML document start with a document type declaration, which holds or points to a Document Type Definition, or DTD for short. The purpose of this is to help assure that the tags you used in the document are the ones you had agreed to use. The DTD lists all the elements to be used in the document in a standard way.  Here's the DTD that would go at the start of our document:

<?xml version="1.0"?>
This announces that we have a document that will conform to the XML 1.0 standard.
<!DOCTYPE HOMEVISIT [
    <!ELEMENT HOMEVISIT (PATIENT+)>
        <!ELEMENT PATIENT (NAME,ADDRESS,PHONE,NUMBER,SERVICES)>
            <!ELEMENT NAME (#PCDATA | FIRSTNAME | LASTNAME)*>
                <!ELEMENT FIRSTNAME (#PCDATA)>
                <!ELEMENT LASTNAME (#PCDATA)>
#PCDATA stands for "parsed character data," meaning that the actual data goes here, in plain text.  A DTD can declare each element type only once, and that one declaration applies wherever the element appears.  NAME holds FIRSTNAME and LASTNAME for a patient but plain text for a service or product, so I declare it as "mixed content," which allows either.  The indentation I'm using is something I just made up for clarity.  The computer will ignore it when it reads the data file.
            <!ELEMENT ADDRESS (STREET,CITY,STATE,ZIP)>
                <!ELEMENT STREET (#PCDATA)>
                <!ELEMENT CITY (#PCDATA)>
                <!ELEMENT STATE (#PCDATA)>
                <!ELEMENT ZIP (#PCDATA)>
            <!ELEMENT PHONE (AREA,NUMBER)>
                <!ELEMENT AREA (#PCDATA)>
                <!ELEMENT NUMBER (#PCDATA)>
NUMBER is declared only once, even though it appears both inside PHONE and directly inside PATIENT.
            <!ELEMENT SERVICES (DATE,SERVICE*,PRODUCT*)>
The * means that there can be any number of those items, including none.
                <!ELEMENT DATE (MONTH,DAY,YEAR)>
                    <!ELEMENT MONTH (#PCDATA)>
                    <!ELEMENT DAY (#PCDATA)>
                    <!ELEMENT YEAR (#PCDATA)>
                <!ELEMENT SERVICE (NAME,TIME,PRICE)>
                    <!ELEMENT TIME (#PCDATA)>
                    <!ELEMENT PRICE (#PCDATA)>
                <!ELEMENT PRODUCT (NAME,QUANTITY,PRICE)>
                    <!ELEMENT QUANTITY (#PCDATA)>
]>
Now comes the <HOMEVISIT> tag and the data elements themselves, as before.

The DTD would be useful to the programmers of the hand-held computer that the nurse takes on her rounds.  You could give the computer the DTD at the beginning of the day.  The computer would then know what information to expect and how to store it.  Tomorrow, you could use that same computer in the emergency room.  You would start it with the ER's DTD instead of Home Health's.
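
To give a feel for what "knowing what information to expect" could mean, here is a deliberately simplified Python sketch. It does not read a DTD. The EXPECTED table is a hand-written stand-in for two of the element declarations, and the function name check is made up. Python's standard xml.etree.ElementTree does not validate against a DTD, so real checking would need a validating parser.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for part of the DTD: each element name maps to the
# child elements it is expected to contain, in order. This is NOT a DTD parser.
EXPECTED = {
    "ADDRESS": ["STREET", "CITY", "STATE", "ZIP"],
    "PHONE": ["AREA", "NUMBER"],
}


def check(element, path=""):
    """Return a list of readable problems found in element and its descendants."""
    here = f"{path}/{element.tag}"
    problems = []
    expected = EXPECTED.get(element.tag)
    actual = [child.tag for child in element]
    if expected is not None and actual != expected:
        problems.append(f"{here}: expected {expected}, found {actual}")
    for child in element:
        problems.extend(check(child, here))
    return problems


good = "<PATIENT><PHONE><AREA>803</AREA><NUMBER>555-1515</NUMBER></PHONE></PATIENT>"
bad = "<PATIENT><PHONE><NUMBER>555-1515</NUMBER></PHONE></PATIENT>"  # AREA is missing

good_problems = check(ET.fromstring(good))
bad_problems = check(ET.fromstring(bad))
for problem in bad_problems:
    print(problem)
```

Swapping in a different EXPECTED table is the rough equivalent of starting the hand-held computer with the ER's DTD instead of Home Health's.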

If the DTD is an industry standard, and stored on the web, you can replace your DTD with something like this:

<?xml version="1.0"?>
<!DOCTYPE HOMEHEALTH SYSTEM "http://www.microsoft.com/xml/schemas/healthcare/homehealth">
<HOMEHEALTH>
...
I've imagined that Microsoft has made up a DTD for home health agencies, and that you have chosen to use it.  A computer with an internet connection could download the standard on the fly and set itself up to handle the data accordingly.

What's So Great About XML?

What's great about XML is that you can always read your data.  You don't need the computer or the software that originally created the data.  You don't need a data book to tell you, for example, that each record is so many characters long and that the patient's last name is in spaces 1-20 of each record, etc.  You are never stuck with two computers not being able to talk to each other because they can't understand each other's data format.
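
The contrast can be sketched in a few lines of Python. The fixed-width layout here (last name in columns 1-20, first name in columns 21-35) is a hypothetical one invented for the example. The XML half uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width record: last name in columns 1-20, first name in 21-35.
fixed_record = "Jones".ljust(20) + "Mary".ljust(15)

# Reading it only works if someone has told you the column layout.
last_from_fixed = fixed_record[0:20].rstrip()

# The XML record carries its own labels, so no data book is needed.
xml_record = "<NAME><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>Jones</LASTNAME></NAME>"
last_from_xml = ET.fromstring(xml_record).findtext("LASTNAME")

print(last_from_fixed, last_from_xml)
```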

What you need, though, is for each software manufacturer to figure out how to get its particular program to read in and write out XML.  The manufacturer would not necessarily have to change the database program itself.  All you need is a translator.  This could be a shell or wrapper around existing software.
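
A translator of this kind might look something like the following Python sketch. The row dictionary stands in for one record from a hypothetical in-house database, and the function name patient_to_xml and the keys first, last, and patient_id are invented for illustration. It writes only part of a PATIENT element (no ADDRESS, PHONE, or SERVICES), so treat it as an outline rather than a complete exporter.

```python
import xml.etree.ElementTree as ET


def patient_to_xml(row):
    """Translate one row from a hypothetical in-house database into PATIENT XML."""
    patient = ET.Element("PATIENT")
    name = ET.SubElement(patient, "NAME")
    ET.SubElement(name, "FIRSTNAME").text = row["first"]
    ET.SubElement(name, "LASTNAME").text = row["last"]
    ET.SubElement(patient, "NUMBER").text = row["patient_id"]
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(patient, encoding="unicode")


row = {"first": "Mary", "last": "Jones", "patient_id": "135792468"}
xml_text = patient_to_xml(row)
print(xml_text)
```

The database program itself is untouched. The wrapper only reads what the program already stores and writes it back out with tags.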

What's Not So Great About XML?

The main disadvantage of XML is that it's a wordy way to store data.  All those repeated tags take up space.  As memory, storage, and processor speed fall in price, we can afford to use some inexpensive extra space to gain ease of use and interoperability.

The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by the University of South Carolina.
http://hspm.sph.sc.edu/Courses/J716/CPM/XML.html