“XML or HTML?”
XML is one of the most ubiquitous technologies in the programmatic advertising world. This is because it’s one of the most common means of integrating with other programmatic platforms (the others being JSON and
If you’re working in the programmatic advertising world, it is essential to understand the technologies that make it all possible.
The eXtensible Markup Language (XML) is one of those technologies. Unfortunately, this fairly simple technology is widely misunderstood. The misunderstanding generally stems from two things: the way it looks and its name.
Before going any further, let’s take a look at some XML code:
<?xml version="1.0" encoding="UTF-8"?>
<!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
XML is used all over the internet nowadays. One common use is the sitemap, which is essentially a directory of the different pages on your website. This one is from mobinner.com (generated with www.xml-sitemaps.com).
Now, when you look at this for the first time, you might be forgiven for thinking it was HTML. However, if you actually read the first line closely, rather than simply scanning through it as we are wont to do in modern times, you’d see that it says “xml version.”
Now, this sea of brackets, tags, and indentations does look a lot like an HTML page – and the two do have a lot in common. But, nevertheless, this is an XML document.
The other source of confusion is that it’s called a “language.” When people think of languages in the context of programming, they tend to think of scripts, applications, and so on: things that are executable and perform some action or another.
XML is not like that.
XML is a markup language. Wikipedia describes a markup language as such:
So that is to say, it’s a language used to describe the contents of a document. HTML is used to describe the contents of a web page and tell the browser what to display and how to display it. It doesn’t tell the computer anything about the content of the document itself, just how to display it.
HTML is designed to tell a computer how to present text in a way that is human-readable. Information on fonts, where the different line breaks go, colors, text size, etc. It contains all kinds of information about how the text
XML, on the other hand, is designed to tell a computer how to read the contents of the document. Rather than just being concerned with making sure that the contents are human-readable (which they are), a machine must also be able to parse and understand that data.
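To make “machine-readable” a little more concrete, here is a minimal sketch in Python of a program reading a sitemap like the one above. The sitemap fragment and its example.com URLs are placeholders made up for illustration, and the helper name `list_pages` is hypothetical. Only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap fragment; the URLs are placeholders, not real pages.
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <priority>1.00</priority>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <priority>0.80</priority>
  </url>
</urlset>
"""

# Sitemap elements live in this XML namespace, so lookups must name it.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_pages(xml_bytes):
    """Return a list of (location, priority) pairs found in a sitemap."""
    root = ET.fromstring(xml_bytes)
    pages = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        # Fall back to 0.5 when a <url> entry carries no <priority> element.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        pages.append((loc, priority))
    return pages


for loc, priority in list_pages(SITEMAP.encode("utf-8")):
    print(loc, priority)
```

The tags say what each value is (a location, a priority), not how it should look on screen, which is why a few lines of code are enough to pull the data out.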
So now you understand the divergent goals of HTML and XML. The two set out to do entirely different things, despite looking more or less the same to the casual observer.
Now, it’s important to realize that XML wasn’t the first machine-readable markup language.
SGML – The OG Markup Language
XML itself is based on another, much older markup language called SGML (Standard Generalized Markup Language). Technically XML is a “profile” of SGML rather than a version or derivative.
SGML first started coming together as a language at IBM in the 1970s. Thanks to its utility, it eventually saw relatively widespread use, which led to its adoption as a standard by the ISO (International Organization for Standardization) in 1986.
SGML was considered unnecessarily complicated. Constant annoyance with this complexity led several developers to create a stripped-down version of SGML, which was released as XML 1.0 in 1998.
Since then, XML 1.0 has received a series of small updates, published as revised editions, but its version number hasn’t changed. There is an XML 1.1, but XML 1.0 is still widely used. Given its nature as a machine-readable markup language, it doesn’t need to be updated very often.
Despite the existence of XML 1.1, XML 1.0 remains the most commonly implemented version by far. XML 1.1 is only used in particular circumstances, generally when non-Latin character tags are needed.
That said, developers continue to produce various extensions and integrations for XML. However, unlike HTML, which not so long ago released a paradigm-shifting 5th version, XML has changed little since its adoption.
What’s the point
As hinted at in the introduction, the primary point of XML was to create a machine-readable markup language that was simpler and easier to use than the existing SGML.
These were the three goals of XML: that it should be easy to use, human-readable, and machine-readable. The developers were largely able to achieve these goals on the first go-round. In fact, XML has changed little since its first release. It works so well that users are mostly content to leave the language be.
Despite its age, it has enjoyed massive adoption and continued use across a wide variety of industries. Thanks to its extensible nature, XML-based languages are the norm in many industries that require easy machine-to-machine communication, especially across platforms.
A quick word about Unicode
As a text-based format, XML has to be concerned with how its characters are represented. If you look at the beginning of our example XML document, one of the first things that you’ll notice is that the encoding is declared:
<?xml version="1.0" encoding="UTF-8"?>
In our document, it is UTF-8, which stands for Unicode Transformation Format (8-bit). This is the encoding recommended by the World Wide Web Consortium. As a quick hop over to Wikipedia shows, it has been steadily supplanting the older ASCII standard.
But what is an encoding? An encoding is simply an agreed mapping between characters and the bytes a machine uses to store and transmit them.
Characters such as a, B, -, and # each need a corresponding encoded value so that the computer knows what to display. This standardization allows text to be encoded exactly and then displayed again by any other computer that supports the same encoding.
So this is what’s going on when the encoding is declared at the beginning of the XML document.
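As a rough illustration of what an encoding does, the Python sketch below turns a few characters into their UTF-8 bytes and back. The helper name `utf8_hex` and the extra sample characters (é and €) are chosen here for illustration only.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


# Plain ASCII characters fit in one byte; others need two or more.
for ch in ["a", "B", "-", "#", "é", "€"]:
    print(repr(ch), "->", utf8_hex(ch))

# Decoding with the encoding that was used to write the bytes restores the text.
raw = "é".encode("utf-8")
assert raw.decode("utf-8") == "é"

# Decoding the same bytes with a different encoding (Latin-1 here) garbles it.
garbled = raw.decode("latin-1")
print(garbled)
```

The last step is the reason the declaration matters: the same bytes read with the wrong encoding come out as different characters, so the document states up front which one the reader should use.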
Where it’s used in programmatic
XML plays a key role in programmatic because so much information needs to be transferred from computer to computer, consistently and without error.
The programmatic ecosystem is extremely diverse. It consists of a lot of different moving parts that all need to be able to communicate with one another somehow.
DSPs need to talk to Ad Exchanges, which need to talk to SSPs, which need to talk to publishers. Sometimes DSPs need to talk to other DSPs in order to buy and resell offers. DMPs need to talk to pretty much everyone.
The list goes on.
All of these different systems are unique in their own way, built by different people, and designed to function with certain goals in mind. But despite it all they can only function if they can effectively communicate with one another.
This is where XML comes in (in many cases, anyway), as it provides a very simple, effective, and consistent means of inter-platform communication.
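The idea can be sketched in a few lines of Python: one platform serializes its data to XML, and another parses it back without knowing anything about how the first one was built. The feed layout below (`offers`, `offer`, `name`, `payout`, `currency`) is entirely hypothetical and does not correspond to any real platform’s format.

```python
import xml.etree.ElementTree as ET

# Hypothetical offer data held by the sending platform.
OFFERS = [
    {"id": 1, "name": "Café & Co app install", "payout": 1.5, "currency": "USD"},
    {"id": 2, "name": "Newsletter signup", "payout": 0.25, "currency": "EUR"},
]


def build_feed(offers):
    """Sending side: serialize offers into UTF-8 encoded XML bytes."""
    root = ET.Element("offers")
    for offer in offers:
        node = ET.SubElement(root, "offer", id=str(offer["id"]))
        ET.SubElement(node, "name").text = offer["name"]
        payout = ET.SubElement(node, "payout", currency=offer["currency"])
        payout.text = f'{offer["payout"]:.2f}'
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def read_feed(xml_bytes):
    """Receiving side: parse the XML bytes back into plain dictionaries."""
    root = ET.fromstring(xml_bytes)
    offers = []
    for node in root.findall("offer"):
        payout = node.find("payout")
        offers.append({
            "id": int(node.get("id")),
            "name": node.findtext("name"),
            "payout": float(payout.text),
            "currency": payout.get("currency"),
        })
    return offers


feed = build_feed(OFFERS)
print(feed.decode("utf-8"))
print(read_feed(feed) == OFFERS)
```

Special characters such as “&” and “é” are escaped or encoded by the library on the way out and restored on the way in, which is a large part of what “consistently and without error” means in practice.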
XML has enjoyed relatively broad and rapid acceptance in a wide variety of industries. Despite tweaks and small updates, the original XML 1.0 standard hasn’t changed significantly since its launch.
As we have discussed, XML 1.1 does exist, but its adoption has been limited and it is really only used in very specific cases.
There have been rumblings in recent years about the development of an XML 2.0. But the question is, does anyone really need an XML 2.0? As a tool, XML fulfills its role almost perfectly.
XML was widely adopted almost as soon as it was released. It filled an essential niche regarding the production and parsing of information in a way that was both human- and machine-readable.
XML quickly became – and remains today – one of the most utilized means of storing and transferring machine-readable data in a way that is also human-readable.
This has led to its mass adoption in the tech world broadly, but in the programmatic world in particular.
If you’re working in adtech or in programmatic advertising, whether you know it or not, you’re almost certainly using XML.
XML, or eXtensible Markup Language, is a markup language based on the older SGML. The goal of the language was to create an easy-to-use, machine-readable markup language.
When the original XML 1.0 was released back in 1998, it was quickly adopted. Since then an XML 1.1 has been released for people with certain rare needs. 1.0 remains significantly more common.
In the programmatic advertising industry, the technology is often used to integrate feeds between different platforms and more generally as a means of inter-platform communication.