Every browser sends its user-agent string to each website it connects to. This post looks at what exactly that string is. In simple terms, a user-agent is a single line of text that identifies the browser and operating system to the web server. In recent times, though, user-agents have become a mess.
When a browser connects to a website, it includes a User-Agent field in the HTTP header. The contents of that field vary from browser to browser; each browser has its own distinctive user-agent. Essentially, the user-agent is the browser's way of saying "Hi, I’m Google Chrome on Windows" or "Hi, I’m Opera Mini on an iPhone" to the web server.
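To see where the field sits, here is a sketch of what the start of such a request might look like on the wire (the hostname is a placeholder and the user-agent shown is one example Chrome string):

```
GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5
```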
The web server can use this information to serve different pages to different operating systems and browsers. For instance, a website might send mobile pages to mobile browsers, modern pages to modern browsers, and a please-upgrade-your-browser message to users of an old browser such as Internet Explorer.
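A minimal sketch of that kind of server-side user-agent sniffing in Python; the function name and the particular substrings checked are illustrative, not any real server's logic:

```python
def classify_user_agent(ua: str) -> str:
    """Crude user-agent sniffing: pick a page variant by substring.

    Real servers work on the same basic idea -- looking for telltale
    substrings -- though usually via a dedicated library.
    """
    if "Mobile" in ua or "Opera Mini" in ua:
        return "mobile"
    if "MSIE" in ua:  # old Internet Explorer identifies itself with "MSIE"
        return "upgrade-notice"
    return "desktop"

print(classify_user_agent(
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"))
# -> upgrade-notice
```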
For example, let’s look at Firefox’s user-agent on Windows 7: "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0". This tells the web server quite a lot: the operating system is Windows 7 (code-named Windows NT 6.1), the browser is running on a 64-bit version of Windows (WOW64), and the browser is Firefox 12.
Now let’s look at the user-agent of Internet Explorer 9: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)". This string identifies the browser as IE 9 with the Trident 5 rendering engine. But you might spot something confusing: IE also identifies itself as Mozilla. We will come back to that below.
First, though, let’s look at the user-agent of Chrome: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5". Notice that Chrome pretends to be both Mozilla and Safari. To understand why, we need to examine the history of browsers and their user-agents.
One of the first browsers was Mosaic, whose user-agent string was "NCSA Mosaic/2.0". Mozilla came along later, identifying itself as "Mozilla/1.0". Mozilla was a much more advanced browser than Mosaic; in particular, it supported frames. Web servers began checking whether the user-agent contained the word Mozilla and, if so, sending pages containing frames; other browsers got the old pages without frames.
Eventually Microsoft Internet Explorer came along, and it supported frames as well. But IE did not receive web pages with frames, because web servers sent those only to Mozilla browsers. To fix this, Microsoft added the word Mozilla to IE's user-agent, along with additional information: a reference to IE and the word "compatible".
Web servers found the word Mozilla and sent Internet Explorer the modern pages as well. Browsers that came out later did the same thing Internet Explorer had done and added Mozilla to their user-agents.
After this, some servers began looking for the word Gecko, Firefox's rendering engine, and served Gecko browsers different pages than older browsers. KHTML, originally developed for Konqueror on Linux's KDE desktop, added the phrase "like Gecko" so it would receive the modern pages designed for Gecko. When WebKit was later developed from KHTML, the word WebKit was added, while the original "KHTML, like Gecko" line was kept for compatibility. In this way, most browser developers kept adding words to their user-agents as time went by.
The truth is that web servers do not really care about the exact user-agent string; they only check whether it contains a specific word.
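You can see the result of all this history by checking a single Chrome string (the Chrome 19 example used earlier) for the inherited tokens:

```python
ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 "
      "(KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5")

# One Chrome string satisfies servers sniffing for any of these words.
for word in ("Mozilla", "AppleWebKit", "like Gecko", "Chrome", "Safari"):
    print(word, word in ua)
```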
User-agents are even used by web-crawling bots. For instance, Google's web crawler identifies itself as "Googlebot/2.1 (+http://www.google.com/bot.html)". Web servers can give bots special treatment, for example letting them pass through mandatory registration screens. This means it is sometimes possible to bypass those registration screens simply by setting your user-agent to Googlebot's.
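With Python's standard urllib, for instance, you can attach any user-agent string to a request; the URL below is a placeholder, whether the server honors the string is up to the server, and impersonating a crawler may violate a site's terms of service:

```python
import urllib.request

# Build a request that claims to be Google's crawler.
req = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
)

# The header is stored on the request object before anything is sent.
print(req.get_header("User-agent"))
```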
Web servers can also give orders to specific bots through a robots.txt file. For instance, a server may tell one bot to simply go away, or tell another bot to index only specific areas of the website. Bots are identified in the robots.txt file by their user-agent strings.
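A robots.txt using this mechanism might look like the following sketch; the bot name "BadBot" and the paths are illustrative:

```
# Turn one crawler away entirely
User-agent: BadBot
Disallow: /

# Let Googlebot index everything except a private area
User-agent: Googlebot
Disallow: /private/
```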
Just about every major browser offers a way to set a custom user-agent, so you can see what web servers send to other browsers. For instance, set your desktop browser to a mobile browser's user-agent string and you will see the mobile versions of web pages on your desktop.