
According to this thread, and especially this post: https://stackoverflow.com/a/6595973/1125465, Microsoft is showing off, as always. The user agent string can be really, really huge.

I'm working on a little visitors library in PHP, and I want to store user agent information. I cannot decide on the data type and length.

So my question is: do you have any ideas on how to shorten the user agent to some "normal" size (for example, 256 chars)?


Note: Developers use user agents to detect the user's browser and operating system. So, judging by the linked example, all the stupid numbers from M$ are just... there. As always, getting on our nerves. So the idea is to write a function that shortens the user agent string without losing the important information.

I think that such a function should:

  • Not depend on future updates and new browsers (no hardcoded strings).
  • Have a simple mechanism that decides what to delete (for example, a run of number, comma, number, comma, number, ... is not interesting and can be deleted).
  • If, after all these operations, the user agent is still too long (let's say over 256 chars), there is nothing more to do, so just cut off the rest. This happens once in a million, so that data can be lost.

Additional note: I know that I could write a function that extracts the browser and OS type from the user agent and saves only these values. But such functions always have hardcoded names, and if the browser isn't recognized, they return something like "Unrecognized browser". So in the future everyone must remember to update these functions. If we save a shortened user agent instead, the information isn't lost (only the script that reads the database needs a new recognition system), and the entries in the database stay reliable and consistent, as they should be.


UPDATE: Since there should be some code, and my problem is with the idea rather than with existing code, here is the minimal code I have written so far ;) :

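<?php
    function shorten($useragent, $maxsize = 256) {
        $shorten = $useragent;
        // ... ? (the shortening logic is the open question)
        $shorten = substr($shorten, 0, $maxsize); // the "last hope" cut
        return $shorten;
    }
    // The header may be missing, so fall back to an empty string
    echo shorten($_SERVER['HTTP_USER_AGENT'] ?? '');
?>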

 Answers

2

There are no rules for User-Agent strings, so there is no way to create a completely correct and future-proof parser. There is a general pattern though:

User-Agent: <engine-string> <engine-string> ...

Where engine-string has form:

<agent-name> (<comment>; <comment>; ...)

Each engine string (that is just my own name for it, based on my understanding, and may not be the correct term) may or may not have comments.

For example:

Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)
AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e
Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

(This is a single string; I just broke it into lines.) It seems that whenever someone forks a browser engine, they just append their own token to the end. So we have some abstract "Mozilla" browser (a legacy of the "First Browser War") which thinks it's on an iPhone. Then we see that there is a WebKit (which remembers that it was born as KHTML a long time ago). Then there is some Version/6.0 modification, which was then modified into Mobile/10A5376e, which became Safari/8536.25, which finally reveals the secret that it is actually a mobile Google bot.

Another example:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.4;
InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.4506.2152;
.NET CLR 3.5.30729; .NET CLR 1.1.4322)

This is a single engine, but it has much to say in parentheses.

So the general observation is:

  • the last engine strings are the most important,
  • the last comments in parentheses are the least important.

With that in mind, my idea would be to parse the string into these engine and comment tokens, then throw away the comments in each engine section starting from, say, the fifth. Then, if that is still not enough, throw away engine sections starting from the second (the first is often an abstract "Mozilla" but often has useful comments; sometimes it is actually something concrete, especially for web crawlers).

When parsing, we need to take into account that some strings will occasionally not follow this format. They can be saved to a log file for later inspection and then simply cut to the required length to fit in the database.
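The approach above could be sketched in PHP roughly as follows. This is only a sketch under some assumptions: the names UA_SECTION_PATTERN, parse_user_agent(), build_user_agent(), shorten_user_agent() and the $maxComments parameter are made up for illustration, "starting from the second" is read as "remove the second section, then the new second one, and so on", nested parentheses are treated as an unexpected format, and error_log() stands in for the log file.

```php
<?php
// One section: a product token, optionally followed by a parenthesized comment list.
const UA_SECTION_PATTERN = '/([^\s()]+)(?:\s*\(([^()]*)\))?/';

// Split a User-Agent string into ['product' => ..., 'comments' => [...]] sections.
function parse_user_agent(string $ua): array
{
    $sections = [];
    preg_match_all(UA_SECTION_PATTERN, $ua, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $raw = $m[2] ?? '';
        $comments = $raw === '' ? [] : array_map('trim', explode(';', $raw));
        $sections[] = ['product' => $m[1], 'comments' => $comments];
    }
    return $sections;
}

// Turn the sections back into a single string.
function build_user_agent(array $sections): string
{
    $parts = [];
    foreach ($sections as $s) {
        $part = $s['product'];
        if ($s['comments']) {
            $part .= ' (' . implode('; ', $s['comments']) . ')';
        }
        $parts[] = $part;
    }
    return implode(' ', $parts);
}

function shorten_user_agent(string $ua, int $maxSize = 256, int $maxComments = 4): string
{
    if (strlen($ua) <= $maxSize) {
        return $ua; // already short enough, keep it untouched
    }

    // Anything left over after removing all sections means an unexpected format:
    // log it for later inspection and fall back to the plain cut.
    if (trim((string) preg_replace(UA_SECTION_PATTERN, '', $ua)) !== '') {
        error_log('Unparsed User-Agent: ' . $ua);
        return substr($ua, 0, $maxSize);
    }

    $sections = parse_user_agent($ua);

    // Step 1: in every section, keep only the first $maxComments comments.
    foreach ($sections as &$s) {
        $s['comments'] = array_slice($s['comments'], 0, $maxComments);
    }
    unset($s);

    // Step 2: while still too long, drop sections starting from the second.
    while (count($sections) > 1 && strlen(build_user_agent($sections)) > $maxSize) {
        array_splice($sections, 1, 1);
    }

    // The "last hope" cut, in case a single section is still too long.
    return substr(build_user_agent($sections), 0, $maxSize);
}
```

With a limit of 100 characters, the MSIE 8 example above would keep only its first four comments (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0). Note that the result is rebuilt from tokens, so whitespace in a shortened string is normalized.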

Thursday, April 1, 2021
 
SkyNet
 
3

This is what I do to check for that:

// strpos() returns false when the substring is absent
if (strpos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Firefox') === false) { // if not Firefox

  // do something

}

And added into your code:

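function get_user_browser()
{
    // The header may be missing (CLI, some bots), so default to an empty string
    $u_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    // strpos() returns false when the substring is absent
    if (strpos($u_agent, 'Firefox') !== false) {
        return 'firefox';
    }

    return 'other';
}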

if (isset($_GET['print']) && $_GET['print'] != "" 
                          && get_user_browser() == 'firefox') 
{
    $pg = $_GET['print'];
    if (!file_exists('1')) 
    {
        echo '<b>It worked!</b>';
    }
}
else 
{
    echo '';
}
Thursday, April 1, 2021
 
1

You could use a library like cURL to request the page with the iPhone user agent, and return that page to your site (be sure to expand relative URLs to absolute, with DOMDocument).

However, you may run into edge cases where CSS/JavaScript/images are served differently depending on the user agent. It is probably not worth requesting each of these assets twice on the off chance. You could limit the work by requesting each one once with your own user agent and once with the iPhone user agent, running md5_file() on both results and seeing if they differ. I wouldn't bother though :P
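If you did want to try that comparison, a minimal sketch could look like the one below. It hashes the response bodies with md5() instead of md5_file(), because the user agent is set through cURL. The names fetch_with_user_agent() and served_differently() are made up for this example, and the $fetch parameter exists only so the comparison can be exercised without network access.

```php
<?php
// Fetch a URL with the given User-Agent header using the cURL extension.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// True when the two user agents receive different response bodies.
// $fetch defaults to the cURL fetcher above; any callable with the
// same (url, userAgent) signature can be passed instead.
function served_differently(string $url, string $uaA, string $uaB, ?callable $fetch = null): bool
{
    $fetch = $fetch ?? 'fetch_with_user_agent';
    return md5($fetch($url, $uaA)) !== md5($fetch($url, $uaB));
}
```

Keep in mind that dynamic pages can differ between two requests for reasons unrelated to the user agent (timestamps, tokens), so a mismatch is only a hint.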

You could also try this JavaScript...

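// __defineGetter__ is deprecated; Object.defineProperty is the standard equivalent
Object.defineProperty(navigator, 'userAgent', {
    get: function () {
        return 'foo'; // customized user agent
    },
    configurable: true
});

navigator.userAgent; // 'foo'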


Also remember that you may want to show a warning if your users aren't using Safari, which is the closest match for simulating Mobile Safari.

Wednesday, August 4, 2021
 
SubniC
 
3

Looks like there's a pull request for something similar (add "cordova/phonegap" to UAS)
https://github.com/apache/cordova-android/pull/10

Here is the heart of it.

So I would extend DroidGap and override public void init(CordovaWebView webView, CordovaWebViewClient webViewClient, CordovaChromeClient webChromeClient) :

...
WebSettings settings = this.appView.getSettings();
String userAgent = settings.getUserAgentString();
// can append or redefine here
userAgent += " PhoneGap/Cordova";
settings.setUserAgentString(userAgent);
...

Then you can use the extended DroidGap and have control over how you define the User Agent String.

Just confirmed this works, here is the full code using the current Cordova implementation:

package com.focusatwill.androidApp;

import org.apache.cordova.CordovaChromeClient;
import org.apache.cordova.CordovaWebView;
import org.apache.cordova.CordovaWebViewClient;
import org.apache.cordova.DroidGap;
import org.apache.cordova.api.LOG;

import android.util.Log;
import android.view.View;
import android.view.ViewGroup;
import android.webkit.WebSettings;
import android.widget.LinearLayout;


public class DroidGapCustom extends DroidGap {

    /**
     * Initialize web container with web view objects.
     *
     * @param webView
     * @param webViewClient
     * @param webChromeClient
     */
    public void init(CordovaWebView webView, CordovaWebViewClient webViewClient, CordovaChromeClient webChromeClient) {
        LOG.d("EVENT", "Custom DroidGap.init()");

        // Set up web container
        this.appView = webView;

        // Custom addition of user agent string
        WebSettings settings = this.appView.getSettings();
        String userAgent = settings.getUserAgentString();
        // can append or redefine here
        userAgent += " PhoneGap/Cordova";
        settings.setUserAgentString(userAgent);

        this.appView.setId(100);

        this.appView.setWebViewClient(webViewClient);
        this.appView.setWebChromeClient(webChromeClient);
        webViewClient.setWebView(this.appView);
        webChromeClient.setWebView(this.appView);

        this.appView.setLayoutParams(new LinearLayout.LayoutParams(
                ViewGroup.LayoutParams.MATCH_PARENT,
                ViewGroup.LayoutParams.MATCH_PARENT,
                1.0F));

        // Add web view but make it invisible while loading URL
        this.appView.setVisibility(View.INVISIBLE);
        this.root.addView(this.appView);
        setContentView(this.root);

        // Clear cancel flag
        this.cancelLoadUrl = false;
    }

}
Monday, October 11, 2021
 
akes406
 
4

RFC 2616 (HTTP/1.1) says that message header contents consist "of either *TEXT or combinations of token, separators, and quoted-string". If you look at the definitions of TEXT etc., you will find that legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are, as far as I can tell, legal as per the spec.
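As a rough illustration of that reading, the check below accepts any value that contains no bytes in [0, 31] and no byte 127. The function name is_legal_header_text() is made up for this sketch, and it deliberately follows the simplified rule stated above (it does not handle the linear white space that TEXT also permits).

```php
<?php
// True when $value has no control bytes (0-31) and no DEL (127).
// The pattern works on bytes (no /u flag), so bytes >= 128, such as
// the UTF-8 encoding of "â", are accepted.
// The D modifier stops "$" from matching before a trailing newline.
function is_legal_header_text(string $value): bool
{
    return preg_match('/^[^\x00-\x1F\x7F]*$/D', $value) === 1;
}
```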

Monday, November 1, 2021
 