May 03, 2017

What’s next for the Digital Analytics Association?

I’ve been a member of the Digital Analytics Association for, it turns out, about twelve years – over half my professional life. In that time I’ve seen the organization grow and blossom into a vibrant community of professionals who are passionate about the work they do and about helping others to develop their own skills and career in digital analytics.

When the DAA started (as the WAA), web analytics was a decidedly niche activity, not considered as rigorous or demanding as ‘proper’ data mining or database development. Many of its early practitioners, like me, did not come from formal data backgrounds; we were to a large extent making things up as we went along, arguing with one another (often in lobby bars) about things like the proper definition of a page view, or the relative merits of JavaScript tags vs log files.

We didn’t know it at the time, but the niche activity we were helping to define would grow to dominate the entire field of data analytics. Today, transactional (i.e. log-like) and unstructured data comprise the vast majority of data being captured and analyzed worldwide and the analytical principles and techniques that the DAA championed have become the norm, not the exception.

The DAA and its members can justly derive a certain amount of satisfaction from knowing we were part of something so early on, but now that the rest of the world has shown up to the party that we started, how do we continue to differentiate the organization and add value to its members and the industry?

It’s to help answer this and other interesting and challenging questions facing the DAA that I’ve put my name forward for a position on the organization’s board. You can read my nomination (and, hopefully, vote for me) here if you’re a DAA member. After twelve years of benefiting from my DAA membership, it’s time to give something back to the organization.

If I’m elected to the board, I’ll devote my energies to helping DAA members adapt to and embrace the next set of transformations that are taking place within the industry. In my role at Microsoft I’m participating in a very rapid shift from traditional descriptive analytics, based around a recognizable cycle of do/measure/analyze/adjust, to machine learning-based optimization of business processes, particularly digital marketing. Predictive analytics and data science skills are therefore becoming more and more important in digital analytics, while the range of data and scenarios is exploding. This raises tricky questions for the DAA: Which skillsets and data scenarios should the association focus its energies on, and how can it stay relevant as the industry changes so rapidly?

A big part of the answer, I believe, lies with the DAA members ourselves. At a DAA member event in Seattle last week, I met the excellent Scott Fasser of HackerAgency and had a fascinating conversation with him about a current passion of mine, multi-armed bandit experimentation for digital marketing. There are many experienced members of the DAA like Scott, who have deep expertise in different areas of digital analytics, and who are keen to share their knowledge with others. We need to find ways to connect the Scotts of this world to people who can benefit from their expertise, and more broadly connect the DAA’s more experienced members with those newer to the discipline so that they can pass on their hard-won knowledge.

Finally, given that so many new people have moved into the analytics neighborhood, the DAA needs to get out and meet some of the new neighbors rather than peering out through the curtains muttering about hipsters and gentrification. Many new groups of analytics & data science professionals have sprung up over the years, both formal and informal, and there are likely profitable connections to be made with at least some of these organizations, many of which share some of the same members as the DAA.

So if you’d like to see me put my shoulder to the wheel to address these and other challenges, please vote for me by May 12.

March 08, 2012

Returning to the fold

Five years ago, my worldly possessions gathered together in a knotted handkerchief on the end of a stick, I set off from the shire of Web Analytics to seek my fortune among the bright lights of online advertising. I didn’t exactly become Lord Mayor of London, but the move has been a good one for me, especially in the last three years, when I’ve been learning all sorts of interesting things about how to measure and analyze the monetization of Microsoft’s online properties like MSN and Bing through advertising.

Now, however, the great wheel of fate turns again, and I find myself returning to the web analytics fold, with a new role within Microsoft’s Online Services Division focusing on consumer behavior analytics for Bing and MSN (we tend to call this work “Business and Customer Intelligence”, or BICI for short). Coincidentally I was able to mark this move this week with my first visit to an eMetrics conference in almost three years.

I was at eMetrics to present a kind of potted summary of some of what I’ve learned in the last three years about the challenges of providing data and analysis around display ad monetization. To my regular blog readers, that should come as no surprise, because that’s also the subject of my “Building the Perfect Display Ad Performance Dashboard” series on this blog, and indeed, the presentation lifted some of the concepts and material from the posts I’ve written so far. It also forced me to continue with the material, so I shall be posting more installments on the topic in the near future (I promise). In the meantime, however, you can view the presentation here via the magic of SlideShare:

The most interesting thing I discovered at eMetrics was that the industry has changed hugely while I’ve been away (well, duh). Not so much in terms of the technology, but more in terms of the dialog and how people within the field think of themselves. This was exemplified by the Web Analytics Association’s decision to change its name to the Digital Analytics Association (we shall draw a veil over my pooh-poohing of the idea of a name change in 2010, though it turns out I was on the money with my suggestion that the association look at the word “Digital”). But it was also highlighted by the fact that there was very little representation at the conference by the major technology vendors (with the exception of WebTrends), and that the topic of vendor selection, for so long a staple of eMetrics summits, was largely absent from the discussion. It seems the industry has moved from its technology phase to its practitioner phase – a sign of maturity.

Overall I was left with the impression that the Web Analytics industry, such as it is, increasingly sees itself as a part of a broader church of analysis and “big data” which spans the web, mobile, apps, marketing, operations, e-commerce and advertising. Which is fine by me, since that’s how I see myself. So it feels like a good time to be reacquainting myself with Jim and his merry band of data-heads.

February 07, 2012

Big (Hairy) Data


My eye was caught the other day by a question posed to the “Big Data, Low Latency” group on LinkedIn. The question was as follows:

“I've customer looking for low latency data injection to hadoop . Customer wants to inject 1million records per/sec. Can someone guide me which tools or technology can be used for this kind of data injection to hadoop.”

The question itself is interesting, given its assumption that Hadoop is part of the answer – Hadoop really is the new black in data storage & management these days – but the answers were even more interesting. Among the eleven or so people who responded to the question, there was almost no consensus. No single product (or even shortlist of products) emerged, but more importantly, the actual interpretation of the question (or what the question was getting at) differed widely, spinning off a moderately impassioned debate about the true meaning of “latency”, the merits of solid-state storage vs HD storage, and whether to clean/dedupe the data at load time, or once the data is in Hadoop.

I wouldn’t class myself as a Hadoop expert (I’m more of a Cosmos guy), much less a data storage architect, so I may be unfairly mischaracterizing the discussion, but the message that jumped out of the thread at me was this: This Big Data stuff really is not mature yet.

I was very much put in mind of the early days of the Web Analytics industry, where so many aspects of the industry and the way customers interacted with it had yet to mature. Not only was there still a plethora of widely differing solutions available, with heated debates about tags vs logs, hosted vs on-premise, and flexible-vs-affordable, but customers themselves didn’t even know how to articulate their needs. Much of the time I spent with customers at WebAbacus in those days was taken up by translating the customer’s requirements (which often had been ghost-written by another vendor who took a radically different approach to web analytics) into terms that we could respond to.

This question thread felt a lot like that – there didn’t seem to be a very mature common language or frame of reference which united the asker of the question and the various folk that answered it. As I read the answers, I found myself feeling mightily sorry for the question-poser, because she now has a list as long as her arm of vendors and technologies to investigate, each of which approaches the problem in a different way, so it’ll be hard going to choose a winner.

If this sounds like a grumble, it’s really not – the opposite, in fact. It’s very exciting to be involved in another industry that is forming before my very eyes. Buy most seasoned Web Analytics professionals enough drinks and they’ll admit to you that the industry was actually a bit more interesting before it was carved up between Omniture and Google (yes, I know there are other players still – as Craig Ferguson would say, I look forward to your letters). So I’m going to enjoy the childhood and adolescence of Big Data while I can.

November 21, 2011

Should Wikipedia accept advertising?

It’s that time of year again. The nights are drawing in, snow is starting to fall in the mountains, our minds turn to thoughts of turkey and Christmas pudding, and familiar faces appear: Santa, Len and Bruno, and of course, Jimmy Wales.

If you are a user of Wikipedia (which, if you’re a user of the Internet, you almost certainly are), you’ll likely be familiar with Jimmy Wales, the founder of Wikipedia and head of the Wikimedia Foundation, the non-profit which runs the site. Each year Jimmy personally fronts a campaign to raise funds to cover the cost of running Wikipedia, which this year will amount to around $29m.

The most visible part of this campaign is the giant banner featuring Jimmy Wales’s face which appears at the top of every Wikipedia article at this time of year. This year the banner has caused some hilarity as the position of the picture of Jimmy just above the article title has provided endless comic potential (as above), but every year it becomes increasingly wearisome to have Jimmy’s mug staring out at you for around three months. Would it not be easier for all concerned if Wikipedia just carried some advertising?

Jimmy has gone on record as saying that he doesn’t believe that Wikipedia should be funded by advertising, and I understand his position. To parse/interpret his concerns, I believe he’s worried about the following:

  • Accepting advertising would compromise Wikipedia’s editorial independence from commercial interests
  • Ads would interfere with the user experience of Wikipedia and be intrusive
  • Wikipedia contributors would not want to contribute for free to Wikipedia if they knew it was accepting advertising

I’m biased, of course, since I work for Microsoft Advertising, but I believe that each of these concerns is manageable. Let’s take them one by one:

Concern 1: Ads would compromise Wikipedia’s independence

There are plenty of historical examples where a publication has been put in a difficult position when deciding what to publish because of relationships with large advertisers. Wikipedia certainly doesn’t want, for example, Nike complaining about the content of its Wikipedia entry. And the idea of Wikipedia starting to employ sales reps to hawk its inventory is a decidedly unedifying one.

But Wikipedia does not have to engage in direct sales, or even non-blind selling, to reach its financial goals with advertising. The site could make its inventory available on a blind ad network (or ideally multiple networks) so that it would be impossible for an advertiser to specifically buy ad space on Wikipedia. If an advertiser didn’t like their ads appearing on Wikipedia, most networks offer a site-specific opt out, but the overall impact of this to Wikipedia would be minimal – Wikipedia carries such a vast range of content that it has the most highly diversified content portfolio in the world – no single advertiser could exert any real leverage over it.

Concern 2: Ads would make Wikipedia suck

As has been noted elsewhere, there are plenty of horrible ads at large on the Internet – intrusive pop-ups, or ugly creative. It would certainly be a valid concern that Wikipedia would suddenly become loaded with distracting commercial messages. But according to the back-of-an-envelope calculations I’ve done, there is no need for Wikipedia to saturate itself with ads in order to pay the bills.

According to the excellent site, Wikipedia served almost exactly 15bn page views world-wide in October 2011 (around half of which were in English). Assuming no growth in that figure over 12 months, that’s around 180bn PVs per year. So to meet its funding requirements, Wikipedia would need to generate a $0.16 eCPM on those page views (assuming just one ad unit per page). That’s a pretty modest rate, especially on a site with as much rich content as Wikipedia. It would give the site a number of options in terms of ad placement strategy, such as:

  • Place a very low-impact, small text ad on every page
  • Place a somewhat larger/more impactful ad on a percentage of pages on a rotation, and leave other pages ad free
  • Place ads on certain types of pages, leaving others always ad free (such as pages about people or companies, or pages in a particular language/geo)
  • Deploy a mix of units across different types of page, or in rotation
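As a sanity check on that $0.16 figure, the arithmetic can be sketched in a few lines of Python (the budget and traffic numbers are the rough figures quoted above, not audited ones):

```python
# Rough cost and traffic figures quoted in the post (approximate).
annual_budget = 29_000_000            # ~$29m to run Wikipedia for a year
monthly_page_views = 15_000_000_000   # ~15bn page views in October 2011

# Assume flat traffic over twelve months: ~180bn PVs/year.
annual_page_views = monthly_page_views * 12

# eCPM is revenue per thousand impressions, assuming one ad unit per page.
required_ecpm = annual_budget / (annual_page_views / 1000)
print(f"Required eCPM: ${required_ecpm:.2f}")  # Required eCPM: $0.16
```

At a typical $0.30 contextual text-ad eCPM, that leaves roughly 2x headroom, which is what makes the low-impact placement options above plausible.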

This also assumes that Wikimedia needs to raise all its funds every year from advertising, which it may not need to – though once the site accepted advertising, it would definitely become more difficult (though perhaps not impossible) to raise donations.

To preserve the user experience, I would definitely recommend just running text ads, which could be placed relatively unobtrusively. Sites running text-based contextual ads (such as those from Google AdSense or Microsoft adCenter) can usually expect to get at least around $0.30 eCPM, so there would be some headroom.

I would also recommend that Wikipedia not run targeted ads – or at least, only work with networks that do not sell user data to third parties. It could cause significant backlash if users came to feel that Wikipedia was effectively selling data about their browsing habits to advertisers for a fast buck.

Concern 3: Ads would make contributors flee

I can speak to this concern less authoritatively, since I am not that familiar with the world of Wikipedia contribution, but so long as Wikimedia made it clear that it was remaining a non-profit organization, and continued to operate in a thrifty fashion to cover its costs, the initial outrage of Wikipedia contributors could be managed. After all, plenty of other open-source projects that rely on unpaid contributors do provide the foundations for commercial activities, Linux being the best example.

In any case, in its deliberations about balancing the needs of its contributors with its need to pay the bills, Wikimedia will need to face some hard questions: Will it always be able to cover its costs through donations? Does the current level of investment in infrastructure represent an acceptable level of risk for a site that serves so many users? Is it acceptable to rely on unpaid contributors indefinitely? If Wikipedia ran out of cash or went down altogether, the righteous indignation of its contributors may not count for very much.

Apart from advertising and donations, the only other way that Wikipedia could pay the bills would be by creating paid-for services – for example, a research service. But would the unpaid Wikipedia contributors really be happier with this outcome than with advertising? It would effectively amount to selling the content that they’d authored for free. At least with advertising, it’s the user that is the product, not the content. So long as Wikipedia can maintain editorial independence and retain a good user experience, advertising feels like the better option to me.

January 12, 2010

What’s another word for “Web Analytics”?

For as long as I’ve been involved with the field, the term “Web Analytics” has never felt like the very best way to describe, well, Web Analytics – it’s somewhat limiting in many ways (the “Web” part doesn’t help there), and the “Analytics” bit does seem a bit, well, geeky. But another, better, term has never emerged – Web Measurement, Web Stats, Clickstream Analysis and a blizzard of others all have their own limitations (and don’t even get me started on the egregious “eMetrics” – sorry, Jim).

Now it seems that folks at the Web Analytics association are finding the term too limiting too, at least in terms of what they consider their remit, as they’ve launched a survey to poll people’s views about whether they should change their organization’s name to something else, and, if so, what they should change it to. You can share your own thoughts here.

I can sympathize with the Association’s motivation here, but I’m not very thrilled about the way it seems their thinking is leaning. The survey contains a set of possible alternative names, most of which are permutations that include the word “Marketing” (such as the “Digital Marketing Association” – has no one at the WAA heard of the DMA?). I think putting “Marketing” in the title is a mistake, since measuring online marketing effectiveness is only one application of web analytics. Worse, I think the M-word risks making the WAA sound like another wishy-washy Marketing industry organization (AMA, BMA, CMA, DMA, EMA, FMA, GMA anyone?)

My inclination would probably be for the WAA not to change its name – it’s unlikely that any new name would be so significantly better that it would overcome the drop in name recognition that would come with a name change. But if it really wants to change, my guidance would be to consider names which talk more about Digital Media rather than Digital (or Online, or e-)Marketing. Sure, there are lots of Digital Media this-and-thats, but the term is a broader church IMHO, and I think that will help the WAA to continue to serve a diverse audience in the future.

December 14, 2009

I love it when a mail-merge comes together…

Would you buy BI services from a company that can’t successfully execute a mail-merge? Not to mention ones that send unsolicited e-mails to drum up business…


September 16, 2009

Adobe + Omniture = …what?

By now, almost 12 hours after the announcement, you’ll have heard the news that Adobe is to buy Omniture for $1.8bn. If you haven’t heard, then, I mean, duh. It’s all over Twitter, dude:


(As an aside, the guys at Omniture should be proud of themselves that they managed to beat out Joe Wilson as a trending topic for a little while, even as the latter was busy facing down Congress).

I don’t think I’m putting myself in the minority when I say that I was totally blind-sided by this announcement. And while I’ve had time to think about it since my first reaction, I’m still a bit mystified by this acquisition.

The official line from the press release is that Omniture’s products will help Adobe’s customers optimize, track and monetize their websites & apps. Unofficially, the rationale for the deal seems to be that Adobe needs Omniture’s revenue to supplement its declining income from its range of software. I can see the logic of the official rationale, but I have serious reservations about Adobe’s ability to extract value from this deal, for the following reasons:

No pedigree in services: Adobe is primarily a software company; whilst it offers a full range of support services around its products, it doesn’t really have experience in providing the very deep, consultancy-like services that Omniture provides. This means that it’ll likely be challenging to attach Omniture offerings to Adobe’s customers; the opposite may be more likely to be true, but does Omniture bring enough customers to make this worthwhile?

No online scale: I’ve said before that one of Omniture’s key challenges as it strives for profitability is to scale out its infrastructure on a cost-effective basis. Adobe does offer a range of online services, but not on any kind of scale that could enable it to really drive cost out of the provision of Omniture’s services. So it’s unlikely that Omniture’s bottom line will improve in the wake of this deal.

Channel/partner conflict: The presence of the Omniture toolset in Adobe’s product lineup will complicate Adobe’s efforts to work with other agencies, EMM and web analytics tool providers, who in turn may find themselves more reluctant to encourage their clients to embrace Adobe technology for fear that it may lead to Omniture making calls on them.

Overall, I just find myself wondering whether Adobe really needed to do this deal in order to be able to leverage Omniture’s capabilities. Adobe has to be looking at some kind of synergy effect to extract value from the deal, because Omniture’s financials aren’t strong enough on their own to move the needle on Adobe’s bottom line. Would a strategic partnership not have been a simpler (and undoubtedly cheaper) option? One possible answer that presents itself is that Adobe had its hand forced by an imminent sale of Omniture to another party. What do you think?


This is one of those posts where I perhaps need to remind you that this is a personal blog which does not reflect the opinions of my employer, Microsoft. Furthermore, you shouldn’t infer that anything I’ve written above implies any foreknowledge or special knowledge of this deal, especially in the context of Microsoft. That is all. diggDigg RedditReddit StumbleUponStumbleUpon

June 30, 2009

My face, on the Internet

I have just noticed (rather belatedly, to say the least) that Laura Lee Dooley has posted a complete video of my encounter with Avinash Kaushik at the May E-metrics Summit in San Jose on Vimeo. The sound quality is a little poor, but you can more or less follow the thread of the conversation.

I come across as a cross between Prince Charles, Alastair Campbell and my Dad. Avinash does rather better, particularly around the 26 minute mark. Anyway, watch it for yourself and see who comes out on top.

January 21, 2009

Omniture stumbles

Chatter is building on the interwebs about Omniture’s recent (and ongoing) latency woes. Looks like both SiteCatalyst and Discover are days behind in processing data (according to messages on Twitter, up to around 5 – 7 days in some cases). And it looks like the situation is still getting worse, rather than better.

I have no insight into the cause of Omniture’s difficulties, or how widespread they are. It may be that they’re related to the December release of SiteCatalyst 14.3, which seems to contain a number of new features which are fairly broad in scope, and which may have had an impact on the platform’s ETL stability. Behind the scenes, Omniture may have made some changes to start integrating HBX’s feature set (especially its Active Segmentation) into SiteCatalyst as a prelude to a final migration push for the remaining HBX customers. Omniture’s certainly not saying – they’ve been conspicuously silent since the start of these problems.

Whatever the cause, I can certainly empathize with this kind of situation – we had all sorts of difficulty dealing with latency issues in my WebAbacus days. And we can be confident that Omniture will (eventually) fix these problems, and will probably not lose very many customers as a result (though, in the teeth of a recession, it can’t be great for attracting new customers).

But do these problems tells us something more about Omniture’s (or any other web analytics company’s) ability to run a viable business? Infrastructure costs are a big part of a web analytics firm’s cost base (at least, those with a hosted offering, which is all of them). And unfortunately, these costs don’t really scale linearly with the charging method that most Enterprise vendors use – charging by page views captured. Factors like the amount a tool is used, and the complexity of the reports that are being called upon, have a big impact on the load placed on a web analytics system, and the resulting infrastructure cost. It’s tricky for a vendor to recoup this cost without seeming avaricious.

As Omniture’s business grows, it has a constant need to invest in its infrastructure to keep pace with the demand for its services. But as the economy has worsened, it must be terribly tempting to see if a little more juice can be squeezed out of the existing kit, especially with its 2008 earnings due later this month. This will be as true for any other vendor (such as Webtrends or Coremetrics) as it is for Omniture, and these remarks shouldn’t be seen as a pop at our friends in Orem. But the nub is, can Enterprise web analytics pay the bills for its own infrastructure cost? Or will all web analytics ultimately need to be subsidized by something else (such as, oh, I don’t know, advertising)?

Your thoughts, please.

November 21, 2008

Brandt Dainow gets over-excited again

After his breathless article last year, proclaiming Google Analytics to be something like a cross between the second coming and Barack Obama, Brandt Dainow seems to have soured on the big G, proclaiming this week that GA contains ‘disturbing inaccuracies’:

Google Analytics is different from other products in that it has been intentionally designed by Google to be inaccurate over and above the normal inaccuracies that are inevitable. These inaccuracies are so glaring that most people are getting a very false picture of what is happening in their sites.

Dainow’s main beef with GA is two-fold:

  • It treats single-page visits as valid visits (i.e. it doesn’t remove them from visit counts or other related measures)
  • It includes single-page visits in average visit duration calculations

He also remarks that Google did in fact change the way that GA calculated average visit duration last year, but then changed the calculation back in the face of user pressure:

Google intentionally rolled Google Analytics back so that it produced an incorrect average duration…It's been that way ever since -- Google is intentionally and knowingly providing inaccurate numbers because a few people preferred neatness to truth.

Brandt then proposes two alternative measures - ‘retained visits’ (the count of visits with more than one page impression) and ‘true average duration’ (the average duration of retained visits). These metrics are not without some merit – it’s useful to know how many visits contained more than one page view, and the average duration of these visits. But Brandt goes on to assert that these two metrics should replace the standard measurements of visits and average duration in GA and (presumably) other tools. This suggestion is ridiculous, for the following reasons:

  • Contrary to Brandt’s assertions, there are a host of scenarios where a single-page visit is a perfectly valid visit, including, for example, this blog, for crying out loud, which has a high proportion of single-page visits because readers either just read the homepage and leave, or click through to an article from their RSS reader. So chucking all these kinds of visits out is crazy.
  • Whilst the inaccuracy of including single-page visits in average visit duration calculations is known to be a problem, removing these visits from the calculation doesn’t yield a magically ‘accurate’ number, it just yields one that is inaccurate in a different way. You still have no idea how long people looked at the final page of their visit for, and with a two-page visit this can introduce a huge potential inaccuracy.
  • Such standard metrics as exist in the web analytics industry are the result of long and arduous wrangling. There are no sacred cows, but you need a really good reason to exchange a simple and easy-to-understand metric for one which is more complex and offers no discernible benefit.
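To make the two proposals concrete, here is a small illustrative sketch in Python (my own toy numbers; nothing here comes from Google Analytics or from Brandt’s article):

```python
# Toy visit data: (page_views, duration_seconds). A single-page visit has
# no second timestamp, so its measurable duration is zero.
visits = [(1, 0), (1, 0), (3, 120), (2, 45), (5, 300)]

# Standard metrics: every visit counts, single-page ones included.
total_visits = len(visits)
avg_duration = sum(d for _, d in visits) / total_visits

# Brandt's proposals: discard single-page visits entirely.
retained = [(p, d) for p, d in visits if p > 1]
retained_visits = len(retained)
true_avg_duration = sum(d for _, d in retained) / retained_visits

print(total_visits, avg_duration)          # 5 93.0
print(retained_visits, true_avg_duration)  # 3 155.0
```

Note that even ‘true average duration’ still omits the time spent on the last page of each visit, which is the second point above: the filtered number is inaccurate in a different way, not magically accurate.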

Whilst I can understand Brandt’s motivations for posting these ideas (which, I imagine, lie somewhere on a spectrum between a genuine desire to spark debate and a desire to generate a lot of traffic to his blog, in which regard I am obliging him), his remarks do irk me a bit (can you tell?), principally because he commits the unpardonable sin of absolutism when talking about web analytics, bandying about words like “truth” and “wrong” when really he is just presenting his own preferences.

When, as an industry, we can’t even agree what constitutes a visit, it’s pretty rich to start decrying one tool or another as ‘inaccurate’ simply because it takes an approach to data that you don’t believe in. And besides, as Brandt surely knows, Google Analytics now has the capability (via its custom segmentation) to calculate the metrics he seeks.

Finally, as every half-experienced web practitioner (of whom Brandt seems to have a low opinion also) knows, the key to success in web analytics is to pick your metrics, stick to them, and measure them continuously as you make changes to your site and your marketing, to see what is working. If you’re looking to increase engagement, and have decided that visit duration is a good measure of this (a debatable point, as it happens), then it doesn’t matter whether you include single-page visits in your duration calculation – if your visit durations are going up, you’re happy. And if your visit durations suddenly jump because your web analytics vendor has changed the way they calculate the metric, this could in fact cause more pain than benefit, perhaps causing you to go to said vendor and say, “Oi! Change it back to how it was!”.

So feel free to read the article, but be warned: it’s not very accurate.

