November 21, 2017

Does your marketing need a customer graph?

The relentless rise of social networks in recent years has made many marketers familiar with the concept of the social graph—data about how people are connected to one another—and its power in a marketing context.

Facebook’s social graph has propelled it to a projected annual revenue of around $40B for 2017, driven primarily by advertising sales. Advertisers are prepared to pay a premium for the advanced targeting capabilities that the graph enables, especially when combined with their own customer data; these capabilities will enable Facebook to snag over 20% of digital ad spend in the US this year.

Partly as a result of this, many marketers are thinking about how they can exploit the connectedness of their own customer base, beyond simple “refer a friend” campaigns. Additionally, it’s very common to hear marketing services outfits tack the term graph onto any discussion of user or customer data, leading one to conclude that any marketing organization worth its salt simply must have a graph database.

But what is a graph, and how is it different from a plain old customer database? And if you don’t have a customer graph in your organization, should you get one?

What is a graph database, and why should I care?

A graph database is a database that stores two things:

  • A set of entities (such as people, servers, movies, or types of cheese)
  • The relationships between the entities (such as friendships, memberships, ownership, or authorship)

In formal graph theory parlance, the entities are referred to as vertices and the relationships as edges. Whether someone uses one set of terms or the other is a good indication of whether they’re trying to impress you with their knowledge of graph theory (we’ll stick to the friendly terms above).

In a graph database, the relationships are first-class objects alongside the entities, with their own taxonomy and attributes, and it is this that makes graph databases unique. There are lots of types of database that can store information about things (the most common being our old friend, the relational database). These databases can be pressed into service to capture relationships indirectly (in relational systems, through primary/foreign key pairs), but graph databases provide a much more explicit and simple way of capturing and, more importantly, retrieving relationship information.

The diagram below provides a simple example of a graph—in this case, one that captures Twitter-style ‘follow’ relationships between people. The circles in the diagram are people, and the lines are the relationships. Note that each relationship has a direction.


The graph database makes it easy to get the answers to the following kinds of question:

  • Who does Andrew follow? (Barbara and Cynthia)
  • Who are Barbara’s followers? (Andrew and Cynthia)
  • Who do the people who Andrew follows follow? (Barbara and Donald)
  • Who are Donald’s followers’ followers? (Andrew and Cynthia)

You can see how this kind of data is essential in providing a service like Twitter: When Andrew logs in, he wants to see the tweets from the people he follows, while Barbara can see how many followers she has. But the graph also makes it possible to make a suggestion to Andrew: Because Andrew follows Barbara and Cynthia, and they both follow Donald, Twitter can suggest that Andrew also follow Donald. This ability to explore indirect relationships is a key value of graph data.
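To make this concrete, here is a minimal sketch of the follow graph in plain Python. The names and ‘follow’ edges are taken from the example above; a real graph database would handle this at vastly larger scale, but the queries have the same shape:

```python
# The example follow graph: each person maps to the set of people they follow.
follows = {
    "Andrew": {"Barbara", "Cynthia"},
    "Barbara": {"Donald"},
    "Cynthia": {"Barbara", "Donald"},
    "Donald": set(),
}

def followers(person):
    """Everyone who follows `person` (i.e. traverse the edges in reverse)."""
    return {p for p, followed in follows.items() if person in followed}

def suggestions(person):
    """People followed by the people `person` follows, minus existing follows."""
    second_degree = set()
    for followed in follows[person]:
        second_degree |= follows[followed]
    return second_degree - follows[person] - {person}
```

With this structure, `followers("Barbara")` returns Andrew and Cynthia, and `suggestions("Andrew")` returns Donald – exactly the ‘you may also want to follow’ logic described above.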

Because a lot of marketing is about people and relationships, there are several ways in which graph data, particularly user graph data, can be useful for marketers, including:

  • Recommendation services
  • User identity matching
  • Community detection (tribes)
  • Commercial account management

In the rest of this post, we’ll take a look at these scenarios in a little more detail, and then look at some things to bear in mind if you’re thinking about investing in graph database technology.

Graph-based recommendation services

One of the best (and most well-known) applications for graph data is a recommendation engine. By capturing the relationships between users and the products they have purchased in a graph, companies like Amazon are able to recommend products based on the purchase behavior of others.


The graph above is more complex than the first example because it captures two kinds of entity (customer, in blue, and product, in green) and two kinds of relationship (purchase and ‘like’ – analogous to a 5-star product rating in this example).

Based on the graph, we can see that Andrew has purchased (and liked) a widget. Barbara and Cynthia have also both purchased a widget, and additionally have each purchased both a sprocket and a doodad. So a simple recommender system could use this information to suggest to Andrew that he should buy one of these products.

However, with the extra level of detail of the ‘like’ relationship, we can see that Barbara and Cynthia both like the doodad – and that Cynthia additionally likes the widget she bought. So it is possible to make a more relevant recommendation – for the doodad. This approach is known as collaborative filtering.
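Here is a rough Python sketch of that collaborative-filtering logic. The purchases and likes mirror the example graph; the scoring scheme (one point when a similar customer bought a product, an extra point when they also liked it) is just one simple choice of weighting:

```python
# Purchases and likes from the example graph.
purchased = {
    "Andrew": {"widget"},
    "Barbara": {"widget", "sprocket", "doodad"},
    "Cynthia": {"widget", "sprocket", "doodad"},
}
liked = {
    "Andrew": {"widget"},
    "Barbara": {"doodad"},
    "Cynthia": {"doodad", "widget"},
}

def recommend(user):
    """Rank products bought by customers whose purchases overlap with `user`'s."""
    scores = {}
    for other, products in purchased.items():
        if other == user or not (products & purchased[user]):
            continue  # only consider customers who share a purchase with `user`
        for product in products - purchased[user]:
            scores[product] = scores.get(product, 0) + 1  # purchase signal
            if product in liked.get(other, set()):
                scores[product] += 1  # extra weight for a 'like'
    return sorted(scores, key=scores.get, reverse=True)
```

For Andrew, this ranks the doodad (bought and liked by both Barbara and Cynthia) ahead of the sprocket.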

A simple (but significant) enhancement of such a system is to capture user behavior (especially in response to recommendations) and feed it back into the graph as a new layer of relationships. This feedback loop means that the recommendation engine can become an unsupervised machine learning system, continually optimizing recommendations based on not just similarities in purchases, but how users are responding to the recommendations themselves.

Pinterest’s Pixie is a good example of a graph-based recommendation system. This deck from Brazil-based retailer Magazine Luiza talks about how they built a recommendation system on AWS.

Identity matching

Matching sets of user data that do not contain a common, stable identity signal is a challenge for many organizations, and is only getting more challenging as devices proliferate; fortunately, graphs offer a solution here.

A graph database can model the indirect relationships between IDs by tracking the relationships between those IDs and the contexts they are seen in. In the diagram below, three distinct IDs (ID1, ID2 and ID3) are seen in combination with different devices (in green), apps (in orange) and geo locations (in grey). Using the graph, it’s possible to calculate a simple ‘strength function’ that represents the indirect relationship between the IDs by counting the number of entities they have common links to.


Per the diagram, the relationship strength function (we’ll call it S) for each pair of IDs is as follows:

  • S{ID1, ID2} = 2
  • S{ID1, ID3} = 4
  • S{ID2, ID3} = 3

On the basis of this function, one would conclude that ID1 and ID3 are most likely to belong to the same actual user, or at least users that have some sort of commonality.
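In code, the strength function is just a count of shared neighbors. The specific device/app/geo links below are invented for illustration (the diagram’s exact edges aren’t reproduced here), but they are chosen so that the counts match the S values quoted above:

```python
# Each ID maps to the set of contexts (devices, apps, geos) it has been seen in.
contexts = {
    "ID1": {"device1", "device2", "app1", "app2", "geo1"},
    "ID2": {"device1", "geo1", "app3", "geo2"},
    "ID3": {"device2", "app1", "app2", "geo1", "app3", "geo2"},
}

def strength(id_a, id_b):
    """S: the number of contexts two IDs have in common."""
    return len(contexts[id_a] & contexts[id_b])
```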

A more sophisticated version of the graph has numeric values or weights associated with the relationships – for example, the user/app relationship might capture the number of times that app has been launched by that user. This enables the strength function to achieve greater accuracy by weighting some connections more than others.

A further enhancement is to apply machine learning to the graph. The various relationships between two IDs, and their intrinsic weights, can be thought of as a set of features for building a predictive model of the likelihood that two IDs actually belong to the same person. A set of known connected IDs can then be used as a training set, with the ML algorithm adjusting a second set of weights on the relationship edges until it achieves a high-quality prediction and is able to provide a probability for each identity pair belonging to the same actual user. This is roughly how solutions like Drawbridge’s Connected Consumer Graph and Tapad’s Device Graph work.
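As a toy illustration of the idea (the features, training pairs, and learning setup below are all invented for this sketch), the per-channel link counts between a pair of IDs can be treated as features for a simple logistic regression trained on known same-user pairs:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Features: (shared devices, shared apps, shared geos) for an ID pair;
# label is 1 if the pair is known to belong to the same user, else 0.
training = [
    ((2, 3, 1), 1),
    ((0, 1, 0), 0),
    ((1, 2, 1), 1),
    ((0, 0, 1), 0),
]

weights, bias, rate = [0.0, 0.0, 0.0], 0.0, 0.5
for _ in range(2000):  # stochastic gradient descent on log loss
    for features, label in training:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = pred - label
        weights = [w - rate * error * x for w, x in zip(weights, features)]
        bias -= rate * error

def same_user_probability(features):
    """Probability that an ID pair with these link counts is the same user."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
```

Production device graphs use far richer features and models than this, but the shape is the same: learned weights over relationship edges, producing a per-pair probability.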

This kind of probabilistic identity mapping enables marketing delivery systems to treat highly similar IDs as if they belong to the same user, even if they do not in practice. Obviously, deterministic identity mapping (where user data is joined on common keys) is better, but in cases where that is not possible, a graph-based approach can provide an efficient way of extending an audience to a new channel where good ID coverage is scarce.

Community detection

A third marketing application of graphs is community detection. This is a type of clustering that looks for communities or ‘cliques’ of highly interconnected individuals within a population:


The clusters that result from this kind of analysis represent customers who are highly connected to each other; targeting these users with social or word-of-mouth campaigns can then drive above-average response rates.

A graph-based approach to this problem is useful when the clusters or communities are not of equal size, and where there may be an uneven distribution of connectedness within the clusters. As with some of the other scenarios I’ve described, using a graph representation of the data isn’t the only way to solve this problem, but it can be the most efficient.
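One simple graph-native technique here is label propagation: everyone starts in their own community, then repeatedly adopts the most common community label among their connections until the labels settle. The network below is invented for illustration – two small cliques joined by a single edge:

```python
# An invented network: two tight cliques (A, B, C) and (D, E, F) joined by C-F.
connections = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "F"},
    "D": {"E", "F"},
    "E": {"D", "F"},
    "F": {"C", "D", "E"},
}

def detect_communities(graph, rounds=5):
    labels = {node: node for node in graph}  # everyone starts alone
    for _ in range(rounds):
        for node in sorted(graph):  # fixed order keeps the sketch deterministic
            counts = {}
            for neighbor in graph[node]:
                counts[labels[neighbor]] = counts.get(labels[neighbor], 0) + 1
            # adopt the most common neighboring label; break ties alphabetically
            labels[node] = min(counts, key=lambda l: (-counts[l], l))
    return labels
```

Running this leaves A, B, and C sharing one label and D, E, and F another, despite the bridge between the two groups.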

This article provides some interesting insights into how the diversity of Twitter users’ networks can have an impact on their ability to generate creative ideas.

Commercial account management

Commercial or Enterprise account management is all about tracking relationships – relationships with individual decision-makers, the organizations that they represent, and other information such as the decisions those people are responsible for, the role they have, and so on.

Graphs are very well suited to tracking this kind of information, making it possible for account managers to identify strategies for closing new business. Take the following example graph, of companies, individuals who work for them, and products those companies use:


This graph enables the following conclusions to be drawn:

  • GM uses Product A and Product B
  • Andrew Smith, who works for Ford, used to work for GM, which uses Product B
  • Barbara Jones, who used to work for Ford, specializes in Product B

Based on the above insights, the account manager for Ford, who is trying to sell in Product B, might decide to speak to her account manager colleague for GM to ask if she can connect Carl White with Andrew Smith, so that Carl can talk about his experiences with Product B. Given that Carl and Andrew have both worked at GM, they may even already know each other.

In selecting a partner to work with Ford on the implementation of Product B, the Ford account manager would be wise to speak to Barbara Jones at ABC, since the graph shows that ABC has worked with Ford before, and so has Barbara.
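Here is a sketch of how that account graph might be queried. The typed edges below follow the facts in the example (Barbara Jones’s current employer, ABC, comes from the partner discussion; any other details are illustrative), and a ‘warm introduction’ candidate falls out of a simple two-hop traversal:

```python
# The account graph as (subject, relationship, object) triples.
edges = [
    ("GM", "uses", "Product A"),
    ("GM", "uses", "Product B"),
    ("Andrew Smith", "works_for", "Ford"),
    ("Andrew Smith", "worked_for", "GM"),
    ("Barbara Jones", "worked_for", "Ford"),
    ("Barbara Jones", "works_for", "ABC"),
    ("Barbara Jones", "specializes_in", "Product B"),
]

def who(relationship, target):
    """All subjects linked to `target` by `relationship`."""
    return {s for s, r, o in edges if r == relationship and o == target}

def warm_introductions(product):
    """People with past experience at a company that already uses `product`."""
    return {person
            for company in who("uses", product)
            for person in who("worked_for", company)}
```

`warm_introductions("Product B")` surfaces Andrew Smith, via his old employer GM – the same chain of reasoning the account manager follows above.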

This HBR article provides some more detail on the use of social graphs for business.

Getting started with graphs

If one of the above use cases for graph data has caught your attention, here are some ways to get started with building a customer graph dataset.

Before you start

Before you go to the trouble and expense of building your own graph database, you need a fairly good-quality set of customer data with attributes that you can use to connect those customers together. A sparse graph consisting of a series of poorly connected islands will not be very useful.

You should also think about whether your business and data lend themselves to the kinds of connected applications listed above. For example, if your customer base has very diverse needs, then a recommender service may not be very useful. Similarly, if your customer base is small, then an in-house graph solution may be overkill; you may benefit from connecting your customer base to one of the public graph offerings below.

Graph databases

If you’re looking to build your own graph database, there are a number of commercial solutions available.

If you’re already managing your data in the cloud, you’ll want to pick a graph data solution that works with your cloud provider – DataStax, for example, partners with Microsoft for hosting on Azure, as well as with Google’s cloud platform. Before you can leverage your graph data, you’ll need to both define your graph schema and load data into the graph – look for an implementation partner that can help you with both of these things.

To work with graph data, you’ll need to learn a graph query language. The most popular graph query language is Gremlin (part of the TinkerPop stack), though other query languages exist (for example, Neo4j has its own query language, called Cypher). Again, an implementation partner who already has graph coding skills will be a valuable resource here.

Customer and ID graphs

If you’re not ready to manage your own graph dataset, there are many companies offering graph-based services around customer and identity data. In fact, most DMPs today will tout a customer graph as core to their solution. Establishing whether such offerings truly are customer graphs rather than identity graphs (i.e. whether they provide relationship information between customers, not just between IDs) should be one of your first questions if you’re being pitched on these solutions.


Even if you don’t end up implementing a graph database, graph-style thinking is a valuable step in data design; one of the pleasing benefits of graph databases is that their structure closely mirrors the way that humans tend to think about data and systems. Thinking about your customer data as a graph can therefore help you unlock insights from your existing data.

May 03, 2017

What’s next for the Digital Analytics Association?

I’ve been a member of the Digital Analytics Association for, it turns out, about twelve years – over half my professional life. In that time I’ve seen the organization grow and blossom into a vibrant community of professionals who are passionate about the work they do and about helping others to develop their own skills and career in digital analytics.

When the DAA started (as the WAA), web analytics was a decidedly niche activity, not considered as rigorous or demanding as ‘proper’ data mining or database development. Many of its early practitioners, like me, did not come from formal data backgrounds; we were to a large extent making things up as we went along, arguing with one another (often in lobby bars) about things like the proper definition of a page view, or the relative merits of JavaScript tags vs log files.

We didn’t know it at the time, but the niche activity we were helping to define would grow to dominate the entire field of data analytics. Today, transactional (i.e. log-like) and unstructured data comprise the vast majority of data being captured and analyzed worldwide and the analytical principles and techniques that the DAA championed have become the norm, not the exception.

The DAA and its members can justly derive a certain amount of satisfaction from knowing we were part of something so early on, but now that the rest of the world has shown up to the party that we started, how do we continue to differentiate the organization and add value to its members and the industry?

It’s to help answer this and other interesting and challenging questions facing the DAA that I’ve put my name forward for a position on the organization’s board. You can read my nomination (and, hopefully, vote for me) here if you’re a DAA member. After twelve years of benefiting from my DAA membership, it’s time to give something back to the organization.

If I’m elected to the board, I’ll devote my energies to helping DAA members adapt to and embrace the next set of transformations that are taking place within the industry. In my role at Microsoft I’m participating in a very rapid shift from traditional descriptive analytics, based around a recognizable cycle of do/measure/analyze/adjust, to machine learning-based optimization of business processes, particularly digital marketing. Predictive analytics and data science skills are therefore becoming more and more important in digital analytics, while the range of data and scenarios is exploding. This raises tricky questions for the DAA: Which skillsets and data scenarios should the association focus its energies on, and how can it stay relevant as the industry changes so rapidly?

A big part of the answer, I believe, lies with the DAA members ourselves. At a DAA member event in Seattle last week, I met the excellent Scott Fasser of HackerAgency and had a fascinating conversation with him about a current passion of mine, multi-armed bandit experimentation for digital marketing. There are many experienced members of the DAA like Scott, who have deep expertise in different areas of digital analytics, and who are keen to share their knowledge with others. We need to find ways to connect the Scotts of this world to people who can benefit from their expertise, and more broadly connect the DAA’s more experienced members with those newer to the discipline so that they can pass on their hard-won knowledge.

Finally, given that so many new people have moved into the analytics neighborhood, the DAA needs to get out and meet some of the new neighbors rather than peering out through the curtains muttering about hipsters and gentrification. Many new groups of analytics & data science professionals have sprung up over the years, both formal and informal, and there are likely profitable connections to be made with at least some of these organizations, many of which share some of the same members as the DAA.

So if you’d like to see me put my shoulder to the wheel to address these and other challenges, please vote for me by May 12.

June 23, 2015

The seven people you need on your data team

Congratulations! You just got the call – you’ve been asked to start a data team to extract valuable customer insights from your product usage, improve your company’s marketing effectiveness, or make your boss look all “data-savvy” (hopefully not just the last one of these). And even better, you’ve been given carte blanche to go hire the best people! But now the panic sets in – who do you hire? Here’s a handy guide to the seven people you absolutely have to have on your data team. Once you have these seven in place, you can decide whether to style yourself more on John Sturges or Akira Kurosawa.

Before we start, what kind of data team are we talking about here? The one I have in mind is a team that takes raw data from various sources (product telemetry, website data, campaign data, external data) and turns it into valuable insights that can be shared broadly across the organization. This team needs to understand both the technologies used to manage data, and the meaning of the data – a pretty challenging remit, and one that needs a pretty well-balanced team to execute.

1. The Handyman
The Handyman can take a couple of battered, three-year-old servers, a copy of MySQL, a bunch of Excel sheets and a roll of duct tape and whip up a basic BI system in a couple of weeks. His work isn’t always the prettiest, and you should expect to replace it as you build out more production-ready systems, but the Handyman is an invaluable help as you explore datasets and look to deliver value quickly (the key to successful data projects). Just make sure you don’t accidentally end up with a thousand people accessing the database he’s hosting under his desk every month for your month-end financial reporting (ahem).

Really good handymen are pretty hard to find, but you may find them lurking in the corporate IT department (look for the person everybody else mentions when you make random requests for stuff), or in unlikely-seeming places like Finance. He’ll be the person with the really messy cubicle with half a dozen servers stuffed under his desk.

The talents of the Handyman will only take you so far, however. If you want to run a quick and dirty analysis of the relationship between website usage, marketing campaign exposure, and product activations over the last couple of months, he’s your guy. But for the big stuff you’ll need the Open Source Guru.

2. The Open Source Guru
I was tempted to call this person “The Hadoop Guru”. Or “The Storm Guru”, or “The Cassandra Guru”, or “The Spark Guru”, or… well, you get the idea. As you build out infrastructure to manage the large-scale datasets you’re going to need to deliver your insights, you need someone to help you navigate the bewildering array of technologies that has sprung up in this space, and integrate them.

Open Source Gurus share many characteristics with that most beloved urban stereotype, the Hipster. They profess to be free of corrupting commercial influence and pride themselves on plowing their own furrow, but in fact they are subject to the whims of fashion just as much as anyone else. Exhibit A: The enormous fuss over the world-changing effects of Hadoop, followed by the enormous fuss over the world-changing effects of Spark. Exhibit B: Beards (on the men, anyway).

So be wary of Gurus who ascribe magical properties to a particular technology one day (“Impala’s, like, totally amazing”), only to drop it like ombre hair the next (“Impala? Don’t even talk to me about Impala. Sooooo embarrassing.”) Tell your Guru that she’ll need to live with her recommendations for at least two years. That’s the blink of an eye in traditional IT project timescales, but a lifetime in Internet/Open Source time, so it will focus her mind on whether she really thinks a technology has legs (vs. just wanting to play around with it to burnish her resumé).

3. The Data Modeler
While your Open Source Guru can identify the right technologies for you to use to manage your data, and hopefully manage a group of developers to build out the systems you need, deciding what to put in those shiny distributed databases is another matter. This is where the Data Modeler comes in.

The Data Modeler can take an understanding of the dynamics of a particular business, product, or process (such as marketing execution) and turn that into a set of data structures that can be used effectively to reflect and understand those dynamics.

Data modeling is one of the core skills of a Data Architect, which is a more identifiable job description (searching for “Data Architect” on LinkedIn generates about 20,000 results; “Data Modeler” only generates around 10,000). And indeed your Data Modeler may have other Data Architecture skills, such as database design or systems development (they may even be a bit of an Open Source Guru). But if you do hire a Data Architect, make sure you don’t get one with just those more technical skills, because you need datasets which are genuinely useful and descriptive more than you need datasets which are beautifully designed and have subsecond query response times (ideally, of course, you’d have both). And in my experience, the data modeling skills are the rarer skills; so when you’re interviewing candidates, be sure to give them a couple of real-world tests to see how they would actually structure the data that you’re working with.

4. The Deep Diver
Between the Handyman, the Open Source Guru, and the Data Modeler, you should have the skills on your team to build out some useful, scalable datasets and systems that you can start to interrogate for insights. But who will generate the insights? Enter the Deep Diver.

Deep Divers (often known as Data Scientists) love to spend time wallowing in data to uncover interesting patterns and relationships. A good one has the technical skills to be able to pull data from source systems, the analytical skills to use something like R to manipulate and transform the data, and the statistical skills to ensure that his conclusions are statistically valid (i.e. he doesn’t mix up correlation with causation, or make pronouncements on tiny sample sizes). As your team becomes more sophisticated, you may also look to your Deep Diver to provide Machine Learning (ML) capabilities, to help you build out predictive models and optimization algorithms.

If your Deep Diver is good at these aspects of his job, then he may not turn out to be terribly good at taking direction, or communicating his findings. For the first of these, you need to find someone that your Deep Diver respects (this could be you), and use them to nudge his work in the right direction without being overly directive (because one of the magical properties of a really good Deep Diver is that he may take his analysis in an unexpected but valuable direction that no one had thought of before).

For the second problem – getting the Deep Diver’s insights out of his head – pair him with a Storyteller (see below).

5. The Storyteller
The Storyteller’s yin is to the Deep Diver’s yang. Storytellers love explaining stuff to people. You could have built a great set of data systems, and be performing some really cutting-edge analysis, but without a Storyteller, you won’t be able to get these insights out to a broad audience.

Finding a good Storyteller is pretty challenging. You do want someone who understands data quite well, so that she can grasp the complexities and limitations of the material she’s working with; but it’s a rare person indeed who can be really deep in data skills and also have good instincts around communications.

The thing your Storyteller should prize above all else is clarity. It takes significant effort and talent to take a complex set of statistical conclusions and distil them into a simple message that people can take action on. Your Storyteller will need to balance the inherent uncertainty of the data with the ability to make concrete recommendations.

Another good skill for a Storyteller to have is data visualization. Some of the biggest light-bulb moments I have seen with data have come when just the right visualization was employed to bring the data to life. If your Storyteller can balance this skill (possibly even with some light visualization development capability, like using D3.js; at the very least, being a dab hand with Excel and PowerPoint or equivalent tools) with her narrative capabilities, you’ll have a really valuable player.

There’s no one place you need to go to find Storytellers – they can be lurking in all sorts of fields. You might find that one of your developers is actually really good at putting together presentations, or one of your marketing people is really into data. You may also find that there are people in places like Finance or Market Research who can spin a good yarn about a set of numbers – poach them.

6. The Snoop
These next two people – The Snoop and The Privacy Wonk – come as a pair. Let’s start with the Snoop. Many analysis projects are hampered by a lack of primary data – the product, or website, or marketing campaign isn’t instrumented, or you aren’t capturing certain information about your customers (such as age, or gender), or you don’t know what other products your customers are using, or what they think about them.

The Snoop hates this. He cannot understand why every last piece of data about your customers, their interests, opinions and behaviors, is not available for analysis, and he will push relentlessly to get this data. He doesn’t care about the privacy implications of all this – that’s the Privacy Wonk’s job.

If the Snoop sounds like an exhausting pain in the ass, then you’re right – this person is the one who has the team rolling their eyes as he outlines his latest plan to remotely activate people’s webcams so you can perform facial recognition and get a better Unique User metric. But he performs an invaluable service by constantly challenging the rest of the team (and other parts of the company that might supply data, such as product engineering) to be thinking about instrumentation and data collection, and getting better data to work with.

The good news is that you may not have to hire a dedicated Snoop – you may already have one hanging around. For example, your manager may be the perfect Snoop (though you should probably not tell him or her that this is how you refer to them). Or one of your major stakeholders can act in this capacity; or perhaps one of your Deep Divers. The important thing is not to shut the Snoop down out of hand, because it takes relentless determination to get better quality data, and the Snoop can quarterback that effort. And so long as you have a good Privacy Wonk for him to work with, things shouldn’t get too out of hand.

7. The Privacy Wonk
The Privacy Wonk is unlikely to be the most popular member of your team, either. It’s her job to constantly get on everyone’s nerves by identifying privacy issues related to the work you’re doing.

You need the Privacy Wonk, of course, to keep you out of trouble – with the authorities, but also with your customers. There’s a large gap between what is technically legal (which itself varies by jurisdiction) and what users will find acceptable, so it pays to have someone whose job it is to figure out what the right balance between these two is. But while you may dread the idea of having such a buzz-killing person around, I’ve actually found that people tend to make more conservative decisions around data use when they don’t have access to high-quality advice about what they can do, because they’re afraid of accidentally breaking some law or other. So the Wonk (much like Sadness) turns out to be a pretty essential member of the team, and even regarded with some affection.

Of course, if you do as I suggest, and make sure you have a Privacy Wonk and a Snoop on your team, then you are condemning both to an eternal feud in the style of the Corleones and Tattaglias (though hopefully without the actual bloodshed). But this is, as they euphemistically say, a “healthy tension” – with these two pulling against one another you will end up with the best compromise between maximizing your data-driven capabilities and respecting your users’ privacy.

Bonus eighth member: The Cat Herder (you!)
The one person we haven’t really covered is the person who needs to keep all of the other seven working effectively together: To stop the Open Source Guru from sneering at the Handyman’s handiwork; to ensure the Data Modeler and Deep Diver work together so that the right measures and dimensionality are exposed in the datasets you publish; and to referee the debates between the Snoop and the Privacy Wonk. This is you, of course – The Cat Herder. If you can assemble a team with at least one of the above people, plus probably a few developers for the Open Source Guru to boss about, you’ll be well on the way to unlocking a ton of value from the data in your organization.

Think I’ve missed an essential member of the perfect data team? Tell me in the comments.

March 08, 2012

Returning to the fold

Five years ago, my worldly possessions gathered together in a knotted handkerchief on the end of a stick, I set off from the shire of Web Analytics to seek my fortune among the bright lights of online advertising. I didn’t exactly become Lord Mayor of London, but the move has been a good one for me, especially in the last three years, when I’ve been learning all sorts of interesting things about how to measure and analyze the monetization of Microsoft’s online properties like MSN and Bing through advertising.

Now, however, the great wheel of fate turns again, and I find myself returning to the web analytics fold, with a new role within Microsoft’s Online Services Division focusing on consumer behavior analytics for Bing and MSN (we tend to call this work “Business and Customer Intelligence”, or BICI for short). Coincidentally I was able to mark this move this week with my first visit to an eMetrics conference in almost three years.

I was at eMetrics to present a kind of potted summary of some of what I’ve learned in the last three years about the challenges of providing data and analysis around display ad monetization. To my regular blog readers, that should come as no surprise, because that’s also the subject of my “Building the Perfect Display Ad Performance Dashboard” series on this blog, and indeed, the presentation lifted some of the concepts and material from the posts I’ve written so far. It also forced me to continue with the material, so I shall be posting more installments on the topic in the near future (I promise). In the meantime, however, you can view the presentation here via the magic of SlideShare:

The most interesting thing I discovered at eMetrics was that the industry has changed hugely while I’ve been away (well, duh). Not so much in terms of the technology, but more in terms of the dialog and how people within the field think of themselves. This was exemplified by the Web Analytics Association’s decision to change its name to the Digital Analytics Association (we shall draw a veil over my pooh-poohing of the idea of a name change in 2010, though it turns out I was on the money with my suggestion that the association look at the word “Digital”). But it was also highlighted by the fact that there was very little representation at the conference by the major technology vendors (with the exception of WebTrends), and that the topic of vendor selection, for so long a staple of eMetrics summits, was largely absent from the discussion. It seems the industry has moved from its technology phase to its practitioner phase – a sign of maturity.

Overall I was left with the impression that the Web Analytics industry, such as it is, increasingly sees itself as a part of a broader church of analysis and “big data” which spans the web, mobile, apps, marketing, operations, e-commerce and advertising. Which is fine by me, since that’s how I see myself. So it feels like a good time to be reacquainting myself with Jim and his merry band of data-heads.

October 24, 2011

Wading into the Google Secure Search fray

There’s been quite the hullabaloo since Google announced last week that it was going to send signed-in users to Google Secure Search by default. Back when Google first announced Secure Search in May, there was some commentary about how it would reduce the amount of data available to web analytics tools. This is because browsers do not make page referrer information available in the HTTP header or in the page Document Object Model (accessible via JavaScript) when a user clicks a link from an SSL-secured page through to a non-secure page. This in turn means that a web analytics tool pointed at the destination site is unable to see the referring URLs for any SSL-secured pages that visitors arrived from.

This is all desired behavior, of course, because if you’ve been doing something super-secret on a secure website, you don’t want to suddenly pass info about what you’ve been doing to any old non-secure site when you click an off-site link (though shame on the web developer who places sensitive information in the URL of a site, even if the URL is encrypted).
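To make the consequence concrete, here is a toy sketch (in Python, with purely illustrative function and bucket names) of the kind of referrer classification a web analytics tool performs. The point is the first branch: a secure-to-insecure click arrives with no referrer at all, so it ends up in the same bucket as direct or bookmark traffic.

```python
from urllib.parse import urlparse

def classify_referrer(referrer):
    """Bucket a hit by its (possibly absent) Referer header."""
    if not referrer:
        # Clicks from an SSL-secured page to a non-secure page arrive
        # with no referrer at all, so they are indistinguishable from
        # direct/bookmark traffic.
        return "direct"
    host = urlparse(referrer).netloc
    if "google." in host or "bing." in host:
        return "search"
    return "other site"

print(classify_referrer(None))                                       # direct
print(classify_referrer("http://www.google.com/search?q=flowers"))  # search
```

This is why the change inflates “direct” traffic in reports: the analytics tool has no way to tell a stripped referrer from a genuinely direct visit.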

At the time, the web analytics industry’s concerns were mitigated by the expectation that relatively few users would proactively choose to search on Google’s secure site, and that consequently the data impact would be minimal. But the impact will jump significantly once the choice becomes a default.

One curious quirk of Google’s announcement is this sentence (my highlighting):

When you search from, websites you visit from our organic search listings will still know that you came from Google, but won't receive information about each individual query.

This sentence caused me to waste my morning running tests of exactly what referrer information is made available by different browsers in a secure-to-insecure referral situation. The answer (as I expected) is absolutely nothing – no domain data, and certainly no URL parameter (keyword) data is available. So I am left wondering whether the sentence above is just an inaccuracy on Google’s part – when you click through from Google Secure Search, sites will not know that you came from Google. Am I missing something here? [Update: Seems I am. See bottom of the post for more details]

I should say that I generally applaud Google’s commitment to protecting privacy online in this way – despite the fact that it has been demonstrated many times that an individual’s keyword history is a valuable asset for online identity thieves, most users would not bother to secure their searches when left to their own devices. On the other hand, this move does come with a fair amount of collateral damage for anyone engaged in SEO work. Google’s hope seems to be that over time more and more sites will adopt SSL as the default, which would enable sites to capture the referring information again – but that seems like a long way off.

It seems like Google Analytics is as affected by this change as any other web analytics tool. Interestingly, though, if Google chose to, it could make the click-through information available to GA, since it captures this information via the redirect it uses on the outbound links from the Search Results page. But if it were to do this, I think there would be something of an outcry, unless Google provided a way of making that same data available to other tools, perhaps via an API.

So for the time being the industry is going to have to adjust to incomplete referrer information from Google, and indeed from other search engines (such as Bing) that follow suit. Always seems to be two steps forward, one step back for the web analytics industry. Ah well, plus ça change…

Update, 10/25: Thanks to commenter Anthony below for pointing me to this post on Google+ (of course). In the comments, Eric Wu nails what is actually happening that enables Google to say that it will still be passing its domain over when users click to non-secure sites. It seems that Google will be using a non-secure redirect that has the query parameter value removed from the redirect URL. Because the redirect is non-secure, its URL will appear in the referrer logs of the destination site, but without the actual keyword. As Eric points out, this has the further unfortunate side-effect of ensuring that destination sites will not receive query information, even if they themselves set SSL as their default (though it’s not clear to me how one can force Google to link to the SSL version of a site by default). The plot thickens…
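For the curious, the parameter-stripping described above is easy to sketch. This is an assumption about the mechanics, not Google’s actual code, and the example URL and parameter names are illustrative: the redirect URL is rewritten with the search-term parameter (q) removed, so the destination site sees a google.com referrer but no keyword.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def strip_query_term(redirect_url):
    """Remove the search-term parameter (q) from a redirect URL,
    keeping all other parameters intact."""
    parts = urlparse(redirect_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k != "q"]
    return urlunparse(parts._replace(query=urlencode(kept)))

url = "http://www.google.com/url?sa=t&q=flowers&url=http%3A%2F%2Fexample.com%2F"
print(strip_query_term(url))
# http://www.google.com/url?sa=t&url=http%3A%2F%2Fexample.com%2F
```

The destination site’s referrer logs would record the stripped URL: still recognizably Google, but with the keyword gone.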

September 16, 2009

Adobe + Omniture = …what?

By now, almost 12 hours after the announcement, you’ll have heard the news that Adobe is to buy Omniture for $1.8bn. If you haven’t heard, then, I mean, duh. It’s all over Twitter, dude:


(As an aside, the guys at Omniture should be proud of themselves that they managed to beat out Joe Wilson as a trending topic for a little while, even as the latter was busy facing down Congress).

I don’t think I’m putting myself in the minority when I say that I was totally blind-sided by this announcement. And while I’ve had time to think about it since my first reaction, I’m still a bit mystified by this acquisition.

The official line from the press release is that Omniture’s products will help Adobe’s customers optimize, track and monetize their websites & apps. Unofficially, the rationale for the deal seems to be that Adobe needs Omniture’s revenue to supplement its declining income from its range of software. I can see the logic of the official rationale, but I have serious reservations about Adobe’s ability to extract value from this deal, for the following reasons:

No pedigree in services: Adobe is primarily a software company; whilst it offers a full range of support services around its products, it doesn’t really have experience in providing the very deep, consultancy-like services that Omniture provides. This means that it’ll likely be challenging to attach Omniture offerings to Adobe’s customers; the opposite may be more likely to be true, but does Omniture bring enough customers to make this worthwhile?

No online scale: I’ve said before that one of Omniture’s key challenges as it strives for profitability is to scale out its infrastructure on a cost-effective basis. Adobe does offer a range of online services, but not on any kind of scale that could enable it to really drive cost out of the provision of Omniture’s services. So it’s unlikely that Omniture’s bottom line will improve in the wake of this deal.

Channel/partner conflict: The presence of the Omniture toolset in Adobe’s product lineup will complicate Adobe’s efforts to work with other agencies, EMM and web analytics tool providers, who in turn may find themselves more reluctant to encourage their clients to embrace Adobe technology for fear that it may lead to Omniture making calls on them.

Overall, I just find myself wondering whether Adobe really needed to do this deal in order to be able to leverage Omniture’s capabilities. Adobe has to be looking at some kind of synergy effect to extract value from the deal, because Omniture’s financials aren’t strong enough on their own to move the needle on Adobe’s bottom line. Would a strategic partnership not have been a simpler (and undoubtedly cheaper) option? One possible answer that presents itself is that Adobe had its hand forced by an imminent sale of Omniture to another party. What do you think?


This is one of those posts where I perhaps need to remind you that this is a personal blog which does not reflect the opinions of my employer, Microsoft. Furthermore, you shouldn’t infer that anything I’ve written above implies any foreknowledge or special knowledge of this deal, especially in the context of Microsoft. That is all.

June 30, 2009

My face, on the Internet

I have just noticed (rather belatedly, to say the least) that Laura Lee Dooley has posted a complete video of my encounter with Avinash Kaushik at the May E-metrics Summit in San Jose on Vimeo. The sound quality is a little poor, but you can more or less follow the thread of the conversation.

I come across as a cross between Prince Charles, Alastair Campbell and my Dad. Avinash does rather better, particularly around the 26 minute mark. Anyway, watch it for yourself and see who comes out on top.

May 13, 2009

Does Display help Search? Or does Search help Display?

One of the topics that we didn’t get quite enough time to cover in detail in my face-off with Avinash Kaushik at last week’s eMetrics Summit (of which more in another post) was the thorny issue of conversion attribution. When I asked Avinash about it, he made the sensible point that trying to correctly “attribute” a conversion to a mix of the interactions that preceded it ends up being a very subjective process, and that adopting a more experimental approach – tweaking aspects of a campaign and seeing which tweaks result in higher conversion rates – is more sound.

I asked the question in part because conversion attribution is conspicuously absent from Google Analytics – a fact which raises an interesting question about whether it’s in Google’s interest to include a feature like this, since it may stand to lose more than it gains by doing so (since the effective ROI of search will almost certainly go down when other channels are mixed into an attribution model).

Our own Atlas Institute is quite vocal on this topic, and has published a number of white papers such as this one [PDF] about the consideration/conversion funnel, and this one [PDF], on which channels are winners and losers in the new world of Engagement Mapping (our term for multi-channel conversion attribution).

The Atlas Institute has also opined about how adding display to a search campaign can raise the effectiveness of that campaign by 22% compared to search alone – in other words, how display helps search to be better.

However, a recent study from iProspect throws some new light on this discussion. The study – a survey of 1,575 web consumers – attempted to discover how people respond to display advertising. And one of the most interesting findings from the study is that, whilst 31% of users claim to have clicked on a display ad in the last 6 months, almost as many – 27% – claimed that they responded to the ad by searching for that product or brand:


This raises the interesting idea that search can actually help display be better, by providing a response mechanism that differs from the traditional ad click behavior that we expect. Of course, this still doesn’t mean that search should get 100% of the credit for a conversion in this kind of scenario – in fact, it makes a stronger case for “view-through” attribution of display campaigns – something that ad networks (like, er, our own Microsoft Media Network) are keen to encourage people to do, to make performance-based campaigns look better.

All this really means that, of course, it’s not a case of display vs. search, but display and search (and a whole lot of other ways of reaching consumers). Whether you take the view that it’s your display campaign that helps your search to be more effective, or your search keywords that help your display campaign to drive more response, multi-channel online marketing – and the complexity that goes with measuring it – looks set for the big time. And by “big time”, I mean the army of small advertisers currently using systems like Google’s AdWords, or our own adCenter. So maybe we’ll see multi-channel conversion attribution in Google Analytics before long.

April 30, 2009

What would you like to ask Avinash Kaushik?

boxer The gloves will be tied tight. Brightly colored silk dressing gowns will be shrugged to the floor; gum-shields inserted. In the blue corner: yours truly. In the red (and blue, yellow and green) corner, web analytics heavyweight, Avinash Kaushik. As the crowd bays for blood, battle will be joined. The Garden never saw anything like this.

Well, ok, it’ll probably be a bit more civilized (well, a lot more civilized) than that. But at next week’s E-metrics Summit in San Jose, Avinash and I will indeed be going head to head in the “Rules for Analytics Revolutionaries” session on Wednesday May 6 at 3.25. In that session, I’ll be asking Avinash some genuinely tricky questions to really get to the heart of some of the thorniest issues around web analytics today, such as campaign attribution, free versus paid tools, and what, really, the point of all this electronic navel-gazing is.

But I could use your help. In my comments box below, or via e-mail, suggest the question(s) you’d most like me to ask Avinash next week. This is your big chance to ask Avinash the question you’re too embarrassed/polite/nervous to ask him in person. If you’re going to be at the Summit, then be sure to come to the session to see if your question gets asked; if not, I’ll post a follow-up post here after the event and shall be sure to include Avinash’s answers to any questions from the blog.

So come on – what have you got to lose? It’s not like it’s you who’s going to be picking a fight with one of the industry’s most revered and respected advocates, is it? Leave that to old numb-knuckles here.

April 21, 2009

Google adds rank information to referral URLs

An interesting post on the official Google Analytics blog from Brett Crosby appeared last week, in which he announced that Google is to start introducing a new URL format in its referring click-through URLs for organic (i.e. non-paid) results. From Brett’s post:

Starting this week, you may start seeing a new referring URL format for visitors coming from Google search result pages. Up to now, the usual referrer for clicks on search results for the term "flowers", for example, would be something like this:

Now you will start seeing some referrer strings that look like this:

Brett points out that the referring URL now starts with /url? rather than /search? (which is interesting in itself in its implication for the way Google is starting to think about its search engine as a dynamic content generation engine); but the really interesting thing, which Brett doesn’t call out but which was confirmed by Jason Burby in his ClickZ column today, is the appearance of the cd parameter in the revised URL, which indicates the position of the result in the search results page (SRP). So in the example above, where cd=7, the link that was clicked was 7th in the list.

As Jason points out, this new information is highly useful for SEO companies, who can use it to analyze where in the SRPs their clients’ sites are appearing for given terms. Assuming, of course, that web analytics vendors make the necessary changes to their software to extract the new parameter and make it available for reporting (or, alternatively, you use a web analytics package that is flexible enough to enable you to make this configuration change yourself).
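That configuration change amounts to pulling one query-string parameter out of the referring URL. A minimal sketch of the extraction in Python follows; the function name is mine, and the example referrer is illustrative rather than an exact copy of Google’s format:

```python
from urllib.parse import urlparse, parse_qs

def rank_from_referrer(referrer):
    """Extract the organic result position (the cd parameter) from a
    Google /url?-style referrer, or None if it isn't present."""
    parts = urlparse(referrer)
    if "google." not in parts.netloc or parts.path != "/url":
        return None
    cd = parse_qs(parts.query).get("cd")
    return int(cd[0]) if cd else None

ref = "http://www.google.com/url?sa=t&source=web&ct=res&cd=7&q=flowers"
print(rank_from_referrer(ref))  # 7
```

Aggregating this value by keyword over time would give an SEO team a direct view of where their pages actually rank for the terms that send them traffic.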

As you can see from the example above, there are various other new parameters that are included in the new referring URL, which may prove useful from an analytics perspective (such as the source parameter). It’s also worth noting that whereas the old referring URL is the URL of the search results page itself, the new URL is inserted by some kind of redirection (this must be the case, since it includes the URL of the click destination page).

Using a redirect in this way means that as well as providing more information to you, Google is now also capturing more information about user click behavior, since the redirect can be logged and analyzed. Crafty, huh?

