April 25, 2008

Phil Windley
pjw
Phil Windley's Technometria
» Web Authentication with Selective Delegation using SRP

Bryant Cutler and Devlin Daley developed a methodology for adding selective delegation to relationship-based identity systems. This afternoon I presented that work at WWW2008. The talk went well. There were probably about 40 people in the room. There were some good questions afterwards, so all in all, I'm pleased. Here are the slides (PDF) if you're interested.

Tags: www2008 security identity delegation

» Tyler Close: Using Promises to Orchestrate Web Interactions

Tyler Close answers questions after his talk

Tyler Close of Waterken fame presented a way of using promises to produce succinct JavaScript (and Java) code for doing multiple asynchronous requests with a Web server. The idea of promises in asynchronous systems was developed by Barbara Liskov in the late 1980s. Tyler has a tutorial online. I also found this description from Brian Lothar of Web calculus which discusses promises in that context. Very interesting stuff. I think this was my favorite presentation of WWW2008.
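
To give a feel for the idea, here's a rough sketch in Python rather than Tyler's JavaScript or Java. It is not his library; it uses Python's asyncio, whose awaitables play a role similar to promises. The functions fetch_price and fetch_stock and their data are made up and stand in for real requests to a Web server.

```python
import asyncio

# Hypothetical stand-ins for two independent requests to a Web server.
async def fetch_price(item: str) -> float:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"widget": 2.50, "gadget": 10.00}[item]

async def fetch_stock(item: str) -> int:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"widget": 40, "gadget": 0}[item]

async def order_summary(item: str) -> dict:
    # Both requests are started together. Each call hands back a placeholder
    # for a value that arrives later; gather resolves once both have arrived.
    price, stock = await asyncio.gather(fetch_price(item), fetch_stock(item))
    return {"item": item, "price": price, "available": stock > 0}

if __name__ == "__main__":
    print(asyncio.run(order_summary("widget")))
```

The point is that the code reads top to bottom like a synchronous program even though the two requests overlap in time.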

Tags: www2008 javascript programming+languages

April 24, 2008

» WWW2008 Conference Dinner at Great Hall of the People

Great Hall of the People

We just got back from the WWW2008 conference dinner at the Great Hall of the People, China's parliament building and center of state ceremonial activities. How the conference got permission to have the dinner there, I don't know. I do know it wasn't cheap. Extra tickets were $150 and they said that was cost.

In any case, it was quite an event. The banquet hall was huge, the food was first rate and quite varied, and the entertainment well planned. I enjoyed the whole evening. I've posted some pictures of the event. Unfortunately, they told us we couldn't take a real camera, so all I had was my iPhone. Given that constraint, they didn't turn out too badly.

Tags: www2008 china

» Computational Advertising

Andrei Broder of Yahoo! Research

I'm in a talk by Andrei Broder, a Yahoo! Fellow and Vice President of Computational Advertising, on, what else, computational advertising. I was drawn to the talk by the title.

The central problem: find the "best match" between a given user in a given context and a suitable advertisement. Context could be click stream, page content, or something else. Key ideas:

  • The financial scale is huge. Small constants matter.
  • Advertising is a form of information.
  • Finding the "best ad" is a type of information retrieval problem.

Classic advertising falls into one of two camps: brand advertising that is projecting a message and direct advertising that is attempting to elicit action. Coupons are a classic example of direct marketing.

For advertisers interested in online (keyword) ads, the key issues are

  • what words to buy
  • how much to pay
  • spamming is an economic activity

For search engine owners, the questions are

  • How to price the words (auction)
  • How to match ads to content

The problem with matching is that it's not purely syntactic. For example, an ad for Seattle hotels ought to match "Alaska cruise starting point" but not "Seattle's Best Coffee Chicago". Finding the right ad is a query problem, but the ad database is smaller than the database of web pages. The entries are also smaller than pages (less content). And ranking is not just based on matching, but also the bid.
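
Here's a toy sketch of that last point, ranking on both the match and the bid. Broder didn't give a formula in the talk, so multiplying relevance by bid is my assumption, and the ads, bids, and relevance numbers are invented. The relevance score is assumed to come from some matcher that isn't shown.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float        # what the advertiser offers per click (made-up units)
    relevance: float  # match quality in [0, 1], assumed to come from a matcher

def rank_ads(ads):
    # Assumption: score = relevance * bid. A high bid cannot rescue a poor
    # match, and a good match with a tiny bid can still lose.
    return sorted(ads, key=lambda ad: ad.relevance * ad.bid, reverse=True)

# Invented candidates for the query "Alaska cruise starting point".
ads = [
    Ad("Seattle hotels", bid=1.00, relevance=0.8),
    Ad("Chicago coffee", bid=3.00, relevance=0.1),
    Ad("Alaska cruises", bid=0.50, relevance=0.9),
]
ranked = [ad.name for ad in rank_ads(ads)]
```

With these numbers the coffee ad ends up last despite having the highest bid, because its match is so weak.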

There's been a lot of progress on this problem in recent years. Matches are not syntactic. What's not solved? Filtering for relevance. Ads on a page about Scooter Libby's testimony included entries for Libby Shoes.

We're moving from an explicit demand for information driven by a user query to active information supply driven by user activity and context. This requires the increased use of semantics and context. An information supply engine looks at user profile and context, the activity context (browsing) and the ad inventory, and provides an ad. User action then feeds back into the system.

There is a different quality (utility) factor for publishers, advertisers and users. The ad agency has its own economic interest. Different types of ads (text, graphical, multimedia) are not easily compared.

One technique is to allow the searcher to peek at the results to determine what a query is about. For example, viewing the query "TFM-PCIV92A" doesn't give you a lot of information about what this is about, but looking at the results tells you this is about 56K baud modems. Note that if you do that search in Google, you don't see any ads for modems. If you search for "modem", you'll see all kinds of sponsored ads. Why isn't Google figuring out the first search is about modems? (This is at least true from China...)
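
A crude sketch of the peeking idea, assuming the simplest possible approach: let each result vote for the words it contains and take the most common word that wasn't in the query. The function guess_topic and the result snippets are invented for illustration; a real system would surely do something more sophisticated.

```python
from collections import Counter

STOPWORDS = frozenset({"the", "a", "for", "and", "with", "of"})

def guess_topic(query, result_snippets):
    """Guess what a query is about from the text of its search results."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for snippet in result_snippets:
        # Each snippet votes at most once per word.
        words = {w.strip(".,").lower() for w in snippet.split()}
        counts.update(w for w in words
                      if w not in STOPWORDS and w not in query_terms)
    return counts.most_common(1)[0][0] if counts else None

# Made-up snippets standing in for real search results.
snippets = [
    "TFM-PCIV92A 56K PCI modem driver download",
    "Internal PCI modem TFM-PCIV92A specifications",
    "Buy TFM-PCIV92A V.92 modem",
]
topic = guess_topic("TFM-PCIV92A", snippets)
```

The guessed topic could then be used to pick ads, just as if the user had typed it.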

Finding better approaches requires interdisciplinary techniques: machine learning, optimization, information retrieval, statistical modeling, microeconomics, and so on.

Tags: www2008 advertising

» Taking Search to New Frontiers: Dr. Harry Shum (Microsoft)

Harry Shum

The Web can be divided into three components: content (pages, images, videos, blogs, feeds), people (readers, writers, creators, commenters), and actions (queries, clicks, pageviews). Current search engines have taken advantage of "keywords" to link those three components together. But the keyword model has reached its limits.

One phenomenon that's challenging keywords is the explosive growth of content. Multimedia content is especially difficult. The scale requirements are huge. Another challenge is that the Web is becoming more dynamic: people want to interact. Search engines have a long way to go to satisfy user needs. To make progress, we have to stop worrying about just the content. We need to consider the context.

Users are not anonymous, but they form a community with specific interests. Actions are not random, but are driven by intent. Semantics is important. Extracting semantics is difficult.

There is a practical approach to semantics: understand->extract->expose. It should be data-driven, incremental, and interactive. We need to derive concepts from content, people from users, and intent from actions.

Understanding content has three vectors: intra-page intelligence, inter-page intelligence, and temporal understanding.

The technologies most useful for understanding users as people have been personalization, collaborative filtering, and analyzing social graphs. Personalization has failed to live up to its promise. Harry demos Gianxi, a Microsoft Research project that searches the social network. This isn't online yet as far as I could see. Reminds me of something Rohit Khare showed me at the last WWW in Banff.

Harry Shum (right) and his demo partner Graham (left)

Deriving intent requires contextual intelligence, mobile awareness, and intent refinement. The better we do with query classification, the better we do with user intent. Is there commercial intent? Is it location sensitive? Harry shows a demo (actually it was his trusty sidekick "Graham") where user action (dragging a particular picture to a special zone on the page) reorders the search results and filters them according to additional user action. This is a great example of how understanding intent gives much better results than mere keywords. "Give me things that look like this..." This demo actually generated applause from the audience.

One of the demos was actually hobbled by the "Great Firewall of China" according to Graham. Interestingly it was searches of video from Hillary Clinton. The demo extracted the most relevant portions of long videos and showed just the relevant snippets. Seeing the relevant portion, viewers could then select the whole video.

In order to get more out of search, we have to understand semantics, extract it, and then expose it to the user for further refinement.

Tags: www2008 microsoft search

April 23, 2008

» Trust-Based Recommendation Systems

Reid Andersen from Microsoft Research is talking about trust-based recommendation systems (PDF). To build a personalized recommendation, you need a trust graph among users. What system should you use to determine the recommendation? The researchers use an axiomatic approach.

The context of their axiomatic system is social choice theory (see Arrow's impossibility theorem for voting systems from 1951). A more recent treatment applies the approach to Web page ranking systems (Altman, Tennenholtz, '05).

The details are fairly complex, but the basic idea is that by proposing axioms until you get an inconsistency in the axiom set and then backing off and exploring other axioms to add to the set, you can generate unique recommendation systems that have a provable set of properties.

The overall model is simple, but there are several nice results, including being able to show incentive compatibility, which avoids self-interested bias in the recommendations. For details, see the paper (PDF).

Tags: reputation trust identity www2008 www2007 www2006

» Cloud Computing: Dr. Kai-Fu Lee of Google

Main hall where keynotes were held. I love the red slip covers on the chairs. They were more comfortable than your standard hotel chair.

The opening keynote at WWW2008 is Dr. Kai-Fu Lee of Google.

Before the keynote, we were treated to a presentation that featured dancers in blue Spiderman uniforms, a dancer in what I assume was traditional dress, and a guy with a "Welcome to Beijing" banner running through them all. Somehow, it seemed to fit perfectly even though it was the first of its kind at any tech conference I've been to--especially one that's essentially academic.

We received a welcome speech from Dr. Yong Shang who is the Vice-Minister of the Ministry of Science and Technology. It basically said "thanks for coming, China's pushing forward with Internet technology." No mention of the firewall. :-) As an aside, the fact that I can find him and his ministry on Google in English speaks louder about what he was saying than his actual words. No doubt the Chinese government understands the power of the Internet. That said, in terms of eGovernment, there was mostly information there, not much in the way of services I could see.

The Internet connectivity has failed and we're not even 30 minutes in. Hopefully it will come back up. I was planning on watching Twitter for news of the Pennsylvania primary. The opening ceremony has gone on for 40 minutes now. Finally we're ready for Lee's keynote.

Cloud Computing: Dr. Kai-Fu Lee of Google

He starts out asking what people want. Many of his answers were specifically about accessibility and its control. There are four key attributes of cloud computing:

  1. Data stored in the cloud
  2. Software services are increasingly moving to the cloud and accessed through the browser
  3. Based on standards and protocols
  4. Accessible from any device

Interesting that this is more or less Google's core set of beliefs. Companies often distinguish themselves from Google in departing from these principles. The world has moved from hardware-centric to software-centric to service-centric.

Six ideas driving cloud computing:

  1. User centric: Data is stored in the cloud and follows you and your devices. Data is accessible anywhere and easily, safely shared with others. He mentions several obvious examples of Google services that meet this definition.
  2. Task centric: People don't want to make spreadsheets or write documents. Rather they want to plan a curriculum or collaborate on a business plan. Right now, of course, all Google's examples are simply documents or spreadsheets with collaboration built in.
  3. Power: Lots of computers in a cloud can do things you can't do with a single PC. Google search is faster than desktop search because there are lots of computers on the task. Cloud computing isn't just about moving things off the desktop, but bringing more data and compute power to bear on the problem.
  4. Intelligent: Intelligence comes from data mining of massive data. "A ton of data is more valuable than an ounce of algorithm." I'm not sure that says much. Machine translation is a good example where feeding lots of good translation data into a learning algorithm leads to better translation of general text. Storage + analytics = intelligence.
  5. Affordable: Of course, this all uses a lot of computers and that gets expensive. Google's strategy is to use cheap machines. 1000 CPU PC-class machines cost about the same as one 64-way high-end machine and give 30x the performance (warning: data may be out of date). The actual numbers at Google are even greater since Google builds its own hardware. Faulty hardware can be overcome with a sophisticated software layer. This is the heart of engineering.
  6. Programmable: How do you program 1000's of flaky servers? Fault tolerant distributed disk storage, distributed shared memory, and a new programming paradigm. Google uses GFS for file storage: every piece of data is replicated three times. Anytime a server holding one of the three chunks dies, the others notice and make another copy. The shared memory architecture is Bigtable. The programming is done using MapReduce, a way of creating parallel algorithms. Between Mar 2005 and Sept 2007 the number of processes using MapReduce went from around 72000 to over 2 million!
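
For readers who haven't seen MapReduce, here's a minimal single-machine sketch of the programming model using the usual word-count example. This is not Google's API; the function names are my own, and everything that makes the real thing interesting (distribution across thousands of machines, fault tolerance, GFS) is left out.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key. In a real system this step moves
    # data between machines; here it is just a dictionary.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all the values for one key into a single result.
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the cloud", "the data in the cloud"])
```

Because each map call looks at only one document and each reduce call at only one key, the work can be spread over many machines without the programmer writing any coordination code.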

Cloud computing requires new skills. This is very true. We don't do enough to teach these skills to students. We ought to be introducing parallel computing in the cloud as the second programming course--ensuring that the first emphasizes the building blocks for the second. This probably means it's not in Java.

John Breslin has an excellent write-up of this speech as well.

Tags: www2008 google cloud+computing web+services

April 22, 2008

» Exploring Beijing

Parking attendant

I'm in Beijing for WWW2008 which starts tomorrow. I came out early (last Saturday) because I find conferences much more enjoyable when I'm not suffering from jet lag. I'm pretty well adjusted now and I'm looking forward to the talks tomorrow.

In the meantime, I've taken some time to explore Beijing a bit. Sunday I was quite tired and other than going to church, a fun experience in Beijing, stuck close to the hotel. It was rainy both Sunday and Monday, so the weather wasn't up to outdoor activities.

Because of that, I decided that the best use of Monday was to do some shopping. I'd been told that I ought to go to the Silk Street Market, so that's where I headed. What an experience. Six floors of stalls crammed with everything from clothes to watches to electronics to luggage. Most of it is branded with famous brands. Not many of them real, of course.

More workmen

The stall vendors are very forward, even clinging to you to get you to come into their stall. The first price you get quoted is 4, 6 or even 10 times what they'll settle for. I'm not very comfortable negotiating and don't like it, so I probably didn't get the best possible price, but I did pay significantly less than the first price quoted. I got some fun gifts for my family. I won't name them here for obvious reasons.

The sun was finally out in the afternoon and since the hotel I'm in is close to some of the Olympic venues, I walked around a bit and took some pictures. I was fascinated to see the workers. For example, they were working on a sidewalk in front of the Bird's Nest stadium (where the opening and closing ceremonies will be held). There were at least a dozen of them all working with hand tools--picks and shovels. No power equipment of any kind being used to build a sidewalk hundreds of yards long.

Rain spout

Today I took a tour of the Great Wall and the Ming Dynasty Tombs. The best part was getting out of Beijing proper for a bit and seeing some of the countryside. There is beautiful country not far out of Beijing. Of course there are still people everywhere. The Ming Tombs were amazing in size.

I went to the Badaling area of the wall. This is not a wall over flat terrain, but up and down mountains. I scratched my head in wonder when I thought about people hauling all that stone up those mountains. I hiked up to the top of the section where we were and it was very steep. I'm sure my knees will be reminding me tomorrow of the journey.

More pictures of how steep it is

We also spent a little time at a jade factory (refactory, I suppose, since the original factory was the earth) and had lunch in a cafeteria at the back of a Friendship Store (government run store for tourists). I've had better food. The people making the jade pieces and Ming vases were working in almost unthinkable conditions from an OSHA perspective. But I'm sure they're very well paid in compensation for the danger (sarcasm).

Tonight I went to the Microsoft Research Asia reception at Microsoft's Beijing facility. The food was just so-so, but I enjoyed seeing the demos and talking to the researchers. There were some very fun projects.

The only one I went to that had a handout and a Web page was the Excel Web Data add-in. This is essentially a very sophisticated screen scraper that puts its results in Excel and can refresh them as the Web page changes. I don't think it runs on Excel 2007 on OS X--at least the installer is a .exe. Maybe I'll fire up Fusion and give it a go later.

Another one that was pretty cool was a mobile application. Imagine two mobile phones streaming separate copies of a movie. When they get close together they both start streaming and showing just half the movie--combining their screens for more pixels. Swap their location and they swap the half of the movie they're showing. The amazing thing is that this coordination isn't done with radio signals, but with sound. The phones chirp to let the other phone know where they're at.

I'm having fun trying to decipher characters. I knew around 250 characters when I lived in Japan. Many of those are coming back and are similar enough to recognize. Of course there are thousands of characters that an educated Chinese knows, so a few hundred doesn't do much good. Even so, it's fun and helps with getting around some.

So far, China has been amazing. The amount of industry and innovation you see everywhere is beyond belief. This is a country that's movin' on up. Of course, everything is being spruced up for the Olympics and there's plenty of poverty around, but the message that comes through loud and clear is that people are working their way up.

I've taken a bunch of pictures. You'll find them all in my photo album for WWW2008.

Tags: beijing china www2008 www2007 www2006