Tuesday, September 05, 2006

More on splogs, RSS feeds and Web2.0

I learnt a new word last weekend and discovered that some people on the Net believe I am a criminal.

As you may have seen in a recent post on this Blog, I was asked by another Blogger why their material was all over a web site I run, GrantAvenue.co.za. The Blog was headlined, as I remember, “Copyright or Copycat.” I was accused of copying material into my website and passing it off as my own work. If they’d known that their material was also available on a number of other sites and that I was using a more complex duplication system than “cut and paste”, perhaps the polite email I received would have been slightly more acrid.

But it’s all better now. Mushypeasontoast.blogspot.com (an award-winning Blog, I hasten to add) and I have exchanged emails and I’m pleased to say I’ve been invited to parse the peas RSS feed into the database again.

The word I learnt was splog. When I looked at the mushypeasontoast Blog, I saw that a number of anonymous posters had taken it upon themselves to find me guilty of that most heinous of Internet crimes: stealing intellectual property in order to waste people’s time and make money. I followed a link to a Blog that swore undying hatred and an unremitting search for extreme vengeance on anyone who copied their material, unless the author got paid.

SHOCK, HORROR, PROBE!! I was, apparently, a “splogger.”

I knew the juxtaposition of s(pam) and (b)log couldn’t be good, but I had to look the word up to see what it was I was doing wrong. Here’s the current Wikipedia definition of a splog:


“Spam Blogs, sometimes referred to by the neologism splogs or Blam!, are weblog
sites which the author uses only for promoting affiliated websites. The purpose
is to increase the PageRank of the affiliated sites, get ad impressions from
visitors, and/or use the Blog as a link outlet to get new sites indexed. Spam
Blogs are a type of scraper site, where content is often nonsense or text stolen
from other websites. These Blogs contain an unusually high number of links to
sites associated with the splog creator which are often disreputable or
otherwise useless websites.”

Now in case you’ve never seen GrantAvenue.co.za, it’s about thirty pages; about five relate to the actual Grant Avenue, in Norwood, Johannesburg, and the rest are simply pages of recent news (and Blog) articles, directly linked to the original source.

So as far as I could see, I was good on the major identification pointer for this new animal, a splog. Nearly all links in the grantave.co.za system are directly to the actual news sources. In fact currently there are over 250,000 links to external websites, one for every article, completely unrelated to GrantAvenue.co.za itself. I think there are two links to external websites that I have built.

However, the charge of intellectual property theft was real and I began to wonder if in fact the whole RSS aggregator model was flawed. After all, I am copying the material in the RSS feeds, and I’m not even asking permission. So at one level I absolutely was stealing very valuable material. To compound the crime, I was then repackaging the material AND selling advertising on the website.

But then I went back to basics. Why do people Blog? Because writers want to get read. And Bloggers want to start conversations.

In the last ten years, it has become obvious that the most successful way to become a popular site, once you’ve developed your content theme, is to get a good listing in a search engine. And what is GrantAvenue.co.za but a very particular kind of search engine?

It’s a search engine where the search criteria are already defined. Articles that include words I think are related to the northern suburbs of Johannesburg are automatically displayed on the home page of GrantAvenue.co.za. Other pages pull articles that are related to Cape Town, Pretoria, the World Cup and so on. The results are displayed in a time based, descending sequential order.
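As a rough sketch of that idea (not the actual GrantAvenue.co.za code), a "search engine with predefined criteria" boils down to a keyword filter followed by a newest-first sort. The keyword list, the article records and the function names below are all made up for illustration.

```python
from datetime import datetime

# Hypothetical keyword list; the real site's word list is not published here.
KEYWORDS = {"norwood", "johannesburg", "sandton"}

def matches(article, keywords):
    """Return True if any keyword appears in the title or body (case-insensitive)."""
    text = (article["title"] + " " + article["body"]).lower()
    return any(word in text for word in keywords)

def select_articles(articles, keywords):
    """Keep only matching articles, then sort them newest first."""
    hits = [a for a in articles if matches(a, keywords)]
    return sorted(hits, key=lambda a: a["published"], reverse=True)

# Invented sample records standing in for rows in the aggregator database.
articles = [
    {"title": "Traffic in Norwood", "body": "Grant Avenue was busy.",
     "published": datetime(2006, 9, 1, 8, 0)},
    {"title": "Cape Town storms", "body": "Heavy rain overnight.",
     "published": datetime(2006, 9, 3, 9, 0)},
    {"title": "New cafe", "body": "A cafe opens in Johannesburg.",
     "published": datetime(2006, 9, 4, 12, 0)},
]

for article in select_articles(articles, KEYWORDS):
    print(article["published"].date(), article["title"])
```

A different page (Cape Town, Pretoria, the World Cup) would simply pass a different keyword set to the same function.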

Of course, Google and many, many other organisations offer similar, most often more powerful, Blog search facilities. However, local is very good in this business, as smaller, more focussed databases can provide a more eclectic, yet less international result.

This technology is so easy to use, and the information so extraordinary in its depth and scope, that RSS feeds are now of use in personal, SOHO, small business and corporate environments. Soon many will have gigabytes of RSS feeds stored away on their machines.

Each database and website is going to be different. The wide variety of sources, the choice of which RSS feeds to include and the continuity and reliability of the capture system are going to be factors that impact on the use of an aggregator system and its search results.

On a slightly technical design note, some aggregators display the first few lines of the RSS feed, some display the whole text feed. I, like the founders of the RSS system, believe that the full text feed is the default. So, Bloggers, if you want to send out only a few lines of text with a link to the body of your article, that’s your prerogative. But those providing full text RSS feeds should not have their material untimely trimmed, or be penalised by those who wish to use the system incorrectly.

In the GrantAvenue.co.za case, every six hours, the server I rent at vistatpages.com activates a cron routine that updates the database. The feedonfeeds software serving GrantAvenue.co.za currently pulls in around 400 RSS feeds.
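To give a feel for what such a scheduled update involves, here is a minimal sketch in Python. It is not the feedonfeeds code, which I have not reproduced; the sample feed, the in-memory "database" and the function names are assumptions for illustration, and a real job would fetch each feed over HTTP rather than read an inline string.

```python
import xml.etree.ElementTree as ET

# A six-hourly crontab entry might look like this (the script path is hypothetical):
#   0 */6 * * * python3 /path/to/update_feeds.py

# Invented RSS 2.0 sample standing in for a downloaded feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""

def parse_items(rss_text):
    """Extract title, link and description from every <item> in an RSS document."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]

def store_new(database, items):
    """Add items whose link is not already stored; return how many were added."""
    added = 0
    for item in items:
        if item["link"] not in database:
            database[item["link"]] = item
            added += 1
    return added

database = {}  # a dict keyed by link stands in for the real database
print(store_new(database, parse_items(SAMPLE_RSS)), "new articles stored")
```

Keying on the article link means a second run over the same feed adds nothing, which is what you want from a job that fires every six hours.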

Adding and removing a feed from an aggregator is usually very easy, as the RSS structure is designed to help users massage the data in their databases so as to save time, but still get to see the information they want. This is part of the problem. Subscribing to an RSS feed takes but a second. That makes the readership much easier to grow. If one had to write to the creator of an RSS feed and get permission to use their material, then the whole process of increasing a Blog’s readership would take much longer and in many cases, the integration would simply not happen.

This does not of course mean that copyright can be trampled underfoot. If you don’t want your material used in an aggregator, produce a Blog with no RSS feed, put a message in the footer of each article or use a system that puts out two versions of the RSS feed: full text for those registering their email address and a short text teaser for the rest.

What is extraordinary about this whole RSS phenomenon, at least to my mind, is that most often a complete copy of the material is stored and even more bizarrely, the creator has little idea of what will happen to the copy of the material being produced.

The Internet is changing to an “always-on” mode, a.k.a. Web 2.0, and RSS feeds (a.k.a. XML et al) take advantage of that. But getting exactly the information you want, when you want it, takes some work. There is a lot of new material out there. The PC-based aggregator software needs setting up, fine-tuning and periodic updating.

The Web 2.0 that is coming, based on broadband, is much more about collaboration than ownership. To use and enjoy the Internet of the near future, you’re going to have to be smarter, more flexible, have more patience and commit more time to understanding new concepts.

Today, websites are often designed to be used, rather than simply read, so they often can be difficult to understand. Web 2.0 developers don’t have much time for bells and whistles like platitudinous help systems. The programmers and designers are too busy making the bare bones of the new infrastructure to worry about users who don’t want to spend the time to get it. If it’s a good enough idea, enough people will become involved, add and contribute to make the RSS feed/web site/network successful. If some arbitrary user doesn’t get the concept, gets in the way or just doesn’t get with the program, they will simply be left behind.

It’s a very old saying: “there’s no place to hide on the net” and those who shine on the Web 2.0 will be those who give both their time and their content. Those who want to play in “walled gardens” have to count the lack of external input as a major cost of their decision. If people aren’t open to ideas, or are hypocritical or devious, their actions begin to show in the harsh spotlight of history and advertising sales.

Getting content is really easy. Dave Winer designed RSS feeds to transmit the full text of articles, but they can also include images, sound and video. Hence podcasts – which incidentally now have a bigger audience than Blogs, I believe.

Freeware for PCs that aggregates the RSS feeds of your choice is readily available. I guess it shouldn’t take more than a few hours for a typical, broadband-equipped computer user to set up their own, individual RSS feed system, automatically bringing the latest news of interest to them, right to their desktop.

Online aggregators serve other functions. For those already using RSS, they offer a way of finding new content. For those still exploring the new paradigm, they offer a window into the extraordinary range of offerings. And then, of course, online aggregators provide access to the Blogosphere for those who don’t have a computer or who don’t have broadband.

So that at least gave me some comfort against the charges of copyright infringement. There were other people doing it and the online aggregator model did have a raison d’etre.

The second objection to online RSS feed aggregation, namely loss of income for the creator, is farcical. The Google advertising system has resulted in a general awareness of online advertising, without knowledge of the reality. Some Bloggers are demanding that they receive a percentage of income received by online aggregators.

The trouble is that the value of any particular Blog article is pretty small. For argument’s sake, let’s say that I am making US$1,000 a year from advertising sales on GrantAvenue.co.za. There are 250,000 articles in the database, so each article could be seen to be worth about two fifths of a US cent in advertising revenue. Split that amongst the creator (1/3), infrastructure costs (1/3) and management (1/3) and the Blogger’s share is smaller still. Even if I’ve got 5 million articles and my income is US$10,000, each article could be seen to be worth about one fifteenth of one US cent to the Blogger.
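A quick check of the division, using only the figures quoted above (US$1,000 over 250,000 articles; US$10,000 over 5 million articles with a one-third share for the Blogger). The function name is just a label for this sketch, and the incomes are the for-argument's-sake numbers, not real accounts.

```python
def cents_per_article(annual_income_usd, article_count):
    """Advertising income per article, expressed in US cents."""
    return annual_income_usd * 100 / article_count

# Scenario 1: US$1,000 a year spread over 250,000 articles.
current = cents_per_article(1_000, 250_000)

# Scenario 2: US$10,000 a year spread over 5 million articles,
# with one third of each article's value going to the Blogger.
scaled = cents_per_article(10_000, 5_000_000)
blogger_share = scaled / 3

print(f"Scenario 1: {current:.2f} US cents per article")
print(f"Scenario 2: {scaled:.2f} US cents per article, "
      f"{blogger_share:.2f} of a cent to the Blogger")
```

Run as written, this gives roughly 0.4 of a cent per article in the first case and roughly 0.07 of a cent for the Blogger in the second. Either way, the sums involved are far too small to cheque out to individual Bloggers.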

Perhaps if I weren’t so busy looking for seed capital for my security web-cam system, I would be talking to VCs about a relationship system for Bloggers and online aggregators. Building a micro-payment system for Blog articles could be interesting.

But at the moment I don’t think a workable solution exists. The average value of a Blog article is likely to plummet even more as time passes and, anyway, both parties win in the current situation.

Bloggers get more readers (and then advertising revenue from their own sites), which is what they want. Aggregators get to build interesting databases, make advertising sales and have an opportunity to build a community, each of these being valuable in their own right in today’s world.

For example, if anyone wants a copy of the GrantAvenue.co.za database of 250,000 articles from the last three months, send me an email. It’s available for a very reasonable price. However, this article is provided free of any copyright or restrictions whatsoever, but a link to GrantAvenue.co.za would be nice!

1 comment:

ATW said...

Hi Tony,

Solid post that you've put some solid research into. I've posted a follow-up on my site.

I must apologise for the lack of faith in you: viz this comment on Peas site: "Maybe he responds positively to your nice request and all is groovy? Not likely.".

You did respond positively and all is groovy.