Matt Stuart’s pictures are really, really nice. He writes that he needs optimism for his work. I like that. I also like that much of his commissioned work is almost as good as the pictures that I believe are genuinely found ones.
I found both links this morning on BlogsNow. I spent some time with it yesterday and brought back the specific views that I had dropped a while ago. I still like it.
From what I read (cursorily), the Czech artist Roman Tyc replaced the usual live landscape images to be seen at this time of day with this recording. This seems to be a rare case in which the local German paper reported such mixed / pseudo news before BlogsNow. Usually it’s the other way round.
Roman Tyc did replace some traffic signs in Prague in April.
BlogsNow is back. The added spam detection seems to work. Since I never trust new code, especially not when I wrote it, I pay a bit more attention to which blogs get flagged as spam. Once they are flagged they are ignored. This shows the blogs that Google had seen updates for in the last ten seconds. Good luck finding a legit one. They are in there. Somewhere.
Today I thought I had found another bug. Blogs like these: example example example example example example started showing up as spam, although they are written by people. After looking into it I realized that these people participate in a ‘pay per post’ scheme: They get paid if they blog about something. Sandwich men. I decided to ban all those blogs. No matter if it’s a spam bot or a human being getting paid to write his/her own copy and flagging it oh-so-PC with ‘paid post’: The effect is the same. Links from those sources cannot be trusted. I am aware that I delete lots of mid-range blogs with that. But then, I don’t care: There is no shortage of blogs. BlogsNow can afford to look for the pure ones. Interesting how spam detection can be a good training ground for other, yet related, schemes.
In four minutes this video shows how we got here
Nice. And #1 @
BlogsNow is still the best source for non-mainstream items. Those other tools seem too skewed towards
the big mainstream and established blog themes and news. BlogsNow just ‘brute forces’ it: All links count,
all bloggers do. If it matters to enough real people to link to, then it will make the list. No matter what it is.
I will miss YouTube. It’s as close as I want to get to TV these days. BlogsNow is crawling back into existence. Slowly. It would only find two links worth mentioning right now. Both surrounding the Page meme. I wonder what the Republicans will come up with. Maybe bomb North Korea?
Meandering through YouTube’s suggestions I watched a bit of Daily Show (there is lots of it) and finally ended up with a video almost dedicated to Maf54.
little people, a nice project. Almost cute.
Unrelated, just that I picked it up from there: BlogsNow is seriously clogged by spam right now. The problem is that the 90% spam prevents the crawl of the real sites from getting done in time. Many of those spam sites don’t even respond in time. Spam is usually horrible about that. They just junk as much as they can, no matter if it even makes sense or not. They are just rushing to the next thing. And it’s done by idiots in the first place. No wonder they can’t keep their stupid little scripts straight.
If Dell read BlogsNow, they could have saved millions in PR by reacting quickly to this brewing PR debacle.
The machine that BlogsNow runs on still freezes with
CPU 0: Machine Check Exception: 0000000000000004
bank 4: b200000000070f0f
I have replaced memory, CPU and motherboard. The CPU got an upgrade and the motherboard a slight downgrade. It runs for a couple of days, and then dies. After it dies it sometimes gets back up only for a few hours. Once it has been down for more than 8 hours it will run for another four days or so. After every crash I have to fix the database, which takes 70 minutes. The fix of the autoincrement insert_id however takes even longer and took the machine out in 2 out of the last 3 uses. I am running an
i686 athlon i386 GNU/Linux kernel 2.6.9 with an 8-port 3ware card.
I really don’t feel like replacing this expensive card, but might need to.
BlogsNow was great, as long as it was running smoothly. Now it needs a lot of care, and I am considering never turning it on again. It’s sad to leave the field to all these other attempts. None of them convinces me. I would miss BlogsNow.
But that’s just me.
During all those crashes due to faulty hardware I must have forgotten to fix the MySQL database for BlogsNow before I used it. I had to remove some 800K blogs that all had insanely high index ids. Now I am trying to be a good boy and set the auto increment correctly via:
alter table blog AUTO_INCREMENT number_here
It just seems to take forever. The temp file and index file have been growing very erratically for an hour. I will let it cook for a couple of hours, but this looks fishy. 30 million entries should not be that much for MySQL …
BlogsNow just crossed the mark of thirty million weblogs that it tracks. Jason installed new memory, CPU and motherboard on the machine eight hours ago. I hope that the random crashes (MCE …4 ) it had are now a thing of the past. But it could still be the RAID controller that causes the trouble. We will see.
Thirty million blogs! It has been online for almost two years and has run on three different machines in three different hosting situations.
There are about a quarter million web-pages in google with the term BlogsNow. popurls has roughly the same number. I think popurls started two weeks ago.
I find it very interesting to compare BlogsNow and popurls. The latter shot to internet fame instantly while BlogsNow only caters to a small and very slowly growing audience. Popurls is better than what I wrote in many, many hours. The actual implementation of popurls would have taken me a week. Of course I did BlogsNow and not popurls. And also, obviously, I had hoped for instant internet fame when I wrote the fastest memetracker possible.
Technically I achieved my goal. BlogsNow’s performance is unsurpassed: It will reliably track _everything_ that people talk about. I still like it better than all others.
But let’s have a look at why popurls got what BlogsNow wanted so badly and could never get. Popurls’ author is the first to say that the concept of a meta-meta tracker is hardly new: diggdot.us paved the way into the mainstream, but I think there have been others.
The implementation of popurls works, it’s simple and nothing gets in the way. The design is what matters. It turns out that the idea of a meta-meta tracker combined with decent design is enough to get over the threshold and become a meme in itself.
In the new attention economy you have to raise the interestingness of something above an imaginary threshold. It is almost impossible to push something there. Not even the biggest media buy will get you there. If your concept is not worthy then every person in the chain works against your piece (work, meme, whatever it is). The resistance will become infinite. Many companies have wasted millions of ad dollars in the last years by ignoring this simple fact. If your idea rises above this threshold then it will attract more multipliers along the way. It’s too bad that ‘viral marketing’ got such a bad rep, since all agencies attached it to their failing attempts. Real memes do indeed work very much like a virus. The big difference is that every ‘host’ has the ability to alter the ‘virus’. We give those links, files and words new meaning when we pass them on. We comment on them and make them our own. I could not say that of the last cold I got from someone and also probably gave to somebody else. The term ‘viral’ already contains the arrogance of agencies: They think they can just ‘infect’ the audience and then save their client some money in the media buy. Of course that’s not how it works: Most of their ideas are simply not good enough to compete with what is out there. There were some single incidents where commercials got some viral traction. None of them made room for a follow-up. ‘Viral campaigns’ by their definition are one-hit wonders. With the broadening of the tools and people getting more connected every day, the odds move against the traditional marketers’ use of viral campaigns. Does this have anything to do with a BlogsNow vs. popurls comparison? Hardly. How did we get here?
Popurls became an instant ‘meme’, BlogsNow did not. In very simple terms I think it comes down to the fact that design matters. Popurls is an instant hit since it instantly communicates. The natural reaction is: “This is nice. I want to use this.” Within those 2 seconds a website has with a new visitor it is able to convey what is different about it. It tells its story well. The ‘elevator pitch’ of the internet must be over before the user blinks another time. There are always ten real and hundreds of potential other sites competing with the one you are looking at.
My son is seven. He wants to write a computer game. He found Google, and was very frustrated that he could not find ANY instructions on how to make a computer game for seven-year-olds after he entered his request. I never told him to google it, or even to research it on the internet. His experience is that the internet might as well contain the answers to everything. It’s this unprecedented amount of instantly available content that is the evolutionary pressure on every meme out there. The pace at which online (media) experiences grow in quantity, variety and maybe even quality is equally unprecedented. So is the spread of the audience and their level of engagement: 1 person, 1 computer with an internet connection, 1 hour. To say that the range of experiences is huge would be an understatement. It is awe-inspiring. And spreading. We have everything from the user on dial-up with Internet Explorer trying to get some news about Balinese dancing to the CEO on a laptop playing WoW while on a plane. These are not the two extremes. These are two random points in an infinitely complex cloud of usage patterns. A hundred people watch a movie in the cinema. Their experience is pretty similar; they don’t even need to have seen it in the same place or at the same time. Watching an old black-and-white movie is almost a time travel experience. A hundred people watch the movie on TV and the possibilities broaden. It used to be that the whole country watched the same stuff. There was the concept of a “Straßenfeger” in Germany in the 60s. It meant that a specific radio or TV series had such a big draw that it would ‘clean the streets’ (of people). There are only a few events left that can achieve this mass attraction. A hundred people watching the same movie on TV might do so from a DVD, on the seventieth rerun, because their TiVo thought they should be watching it, or just because they are waiting for the show that follows this program. These hundred people might pay attention or might not.
Since TVs tend to be everywhere and just keep running after you have turned them on, TV is also one of the most ignored media outlets there is. A hundred people watching a clip that they got on a computer one way or another will have the most diverse media experience.
It is in this jungle of possibilities that the ability to communicate your idea between two blinks becomes mandatory. The idea needs to be decent, and then it needs to be easy to understand. Popurls instantly tells you that it’s a decent-looking meta-meta tracker. Just when you think, “ah, so many links”, it kicks some pretty pictures your way. Your peeking ‘into it’ will be rewarded. By the time you have seen the entire page (2.5 seconds into your visit) you will have glanced over nine headlines, at least three or four of which are known items. Popurls is a good mix of the known and the new. All known is boring and will be clicked away quicker than you can say ‘boring’. All new is confusing. People are usually not curious enough to give new things the time it takes to understand them.
The good design of popurls makes it work. As an engineer I thought that people would understand that BlogsNow is faster and more comprehensive than anything else. Even though it is, I did not manage to communicate this. And maybe it does not even matter. The first meme trackers got the web’s attention because they were a new concept. The fastest one is not a big deal. Its functional difference takes repeated use and comparative analysis to understand. Memetrackers are not so important that people would do this sort of analysis. BlogsNow is a classic example of the fact that people and projects overestimate their own importance. Many web 2.0 startups think that they are Moses coming down from Mount Sinai. BlogsNow’s first goal was to keep me up to date with what is going on on the internet. And BlogsNow I can trust and, if needed, even tweak to make it behave better. Reason enough to keep it going for the next thirty million blogs. Popurls is not the first tool and certainly not the last that will surpass BlogsNow in the amount of web attention it gets. It just is such a clear case that engineering does not really matter. The upside is that it was and is fun to code the fastest memetracker there is. That, my constant use, and the fact that I have a neat copy of what mattered on the web in the last two years are reason enough to keep it going. Despite the fact that nobody cares.
BlogsNow started crashing. I wonder if it is the CPU temperature. Since I am 5000 miles away from the computer I needed something to measure the temperature. This was harder to google than it should be. In the end it was as simple as:
yum install lm_sensors
sensors-detect
[accept all defaults]
sensors
[will output what the machine feels like]
Now I need to copy this output to another machine, and then I will know what happened when the machine dies.
A super simple way to monitor is to create an executable CGI file in your web server’s path like this:
echo Content-type: text/plain
And then you run wget from cron on a different machine to fetch the status and append it to one file …
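The polling side could look roughly like the sketch below. It is not what I actually run; it assumes the CGI file returns the sensor readings as plain text, and the names `fetch_status`, `append_status` and `poll_once` as well as any URL or log path you pass in are made up for illustration.

```python
import datetime
import urllib.request


def fetch_status(url, timeout=10):
    """Fetch the plain-text status page; return a marker line if the host does not answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"UNREACHABLE: {exc}"


def append_status(log_path, status_text, now=None):
    """Append one timestamped record to the log file and return the record."""
    now = now or datetime.datetime.now()
    record = f"--- {now.isoformat(timespec='seconds')}\n{status_text.rstrip()}\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record)
    return record


def poll_once(url, log_path):
    """One polling round: fetch the status and append it to the log."""
    return append_status(log_path, fetch_status(url))
```

Call `poll_once` from cron on the second machine every minute or so. When the server dies, the last record before the first UNREACHABLE line shows the temperatures just before the crash.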
Right now the rank number seems to indicate that. I hope I come back and update this once the number has changed.
Sometimes I wish I could do that too. It is tricky to check whether the spam filters need a tweak by looking at the results and then not click on something that people blog about right now.
Version one hundred and eight of BlogsNow:
BlogsNow has feeds. With a twist, of course: the title contains ‘BlogsNow’ to make sure that people realise who brought them a topic first. Links click through BlogsNow so that I can see where and which links get used and scraped. I also add the time to the link, so that I can see how long links linger out there. There are more RSS bots than ever, it seems. I am curious where the BlogsNow RSS content will show up.
I then looked at the CSS file and messed around. Some people would call that a redesign. I would never consider my CSS dabblings to be design. If I can read the text on the page then I am happy.
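For the curious, a click-through link with the time baked in can be built along these lines. This is only a sketch of the idea, not the actual BlogsNow code: the redirect address and the parameter names `u` and `t` are invented for the example.

```python
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical redirect endpoint; the real address and parameter names are not given here.
REDIRECT_BASE = "http://example.com/go"


def tracked_link(target_url, listed_at=None):
    """Wrap a target URL in a click-through link that carries the listing time."""
    listed_at = int(time.time()) if listed_at is None else int(listed_at)
    return f"{REDIRECT_BASE}?{urlencode({'u': target_url, 't': listed_at})}"


def target_of(tracked_url):
    """Recover the original target URL from a click-through link."""
    return parse_qs(urlsplit(tracked_url).query)["u"][0]


def link_age_seconds(tracked_url, now=None):
    """How long the link has been out there when somebody finally clicks it."""
    now = time.time() if now is None else now
    listed_at = int(parse_qs(urlsplit(tracked_url).query)["t"][0])
    return now - listed_at
```

The redirect handler would log `link_age_seconds` together with the referrer and then send the visitor on to `target_of(...)`.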
The logo did not feel right anymore, so I got rid of it. Which does not feel right either. Maybe I will have an idea. Yeah right.
Finally I added BlogsNow to the list of links. Click on the ‘x links’ below the subject and you can see the link history: where did a topic show up, and when. Interesting to see how BlogsNow compares to other trackers.
Mr Cheney shoots somebody in a hunting accident. No big deal really. The victim is up and well.
But it’s an interesting test of all those meme trackers that are out there. I saw it first on BlogsNow, where it got listed one hour ago and occupies the number 1 spot with 50 links. Memeorandum has it as well. It is also #1 there, not sure for how long; there is Michelle Malkin and 3x an AP story as well as 6 links from the selected pool of sources they track.
All the others, however, did not show the story at all when I visited them. I built BlogsNow for speed and coverage. Looks like it does what it is supposed to do.
Looks like most of blogger is offline from here and from where BlogsNow looks.
So I told the bot to stop trying.
I just hope that Gmail is being maintained better than blogspot.
Then again: After installing a postfix server recently I might be tempted to
keep Gmail only as a backup.
You can apply the Broken Windows Theory to spam blogs as well. There was always ample opportunity for spammers in blogs. Now they are in, and they make revenue. So they enhance their spam blogs to stay in the game. Here are two splogs from a current campaign:
They certainly get better.
OK, provocative title. Let’s rephrase: Google tolerates spam.
Blogger is owned by Google. It runs the biggest blog service on its blogspot domain.
It appears to be very simple to create hundreds of thousands of ‘weblogs’ like this:
Created solely for spam purposes. So-called ‘splogs’. You set up a robot and there is nothing in the Blogger software that stops you from adding all the blogs you like.
This is not new. Google / Blogger / Blogspot knows about it. They have done nothing against it in recent years.
It should be relatively easy to make sure that there is a human in front of the computer when a new weblog is created at blogspot.com. Simple captchas are very common today.
There are two possible explanations why this has not happened yet:
- blogspot engineering is amazingly incapable
- there is no real rush to get rid of splogs on Google’s side.
It might make sense:
You have to forget the “don’t be evil” and “organize the world’s information and make it easily accessible” Google dogmas for a second though. Google knows one thing very, very well: how to run a scalable service. They have the lowest cost per stored bit due to their own file system technology. It uses commodity hardware and adds failover management brilliantly. It does not cost Google much to host millions of splogs.
But wouldn’t millions of fake blogs pose a danger to the result quality of a search engine?
Google knows from which IP address a blog gets maintained. Nobody else does. They have the actual blog data readily available for further parsing. I doubt that the googlebot comes through the front door to blogspot. The bandwidth alone that could be saved by crawling blogspot internally should make up for the ‘exception’ that this would mean to the googlebot operations. I don’t know these things. It’s a guess.
Every search engine has to have spam combat tools these days. Google is one of the most useful search engines and in the US they have an OK handle on search engine spam. Isn’t it funny that they don’t use their insider knowledge and access together with their anti-spam tools to simply turn off splogs on blogspot?
Last October there was somebody who scraped famous bloggers’ sites and reposted that content on splogs. That got some attention, and stopped. But splogs did not.
Blogspot hosts lots of splogs. But also lots of legit and very powerful weblogs. Nobody can really afford to ignore the biggest weblog service. Yahoo, MSN and even my little BlogsNow have to crawl blogspot in order to find out what is going on. Google can skip the junk, all others have to deal with it.
There is also a third theory that is the most plausible:
Splogs don’t matter to search engines. They have to crawl billions of pages anyway. Who cares about a couple of million spam blogs here and there? That’s probably what it is: The aircraft carrier keeps on going regardless of whether there are 50% more roaches in the kitchen or not.
The recent update (Version 92) to BlogsNow will go unnoticed by most people. If you have never clicked on a BlogsNow link then you see a very reduced menu and an intro instead of ads. If you are new to BlogsNow then this should make understanding it easier.
If you are a veteran BlogsNow user and you don’t see the full menu anymore then you probably have cookies disabled.
Update 12/19/05 : looking at my numbers I decided to skip ads entirely for now.
Yikes, for the last two days I had the weird feeling that the earth stood still. It turned out that DNS had stopped working on BlogsNow. No DNS, no crawl. Of course. So I started to read the same things in the paper that I see on BlogsNow.
Now it’s fixed and things should change soon.
And of course email did not work either during those two days. I could not forward to gmail.
Almost got me into trouble.
Just moved the server to its new location. Things should be better now. More bandwidth and better reliability.
That would be the theory. Over the next days we will see how it really pans out.
Blogdex is still down. So I thought I might run some google adwords pointing people to BlogsNow.
Turned out somebody was faster: Right now I see an ad for blogturbo dot com. Interesting what Google advertises for:
It costs only 149 US$ and you can generate thousands of weblogs pointing to your site. This looks like a keyword spam tool to me.
Interesting that google runs ads for it.
Then I wondered what is going on at daypop.com
Turns out they are down as well …
update November 1st
Blogdex: “up” again, yet results are old/pointless right now.
Daypop: back up again, results make sense. the usual 24 hour delay
blogturbo: still showing ads on google adwords for blogdex.
Weblogs.com, now a VeriSign service, has a new look. And they only show the last 100 blogs that pinged them on their main page. Makes sense. The content however has not changed: junk, junk and junk.
BlogsNow always checks, for all entries, how similar the blogs are that link to a given entry. Just now I tightened this parameter radically. It now catches more splogs and also a couple of ‘real blog entries’. If your fan club is not diverse enough, then you will not make BlogsNow as easily anymore. Looking at how BlogsNow is filtering I must say this makes sense: Those real blog entries are mostly linked by the very same group of blogs. Mostly on both sides of the political spectrum. Plain link repeaters simply have less influence right now. I like the results better: They reflect what is emerging in many blogs, rather than what gets pushed by whatever agenda there might be.
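To make the idea concrete, here is one way such a diversity check could be sketched. It is an illustration only and not the measure BlogsNow really uses: I am assuming Jaccard overlap of the URLs each blog linked to recently as the similarity, and the function names and the `max_similarity` default are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets of URLs: 0.0 means nothing shared, 1.0 means identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_similarity(linkers, history):
    """Average pairwise similarity of the blogs linking to one entry.

    linkers: ids of the blogs that link to the entry.
    history: blog id -> set of URLs that blog linked to recently.
    """
    pairs = list(combinations(linkers, 2))
    if not pairs:
        return 0.0  # a single linker has nobody to be similar to
    total = sum(
        jaccard(history.get(x, set()), history.get(y, set())) for x, y in pairs
    )
    return total / len(pairs)


def passes_diversity(linkers, history, max_similarity=0.5):
    """Keep an entry only if its fan club is diverse enough."""
    return mean_similarity(linkers, history) <= max_similarity
```

Tightening the parameter then simply means lowering `max_similarity`: link repeaters that always post the same set of URLs score close to 1.0 and drop out first.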
Yahoo bought blo.gs a while ago.
Their new blogs page is one complete splog-fest.
VeriSign bought Moreover.com and will switch weblogs.com servers next week. That will be interesting.
Since the last splogplosion event also diluted people’s ego searches, there is finally some discussion about Google’s ignorance towards this problem: Chris Pirillo demands that Blogger be fixed or turned off. I can imagine that it’s harder for him to execute the P(i)rillo Effect.
Sidenote: when will we stop the habit of leaving characters out and feeling cool about it? (splogplosion -> spam world wide web log explosion)
And, of course, Icerocket suffers as well.
The problem is not new: seven months ago I had to turn off Blogger in BlogsNow. All future rewrites of BlogsNow knew not to trust Blogger content. BlogsNow crawls Blogger as much as possible. But not more. That’s why BlogsNow wasn’t hit by the latest boom of junk on Google’s blogging tool. It does help to have very few resources: With BlogsNow I simply have neither room nor bandwidth for spam. Spam would have killed BlogsNow within days if it weren’t able to defend itself.
Over at Jeff Jarvis there is some discussion as well on the very same topic. Steven Den Beste comments along the lines that splogs on blogger.com might be beneficial to Google. They definitely are. Whether intended or not: Google knows which blogs on Blogger get read and which ones do not. That alone makes for a huge head start for all search efforts.
The concept behind those splogs is threatening not only a few ego searches on PubSub: The internet didn’t work to sell dog food (web/bubble 1.0). Now the internet is trying to sell information (web/bubble 2.0). While this approach is much more promising, all that spam is diluting it. And it hurts innovation: Much of BlogsNow’s constant rewriting goes into the upkeep of the status quo against splogs and other malicious symptoms. I would rather add features. And I would rather be able to trust the information out there.
It’s the same with all spam: In order for somebody to make 1 cent somewhere there are damages of hundreds of dollars elsewhere.
BlogsNow gets seven pings a second. I just had a cursory look over those. Yes, they are all spam.
If you still ping BlogsNow with good intentions, please stop doing so. If you ping BlogsNow in the future then your weblog will go on the blacklist. Sorry.
Sunday, 2am Pacific, 5am Eastern, the blogosphere is active.
Hyperactive. Just that it’s all spam. Most of those so-called splogs are hosted at Google’s blogspot.com aka Blogger.
Blogger is the biggest hosting outlet for real as well as for spam blogs. I find it very hard to believe that Blogger could not do more against spam blogs. They certainly have the technology. The six billion dollar question is why they let all this spam happen. First: They can. Google knows how to store vast amounts of data safely and cheaply. Probably cheaper than anybody else. GFS and commodity hardware are an unbeatable combination that yields amazingly low costs per byte.
The second part of my explanation why 90% of all blogspot.com subdomains are junk is slightly more evil:
Blogger knows what’s spam and what is not. They know which IPs added which content. Those other weblogs that are not spam are an invaluable resource. Blogger can tell its cousin googlebot where to go and, more importantly, where not to.
Others like Yahoo or MSN cannot. They have to crawl and evaluate all that junk in order to find the gems created by all those real bloggers. This creates considerable costs for Google’s wannabe competition.
“Don’t be evil” they said. Looking at a random blogspot blog these days it sounds more like: “Don’t be too evil to spammers.”
A couple of months ago BlogsNow stopped accepting pings.
Pings are these little messages basically saying “hello Service XYZ, here is weblog ABC, I have new content”.
The mother of all these ping servers is weblogs.com. It just got sold to VeriSign.
I started to ignore pings, since they were mostly spam.
That did not stop them: Right now I am getting half a million pings a day at BlogsNow. I would guess that 99.9% of them are spam. Maybe I will have a look one day and then use them to build a blacklist …
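If I ever get around to it, the blacklist could start out as simple as the sketch below. Nothing of this exists yet; it assumes the pings arrive as weblogs.com-style XML-RPC `weblogUpdates.ping` calls (weblog name, weblog URL), and the helper names, the `min_pings` cut-off and the list of known good hosts are all hypothetical.

```python
import xmlrpc.client
from collections import Counter
from urllib.parse import urlsplit


def parse_ping(request_body):
    """Return (weblog name, weblog url) from a weblogUpdates.ping request, or None."""
    try:
        params, method = xmlrpc.client.loads(request_body)
    except Exception:  # malformed XML or not an XML-RPC call at all
        return None
    if method != "weblogUpdates.ping" or len(params) < 2:
        return None
    name, url = params[0], params[1]
    if not isinstance(name, str) or not isinstance(url, str):
        return None
    return name, url


def build_blacklist(pinged_urls, known_good_hosts, min_pings=100):
    """Hosts that ping at least min_pings times and are not on the known good list."""
    counts = Counter(urlsplit(url).hostname for url in pinged_urls)
    return {
        host
        for host, n in counts.items()
        if host and n >= min_pings and host not in known_good_hosts
    }
```

Counting per host is the crudest possible rule. It would need a second look before banning anything, since a busy legit blog pings often too.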