spam, human one

BlogsNow is back. The added spam detection seems to work. Since I never trust new code, especially not code I wrote, I pay a bit more attention to which blogs get flagged as spam. Once they are flagged, they are ignored. This shows the blogs that Google had seen updates for in the last ten seconds. Good luck finding a legit one. There are some in there. Somewhere.

Today I thought I had found another bug. Blogs like these started showing up as spam: example example example example example example. Yet they are written by people. After looking into it, I realized that these people participate in a ‘pay per post’ scheme: they get paid if they blog about something. Sandwich men. I decided to ban all those blogs. Whether it’s a spam bot or a human being paid to write his or her own copy and flagging it all-so-PC with ‘paid post’, the effect is the same: links from those sources cannot be trusted. I am aware that this deletes lots of mid-range blogs. But then, I don’t care: there is no shortage of blogs. BlogsNow can afford to look for the pure ones. Interesting how spam detection can be a good training ground for other, related schemes.
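
The ban rule above (one disclosed paid post and the whole blog is ignored from then on) can be sketched roughly like this. This is a hypothetical illustration, not BlogsNow’s actual code: the class name `BlogFilter`, the method `observe`, and the regex (which only looks for the ‘paid post’ label mentioned above, spelled with a space or a hyphen) are all assumptions.

```python
import re
from dataclasses import dataclass, field

# Assumed disclosure marker: the only label the post mentions is 'paid post'.
PAID_POST = re.compile(r"\bpaid[\s-]post\b", re.IGNORECASE)


@dataclass
class BlogFilter:
    """Hypothetical filter: blogs that disclose a paid post are banned for good."""

    banned: set = field(default_factory=set)

    def observe(self, blog_url: str, post_text: str) -> bool:
        """Return True if the post should be kept, False if it is ignored."""
        if blog_url in self.banned:
            # Already banned: every later post from this blog is ignored too.
            return False
        if PAID_POST.search(post_text):
            # One disclosed paid post bans the whole blog, not just this post.
            self.banned.add(blog_url)
            return False
        return True
```

Banning at the blog level rather than the post level mirrors the reasoning in the post: once a source is known to sell its links, none of its links can be trusted.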

5 Responses to “spam, human one”

  1. Gunnar Helsing Says:

    Hi, thanks for an excellent service! Would you care to share how spam detection works?

  2. Andreas Wacker Says:

    How spam detection works? Hm, it’s a mix of things. Basically, whenever I see spam in BlogsNow, I try to get rid of it when I have the time. Not by selecting each item and flagging it: since spammers spam by the millions, you lose the manual battle anyway. I try to find the pattern, the thing that is inherent to that kind of spam. Spam is machine generated, which is also its weakness. If I remember the folklore right, it was the Wehrmacht’s habit of starting and ending their Enigma-encoded transmissions in a repetitive and predictable way that allowed the amazing people at Bletchley Park to decode what the Germans thought unbreakable, and so to end the war at least a year early. Spam detection in BlogsNow is a super simple problem compared to this, but it works on the same principles: spam originates with an intent. It is made to camouflage its intentions. But there is still a pattern. And, interestingly, the whole thing is mostly rather crude and pretty easy to detect. BlogsNow will always react to the easy kind of spam: the kind that goes out by brute force and poisons the meme pool in a rather distinct manner. Enough amplitude to make a radar blip. That makes detecting spam actually quite easy. So far. Maybe tomorrow I lose the battle. Who knows?

  3. Jonas Says:

    Cool. Show us the source :)

  4. chaoskaizer Says:

    Hi Andreas, what a wonderful web service. I’m a BlogsNow virgin; I just noticed BlogsNow while searching on Y! Site Explorer. I really like the trends for each keyword. If you don’t mind me asking, how is the blog list ordered? By date, content, PR?

    IMO Google did the right thing to thwart all the PayPerPost bloggers. Too much poison and fabrication in the blogosphere is nothing to be admired.

  5. IludiumPhosdex Says:

    How can I recommend adding my own blog to the BlogsNow tracker?
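
Andreas’s reply in comment 2 describes spam only in general terms: it is machine generated, it repeats a pattern, and it goes out by brute force with “enough amplitude to make a radar blip.” A minimal sketch of that idea, under loudly stated assumptions, might reduce each post to a crude template fingerprint and flag any template that shows up on many distinct blogs within one short time window. None of this is BlogsNow’s actual implementation: `fingerprint`, `flag_bursts`, the normalization steps, and the `min_blogs=20` threshold are hypothetical, and the 10-second window merely echoes the “last ten seconds” view mentioned in the post.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hypothetical template fingerprint: lowercase, URLs -> 'URL',
    digit runs -> '#', whitespace collapsed, then hashed."""
    t = text.lower()
    t = re.sub(r"https?://\S+", "URL", t)
    t = re.sub(r"\d+", "#", t)
    t = re.sub(r"\s+", " ", t).strip()
    return hashlib.sha1(t.encode("utf-8")).hexdigest()


def flag_bursts(posts, window_seconds=10, min_blogs=20):
    """posts: iterable of (timestamp_seconds, blog_url, text).

    Returns the set of blog URLs whose post template appears on at least
    min_blogs distinct blogs inside the same fixed time window."""
    # (window index, template fingerprint) -> distinct blogs that posted it
    buckets = defaultdict(set)
    for ts, blog, text in posts:
        key = (int(ts // window_seconds), fingerprint(text))
        buckets[key].add(blog)

    flagged = set()
    for blogs in buckets.values():
        if len(blogs) >= min_blogs:  # the "radar blip": one template, many blogs
            flagged |= blogs
    return flagged
```

The windows here are fixed (tumbling), so a burst that straddles a window boundary is split in two; a sliding window would be more robust at the cost of more bookkeeping. A single human post never reaches the threshold, which matches the reply’s point that this only catches the brute-force kind of spam.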

