Archive | January, 2004

Ode to Microsoft

Posted on 30 January 2004 by Demian Turner

Thanks to my old colleagues in Spain for this one, a familiar tune but with a twist.

Winamp/GNUtella Creator off to Greener Pastures

Posted on 28 January 2004 by Demian Turner

Another story from the ‘programming hero’ bag: Justin Frankel, the creator of Winamp, leaves the AOL umbrella, perhaps to the relief of company execs.

In case you weren’t aware, Frankel is the programmer behind Winamp, almost certainly the most popular Windows MP3 player out there. His company, Nullsoft, was bought by AOL for $86 million last year.

Since then, Frankel has been busy. First he programmed Gnutella, an open source file sharing application that can be used as a replacement for Napster, among other things. This did not make AOL happy, especially because they were starting merger talks with Time Warner at the time.

Now comes part two. Frankel has programmed a Winamp plugin that blocks out the advertisements in AOL’s AIM instant messaging program. The ad areas are currently used to advertise AOL features; install Frankel’s plugin and they’re replaced by plain white boxes. Start playing a tune in Winamp and the boxes turn into a graphical display that changes with the music.

Also check out the code section of his site; there is some interesting source to browse.

ZOE – Googling your Email

Posted on 24 January 2004 by Demian Turner

Bumped into the ZOE project about a month ago and have finally had time to play around with it a bit. ZOE is a Java project that’s basically an information aggregator. To get things up and running on Windows it’s just a matter of unzipping the archive and double-clicking the Zoe.jar file.

ZOE has its own web server and runs standalone on your local computer; you interact with it through a web browser. What was interesting for me about the software was a) its minimalist, Zen interface, and b) its unified handling of aggregated data.

The idea is that you use ZOE to handle your email, as an RSS aggregator, as an FTP client, etc.  Then ZOE links the data in terms of data sources, be it people, blogs, websites, or emails. All are searchable via a single interface powered by the Lucene search engine.

What is going on behind the scenes in terms of the built-in database must be quite interesting.  And what a luxury to be able to use the software with zero technical knowledge or setup work – double-click and you’re off.

Here are some screenshots to give you an idea:

1. Main
2. Search
3. Message
4. Thread
5. List
6. Name
7. Organization
8. Domain
9. Timeline
10. Preferences
11. Account
12. Stylesheet
13. Login
14. RSS
15. Entourage
16. FTP
17. Mail.app
18. Blogger
19. Aggregator

Apache Monitoring

Posted on 24 January 2004 by Demian Turner

In the same vein as the previous article, here’s an interesting discovery I made regarding performance monitoring of Apache.

We’re doing fairly heavy traffic at the moment, over 200 reqs/sec to a single web server running PHP/MySQL. Running Apache’s status module gives some good interactive feedback regarding process/child/slot status, busy threads, CPU usage, etc., all summarised at http://example.com/server-status.

For an idea of the output check out the status pages of the Apache or PHP sites.  Although this kind of feedback is invaluable, more useful still would be something that keeps a record of this data over time. 

Enter the Apache.mrtg project.

Getting the stats takes just one cronjob: it fires off MRTG, which reads a config file, which in turn takes its input from a Perl script. The Perl script gets its input from any website’s server-status output, provided you request it in the form

http://www.example.com/server-status?auto
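
As a rough illustration of what the script has to do with that output, here is a minimal Python sketch that turns the "Key: value" lines of the machine-readable status page into a dictionary. The function name parse_status and the sample text are assumptions made up for this example, and the exact field names vary between Apache versions.

```python
def parse_status(text):
    """Parse the 'Key: value' lines of a server-status?auto response into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        try:
            # Numbers with a decimal point become floats, the rest ints.
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # non-numeric fields such as the scoreboard
    return stats


# Hypothetical sample; real field names depend on the Apache version.
SAMPLE = """Total Accesses: 184532
Total kBytes: 902114
Uptime: 86400
ReqPerSec: 2.13579
BusyServers: 12
IdleServers: 8
Scoreboard: WW_K..__W
"""

stats = parse_status(SAMPLE)
print(stats["BusyServers"], stats["ReqPerSec"])
```

A monitoring script would fetch the page on each run and hand two of these numbers to MRTG for graphing.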

Here are some examples of the output:

# xpto.org
# apache.org
# samba.org
# www.php.net

The cronjob I used looks something like this:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * env LANG=C /path/to/mrtg /path/to/apache.mrtg.cfg

Getting Intimate with your Applications

Posted on 24 January 2004 by Demian Turner

Part of being able to write decent code is having the ability to know exactly what’s going on with the various services you’re interacting with, the webserver, the database and PHP itself for starters. If you’re using a database abstraction layer, sometimes you’d be surprised by the SQL that results from your API calls. This is even more the case if you’re using a data access layer like PEAR’s DB_DataObject, for example, which sits on top of PEAR::DB.

I recently had an issue where the integrity of a transaction was failing, but the point of failure was difficult to pinpoint. The first problem to tackle was the use of sequences, which are a great way to avoid getting locked in to how a particular database vendor handles incrementing record IDs; I’m thinking here of the MySQL-only concept of auto-increment.
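
To make the sequence idea concrete, here is a minimal sketch in Python with SQLite. It assumes a hypothetical counter table called sequences and a helper called next_id; it is not how PEAR::DB implements sequences, just the general pattern of asking for the next ID before running the INSERT.

```python
import sqlite3


def next_id(conn, seq_name):
    """Return the next value of a named sequence, creating it on first use."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sequences "
            "(name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
        )
        cur = conn.execute(
            "UPDATE sequences SET value = value + 1 WHERE name = ?", (seq_name,)
        )
        if cur.rowcount == 0:
            # First request for this sequence: start counting at 1.
            conn.execute(
                "INSERT INTO sequences (name, value) VALUES (?, 1)", (seq_name,)
            )
        row = conn.execute(
            "SELECT value FROM sequences WHERE name = ?", (seq_name,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

user_id = next_id(conn, "users")  # the ID is known before the row exists
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, "alice"))
conn.commit()
print(user_id)
```

Because the application fetches the ID itself, the same code path works whether or not the underlying database has an auto-increment feature.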

FOAF Degrees of Separation

Posted on 11 January 2004 by Demian Turner

Thanks to the FOAF standard, by creating an XML file with some of your personal details you can do lots of fun things:

To get up and running in no time you can use the FOAF-a-Matic tool. Or you can use Davey Shafik’s new contribution to PEAR: XML_FOAF.

Sites like http://www.ecademy.com generate your FOAF details when you register, as does Drupal, I believe. I took Sebastian Bergmann’s example and added a few fields I saw in other people’s profiles.
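
For a feel of what such a file contains, here is a minimal Python sketch that builds a FOAF document with one person, a homepage and a list of people they know. The helper name foaf_person and the sample names and URL are invented for illustration; this is not the XML_FOAF API, and real profiles usually carry more fields.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def foaf_person(name, homepage, knows=()):
    """Build a minimal FOAF document describing one person and who they know."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = name
    ET.SubElement(
        person, f"{{{FOAF_NS}}}homepage", {f"{{{RDF_NS}}}resource": homepage}
    )
    for friend in knows:
        # Each foaf:knows wraps another foaf:Person, which is what makes
        # "degrees of separation" walks possible.
        link = ET.SubElement(person, f"{{{FOAF_NS}}}knows")
        other = ET.SubElement(link, f"{{{FOAF_NS}}}Person")
        ET.SubElement(other, f"{{{FOAF_NS}}}name").text = friend
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data.
xml_text = foaf_person(
    "Jane Example", "http://www.example.com/", knows=["John Example"]
)
print(xml_text)
```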

Lamo’s Adventures in WorldCom

Posted on 09 January 2004 by Demian Turner

A real epic 😉

As he has with other networks, Lamo found the keys to WorldCom’s kingdom in open Internet proxy servers. In normal operation, a proxy server is a dedicated machine that sits between a local network and the outside world, passing internal surfers’ Web requests out to the Internet, often caching the results to speed up subsequent visits to the same URL.

But it’s easy and common for administrators to inadvertently misconfigure proxy servers, allowing anyone on the Internet to channel through them. Sometimes companies and organizations even unknowingly run proxies. Hackers and privacy-conscious netizens catalog these open proxies, using them to anonymize their surfing. Lamo has perfected a different use: jumping through them to pose as a node on a company’s internal network.

Want your Secure Session Implementation Reviewed?

Posted on 08 January 2004 by Demian Turner

A very interesting offer over at PHP Magazine:

Have you read the first digital issue of the PHP Magazine yet? If not you should. The first issue of the magazine is on us – you can download it free. We feature a brand new cover story titled “The Truth about Sessions” that takes a detailed look at implementing a secure session management mechanism with PHP.

Chris Shiflett, who wrote the cover story for us, has been kind enough to accept submissions from people who think they have a good implementation for securing sessions. In exchange, he will (hopefully) be able to reply to each person with a review of their implementation. He then plans to compile a list of his favorite techniques, and it could turn into another full-fledged article, that we promise to give away FREE.

Interested? Turn in your submissions to shiflett@php.net

MSN Google Shootout

Posted on 07 January 2004 by Demian Turner

MSN To Settle Paid Inclusion vs. Algorithmic Results. Paid inclusion will become a battleground in 2004. In January 2004, Yahoo is expected to replace Google with Inktomi to power its main search results. Inktomi has a paid inclusion program, which is being combined with the paid inclusion programs of AltaVista and FAST and will be sold by Overture through resellers like Marketleap and Position Technologies. In 2004, this will fuel an ongoing debate between Google, which does not support paid inclusion philosophically, and Yahoo, which does.

Google will argue, “Our search results represent our editorial integrity, and we have no plans to alter our automated process, which works very well in gathering information and delivering highly relevant results.” Yahoo will argue, “Paid inclusion maximizes your reach by including pages that otherwise might not be crawled.”

The debate will become heated and watched closely by Microsoft, which plans to build its own crawler-based search engine. The winner will be determined when Microsoft announces which approach it believes provides the most relevant results. This won’t happen until late 2004 or early 2005.

Greg Jarboe
http://www.seo-pr.com

MSN Follows Google’s Lead. MSN officially launches MSN search based on their own technology (mid to late 2004). After some AskJeeves-like attempts to show paid results for the majority of the SERP screen real estate, they realize nobody trusts this model and decide to go to a Google-like spare screen with only Overture results on the right and two paid results clearly labeled at the top of the page.

Since dropping Looksmart paid results in January, they announce they’ve been developing their own PPC engine and will spend 6 billion dollars in developing it over the next ten years, incorporating search into the Longhorn operating system – delayed again year-end to make security upgrades.

Understanding Classic SoundEx Algorithms

Posted on 07 January 2004 by Demian Turner

Search Names & Phrases Based on Phonetic Similarity:

Terms that are often misspelled can be a problem for database designers. Names, for example, are variable length, can have strange spellings, and they are not unique. American names have a diversity of ethnic origins, which give us names pronounced the same way but spelled differently and vice versa.

Words can be misspelled or have multiple spellings, especially across different cultures or national sources.

To solve this problem, we need phonetic algorithms which can find similar sounding terms and names. Just such a family of algorithms exists; they are called SoundExes, after the first patented version.

A Soundex search algorithm takes a word, such as a person’s name, as input and produces a character string which identifies a set of words that are (roughly) phonetically alike. It is very handy for searching large databases when the user has incomplete data.

The original Soundex algorithm was patented by Margaret O’Dell and Robert C. Russell in 1918. The method is based on the six phonetic classifications of human speech sounds (bilabial, labiodental, dental, alveolar, velar, and glottal), which in turn are based on where you put your lips and tongue to make the sounds.

The algorithm is fairly straightforward to code and requires no backtracking or multiple passes over the input word. In fact, it is so straightforward, I will start by presenting it as an outline. I will continue on to give C, JavaScript, and Perl code as well later.

Great article; learn more about Soundex here.

I got a basic spider built the other day and along with Stargeek’s keyword tools mentioned earlier this week, using soundex is a good way to make your searches smarter.
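
For the curious, here is a minimal Python sketch of the classic Soundex coding in a single pass over the word. It follows one common formulation (keep the first letter, code the remaining consonants into six groups, collapse repeats, treat h and w as transparent, pad to four characters); the article's own C, JavaScript and Perl versions may differ in the details.

```python
# Consonant groups of classic Soundex, numbered 1 to 6.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the four-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's code also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is kept
    return (result + "000")[:4]  # pad with zeros, truncate to four characters


for name in ("Robert", "Rupert", "Smith", "Smyth"):
    print(name, soundex(name))
```

Names that sound alike, such as Smith and Smyth, end up with the same code, so a search can compare codes instead of exact spellings.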

I now work with a small army of search engine experts and had a shock when a colleague told me the percentage of users who use Google in the UK: only 45%! In plain English that means the majority of web users in this country are lost in the backwash of paid results offered by the likes of MSN, Overture, etc. The top search term on MSN is ‘www.hotmail.com’ 😉
