MindRetrieve Blog

MindRetrieve - an open source desktop search tool for your personal web

Wednesday, May 31, 2006

From Simple to Complex - A Reflection on Framework Design

For a while I have used the Jakarta Struts framework in web development. I was quite overwhelmed by the number of configuration files and entries I had to wade through to get anything done. All these configuration options are there for a reason. They offer users a degree of flexibility. These are knobs you can turn to set the system up. With these knobs you no longer need to open the box and rewire things when you want to fine-tune things later.

Alas, this makes for a steep learning curve. It takes a lot of work to get even a simple thing done. Also, when things go wrong, it is very hard to track the problem down. Unhappy programmers are the result.

It just occurred to me that there is a pattern to this kind of issue in framework design. A lot of simple things add up. Viewed at the system level, the result is something very complicated. Each Struts option serves a purpose and is not hard to understand by itself. But because you need to use all of them in your application, the overall picture is complicated.

I will use foreign language learning as an example. You learned and memorized many grammar rules: singular vs. plural, subject vs. object and so on. You went through the drilling exercises quite well. Then the challenge came when you needed to read or write a full sentence. Your tongue tied, your mind gone blank. Why? Because a real-life sentence is free-form. Out of the many grammar rules you've learned, perhaps two or three are relevant to the sentence in front of you. The hard part is that you don't know which two or three apply. Although you understand each rule individually, together they become a challenge.

Some lessons I have learned as a user and designer of frameworks are:

1. Look at things at the system level, not just the individual feature level.

2. Use common patterns that users have already learned. A clever one-off trick may be an advantage at the individual level, but it complicates the system as a whole.

3. Design a framework that can do simple work without much training. Users can learn gradually when they need to do something complex.

4. Spend time learning the details of the system. As in the foreign language analogy, sometimes there is no shortcut but to study really hard. If you are a user who is stuck with a complex system you have to use, this applies to you. A framework designer had better stick with point 3 than count on the users' dedication.

5. Finally, provide a tool to manage complexity. If an action has to involve 2 beans, 3 templates, and 5 validation rules, make a tool to help users track them down. A surprising amount of my time in software development is spent simply tracking these things down manually.

Ruby on Rails' convention over configuration philosophy is a good example. It applies one simple rule consistently (the naming convention carries the main logic). It makes simple things easy and complex things possible.
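
To make the idea concrete, here is a minimal sketch of convention over configuration in Python 3. It is not Rails code and not anything from Struts. The class names and the "do_" prefix are hypothetical, chosen only for illustration. The point is that the method name itself is the mapping, so there is no configuration entry to look up.

```python
class Controller:
    """Hypothetical base class: action 'show' is handled by method 'do_show'."""

    def dispatch(self, action, *args):
        # The naming convention replaces a configuration file:
        # the handler is found by name, not by a mapping entry.
        handler = getattr(self, "do_" + action, None)
        if handler is None:
            raise LookupError("no handler for action %r" % action)
        return handler(*args)


class BookmarkController(Controller):
    """Hypothetical controller; adding an action means adding one method."""

    def do_list(self):
        return "listing bookmarks"

    def do_show(self, bookmark_id):
        return "showing bookmark %s" % bookmark_id


controller = BookmarkController()
print(controller.dispatch("list"))
print(controller.dispatch("show", 42))
```

The trade-off is the one discussed above: the rule is easy to learn once, but it only stays simple if it is applied consistently everywhere.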

Monday, May 29, 2006

MindRetrieve got Ajax

Thanks to the Prototype library, MindRetrieve has started to add some Ajax UI, such as an auto-completer for entering tags, which makes it so much easier to pick from existing tags.

I have spent much time struggling with JavaScript and the poorly documented DOM before. Relying on trial and error and tribal knowledge to program DHTML is so painful. I'm glad to find that this new generation of Ajax libraries has brought a big improvement to this situation. They have done what the DOM standard has failed to do, namely to provide a consistent interface for programming dynamic elements across browsers.

Prototype's implementation has a lot of neat things. The $() function as a shorthand for document.getElementById() is just brilliant.

Tuesday, May 23, 2006

Took an RSS break

I have taken a two-week break from reading technology blogs and industry news. Now each of the 20 feeds I subscribe to has at least 50 unread items. Not knowing about the 20 new products that have launched since, I feel out of touch with the industry.

It is interesting how much an RSS feed can change a person. As long as I keep up with the news I feel like an active player. But the world moves so fast that if you stand still for a little while you have a lot of catching up to do.

Thursday, April 27, 2006

berlios outage

berlios, the hosting service of this project, is suffering from another outage. This time I'm counting 16 hours since yesterday afternoon. Over time I've seen many small outages that lasted minutes or a few hours. I should get very serious about finding an alternative provider, especially now that SourceForge finally offers Subversion.

For now, I hope berlios will get back to life soon.

Monday, April 10, 2006

Fixing TortoiseCVS's shattering glass sound

TortoiseCVS plays an irritating sound when it encounters an error. It is downright embarrassing when you are sitting in a cafe and this shattering glass noise comes from your laptop.

What's annoying is that I couldn't find a way to change it or turn it off. I tried the usual places and searched the web. Still I couldn't get the noise to leave me alone. Finally I found the right place to do it. Go to Windows - Control Panel - Sounds and Audio Devices - Sounds Tab. Change the TortoiseCVS Error sound to something gentler.

Tuesday, April 04, 2006

Upgrade notification with Atom

One of the biggest advantages of building a web application over a desktop one is the ease of upgrades and bug fixes. For MindRetrieve I've been looking for a mechanism to notify users of upgrades. At the same time I have to keep the mechanism non-intrusive and respect the user's privacy.

After looking at different mechanisms like CGI and JavaScript in the browser, it just dawned on me that upgrade notification has a lot in common with news syndication. All I have to do is publish a news feed and then put a mini-newsreader into the application. Of course the user can always opt out or even choose to fetch it using their own newsreader.

It is a good day when you find a satisfactory solution to a nuisance issue.
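
The mini-newsreader idea can be sketched in a few lines of Python 3. This is only an illustration, not MindRetrieve's actual code. It assumes, hypothetically, that each release is published as an Atom entry whose title is a dotted version number. The function names and the sample feed are made up, and the sample is embedded as a string so the sketch runs without any network access.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 elements, in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_version(text):
    """Turn '0.8.0' into (0, 8, 0); raises ValueError for anything else."""
    return tuple(int(part) for part in text.split("."))


def newer_releases(feed_xml, current_version):
    """Return the entry titles that are versions newer than current_version."""
    root = ET.fromstring(feed_xml)
    current = parse_version(current_version)
    found = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        try:
            version = parse_version(title)
        except ValueError:
            continue  # an ordinary news entry, not a release
        if version > current:
            found.append(title)
    return found


# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example release feed</title>
  <entry><title>0.8.0</title></entry>
  <entry><title>0.9.0</title></entry>
  <entry><title>Project news, not a release</title></entry>
</feed>"""

print(newer_releases(SAMPLE_FEED, "0.8.0"))  # ['0.9.0']
```

Because the feed is a static document, checking it sends nothing about the user beyond an ordinary HTTP request, and a user who opts out simply never makes that request.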

Monday, March 27, 2006

xFolk - Microformats

I was looking for a bookmark sharing standard. Sure enough, there is a proposed microformat, xFolk.

xFolk is a simple and open format for publishing collections of bookmarks. It better enables services for improving user experience and sharing data in web-based bookmarking software. xFolk may be embedded in (X)HTML, Atom, RSS, and arbitrary XML. It is one of several open microformat standards.

Microformats are simple conventions for embedding semantics in HTML to enable decentralized development. They make remixing of websites possible, as shown in the wonderful Live Clipboard demo.
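
To show what "embedding semantics in HTML" can look like in practice, here is a rough Python 3 sketch that pulls bookmarks out of xFolk-style markup with the standard library's html.parser. The class names "xfolkentry" and "taggedlink" and the rel="tag" links are used here as I understand the xFolk draft, so treat them as assumptions and check the specification before relying on them. The parser class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class XFolkParser(HTMLParser):
    """Collects {'url': ..., 'tags': [...]} for each xfolkentry element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None       # entry being built, or None outside entries
        self._container = None   # tag name of the current xfolkentry element
        self._depth = 0          # nesting depth of tags with that same name
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._entry is None:
            if "xfolkentry" in classes:
                self._entry = {"url": None, "tags": []}
                self._container, self._depth = tag, 1
            return
        if tag == self._container:
            self._depth += 1
        if tag == "a":
            if "taggedlink" in classes:
                self._entry["url"] = attrs.get("href")
            if "tag" in (attrs.get("rel") or "").split():
                self._in_tag_link = True
                self._entry["tags"].append("")

    def handle_data(self, data):
        # The visible text of a rel="tag" link is taken as the tag name.
        if self._in_tag_link:
            self._entry["tags"][-1] += data

    def handle_endtag(self, tag):
        if self._entry is None:
            return
        if tag == "a":
            self._in_tag_link = False
        if tag == self._container:
            self._depth -= 1
            if self._depth == 0:
                self.entries.append(self._entry)
                self._entry = None
                self._in_tag_link = False


# Hypothetical markup, for illustration only.
SAMPLE = """
<ul>
  <li class="xfolkentry">
    <a class="taggedlink" href="http://example.com/">Example</a>
    <a rel="tag" href="http://example.com/tags/demo">demo</a>
    <a rel="tag" href="http://example.com/tags/test">test</a>
  </li>
</ul>
"""

parser = XFolkParser()
parser.feed(SAMPLE)
parser.close()
print(parser.entries)
```

The page stays ordinary HTML that any browser can show, while a program that knows the convention can still recover the bookmark and its tags.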

Friday, March 24, 2006

MindRetrieve postcard

This is a postcard I've made as promotional material. I find it more creative than handing out name cards. You can print it on photo paper at home. Or you can send it to a photo lab to print a big stack. Either way it probably turns out quicker and cheaper than name cards. The color is great. The 4x6 inch space affords a great deal of creativity.

The amateurish graphic design is mine. You may think I've got a lot of nerve to actually hand this out to people. At least I can claim some novelty :)

Wednesday, March 22, 2006

Announce: interactive JavaScript console

I've been rather productive in cranking out little tools lately. Here is another one.

js_console is a JavaScript console you can insert into your web page. It allows you to test JavaScript statements interactively in the context of your web page. Try a demo and download from


Wednesday, March 08, 2006

Google Desktop Search cannot find moved files

The story about Google Desktop Search being unable to find moved files stirred up some noise in the blogosphere. Admittedly it is a hard problem, and it would require good platform support to do right. It is the same dilemma I faced when implementing the tag-the-files feature. I chose to leave this problem alone for now (while leaving enough nexus so that it can be fixed later). This allowed me to put out a simple feature in days. If it proves popular I can always go back and spend time to resolve it.

Monday, February 13, 2006

Announce: HTMLTestRunner - generates HTML test report for unittest

I'd like to announce the release of the Python library HTMLTestRunner, a by-product of the MindRetrieve project.

HTMLTestRunner is an extension to the Python standard library's unittest module. It generates easy-to-use HTML test reports. See a sample report at

Check more information and download from


Actually this was posted on the comp.lang.python newsgroup a while ago. Thank you, Cameron Laird, for picking it up in Dr. Dobb's Python-URL!

Friday, January 20, 2006

Great gathering last night

Thanks to Scott of Ookles for organizing the great gathering in San Francisco last night. It was a great pleasure to meet so many prominent people who are pushing the Internet frontier.

Geeks are officially cool again!

Also I have to thank Scott for being the inspiration behind the new tag-the-file feature!

Thursday, January 19, 2006

Announce release of version 0.8.0

I'm glad to announce the first major release in almost a year. New features in version 0.8.0 include:

Web library - tag-based bookmarking system
Tag-based categorization
Tag files on local disk (Windows XP)

Download via the MindRetrieve website.

Wednesday, January 18, 2006

Added IE support

After months of developing on Firefox and Opera only, I have finally ported the DHTML code to IE. Well, almost, except for some small bugs. IE6 is just hideous. I really should have waited and supported IE7 only.

Friday, January 13, 2006

Efficient character escapes decoding

I ran into a technical issue using Python's 'unicode_escape' codec on unicode strings. I put a question to the comp.lang.python newsgroup, thinking it was probably too technical for anyone to care. But the Python community never fails me. Thank you to Steven Bethard for coming up with a great suggestion that works for me. I have written up the solution in a Python Cookbook recipe.

Hope it will be useful for other people too.
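
For readers who have not met this kind of problem, here is one generic way to decode backslash escapes inside a string that may already contain non-ASCII characters. It is a sketch in Python 3 using a regular expression, and it is not the cookbook recipe or Steven Bethard's suggestion. The function name is made up, and it only handles a small, assumed subset of escapes (\n, \r, \t, \\, \xNN and \uNNNN).

```python
import re

# One pass over the text; each match is a backslash plus an escape body.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|[nrt\\])")
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def decode_escapes(text):
    """Replace the supported escape sequences in text with real characters."""

    def replace(match):
        body = match.group(1)
        if body[0] in "ux":
            # \uNNNN or \xNN: the hex digits name a code point.
            return chr(int(body[1:], 16))
        return _SIMPLE[body]

    return _ESCAPE.sub(replace, text)


# The input mixes a literal non-ASCII character with escaped ones.
print(decode_escapes("caf\\u00e9 \u4e2d\\n") == "caf\u00e9 \u4e2d\n")  # True
```

Because the substitution scans left to right in a single pass, an escaped backslash followed by "n" stays a backslash and an "n" instead of turning into a newline.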

Thursday, January 12, 2006

Synchronization with Simple Sharing Extensions

I have come across the Simple Sharing Extensions for RSS and OPML (SSE) specification from Microsoft. It defines the minimum extensions necessary to enable loosely-cooperating apps to use RSS as the basis for item sharing – that is, the bi-directional, asynchronous replication of new and changed items amongst two or more cross-subscribed feeds.

This seems to be a great fit for synchronizing the weblib among different repositories. The essence of SSE is a set of items, each of which has a globally unique id, a timestamp and an increasing version number. This has inspired me to make another revamp of the Weblib file specification to add all those elements. The only remaining issue for me is the globally unique id. I need to figure out a simple way to generate them across distributed systems (I feel UUID is too heavyweight). The implementation is available in SVN, although I haven't updated the specification yet.

Synchronization is a tough issue that I haven't thought all the way through. But it probably can't go wrong to follow Ray Ozzie, who created Lotus Notes.
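
To illustrate why those three fields are enough to get started, here is a deliberately simplified merge in Python 3. It is not the code in SVN and it does not follow the SSE specification's own conflict rules. It only assumes that each item carries an id, a version and a timestamp, and that the higher version wins, with the later timestamp breaking ties. The Item class, the merge function and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    id: str            # globally unique id (how to generate it is left open)
    version: int       # incremented on every change
    updated: datetime  # time of the last change
    title: str


def merge(local, remote):
    """Combine two item lists, keeping the newest copy of each id."""
    merged = {item.id: item for item in local}
    for item in remote:
        mine = merged.get(item.id)
        # Higher version wins; the timestamp only breaks ties.
        if mine is None or (item.version, item.updated) > (mine.version, mine.updated):
            merged[item.id] = item
    return sorted(merged.values(), key=lambda item: item.id)


# Hypothetical sample data, for illustration only.
local = [
    Item("a", 1, datetime(2006, 1, 1), "old title"),
    Item("b", 2, datetime(2006, 1, 2), "only changed locally"),
]
remote = [
    Item("a", 2, datetime(2006, 1, 3), "new title"),
    Item("c", 1, datetime(2006, 1, 4), "added remotely"),
]

for item in merge(local, remote):
    print(item.id, item.version, item.title)
```

With this rule the result is the same whichever side starts the merge, which is what lets two cross-subscribed repositories converge. Two different edits that end up with the same version and timestamp are not detected here; a real implementation needs a proper conflict rule for that case.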

Wednesday, January 11, 2006

Quick key and the Petname system

I have come across an interesting article, An Introduction to Petname Systems. It echoes a quick key feature I am building, which associates a short phrase with a frequently used URL. For example, the phrase 'amex' can be used as the key to the 'American Express' web site. The main goal of the article is security (against phishing), while for MindRetrieve the goal is to make a slick user interface. Nevertheless it helps me understand what I am doing: an implicit system for users to assign petnames to items!
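
At its core a quick key table is just a private, user-owned mapping from a short phrase to a URL. The Python 3 sketch below shows that idea only. It is not MindRetrieve's implementation, the class and method names are hypothetical, and the URL is a placeholder rather than a real address.

```python
class QuickKeys:
    """Hypothetical per-user table of petnames (short phrases) for URLs."""

    def __init__(self):
        self._urls = {}

    @staticmethod
    def _normalize(phrase):
        # Treat 'Amex', ' amex ' and 'amex' as the same key.
        return phrase.strip().lower()

    def assign(self, phrase, url):
        self._urls[self._normalize(phrase)] = url

    def resolve(self, phrase):
        """Return the URL for phrase, or None if the user never assigned it."""
        return self._urls.get(self._normalize(phrase))


keys = QuickKeys()
keys.assign("amex", "https://www.example.com/amex")  # placeholder URL
print(keys.resolve("Amex "))
print(keys.resolve("unknown"))
```

The security angle of petnames comes from the same property: the name only resolves to what this user assigned, so a look-alike site cannot claim it.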

Tuesday, January 03, 2006

Text vs. multimedia?

The heat is on once again in the advance of internet technologies. The word is that we will move beyond text-based communication to audio and video: from text blogging to audio podcasting to video blogging, from SMS to MMS, from searching text to searching audio and video, etc. The humble text is only the first step in technology development. Eventually technology will be advanced enough to enable the full glory of multimedia.

I feel rather lukewarm about this. I often pass up a news video for a text article as I grow impatient with video's pace. With text I can scan back and forth much more easily. User-generated content? That reminds me of those home videos where the camera never stops panning left and right. And when the subject talks, he is actually off-screen ;)

Multimedia was the buzzword of the last generation. As the enabling technology, CD-ROM, became popular, people envisioned that our PCs would be filled with sight and sound. While we now see a lot more graphics on our PCs than in the early days, suffice it to say the focus is still on text, whether we are using email or the web. Anyway, none of the multimedia companies has become Google.

I think people who consider video and audio more advanced than text are only looking at it from a computer engineering perspective. This really does not do justice to text. From a different perspective I would say audio and video are more primitive because they are based on our biological sensory perception, whereas text is truly the great innovation that revolutionized communication.

Friday, December 30, 2005

Incremental Development

I was looking at the subversion checkin log at


I notice I have made many, many small checkins, often several times a day, and each checkin includes a group of several files (taking into consideration that I am not working on this full time). This style is quite different from my work on other projects, where I do far fewer checkins but usually in larger chunks.

Perhaps this says something about productivity? Perhaps I was acting thoughtlessly because I'm the only developer right now. But just now I have come to another characterization - this is incremental development!

Each time I make small changes: add a new feature or a method, refactor code, fix a bug, add a test. I make the code changes, test them, and then check in. The code base is functional most of the time. Seldom do I make big changes that break the code base for several days or more.

Is incremental development the best development process? I'll leave that to another discussion. But from a developer's perspective, having a functional system most of the time and being able to test and verify any code change easily is wonderful. Every time I do a checkin I have the satisfaction that something is done. Coming from an environment where changing only one line of code would lead to the tedious work of building a test environment and a painful testing process, this is just pure joy.

Wednesday, December 28, 2005

Keep Your Article in One Single Web Page

There is a style widely used in web publishing. It says that people don't like to read long articles, and that one should keep modem users in mind, since a long web page results in a slow download, and so on. If you have a long article to publish, break it down into several short sections and let the user read it page by page.

I have arrived at a contrarian view. I think breaking a long article down into several pages is a hassle for the users. It is best to publish the entire article on one single page. The issue is that in order to finish the article I need to click next page several times, and every time there is a delay that interrupts the momentum. Usually the delay is short, like one or two seconds. But it is a noticeable delay. Scrolling is seamless in comparison. For slow sites the delay is much worse. Some sites routinely take 10 seconds or more. That feels like an episode ending in a cliffhanger, where we have to wait for the next episode.

Are several short pages a better layout than one long page? I actually prefer the long one. The scroll bar gives a good indication of how far into the article I have progressed. The scroll wheel is very handy for navigation. I can also use the browser to search for a word that appears anywhere in the article. All of this is better than arbitrarily breaking an article into several parts and leaving the user only a narrow window. The speed issue is a non-issue. Any browser should be able to render incrementally so that you can start reading as soon as any text arrives. Even my cellphone does this flawlessly.

In some cases the multiple-page format backfires badly. I'm glad that Yahoo has a mobile version made for cellphones at http://mobile.yahoo.com/. I thought it would be great for reading email. It turns out the issue with the cellphone data network is not just low bandwidth; every time I click a link there is a long delay before I get any response. As Yahoo mobile breaks every screen into 15 lines or so, even the simplest email costs me multiple clicks to read through. The long delay between clicks makes it plain unusable. I ended up going back to the regular web interface. Even though I have to scroll through tons of irrelevant stuff to get to the email body, I still prefer it to the mobile version.

Of course my comments should not be generalized too far. I have tried loading an entire technical manual as a single web page (from my hard drive). The browser takes a noticeable delay to process a web page of this size. But I still keep the manual in this format for the ease of searching.