MindRetrieve Blog

From Simple to Complex - A Reflection on Framework Design

2006-05-31T13:31:00.000-07:00

I have used the Jakarta Struts framework for web development for a while. I was quite overwhelmed by the number of configuration files and entries I had to wade through to get anything done. All these configurations are there for a reason. They offer users a degree of flexibility. They are knobs you can tune to set the system up. With these knobs you no longer need to open the box and rewire things when you want to fine-tune things later.

Alas, this makes for a steep learning curve. It takes a lot of work to get even a simple thing done. And when things go wrong, it is very hard to track down the cause. Unhappy programmers are the result.

It just occurred to me that there is a pattern to this kind of issue in framework design. A lot of simple things add up. Viewed at the system scope, they result in something very complicated. The options in Struts all serve a purpose, and none is hard to understand by itself. But because you need to use all of them in your application, the overall picture is complicated.

I will use foreign language learning as an example. You learned and memorized many grammar rules: singular vs. plural, subject vs. object and so on. You did the drilling exercises quite well. Then the challenge came when you needed to read or write a full sentence. Your tongue was tied and your mind went blank. Why? Because a real-life sentence is free-form. Of the many grammar rules you've learned, perhaps two or three are relevant to the sentence in front of you. The hard part is that you don't know which two or three apply. Although you understand each rule individually, together they become a challenge.

Some lessons I have learned as a user and designer of frameworks are:

1. Look at things at system level, not just individual feature level.

2. Use common patterns that users have already learned. A clever one-off trick may be an advantage at the individual level, but it complicates the system as a whole.

3. Design a framework that can do simple work without much training. Users can learn gradually when they need to do something complex.

4. Spend time learning the details of the system. As in the foreign language analogy, sometimes there is no shortcut but to study really hard. If you are a user stuck with a complex system you have to use, this applies to you. Framework designers, however, are better off sticking with point 3 than assuming their users' dedication.

5. Finally, provide a tool to manage complexity. If an action has to involve 2 beans, 3 templates, and 5 validation rules, make a tool to help users track them down. Surprisingly much of my time in software development is spent simply tracking these things down manually.

Ruby on Rails' convention over configuration philosophy is a good example. It uses a simple rule consistently (naming conventions carry most of the logic). It makes simple things easy and complex things possible.
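To make the idea concrete, here is a minimal Python sketch of convention-based dispatch. It is not how Rails works internally; the controller classes, the `dispatch` function and the `/controller/action/args` URL layout are all hypothetical. The point is that no route table is configured anywhere: the URL segment `bookmarks` reaches the class `BookmarksController` purely through a naming rule.

```python
class BookmarksController:
    """Hypothetical controller; found by its name, never registered anywhere."""

    def list(self):
        return "all bookmarks"


class TagsController:
    def show(self, tag):
        return "pages tagged " + tag


def dispatch(path, namespace):
    """Route '/<controller>/<action>/<args...>' using naming conventions only.

    'bookmarks' -> class BookmarksController, 'list' -> method list().
    """
    controller, action, *args = path.strip("/").split("/")
    cls = namespace.get(controller.capitalize() + "Controller")
    # Refuse private names and anything the controller does not define.
    if cls is None or action.startswith("_") or not hasattr(cls, action):
        raise LookupError("no handler for " + path)
    return getattr(cls(), action)(*args)


print(dispatch("/bookmarks/list", globals()))      # all bookmarks
print(dispatch("/tags/show/python", globals()))    # pages tagged python
```

Adding a page means adding a class or method with the right name. There is no separate configuration entry to keep in sync, which removes exactly the kind of cross-file bookkeeping that point 5 complains about.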

MindRetrieve got Ajax

2006-05-29T23:13:00.000-07:00

Thanks to the Prototype library, MindRetrieve has started to add some Ajax UI, like an auto-completer for entering tags that makes it much easier to pick from existing tags.
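On every keystroke, the server side of a tag auto-completer has to answer one question: which existing tags start with what the user has typed so far? Here is a minimal Python sketch. It assumes the tags are kept in a sorted in-memory list and matched case-sensitively, which is a hypothetical arrangement and not necessarily how MindRetrieve stores them.

```python
import bisect


def complete(sorted_tags, prefix, limit=10):
    """Return up to `limit` tags that start with `prefix`.

    `sorted_tags` must be sorted. All matches sit next to each other in
    sorted order, so bisect finds the first one in O(log n) and we read
    forward until a tag no longer matches.
    """
    matches = []
    i = bisect.bisect_left(sorted_tags, prefix)
    while i < len(sorted_tags) and len(matches) < limit:
        tag = sorted_tags[i]
        if not tag.startswith(prefix):
            break
        matches.append(tag)
        i += 1
    return matches


tags = sorted(["python", "perl", "ajax", "prototype", "personal", "php"])
print(complete(tags, "pe"))   # ['perl', 'personal']
```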

I have spent a lot of time struggling with JavaScript and the poorly documented DOM before. Relying on trial and error and tribal knowledge to program DHTML is so painful. I'm glad to find that this new generation of Ajax libraries has brought a big improvement. They have done what the DOM standard failed to do, namely provide a consistent interface for programming dynamic elements across browsers.

Prototype's implementation has a lot of neat things. The $() function as a shorthand for document.getElementById() is just brilliant.

Took an RSS break

2006-05-23T09:10:00.000-07:00

I took a two-week break from reading technology blogs and industry news. Now each of the 20 feeds I've subscribed to has at least 50 unread items. Not knowing about the 20 new products that have launched since then, I feel out of touch with the industry.

It is interesting how much an RSS feed can change a person. As long as I keep up with the news, I feel like an active player. But the world moves so fast that if you stand still for a little while, you have a lot of catching up to do.

berlios outage

2006-04-27T08:26:00.000-07:00

berlios, the hosting service of this project, is suffering from another outage. This time I'm counting 16 hours since yesterday afternoon. Over time I've seen many small outages that lasted minutes or a few hours. I should get very serious about finding an alternative provider, especially now that SourceForge finally offers Subversion.

For now, I hope berlios will come back to life soon.

Fixing TortoiseCVS's shattering glass sound

2006-04-10T16:08:00.000-07:00

TortoiseCVS plays an irritating sound when it encounters an error. It is downright embarrassing when you are sitting in a cafe and this shattering-glass noise comes out of your laptop.

What's annoying is that I couldn't find a way to change it or turn it off. I tried the usual places and searched the web, and still I couldn't get the noise to leave me alone. Finally I found the right place to do it: go to Windows - Control Panel - Sounds and Audio Devices - Sounds tab, and change the TortoiseCVS Error sound to something gentler.

Upgrade notification with Atom

2006-04-04T21:38:00.000-07:00

One of the biggest advantages of building a web application over a desktop one is the ease of upgrades and bug fixes. For MindRetrieve I've been looking for a mechanism to notify users of upgrades. At the same time, the mechanism has to be non-intrusive and respect the user's privacy.

After looking at different mechanisms like CGI and JavaScript in the browser, it dawned on me that upgrade notification has a lot in common with news syndication. All I have to do is publish a news feed and put a mini-newsreader into the application. Of course the user can always opt out, or even choose to fetch the feed with their own newsreader.
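The mini-newsreader side can be quite small. Below is a sketch using Python's `xml.etree.ElementTree` on an Atom feed. The feed content, the example.com links and the convention that every entry title ends in a version number are assumptions for illustration. Fetching the feed over HTTP, and honoring the user's opt-out, are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # Atom 1.0 namespace

# Hypothetical release feed; assumes each entry title ends with a version number.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>MindRetrieve releases</title>
  <entry>
    <title>MindRetrieve 0.8.0</title>
    <link href="http://example.com/mindretrieve-0.8.0"/>
  </entry>
  <entry>
    <title>MindRetrieve 0.8.1</title>
    <link href="http://example.com/mindretrieve-0.8.1"/>
  </entry>
</feed>"""


def parse_version(text):
    """'0.8.1' -> (0, 8, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def check_for_upgrade(feed_xml, current_version):
    """Return (version, link) of the newest release above current_version, or None."""
    current = parse_version(current_version)
    newest = None
    for entry in ET.fromstring(feed_xml).findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        version = parse_version(title.rsplit(" ", 1)[-1])
        link = entry.find(ATOM + "link")
        href = link.get("href") if link is not None else None
        if version > current and (newest is None or version > newest[0]):
            newest = (version, href)
    if newest is None:
        return None
    return ".".join(map(str, newest[0])), newest[1]


print(check_for_upgrade(FEED, "0.8.0"))
# ('0.8.1', 'http://example.com/mindretrieve-0.8.1')
```

Because the application only reads a public feed, nothing about the user is sent to the server beyond an ordinary HTTP request. This is part of what makes the feed approach privacy-friendly.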

It is a good day to find a satisfactory solution to a nuisance issue.

xfolk - Microformats

2006-03-27T23:41:00.000-08:00

I was looking for a bookmark sharing standard. Sure enough, there is a proposed microformat, xFolk.

xFolk is a simple and open format for publishing collections of bookmarks. It better enables services for improving user experience and sharing data in web-based bookmarking software. xFolk may be embedded in (X)HTML, Atom, RSS, and arbitrary XML. It is one of several open microformat standards.

Microformats are simple conventions for embedding semantics in HTML to enable decentralized development. They make remixing websites possible, as shown in the wonderful Live Clipboard demo.
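As a rough illustration of what publishing a bookmark as xFolk could look like, here is a small Python generator. The class names `xfolkentry`, `taggedlink` and `description`, and the `rel="tag"` links, follow the xFolk draft as I understand it; check them against the current draft before relying on them. The `/tag/` base URL and the function name are hypothetical.

```python
from html import escape
from urllib.parse import quote


def xfolk_entry(url, title, tags, description="", tag_base="/tag/"):
    """Render one bookmark as an xFolk entry.

    xfolkentry / taggedlink / description are the xFolk class names; tags
    use rel="tag", where the last path segment of href names the tag.
    tag_base is a hypothetical tag-space URL.
    """
    lines = ['<div class="xfolkentry">',
             '  <a class="taggedlink" href="%s">%s</a>'
             % (escape(url, quote=True), escape(title))]
    if description:
        lines.append('  <p class="description">%s</p>' % escape(description))
    for tag in tags:
        href = tag_base + quote(tag, safe="")   # percent-encode the tag name
        lines.append('  <a rel="tag" href="%s">%s</a>'
                     % (escape(href, quote=True), escape(tag)))
    lines.append("</div>")
    return "\n".join(lines)


print(xfolk_entry("http://www.python.org/", "Python", ["python", "language"],
                  "The Python programming language"))
```

Because the markup is plain HTML with agreed-upon class names, the same page reads fine in a browser and can be scraped for bookmarks by any tool that knows the convention.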

MindRetrieve postcard

2006-03-24T08:22:00.000-08:00

This is a postcard I've made as promotional material. I find it more creative than handing out name cards. You can print it on photo paper at home, or send it to a photo lab to print a big stack. Either way, it probably turns out quicker and cheaper than name cards. The color is great, and the 4x6 inch space affords a great deal of creativity.

The amateurish graphic design is mine. You may think I have a lot of nerve to actually hand this out to people. At least I can claim some novelty :)

Announce: interactive javascript console

2006-03-22T07:57:00.000-08:00

I have been rather productive at cranking out little tools lately. Here is another one.

js_console is a JavaScript console you can insert into your web page. It allows you to test JavaScript statements interactively in the context of your web page. Try the demo and download it from

http://tungwaiyip.info/software/js_console.html

Google Desktop Search cannot find files moved

2006-03-08T13:00:00.000-08:00

The story about Google Desktop Search being unable to find moved files has stirred up some noise in the blogosphere. Admittedly it is a hard problem, and it would require good platform support to do it right. It is the same dilemma I faced when implementing the tag-the-files feature. I chose to leave this problem alone for now (while leaving enough hooks so that it can be fixed later). This allowed me to put out a simple feature in days. If it proves popular, I can always go back and spend time resolving it.
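For illustration only, one common heuristic works like this: remember a content hash when a file is tagged, and when the file goes missing, look for another file with the same hash. Neither MindRetrieve nor Google Desktop Search is known to work this way. The Python sketch below uses hypothetical function names. It cannot find files that were edited after being moved, it confuses files with identical contents, and it hashes every file under the search root, which is slow on a full disk. That cost is part of why the problem really needs platform support, such as file system change notifications.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """SHA-1 of a file's contents, read in chunks so large files are fine."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def relocate(missing, search_root):
    """Guess new locations for moved files by matching content digests.

    `missing` maps each old path to the digest recorded when it was tagged.
    Returns {old_path: new_path} for every file that could be found.
    """
    wanted = {digest: old for old, digest in missing.items()}
    found = {}
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                digest = file_digest(path)
            except OSError:
                continue                # unreadable file: skip it
            old = wanted.get(digest)
            if old is not None and old not in found:
                found[old] = path
    return found
```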

Announce: HTMLTestRunner - generates HTML test report for unittest

2006-02-13T09:12:00.000-08:00

I'd like to announce the release of the Python library HTMLTestRunner, a by-product of the MindRetrieve project.

------------------------------------------------------------------------
HTMLTestRunner is an extension to the Python standard library's unittest
module. It generates easy-to-use HTML test reports. See a sample report at
http://tungwaiyip.info/software/sample_test_report.html.

More information and downloads are available at
http://tungwaiyip.info/software/#htmltestrunner

------------------------------------------------------------------------

Actually this was posted on the comp.lang.python newsgroup a while ago. Thank you, Cameron Laird, for picking it up in Dr. Dobb's Python-URL!

Great gathering last night

2006-01-20T12:29:00.000-08:00

Thanks to Scott of Ookles for organizing the great gathering in San Francisco last night. It was a real pleasure to meet so many prominent people who are pushing the Internet forward.

Geeks are officially cool again!

I also have to thank Scott for being the inspiration behind the new tag-the-files feature!

Announce release of version 0.8.0

2006-01-19T12:01:00.000-08:00

I'm glad to announce the first major release in almost a year. New features in version 0.8.0 include:

Web library - tag-based bookmarking system
Tag based categorization
Tag files in local disk (Windows XP)

Download it from the MindRetrieve website.

Added IE support

2006-01-18T09:22:00.000-08:00

After months of developing on Firefox and Opera only, I have finally ported the DHTML code to IE. Well, almost: a few small bugs remain. IE6 is just hideous. I really should have waited and supported only IE7.

Efficient character escapes decoding

2006-01-13T18:32:00.000-08:00

I ran into a technical issue using Python's 'unicode_escape' codec on Unicode strings. I posted a question to the comp.lang.python newsgroup, thinking it was probably too technical for anyone to care about. But the Python community never fails me. Thanks to Steven Bethard for coming up with a great suggestion that works for me. I have written up the solution as a Python Cookbook recipe.
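Here is the shape of the problem and one way around it, as a Python 3 sketch. The cookbook recipe itself targeted the Python of its day and may differ. In CPython 3, `codecs.decode(s, "unicode_escape")` on a str first turns the whole string into UTF-8 bytes and then reads the non-escape bytes back as Latin-1, so text that is already non-ASCII gets garbled. The workaround is to find only the escape sequences with a regular expression and decode each match on its own. Every match is pure ASCII, so this is safe.

```python
import codecs
import re

# One alternative per Python escape form: \UXXXXXXXX, \uXXXX, \xXX, octal,
# \N{NAME}, and the single-character escapes.
ESCAPE_SEQUENCE = re.compile(r"""
    \\U[0-9a-fA-F]{8}
  | \\u[0-9a-fA-F]{4}
  | \\x[0-9a-fA-F]{2}
  | \\[0-7]{1,3}
  | \\N\{[^}]+\}
  | \\[\\'"abfnrtv]
""", re.VERBOSE)


def decode_escapes(s):
    """Decode backslash escapes in s without touching its other characters.

    Each match is pure ASCII, so handing it to the 'unicode_escape' codec is
    safe; text between matches is copied through unchanged.
    """
    return ESCAPE_SEQUENCE.sub(
        lambda match: codecs.decode(match.group(0), "unicode_escape"), s)


print(decode_escapes(r"naïve caf\u00e9\tend"))
```

The regular expression scans the string once, and only the short escape sequences are handed to the codec.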

Hope it will be useful for other people too.

Synchronization with Simple Sharing Extensions

2006-01-12T12:15:00.000-08:00

I have come across the Simple Sharing Extensions for RSS and OPML (SSE) specification from Microsoft. It is the minimal set of extensions necessary to enable loosely cooperating apps to use RSS as the basis for item sharing: that is, the bi-directional, asynchronous replication of new and changed items among two or more cross-subscribed feeds.

This seems to be a great fit for synchronizing the weblib among different repositories. The essence of SSE is a set of items, each with a globally unique id, a timestamp and an increasing version number. This has inspired me to revamp the weblib file specification once more to add all those elements. The only remaining issue for me is the globally unique id. I need to figure out a simple way to generate such ids across distributed systems (UUIDs feel too heavyweight to me). The implementation is available in SVN, although I haven't updated the specification yet.
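Here is a minimal Python sketch of what those three elements buy you when merging two repositories. The `Item` type and function names are hypothetical. The conflict rule (higher version wins, later timestamp breaks a tie) is a simplification for illustration, not SSE's full conflict-resolution algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: str          # globally unique across repositories
    version: int     # incremented on every change
    when: str        # ISO 8601 UTC timestamp, so string order is time order
    data: str


def winner(a, b):
    """Pick which copy of the same item to keep: higher version, then later time."""
    return a if (a.version, a.when) >= (b.version, b.when) else b


def merge(local, remote):
    """Merge two {id: Item} repositories; neither input is modified."""
    merged = dict(local)
    for item_id, item in remote.items():
        merged[item_id] = winner(merged[item_id], item) if item_id in merged else item
    return merged


home = {"1": Item("1", 2, "2006-01-12T10:00:00Z", "Python, edited at home")}
work = {"1": Item("1", 1, "2006-01-12T11:00:00Z", "Python"),
        "2": Item("2", 1, "2006-01-12T11:05:00Z", "Ajax")}
merged = merge(home, work)
print(merged["1"].data, "/", merged["2"].data)   # Python, edited at home / Ajax
```

The globally unique id is what lets `merge` recognize two copies of the same item. Without it, the version and timestamp would have nothing to compare against.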

Synchronization is a tough issue, and I haven't thought it all through. But I probably can't go wrong following Ray Ozzie, who created Lotus Notes.

Quick key and the Petname system

2006-01-11T08:16:00.000-08:00

I have come across an interesting article, An Introduction to Petname Systems. It resonates with a quick key feature I am building, which associates a short phrase with a frequently used URL. For example, the phrase 'amex' can be used as the key to the American Express web site. The article's main concern is security (against phishing), while for MindRetrieve the goal is a slick user interface. Nevertheless, it helps me understand what I am doing: building an implicit system for users to assign petnames to items!

Text vs. multimedia?

2006-01-03T21:02:00.000-08:00

The heat is on once again with the advent of new Internet technologies. The word is that we will move beyond text-based communication to audio and video: from text blogging to audio podcasting to video blogging, from SMS to MMS, from searching text to searching audio and video, and so on. The humble text is only the first step in technology development. Eventually technology will be advanced enough to enable the full glory of multimedia.

I feel rather lukewarm about this. I often pass up a news video for a text article because I grow impatient with the video's pace. With text I can scan back and forth much more easily. User-generated content? That reminds me of those home videos where the camera never stops panning left and right. And when the subject talks, he is actually off-screen ;)

Multimedia was the buzzword of the last generation. When the enabling technology, the CD-ROM, became popular, people envisioned our PCs filled with sight and sound. While we now see a lot more graphics on our PCs than in the early days, it is fair to say the focus is still on text, whether we are using email or the web. Anyway, none of the multimedia companies has become Google.

I think people who consider video and audio more advanced than text are only looking at it from a computer engineering perspective. This really does not do justice to text. From a different perspective, I would say audio and video are more primitive because they are based on our biological sensory perception, whereas text is truly the great innovation that revolutionized communication.

Incremental Development

2005-12-30T08:04:00.000-08:00

I was looking at the Subversion check-in log at

http://svn.berlios.de/wsvn/mindretrieve/trunk/?op=log&rev=0&sc=1&isdir=1

I notice that I have made many, many small check-ins, often several a day, each including a group of several files (considering that I am not working on this full time). This style is quite different from my work on other projects, where I check in far less often but usually in larger chunks.

Perhaps this says something about productivity? Perhaps I was being careless because I'm the only developer right now. But just now I have come up with another characterization: this is incremental development!

Each time, I made a small change: add a new feature or a method, refactor code, fix a bug, add a test. I made the code change, tested it, and then checked it in. The code base is functional most of the time. Seldom did I make big changes that broke the code base for several days or more.

Is incremental development the best development process? I'll leave that for another discussion. But from a developer's perspective, having a functional system most of the time and being able to test and verify any code change easily is wonderful. Every time I check in, I have the satisfaction that something is done. Coming from an environment where changing a single line of code meant the tedious work of building a test environment and a painful testing process, this is pure joy.

Keep Your Article in One Single Web Page

2005-12-28T17:08:00.000-08:00

There is a style widely used in web publishing. It says that people don't like to read long articles, and that one should keep modem users in mind, because a long web page downloads slowly. So if you have a long article to publish, break it down into several short sections and let the user read it page by page.

I have arrived at a contrarian view. I think breaking a long article into several pages is a hassle for users. It is best to publish the entire article on one single page. The issue is that to finish the article I need to click "next page" several times, and every time there is a delay that interrupts the momentum. Usually the delay is short, like one or two seconds, but it is noticeable. Scrolling is seamless in comparison. For slow sites the delay is much worse; some routinely take 10 seconds or more. That feels like an episode ending on a cliffhanger, with us left waiting for the next episode.

Are several short pages a better layout than one long one? I actually prefer the long one. The scroll bar gives a good indication of how far into the article I have progressed. The scroll wheel is very handy for navigation. I can also use the browser to search for a word that appears anywhere in the article. All of this beats arbitrarily breaking an article into several parts and leaving the user only a narrow window. The speed issue is a non-issue: any browser should be able to render incrementally, so that you can start reading as soon as any text arrives. Even my cellphone does this flawlessly.

In some cases the multiple-page format backfires badly. I was glad that Yahoo has a mobile version made for cellphones at http://mobile.yahoo.com/, and thought it would be great for reading email. It turns out the issue with cellphone data networks is not just low bandwidth: every time I click a link, there is a long delay before I get any response. As Yahoo Mobile breaks every screen into 15 lines or so, even the simplest email costs me multiple clicks to read through. The long delay between clicks makes it plain unusable. I ended up going back to the regular web interface. Even though I have to scroll through tons of irrelevant stuff to get to the email body, I still prefer it to the mobile version.

Of course my comment should not be generalized too far. I have tried loading an entire technical manual as a single web page (from my hard drive). The browser takes noticeably long to process a web page of that size. But I still keep the manual in this format for ease of searching.

ISO 8601, the metric of date format

2005-12-16T08:53:00.000-08:00

Different countries have different conventions for writing dates. Some write in m/d/y order, some in d/m/y order, and so on. One localization task is to show dates in the right format...

I say ____ the convention. Write it in ISO 8601 format, that is, YYYY-MM-DD. No more confusion about whether the day or the month comes first. And when you sort it, it comes out in chronological order. ISO 8601 is the metric system of date formats.
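Here is a quick Python illustration of the sorting point. It sorts the same three dates once as m/d/y strings and once as ISO 8601 strings.

```python
from datetime import date

dates = [date(2005, 12, 16), date(2005, 2, 3), date(2004, 11, 30)]

# m/d/y strings sort by month first, so the 2004 date lands in the middle.
print(sorted(d.strftime("%m/%d/%Y") for d in dates))
# ['02/03/2005', '11/30/2004', '12/16/2005']

# YYYY-MM-DD strings sort in chronological order.
print(sorted(d.isoformat() for d in dates))
# ['2004-11-30', '2005-02-03', '2005-12-16']

# And they parse back without any guessing about field order.
print(date.fromisoformat("2005-12-16"))
# 2005-12-16
```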

Your Site Has Vulnerability

2005-12-10T12:05:00.000-08:00

I was testing my web application for security problems. Failure to escape user input is a very common class of security problem. So I created an input string like this:

'"></script><h1><font size=7 color=red>GOTCHA<iframe src=http://mindretrieve.blogspot.com/2005/12/your-site-has-vulnerability.html width=500 height=300>
Cut and paste it into any input field and then click submit. If you see something you don't expect, that site probably has a problem.

Hurry! Test out your site before hackers do!
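On the fixing side, the cure is to escape user input whenever it is written back into a page. Here is a small Python sketch using the standard library's `html.escape`. `render_search_box` is a hypothetical page fragment; if a template engine offers auto-escaping, that is the natural place for this to happen.

```python
from html import escape

# The probe string from above, split across lines for readability.
PROBE = ("'\"></script><h1><font size=7 color=red>GOTCHA"
         "<iframe src=http://mindretrieve.blogspot.com/2005/12/"
         "your-site-has-vulnerability.html width=500 height=300>")


def render_search_box(user_text):
    """Echo user input back into an attribute value, safely.

    quote=True also escapes " and ', so the input cannot close the
    attribute; < and > become entities, so no tags are created.
    """
    return '<input name="q" value="%s">' % escape(user_text, quote=True)


page = render_search_box(PROBE)
print(page)
```

Fed the probe, the page shows the string as literal text inside the input box instead of a big red GOTCHA and an iframe.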

17000 lines of code

2005-12-08T08:01:00.000-08:00

It has been a while since I did a tally of the code size. The new statistics show there are about 17000 lines of code in about 100 modules. That is for a non-trivial application with everything deliberately designed to be as simple as possible.

Looking closer, 5000 of those lines are unit test code, which leaves roughly 12000 lines of production code. That is on its way toward the rule of thumb of a 1:1 ratio of production code to test code.

The two largest modules have around 700 lines of code each. The median is around 100 lines. This sounds minuscule compared to most other software, but in Python 700 lines can make a really sophisticated module. In many cases I break a module down into smaller components before it even reaches 700 lines.
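For anyone who wants to run the same kind of tally, here is a small Python sketch. It counts physical lines, including blank lines and comments, and hypothetically assumes that test modules are named test*.py. The numbers above were not necessarily produced this way.

```python
import os
import statistics


def line_counts(root):
    """Return {path: number of lines} for every .py file under root."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    counts[path] = sum(1 for _ in f)
    return counts


def summarize(counts):
    """Total, test vs. production split, median and the two largest modules."""
    sizes = list(counts.values())
    test = sum(n for path, n in counts.items()
               if os.path.basename(path).startswith("test"))
    total = sum(sizes)
    return {
        "modules": len(sizes),
        "total": total,
        "test": test,
        "production": total - test,
        "median": statistics.median(sizes) if sizes else 0,
        "largest": sorted(sizes, reverse=True)[:2],
    }


# Usage (hypothetical path): print(summarize(line_counts("path/to/mindretrieve")))
```

Splitting the walk from the summary keeps the arithmetic easy to check on a hand-made dictionary.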

Google vs. bookmarks?

2005-12-07T18:02:00.000-08:00

While we are working hard on MindRetrieve to improve bookmarking, some people argue that the entire idea of bookmarking is outdated. Instead of bookmarking, we just google. Indeed, Google gives such an excellent experience that people expect the right answer instantaneously, more easily than looking things up in the bookmark menu.

While this is true for some obvious sites, perhaps 'dell' or 'walmart', it isn't necessarily useful for everything you'd ever be interested in. How many times have you flipped through pages of marginally useful search results until you finally found the one that gives you what you need? (Assuming, of course, that you are an advanced searcher who actually goes beyond the first page.) You will not want to repeat this search, and if you do, you won't necessarily find the same item. How about a page that is not a direct search result but is one or two clicks away? Say that from Google you found a passionate travelogue about a trip to Italy. From there you found more links to some hotel recommendations that you really want. Google is a good starting point, but you also have to make additional effort to arrive at what you need.

When you search and evaluate the results, you are actually creating something of value. If you put together the links to the travelogue, a guide on how to use the Italian train system and the two cute hotels you have seen, you will in effect have created a kind of travel guide, even if you have not authored any of it. The personal web is a way for you to capture that creation.

Weblib file specification

2005-12-01T08:34:00.000-08:00

The work on MindRetrieve has reached another milestone: I have published the weblib file specification, and it now supports updating. Previously I used a hack that rewrote the entire file whenever I updated a tiny piece of data. That actually served me well for several months of development and personal use; rewriting a 200k file every time never seemed to add any drag. In any case, the new format seems well designed and better prepared for future scalability.

I've included a snapshot of the spec here, or you can find the latest version in the source code.


MindRetrieve Weblib Data File Specification Version 0.5

MindRetrieve weblib data is a UTF-8 encoded text file (no other
encoding is supported at this time). The overall format is a block of
headers followed by a blank line and then the body, similar to email and
HTTP messages. Each line of the body represents a webpage or tag item.
Updates to the weblib are appended as change records to the end of the
file. The entire weblib can be represented by a single file.

file            = headers BR body
headers         = *(header BR)
header          = field-name ":" [ field-value ]
field-name      = token
field-value     = DSV encoded value
body            = column-header BR *((data-line | comment-line | *SP) BR)
column-header   = column-name *( "|" column-name)
column-name     = token
comment-line    = "#" any string
data-line       = [change-prefix] (data-record | header)
change-prefix   = '[' YYYY-MM-DD SP HH:MM:SS ']' SP ['r' | 'u' | 'h'] '!'
data-record     = ["@"] id *( "|" field-value)
BR              = CR | LF | CR LF
SP              = space characters


Note

* token is defined according to RFC 2616 Section 2.2.

* DSV encoded value is a Unicode string with the characters "\", "|",
  CR and LF encoded as "\\", "\|", "\r" and "\n" respectively.

* There are two kinds of data records: a webpage has a numeric id, while
  a tag has a numeric id prefixed by "@".

//* A record with the same id can appear multiple times in the data file.
//  The last record overwrites preceding records.

* A data-record preceded by a change-prefix denotes an update to the file.

* A record prefixed by "[ISO8601 time] r!" is a remove record. The item
  with the corresponding id is to be removed.

* A record prefixed by "[ISO8601 time] u!" is an update record. The item
  with the corresponding id is to be replaced.

* A record prefixed by "[ISO8601 time] h!" is a header update record. The
  header value is to be updated. There is no remove header record. A
  header value can be set to an empty string, however.

* The last line should always end with BR. If the last line is not
  terminated with BR, it is considered a corrupted record and must be
  discarded. Moreover, never append a change record to a corrupted record,
  because the line break would be misplaced.

* The encoding header is defined for future extension only. Only UTF-8
  encoding is supported right now.
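To make the format concrete, here is a minimal Python reader sketch for the spec above. It is not the MindRetrieve implementation, and its function names are hypothetical. It handles only the r, u and h change prefixes, and it keys items by their id field, so a webpage "1" and a tag "@1" stay distinct. It shows the DSV escaping, the replay of change records in file order, and the rule that an unterminated last line is discarded.

```python
import re

# "[YYYY-MM-DD HH:MM:SS] x!" where x is r (remove), u (update) or h (header).
CHANGE_PREFIX = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] ([ruh])!")
UNESCAPE = {"\\": "\\", "|": "|", "r": "\r", "n": "\n"}


def dsv_encode(value):
    # Backslash first, otherwise the backslashes added below would be doubled.
    return (value.replace("\\", "\\\\").replace("|", "\\|")
                 .replace("\r", "\\r").replace("\n", "\\n"))


def dsv_split(line):
    """Split on unescaped '|' and decode the escapes inside each field."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, "")
            # Unknown escapes keep the escaped character (an assumption).
            current.append(UNESCAPE.get(escaped, escaped))
        elif ch == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def read_weblib(text):
    """Replay a weblib file into (headers, items); items are keyed by id."""
    # BR may be CR, LF or CR LF. Complete lines all end with BR, so the last
    # piece is either "" or an unterminated (corrupted) record: drop it.
    lines = re.split(r"\r\n|\r|\n", text)
    lines.pop()
    headers, items = {}, {}
    i = 0
    while i < len(lines) and lines[i]:          # headers end at the blank line
        name, _, value = lines[i].partition(":")
        headers[name] = dsv_split(value)[0]
        i += 1
    columns = lines[i + 1].split("|") if i + 1 < len(lines) else []
    for line in lines[i + 2:]:
        if not line.strip() or line.startswith("#"):
            continue                            # blank or comment line
        match = CHANGE_PREFIX.match(line)
        kind = match.group(2) if match else None
        record = line[match.end():] if match else line
        if kind == "h":
            name, _, value = record.partition(":")
            headers[name] = dsv_split(value)[0]
        elif kind == "r":
            items.pop(dsv_split(record)[0], None)
        else:                                   # plain record or "u!" update
            fields = dsv_split(record)
            items[fields[0]] = dict(zip(columns, fields))
    return headers, items
```

Because updates are only ever appended, reading the file is simply replaying it from top to bottom. The last record for an id wins, and a remove record drops the item.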