Testing messy bash scripts

Read this book too!

I am reading the book “Refactoring” by Martin Fowler, and I am brimming with ideas about improving software, as well as about solving problems I banged my head against during my “software” development career.

On my last job, there was this huge messy heap of bash scripts that was “The Installation” of their main software product. It was a remarkable amount of bad-smelling bash code, and it somehow managed to work. My work at the time was from-scratch rewrites of this or that functionality, and then somehow plugging them into the existing framework (damn, I call it a framework now).

It would have been just amazing if I could take the existing pile of dung bash scripts and refactor it into something that is readable, sort of.

Today, while I was riding the bus, reading “Refactoring” pg. 110, it struck me: it can actually be extremely easy to test bash scripts! All it takes is a collection of all the familiar commands, like “cat”, “rm”, etc., and one sneaky PATH environment variable. These commands would be fakes, stubs – they all just print their name and parameters into a log file. In fact, there is just one actual script, and the rest are links to that single one.

That log file can be compared before and after a refactoring. Commands that merely write their names to a file take far less time than the real ones, so while the actual script might have run for hours, with the stubs it should finish much, much faster. Finally, by comparing your pre- and post-refactoring log files, you get a really nice test suite that can help you refactor. I might even call it replacing unreadable code with readable code without breaking much of anything.
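A minimal sketch of the idea (the directory layout, log path, and the one-liner standing in for the real installation script are my own assumptions):

```shell
#!/bin/sh
# Create a directory of stub commands. There is one real script; every
# fake command ("cat", "rm", ...) is just a symlink to it.
STUBS=$(mktemp -d)
export STUB_LOG="$STUBS/commands.log"

cat > "$STUBS/stub" <<'EOF'
#!/bin/sh
# Log my own name and my arguments instead of doing any real work.
echo "$(basename "$0") $*" >> "$STUB_LOG"
EOF
chmod +x "$STUBS/stub"

for cmd in cat rm cp mv ln; do
  ln -s "$STUBS/stub" "$STUBS/$cmd"
done

# Run the script under test with the stubs shadowing the real commands
# (a harmless one-liner stands in for the real installation script here).
PATH="$STUBS:$PATH" sh -c 'rm -rf /some/dir; ln -s target link'
```

Run this once before and once after the refactoring, then diff the two log files; if they match, the script still issues the same commands in the same order.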

In the particular case of the scripts I mentioned earlier, I guess the refactoring most useful for readability would be Inline Method, since whoever wrote the original scripts was too fond of wrappers, even though most commands, like “ln”, don’t really need one – the command itself does the job perfectly well. A wrapper with at least two echo commands and a very long name is quite redundant and adds unnecessary complexity.

There are several problems with this simplistic approach, though. One is that a script may use full paths (like /bin/ln) instead of relying on the PATH environment variable. But such things can be taken care of relatively easily until the testing solution is good enough. One thing to try, for example, is running bash in restricted mode, which, if I remember correctly, refuses to run commands whose names contain slashes – so absolute paths would be caught immediately.

I’ll try that on some new messy scripts that I got on my new job!

Getting things done and working from home

I have been a freelance consultant for the last year or so, and people ask me where I work from. At first I tried very hard to work from home, but I couldn’t get any work done. Then I tried jumping from one cafe to another; a bit more was getting done, but still not quite as much as I would have liked.

Several months ago, I rented an office at a great place right in the center of Tel Aviv. I love it, and it seems to enable getting work done as well.

— draft
– “productive” just means finishing tasks that are on a todo list, which means you first have to have a list.
– “productive day” is finishing as many tasks from the list as possible
– “procrastination” is when you are not working on a task in the list but doing other things instead
– in a place where there is nothing else to do (library, office, cafe), you are forced to work on the task at hand (the one on the list)
– loud offices with lots of social interaction don’t help get things done, quite the opposite.

Login for Facebook test users

While writing Facebook applications, it is useful to create test users in the developer.facebook.com “Roles” tab for your app, and log in as those users.

But Facebook test users are a little tricky to log in as, since Facebook doesn’t really show you their e-mail or password. You can, however, get a unique “already logged in” URL by asking Facebook for it, and here is a short script that does just that.


I use it in Chrome Incognito windows to test various features with these throw-away users, and it works great. Enjoy.

Broken tmux in OS X is easy to fix

I started using tmux instead of screen a couple of months ago. I ported my screenrc file into a similar tmux one and everything works great – with one small exception: tmux on Mac OS X makes strange problems and errors appear, and I had no idea why things were so broken, so I googled for a long time and finally found the reason.

The first problem I noticed was that launchctl stopped working correctly; it would show an error saying: launchctl error: launch_msg(): Socket is not connected.

The second problem appeared when I tried changing my vimrc to use the OS X clipboard for copy and paste inside vim, so anything yanked would appear in the clipboard and vice versa. All you have to do is add set clipboard=unnamed to your vimrc, and in the latest versions of vim it just works. Unless you do that in tmux – then it just doesn’t work.

Long story short, if you are using tmux and experience strange problems with OS X facilities, try https://github.com/ChrisJohnsen/tmux-MacOSX-pasteboard. There is an explanation there of why this happens and how to fix it. I used the wrapper method, which required adding a line to my tmux.conf, and now everything works great!
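For reference, the wrapper method boils down to a single line like this in tmux.conf (check the project’s README for the exact form for your shell and version):

```shell
# tmux.conf: run every new window's shell through the wrapper so OS X
# services (launchctl, the pasteboard) see a properly registered session.
set-option -g default-command "reattach-to-user-namespace -l /bin/bash"
```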

Hope this helps.

Google Adwords is where you waste your money

Overpaying for Google AdWords? You are not alone!

AdWords is a wonderful thing: give it a bucket of money and get as much traffic to your website as your money allows. Some people have discovered this and are happily spending their hard-earned coin, oblivious to the fact that AdWords actually requires you to research how it works – or else.

Before I dive into explaining how to use AdWords effectively, there are several problems that most people find painful to deal with when using the system. Without understanding how AdWords works, money most likely just goes to waste instead of being invested into getting results.

The goal of any successful ad campaign is ending up with more money than you had before starting it. AdWords is not usually used to build a brand and get recognition for your product (like Coca-Cola ads in the movies), but rather to acquire paying customers for your website – and to invest less money in getting them to come than they will pay you back. A simple ROI (return on investment) rule that everyone probably already knows.

Here comes a surprise: some people using AdWords pay a hundred times more and get a hundred times less than their competitors. Usually, after a while, these people conclude that AdWords is a waste of money and stop using it. If only they used it correctly, this would not be their conclusion. I’ll explain the things to know so you will not make the same mistake, and will start earning money instead of wasting it.

If you have ever tried using AdWords, the most common gotcha that bites you is having the ads you created denied, blocked, or even banned in some extreme cases. This is annoying and makes you think “WTF Google? I’m paying you! Don’t you want my money?”, and the answer is: no, they don’t.

Google’s motive for annoying you comes from their belief that customers who use Google services will get a slightly better experience. History shows that search engines which showed spam results instead of relevant ones went extinct, and the same happened to ad networks that showed annoying spam advertisements. AdWords makes it hard to create bad or spammy ad copy because Google wants the network to remain relevant to the people who see the ads. They probably have a point, and it seems to have been working fine for them for a very long time now.

Another issue with AdWords is actually trying to use its interface to create your advertisements; they really made it extremely hard to use. It has a million options, making sense of how it works requires years of training and study, and then they change it every month into “yet another new version” or some such. Fortunately for us, most people who use AdWords don’t care, and thus they make mistakes by not bothering to research – giving you, their competitor, an advantage.

If you have read this far, you probably want to know how you can still use AdWords without falling into the trap of wasting your money. So let me share several things I learned over the years – but this post is long enough, so I will write them up in a sequel.

Rack Middleware

Middleware is a very powerful tool, usually used to filter incoming requests or outgoing responses of your web application, and it solves some problems in a very elegant and DRY way. It is usually a pluggable component acting as a filter in the request/response flow, with the benefit of being easy to reuse in any application that needs it.

WSGI is an interface that defines how web applications and web servers talk to each other. The name WSGI is mostly used in connection with the Python programming language, but it has equivalents in most other programming languages used to create web applications. For example, Rack is the equivalent of WSGI for the Ruby programming language. Although it might not be apparent, WSGI in Python and Rack in Ruby are almost identical with regard to how you create middleware for them.

Popular web frameworks usually have their own way of adding middleware to WSGI/Rack that is sometimes simpler. One such example is the Django web framework that has a really excellent and simple way of adding middleware to your application, described in the Django middleware documentation.

In Rack (and in Python’s WSGI) middleware usually looks like this:
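Something along these lines (the class name and the header it adds are my own illustration):

```ruby
# A Rack middleware is any object built around the downstream app:
# initialize receives the app, and call(env) can run code before
# and/or after handing the request along.
class RuntimeHeader
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Time.now
    status, headers, body = @app.call(env)        # hand off downstream
    headers["X-Runtime"] = (Time.now - started).to_s
    [status, headers, body]
  end
end
```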

Django makes it simpler by adding methods for each step in the filter:
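Roughly like this – the per-step hook names come from Django’s middleware documentation, while the class name and the header it sets are my own illustration:

```python
class RuntimeHeaderMiddleware(object):
    """Django-style middleware: instead of one call wrapping everything,
    you implement a method per step of the request/response cycle."""

    def process_request(self, request):
        # Runs before the view; returning None lets processing continue.
        return None

    def process_response(self, request, response):
        # Runs after the view; must return the (possibly modified) response.
        response["X-Processed"] = "yes"
        return response
```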

The best part about Rack middleware is the excellent way to test your custom middleware using Rack::Test. It took a bit of wrapping my head around, but when I finally had the eureka moment, it all snapped into place. Let me demonstrate with a simple example that shows how to test a Rack middleware using RSpec and Rack::Test.


The secret sauce is including Rack::Test::Methods at the start of the description; it does all the magic of using “app” and allows you to get/post/etc., as documented in the Rack::Test::Methods documentation.

Hope this helps someone, leave your comments and/or questions below.

Keywords in Search Engine Optimization (SEO)

I wrote a small web tool that measures the density of keywords and phrases in online articles. This blog post explains why keyword density is important and what role keywords play in promoting your website to the headlines of search engines.

It might be surprising, but most people are not aware of these very simple concepts. Just by reading this short article, you are already more advanced and knowledgeable than most of the people who build stuff on the internet. Please use this knowledge for good and not for evil.

So, why would you even want to be in the headline (a.k.a. first place) of the search engine results page (a.k.a. SERP) in the first place? Well, if you want your shiny website to get traffic (a.k.a. unique visitors), then organic search results are the best way to get it, and it’s cheap, with no direct fees, unlike ad networks such as AdWords where you pay for user clicks. Another advantage of organic search results is that they convert extremely well; in other words, users who are looking for your product and find you in first place on the SERP are much more likely to buy it.

Keywords are the heart of any search engine optimization. The whole purpose of SEO can be distilled into this very simple principle:

Place a specific web page to appear as high as possible in the list of search results for a certain keyword.

Note that this talks about single web pages and specific keywords: your website might have more than just one page, and it might appear in first place for more than just one keyword. It depends on the competition for the keywords you target – and if many people search for a keyword, it’s a good bet there is a lot of competition.

Armed with this new understanding, you might say: “I’ll just repeat the same word a million times on this page – and it will rank my page as the first result in the SERP.” Well, if this were 1995, that would be exactly so. But modern search engines do not really like keyword stuffing, and will most likely penalize a page that does it rather than reward it.

But let’s step back for a moment and look at the importance of keywords in SEO as a general principle. Usually, the very first thing SEO experts do is called “keyword research”, and that needs a little bit of explaining. Keyword research is mostly two things.

First, the hard part, is coming up with a list of keywords. The most apparent way of doing this is simply asking the business owner which keywords he believes are relevant to his business. A slightly more sophisticated way is visiting the pages of competing companies and looking for keywords they use in their articles, be it via keyword density, or by finding links from other websites to those competitors and looking at the link text.

The second part of keyword research is finding, among all the relevant keywords for a business, the keyword or phrase with the most search hits in search engines, and then creating content and links targeting that keyword. Google has a tool for this research, the Google AdWords Keyword Tool, built to show how many searches a certain keyword receives on Google. Bing has a very similar tool called the Bing Keyword Research Tool.

Modern search engines like Google and Bing prefer to rank pages by factors that are out of the direct control of webmasters and content authors. These factors include links from other web pages to your page, and many other secret algorithms which are usually not revealed in full. One such factor is keyword density, and although it is not as important as it once was, it still should not be neglected completely.

The keyword density practice of SEO, described at length on Wikipedia at http://en.wikipedia.org/wiki/Keyword_density, explains how to calculate this metric. Instead of aiming for a keyword density of 99%, many SEO experts consider the optimum to be 1% to 3%; using a keyword more than that might be considered search spam.

This article started with a promise: to show you a tool that makes it easier to find the keyword density of articles you write. This calculator was created to help you optimize articles toward the best keyword density for search engine optimization, between 1% and 3%.

Using it is very easy: just paste some text into the text area, and it will calculate, in real time, how many times each word is repeated in the article and that word’s density as a percentage. Among other features, the density calculator also removes stop words that do not really matter when indexing articles – words like “a”, “an”, “for”, “of”, “the” and the like.
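The core of such a calculator fits in a few lines; here is a sketch of the logic (the stop-word list below is a tiny sample of my own, not the tool’s actual list):

```python
import re
from collections import Counter

# A small sample of stop words; a real tool would use a much longer list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

def keyword_density(text):
    """Return {word: (count, density in percent)}, ignoring stop words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    total = len(words)
    counts = Counter(words)
    return {w: (c, round(100.0 * c / total, 2)) for w, c in counts.items()}
```

On a seven-word snippet a 50% density is meaningless, of course; on a real article, this is the number you would try to keep in the 1%–3% range mentioned above.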

Note that it might need some more testing. For example, with a very small number of words the calculation might be off and show distorted results, and other bugs might be hiding in it as well, so it is far from perfect – use it at your own risk.

Lastly a link to the tool that started this article:


Broken Django behind a Load Balancer

Having problems with your seven Django servers behind a single load balancer or proxy, with no idea which of the servers is giving you that “one in seven requests” error?

The solution is simple, add some information about the instance to the HTTP response headers!

It doesn’t really matter which load balancing or proxy solution you use – Amazon ELB, HAProxy, Varnish, Pound, Nginx – this will just work with most of them without any modification.

Here is a simple example of just such a middleware:
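A sketch of what I mean (the header name and class name are my own choice):

```python
import socket

class InstanceHeaderMiddleware(object):
    """Django-style middleware: stamp every response with the hostname of
    the server that produced it, so the broken instance behind the
    balancer identifies itself."""

    def process_response(self, request, response):
        # Django responses support dict-style header assignment.
        response["X-Served-By"] = socket.gethostname()
        return response
```

Add it to your middleware settings, then inspect the response headers of a few requests: the failing responses will all carry the same X-Served-By value.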

Simple Amazon AWS API queries

After using the excellent python boto Amazon Web Services library, I felt a tingle of unease: boto takes away the transparency and clarity of the AWS APIs. Amazon has excellent documentation containing detailed API references, and the pace of their new feature releases is staggering. It is a pity that boto is written in such a way that it needs to re-implement every new feature Amazon releases, in its own wrappers and with its own foreign method names.

So, as the first step toward making it possible to use the original Amazon AWS API reference documentation in a copy-and-paste fashion, I had to write the same reinvent-the-wheel piece of boilerplate that everyone has written a hundred times before (google for it, it’s there). But mine is shinier and prettier, or so I would like to think.

I present to you, in all its glory — The Python Amazon AWS API queries class!
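The heart of it is the Query API signing routine; here is a sketch of Signature Version 2 (check the AWS “Making Query Requests” documentation for the exact encoding rules before trusting it):

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

def signed_query_url(host, params, access_key, secret_key, path="/"):
    """Build a signed AWS Query API URL (Signature Version 2 sketch).

    `params` holds the action and its parameters exactly as named in the
    AWS API reference, e.g. {"Action": "DescribeInstances", "Version": ...}.
    """
    params = dict(params,
                  AWSAccessKeyId=access_key,
                  SignatureMethod="HmacSHA256",
                  SignatureVersion="2",
                  Timestamp=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    # Canonical query string: keys sorted, strict percent-encoding.
    canonical = "&".join("%s=%s" % (quote(k, safe="-_.~"),
                                    quote(str(v), safe="-_.~"))
                         for k, v in sorted(params.items()))
    to_sign = "GET\n%s\n%s\n%s" % (host, path, canonical)
    digest = hmac.new(secret_key.encode(), to_sign.encode(),
                      hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return "https://%s%s?%s&Signature=%s" % (host, path, canonical, signature)
```

Any Action from the API reference can then be pasted straight into `params` and the resulting URL fetched with plain urllib.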

Google AppEngine URLFetch in Unit Tests

I started using Google AppEngine for a personal project of mine some time ago, and noticed that, like everywhere else in Python, the state of testing (TDD) is really poor.

There are several “solutions” that provide stubs for unit testing Google AppEngine applications, including something called a “testbed” which is part of the API itself. The problem with these is that they provide functional bits of the API implemented in your local environment, working just like they would in a deployed AppEngine application.

It sounds quite good to have a local personal instance of something similar to the datastore you get in deployed applications, but for the urlfetch service that is not exactly what I was looking for in tests.

What I need is an object that will not fetch anything and will not access the network at all. The requirement here is an object whose state I can tinker with before and after my own methods have used the urlfetch facility. After a lot of digging in the current implementation of the stub, I ended up writing a very simple mock for this myself. It is far from perfect, but it’s a start.
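Here is the gist of it as a self-contained sketch (the `status_code`/`content` attributes mirror the urlfetch result fields my code actually reads; everything else, including the class names, is my own):

```python
class FakeFetchResult(object):
    """Mimics the two urlfetch result fields the code under test reads."""
    def __init__(self, status_code, content):
        self.status_code = status_code
        self.content = content

class FakeURLFetch(object):
    """A urlfetch stand-in that never touches the network: seed it with
    canned responses before the test, inspect its recorded calls after."""
    def __init__(self, responses=None):
        self.responses = responses or {}   # url -> (status_code, content)
        self.calls = []                    # every (url, payload) fetched

    def fetch(self, url, payload=None, **kwargs):
        self.calls.append((url, payload))
        status, content = self.responses.get(url, (404, ""))
        return FakeFetchResult(status, content)
```

Inject it wherever your code would use google.appengine.api.urlfetch, then assert on `calls` and on how your code handled the canned responses.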