Goodbye image servers

Saying goodbye to a departed image server

A week ago tomorrow, IDX decommissioned the last of its image servers. Over the last two and a half months I migrated a little over 20 million images, about 480 gigabytes, from our servers to Amazon’s S3 service. Most of that time was spent either checking in occasionally on the migration scripts I had written or rewriting our image acquisition scripts to work with S3. We download images from about 190 sources every night as we gather MLS data on behalf of our clients.

The best part of the whole image migration and overhaul is that image acquisition is now tied into our data balancer system. Each MLS in our system has a time stored in the database that is the earliest we can reliably download data from that source. Once we reach that time in the day, the MLS goes through a series of steps triggered by a cronjob that runs once a minute. First the data is downloaded from whatever source makes it available. This can be FTP, HTTP, SOAP, RETS, or even a direct SQL connection. Next the data is parsed and made ready for insertion into our database. Once processing is done, the data is geocoded so that we can easily map all the properties.
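The flow above can be sketched roughly as follows. This is a minimal Python illustration, not the production code: the `MlsSource` class, the `STEPS` list, the `tick` function, and the idea that each cron run advances a source by exactly one step are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical step names, in the order the post describes them.
STEPS = ["download", "parse", "geocode"]


@dataclass
class MlsSource:
    name: str
    earliest: time      # earliest time of day the source can be downloaded reliably
    next_step: int = 0  # index into STEPS; len(STEPS) means the source is finished


def tick(sources, now, handlers):
    """One cron run: advance every due, unfinished source by a single step.

    `handlers` maps a step name to a callable taking the source.
    Returns the (source name, step) pairs that ran during this tick.
    """
    ran = []
    for src in sources:
        # Skip sources that are not due yet or have completed all steps.
        if now < src.earliest or src.next_step >= len(STEPS):
            continue
        step = STEPS[src.next_step]
        handlers[step](src)
        src.next_step += 1
        ran.append((src.name, step))
    return ran
```

Calling `tick` once a minute with the current time of day would walk each MLS through download, parse, and geocode in order, while leaving alone any source whose start time has not arrived yet.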

This was where the process used to stop. When the app was first written, image scripting was rushed because we were trying to meet our launch deadline. The image scripts were on different servers, so the data balancer couldn’t act on them directly. Instead each was launched as its own cronjob on one of the image servers. Every MLS is unique in the way we acquire its images, and the details change constantly, so each must have its own acquisition script. Now each of those scripts is defined in our database.

Once per minute a script runs on our EC2 server that looks for image-ready flags in our data balancing system. When it finds one, it checks the database for the specific file that should be run to gather images. The script runs and then resets the image-ready flag. As with data, our image sources are varied. In some cases we generate URLs based on a known syntax; in others we’re given URLs by the MLS. In these cases we don’t need to store anything. Often we get images from an FTP source or via RETS, and in one case we download binary data stored as BLOBs on a remote SQL server. Needless to say, it’s complex to get all these images from 190 disparate sources, so anything we can do to automate things better is good.
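The flag-polling loop described above might look something like this. It is only a sketch under stated assumptions: the `mls` table, its `image_ready` and `image_script` columns, and the `run_script` callback are hypothetical names, SQLite stands in for whatever database is actually used, and here the poller (rather than the acquisition script itself) clears the flag.

```python
import sqlite3


def run_ready_image_jobs(conn, run_script):
    """Run the acquisition script for every flagged MLS, then clear its flag.

    `conn` is an open sqlite3 connection; `run_script` is a callable that
    takes the script path stored for that MLS. Returns the scripts that ran.
    """
    rows = conn.execute(
        "SELECT id, image_script FROM mls WHERE image_ready = 1 ORDER BY id"
    ).fetchall()
    handled = []
    for mls_id, script in rows:
        run_script(script)
        # Reset the flag only after the script has finished, so a crash
        # mid-run leaves the MLS flagged for the next minute's pass.
        conn.execute("UPDATE mls SET image_ready = 0 WHERE id = ?", (mls_id,))
        conn.commit()
        handled.append(script)
    return handled
```

Keeping the script path in the database row means adding or changing an MLS is a data change rather than a new crontab entry.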

My next project is building WSDL web services using NuSoap. This is uncharted territory for me, so I’m sure I’ll have more to say on this subject later.



