Archive for the 'Code Talk' Category

A plan

It’s been just under two months since my last post, and three and a half months since I posted anything about programming. I haven’t had any grand code-related inspirations lately, and my upcoming work projects don’t promise any either.

In an effort to expand my knowledge base, I’ve decided to learn a new language and to work with a full-fledged framework for the first time. My website has suffered one of the worst cases of bit rot that I’ve seen, so I’ve decided to redesign it and build it all using Python and Django. It will also give me a chance to work with jQuery and to experiment with some database design ideas.

I’ve also decided to kill two birds with one stone and work on being a better blogger at the same time. As such I’ll be doing my best to document everything I go through with this project. Look for that late this month or early in December.



Fun with caching

In the last couple of days I did some work to complicate the IDX application a bit. I applied the patch today that contained the changes and so far all seems well. Here’s the story.

About nine months ago I completed a reworking (aka a complete rewrite from the ground up) of the application’s results class. This is the code that assembles all the properties that meet the criteria of the search being performed and makes them available for whatever else needs to be done with them. Once all the various data tables had been queried, the matching results were placed in a temporary heap table so that they could be sorted, filtered (based on client preferences and/or MLS rules), and truncated if need be. I chose temporary heap tables because they’re fast, and since they’re session-specific I knew that I wouldn’t have to worry about one user contaminating another’s results.

The system worked beautifully for those nine months, but as our traffic grew (now upwards of 44,000 hits a day) MySQL had trouble keeping up. The heap tables were using a lot of the server’s RAM, and because each heap table was destroyed as soon as the page was delivered, a search had to be rerun from scratch just to move from one page of results to the next.

Today’s patch changed that. The heap tables are gone in favor of a searchCache table (one for each client in our system) where all search results end up. When the same search is run again (for example, when switching pages), the results can be pulled from the cache instead of querying all the data tables again. All results are tagged with the user’s PHP session ID to prevent result contamination, and every 4 hours the cache is cleaned to keep the tables from getting too large. Featured property searches are also cached and, because they are the slowest queries we perform*, they are kept for 24 hours, until we get new data.
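A minimal PHP sketch of the cache bookkeeping, assuming cached rows are looked up by a hash of the session ID plus the search criteria. The function names, the key scheme, and leaving the page number out of the criteria (so that paging reuses the cached result set) are illustrative assumptions, not the actual IDX code; only the 4-hour and 24-hour lifetimes come from the post.

```php
<?php
// Hypothetical sketch; not the actual IDX implementation.
// Lifetimes from the post: regular results are cleaned every 4 hours,
// featured results are kept for 24 hours.
const NORMAL_TTL_SECONDS   = 4 * 60 * 60;
const FEATURED_TTL_SECONDS = 24 * 60 * 60;

/**
 * Builds a cache key from the session ID and the search criteria.
 * The page number should NOT be in $criteria, so moving from page to
 * page produces the same key and reuses the cached results.
 */
function searchCacheKey(string $sessionId, array $criteria): string
{
    ksort($criteria); // top-level key order must not change the key
    return md5($sessionId . '|' . serialize($criteria));
}

/** Returns true if a cache row created at $createdAt is still usable at $now. */
function isCacheFresh(int $createdAt, bool $featured, int $now): bool
{
    $ttl = $featured ? FEATURED_TTL_SECONDS : NORMAL_TTL_SECONDS;
    return ($now - $createdAt) < $ttl;
}
```

Because the session ID is part of the hash, two users running identical searches still get separate cache entries, which preserves the no-contamination property the heap tables provided.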

I’m pleased so far. The patch was uploaded to our server 8 hours ago and thus far there are no reports of problems.

Thanks to bob the lomond for the photo.

*Featured results are the slowest because of the number of tables that have to be queried. Normal searches only have to query one table per MLS being searched because they are property-type specific. Featured properties are property-type independent, so as many as nine tables per MLS may need to be queried.
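To illustrate why featured searches cost more, here is a rough PHP sketch that builds one SELECT per property-type table and joins them with UNION ALL. The table names, the column list, and the `featured` flag are hypothetical; the post doesn't describe the real schema.

```php
<?php
/**
 * Hypothetical: builds a featured-listings query across every
 * property-type table for one MLS. Table names are interpolated directly,
 * so they must come from trusted configuration, never from user input.
 */
function buildFeaturedQuery(array $tables, string $columns = 'listing_id, price, city'): string
{
    $selects = [];
    foreach ($tables as $table) {
        // One SELECT per table: this is the cost that grows with the
        // number of property types in the MLS.
        $selects[] = "SELECT {$columns} FROM `{$table}` WHERE featured = 1";
    }
    return implode("\nUNION ALL\n", $selects);
}
```

A normal, property-type-specific search corresponds to passing a single table, which yields a single SELECT; a featured search over nine tables yields nine.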



Wasn’t sure what I was expecting, but this isn’t it… whew

The book above arrived at my apartment from Amazon in the middle of last week. I decided to learn stored procedures as a way to expand my skill base and, hopefully, to reduce some of the load we place on our SQL server at IDX by writing more efficient queries. I was expecting a book a couple hundred pages long describing some method of chaining statements or doing caching or… I don’t really know. What I got was a 600-page book that describes a fully realized, ANSI-standard programming language that loosely resembles Pascal.

My surprise does not equate to disappointment. The new possibilities that stored procedures open up to me are huge, and I’m eager to explore them. I started using MySQL back when it was somewhere in its version 3 lifespan. Back then all I was after was something easier to work with than files for storing data. I had recently started coding in PHP and was frustrated by how much harder it was to work with files in PHP 3 than in Perl 5. It’s only in the last 18 months that I’ve really gotten into MySQL and all it has to offer. As a result, I didn’t know much of anything about procedures, even though other database systems have had them for years.

As it turns out, there are actually three types of stored programs in MySQL. The first is the “procedure,” which is a full procedural program including control statements and loops. The second is the “function,” which, once written, can be used in any SQL statement to extend MySQL’s functionality. The third is the “trigger,” which is code that fires automatically whenever a specific event occurs, such as a row being inserted, updated, or deleted in a given table.
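Here is a small example of each type, written as MySQL DDL held in PHP strings so it could be sent through PDO. The `searchCache` table comes from the caching post, but the `created_at` column, the routine names, and the price-per-square-foot function are hypothetical. Each CREATE statement is sent as a single call, so the `DELIMITER` switch needed in the mysql command-line client isn't required here.

```php
<?php
// Hypothetical examples of MySQL's three stored-program types.

// 1. Procedure: a full program with control flow, invoked with CALL.
$procedure = <<<'SQL'
CREATE PROCEDURE purge_search_cache(IN max_age_hours INT)
BEGIN
    IF max_age_hours > 0 THEN
        DELETE FROM searchCache
        WHERE created_at < NOW() - INTERVAL max_age_hours HOUR;
    END IF;
END
SQL;

// 2. Function: returns a value and can be used inside any SQL statement.
$function = <<<'SQL'
CREATE FUNCTION price_per_sqft(price DECIMAL(12,2), sqft INT)
RETURNS DECIMAL(12,2) DETERMINISTIC
RETURN IF(sqft > 0, price / sqft, NULL)
SQL;

// 3. Trigger: fires automatically on an event, here every INSERT.
$trigger = <<<'SQL'
CREATE TRIGGER searchCache_stamp BEFORE INSERT ON searchCache
FOR EACH ROW SET NEW.created_at = NOW()
SQL;

$storedPrograms = [
    'procedure' => $procedure,
    'function'  => $function,
    'trigger'   => $trigger,
];

/** Sends each CREATE statement to the server as its own call. */
function installStoredPrograms(PDO $pdo, array $programs): void
{
    foreach ($programs as $sql) {
        $pdo->exec($sql);
    }
}
```

Once installed, the procedure would be run with something like `CALL purge_search_cache(4)`, and the function could appear in an ordinary query such as `SELECT price_per_sqft(price, sqft) FROM listings` (again, hypothetical table and column names).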

If you have any interest in learning procedures I’d recommend it. The time spent learning will be valuable. I am coming to realize that SQL is the most interesting part of application development for me. Relational databases are more powerful than I imagined back when I first started using them. Good SQL is the key to robust applications.



Bishma FTW


I beat Amazon!

Okay… not so much beat as figured out, and not so much Amazon as my own scripting. As mentioned in my last post, I was having issues with corrupted images getting stored on S3 during my migration process. I determined that errors were being introduced both during the FTP transfers from our old image server AND during the REST uploads to AWS.

I implemented the MD5 check I mentioned in my last post and added a verification step to the S3 upload. After transferring a file to S3, I perform a HEAD request on the object, which sends me back headers containing, among other things, the Content-Type and Content-Length. I can then make sure that the Content-Length matches the size of the image I downloaded and that the Content-Type is some type of image (useful since all errors are delivered as application/xml).
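A minimal sketch of that post-upload check using PHP's curl extension. It assumes the object is public-read, so an unsigned HEAD request is enough; a private object would need a signed request. The function names are hypothetical and this is not the author's S3 class.

```php
<?php
/** Sends a HEAD request and returns the raw response headers ('' on failure). */
function headRequest(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return '';
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing
    $raw = curl_exec($ch);
    return is_string($raw) ? $raw : '';
}

/** Parses raw headers into a lowercase name => value map. */
function parseHeaders(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n/', $raw) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status line ("HTTP/1.1 200 OK") or blank line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

/** True if S3 reports the expected size and an image/* content type. */
function uploadLooksValid(array $headers, int $localSize): bool
{
    if (!isset($headers['content-length'], $headers['content-type'])) {
        return false;
    }
    if ((int) $headers['content-length'] !== $localSize) {
        return false;
    }
    // S3 error responses come back as application/xml, so require image/*.
    return stripos($headers['content-type'], 'image/') === 0;
}
```

In a migration loop this might be called as `uploadLooksValid(parseHeaders(headRequest($objectUrl)), filesize($localPath))`, re-queuing the image whenever it returns false.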

Little by little I’m developing a rock solid PHP class for S3 file handling.

Thanks to Lumaxart for the image



Amazon Fail


I’ve been having S3/EC2 problems over the last couple of days. I was migrating photos for an MLS in Washington, and after two days of moving images in the background I found that a third of them had errors of some kind. What’s worse is that I haven’t been able to track down the cause yet. S3 returns an AccessDenied error even though my script sets the access control to public-read… or so I hope.

My S3 class isn’t fully formed yet, and the holes are now staring back at me from the ether. The biggest issue is that I can’t currently read or edit the ACL for any object already on S3. The AccessDenied error makes me think that my script set the access level incorrectly during the initial upload. There’s also the possibility that errors were introduced into the image while it was being moved from the old image server, and that S3 is erroring because the headers and content-type say it’s a JPEG when it’s actually gibberish.

In the meantime, migration is halted because I can’t risk losing images. When I resume migration I’ll put a simple hash checker in place. On the old image server I’ll have a script that accepts a file name from $_GET and runs an MD5 on the requested file, like so:
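$file = basename($_GET['file'] ?? ''); // drop directory parts so "../" can't escape the base path
$path = 'my/base/path/' . $file;
echo is_file($path) ? md5_file($path) : 'missing';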

Then I can read that output via curl and compare it to the MD5 of the file I just got via FTP. Images are our bread and butter. I have to be sure that I have an error free system before I continue.
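On the migration side, the comparison could look something like this sketch. The `file` query parameter matches the checker script above, but the checker URL passed in is a placeholder and both function names are hypothetical.

```php
<?php
/** Asks the old image server's checker script for a file's MD5 (hypothetical URL). */
function fetchRemoteMd5(string $checkerUrl, string $file): string
{
    $ch = curl_init($checkerUrl . '?' . http_build_query(['file' => $file]));
    if ($ch === false) {
        return '';
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // capture the printed hash
    $body = curl_exec($ch);
    return is_string($body) ? trim($body) : '';
}

/** True if the copy downloaded via FTP matches the hash reported by the server. */
function hashesMatch(string $remoteMd5, string $localPath): bool
{
    if ($remoteMd5 === '' || !is_file($localPath)) {
        return false;
    }
    $localMd5 = md5_file($localPath);
    if ($localMd5 === false) {
        return false;
    }
    // md5_file() returns lowercase hex; normalize the remote value to match.
    return strtolower(trim($remoteMd5)) === $localMd5;
}
```

Any mismatch, including an empty or error response from the checker, counts as a failure, so the image would be downloaded again rather than pushed to S3.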

Image from the Fail Blog



