I've been working on some improvements to Pastoid lately. It started out as more of a response to the URL shorteners that keep popping up everywhere and getting all the press, while Pastoid languishes in obscurity.

For those that don't know, Pastoid is a site that serves two major functions. First, it functions as a URL shortener, like the ubiquitous TinyURL. Second, it functions as a pastebin, like pastebin.com. It has a few little extra features that set it apart, and I have a lot planned for it that will break it out as something really different and special from those other tools.

I recently updated the look of the site. The reaction has been mostly negative. I think people don't like the grungy purple. Maybe I'll revise it again, but it does effect the change I wanted, where the sidebar is moved to the left to allow code sections to expand in the liquid layout on the right. In addition to that, I've been working on the thumbnail improvement project, which is the true impetus for this post.

Pastoid uses a service to produce its thumbnails. I started out using this other service, but the one that it uses currently is much better. It takes shots of the actual URL requested, no matter how deep it is in the site, instead of just handing over generic thumbnails for sites like Google Maps. It also provides the thumbnails in multiple sizes, and even custom sizes if requested.

The service offers a caching service for the thumbnails, which is what the site is using right now. Basically, you pass in some key information in a URL querystring, and it returns a thumbnail. Unfortunately, the protection that keeps people from stealing your URLs to use elsewhere causes a few little problems that lead to inconsistent functionality.

For example, the querystring is built using an MD5 hash of a secret string and the date. If the date of my server is different from the thumbnail server (it seems to be for about 4 hours out of the day, due to timezones), then the code is invalid, and no thumbnail is returned. I have tried to compensate for this, but it's just not reliable.
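Here's a rough sketch of why that happens. I'm guessing at the details: the real service's hash input, date format, and timezones aren't spelled out here, so `token_for`, the `YYYYMMDD` format, and the four-hour offset are all assumptions for illustration only.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def token_for(secret: str, moment: datetime, tz: timezone) -> str:
    """MD5 of the secret plus the calendar date as seen in timezone `tz`.

    The concatenation order and date format are assumptions, not the
    thumbnail service's documented scheme.
    """
    day = moment.astimezone(tz).strftime("%Y%m%d")
    return hashlib.md5((secret + day).encode("utf-8")).hexdigest()


# Hypothetical clocks: my server four hours behind the thumbnail server.
MY_SERVER = timezone(timedelta(hours=-4))
THUMB_SERVER = timezone.utc


def tokens_match(secret: str, moment: datetime) -> bool:
    """True when both servers would compute the same token at `moment`."""
    return token_for(secret, moment, MY_SERVER) == token_for(
        secret, moment, THUMB_SERVER
    )
```

With those made-up offsets, the two tokens agree for most of the day and disagree during the four hours when one server has already rolled over to tomorrow and the other hasn't.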

Also, this date-based hashing means you can't cache thumbnails from day to day. This is a problem since each request for a thumbnail costs a tenth of a credit, basically charging again for work the service has already done. The price seems a little steep for something that could be generated once and served from a cache, even if the price is already pretty small.

So I've devised a plan.

The service also offers an API that lets me generate thumbnails behind the scenes and then store them on my own. This is a great idea! I want to couple this with some cheap Amazon S3 storage to make the whole operation really cheap and fast.

This is the convoluted project:

  1. Receive the URL to thumbnail from the user.
  2. Send the thumbnail request to the thumbnail service.
  3. Receive a ping from the thumbnail service when the thumbnail generation is complete.
  4. Request the thumbnail data XML from the service and extract the URL of the zip package of generated thumbnails.
  5. Fetch the zip file of thumbnails.
  6. Unzip each thumbnail from the zip file.
  7. Upload the individual thumbnails to S3.
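
Steps 4, 6 and 7 could look something like the sketch below. This isn't Pastoid's actual code: the `<zip>` element name, the `thumbs/<id>/<file>` key layout, and the `upload` callback (standing in for whatever S3 client you use) are all assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Callable


def zip_url_from_xml(xml_text: str) -> str:
    """Step 4: find the zip package URL in the service's XML response.

    The <zip> element name is hypothetical; the real response may differ.
    """
    node = ET.fromstring(xml_text).find(".//zip")
    if node is None or not node.text:
        raise ValueError("no zip URL found in response")
    return node.text.strip()


def extract_thumbnails(zip_bytes: bytes) -> dict[str, bytes]:
    """Step 6: read every file in the archive into memory, keyed by name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


def store_thumbnails(
    paste_id: str,
    thumbnails: dict[str, bytes],
    upload: Callable[[str, bytes], None],
) -> list[str]:
    """Step 7: hand each thumbnail to an uploader and return the keys used."""
    keys = []
    for name, data in sorted(thumbnails.items()):
        key = f"thumbs/{paste_id}/{name}"  # made-up key layout
        upload(key, data)
        keys.append(key)
    return keys
```

Passing the uploader in as a callback means the unzip-and-store part can be tried at home with a plain dictionary, which helps when half the pipeline only works on the live server.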

Yes, it's terribly tricky, but amazingly I've gotten it working with minimal fuss with the exception of item #3, which I'm only able to test on the live server, since the thumbnail service can't ping the test site behind my home firewall. It has become one of those rare projects that is satisfying in its horrible complexity, and yet not so frustrating to implement that it's a bother.

It should also be possible to use this system to fetch the content of the page, and then split it for two purposes. First, I'll just throw a cache of the page to S3, where if you request it, you can get the original page contents. This will be useful if you've used Pastoid to bookmark someplace that succumbs to link rot.

Second, I'll strip the tags from the source and use the content for a full-text index. This will make not just the URL searchable, but the page content, too.
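
As a minimal sketch of the tag-stripping step (one way to do it, not necessarily how Pastoid will), Python's standard `html.parser` can collect the visible text and skip scripts and stylesheets. The `TextExtractor` and `page_text` names are just for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def page_text(html: str) -> str:
    """Return the page's text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

The resulting string is what would get fed to whatever full-text index sits behind the search box.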

On Pastoid, you can search for any string, and if it's in the original URL, then it'll return that link in the results. But if you used Pastoid to create a short link to an obscure URL (which is the whole point, no?), then you might want to search for page content instead of URL content. I think this will be very useful.

Also, I'm excited to have some other features in the pipe. I've added some login features, but I've delayed releasing registration because I'm thinking that I will revise the whole system to use OpenID. I've complained about problems with OpenID before - like the problem where if your OpenID provider goes away, you can't recover your account - but I think that's a minimal problem for a site whose content is as ephemeral as Pastoid's.

Things are shaping up pretty well, and hopefully I'll roll out these new features, and some I didn't reveal here, in the next week or two.

"Ah, Owen, you don't work hard enough," I keep hearing you all say. "You're such the slacker! Why don't you do something useful instead of writing all of that blog software?"

Well, between work and Habari, I do like to hang out with the family, and when I'm not doing that (usually after everyone has gone to bed, and long after I should have gone to bed myself) I'm frequently working on stupid little side projects that aren't of any consequence to anyone. My most recent such "mini" pet project is Pastoid.

Pastoid is a weird little thing that I started out of frustration at the lack of fusion between various pastebin services that are available and the tinyurl services that are abundant these days. I use the word "fusion" because I like it better than "mashup", which sounds more like something my kids do with potatoes, and not some whiz-bang web 2.0 technology. Also, it's one of the best lines of my favorite Invader Zim episode. Besides that, I'll tell you what it does.

Basically, it lets you paste either some code or a URL into it, and then it spits out a URL that points to it. So if you give it code, Pastoid gives you a link to that pasted code. If you give it a URL, Pastoid gives you a link that redirects to the URL you pasted. There are some hidden features that might be of interest.

First, if you give it a long URL to make short, Pastoid will give you a short URL with a plus (+) on the end. If you use this URL, you will be redirected directly to the pasted URL. If you remove the plus, then you will go to a page on Pastoid that shows the destination URL and a thumbnail. This saves you the embarrassment of opening up some link for suckers or something potentially NSFW at W.
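
The plus rule boils down to a small routing decision. Here's a hedged sketch of it. The `Route` type, the `resolve` function, and the dictionary of links are invented for the example and say nothing about how Pastoid stores things.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    action: str            # "redirect", "preview", or "not_found"
    target: Optional[str]  # destination URL, if the slug is known


def resolve(path: str, links: dict[str, str]) -> Route:
    """Decide what to do with a short-URL path.

    A trailing "+" means "send me straight there"; without it, the visitor
    gets a preview page showing the destination and its thumbnail.
    """
    slug = path.strip("/")
    direct = slug.endswith("+")
    if direct:
        slug = slug[:-1]
    target = links.get(slug)
    if target is None:
        return Route("not_found", None)
    return Route("redirect" if direct else "preview", target)
```

So a request for `/abc+` would redirect, while `/abc` would land on the preview page for the same link.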

Another cool thing that I have planned for Pastoid is the ability to paste multiple things on a single Pastoid URL. So you could paste a URL and some code, and some more code, etc. All referenced from a single Pastoid URL. That seems useful if you've got some sample code that has HTML and PHP and Javascript that all goes together. This feature is planned, but you can see the underpinnings of it if you paste both a URL and some code into the two default boxes on the Pastoid home page.

One thing I wanted to do was make Pastoid a bit less cluttered-feeling than the other paste services. Pastie is pretty good in this respect, but I think the PHP world needs to show those Ruby guys that we can still hack, you know? Plus, "pastoid" is seven characters, just like "tinyurl". I thought that was pretty neat.

Did you check out the live syntax-highlighting in the code box? Yeah, that's pretty cool. I combined a couple of very cool open source libraries to get that working. Pretty slick. Still needs refined a bit, but it's pretty slick.

Yes, I'm pronouncing this thing "paste-oid", like "factoid" but with "paste". And "oid". Whatever.

Anyway, I've got a whole mess of new features to add. Actually, I wrote 2+ pages of notes of what I want Pastoid to do, and I think I've only barely scratched the surface of those original notes. I'm going to try very hard to fit all of these features into something that stays usable, and also try to get sleep now and then. Which I'm not doing right now. Gee, how unusual.