Monday, May 28, 2007

hpricot and .NET sites

Need to scrape tour operator sites to extract itinerary information. Some of them provide a Web service, but since those are mostly SOAP, I'd rather just scrape the sites to get the data. That way, we'll have one tool that works for all the tour operators.

Using _why's Hpricot for the scraping. Had great success with it scraping Blogger sites in the past. However, I got this error message when opening the URL:
Hpricot::ParseError (ran out of buffer space on element <input>...)
Tried other pages on the same site, and pages on other sites, just to make sure I hadn't messed up the open call. No problems. Glanced at the page source in Firefox's source view: no evident errors.

First thought was to run the tour operator page through a validator to see if there were missing closing tags or weird tags. Found lots of errors (56), but none seemed likely to overrun the buffer during initial parsing. So I pasted the page into TextMate and started removing the errors one at a time to identify the culprit.
Whoa! The hidden input tag that .NET uses to track state (__VIEWSTATE) is huge! No wonder it blows the buffer. It wasn't evident in Firefox's source view because word wrap was off.
Sure enough, _why has provided a way to increase the buffer size if you run into .NET pages like this. Simply set it before you try to open such a URL:

require 'hpricot'
require 'open-uri'
Hpricot.buffer_size = 262144 # 256 KB, room for a huge __VIEWSTATE attribute
doc = Hpricot(open("http://globusjourneys.com/product.aspx?content=itin&trip=7ZJ"))
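If you'd rather not guess at a number, one rough approach is to fetch the page yourself, find its longest tag, and size the buffer to fit. This is a plain-Ruby sketch with hypothetical helper names (longest_tag_length, suggested_buffer_size); it assumes the buffer has to hold the whole offending tag, and the 16 KB floor is an arbitrary choice, not Hpricot's default.

```ruby
# Hypothetical helpers (not part of Hpricot): estimate a buffer size by
# finding the longest tag in a page and rounding up to a power of two.
def longest_tag_length(html)
  # Naive tag match: assumes no '>' inside attribute values, which holds
  # for the base64 data in an ASP.NET __VIEWSTATE field.
  html.scan(/<[^>]*>/).map(&:length).max || 0
end

# minimum is an arbitrary floor chosen for this sketch, not Hpricot's default.
def suggested_buffer_size(html, minimum = 16_384)
  needed = [longest_tag_length(html), minimum].max
  1 << (needed - 1).bit_length # smallest power of two >= needed
end

page = %(<form><input type="hidden" name="__VIEWSTATE" value="#{'A' * 100_000}" /></form>)
puts longest_tag_length(page)     # prints 100051
puts suggested_buffer_size(page)  # prints 131072
```

In the scraper you would read the page first (html = open(url).read), set Hpricot.buffer_size = suggested_buffer_size(html), and then call Hpricot(html), so the page is only fetched once.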

Thanks _why!

Friday, May 11, 2007

Context Sensitive Help in Rails

I promised some friends I'd do a writeup on context-sensitive help, and it's led me to think that a plugin might be in order. Walk through this with me first, and we'll decide together whether it's worth a plugin.

Step 1:

You're going to need a Rails application.

>rails cshelp -d sqlite3

Let's generate a couple of controllers so we have some context in which to offer help:

>ruby script/generate controller welcome index
>ruby script/generate controller register index

Because I like to give the kids up in Marketing the ability to manage their page titles and SEO, let's go ahead and give them all the content management pieces too:

ruby script/generate scaffold_resource content_page controller:string action:string windowtitle:string keywords:string description:string pagetitle:string heading:string subheading:string body:text footer:text

At this point we have a pretty basic structured content form for Marketing. If we fix up our layout to have these fields, we can spend our time coding and let the interns write the pretty words.
In your application.rb, you'll need this:

# Call get_content from your controller's action (or a before_filter) if
# you want to grab content from the content database and display it on
# an existing form or page.
def get_content
  # "/welcome/index" splits into ["", "welcome", "index"]
  intended_path = request.path.split('/')
  # Code below works for script/server, but code above works for tests.
  # intended_path = @request.path_parameters['path']
  intended_path.delete('')
  # "/welcome" on its own means the index action
  intended_path[1] = 'index' if intended_path.size == 1
  @content_page = ContentPage.find_by_controller_and_action(intended_path[0], intended_path[1])
end
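To see what get_content will look up for a given URL, here is the same splitting logic pulled out into a plain Ruby method (controller_and_action_for is a hypothetical name, and there's no Rails involved). One thing it makes visible: on a RESTful route such as /content_pages/1/edit, the record id lands in the action slot, so that screen would look up action "1" rather than "edit".

```ruby
# Hypothetical, Rails-free copy of the path handling in get_content, so the
# controller/action lookup can be checked without a running app.
def controller_and_action_for(path)
  parts = path.split('/')
  parts.delete('')                       # drop the empty leading segment
  parts[1] = 'index' if parts.size == 1  # "/welcome" means welcome#index
  [parts[0], parts[1]]
end

p controller_and_action_for('/welcome/')              # ["welcome", "index"]
p controller_and_action_for('/register/index')        # ["register", "index"]
p controller_and_action_for('/content_pages/1/edit')  # ["content_pages", "1"]
```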

In your application_helper.rb, put this little method:

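def get_content_value(field, default)
  # Fall back to the default when there's no ContentPage for this
  # controller/action, no such field, or the field is blank.
  return default unless @content_page && @content_page.respond_to?(field)
  value = @content_page.send(field)
  value.blank? ? default : value
end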


If there's content for the calling page, it's returned; if not, the default value is returned.
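The fallback logic is easy to check outside Rails. In this plain-Ruby sketch, a Struct stands in for the ContentPage model and content_value_for is a hypothetical stand-in for the helper that takes the page as an argument instead of reading @content_page. It also falls back to the default when a field is nil or blank, which is an assumption about what you want.

```ruby
# Hypothetical stand-in for the ContentPage model, with just two columns.
FakeContentPage = Struct.new(:keywords, :pagetitle)

# Same idea as get_content_value, but the content page is passed in.
def content_value_for(content_page, field, default)
  # No page for this controller/action, or no such column: use the default.
  return default unless content_page && content_page.respond_to?(field)
  value = content_page.send(field)
  # Treat nil or whitespace-only values as missing (assumed behaviour).
  (value.nil? || value.to_s.strip.empty?) ? default : value
end

page = FakeContentPage.new('tours, itineraries', nil)
p content_value_for(page, 'keywords', '')                       # "tours, itineraries"
p content_value_for(page, 'pagetitle', 'My Default Page Title') # "My Default Page Title"
p content_value_for(nil, 'keywords', '')                        # ""
```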

Now in your layout (mine's application.rhtml), take out the hard-coded SEO fields and replace them with ERB-ized ones:

<meta name="keywords" content="<%= get_content_value("keywords","") -%>"/>

<meta name="description" content="<%= get_content_value("description","") -%>"/>
<title><%= get_content_value("pagetitle","My Default Page Title") -%></title>

And the meat of the page:

<%= get_content_value("body","") -%>
<%= yield %>

Let's check it out and see if it works!

Migrate (rake db:migrate), fire up a Mongrel (ruby script/server), and create some content for the two controllers we generated earlier. Navigate to /content_pages/ and make one record for controller "welcome", action "index", and one for controller "register", action "index". Fill in all the fields; we'll use them later.

We'll assume that your marketing department is smart enough to make content for all of your controllers and actions. Now we can add a before_filter to ApplicationController in application.rb so that every page loads its content:

before_filter :get_content

The astute among you will notice that the page title on the content_pages screens is already set to "My Default Page Title". That's because there is no ContentPage record for controller "content_pages" and action "index". Extra credit: enter some content for your content CRUD screens!

After you've entered content, navigate to /welcome/ and see what happens. Mine says:
Wouldn't it be nice if everybody had a marketing department that created content?
Welcome#index

Find me in app/views/welcome/index.rhtml

Toes, toes and more toes in the footer.
Notice that I didn't even touch the index.rhtml file in the /app/views/welcome/ folder. The default "Welcome#index" is still there. But the framework we've just written wrapped the body, SEO fields, page title and footer around that default content.

Since this is going just the way we planned, let's put our page-level help in there now.

ruby script/generate migration add_help
Fill it in:

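class AddHelp < ActiveRecord::Migration
  def self.up
    # a text column, to match the text_area in the form below
    add_column :content_pages, :help, :text
  end
  def self.down
    remove_column :content_pages, :help
  end
end
Add the new help field to your content_pages edit view (app/views/content_pages/edit.rhtml):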

<p>
<b>Help</b><br />
<%= f.text_area :help %>
</p>

Repeat as needed: add the same field to the new view, and display the help text in the show and index views.
Migrate up (rake db:migrate) to add the help column to the database.

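Here's the fun part. In your application layout, make some room for help. Right above where you're grabbing the content for the body, put this:

<div id="help" style="display: none;"><%= get_content_value("help","sorry charlie, no help for you!") %></div>

Hide the div with that inline style rather than a #help { display: none; } rule in your CSS file. Effect.toggle decides whether to show or hide by checking the element's inline display style, and Prototype's show() only clears that inline style, so a stylesheet rule would keep the help hidden for good.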

Now grab the JavaScript from this page and insert it in a script block in your application layout. Just get the hide & show functions at the top of the page.

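Now add a link to hide or show the help. It uses script.aculo.us's Effect.toggle, so make sure your layout loads Prototype and script.aculo.us (for example with <%= javascript_include_tag :defaults %>):

<%= link_to "Help?", "#", :onclick => "Effect.toggle('help', 'blind'); return false;" %>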

Now you have controller and action level content and help for your Rails application. Is it worth a plugin? You tell us in the comments.

In the credit-where-credit-is-due department: we got the concept and most of the code for the get_content() method from Chad at Bear Den Designs.

To kick this up a notch, Chad also suggests a method_missing on the application controller, so that you can create content pages for controller/action routes that don't exist. We'll write that up another time.
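Until then, here is a rough plain-Ruby sketch of the general idea. It is not Chad's code and involves no Rails: CONTENT is a hypothetical hash standing in for the content_pages table, and ContentOnlyController is a made-up class. Defined actions run normally; any other name is looked up as stored content, and names with no content still raise NoMethodError.

```ruby
# Hypothetical stand-in for the content_pages table: [controller, action] => body.
CONTENT = {
  %w[about team] => 'Meet the people behind the tours.'
}.freeze

# Made-up, Rails-free class showing the method_missing fallback idea.
class ContentOnlyController
  def initialize(controller_name)
    @controller_name = controller_name
  end

  # A real, defined action is called normally and never hits method_missing.
  def index
    'index action'
  end

  # Undefined "actions" fall back to stored content; anything without
  # content still raises NoMethodError via super.
  def method_missing(action, *args, &block)
    CONTENT[[@controller_name, action.to_s]] || super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(action, include_private = false)
    CONTENT.key?([@controller_name, action.to_s]) || super
  end
end

about = ContentOnlyController.new('about')
puts about.index               # prints: index action
puts about.team                # prints: Meet the people behind the tours.
puts about.respond_to?(:team)  # prints: true
```

Pairing method_missing with respond_to_missing? keeps respond_to? honest about the content-only actions.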