Trying to debug some asset tag helper issues and I found this...
Did you know you can call helpers directly in the rails console? Just use the helper object...
>> helper.link_to "this", "that"
=> <a href="that">this</a>
You can also call custom helpers (from your app), but I haven't tried it:
>> helper :my_custom_helper
Saturday, March 15, 2008
Wednesday, March 12, 2008
UTF-8 and hpricot
I needed to take the text tagged in an XML document and make url strings out of it. Used hpricot to parse the XML. Like this:
doc = Hpricot.XML(itinerary_day_description)
and then used xpath to find the text within the <cite> tags that will form the basis of the url I need:
activities = doc/("cite")
activities.each do |activity|
  title = activity.innerText
  link = "<a href=\"/redboxes/activity/#{title}\">"  # etc - you get the idea...
end
I hit a problem when the text contained non-ASCII UTF-8 characters (ñ, é, etc).
Hpricot conveniently converted them to HTML entities, and then innerText turned each entity into a meaningless character.
Not only does hpricot perform the HTML entity encoding in the initial XML document, but it performs it again every time the XML document gets processed.
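To make the repeated encoding concrete, here is a rough sketch using only Ruby's standard library. It is not Hpricot itself: CGI.escapeHTML is just a stand-in I'm assuming behaves similarly enough to show what happens when text that already contains an entity gets entity-encoded again.

```ruby
require "cgi"

# Stand-in for the behaviour described above: each pass re-encodes the
# ampersand that starts an existing entity.
original = "Vi&#241;a del Mar"      # n-tilde (U+00F1) already stored as an entity
once     = CGI.escapeHTML(original) # "Vi&amp;#241;a del Mar"
twice    = CGI.escapeHTML(once)     # "Vi&amp;amp;#241;a del Mar"

# A single decode only peels off one layer, so the text is still mangled.
puts CGI.unescapeHTML(twice)        # prints "Vi&amp;#241;a del Mar"
```

Each extra pass adds another "amp;", which is why decoding once at the very end is not enough if the document has been processed several times.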
Here's what I had to do to make this work.
- Use innerHTML instead of innerText. It preserves the HTML entity encoding that innerText didn't.
- Use the awesome HTMLEntities module from Paul Battley. I simply converted the title from an HTML entity back to native UTF-8 characters.
- Use CGI.escape for URL encoding.
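require 'hpricot'
require 'htmlentities'
require 'cgi'
doc = Hpricot.XML(itinerary_day_description)
coder = HTMLEntities.new
activities = doc/("cite")
activities.each do |activity|
  # innerHTML keeps the entities intact; decode turns them back into UTF-8
  title = coder.decode(activity.innerHTML)
  link = "<a href=\"/redboxes/activity/#{CGI.escape(title)}\">"  # etc
end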
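If you want to try the decode-then-escape step without installing any gems, something like the following should work. It is only a sketch: activity_path is a made-up helper name, and CGI.unescapeHTML is standing in for HTMLEntities#decode. The stand-in only understands numeric entities and a handful of named ones (&amp;, &lt;, &gt;, &quot;), so it would not decode something like &ntilde; the way HTMLEntities does.

```ruby
require "cgi"

# Hypothetical helper: turns entity-encoded tag text into a URL path.
# CGI.unescapeHTML is a stdlib stand-in for HTMLEntities#decode here.
def activity_path(entity_text)
  title = CGI.unescapeHTML(entity_text)     # "&#241;" becomes U+00F1
  "/redboxes/activity/#{CGI.escape(title)}" # percent-encode the UTF-8 bytes
end

puts activity_path("Vi&#241;a del Mar")
# prints "/redboxes/activity/Vi%C3%B1a+del+Mar"
```

Note that CGI.escape turns spaces into "+", which is fine for form-style encoding but may not be what every router expects in a path segment.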