Ruby on Rails - Replace relative path URLs with absolute path URLs
I have a bunch of HTML content stored in the database, and I'm looking to convert all of the relative asset references to use absolute paths instead. For instance, all of my image tags look like this:
<img src=\"/system/images/146/original/03.png?1362691463\">
I'm trying to prepend "http://mydomain.com" to the "/system/images/" bit. I had the following code that I was hoping would handle it, but sadly it doesn't seem to result in any changes:
    text = "<img src=\"/system/images/146/original/03.png?1362691463\">"
    text.gsub(%r{<img src=\\('|")\/system\/images\/}, "<img src=\"http://virtualrobotgames.com/system/images/")
Instead of manipulating the URL string using normal string manipulation, use a tool made for the job. Ruby includes the URI class, and there's the more thorough Addressable gem.
Here's what I'd do if I had some HTML with links I wanted to rewrite:
First, parse the document:
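As a rough illustration of what the standard `uri` library offers here, the sketch below parses a root-relative reference and resolves it against a base URL with `URI.join`. The helper name `absolutize` is hypothetical and only serves the example; the domain is the one from the question.

```ruby
require 'uri'

# Hypothetical helper (not from the original answer): resolve a
# reference against a base URL and return the result as a string.
def absolutize(reference, base)
  URI.join(base, reference).to_s
end

ref = URI.parse("/system/images/146/original/03.png?1362691463")
ref.host   # nil, because a root-relative reference carries no host
ref.path   # "/system/images/146/original/03.png"
ref.query  # "1362691463"

puts absolutize(ref.to_s, "http://virtualrobotgames.com")
# prints http://virtualrobotgames.com/system/images/146/original/03.png?1362691463
```

Because the parser splits the reference into path and query for you, there is no need to write a pattern that anticipates every quoting style in the markup.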
    require 'nokogiri'
    require 'uri'

    # Parse the base URL so its scheme and host can be read later
    source_site = URI.parse("http://virtualrobotgames.com")
    html = '
    <html>
    <head></head>
    <body>
        <img src="/system/images/146/original/03.png?1362691463">
        <script src="/scripts/foo.js"></script>
        <a href="/foo/bar.html">foo</a>
    </body>
    </html>
    '
    doc = Nokogiri::HTML(html)

Then you're in a position to walk through the document and modify tags like <a>, <img>, <script>, and anything else you want:

    # Find things using 'src' and 'href' parameters
    tags = {
      'img'    => 'src',
      'script' => 'src',
      'a'      => 'href'
    }

    doc.search(tags.keys.join(',')).each do |node|
      url_param = tags[node.name]
      src = node[url_param]
      # Skip tags that lack the attribute or have an empty value
      unless src.nil? || src.empty?
        uri = URI.parse(src)
        # Only rewrite URLs that don't already name a host
        unless uri.host
          uri.scheme = source_site.scheme
          uri.host = source_site.host
          node[url_param] = uri.to_s
        end
      end
    end

    puts doc.to_html
Which, after running, outputs:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head><meta http-equiv="Content-Type" content="text/html; charset=US-ASCII"></head>
    <body>
        <img src="http://virtualrobotgames.com/system/images/146/original/03.png?1362691463"><script src="http://virtualrobotgames.com/scripts/foo.js"></script><a href="http://virtualrobotgames.com/foo/bar.html">foo</a>
    </body>
    </html>
This isn't meant to be a complete, fully-working example. It works with absolute-path links; you'll still have to deal with relative links, links to sibling/peer hostnames, and missing parameters.
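One possible way to cover those remaining cases is to resolve every link against the URL of the page it came from with `URI.join`, rather than assigning the scheme and host by hand. This is only a sketch: the helper `resolve_link`, the page URL, and the `example.com` hostnames are assumptions made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: returns an absolute URL string, or nil when the
# attribute is missing, blank, or not parseable as a URI.
def resolve_link(href, page_url)
  return nil if href.nil? || href.strip.empty?
  URI.join(page_url, href.strip).to_s
rescue URI::InvalidURIError
  nil
end

# Assumed location of the page the HTML was taken from
page = "http://virtualrobotgames.com/foo/index.html"

resolve_link("bar.html", page)                   # "http://virtualrobotgames.com/foo/bar.html"
resolve_link("../img/a.png", page)               # "http://virtualrobotgames.com/img/a.png"
resolve_link("//cdn.example.com/x.js", page)     # "http://cdn.example.com/x.js"
resolve_link("http://other.example.com/", page)  # "http://other.example.com/"
resolve_link(nil, page)                          # nil
```

Links that already name a host come back unchanged, so sibling or peer hostnames are left alone, and a nil result tells the caller to skip that node.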
You'll want to check the errors method of "doc" after parsing to make sure it was valid HTML. The parser can rewrite/trim nodes in invalid HTML while trying to make sense of it.