Closing tags when extracting HTML from XML
I am transforming a mixed HTML and XML document using an XSLT stylesheet and extracting only the HTML elements.
Source file:
<?xml version="1.0" encoding="utf-8" ?>
<html>
  <head>
    <title>simplified example form</title>
  </head>
  <body>
    <tla:document xmlns:tla="http://www.tla.com">
      <tla:contexts>
        <tla:context id="id_1" value=""></tla:context>
      </tla:contexts>
      <table id="table_logo" style="display:inline">
        <tr>
          <td height="20" align="middle">big title goes here</td>
        </tr>
        <tr>
          <td align="center">
            <img src="logo.jpg" border="0"></img>
          </td>
        </tr>
      </table>
      <tla:page>
        <tla:question id="q_id_1">
          <table id="table_id_1">
            <tr>
              <td>label text goes here</td>
              <td>
                <input id="input_id_1" type="text"></input>
              </td>
            </tr>
          </table>
        </tla:question>
      </tla:page>
      <!-- repeat many times -->
    </tla:document>
  </body>
</html>
Stylesheet:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tla="http://www.tla.com"
    exclude-result-prefixes="tla">
  <xsl:output method="html" indent="yes" version="4.0" />
  <xsl:strip-space elements="*" />

  <!-- Identity template: copies attributes and any node not matched elsewhere -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Element-only identity template; prevents the tla namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Pass processing on to the child elements of tla elements -->
  <xsl:template match="tla:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
Output:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>simplified example form</title>
  </head>
  <body>
    <table id="table_logo" style="display:inline">
      <tr>
        <td height="20" align="middle">big title goes here</td>
      </tr>
      <tr>
        <td align="center"><img src="logo.jpg" border="0"></td>
      </tr>
    </table>
    <table id="table_id_1">
      <tr>
        <td>label text goes here</td>
        <td><input id="input_id_1" type="text"></td>
      </tr>
    </table>
  </body>
</html>
However, there's a problem: the meta, img, and input elements are not being closed correctly. I've set xsl:output to html and version 4.0, so as far as I know it should output correct HTML.
I'm guessing there needs to be a subtle change in the first xsl:template/xsl:copy instruction, but my XSLT skills are highly limited.
What change needs to be made so that the tags close correctly?
P.S. I'm not sure whether there's a difference between different tools/parsers, but I'm using Visual Studio 2012 to debug the stylesheet so that I can see the immediate effect of any changes.
The <meta>, <img>, and <input> elements don't need to be closed; it's still valid HTML. The html output method omits end tags for these empty elements by design.
If you want to have them closed, use xml as the output method (with XSLT 2.0 you can use xhtml, too, as far as I know) and add the <meta> tag yourself if you need it. Example:
Stylesheet:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tla="http://www.tla.com"
    exclude-result-prefixes="tla">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />

  <!-- Identity template: copies attributes and any node not matched elsewhere -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The xml output method does not add a meta element, so insert it explicitly -->
  <xsl:template match="head">
    <xsl:copy>
      <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Element-only identity template; prevents the tla namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Pass processing on to the child elements of tla elements -->
  <xsl:template match="tla:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>
Output:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title>simplified example form</title>
  </head>
  <body>
    <table id="table_logo" style="display:inline">
      <tr>
        <td height="20" align="middle">big title goes here</td>
      </tr>
      <tr>
        <td align="center">
          <img src="logo.jpg" border="0"/>
        </td>
      </tr>
    </table>
    <table id="table_id_1">
      <tr>
        <td>label text goes here</td>
        <td>
          <input id="input_id_1" type="text"/>
        </td>
      </tr>
    </table>
  </body>
</html>
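The same html-versus-xml serialization difference can be reproduced outside XSLT. The sketch below is only an analogy, not part of the original answer: it uses Python's standard xml.etree.ElementTree module, and the one-cell fragment is a hypothetical reduction of the source file above. Serializing the same tree with the html method drops the end tag of the empty img element, while the xml method writes it self-closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the question's source markup.
cell = ET.fromstring('<td><img src="logo.jpg"></img></td>')

# method="html" follows HTML rules: void elements such as img get no end tag.
as_html = ET.tostring(cell, encoding="unicode", method="html")

# method="xml" follows XML rules: an element with no content is self-closed.
as_xml = ET.tostring(cell, encoding="unicode", method="xml")

print(as_html)
print(as_xml)
```

Both results describe the same element tree; only the serialization rules differ, which is why switching xsl:output from html to xml changes how the tags are closed without changing the templates.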