XML parsing - Extract attributes and values from XML file in Perl
This is part of the output XML file from Stanford CoreNLP:
    <collapsed-ccprocessed-dependencies>
      <dep type="nn">
        <governor idx="25">mullen</governor>
        <dependent idx="24">ms.</dependent>
      </dep>
      <dep type="nsubj">
        <governor idx="26">said</governor>
        <dependent idx="25">mullen</dependent>
      </dep>
    </collapsed-ccprocessed-dependencies>
    </sentence>
    </sentences>
    <coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence>
          <start>1</start>
          <end>2</end>
          <head>1</head>
        </mention>
        <mention>
          <sentence>1</sentence>
          <start>33</start>
          <end>34</end>
          <head>33</head>
        </mention>
      </coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence>
          <start>6</start>
          <end>9</end>
          <head>8</head>
        </mention>
        <mention>
          <sentence>1</sentence>
          <start>10</start>
          <end>11</end>
          <head>10</head>
        </mention>
      </coreference>
      <coreference>
How can I parse this using Perl to get the following output?

    1. sentence 1, head 1
       sentence 1, head 33
    2. sentence 1, head 8
       sentence 1, head 10
I have tried XML::Simple, but its output is not understandable. Here is what I did:

    use XML::Simple;
    use Data::Dumper;

    $outfile = $filename . ".xml";
    $xml     = XML::Simple->new;
    $data    = $xml->XMLin($outfile);
    print Dumper($data);
Regrettably, XML::Simple was the first module to stake a claim to the "Simple" namespace. It is perhaps simple in implementation, but it is not simple to use except in the most trivial of cases. If you want something similar, XML::Smart offers a nested data-structure API that is a lot better.
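For comparison, here is a minimal sketch of how XML::Simple can be coaxed into a navigable structure for this particular file. It assumes a trimmed, well-formed sample of the `<coreference>` section inlined as a string (a real CoreNLP file is much larger). The two `XMLin` options that matter are `ForceArray`, which keeps `coreference` and `mention` as array references even when only one occurs, and `KeyAttr => []`, which stops XML::Simple from folding elements into hashes keyed on `name`/`key`/`id` attributes.

```perl
use strict;
use warnings;

use XML::Simple;

# Trimmed, well-formed sample of the <coreference> section (assumed layout).
my $xml = <<'XML';
<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>2</end><head>1</head>
    </mention>
    <mention>
      <sentence>1</sentence><start>33</start><end>34</end><head>33</head>
    </mention>
  </coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>6</start><end>9</end><head>8</head>
    </mention>
    <mention>
      <sentence>1</sentence><start>10</start><end>11</end><head>10</head>
    </mention>
  </coreference>
</coreference>
XML

# ForceArray: always store these elements as array refs.
# KeyAttr => []: never fold elements into hashes keyed on an attribute.
# The root <coreference> element itself is dropped by XMLin.
my $data = XMLin($xml, ForceArray => [qw(coreference mention)], KeyAttr => []);

my @lines;
my $n = 0;    # number of coreference groups seen so far
for my $chain (@{ $data->{coreference} }) {
    my @mentions = @{ $chain->{mention} };
    for my $i (0 .. $#mentions) {
        # Number the first mention of each group; indent the others.
        push @lines, sprintf '%s sentence %d, head %d',
            $i == 0 ? sprintf('%3d.', ++$n) : ' ' x 4,
            $mentions[$i]{sentence}, $mentions[$i]{head};
    }
}
print "$_\n" for @lines;
```

Without those two options the shape of `$data` changes depending on how many `mention` elements a group happens to have, which is a large part of why the `Dumper` output is hard to follow.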
Thankfully there is a wide choice of excellent Perl XML modules. XML::Twig is one of these, and it allows you to specify callback subroutines that are executed when specific elements within the XML data are encountered during parsing.
This program uses XML::Twig, and sets a callback on `coreference[mention]`, i.e. `coreference` elements that have at least one `mention` child.

The code in the handler subroutine makes no checks, and assumes that there are at least two `mention` child elements, each with a `sentence` and a `head` element. The text values of these nodes are output in the format you have described.
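    use strict;
    use warnings;

    use XML::Twig;

    # Call handle_coreference for every <coreference> element
    # that has at least one <mention> child.
    my $twig = XML::Twig->new(
        twig_handlers => { 'coreference[mention]' => \&handle_coreference },
    );

    my $n = 0;    # running number of coreference groups printed
    $twig->parsefile('myxml.xml');

    sub handle_coreference {
        my ($twig, $elt) = @_;

        my @mentions = $elt->children('mention');
        for my $i (0 .. $#mentions) {
            # Number the first mention of each group; indent the rest.
            printf "%s sentence %d, head %d\n",
                $i == 0 ? sprintf('%3d.', ++$n) : ' ' x 4,
                map $mentions[$i]->first_child_trimmed_text($_), qw/ sentence head /;
        }
    }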
Output:

      1. sentence 1, head 1
         sentence 1, head 33
      2. sentence 1, head 8
         sentence 1, head 10
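The condition `coreference[mention]` used for the handler is also a plain XPath expression, so the same extraction can be done with a DOM-based module. Below is a sketch using XML::LibXML (a swapped-in module, not the one used above), again assuming a trimmed, well-formed sample of the `<coreference>` section inlined as a string rather than read from `myxml.xml`.

```perl
use strict;
use warnings;

use XML::LibXML;

# Trimmed, well-formed sample of the <coreference> section (assumed layout).
my $xml = <<'XML';
<coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>2</end><head>1</head>
    </mention>
    <mention>
      <sentence>1</sentence><start>33</start><end>34</end><head>33</head>
    </mention>
  </coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>6</start><end>9</end><head>8</head>
    </mention>
    <mention>
      <sentence>1</sentence><start>10</start><end>11</end><head>10</head>
    </mention>
  </coreference>
</coreference>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @lines;
my $n = 0;    # number of coreference groups seen so far
# Same selection as the XML::Twig handler: <coreference> elements
# that have at least one <mention> child, in document order.
for my $chain ($doc->findnodes('//coreference[mention]')) {
    my @mentions = $chain->findnodes('mention');
    for my $i (0 .. $#mentions) {
        # Number the first mention of each group; indent the others.
        push @lines, sprintf '%s sentence %d, head %d',
            $i == 0 ? sprintf('%3d.', ++$n) : ' ' x 4,
            $mentions[$i]->findvalue('sentence'),
            $mentions[$i]->findvalue('head');
    }
}
print "$_\n" for @lines;
```

The trade-off is memory: XML::LibXML builds the whole document tree before querying it, while XML::Twig can process each `coreference` element as it is parsed.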