XML parsing - Extract attributes and values from XML file in Perl


This is part of the output XML file produced by Stanford CoreNLP:

    <collapsed-ccprocessed-dependencies>
      <dep type="nn">
        <governor idx="25">mullen</governor>
        <dependent idx="24">ms.</dependent>
      </dep>
      <dep type="nsubj">
        <governor idx="26">said</governor>
        <dependent idx="25">mullen</dependent>
      </dep>
    </collapsed-ccprocessed-dependencies>
      </sentence>
    </sentences>
    <coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence>
          <start>1</start>
          <end>2</end>
          <head>1</head>
        </mention>
        <mention>
          <sentence>1</sentence>
          <start>33</start>
          <end>34</end>
          <head>33</head>
        </mention>
      </coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence>
          <start>6</start>
          <end>9</end>
          <head>8</head>
        </mention>
        <mention>
          <sentence>1</sentence>
          <start>10</start>
          <end>11</end>
          <head>10</head>
        </mention>
      </coreference>
      <coreference>

How can I parse this using Perl to get the following output?

    1. sentence 1, head 1
       sentence 1, head 33
    2. sentence 1, head 8
       sentence 1, head 10

I have tried XML::Simple, but the output is not understandable. Here is what I did:

    use XML::Simple;
    use Data::Dumper;

    my $outfile = $filename . ".xml";
    my $xml     = XML::Simple->new;
    my $data    = $xml->XMLin($outfile);
    print Dumper($data);

Regrettably, XML::Simple was the first to stake a claim on the `Simple` namespace. Perhaps it is simple in implementation, but it is not simple to use except in the most trivial of cases. If you want something similar, XML::Smart offers a nested data-structure API that is a lot better.

Thankfully there is plenty of choice among excellent Perl XML modules. XML::Twig is one of these, and it allows you to specify callback subroutines that are executed when specific elements within the XML data are encountered during parsing.

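This program uses XML::Twig and sets a callback on `coreference[mention]`, i.e. on `coreference` elements that have at least one `mention` child. The outer `coreference` wrapper has only `coreference` children, so it does not trigger the callback.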

The code in the handler subroutine makes no checks, and assumes that there are at least two `mention` child elements, each with a `sentence` and a `head` element. The text values of these nodes are output in the format you have described.

    use strict;
    use warnings;

    use XML::Twig;

    my $n = 0;    # running number for each coreference chain

    my $twig = XML::Twig->new(twig_handlers => {
      # called for every <coreference> element that has a <mention> child
      'coreference[mention]' => \&handle_coreference,
    });
    $twig->parsefile('myxml.xml');

    sub handle_coreference {
      my ($twig, $elt) = @_;
      my @mentions = $elt->children('mention');
      for my $i (0 .. $#mentions) {
        # number the first mention of each chain; indent the others
        printf "%s sentence %d, head %d\n",
          $i == 0 ? sprintf('%3d.', ++$n) : '    ',
          map $mentions[$i]->first_child_trimmed_text($_), qw/ sentence head /;
      }
    }

Output:

      1. sentence 1, head 1
         sentence 1, head 33
      2. sentence 1, head 8
         sentence 1, head 10
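Since the question also asks about attributes, here is a minimal sketch of a variation on the same XML::Twig approach. It parses an inline copy of the sample data instead of a file, reads the `representative` attribute with `att` alongside the child values, and collects everything into a data structure before printing. The `<root>`/`<document>` wrapper around the sample is an assumption made only so the string is well-formed; check it against your real CoreNLP file. `extract_coreferences` and `$sample` are hypothetical names used for illustration.

```perl
use strict;
use warnings;

use XML::Twig;

# Well-formed stand-in for the CoreNLP fragment above.
# ASSUMPTION: the <root>/<document> wrapper; your real file may differ.
my $sample = <<'XML';
<root>
  <document>
    <coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence><start>1</start><end>2</end><head>1</head>
        </mention>
        <mention>
          <sentence>1</sentence><start>33</start><end>34</end><head>33</head>
        </mention>
      </coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence><start>6</start><end>9</end><head>8</head>
        </mention>
        <mention>
          <sentence>1</sentence><start>10</start><end>11</end><head>10</head>
        </mention>
      </coreference>
    </coreference>
  </document>
</root>
XML

# Hypothetical helper: returns an array ref with one entry per coreference
# chain; each entry is an array ref of hashes, one hash per mention.
sub extract_coreferences {
    my ($xml) = @_;
    my @groups;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Same trigger as above: only <coreference> elements that have
            # a <mention> child, so the outer wrapper is skipped.
            'coreference[mention]' => sub {
                my ($t, $elt) = @_;
                my @mentions;
                for my $mention ($elt->children('mention')) {
                    push @mentions, {
                        # att() returns undef when the attribute is absent
                        representative => ($mention->att('representative') // 'false'),
                        # child element values, whitespace-trimmed
                        map { ($_ => $mention->first_child_trimmed_text($_)) }
                            qw(sentence start end head),
                    };
                }
                push @groups, \@mentions;
            },
        },
    );
    $twig->parse($xml);
    return \@groups;
}

# Print in the numbered format from the question, flagging representatives.
my $groups = extract_coreferences($sample);
my $n = 0;
for my $group (@$groups) {
    my $label = sprintf '%3d.', ++$n;
    for my $m (@$group) {
        printf "%s sentence %d, head %d%s\n",
            $label, $m->{sentence}, $m->{head},
            $m->{representative} eq 'true' ? ' (representative)' : '';
        $label = '    ';    # only the first mention of a chain is numbered
    }
}
```

Collecting into a data structure rather than printing inside the handler keeps the parsing separate from the formatting, so the same helper can feed other output formats. To read from a file instead, swap `$twig->parse($xml)` for `$twig->parsefile($filename)`.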
