How to use XMLReader in PHP?

Question 1

I have the following XML file. The file is rather large and I have not been able to get SimpleXML to open and read it, so I am trying XMLReader in PHP, without success so far.

<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product>
        <element_1>foo</element_1>
        <element_2>foo</element_2>
        <element_3>foo</element_3>
        <element_4>foo</element_4>
    </product>
    <product>
        <element_1>bar</element_1>
        <element_2>bar</element_2>
        <element_3>bar</element_3>
        <element_4>bar</element_4>
    </product>
</products>

Unfortunately I have not found a good tutorial on this for PHP, and I would love to see how I can store the content of each element in a database.

Question 2

It all depends on how big the unit of work is, but I guess you are trying to treat each <product/> node in succession.

For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access its contents. This way you keep memory usage low, because you are treating one node at a time, and you still get the ease of use of SimpleXML. For instance:

$z = new XMLReader;
$z->open('data.xml');

$doc = new DOMDocument;

// move to the first <product /> node
while ($z->read() && $z->name !== 'product');

// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
    // either one should work
    //$node = new SimpleXMLElement($z->readOuterXML());
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));

    // now you can use $node without going insane about parsing
    var_dump($node->element_1);

    // go to next <product />
    $z->next('product');
}
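
Since the question asks about storing each element in a database, here is a minimal sketch of the same loop feeding a PDO prepared statement. The function name importProducts, the products table, and its four columns are assumptions made up for this illustration; adapt them to your own schema.

```php
<?php
// Hypothetical helper: streams every <product> node of $file into a
// "products" table. Table and column names are assumptions for this sketch.
function importProducts(string $file, PDO $pdo): int
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Failed to open '$file'");
    }

    $insert = $pdo->prepare(
        'INSERT INTO products (element_1, element_2, element_3, element_4)
         VALUES (?, ?, ?, ?)'
    );

    // move to the first <product /> node
    while ($reader->read() && $reader->name !== 'product');

    $count = 0;
    $pdo->beginTransaction(); // a single transaction keeps bulk inserts cheap
    while ($reader->name === 'product') {
        // only the current <product> is held in memory
        $node = new SimpleXMLElement($reader->readOuterXml());
        $insert->execute([
            (string) $node->element_1,
            (string) $node->element_2,
            (string) $node->element_3,
            (string) $node->element_4,
        ]);
        $count++;

        // go to next <product />
        $reader->next('product');
    }
    $pdo->commit();
    $reader->close();

    return $count; // number of rows inserted
}
```

You would call it with any PDO connection whose products table already exists, for example importProducts('data.xml', $pdo). The prepared statement is created once and reused for every node.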

Quick overview of the pros and cons of the different approaches:

XMLReader only

Pros: fast, uses little memory
Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain

XMLReader + SimpleXML

Pros: does not use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it is a problem for you. Even a modest machine should be able to process a thousand nodes per second, though.

XMLReader + DOM

Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it were possible to use simplexml_import_dom(), but it does not seem to work in that case
Cons: DOM is annoying to work with. It is halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.

My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the chance of introducing bugs or performance regressions.
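
For comparison, the XMLReader + DOM variant mentioned above could look roughly like this. It is only a sketch: the function name readFirstElements is made up, and it assumes every <product> has an <element_1> child, as in the sample file.

```php
<?php
// Hypothetical helper: returns the text of <element_1> for every <product>,
// using XMLReader to walk the file and DOM to read each node.
function readFirstElements(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // use open($path) instead to stream from a file
    $doc = new DOMDocument();
    $values = [];

    // move to the first <product /> node
    while ($reader->read() && $reader->name !== 'product');

    while ($reader->name === 'product') {
        // expand() builds a DOM subtree for the current node only;
        // importNode() copies it into our own document
        $product = $doc->importNode($reader->expand(), true);

        // assumption: <element_1> is always present, otherwise item(0) is null
        $values[] = $product->getElementsByTagName('element_1')->item(0)->textContent;

        $reader->next('product');
    }
    $reader->close();

    return $values;
}
```

The DOM calls are more verbose than $node->element_1 in SimpleXML, which is the trade-off described in the cons.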

Question 3

For XML formatted with attributes...

data.xml:

<building_data>
<building address="some address" lat="28.902914" lng="-71.007235" />
<building address="some address" lat="48.892342" lng="-75.0423423" />
<building address="some address" lat="58.929753" lng="-79.1236987" />
</building_data>

PHP code:

$reader = new XMLReader();

if (!$reader->open("data.xml")) {
    die("Failed to open 'data.xml'");
}

while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
        $address = $reader->getAttribute('address');
        $latitude = $reader->getAttribute('lat');
        $longitude = $reader->getAttribute('lng');
    }
}

$reader->close();
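
If you do not know the attribute names in advance, a variation is to walk the attributes with moveToNextAttribute(). This is a sketch only; the function name readBuildings is invented for the example.

```php
<?php
// Hypothetical helper: collects all attributes of each <building> element
// into an associative array, without naming the attributes up front.
function readBuildings(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // use open("data.xml") to read from a file instead
    $buildings = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'building') {
            $attributes = [];
            // from an element, the first call moves to its first attribute
            while ($reader->moveToNextAttribute()) {
                $attributes[$reader->name] = $reader->value;
            }
            $reader->moveToElement(); // put the cursor back on the element
            $buildings[] = $attributes;
        }
    }
    $reader->close();

    return $buildings;
}
```

Each entry of the returned array maps attribute names such as address, lat and lng to their string values.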

Question 4

The accepted answer gave me a good start, but it brought in more classes and more processing than I would have liked, so this is my interpretation:

$xml_reader = new XMLReader;
$xml_reader->open($feed_url);

// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'product');

// loop through the products
while ($xml_reader->name == 'product')
{
    // load the current xml element into simplexml and we’re off and running!
    $xml = simplexml_load_string($xml_reader->readOuterXML());

    // now you can use your simpleXML object ($xml).
    echo $xml->element_1;

    // move the pointer to the next product
    $xml_reader->next('product');
}

// don’t forget to close the file
$xml_reader->close();
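
If you want to reuse this loop, one option is to wrap it in a generator so the calling code can simply foreach over the nodes. This is a sketch under that assumption; streamElements is a made-up name, not part of XMLReader.

```php
<?php
// Hypothetical helper: yields each <$tag> element of $file as a
// SimpleXMLElement, one at a time.
function streamElements(string $file, string $tag): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Failed to open '$file'");
    }

    try {
        // move the pointer to the first matching element
        while ($reader->read() && $reader->name !== $tag);

        // loop through the matching siblings
        while ($reader->name === $tag) {
            yield simplexml_load_string($reader->readOuterXml());
            $reader->next($tag);
        }
    } finally {
        // runs even if the caller stops iterating early
        $reader->close();
    }
}
```

Usage would then be along the lines of foreach (streamElements($feed_url, 'product') as $xml) { echo $xml->element_1; }.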

Question 5

Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.

I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is easy. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast; I have a discussion of speed in "What is the fastest XML parser in PHP?"

The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {..., which checks that we are only dealing with opening nodes and not whitespace, closing nodes or anything else.

function parseMyXML ($xml) { //pass in an XML string
    $myXML = new XMLReader();
    $myXML->xml($xml);

    while ($myXML->read()) { //start reading.
        if ($myXML->nodeType == XMLReader::ELEMENT) { //only opening tags.
            $tag = $myXML->name; //make $tag contain the name of the tag
            switch ($tag) {
                case 'Tag1': //this tag contains no child elements, only the content we need. And it's unique.
                    $variable = $myXML->readInnerXML(); //now variable contains the contents of tag1
                    break;

                case 'Tag2': //this tag contains child elements, of which we only want one.
                    while($myXML->read()) { //so we tell it to keep reading
                        if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the amount tag...
                            $variable2 = $myXML->readInnerXML(); //...put it in $variable2. 
                            break;
                        }
                    }
                    break;

                case 'Tag3': //tag3 also has children, which are not unique, but we need two of the children this time.
                    while($myXML->read()) {
                        if ($myXML->nodeType == XMLReader::END_ELEMENT && $myXML->name === 'Tag3') {
                            break; //stop at </Tag3>; breaking on the first match would skip the second child
                        } else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                            $variable3 = $myXML->readInnerXML();
                        } else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
                            $variable4 = $myXML->readInnerXML();
                        }
                    }
                    break;

            }
        }
    }
    $myXML->close();
}
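
To see why the nodeType check matters, here is a small sketch that lists what the reader actually visits. The function name listNodeTypes and the label strings are my own; only the XMLReader constants come from PHP.

```php
<?php
// Hypothetical helper: returns "TYPE:name" for every node the reader visits,
// to show what the ELEMENT check filters out.
function listNodeTypes(string $xml): array
{
    // labels are made up for readability; the keys are real XMLReader constants
    $labels = [
        XMLReader::ELEMENT                => 'ELEMENT',
        XMLReader::TEXT                   => 'TEXT',
        XMLReader::WHITESPACE             => 'WHITESPACE',
        XMLReader::SIGNIFICANT_WHITESPACE => 'WHITESPACE',
        XMLReader::END_ELEMENT            => 'END_ELEMENT',
    ];

    $reader = new XMLReader();
    $reader->XML($xml);

    $visited = [];
    while ($reader->read()) {
        $label = $labels[$reader->nodeType] ?? 'OTHER';
        $visited[] = $label . ':' . $reader->name;
    }
    $reader->close();

    return $visited;
}
```

For a tag such as <Amount>5</Amount> the name Amount shows up twice, once as an opening element and once as a closing one, so matching on the name alone would fire twice.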

Question 6

Simple example:

public function productsAction()
{
    $saveFileName = 'ceneo.xml';
    $filename = $this->path . $saveFileName;
    if (file_exists($filename)) {
        $reader = new XMLReader();
        $reader->open($filename);

        $countElements = 0;
        $nodeName = '';

        while ($reader->read()) {
            // remember the name of the most recently opened element
            if ($reader->nodeType == XMLReader::ELEMENT) {
                $nodeName = $reader->name;
            }

            // a text node holds the value of the element opened just before it
            if ($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
                switch ($nodeName) {
                    case 'id':
                        var_dump($reader->value);
                        break;
                }
            }

            // count each closing </offer> tag
            if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
                $countElements++;
            }
        }
        $reader->close();

        echo '<pre>';
        var_dump($countElements);
        exit;
    }
}

Question 7

~~XMLReader is well documented on the~~ PHP site. It is an XML pull parser, which means it is used to iterate through the nodes (or DOM nodes) of a given XML document. For example, you could go through the entire document you gave like this:

<?php
$reader = new XMLReader();
if (!$reader->open("data.xml"))
{
    die("Failed to open 'data.xml'");
}
while($reader->read())
{
    $node = $reader->expand();
    // process $node...
}
$reader->close();
?>

It is then up to you to decide how to deal with the node returned by XMLReader::expand().
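
As one possible way of dealing with that node, the sketch below expands only the element it cares about and reads its text through the DOM. The function name readLastUpdated is invented; the element name comes from the sample file in the question.

```php
<?php
// Hypothetical helper: returns the text of the first <last_updated> element,
// or null if the document has none.
function readLastUpdated(string $xml): ?string
{
    $reader = new XMLReader();
    $reader->XML($xml); // use open("data.xml") to read from a file instead
    $result = null;

    while ($reader->read()) {
        // skip whitespace, text and closing nodes; expand only what we need
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'last_updated') {
            $node = $reader->expand();   // DOMNode for this element and its children
            $result = $node->textContent; // copy the value out before moving on
            break;
        }
    }
    $reader->close();

    return $result;
}
```

Calling expand() on every node, as in the loop above, also works, but it builds DOM objects for whitespace and closing tags that you will most likely ignore.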

Question 8

This works better and faster for me


<html>
<head>
<script>
function showRSS(str) {
  if (str.length==0) {
    document.getElementById("rssOutput").innerHTML="";
    return;
  }
  if (window.XMLHttpRequest) {
    // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp=new XMLHttpRequest();
  } else {  // code for IE6, IE5
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
  xmlhttp.onreadystatechange=function() {
    if (this.readyState==4 && this.status==200) {
      document.getElementById("rssOutput").innerHTML=this.responseText;
    }
  }
  xmlhttp.open("GET","getrss.php?q="+str,true);
  xmlhttp.send();
}
</script>
</head>
<body>

<form>
<select onchange="showRSS(this.value)">
<option value="">Select an RSS-feed:</option>
<option value="Google">Google News</option>
<option value="ZDN">ZDNet News</option>
<option value="job">Job</option>
</select>
</form>
<br>
<div id="rssOutput">RSS-feed will be listed here...</div>
</body>
</html>

**The backend file**


<?php
//get the q parameter from URL
$q=$_GET["q"];

//find out which feed was selected
if($q=="Google") {
  $xml=("http://news.google.com/news?ned=us&topic=h&output=rss");
} elseif($q=="ZDN") {
  $xml=("https://www.zdnet.com/news/rss.xml");
}elseif($q == "job"){
  $xml=("https://ngcareers.com/feed");
}

$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);

//get elements from "<channel>"
$channel=$xmlDoc->getElementsByTagName('channel')->item(0);
$channel_title = $channel->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
$channel_link = $channel->getElementsByTagName('link')
->item(0)->childNodes->item(0)->nodeValue;
$channel_desc = $channel->getElementsByTagName('description')
->item(0)->childNodes->item(0)->nodeValue;

//output elements from "<channel>"
echo("<p><a href='" . $channel_link
  . "'>" . $channel_title . "</a>");
echo("<br>");
echo($channel_desc . "</p>");

//get and output "<item>" elements
$x=$xmlDoc->getElementsByTagName('item');

$count = $x->length;

// print_r( $x->item(0)->getElementsByTagName('title')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('link')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('description')->item(0)->nodeValue);
// return;

for ($i=0; $i < $count; $i++) {
  //Title
  $item_title = $x->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
  //Link
  $item_link = $x->item($i)->getElementsByTagName('link')->item(0)->nodeValue;
  //Description
  $item_desc = $x->item($i)->getElementsByTagName('description')->item(0)->nodeValue;
  //Category
  $item_cat = $x->item($i)->getElementsByTagName('category')->item(0)->nodeValue;


  echo ("<p>Title: <a href='" . $item_link
  . "'>" . $item_title . "</a>");
  echo ("<br>");
  echo ("Desc: ".$item_desc);
   echo ("<br>");
  echo ("Category: ".$item_cat . "</p>");
}
?>
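
Feed items do not always carry every tag, and their text can contain characters that break the HTML. A hedged sketch of an item loop that tolerates missing tags and escapes the title and link might look like this; renderItems is a made-up name and it takes the feed as a string so it can be tried without a network call.

```php
<?php
// Hypothetical helper: renders the title and link of each <item> in an RSS
// document as escaped HTML. Missing tags become empty strings.
function renderItems(string $rssXml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);

    $html = '';
    foreach ($doc->getElementsByTagName('item') as $item) {
        // returns the text of the first <$tag> child, or '' when it is absent
        $text = function (string $tag) use ($item): string {
            $node = $item->getElementsByTagName($tag)->item(0);
            return $node === null ? '' : $node->textContent;
        };

        $html .= "<p>Title: <a href='" . htmlspecialchars($text('link'), ENT_QUOTES) . "'>"
            . htmlspecialchars($text('title'), ENT_QUOTES) . "</a></p>\n";
    }

    return $html;
}
```

Iterating the node list with foreach also avoids index arithmetic on item().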