Asked  1 Year ago    Answers:  5   Viewed   12 times

How can I use Simple HTML DOM to read an img tag's HTML5 attribute data-original?

$htmls = '<img class="lazy" alt="Nubifragio a Verbania , ferite 2 turiste  Gravi danni, chiesto stato di calamità    foto" title="Nubifragio a Verbania , ferite 2 turiste  Gravi danni, chiesto stato di calamità    foto" data-original="http://www.repubblica.it/images/2012/08/26/130634575-506cc9ae-11b8-4a53-920c-539a3811e46b.jpg" src="http://www.repubblica.it/static/images/homepage/2012/lazy.png" width="130" height="98" style="display: inline; ">';
$html = str_get_html($htmls);
$fata = $html->find('img'); 
foreach($fata as $newimage){
    echo $newimage->data-original; //0
    echo $newimage->src; //http://www.repubblica.it/static/images/homepage/2012/lazy.png
}

I can read the src attribute, but data-original returns 0.

 Answers

2
$newimage->data-original;

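is parsed by PHP as a subtraction:

$newimage->data - original;

That is, the property data minus a constant named original, which is why the expression yields 0 here instead of the attribute value.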

A way around this is to store the property name in a variable:

$property = 'data-original';
$newimage->$property;

or use curly-brace syntax for the property name:

$newimage->{'data-original'};
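
As a minimal sketch of why a hyphenated name needs special syntax, the snippet below reads such a property in two ways. FakeNode is a hypothetical stand-in written for this example (it is not part of Simple HTML DOM); it only assumes that the node exposes its attributes through the __get magic method.

```php
<?php
// Hypothetical stand-in for a parsed node: attributes are kept in an array
// and exposed as properties through the __get magic method.
final class FakeNode
{
    /** @var array<string, string> */
    private array $attr;

    public function __construct(array $attr)
    {
        $this->attr = $attr;
    }

    public function __get(string $name)
    {
        // Return false for attributes that are not set.
        return $this->attr[$name] ?? false;
    }
}

$node = new FakeNode([
    'src'           => 'lazy.png',
    'data-original' => 'photo.jpg',
]);

// Option 1: keep the property name in a variable.
$property = 'data-original';
$viaVariable = $node->$property;

// Option 2: curly-brace syntax around a string literal.
$viaBraces = $node->{'data-original'};

echo $viaVariable, "\n", $viaBraces, "\n";
```

Both forms pass the full string data-original to __get, so the hyphen is never treated as a minus sign.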
Thursday, April 1, 2021
 
3

You're not creating the DOM correctly. Do it like this:

// Create a DOM object
$dom = new simple_html_dom();
// Load HTML from a string
$dom->load(curl_exec($ch));

print_r( $dom );

Check the Manual for more details...

Edit

It seems to be a cURL settings problem. Please refer to the cURL documentation to configure it correctly...

This is a function I usually use to download pages. Feel free to adjust it to your needs:

function dlPage($href) {
    // Fetch the page body as a string with cURL
    $curl = curl_init();
    // Note: this disables TLS certificate verification; enable it when possible
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);

    // curl_exec() returns false on failure, so there is nothing to parse
    if ($str === false) {
        return null;
    }

    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);

    return $dom;
}

$url = 'http://www.example.com/';
$data = dlPage($url);
print_r($data);
Thursday, April 1, 2021
 
Xavio
 
5

First, I would also iterate over each row's td elements with foreach, so that you know which index each cell falls on. (Note that the indexing is zero-based, so it starts at 0.)

Then, inside the inner loop, check whether the cell's class is null and, if so, map its position to the corresponding word: 1 = one, 2 = two, and so on.

Rough example:

$map = array(1 => 'one', 2 => 'two', 3 => 'three');
foreach ($demo->find('tr') as $tr) { // loop each table row
    // then loop each td
    foreach($tr->find('td') as $i => $td) { // indexing starts at zero
        if($td->class == 'null') { // if its class is null
            echo $map[$i+1]; // map it to its corresponding word equivalent
        }
    }
}

In this case, this outputs three and then two: in the second table row the null cell is the third one, and in the third row it is the second one.
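
The answer's $demo object and its markup are not shown, so here is a self-contained sketch of the same index-mapping idea. It swaps in PHP's built-in DOMDocument for Simple HTML DOM, and the three-row table is a made-up sample chosen to match the description above.

```php
<?php
// Hypothetical sample table: the second and third rows each contain one
// cell with class="null".
$tableHtml = <<<HTML
<table>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>a</td><td>b</td><td class="null">c</td></tr>
  <tr><td>a</td><td class="null">b</td><td>c</td></tr>
</table>
HTML;

$map = [1 => 'one', 2 => 'two', 3 => 'three'];

$doc = new DOMDocument();
$doc->loadHTML($tableHtml);

$words = [];
foreach ($doc->getElementsByTagName('tr') as $tr) { // loop over each row
    $i = 0; // zero-based cell index within the row
    foreach ($tr->getElementsByTagName('td') as $td) {
        if ($td->getAttribute('class') === 'null') {
            $words[] = $map[$i + 1]; // 1-based position mapped to its word
        }
        $i++;
    }
}

echo implode(' ', $words), "\n"; // three two
```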

Thursday, April 1, 2021
 
muffe
 
5

$el->attr is an associative array of attribute => value pairs.
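
To illustrate what such a name => value array looks like, here is a rough sketch that builds one with PHP's built-in DOM extension rather than Simple HTML DOM. The shortened img markup and the variable names are assumptions made for this example.

```php
<?php
// Shortened, hypothetical version of the img tag from the question.
$markup = '<img class="lazy" data-original="photo.jpg" src="lazy.png">';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$img = $doc->getElementsByTagName('img')->item(0);

// Collect every attribute of the element into a name => value array.
$attrs = [];
foreach ($img->attributes as $attribute) {
    $attrs[$attribute->name] = $attribute->value;
}

// Array keys are plain strings, so the hyphen needs no special handling.
echo $attrs['data-original'], "\n"; // photo.jpg
```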

Saturday, November 20, 2021
 
2

You can get the HTML code using the Snoopy class (https://sourceforge.net/projects/snoopy). The code below displays the HTML source inside a <textarea> tag and then renders the page itself. Copy and paste it into a PHP file and open it in your browser:

<!DOCTYPE html>
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=euc-kr">
    <META HTTP-EQUIV="Content-language" CONTENT="ko">
  </head>
  <body>
<?php
require("Snoopy.class.php"); // ◄■■ GET SNOOPY FROM https://sourceforge.net/projects/snoopy
$snoopy = new Snoopy;
$snoopy->fetch("http://eecs.kookmin.ac.kr/site/computer/notice.htm");
$html = mb_convert_encoding( $snoopy->results, "UTF-8", "EUC-KR" ); // ◄■■ GET HTML CODE.
echo "<textarea rows='25' cols='80'>" . htmlspecialchars($html) . "</textarea>"; // ◄■■ DISPLAY THE HTML (ESCAPED).
echo $html; // ◄■■ DISPLAY THE WEBPAGE.
?>
  </body>
</html>

The Snoopy class is a single file. Make sure it is in the same directory as your PHP file.

Thursday, December 9, 2021
 
juherr
 