
I am just starting with the Simple HTML DOM parser and am running into problems right at the beginning.

I am following this tutorial:

http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

I simply want to find, in a page's source code, the content of a div with the class "clearboth box".

I retrieve the page with cURL and create a Simple HTML DOM object:

$cl = curl_exec($curl);  
$html = new simple_html_dom();
$html->load($cl);

Then I wanted to put the content of the div into an array called $divs:

$divs = $html->find('div[.clearboth box]');

But when I print_r($divs), it outputs much more, even though the source code has nothing more inside the div.

Like this:

array
(
    [0] => simple_html_dom_node object
        (
            [nodetype] => 1
            [tag] => br
            [attr] => array
                (
                    [class] => clearboth
                )

            [children] => array
                (
                )

            [nodes] => array
                (
                )

            [parent] => simple_html_dom_node object
                (
                    [nodetype] => 1
                    [tag] => div
                    [attr] => array
                        (
                            [class] => socialmedia
                        )

                    [children] => array
                        (
                            [0] => simple_html_dom_node object
                                (
                                    [nodetype] => 1
                                    [tag] => iframe
                                    [attr] => array
                                        (
                                            [id] => showfacebookbuttons
                                            [class] => socialweb floatleft
                                            [src] => http://www.facebook.com/plugins/xxx
                                            [style] => border:none; overflow:hidden; width: 250px; height: 70px;
                                        )

                                    [children] => array
                                        (
                                        )

                                    [nodes] => array
                                        (
                                        )

I do not understand why $divs does not simply contain the code from the div.

Here is an example of the source code from the site:

<div class="clearboth box">
          <div>
<i class="icon smallicon productratingenablediconsmall" title="gute peppige qualität: sehr empfehlenswert"></i>
<i class="icon smallicon productratingenablediconsmall" title="gute peppige qualität: sehr empfehlenswert"></i>
<i class="icon smallicon productratingenablediconsmall" title="gute peppige qualität: sehr empfehlenswert"></i>
<i class="icon smallicon productratingenablediconsmall" title="gute peppige qualität: sehr empfehlenswert"></i>
<i class="icon smallicon productratingenablediconsmall" title="gute peppige qualität: sehr empfehlenswert"></i>

              <strong class="alignmiddle leftsmallpadding">gute peppige qualität</strong> <span class="alignmiddle">(17.03.2013)</span>
          </div>
          <div class="bottommargin">
            gute verarbeitung, schönes design,
          </div>
        </div>

What am I doing wrong?

 Answers

4

The correct way to select a div by class is:

$ret = $html->find('div.foo');
//or
$ret = $html->find('div[class=foo]');

Basically, you can select elements as if you were using a CSS selector.

Source: http://simplehtmldom.sourceforge.net/manual.htm
(section "How to find HTML elements?", tab "Advanced")
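
Applied to the markup in the question, the attribute form can carry the whole class value. The following is only a minimal sketch: it assumes simple_html_dom.php is on the include path, and the markup is a shortened, made-up version of the page from the question.

```php
<?php
// Assumption: simple_html_dom.php from the Simple HTML DOM project is on the include path.
require_once 'simple_html_dom.php';

// Shortened, hypothetical version of the markup from the question.
$markup = '<div class="socialmedia"><br class="clearboth"></div>'
        . '<div class="clearboth box"><div class="bottommargin">gute verarbeitung</div></div>';

$html = str_get_html($markup);

// Match the full class attribute value, in the same way as div[class=foo].
$divs = $html->find('div[class=clearboth box]');

foreach ($divs as $div) {
    // plaintext is the text content of the node without tags.
    echo trim($div->plaintext), PHP_EOL;
}
```

Each entry in $divs is a node object, so print_r on it also dumps parents and children; reading ->plaintext or ->innertext gives just the content.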

Thursday, April 1, 2021
 
Zeth
 
2

You are trying to run a replacement on a Simple HTML DOM object, which is not possible (it is an object, not a string). Instead, fetch the HTML first, do the replacement on the string, and then turn the result into a Simple HTML DOM object with the str_get_html function.

<?php
    include("simple_html_php_dom.php");

    // Start with the raw HTML and do the replacement on that string (don't use Simple HTML DOM for this).
    $html = file_get_contents("http://freebacklinks.prijm.com"); // example.com
    // Strip day-month-year dates; \s+ matches the whitespace between the parts, /i ignores letter case.
    $html = preg_replace("/([1-9]|[0-2][0-9]|3[0-1])\s+(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s+[0-9]{4}/i", " ", $html);
    // Strip month-day-year dates.
    $html = preg_replace("/(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s+([1-9]|[0-2][0-9]|3[0-1])\s+[0-9]{4}/i", " ", $html);

    // Now create the $result variable by parsing the cleaned string:
    $result = str_get_html($html);
    echo $result;
?>
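
Before running patterns like these against a fetched page, it can help to try them on a short string. This is a rough sketch only: the sample text and the $months variable are made up for illustration, and the day alternation is reordered and wrapped in \b here so that parts of longer numbers are not matched.

```php
<?php
// Hypothetical sample text, not taken from the page in the answer.
$text = 'Posted 12 Mar 2013 and updated Apr 1 2021.';

$months = 'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec';
$day    = '3[01]|[12][0-9]|0?[1-9]';

// Day-month-year, e.g. "12 Mar 2013"; /i makes the month names case-insensitive.
$text = preg_replace('/\b(' . $day . ')\s+(' . $months . ')\s+[0-9]{4}\b/i', ' ', $text);

// Month-day-year, e.g. "Apr 1 2021".
$text = preg_replace('/\b(' . $months . ')\s+(' . $day . ')\s+[0-9]{4}\b/i', ' ', $text);

echo $text, PHP_EOL;
```

Only after the string looks right would it be passed on to str_get_html.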
Thursday, April 1, 2021
 
1

It turns out I asked this question a little too early. I found the API reference page, which says the following W3C-style method can be used too:

$e->setAttribute($name, $value)

So instead of

$elem->attr['class'] = "classname";

you can do

$elem->setAttribute("class", "classname");

I'll keep the question and answer up in case other people come across this and miss the API reference page.

Saturday, May 29, 2021
 
Eugenie
 
5

You can do it with a regular expression:

preg_match($pattern, $java_script, $matches);

The pattern depends on whether the variable 'wmsauthsign' is unique. For example:

$pattern = '/wmsauthsign=(.*?)==/';

preg_match ($pattern, $java_script, $matches);

echo $matches[1];

If 'wmsauthsign' is not unique, you can always start your pattern from 'streamer', for example.
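
Both variants can be sketched together as follows. The content of $java_script is a made-up stand-in, since the real script text is not shown here, and the second pattern is only one possible way of anchoring on 'streamer'.

```php
<?php
// Hypothetical input: the real script text is not shown in the answer.
$java_script = 'var streamer = "rtmp://example.invalid/live?wmsauthsign=abc123==";';

// Variant 1: 'wmsauthsign' is unique, so match it directly.
// (.*?) is lazy, so it stops at the first "==" that follows.
$pattern = '/wmsauthsign=(.*?)==/';
$token = preg_match($pattern, $java_script, $matches) === 1 ? $matches[1] : null;

// Variant 2: anchor on 'streamer' first, in case 'wmsauthsign' occurs elsewhere too.
// The s modifier lets "." cross line breaks between the two names.
$anchored = '/streamer.*?wmsauthsign=(.*?)==/s';
$tokenAnchored = preg_match($anchored, $java_script, $m) === 1 ? $m[1] : null;

var_dump($token, $tokenAnchored);
```

Checking the return value of preg_match before reading $matches[1] avoids an undefined index notice when nothing matches.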

Friday, August 6, 2021
 
Johnson
 
3

Edit 2: as this is a bug in the DOM parser (tested on version 1.5), there is no simple way of doing this. The solution I could think of:

$find = $html->find(".class1");
$ret = array();
foreach ($find as $element) {
    if (strpos($element->class, 'class3') !== false) {
        $ret[] = $element;
    }
}
$find = $ret;

Basically, you find all the elements with the first class, then iterate through those elements to find the ones that also have the second class (in this case class3).
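
One caveat with a substring test such as strpos is that it also accepts longer class names that merely contain the text (for example class30). A stricter check compares whole class tokens. The helper below is hypothetical, is not part of Simple HTML DOM, and is only a sketch:

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns true when $name is
// one of the whitespace-separated tokens in a class attribute value.
function has_class(string $classAttr, string $name): bool
{
    $tokens = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($name, $tokens, true);
}

var_dump(has_class('class1 class3', 'class3'));   // whole token: bool(true)
var_dump(has_class('class1 class30', 'class3'));  // substring only: bool(false)
```

Inside the loop, the condition could then read has_class((string) $element->class, 'class3').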


Previous answer:

Simple answer (should work according to the CSS selector spec):

find(".class1.class2")

This will look for any type of element (div, img, a, etc.) that has both class1 and class2. If you want to restrict the type of element to match, add it to the beginning without a dot, like:

find("div.class1.class2")

If you put a space between the two classes, it will match elements with both classes or elements nested in the element with the first class:

find(".class1 .class2")

will match

<div class="class1">
  <div class="class2">this will be returned</div>
</div>

or

<div class="class1 class2">this will be returned</div>

Edit: I tried your code and found that the solutions above do not work. The solution that does work, however, is as follows:

$html->find("div[class=class1 class2]")
Wednesday, August 11, 2021
 
Null
 