Asked  1 Year ago    Answers:  5   Viewed   11 times

I just started to use PHP Simple HTML DOM Parser.

Now I'm trying to extract all elements wrapped in <b> tags (including the <b> and </b> tags themselves) from an existing HTML document. This works fine with

foreach($html->find('b') as $q)
    echo $q;

How can I show only elements surrounded with the <b>,</b>-tags followed by a <span class="marked">?

Update: I've used Firebug to get the CSS path for the elements. Now it looks like this:

foreach ($html->find('html body div#wrapper table.desc tbody tr td div span.marked') as $x)
    foreach ($x->find('html body div#wrapper table.desc tbody tr td table.split tbody tr td b') as $d)
        echo $d;

But it doesn't work. Any ideas?

Update:

To clarify my question, here is a sample tr from the document, including the opening and closing table tags.

<table width="100%" border="0" cellspacing="0" cellpadding="0" class="desc">
    <tr>
        <th width="25%" scope="col"><div align="center">1</div></th>
        <th width="50" scope="col"><div align="center">2</div></th>
        <th width="10%" scope="col"><div align="center">3</div></th>
        <th width="15%" scope="col"><div align="center">4</div></th>
    </tr>
    <tr>
        <td valign="top" bgcolor="#E9E9E9"><div style="text-align: center; font-weight: bold; margin-top: 2px"> 1 </div></td>
        <td>
            <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                    <td>
                        <b> element to extract</b></td>
                </tr>
                <tr>
                    <td>
                        <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        1
                                    </div>
                                </td>
                                <td>
                                    abed
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        2
                                    </div>
                                </td>
                                <td>
                                    ddee
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        3
                                    </div>
                                </td>
                                <td>
                                    xdef
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        4
                                    </div>
                                </td>
                                <td>
                                    abbcc
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        5
                                    </div>
                                </td>
                                <td>
                                    ab
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        6
                                    </div>
                                </td>
                                <td>
                                    e1
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
            </table>
        </td>
        <td valign="top"><div style="text-align: center"> <span class="marked">marked</span> </div></td>
        <td valign="top"><div style="text-align: center">  </div></td>
    </tr>
</table>

 Answers

1

Try the following CSS selector

b > span.marked

That would return the span though, so you probably have to do $e->parent() to get to the b element.

Also see Best Methods to parse HTML for alternatives to SimpleHtmlDom
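
As a rough sketch of the child-combinator idea above, here is the same lookup written with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so it runs without the library. The sample markup and variable names are assumptions made for illustration; parentNode stands in for $e->parent().

```php
<?php
// Assumed sample markup: one <b> that directly contains span.marked, one that does not.
$markup = '<div><b><span class="marked">foo</span></b><b>plain</b></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// XPath counterpart of the CSS selector "b > span.marked".
$query = '//b/span[contains(concat(" ", normalize-space(@class), " "), " marked ")]';

$bElements = [];
foreach ($xpath->query($query) as $span) {
    // Step up from the span to the enclosing <b>, like $e->parent().
    $bElements[] = $span->parentNode;
}

foreach ($bElements as $b) {
    echo $doc->saveHTML($b), "\n";
}
```

Only the first <b> is collected, because the second one has no span.marked child.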


Edit after update:

Your browser will modify the DOM. If you look at your markup, you will see that there are no tbody elements. Yet Firebug gives you

html body div#wrapper table.desc tbody tr td div span.marked
html body div#wrapper table.desc tbody tr td table.split tbody tr td b

Also, your question does not match the queries. You asked how to find

elements surrounded with the <b>,</b>-tags followed by a <span class="marked">

That can be read to mean either

<b><span class="marked">foo</span></b>

or

<b><element>foo</element></b><span class="marked">foo</span>

For the first, use the child combinator I have shown earlier. For the second, use the adjacent sibling combinator

b + span.marked

to get the span, and then use $e->prev_sibling() to return the previous sibling of the element (or null if not found).
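
The sibling case can be sketched the same way. This again uses PHP's built-in DOMXPath rather than Simple HTML DOM, and the markup is an assumed example. The XPath step preceding-sibling::*[1] plays the role of $e->prev_sibling().

```php
<?php
// Assumed sample markup: only the first <b> is directly followed by span.marked.
$markup = '<div><b><i>foo</i></b><span class="marked">foo</span>'
        . '<b>other</b><span>unmarked</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// XPath counterpart of "b + span.marked": a span.marked whose
// immediately preceding element sibling is a <b>.
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " marked ")]'
       . '[preceding-sibling::*[1][self::b]]';

$bElements = [];
foreach ($xpath->query($query) as $span) {
    // Fetch the previous element sibling, which the query guarantees is a <b>.
    $bElements[] = $xpath->query('preceding-sibling::*[1]', $span)->item(0);
}

foreach ($bElements as $b) {
    echo $doc->saveHTML($b), "\n";
}
```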

However, the markup you posted contains neither. There is only a DIV with a SPAN child having the marked class

<div style="text-align: center"> <span class="marked">marked</span>

If that is what you want to match, it's the child combinator again. Of course, you have to change the b then to a div.
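
If the goal is instead to get the <b> elements from rows whose last cells contain span.marked, one possible approach is sketched below. It is only a guess at the intent, it uses PHP's built-in DOMXPath rather than Simple HTML DOM, and the markup is a trimmed-down, assumed version of the posted table. Descendant steps (//) are used so the query does not depend on whether tbody is present.

```php
<?php
// Trimmed, assumed version of the posted table: the first outer row is marked, the second is not.
$markup = <<<'HTML'
<table class="desc">
  <tr>
    <td><table class="split"><tr><td><b> element to extract</b></td></tr></table></td>
    <td><div><span class="marked">marked</span></div></td>
  </tr>
  <tr>
    <td><table class="split"><tr><td><b> not wanted</b></td></tr></table></td>
    <td><div></div></td>
  </tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Rows of table.desc that contain a span.marked anywhere inside, then every <b> in those rows.
$query = '//table[contains(concat(" ", normalize-space(@class), " "), " desc ")]'
       . '//tr[.//span[contains(concat(" ", normalize-space(@class), " "), " marked ")]]'
       . '//b';

$texts = [];
foreach ($xpath->query($query) as $b) {
    $texts[] = trim($b->textContent);
}

print_r($texts);
```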

Thursday, April 1, 2021
 
Rocket
 
3

You're not creating the DOM correctly. Do it like this:

// Create a DOM object
$dom = new simple_html_dom();
// Load HTML from a string
$dom->load(curl_exec($ch));

print_r( $dom );

Check the Manual for more details...

Edit

It seems to be a cURL settings problem. Please refer to the documentation to configure it correctly...

This is a function I usually use to download pages. Feel free to adjust it to your needs:

function dlPage($href) {
    $curl = curl_init();
    // Note: this turns off SSL certificate verification
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);

    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);

    return $dom;
}

$url = 'http://www.example.com/';
$data = dlPage($url);
print_r($data);
Thursday, April 1, 2021
 
Xavio
 
4
$d = new DOMDocument();
$d->loadXML($xml);
$x = new DOMXPath($d);
$result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");
$unique = null;
for($i = $result->length -1;$i >= 0 && $item = $result->item($i);$i--){
    if($x->query("//*[@id='".addslashes($item->value)."']")->length == 1){
        echo 'Unique ID is '.$item->value."\n";
        $unique = $item->value;
        break;
    }
}
if(is_null($unique)) echo 'no unique ID found';
Thursday, April 1, 2021
 
2

Try innertext. The innertext property is used to read or write the inner HTML of an element.

    foreach($html->find('.name a') as $element) 
    {
        echo "<br>a tag text value=" . $element->innertext;
    }

API Ref

Thursday, April 1, 2021
 
Asher
 
4

Isn't it easy? Try things first, then ask. (:

<?php
include 'simple_html_dom.php';
$html = file_get_html('http://www.weather.gov.sg/lws/zoneInfo.do');

$n = 0;
$table = $html->find('table',3)->find('table',0)->find('table',0)->find('table',0)->find('table',3)->find('table',0);

$i = -3;
$rows = $table->find('tr');
$holder = array();

foreach($rows as $element){
    $i++;
    if($i < 0) continue;

    $holder[$i]['name'] = $element->find('td',0)->plaintext;
    $holder[$i]['zone_or_school'] = $element->find('td',1)->plaintext;
    $holder[$i]['risk'] = $element->find('td',2)->plaintext;
    $holder[$i]['from'] = $element->find('td',3)->plaintext;
    $holder[$i]['till'] = $element->find('td',4)->plaintext;
}

var_dump($holder);
?>

If you want a particular entry, you can filter it out:

foreach($holder as $key => $val)
{
    if($holder[$key]['name']=='Bedoc')
        $my_data = $holder[$key];
}

This code isn't debugged because I'm on mobile right now, but hopefully you get the idea even if it doesn't work as is. Thanks
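
The same filtering step could also be written with array_filter. This is a minimal sketch: the rows below are hypothetical stand-ins shaped like the $holder array built above, and the 'risk' values are invented for illustration.

```php
<?php
// Hypothetical rows shaped like the $holder array from the scraper above.
$holder = [
    ['name' => 'Bedoc', 'risk' => 'Low'],
    ['name' => 'Other', 'risk' => 'High'],
];

// Keep only the rows whose name matches.
$matches = array_filter($holder, function ($row) {
    return $row['name'] === 'Bedoc';
});

// Take the first match, or null when nothing matched.
$my_data = $matches ? reset($matches) : null;

var_dump($my_data);
```

Unlike the foreach version, $my_data is always defined here, even when no row matches.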

Saturday, May 29, 2021
 
Claudio
 