$html = file_get_html('page.php');

foreach($html->find('p') as $tag_name) 
    {
        $attr = substr($tag_name->outertext,2,strpos($tag_name->outertext, ">")-2);
        $tag_name->outertext = str_replace($attr, "", $tag_name->outertext);        
    }
echo $html->innertext;

Above is the code I wrote to remove everything inside the opening <p> tag (all of its attributes) from every <p> in my HTML page.


My HTML looks similar to this:
<p class="..." id = "..." style = "...">some text...</p>
<p class="..." id = "..." style = "...">some text...</p>
<p class="..." id = "..." style = "...">some text...</p>
  <font>
    <p class="..." id = "..." style = "...">some text ...</p>
    <p class="..." id = "..." style = "...">some text ...</p>
  </font>
<p class="..." id = "..." style = "...">some text...</p>


When I run the PHP code, the result is this:
<p>some text...</p>
<p>some text...</p>
<p>some text...</p>
  <font>
    <p class="..." id = "..." style = "...">some text ...</p>
    <p class="..." id = "..." style = "...">some text ...</p>
  </font>
<p>some text...</p>

It doesn't remove the attributes of the <p> tags that are inside <font>.
If anyone can help me with this, I'd appreciate it.

 Answers

5

When I use your code and example HTML, it does remove all the attributes from all the <p> tags, even the ones inside <font>, so I'm not sure why yours isn't working.

But simplehtmldom has methods specifically for dealing with attributes, so you don't have to use string functions:

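require_once 'simple_html_dom.php';

$html = file_get_html('page.php');

foreach ($html->find('p') as $p) {
    // getAllAttributes() returns a copy of the array, so removing while looping is safe
    foreach ($p->getAllAttributes() as $attr => $val) {
        $p->removeAttribute($attr);
    }
}
echo $html->innertext;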

Hopefully that will be more effective.
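If simplehtmldom still misbehaves, a rough alternative is PHP's built-in DOM extension (DOMDocument). This is a different parser swapped in for comparison, not what the answer above uses; the helper name strip_p_attributes is made up for the example, and it assumes ext-dom is enabled. Note that DOMDocument::loadHTML() may normalize the markup (for example moving tags it considers misplaced), so the output layout is not guaranteed to match the input exactly.

```php
<?php
// Sketch: strip every attribute from every <p> using PHP's built-in DOM
// extension instead of simplehtmldom. strip_p_attributes() is hypothetical.
function strip_p_attributes(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about legacy tags like <font>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() searches the whole tree, so <p> elements
    // nested inside <font> are included too.
    foreach ($doc->getElementsByTagName('p') as $p) {
        // $p->attributes is live: copy the names first, then remove them.
        $names = [];
        foreach ($p->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $p->removeAttribute($name);
        }
    }

    // loadHTML() wraps fragments in <html><body>; serialize only the body's children.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}
```

Passing the sample markup from the question through it should leave bare <p> tags, including the ones inside <font>, although libxml may restructure the <font> wrapper itself.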

Thursday, April 1, 2021
 
Karsten
 
2

You can remove an element you don't want by setting its outertext to '':

require_once 'simple_html_dom.php';

$src = <<<src
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>

    <!-- the div I don't want -->
    <div id="comments">
        <h1> Some Text </h1>
    </div>

</div>
src;

$html = str_get_html($src);

foreach ($html->find('#product_description') as $description) {
    // find() with index 0 returns the first match, or null if there is none
    $comments = $description->find('#comments', 0);
    if ($comments !== null) {
        // an empty outertext drops the element from the output
        $comments->outertext = '';
    }
    print $description->outertext;
}
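With PHP's built-in DOM extension (a swapped-in alternative, not simplehtmldom), the equivalent is to detach the node from its parent. This is a sketch assuming ext-dom is available; remove_by_id is a hypothetical helper, and the ids are interpolated into the XPath expression as-is, so they must not contain double quotes.

```php
<?php
// Sketch: remove an element by id with PHP's built-in DOM extension,
// as an alternative to simplehtmldom's `outertext = ''` approach.
// remove_by_id() is hypothetical; ids must not contain double quotes.
function remove_by_id(string $html, string $containerId, string $removeId): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the unwanted element only when it sits inside the container.
    $nodes = $xpath->query(sprintf('//*[@id="%s"]//*[@id="%s"]', $containerId, $removeId));
    // Copy the matches into an array so removing nodes cannot disturb the iteration.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    $container = $xpath->query(sprintf('//*[@id="%s"]', $containerId))->item(0);
    return $container === null ? '' : $doc->saveHTML($container);
}
```

For the $src above, remove_by_id($src, 'product_description', 'comments') should return the product_description div without the comments block.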
Thursday, April 1, 2021
 
Besnik
 
4

SimpleHTMLDom doesn't use quoted string literals in the selector; it's just elem[attr=value]. The comparison of the value also seems to be case-sensitive (there may be a way to make it case-insensitive, but I don't know of one).*

For example:

require 'simple_html_dom.php';
$html = file_get_html('http://www.google.com/');
// most likely only one element, but foreach doesn't hurt
foreach ($html->find('meta[http-equiv=content-type]') as $ct) {
  echo $ct->content, "\n";
}

This prints text/html; charset=ISO-8859-1.

*Edit: yes, there is a way to perform a case-insensitive match: use *= instead of =.

find('meta[http-equiv*=content-type]')

Edit 2: note that http-equiv*=content-type would also match <meta http-equiv="haha-no-content-types"..., because it only tests whether the string appears somewhere in the attribute's value. But it's the only case-insensitive operator I could find. I guess you can live with it in this case ;-)
Edit 3: it uses preg_match('.../i'), and the value from the selector is passed directly into that pattern. Therefore you could write something like http-equiv*=^content-type$ to match http-equiv="Content-type" but not http-equiv="xyzContent-typeabc". I don't know whether this is a supported feature or just an implementation detail, though.
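To see what that means for the *= value, here is a standalone illustration that calls preg_match() directly with an /i pattern, which is how edit 3 describes the implementation. It is not simplehtmldom's own code: ci_contains is a hypothetical helper, and if your simplehtmldom version escapes the selector value before building the regex, the ^...$ trick will not work there.

```php
<?php
// Standalone illustration, NOT simplehtmldom's code: case-insensitive regex
// search of $pattern inside $value, mirroring what edit 3 describes for *=.
function ci_contains(string $pattern, string $value): bool
{
    return preg_match('/' . $pattern . '/i', $value) === 1;
}

var_dump(ci_contains('content-type', 'Content-Type'));          // bool(true)
var_dump(ci_contains('content-type', 'haha-no-content-types')); // bool(true), substring match
var_dump(ci_contains('^content-type$', 'Content-type'));        // bool(true)
var_dump(ci_contains('^content-type$', 'xyzContent-typeabc'));  // bool(false)
```

The second line shows the false positive from edit 2, and the last two show how anchoring with ^ and $ turns the substring test into a whole-value, case-insensitive comparison.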

Thursday, April 1, 2021
 
2

To grab all those attributes, you should first inspect the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and check whether those attributes exist. They don't seem to be valid HTML attributes, so the parser may discard them.

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}

To match both classes, you can use multiple selectors separated by a comma:

foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...
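For comparison, here is roughly the same loop written against PHP's built-in DOM extension instead of simplehtmldom, assuming ext-dom is enabled. collect_links is a hypothetical name, and data1/data2 are the example attribute names from above. Two differences to keep in mind: @class="bar" in XPath is an exact string match on the whole class attribute, and textContent returns plain text, whereas simplehtmldom's innertext returns the inner HTML.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of simplehtmldom.
// collect_links() is hypothetical; data1/data2 come from the example above.
function collect_links(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "|" is XPath's union operator, playing the role of the comma in
    // 'div[class=bar] a, div[class=bar2] a'.
    $links = $xpath->query('//div[@class="bar"]//a | //div[@class="bar2"]//a');

    $articles = [];
    foreach ($links as $a) {
        $article = [$a->getAttribute('href'), $a->textContent];
        // Copy the optional attributes only when they are actually present.
        foreach (['data1', 'data2'] as $name) {
            if ($a->hasAttribute($name)) {
                $article[$name] = $a->getAttribute($name);
            }
        }
        $articles[] = $article;
    }
    return $articles;
}
```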
Thursday, July 29, 2021
 
Easen
 
5

$el->attr is an associative array of attribute name => value pairs.
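In PHP's built-in DOM extension, $el->attributes is a DOMNamedNodeMap of DOMAttr objects rather than a plain array. If you switch parsers, a small helper (hypothetical, assuming ext-dom) can build the same name => value shape:

```php
<?php
// Hypothetical helper: build an attribute-name => value array from a
// DOMElement, similar in shape to simplehtmldom's $el->attr.
function attributes_as_array(DOMElement $el): array
{
    $attrs = [];
    foreach ($el->attributes as $attr) {
        $attrs[$attr->nodeName] = $attr->value;
    }
    return $attrs;
}
```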

Saturday, November 20, 2021
 