Asked  1 Year ago    Answers:  5   Viewed   4 times

I'm working on a small website scraper with cURL.

I decided to use preg_match to find the header and the article content.

This is my code:

preg_match('@<h2 class="title">(.*?)</h2>@s', $this->website, $this->title);

if (sizeof($this->title) > 1)
    $this->title = trim($this->title[1]); // replace the match array with the captured title text (group 1)

While experimenting with it, I found that if there is one match, it is returned in the array at index 1, not 0.

Edited question: Why is this 1, not 0? Am I doing something wrong?

My server: Apache/2.4.3 (Win32) PHP/5.4.7

 Answers

3

The default behaviour of preg_match is to put the entire matched string in the result array at index 0, then each matched sub-pattern (capture group) at the following indexes, starting at 1. If nothing was matched, the result array is empty. If something is matched, you get the full matched string and then any sub-patterns, which is why your title ends up at index 1.
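To make the indexing concrete, here is a minimal sketch using the pattern from the question. The $html string is made-up sample markup, not the asker's real page.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = '<div><h2 class="title"> Hello World </h2><p>Body</p></div>';

// preg_match returns 1 on a match, 0 on no match, and fills $m.
$matched = preg_match('@<h2 class="title">(.*?)</h2>@s', $html, $m);

// $m[0] holds the whole matched string, tags included.
// $m[1] holds only what the first (...) group captured.
$title = $matched ? trim($m[1]) : null;

// A pattern that does not match leaves the result array empty.
$missed = preg_match('@<h3>(.*?)</h3>@s', $html, $none);
```

Checking the return value of preg_match (rather than the array size) is one common way to decide whether index 1 is safe to read.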

Saturday, May 29, 2021
 
mozlima
 
4

You are missing the /ims flag at the end of your regex. Otherwise . will not match line breaks (as in your first paragraph). Actually /s would suffice, but I'm always using all three for simplicity.
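A small sketch of what the s modifier changes, assuming a made-up two-line paragraph as input:

```php
<?php
// Hypothetical input containing a line break inside the paragraph.
$text = "<p>first line\nsecond line</p>";

// Without /s the dot does not match "\n", so the pattern cannot span both lines.
$withoutS = preg_match('/<p>(.*?)<\/p>/', $text, $a);

// With /s the dot also matches newlines, so the whole paragraph is captured.
$withS = preg_match('/<p>(.*?)<\/p>/s', $text, $b);
```

The i and m modifiers (case-insensitive matching, and ^/$ matching at line boundaries) do not affect how the dot treats newlines; only s does.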

Also, preg_match works for many simple cases. But if you are attempting more complex extractions, consider switching to phpQuery or QueryPath, which allow for:

foreach (qp($html)->find("p") as $p)  { print $p->text(); }
Friday, May 28, 2021
 
1

Use preg_quote to quote regular expression characters.

Like this:

preg_quote($theKeyword, '/');

Where '/' is the delimiter in your regular expression.
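For illustration, a minimal sketch of building a pattern from a keyword. The keyword value and the test strings are invented for the example.

```php
<?php
// Hypothetical keyword containing regex metacharacters and the delimiter itself.
$theKeyword = 'a.b/c?';

// preg_quote escapes ".", "?" and, because of the second argument, "/" as well.
$quoted = preg_quote($theKeyword, '/');
$pattern = '/' . $quoted . '/';

// The keyword is now matched literally.
$literalHit = preg_match($pattern, 'see a.b/c? here');
$literalMiss = preg_match($pattern, 'see axb/c here');
```

Without the second argument the "/" inside the keyword would stay unescaped and end the pattern early.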

Friday, May 28, 2021
 
3
preg_match('~"http://(.*)"~iU', $code, $matches);

Your issue was that the pattern needs delimiters (I chose ~). See the preg_match() manual page for more information.
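As a rough sketch of how this behaves, assume $code holds some HTML with two links (the markup below is invented). Using ~ as the delimiter means the slashes in http:// need no escaping, and the U modifier makes (.*) stop at the first closing quote.

```php
<?php
// Hypothetical input with two quoted URLs.
$code = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>';

// U inverts greediness, so (.*) captures as little as possible.
preg_match('~"http://(.*)"~iU', $code, $matches);

// Same pattern without U: (.*) is greedy and runs to the last quote in the string.
preg_match('~"http://(.*)"~i', $code, $greedy);
```

Writing (.*?) without the U modifier is an equivalent and arguably more readable way to get the non-greedy behaviour.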

Friday, June 11, 2021
 
1

It may occur because:

  • DbContext configured with an incorrect connection string
  • The entity specified is actually not mapped in configuration
Thursday, July 29, 2021
1

Add the virtual keyword before your related entities:

public class Order
{
    public int Id { get; set; }

    public virtual Patient Patient { get; set; }

    public virtual CertificationPeriod CertificationPeriod { get; set; }

    public virtual Agency Agency { get; set; }

    public virtual Diagnosis PrimaryDiagnosis { get; set; }

    public virtual OrderApprovalStatus ApprovalStatus { get; set; }

    public virtual User Approver { get; set; }

    public virtual User Submitter { get; set; }

    public DateTime ApprovalDate { get; set; }

    public DateTime SubmittedDate { get; set; }
    public Boolean IsDeprecated { get; set; }
}

You might end up with an "A circular reference was detected while serializing an object..." error if your objects hold references to each other. In that case, you will need to create a ViewModel or something similar to work around the problem, or use LINQ to project to an anonymous object.

Sunday, December 26, 2021
 