Hi!
I developed a web crawler with CodeIgniter. It does the job, but my host keeps blocking my IP for ‘Large connection amount’ (max allowed: 25). I guess this is a direct effect of the crawler, but I would like your input on this…
Is PHP a poor choice for coding a crawler?
What causes so many concurrent connections?
Has any of you ever coded a crawler (or something similar)?
The main goal is to parse over 60,000 HTML pages to retrieve specific product information. Has any of you ever run into the same ‘problem’?
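For context, here is a minimal sketch of the kind of loop I have in mind: fetching pages one at a time with a single reused cURL handle and a delay between requests, so the crawler never holds more than one open connection at once. The URLs and the delay value are placeholders, not my actual code.

```php
<?php
// Sketch: sequential fetching with one reused cURL handle and a pause
// between requests, to stay well under the host's 25-connection limit.
// $urls is a placeholder list; the real crawler would read its own queue.
$urls = [
    'https://example.com/product/1',
    'https://example.com/product/2',
];

$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,           // return the body as a string
    CURLOPT_TIMEOUT        => 10,             // give up on slow pages
    CURLOPT_USERAGENT      => 'MyCrawler/1.0',
]);

foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $html = curl_exec($ch);
    if ($html !== false) {
        // ... parse the product information out of $html here ...
    }
    usleep(500000); // wait 0.5 s before the next request (tune as needed)
}
curl_close($ch);
```

With 60,000+ pages this obviously takes longer than firing requests in parallel, but it should keep the connection count at one instead of tripping the host's limit.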
Thanks a lot!