Website crawler for HTML content

Closed. Posted: Nov 23, 2009. Paid on delivery.

I need a crawler to identify phrases in the HTML of websites, for example "google analytics".
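A minimal sketch of the phrase check, written in Python purely for illustration (the posting is tagged PHP). The function name `find_phrases` is hypothetical, and case-insensitive substring matching on the raw HTML is an assumption; the posting does not say how matches should be decided.

```python
def find_phrases(html, phrases):
    """Report which phrases occur anywhere in the raw HTML source.

    Assumption: matching is a case-insensitive substring test, so a phrase
    inside a script tag or an attribute counts as a hit.
    """
    haystack = html.lower()
    return {phrase: phrase.lower() in haystack for phrase in phrases}


if __name__ == "__main__":
    page = "<html><script>// Google Analytics snippet</script></html>"
    print(find_phrases(page, ["google analytics", "adsense"]))
```

Searching the raw source rather than the visible text suits phrases like "google analytics", which usually appear inside scripts and not in the rendered page.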

There will be about 5 phrases in total, and I want the phrase list to be an input that I can control. I also want to be able to control the depth of the crawl, i.e. how many levels "deep" the crawler goes into the website (e.g., home page --> about us --> management would be 3 layers deep).

Also, I want to be able to control the total number of pages crawled per site, e.g., cut off the search after 100 pages crawled.
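The depth limit and the per-site page cap could be combined in a breadth-first crawl along these lines. This is a sketch in Python for illustration only, under several assumptions: the names `crawl_site`, `LinkParser`, `max_depth` and `max_pages` are hypothetical, the home page counts as level 1 (so the example path above is 3 levels), only links on the same host are followed, and `fetch` is a caller-supplied function that returns the HTML for a URL or `None` on failure.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_depth=3, max_pages=100):
    """Breadth-first crawl of one site, bounded by depth and page count."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 1)])  # the home page is level 1 (assumption)
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:  # failed fetches do not count toward the cap
            continue
        pages[url] = html
        if depth >= max_depth:  # do not follow links below the depth limit
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Breadth-first order means that when the page cap is reached, the pages already collected are the ones closest to the home page.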

The crawler also needs to be able to crawl 20,000 sites in about a week. Therefore, the winning bidder needs to be able to build a "fast" crawler, e.g., by using multi-threading. Also, I will need to be able to upload the URLs of the websites I want to crawl.
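To put the speed requirement in numbers: if every one of the 20,000 sites hit the example cap of 100 pages, that is 2,000,000 pages in 604,800 seconds, or about 3.3 pages per second sustained. The sketch below (Python, for illustration only) computes that figure, reads an uploaded URL list, and spreads sites over a thread pool. All names are hypothetical, the one-URL-per-line upload format and the worker count of 32 are assumptions, and `crawl_one` stands for whatever per-site crawl function is used.

```python
from concurrent.futures import ThreadPoolExecutor


def required_pages_per_second(sites, pages_per_site, days):
    """Sustained fetch rate needed to finish within the given number of days."""
    return sites * pages_per_site / (days * 24 * 60 * 60)


def load_urls(text):
    """Parse an uploaded list: one URL per line, blanks and duplicates dropped."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def crawl_all(site_urls, crawl_one, workers=32):
    """Run crawl_one(url) for every site on a pool of worker threads.

    crawl_one should handle its own errors; an exception raised inside it
    would propagate out of this function.
    """
    urls = list(site_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(crawl_one, urls)))


if __name__ == "__main__":
    print(round(required_pages_per_second(20000, 100, 7), 2))
```

Threads fit this workload because the crawler spends most of its time waiting on network responses, not computing.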

Finally, this crawler needs to be completed in a couple of days.

This is something that was already asked a couple of months ago by somebody else, but I need it as well now.

PHP

Project ID: #556542

About the project

5 proposals. Remote project. Active: Dec 28, 2009

Average bid from the 5 freelancers on this job: $176

wildlily980

I'm interested in it. Check PMB for details.

$150 USD (within 7 days)
(59 reviews)
6.7
numatido

Hi, Please check your PM. Thanks.

$150 USD (within 2 days)
(2 reviews)
2.8
svetlinb

Contact me to clarify details on the project

$150 USD (within 2 days)
(0 reviews)
0.0