Web-scraping Desktop Application

In progress. Posted: Apr 21, 2011. Paid on delivery.


We are looking for a programmer who specializes in writing Windows Vista desktop scripts for web scraping.

The script's basic requirements are:

1) It must work with proxies, with or without a password (free or paid), taken from a list and selected at random every xxx time span.

2) The processing must simulate a manual website visitor; all 4 websites have scraping protection mechanisms installed.
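
The two basic requirements above could be sketched roughly as follows. This is only an illustration in Python, not a required implementation. The proxy list format is not specified in this posting, so the sketch assumes one proxy per line, written either as host:port or as host:port:user:password. The names Proxy, parse_proxy_line, ProxyRotator and human_pause are hypothetical, and the pause range is a placeholder.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Proxy:
    host: str
    port: int
    user: Optional[str] = None
    password: Optional[str] = None

    def url(self) -> str:
        """Return the proxy as an http:// URL, with credentials if present."""
        auth = f"{self.user}:{self.password}@" if self.user else ""
        return f"http://{auth}{self.host}:{self.port}"


def parse_proxy_line(line: str) -> Optional[Proxy]:
    """Parse 'host:port' or 'host:port:user:password' (assumed format).

    Returns None for blank or malformed lines so the caller can skip them.
    Passwords containing ':' are not supported by this simple split.
    """
    parts = line.strip().split(":")
    try:
        if len(parts) == 2:
            return Proxy(parts[0], int(parts[1]))
        if len(parts) == 4:
            return Proxy(parts[0], int(parts[1]), parts[2], parts[3])
    except ValueError:  # port was not a number
        pass
    return None


class ProxyRotator:
    """Hand out a randomly chosen proxy, re-picking after `interval` seconds."""

    def __init__(self, proxies: List[Proxy], interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None):
        if not proxies:
            raise ValueError("proxy list is empty")
        self._proxies = proxies
        self._interval = interval
        self._clock = clock
        self._rng = rng or random.Random()
        self._current = self._rng.choice(proxies)
        self._picked_at = clock()

    def current(self) -> Proxy:
        """Return the active proxy, choosing a new one if the span elapsed."""
        now = self._clock()
        if now - self._picked_at >= self._interval:
            # A random pick may return the same proxy again.
            self._current = self._rng.choice(self._proxies)
            self._picked_at = now
        return self._current


def human_pause(low: float = 2.0, high: float = 8.0,
                rng: Optional[random.Random] = None) -> float:
    """Pick a random pause length in seconds to space out page requests."""
    return (rng or random.Random()).uniform(low, high)
```

The clock is passed in so that the rotation interval can be checked without waiting in real time. Randomized pauses alone are unlikely to satisfy every protection mechanism; they only cover the timing side of "manual visitor" behavior.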

The job is to scrape data from 4 different yellow pages websites in Germany, run by different companies, with different HTML layouts and HTML data tables.

The scraping is based on a list of xxxxx company names in a txt file.

The script must enter one company name at a time into the search field of the selected yellow pages website and search for it. From the list of results, it must identify the matching company name, select it, and scrape the full address data plus any other data shown.
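
One possible way to identify the matching company name in a list of results is fuzzy string comparison. The Python sketch below is only an illustration under assumptions: the posting does not say how close a match must be, so the 0.85 threshold is a placeholder, and normalize and best_match are hypothetical names. It uses the standard library's difflib.

```python
import difflib
import re
from typing import List, Optional


def normalize(name: str) -> str:
    """Casefold, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.casefold())
    return " ".join(name.split())


def best_match(query: str, candidates: List[str],
               threshold: float = 0.85) -> Optional[int]:
    """Return the index of the candidate most similar to `query`.

    Returns None when no candidate reaches `threshold`, so the caller can
    treat the company as having no usable search result.
    """
    target = normalize(query)
    best_index, best_score = None, 0.0
    for i, candidate in enumerate(candidates):
        matcher = difflib.SequenceMatcher(None, target, normalize(candidate))
        score = matcher.ratio()
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= threshold else None
```

For example, given the result titles "Müller Bau AG", "Mueller & Sohn GmbH" and "Müller & Sohn GmbH", a query for "Müller & Sohn GmbH" selects the third entry. A lower threshold tolerates more spelling variation but raises the risk of scraping the wrong company.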

The successfully scraped data needs to be stored in an MS Access 2007 compatible file format.

The company names that do not return any search results need to be saved in another txt file.
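
The split between scraped records and names without results could be sketched as below, again only as a Python illustration. The lookup function stands in for the actual site search and is hypothetical, as are read_names, split_results and write_missing. UTF-8 file encoding is an assumption, since the posting does not state the encoding of the txt files, and writing the found records to an Access file is left out of the sketch.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def read_names(path: Path) -> List[str]:
    """Read one company name per line, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def split_results(names: Iterable[str],
                  lookup: Callable[[str], Optional[Record]]
                  ) -> Tuple[List[Record], List[str]]:
    """Run `lookup` per name; collect records and names with no result."""
    found: List[Record] = []
    missing: List[str] = []
    for name in names:
        record = lookup(name)  # None means the search returned nothing
        if record is None:
            missing.append(name)
        else:
            found.append(record)
    return found, missing


def write_missing(path: Path, missing: Iterable[str]) -> None:
    """Write the names without search results, one per line."""
    with path.open("w", encoding="utf-8") as fh:
        for name in missing:
            fh.write(name + "\n")
```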

The yellow pages URLs are:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

The UI of the script should ask the user to select:

a) proxy list txt file

b) company name txt file

c) which website to scrape (select one of the four)

The result files should be named as follows:

For success:

"Name of company name txt file" &"-"& "success" &"-"& "name of website (without http://www and the ending .de or .com)" &"-"& "yyyymmdd"

For no success:

"Name of company name txt file" &"-"& "nosuccess" &"-"& "name of website (without http://www and the ending .de or .com)" &"-"& "yyyymmdd"
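
The two naming patterns above could be built as in this Python sketch. It rests on assumptions: the posting does not say whether the ".txt" extension of the company name file is kept, so the sketch drops it, and it returns only the base name without a result file extension. The names site_label and result_name, and the example.de address used in the usage note, are hypothetical.

```python
import re
from datetime import date
from pathlib import PureWindowsPath


def site_label(url: str) -> str:
    """Strip the scheme, a leading 'www.', any path, and a final .de/.com."""
    host = re.sub(r"^[a-z]+://", "", url.strip(), flags=re.IGNORECASE)
    host = host.split("/", 1)[0]
    host = re.sub(r"^www\.", "", host, flags=re.IGNORECASE)
    return re.sub(r"\.(de|com)$", "", host, flags=re.IGNORECASE)


def result_name(company_file: str, site_url: str, success: bool,
                day: date) -> str:
    """Build '<company file>-<success|nosuccess>-<site>-<yyyymmdd>'."""
    # PureWindowsPath accepts both '\\' and '/' separators; the target
    # platform named in the posting is Windows.
    stem = PureWindowsPath(company_file).stem
    status = "success" if success else "nosuccess"
    return f"{stem}-{status}-{site_label(site_url)}-{day:%Y%m%d}"
```

With these assumptions, result_name("firms.txt", "http://www.example.de", True, date(2011, 4, 21)) gives "firms-success-example-20110421".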

Please only bid if you know how to write this kind of code and have a proven track record in this kind of programming work.

Thank you!

Skills: .NET, C Programming, C++ Programming, Visual Basic

Project ID: #1032998

About the project

13 proposals. Remote project. Active: Apr 22, 2011