1. Write a Python script (preferably using BeautifulSoup; if you prefer another framework, please explain why) that scrapes 2 websites daily.
2. Store the scraped data in a MySQL database daily (I have prepared 8 datasources), with each day's data appended as new rows.
3. Deploy all scripts to the cloud (preferably AWS or GCP, and serverless such as Lambda; I am also open to other suggestions that minimize cloud server cost).
4. Set up a cron schedule that scrapes this information daily into a MySQL database in the cloud (preferably AWS or GCP).
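To illustrate steps 1 and 2, here is a minimal sketch of reading an HTML table and appending its rows, stamped with the scrape date, to a database table. It is only an illustration under assumptions: the sample HTML, the `prices` table, and its `item`/`price` columns are hypothetical, and it uses the standard library's `html.parser` and `sqlite3` as stand-ins for BeautifulSoup and MySQL so that it runs without any installs.

```python
import sqlite3
from datetime import date
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return (header, body_rows) for the first table-like row set in html."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return header, body


def append_rows(conn, rows, scraped_on):
    """Append one day's rows; earlier days are left untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices "
        "(scraped_on TEXT, item TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?, ?)",
        [(scraped_on, item, float(price)) for item, price in rows],
    )
    conn.commit()


# Hypothetical page content; a real run would download this from the site.
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

if __name__ == "__main__":
    header, body = parse_table(SAMPLE_HTML)
    conn = sqlite3.connect(":memory:")
    append_rows(conn, body, date.today().isoformat())
    print(header, conn.execute("SELECT * FROM prices").fetchall())
```

In a real deployment the parsing class would presumably be replaced by BeautifulSoup calls and the connection by a MySQL client; the append-only insert with a date column is the part that carries over.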
Extra notes and what I have prepared:
A. I have prepared the structure/headers for 8 datasources (approx. 4 per website) in which the scraped information should be stored daily, with each day's data appended as new rows. (I will give more details upon hiring.)
B. All data to be scraped on the two websites is in tables, which are easier to scrape than many more complex data sources.
C. These websites are updated more than once a day, but I want to scrape them only once each day.
D. I have some knowledge of tech, coding, and cloud infrastructure, which should make our collaboration and communication relatively easy.
E. I have a clear set of datapoints, with the associated URL for each datasource I want to scrape, which will be listed in a document.
F. I have multiple other scraping projects planned for the future if this job goes well.
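Regarding note C (scrape only once each day even though the sites update more often), one possible safeguard is to record each completed run and skip any repeat for the same datasource and date. The sketch below is an assumption, not a required design: the `scrape_log` table and the `claim_daily_run` helper are hypothetical names, and `sqlite3` stands in for MySQL so that the example runs as written.

```python
import sqlite3


def claim_daily_run(conn, source, today):
    """Return True only for the first call per (source, today)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrape_log ("
        "source TEXT, scraped_on TEXT, "
        "PRIMARY KEY (source, scraped_on))"
    )
    # The primary key rejects a second row for the same source and date;
    # OR IGNORE turns that rejection into a no-op instead of an error.
    cur = conn.execute(
        "INSERT OR IGNORE INTO scrape_log VALUES (?, ?)", (source, today)
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for attempt in range(2):
        if claim_daily_run(conn, "site_a_table_1", "2024-01-01"):
            print("scraping")  # the actual scrape would go here
        else:
            print("already scraped today, skipping")
```

With a guard like this, an accidental second trigger of the scheduled job on the same day would not append duplicate rows. MySQL has a comparable `INSERT IGNORE` statement, though the exact schema would depend on the 8 datasources.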
- Strong knowledge of web scraping, Python, and preferably BeautifulSoup, and experience scraping sites with anti-bot protection
- Patient, responsive, and a good communicator
*Good to have: cloud computing, cloud deployment, databases, SQL*
Extra note about the budget:
The budget can be lower or higher depending on the offer, the service, and the freelancer's expertise. It also depends on the scope of work, so please don't hold too strictly to the stated budget; I would rather discuss it, and I am looking for a good relationship for future projects as well.
Bid status for this project: 36 bidders, average bid $176.
I can provide the web scrapers, but I recommend using the Scrapy framework instead of BeautifulSoup. Each spider will visit a target website and write its output to a MySQL database.