Hi,
We recently worked on a scraping and extraction project that retrieved specific attributes from websites using both APIs and scraping methods, and we would be interested in your project. We have worked on scraping, crawling, extraction, aggregation, and synchronization for data consistency across various unstructured data sources and websites. We have assembled the extracted attributes and semantically mapped them into useful Excel and CSV formats. We have also stored them in database schemas and synchronized any updates to the website with the schema. We have worked with various sites, including NCBI, using primarily PHP and Perl and, to a lesser extent, the Scrapy framework.
Please find a short summary of our experience below.
* Several years of experience in text mining, information extraction, and analytics. This covers crawling, scraping, extracting, and aggregating unstructured big data such as web pages and text corpora. It also covers populating databases, datastores, and search indexes (Lucene, Solr) with the results for analysis, search, reporting, and dashboards.
* Extensive experience using Perl, PHP, Python, C, Java, and .NET with MySQL, Oracle, and MS SQL Server.
* Information extraction tools: Scrapy, Weka, R, Excel, and Perl CPAN packages for extraction.
Estimated budget: ~730 USD (timeline: 15-20 days)
Price, milestones, and timelines are flexible and negotiable based on the exact project specifications and details, or on any additional project work.