Data mining script from website into database or Excel
£20-250 GBP
Closed
Posted over 8 years ago
Paid on delivery
I need some data mined from the public data on the site "[login to view URL]". I can see the data I need for a single "code", but I need a script that goes through their entire database and produces an output. I also need a copy of the script so that I can update the database when new models are released.
What I'm trying to achieve is a full list of all Kingston products and their applications. If you look at an example product at the following URL
[login to view URL]
..then hit the "search" button at the bottom, it shows a list of all the compatible models. If you look at the page source for this, it shows the data I need. There are a number of compatible models, all looking like this: "15929-HP/Compaq-Business Desktop d200/d210" and "11869-HP/Compaq-Evo Business Desktop D310 / D31M". The list is one-to-many and contains the SysID (i.e. 15929 and 11869 in this case), the manufacturer (HP/Compaq in both cases), and lastly the model ("Business Desktop d200/d210" and "Evo Business Desktop D310 / D31M"). So I'd need the following table to be created:
KingstonID, SysId, mfr, model
KTC-PR266/1G, 15929, HP/Compaq, Business Desktop d200/d210
KTC-PR266/1G, 11869, HP/Compaq, Evo Business Desktop D310 / D31M
..etc. (one row for each entry in the list; for this model, there should be 336 different entries!)
I can cut/paste and search/replace this list in Excel myself, but what I need is someone who can write a script that does the same job for their entire catalogue. I have no idea how to achieve this, but there are literally thousands of products like KTC-PR266/1G, so the script needs to be clever enough to identify all their products, so that I can get a list like the one above for every memory part and the systems it works in. I can handle the rest of the data myself by looking at the actual product specifications (capacity, number of chips, voltage, etc.).
Please only respond once you've actually looked at the Kingston site and seen how feasible this is. I will ignore any quotes that arrive in the first few minutes telling me how "detailed" the response is! Preference will be given to someone who has actually done this kind of thing in the past. There is no urgency for this project, as it's something I'm looking to launch in 2016.
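For illustration, here is a minimal Python sketch of the parse-and-output step, assuming each compatibility entry arrives as a "SysID-Manufacturer-Model" string as in the two examples above. The split-on-the-first-two-hyphens rule is an assumption drawn from those examples, and the part number is hard-coded here; the crawling that feeds this step would depend on the site's actual structure.

```python
# Minimal sketch: turn raw compatibility entries into the requested CSV table.
# Assumes entries look like "SysID-Manufacturer-Model" (per the two examples).
import csv

def parse_entry(kingston_id, raw):
    """Split '15929-HP/Compaq-Business Desktop d200/d210' into fields.
    Assumes the first two hyphens delimit SysID and manufacturer."""
    sys_id, mfr, model = raw.split("-", 2)
    return kingston_id, sys_id.strip(), mfr.strip(), model.strip()

entries = [
    "15929-HP/Compaq-Business Desktop d200/d210",
    "11869-HP/Compaq-Evo Business Desktop D310 / D31M",
]

with open("kingston_compat.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["KingstonID", "SysId", "mfr", "model"])
    for raw in entries:
        writer.writerow(parse_entry("KTC-PR266/1G", raw))
```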
Hi
I am an expert scraper in Python, PHP, and Java.
I have a team of experienced software developers who have written code in several languages, but we presently specialize in Python and JavaScript web scraping.
I will definitely have some questions when we discuss the project with you. I am very interested, and I have experience doing manual/automated image-to-Excel and PDF-to-Excel work and bulk data scraping. I have done many extensive and difficult email-searching, lead-generation, and image-conversion jobs. Apart from this type of job, I have experience in internet research, virtual assistance, business analysis, other Excel work, data entry, Excel formula generation, etc.
Hello,
I can do this project from Excel, or in the cloud using a Google spreadsheet with Google Apps Script.
I have already done many web scraping projects, and this is an easy one.
Are you looking for discontinued memory parts?
I read your description, and I feel that I can start this project right now. I am comfortable working with these technologies; you can check my profile to see my reputation and the people who have worked with me in the past. My work speaks for me.
Thank you.
Hello
I have vast experience scraping data from websites, both plain HTML and data generated and obscured by JavaScript. The Kingston website looks like one of the easier types to work with.
This is the kind of work I do on a daily basis.
My preferred language is Python, with the required modules chosen based on how the website behaves. All of them are available for both Linux and Windows.
Best regards,
Jonas
All I can say is that it can be done: based on a first look, they are not using Angular or any kind of bot or scraper defence.
It's just a matter of navigating the pages, extracting the required content, and storing it in the desired format. I'm skilled in Python and am learning Scrapy.
I can store the data in either MySQL or a spreadsheet. If you go the spreadsheet route, I recommend Google Sheets over MS Office, as MS may bring a myriad of dependency issues.
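As a sketch of that storage step, here is a minimal example using Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a database server; the schema and inserts translate almost directly to a MySQL driver such as mysql-connector or PyMySQL. The table name and sample rows are illustrative.

```python
# Minimal storage sketch: sqlite3 stands in for MySQL so the example runs
# without a server; swap in a MySQL driver and credentials for production.
import sqlite3

rows = [
    ("KTC-PR266/1G", "15929", "HP/Compaq", "Business Desktop d200/d210"),
    ("KTC-PR266/1G", "11869", "HP/Compaq", "Evo Business Desktop D310 / D31M"),
]

conn = sqlite3.connect("kingston.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS compatibility (
           kingston_id TEXT,
           sys_id      TEXT,
           mfr         TEXT,
           model       TEXT,
           PRIMARY KEY (kingston_id, sys_id)
       )"""
)
# INSERT OR REPLACE keeps re-runs idempotent when the catalogue is re-scraped.
conn.executemany("INSERT OR REPLACE INTO compatibility VALUES (?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```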
Hey there! We're two developers with broad knowledge of Python and scripting. We'll gladly take on your project, as it seems like something we can pull off quite easily; in fact, we recently completed this kind of project for someone else here. Contact us if you'd like to see it.
Greetings for the day!
I have 6 years of experience in .NET, VBA macros, VBScript, PowerShell scripting, and VB development with applications like SAP, Internet Explorer, Microsoft Outlook, PDF and text files, MS Access, and SQL Server databases.
I have also worked on extraction from websites like Amazon, Cellpex, Costco, etc.
If awarded this project, I hope to deliver it with maximum accuracy and to your full satisfaction.
Please award me this project, and contact me for further details.
This is an extremely simple job, I can tell you. I have hands-on experience writing this type of code, and I also have experience working in the IT field. If awarded, I will complete the work you need on time and deliver you the best output.
I have a great deal of experience in web scraping using Python. We can discuss adding these entries to a database, or stick with your request for Excel/CSV files. I am also a full-time Linux admin and can show you ways to automate this script.
I know exactly what you need, as I have done this before. I worked for a client who wanted to run an online retail business, but the wholesalers didn't provide a data feed for him, and each wholesaler could have up to 20k distinct items. The main issue was that he needed to log in to see item prices, so I had to get around this by working with the cookies.
I downloaded all the pages and reconstructed the database by scraping the HTML; then I built an app to compare items and prices. All of the above is only to assure you that I really can do this job.
I think the best way to proceed is to download all the pages by following the links from an initial page to a high depth (12 levels). Once we have all the pages, we parse the HTML and extract the data according to the type of page.
Finally, we clean up the data and run some tests on it.
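A minimal sketch of that download-then-parse approach: a depth-limited, breadth-first crawl using a requests.Session so that any cookies (for example, from a login step) persist across requests. The start URL is a placeholder, not a real Kingston address, and real use would want politeness controls (rate limiting, robots.txt).

```python
# Depth-limited breadth-first crawl; requests.Session keeps cookies across
# requests (useful when a login sets them). START_URL is a placeholder.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://www.example.com/memory/search"  # placeholder URL
MAX_DEPTH = 12

def crawl(start_url, max_depth):
    """Return {url: html} for every same-host page within max_depth links."""
    session = requests.Session()          # cookies persist between requests
    host = urlparse(start_url).netloc
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages or depth > max_depth:
            continue
        try:
            resp = session.get(url, timeout=30)
        except requests.RequestException:
            continue
        if resp.status_code != 200:
            continue
        pages[url] = resp.text
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]  # drop fragments
            if urlparse(link).netloc == host:             # stay on-site
                queue.append((link, depth + 1))
    return pages

pages = crawl(START_URL, MAX_DEPTH)
```

Each downloaded page would then go through a per-page-type parser, as described above.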