Peyton Data Mining

Completed. Posted: 4 years ago. Paid on delivery.

$30-250 CAD

You are going to read some text files and classify them according to their labels. The Reuters corpus is one of the most famous datasets for text categorization tasks. We provide a subset of this dataset on Brightspace. Use these files to build your classifier. More information about this dataset is available at [login to view URL]

1- Download the zip file and extract it. Note that this data is a subset of the full Reuters corpus, so that you can process it without a powerful server.

2- Each file contains some XML files. Explore the XML files and list all the fields available in them.

3- Write a function that extracts a Pandas DataFrame containing: (1) headline, (2) text, (3) bip:topics, (4) [login to view URL], (5) itemid, (6) XMLfilename

4- Write a Python function to find all the possible values for bip:topics. Note that each news item can belong to more than one topic.

5- Write a function to prepare your text data with methods such as removing stop words. You are allowed to use the NLTK library.

6- Extract features from the text using any approach you like. Write a function that takes the DataFrame from step 3 as input and generates a new DataFrame of your features and labels.

7- Divide your data into a training set and a test set. You can use any method, such as cross-validation. You need to explain why you chose that method.

8- Write a function that takes the DataFrame from step 6 and a set of parameters and returns a trained classifier that classifies all the labels you found in step 4.

9- Write a function to evaluate the quality of your classifier (for example accuracy, F-score, AUC, ...). Explain why you think this function is the best choice.

10- Generate five different classifiers (Random Forest, Decision Tree, Linear Regression, Neural Network, and SVM) using step 8. Tune each one for the best parameters. Find the best classifier and explain why it is the best.
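
Steps 3 and 4 can be sketched with the Python standard library alone. This is a minimal sketch, not a reference solution for the assignment. It assumes each XML file holds one news item whose root element carries an `itemid` attribute, with `headline` and `text`/`p` child elements, and topic codes stored as `<code code="...">` entries under a `<codes>` element whose `class` attribute starts with `bip:topics`. Those element names, the helper names (`parse_newsitem`, `load_records`, `all_topics`) and the `SAMPLE` document are illustrative assumptions, so check them against the fields found in step 2. Field (4) is left out because its name is hidden in the posting.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed prefix of the class attribute on the <codes> element holding topics.
TOPIC_CLASS_PREFIX = "bip:topics"

# Made-up document that follows the assumed layout; not taken from the corpus.
SAMPLE = """<newsitem itemid="2330">
<headline>Example headline</headline>
<text>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</text>
<metadata>
<codes class="bip:countries:1.0"><code code="USA"/></codes>
<codes class="bip:topics:1.0"><code code="C15"/><code code="CCAT"/></codes>
</metadata>
</newsitem>"""


def parse_newsitem(xml_data, filename):
    """Turn one news-item XML document (str or bytes) into a flat record."""
    root = ET.fromstring(xml_data)
    text_node = root.find("text")
    paragraphs = []
    if text_node is not None:
        # Join nested inline markup, if any, into plain paragraph text.
        paragraphs = ["".join(p.itertext()).strip() for p in text_node.iter("p")]
    # Keep every topic code: one item may belong to several topics.
    topics = [
        code.get("code")
        for codes in root.iter("codes")
        if (codes.get("class") or "").startswith(TOPIC_CLASS_PREFIX)
        for code in codes.iter("code")
    ]
    return {
        "headline": (root.findtext("headline") or "").strip(),
        "text": "\n".join(paragraphs),
        "bip:topics": topics,
        "itemid": root.get("itemid"),
        "XMLfilename": filename,
    }


def load_records(folder):
    """Parse every .xml file below `folder` into a list of records."""
    return [
        parse_newsitem(path.read_bytes(), path.name)
        for path in sorted(Path(folder).rglob("*.xml"))
    ]


def all_topics(records):
    """Return the sorted set of topic codes seen across all records (step 4)."""
    return sorted({topic for rec in records for topic in rec["bip:topics"]})


record = parse_newsitem(SAMPLE, "sample.xml")
print(record["itemid"], record["bip:topics"])
```

Passing the list returned by `load_records` to `pandas.DataFrame(...)` gives the table asked for in step 3, with one row per file. The topics stay a list in each row so that an item can carry several labels.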

Skills: Python, Data Mining, Software Architecture, Data Processing, XML

Project ID: #21831994

About the project

9 proposals. Remote project. Active 4 years ago.

Awarded to:

Zohaib748

Hello Dear...! Alert: I will give you a 20% discount on my bid rate and on all my services. Grab this special offer while it lasts. Let's get to the point. I came to know that you're looking for a developer which ...

$131 CAD (in 3 days)

(19 reviews)

4.9

Average bid from 9 freelancers for this job: $186

DevStar925

Hi, I read your project description and I am interested in your job. As you can see from my profile, I am a full-time developer and have just completed many projects. In particular, I have top skills in C/C++, C#, Java, Py ...

$200 CAD (in 2 days)

(70 reviews)

7.4

smsaurabhv

Hi, I have gone through your requirement to scrape lots of websites. I am an EXPERT in building scraping tools and scripts. Hence, I can SURELY work on your project. I have 4 YEARS of EXPERIENCE in developing PHP-PYTHON ...

$108 CAD (in 3 days)

(91 reviews)

5.7

Arahan00

Hi, I have worked with NLP for sentiment analysis. I used Python for the development. I would like to work on your project. Let me know if you want to discuss further. Regards, Monir

$250 CAD (in 14 days)

(21 reviews)

5.2

razajen

I am a professional data scientist from Scotland. I have a vast amount of experience in data mining. I am more than happy to go ahead and discuss your project with you. Please drop me a text here.

$277 CAD (in 1 day)

(7 reviews)

4.1

agrepatil12345

Hi Sir, I have expertise in natural language processing using Python. I have also worked on different classification algorithms from machine learning and deep learning. Let's connect for further discussion. Thanks

$200 CAD (in 2 days)

(3 reviews)

0.0

soooky92

I can do it in a couple of days. I would use cross-validation because it is the one that I normally use.

$100 CAD (in 10 days)

(0 reviews)

0.0

luisnarvaez19

Certified in Java 1.2. I have been working with Java and JEE for 15 years. I have worked with several programming languages, such as C, Python, JavaScript, and Visual Basic, among others. I have experience doing compilers and in ...

$250 CAD (in 7 days)