
Looking for a Python developer to help me finish a search engine with tf-idf, cosine similarity and queries, WITHOUT libraries such as sklearn

€8-30 EUR

Cancelled
Posted more than 3 years ago


Paid on delivery
I am looking for a Python developer, preferably an NLP expert, to help me finish a search engine for one of my college courses. The first part of the code, an inverted index, is already done. Please DO NOT change any part of the pre-existing code except the parts indicated. It is important to keep the posting lists as they are; DO NOT shorten them. Because the description field has a character limit, I have attached a file with a more detailed job description and examples, as well as a screenshot of what the result should look like. Please read the instructions carefully and look at the screenshot before bidding. It is very important to follow the instructions (e.g. NOT using libraries for certain parts). This task should not be much trouble for a skilled developer.

Here is a rough outline of what needs to be done:

- The tokens need to be stemmed with snowballstemmer for German. This MUST be done in a separate function; do not stem in the same function that counts the tokens. I have marked in the code where to add this part. Queries have to be stemmed as well. For example, if you type "eating" into a query (for both the inverted index AND cosine similarity), anything starting with "eat" should be printed.
- tf-idf needs to be calculated. MOST IMPORTANTLY: you CANNOT use any libraries for this, so DO NOT use sklearn, TfidfVectorizer or anything like that. Each part (tf, idf, tf-idf) needs to be calculated in a separate function. I have marked where to add these in the code as well. If you use a library like TfidfVectorizer, or anything else that does the same, I cannot accept the code.
- Cosine similarity has to be calculated. This also MUST be done in a function, with NO libraries (no sklearn, etc.). It has to be calculated between whatever is typed into a query and the texts in the corpus. The query has to be reachable from the main function by typing "2" in the menu.
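The three requirements above could be sketched roughly as follows. This is a minimal sketch, not the poster's code. The function names and the `{doc_id: {term: count}}` layout are hypothetical; the existing inverted index may store its postings differently. It also assumes raw counts for tf and the natural-log formula idf(t) = log(N / df(t)), since the post does not fix either definition. Only the standard `math` module is used for the weighting. `snowballstemmer` is the package named in the post; the sketch falls back to plain lowercasing if it is not installed.

```python
import math

try:
    import snowballstemmer  # package named in the job post (PyPI: snowballstemmer)
    _STEMMER = snowballstemmer.stemmer("german")
except ImportError:
    # Assumption: without the package, only lowercase so the sketch still runs.
    _STEMMER = None


def stem_tokens(tokens):
    """Stem tokens in their own step, separate from counting; reuse it for queries."""
    lowered = [token.lower() for token in tokens]
    if _STEMMER is None:
        return lowered
    return _STEMMER.stemWords(lowered)


def compute_tf(doc_tokens):
    """tf(t, d): raw number of times term t occurs in one document (assumed definition)."""
    tf = {}
    for term in doc_tokens:
        tf[term] = tf.get(term, 0) + 1
    return tf


def compute_idf(tf_per_doc):
    """idf(t) = log(N / df(t)), where df(t) is the number of documents containing t."""
    n_docs = len(tf_per_doc)
    df = {}
    for tf in tf_per_doc.values():
        for term in tf:
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / count) for term, count in df.items()}


def compute_tfidf(tf_per_doc, idf):
    """tf-idf(t, d) = tf(t, d) * idf(t); terms missing from idf get weight 0."""
    return {
        doc_id: {term: count * idf.get(term, 0.0) for term, count in tf.items()}
        for doc_id, tf in tf_per_doc.items()
    }


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse {term: weight} vectors; 0.0 if either is all zeros."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A query would go through the same functions: `compute_tfidf({"query": compute_tf(stem_tokens(query.split()))}, idf)["query"]` gives its vector, which is then compared to each document vector with `cosine_similarity`. If each posting in the existing inverted index lists a document only once, df(t) is simply the length of that term's posting list.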
The menu is already implemented; please find the corresponding part of the main function and add the query there.

The user should be able to search for words and then see the cosine similarity, tf, idf, and the final tf-idf for the Top N (e.g. Top 10) ranked documents, with both the document name AND the document ID for each result (please see the screenshot for this). After the tf-idf option is chosen in the menu (by entering "2"), the overall top 10 results (or any other number) for tf-idf should be printed first, without a query. Cosine similarity is not part of this step, as it is used for queries only. This output should look something like this:

Documents: [id: name (|d|)]
0: text1, 1: text2, 2: text3, ...
dictionary: [term: idf | (doc: tf), (doc: tf), (doc: tf), ...]

The program should then ask the user to type in a query. Using cosine similarity, the result should look something like this:

Query: food
Top 3 containing the queried word(s):
filename1 (file ID, tf | idf)
filename2 (file ID, tf | idf)
filename3 (file ID, tf | idf)

(Please see the screenshot for details; it will make clear what I mean.) The user should be able to type in more than one word, but a text does not have to contain every one of the typed words to appear in the results. The attached screenshot, the commented screenshot and the more detailed project description give further details; please consult them if you need more information. I have also provided some of the texts I am working with. Please keep the code as simple as possible, nothing too complicated or fancy. It should also be fast, as I have to process almost 4,000 texts. To test the query with the texts I provided, I recommend searching for "vater sohn" to check that cosine similarity works.
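The query flow for menu option "2" might look roughly like the following. It is a hedged sketch: `rank_query`, `print_overview` and `print_results` are hypothetical names. The inputs are assumed to be sparse dicts (`{doc_id: {term: tf}}`, `{term: idf}`, `{doc_id: {term: tf-idf}}`) produced by separate tf, idf and tf-idf functions. Query terms are assumed to be stemmed with the same function as the corpus. The exact layout of the screenshot is not known, so the printed format only approximates the examples in the post.

```python
import math


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is all zeros)."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_query(query_terms, doc_vectors, idf, top_n=10):
    """Return up to top_n (score, doc_id) pairs, best first, for already-stemmed query terms."""
    # Query vector: count of each query term times its corpus idf.
    # Terms that never occur in the corpus get weight 0 and cannot match anything.
    counts = {}
    for term in query_terms:
        counts[term] = counts.get(term, 0) + 1
    query_vec = {term: count * idf.get(term, 0.0) for term, count in counts.items()}

    scores = []
    for doc_id, vec in doc_vectors.items():
        score = cosine(query_vec, vec)
        if score > 0.0:  # a document needs only one of the query terms to be listed
            scores.append((score, doc_id))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by id
    return scores[:top_n]


def print_overview(doc_names, tf_per_doc, idf, limit=10):
    """Print the no-query overview: documents with their length, then dictionary entries."""
    print("Documents: [id: name (|d|)]")
    for doc_id, name in doc_names.items():
        length = sum(tf_per_doc[doc_id].values())  # assumption: |d| = number of tokens
        print(f"{doc_id}: {name} ({length})")
    print("dictionary: [term: idf | (doc: tf), ...]")
    # Assumption: alphabetical order, since the post does not say how the overall top N is chosen.
    for term in sorted(idf)[:limit]:
        postings = ", ".join(
            f"({doc_id}: {tf[term]})" for doc_id, tf in tf_per_doc.items() if term in tf
        )
        print(f"{term}: {idf[term]:.4f} | {postings}")


def print_results(query, query_terms, ranked, doc_names, tf_per_doc, idf):
    """Print each hit as: name (ID, cosine; term: tf | idf for every query term)."""
    print(f"Query: {query}")
    print(f"Top {len(ranked)} containing the queried word(s):")
    for score, doc_id in ranked:
        per_term = ", ".join(
            f"{t}: {tf_per_doc[doc_id].get(t, 0)} | {idf.get(t, 0.0):.4f}"
            for t in dict.fromkeys(query_terms)  # unique terms, original order
        )
        print(f"{doc_names[doc_id]} ({doc_id}, cos={score:.4f}; {per_term})")
```

In the menu branch for option "2", one would call `print_overview` once. Then read the query with `input()`, split and stem it, and pass the terms to `rank_query` and `print_results`. Documents with a score of 0 are skipped, so a text needs only one of the query words to appear, as the post asks. If the query loop turns out to be slow on the ~4,000 texts, the document norms could be computed once up front instead of on every query.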
Project ID: 26972532

Project information

1 proposal
Remote project
Active 4 years ago

1 freelancer is bidding on average €490 EUR for this job
I have 3+ years of experience as a Python programmer and have worked on several Machine Learning projects mainly targeting the domain of Computer Vision and Digital Image Processing. Get effective Python programming / Machine Learning / Computer Vision / Deep Learning / Digital Image Processing / Algorithms & Design solutions
€490 EUR in 7 days
0.0 (1 review)

About the client

Birkenfeld, Germany
5.0
2
Member since May 30, 2020

