Build Database to Store & Extract Data from Text Files (Easy $$$)

$100-320 USD

In progress
Posted over 5 years ago

Paid on delivery
NOTE: I originally hired someone for this project and he got called away. Below is the basic scope of work needed. I have also included some of the comments from the original developer. You may have a better idea. Speed is important… very important.

My goal: load an email list into a database and extract all of the email addresses that have the same email domain as a URL in my domain list.

Example domains
[login to view URL]
[login to view URL]
[login to view URL]

Example text strings
chris,jackson,chris@[login to view URL],1234567897
billy,bob,bbob@[login to view URL],84881451
john,doe@[login to view URL],8814

Example saved results
chris,jackson,chris@[login to view URL],1234567897
john,doe@[login to view URL],8814

I am looking for any line that has an email that matches the domain. So if I have [login to view URL], it will extract every line where there is a @[login to view URL] email address. Domains are one per line. In my list I will have [login to view URL], but when it is being searched, you will code it to search for @[login to view URL] to ensure it pulls a valid email format and not ‘email@[login to view URL]’.

IMPORTANT
Once the Source files are imported into the database there will be around 2 billion records, with many duplicates. I have a list of 400,000 domains that I want to scan against it. I do not expect this to be completed in a few minutes, but I do not expect it to take days or weeks.

One of the ways we can speed up the Search process is by allowing me to load a suppression list of domains that I do NOT want to import into the database (Gmail, Hotmail, Yahoo). It may be best to have this as a list that I paste directly into the database so that it doesn’t bring this data in. This will hinder the Import speed, but it will speed up the Matching phase since most will be free email accounts.

NOTES
-- The Source files are .csv files that I renamed to .txt for this purpose.
-- Some files have 1 column, and some have 3, 4, etc. Instead of trying to build a table to match the columns, we will treat each row as a single text string (as a single column) and import the entire row of data.

REQUIREMENTS
-- The results need to save frequently instead of waiting until the task is complete.
-- It must have a basic UI, no commands. I want to be able to click a button to run Import and click a button to Match.
-- When I click Match, it asks me to select the [login to view URL] file. This is the file that has the URLs I want each email address to include.
-- Speed is very important. Speed matters in two parts: the initial import of data and the matching of Domains against the email list.

DEVELOPER NOTES
Here are some key points from the previous developer who started it.
-- Use C++
-- with some precomputation and indexing we can save a lot of time
-- your issue looks mainly like an indexing issue; if you index emails by domain name, you won't have to go through all the data every time you search
-- I talked with the DB Admin, and we are using nested Queries

I cannot waste any additional time on this, so I must ensure that you have read the details or I’ll flag your bid. In your bid, include the following quoted text in the first line: “I reviewed the notes and understand that speed is important to you.”

To help me understand that you are the right person for the job, let me know how soon you can start, when you can finish, and how you plan to develop. The more detail you provide, the more confident I am that you are the best choice. Make your best bid first, as I will not be overpaying for this task. Do not bid the maximum budget amount, as my max is lower than that. Thanks!
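The matching rule described in the posting (keep any raw row that contains an email whose domain is on the list, compared on the full domain after the @ sign, and skip suppressed domains) could be sketched roughly as follows. This is only an illustration in Python, not the C++ tool the posting asks for; the function names, the regular expression, and the `example.com` style domains are assumptions, since the real domains are hidden in the posting.

```python
import re

# Assumed email pattern: captures the domain that follows the "@".
EMAIL_DOMAIN = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def load_domains(lines):
    """Turn one-domain-per-line input into a lowercase set for O(1) lookups."""
    return {line.strip().lower() for line in lines if line.strip()}


def matching_rows(rows, wanted, suppressed=frozenset()):
    """Yield every raw row that contains an email whose full domain is wanted.

    Each row is treated as a single text string, whatever its column count.
    Comparing the whole domain after "@" means a wanted "example.com" does
    not match "user@other-example.com".
    """
    for row in rows:
        for domain in EMAIL_DOMAIN.findall(row):
            domain = domain.lower()
            if domain in suppressed:
                continue  # suppression list, e.g. free email providers
            if domain in wanted:
                yield row
                break  # one matching email is enough to keep the row


if __name__ == "__main__":
    sample = [
        "chris,jackson,chris@example.com,1234567897",
        "billy,bob,bbob@other.org,84881451",
        "john,doe@example.com,8814",
    ]
    for hit in matching_rows(sample, load_domains(["example.com"])):
        print(hit)
```

Because the rows are consumed one at a time, results can be written out as they are found rather than at the end, which fits the requirement to save frequently.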
Project ID: 17493376

About the project

14 proposals
Remote project
Active 6 years ago

14 freelancers are bidding an average of $220 USD for this project.
“I reviewed the notes and understand that speed is important to you.” Hello Sir, I am a professional software developer, and my goal is efficient code. I checked your requirements and have an idea for solving this problem; let me share it with you. Instead of checking every single domain against all 2 billion text strings, we first work on the text strings themselves and group them. We apply an algorithm that, for every row, checks whether the text string contains a domain and whether the row is a duplicate, so that we end up with a clean list of text strings. We also maintain a global table that holds information about these text strings, and we match your domains against this newly created table, so the matching itself goes smoothly. I can also use parallel computing if needed. Thanks, Rajdeep
$300 USD in 7 days
5.0 (47 reviews)
7.0
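The grouping idea in this bid (preprocess the rows once, drop duplicates, then match the domain list against the grouped table) can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions: the helper names are hypothetical, rows are assumed to be comma-separated, and a Python dictionary stands in for the "global table", which at 2 billion rows would have to live in an indexed database table instead.

```python
from collections import defaultdict


def domain_of(row):
    """Return the lowercase domain of the first email-like field, or None."""
    for field in row.split(","):
        local, at, domain = field.strip().rpartition("@")
        if at and local and "." in domain:
            return domain.lower()
    return None


def group_by_domain(rows):
    """One pass over the data: domain -> set of rows (sets drop duplicates)."""
    groups = defaultdict(set)
    for row in rows:
        domain = domain_of(row)
        if domain is not None:
            groups[domain].add(row)
    return groups


def lookup(groups, wanted_domains):
    """Matching becomes one dictionary lookup per wanted domain."""
    return {d: sorted(groups[d]) for d in wanted_domains if d in groups}
```

With the grouping done up front, scanning 400,000 domains costs 400,000 lookups rather than 400,000 passes over the data, which is the point of the previous developer's indexing note.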
I reviewed the notes and understand that speed is important to you. Hello, As an experienced software developer with sound knowledge of databases, I am very interested in doing this work. I read your description carefully; speed should definitely be the main factor when processing such a large amount of data. I have some plans for it. Just a note: I would like to use C#, not C++. It will not make any difference; the main factor is how the ideas are implemented. We can discuss more. Thanks
$260 USD in 3 days
4.9 (172 reviews)
7.0
“I reviewed the notes and understand that speed is important to you.” Hello. I have read the specifications, and here is my proposal for how this could work. I am a VB.NET programmer, so I will write a simple UI in VB.NET to handle things. You will definitely need MySQL for this, as it has many tricks for avoiding duplicates on import and for scanning the database. The scan is the most crucial part. It can be done with a MySQL SELECT statement for each of the domains, so it won't take much time. However, the table with the emails to check must be small so that the procedure runs fast. There is no way you can load 2 billion emails at once and have them checked. You must load at most 100,000 emails at a time for this to work in a reasonable time. However, I can make the program take the CSV files and load them into the DB one at a time, so you won't have to do this manually. You can also have a table of domains to exclude, or add them directly to the program when you load the domains. I hope this doesn't sound too technical. I have created many similar programs, so I know what needs to be done. I will need 2-3 days to make the program and run some tests for thresholds and faster results. Then you can start running it. Please feel free to ask any questions. Regards, Marina Louki
$200 USD in 3 days
5.0 (158 reviews)
6.8
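The batched import with an exclusion list that this bid proposes might look something like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL, and the table name `records`, the function names, and the domain extraction rule (text after the last "@" up to the next comma) are all assumptions for illustration.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_rows(conn, rows, suppressed, batch_size=100_000):
    """Insert rows in batches, skipping suppressed domains; returns rows kept."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (domain TEXT, line TEXT)")
    total = 0
    for chunk in batched(rows, batch_size):
        keep = []
        for line in chunk:
            if "@" not in line:
                continue  # no email in this row
            # Assumed rule: domain is the text after the last "@" up to a comma.
            domain = line.rpartition("@")[2].split(",")[0].strip().lower()
            if domain not in suppressed:
                keep.append((domain, line))
        conn.executemany("INSERT INTO records VALUES (?, ?)", keep)
        conn.commit()  # progress is saved after every batch
        total += len(keep)
    return total
```

Committing once per batch keeps already imported data safe if the run is interrupted, at the cost of some import speed.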
I reviewed the notes and found that speed is important to you. We can start immediately. I am planning to first load all the data into a database using a regex filter; from the database we can then find matches easily. In the database we can also define a unique constraint so that duplicates are ignored. If you have any questions, please ask me. Thanks.
$166 USD in 2 days
4.8 (39 reviews)
6.3
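The regex filter plus unique constraint described in this bid could be sketched as below. Again this is only an illustration: `sqlite3` stands in for whichever database would actually be used, and the pattern, table name, and function names are assumptions rather than anything specified in the posting.

```python
import re
import sqlite3

# Assumed filter: a row is stored only if it contains something email-like.
EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def store_unique(conn, rows):
    """Store (domain, line) pairs; the UNIQUE constraint drops duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "domain TEXT NOT NULL, line TEXT NOT NULL, UNIQUE (domain, line))"
    )
    with conn:  # commits once at the end of the block
        for line in rows:
            match = EMAIL.search(line)
            if match:
                conn.execute(
                    "INSERT OR IGNORE INTO records VALUES (?, ?)",
                    (match.group(1).lower(), line),
                )


def rows_for(conn, domain):
    """Return the stored lines for one domain."""
    cur = conn.execute(
        "SELECT line FROM records WHERE domain = ? ORDER BY line",
        (domain.lower(),),
    )
    return [row[0] for row in cur]
```

In SQLite the UNIQUE constraint is backed by an index whose leading column is `domain`, so the same constraint that removes duplicates can also serve lookups by domain.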
Hi, I read your document and noticed that this is easy to do. I am an experienced programmer with over 13 years of experience. Thanks.
$166 USD in 10 days
5.0 (14 reviews)
4.6
Greetings sir, This is a very easy task for me; I am very experienced in processing CSV files to extract data. I can deliver top-quality work in a few hours. Please send me a message whenever you want to start. I can start right now and will deliver within the day.
$110 USD in 2 days
4.2 (15 reviews)
5.1
Hello, I like the amount of detail you put into the job; so many people don't care to do that. But I still have to say I'm a little confused: are the source files already loaded into a database, or do you want to extract the domains first and then build a database from that? Can you please, with no commitment, send me a sample file so I can get the whole picture? I don't plan to use C++ for this project; I have Python in mind for that purpose. But again, there is no commitment. We can decide after I see a source file whether C++ is the only suitable solution. Regards
$222 USD in 7 days
4.9 (5 reviews)
4.1
If I understand correctly, you will use the software on your local PC. What operating system do you use? Is it Windows, macOS, etc.?
$166 USD in 3 days
5.0 (2 reviews)
2.4
I reviewed the notes and understand that speed is important to you. I have several years of experience with C/C++ and also with MySQL databases. I can help you with this project. I believe 5 days is a reasonable timeline for this project, assuming we have regular contact to resolve questions during development. It could potentially be a bit shorter than 5 days. One question: I did not see any mention of a particular database vendor. Is MySQL okay? I'm interested in the project and I'd be happy to help with it.
$250 USD in 5 days
5.0 (1 review)
2.0
Hi, I have the required skill set (C#, Application Development, Console Applications, MSSQL, MS Access, Scripting) and experience for the job, and would be happy to assist you with this. I understand your requirements, and we can start immediately. Please share sample data. Thanks, Amit K
$277 USD in 10 days
0.0 (0 reviews)
0.0

About the client

Lexington, United States
4.8
60
Payment method verified
Member since Apr 6, 2011

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)