
Extract and concatenate to single string 2 or 3 or 4.. text elements in a pdf, by regex and proximity

£20-250 GBP

Completed
Posted about 4 years ago


Paid on delivery
I need .NET code to highlight (and extract and concatenate) 2, 3 or 4 separate text elements in a PDF, based on regular expressions and their proximity to each other (see attached image).

1.) I am using the DevExpress PdfDocumentProcessor to obtain the document text and coordinates using the [login to view URL] property.
2.) Then, use the standard Regex class to get all substrings in the given string (the text returned with the [login to view URL] property) that match your regular expression.

Example text in PDF:
    240
    TT
    12345

Example regex (should find the elements above individually):
    1st line, 3 numeric chars: ^\d{3}$
    2nd line, 2 alpha chars: ^[A-Z]{2}$
    3rd line, 5 numeric chars: ^\d{5}$

Criteria:
- All 3 text elements share the same text height.
- All 3 text elements have the same (or close, +-10%) X coordinate value.
- All 3 text elements are within a Y coordinate distance of char height * 4 (+-10%) of each other.

Required concatenated string for the example text above: 240TT12345

I'm guessing the workflow would be something along the lines of:
1.) Open the PDF.
2.) Extract all text elements.
3.) Find text matching the first line of the regex.
4.) Is there a text element with the same character height and the same X coordinate value below this element (within the height of the text element above, +-10%)?
5.) Is there another text element with the same character height and the same X coordinate value below that element (within the height of the text element above, +-10%)?
6.) If there is, extract all the text elements and concatenate them to a string, e.g. 240TT12345.
7.) Highlight the elements in the PDF.

I would class myself as an intermediate coder, but I'm really struggling here because the number of lines to search using regex can be 2, sometimes 3, maybe 4. Perhaps a LINQ query could find all matches by regex and proximity, but I'm happy to see all suggestions.
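The matching part of the workflow above can be sketched independently of any PDF library. The sketch below is in Python rather than .NET, purely to show the logic; it assumes the text elements have already been extracted into simple records. The names `TextElement`, `close` and `find_stacks` are hypothetical, as is the assumption that Y increases down the page (flip the sign of the Y test if the coordinates run the other way). It reads the Y criterion as "the next element starts no more than char height * 4, plus 10%, below the previous one", which is one interpretation of the posting. Highlighting is not covered. Because the pattern list can hold 2, 3 or 4 entries, the same loop handles a variable number of lines.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextElement:
    """Hypothetical record for one extracted piece of text."""
    text: str
    x: float       # left edge of the element
    y: float       # vertical position; assumed to increase down the page
    height: float  # character height


def close(a, b, tol=0.10):
    """True when a and b differ by at most tol (10%) of the larger value."""
    return abs(a - b) <= tol * max(abs(a), abs(b))


def find_stacks(elements, patterns, tol=0.10, max_gap_factor=4.0):
    """Return (concatenated_text, chain) for every vertical stack of
    elements that matches the patterns in order, top to bottom."""
    regexes = [re.compile(p) for p in patterns]
    results = []
    for first in elements:
        if not regexes[0].fullmatch(first.text):
            continue
        chain = [first]
        for rx in regexes[1:]:
            prev = chain[-1]
            limit = prev.height * max_gap_factor * (1 + tol)
            candidates = [
                e for e in elements
                if e not in chain
                and rx.fullmatch(e.text)
                and close(e.height, prev.height, tol)  # same text height
                and close(e.x, prev.x, tol)            # same X, +-10%
                and 0 < e.y - prev.y <= limit          # below, within range
            ]
            if not candidates:
                break
            # Take the nearest qualifying element below the previous one.
            chain.append(min(candidates, key=lambda e: e.y - prev.y))
        if len(chain) == len(regexes):
            results.append(("".join(e.text for e in chain), chain))
    return results


# Made-up coordinates for illustration only.
elements = [
    TextElement("240", 100, 50, 8),
    TextElement("TT", 101, 62, 8),
    TextElement("12345", 99, 74, 8),
    TextElement("999", 300, 50, 8),  # matches line 1 but has nothing below it
    TextElement("AB", 100, 62, 12),  # right place, wrong text height
]
patterns = [r"^\d{3}$", r"^[A-Z]{2}$", r"^\d{5}$"]

print([text for text, _ in find_stacks(elements, patterns)])
# prints ['240TT12345']
```

Each result also carries the matched elements themselves, so their coordinates remain available for a later highlighting step.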
Project ID: 24739628

About the project

9 proposals
Remote project
Active 4 years ago

Awarded to:
I have FULL CONFIDENCE that I can lend you a hand in sorting out your Regular Expressions problem, and I am ready to start IMMEDIATELY.

QUESTIONS/COMMENTS
1) It would be very helpful if you could upload a small sample of the [login to view URL] property that you are going to parse. I am asking because I think it will contain the contents of multiple text elements, and I will be able to see clearly what you mean by proximity.
2) Exactly what do you mean by a text element? As I see it in the attached image, there are 3 "figures" joined by dashed arrows. The 1st figure contains 240-TE-24381, the 2nd contains 240-TT-24381, and the 3rd contains 240-TI-24381. Does the "figure" (e.g. 240-TE-24381) correspond to a text element, or do the individual parts within the figure, viz. 240, TE, 24381, each constitute a text element?
3) I have not followed how the X or Y offsets are related to the RegExes. Please explain.

EXPERIENCE
Although new to Freelancer.com, I have EXTENSIVE experience in Regular Expressions and I am very familiar with the RegEx “flavour” implemented in .NET. Thus, I know that named capturing groups in .NET use the (?<id>\w+) or (?'id'\w+) format, while some other flavours, such as Python, use (?P<id>\w+). In addition to “regular” concepts such as character classes, anchors, word boundaries, etc., I am also very much at home with concepts such as atomic grouping, lookahead and lookbehind.

Thanks,
Tushar
£69 GBP in 4 days
5.0 (9 reviews)
4.5

9 freelancers are bidding an average of £141 GBP for this project.

Hello, I can help you with your project, "Extract and concatenate to single string 2 or 3 or 4.. text elements in a pdf, by regex and proximity". I have gone through your job posting and am very interested in working with you. I am an expert in this field and have already completed several projects like this; for evidence, you can see my profile. Please visit: https://www.freelancer.com/u/schoudhary1553 I have an excellent command of English. I am a hard worker, productive and worthy of your attention, and I hope I would be the right candidate for this post. Awaiting an affirmative response from you. Kind regards, Sandeep
£220 GBP in 4 days
5.0 (35 reviews)
5.9

I am a PDF expert. I can write code to extract text from a raw PDF without libraries, although it works for simple PDFs only. I hope your PDF is like that; please send it so I can check.
£200 GBP in 3 days
5.0 (3 reviews)
3.8

Hi Claire J.! I'm a Graphic Designer with over 6 years' experience, based in Vancouver, Canada. I've previously worked on PDF and VB.NET projects for other employers. Please see my portfolio at www.visak2691.com. I look forward to working on this project with you. Thank you, Vishakh
£142 GBP in 7 days
4.5 (6 reviews)
3.7

VB.NET expert with PDF processing experience. Interested in doing your regex-matching project.
£145 GBP in 7 days
4.9 (7 reviews)
3.1

Hello. I have read all your requirements for "Extract and concatenate to single string 2 or 3 or 4.. text elements in a pdf, by regex and proximity" and I fully understand them. I have done this kind of project before and am confident that I can finish this one. Please get in contact with me so that we can discuss the details via chat. Skills: PDF, VB.NET
£150 GBP in 1 day
5.0 (2 reviews)
2.5

10+ years' experience in C#. Experienced in processing inconsistent Excel and PDF files. Can complete in a day.
£111 GBP in 1 day
5.0 (1 review)
1.9

I need this project and will do my best work for you. Any employer may contact me. I am a data entry professional.
£135 GBP in 7 days
0.0 (0 reviews)
0.0

About the client

Bagshot, United Kingdom
5.0
4
Payment method verified
Member since Sep 8, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)