
Network data stream simulation with time range LDA pattern mining

$50-100 USD

Closed
Posted almost 3 years ago


Paid on delivery
This project involves simulating a SIEM (security information and event management) system that uses Latent Dirichlet Allocation (LDA) on IoT device streams. It can be implemented in R, Python, C++, or any other language that achieves the outcome.

Workflow

Input config > random & pattern-generated content streams > stream chunks > LDA parser > output pattern frequency & topics per stream

Data Generation

Input config > random & pattern-generated content streams

The generator should be configurable and able to create simulated network data streams. Each stream generates random content and also includes generated content as specified in the config file:

1. stream information
2. string and regex patterns to include in the stream (the generator fills each regex with matching values)
3. occurrence frequency (range 0 to 10): the number of times the generated string and regex patterns are included per minute

The content does not have to be very sophisticated, just relatively different between streams. The generator can be started and stopped.

Example input configuration for 2 streams in JSON format (input/[login to view URL]):

    {
      {
        "name": "endpoint1",
        "ip": [login to view URL],
        "port": 345,
        {
          "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$"
          "frequency": 2
        },
        {
          "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$"
          "frequency": 5
        },
      },
      {
        "name": "syslog1",
        "ip": [login to view URL],
        "port": 534,
        {
          "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$"
          "frequency": 2
        },
        {
          "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$"
          "frequency": 5
        },
      },
    }

Sample stream chunk:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod eros a lectus porttitor, vitae aliquet magna ullamcorper. Praesent in enim non magna vehicula faucibus. Vestibulum lacinia velit ut dolor aliquet tincidunt. IP_EXT: [login to view URL] MSG: #abyx USER: das-dkjh Ut consectetur hendrerit massa vel tempus. Nulla sit amet libero id felis lacinia accumsan. PAYLOAD: ABC_aS57dasd USR: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum porta magna sed lectus venenatis. Aliquam accumsan molestie augue, sit lectus amet vulputate metus tristique et. Ut a lectus erat elit….

Regex specifications are from [login to view URL], [login to view URL], and [login to view URL].

Stream Parser

Stream chunks > LDA parser > output pattern frequency & topics per stream

The streams are read by a parser application, which reads each input stream for a configurable span of time (e.g. 30 seconds) at a time; each span is one input chunk. You must use a Latent Dirichlet Allocation package or method to analyze the data, creating or appending to 3 log files per stream. Each run writes to a new output folder named with the timestamp at which the run began. The 3 logs are:

1. a log of the matching patterns found (use the input config file to identify the patterns),
2. a log of the count of each pattern in that time span, and
3. a log of up to 10 highest-frequency single-string terms (LDA topics, occurrence > 1 and not in the regex patterns?).

Attached is a research paper related to the field of study. My aim is to replicate the basic stream generation and pattern matching using LDA. It is just a proof of concept, not production code. Good use of comments is always welcome!
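The data-generation step could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions: the JSON config is rewritten as a Python dict with hypothetical `patterns`/`fields` keys (label → regex) instead of one combined `pattern` string; because the standard library cannot produce strings from an arbitrary regex, each label gets a hand-written, hypothetical value generator whose output is checked with `re.fullmatch` against the config regex; and `frequency` is read as occurrences per minute, scaled to the chunk length. `STREAM_CONFIG`, `generate_chunk`, `run_stream` and `start_stream` are illustrative names.

```python
import random
import re
import string
import threading

# Hypothetical Python form of one stream's config: each pattern group maps a label
# to its value regex (the posting packs these into a single "pattern" string).
STREAM_CONFIG = {
    "name": "endpoint1",
    "port": 345,
    "patterns": [
        {"frequency": 2, "fields": {
            "IP_EXT": r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
            "MSG": r'^#[^ !@#$%^&*(),.?":{}|<>]*$',
            "USER": r"^[a-z0-9_-]{3,15}$",
        }},
        {"frequency": 5, "fields": {
            "PAYLOAD": r'^ABC_[^ !@#$%^&*(),.?":{}|<>]*$',
            "ID": r"^[a-z0-9_-]{30,150}$",
        }},
    ],
}

FILLER = "lorem ipsum dolor sit amet consectetur adipiscing elit sed euismod eros".split()
LOWER_ALNUM = string.ascii_lowercase + string.digits


def rand_str(alphabet, lo, hi, rng):
    """Random string of length lo..hi drawn from alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(lo, hi)))


# Hypothetical per-label value generators; each is written to satisfy the label's regex.
VALUE_GENERATORS = {
    "IP_EXT": lambda rng: ".".join(str(rng.randint(0, 255)) for _ in range(4)),
    "MSG": lambda rng: "#" + rand_str(string.ascii_letters, 1, 10, rng),
    "USER": lambda rng: rand_str(LOWER_ALNUM + "_-", 3, 15, rng),
    "PAYLOAD": lambda rng: "ABC_" + rand_str(string.ascii_letters + string.digits, 1, 12, rng),
    "ID": lambda rng: rand_str(LOWER_ALNUM + "-", 30, 40, rng),
}


def render_group(fields, rng):
    """Render one 'LABEL: value ...' snippet, checking every value against its regex."""
    parts = []
    for label, regex in fields.items():
        value = VALUE_GENERATORS[label](rng)
        if not re.fullmatch(regex, value):
            raise ValueError(f"generated {value!r} does not match the {label} regex")
        parts.append(f"{label}: {value}")
    return " ".join(parts)


def generate_chunk(config, seconds, rng):
    """Random filler for `seconds` of traffic, with pattern snippets injected.

    `frequency` is read as occurrences per minute and scaled to the chunk length.
    """
    tokens = [rng.choice(FILLER) for _ in range(5 * seconds)]
    for group in config["patterns"]:
        for _ in range(group["frequency"] * seconds // 60):
            tokens.insert(rng.randrange(len(tokens) + 1), render_group(group["fields"], rng))
    return " ".join(tokens)


def run_stream(config, seconds, stop_event, emit, seed=None):
    """Emit one chunk every `seconds` until stop_event is set."""
    rng = random.Random(seed)
    while not stop_event.is_set():
        emit(config["name"], generate_chunk(config, seconds, rng))
        stop_event.wait(seconds)  # returns early as soon as the stream is stopped


def start_stream(config, seconds, emit, seed=None):
    """Start a stream in a background thread; call .set() on the returned event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=run_stream, args=(config, seconds, stop_event, emit, seed), daemon=True
    )
    thread.start()
    return thread, stop_event
```

Calling `start_stream(STREAM_CONFIG, 30, print)` would print the stream name and a 30-second chunk roughly every 30 seconds until `set()` is called on the returned event. If arbitrary user-supplied regexes must be supported, a regex-to-string generator library could replace the hand-written generators.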
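For the first two parser logs, one possible approach is to turn each label/regex pair from the config into a search over free text: strip the `^` and `$` anchors (which would otherwise only match at the start and end of the whole chunk), prefix the literal label, and require the value to end at whitespace. The sketch below assumes a hypothetical `FIELDS` mapping copied from the example config; the helper names and the output layout (`output/<run timestamp>/<stream>_matches.log` and `<stream>_counts.log`) are illustrative choices, not ones given in the posting.

```python
import os
import re
from collections import Counter
from datetime import datetime

# Hypothetical label -> regex mapping, copied from the example config.
FIELDS = {
    "IP_EXT": r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
    "MSG": r'^#[^ !@#$%^&*(),.?":{}|<>]*$',
    "USER": r"^[a-z0-9_-]{3,15}$",
    "PAYLOAD": r'^ABC_[^ !@#$%^&*(),.?":{}|<>]*$',
    "ID": r"^[a-z0-9_-]{30,150}$",
}


def compile_searchers(fields):
    """Turn anchored value regexes into 'LABEL: value' searches usable inside free text."""
    searchers = {}
    for label, regex in fields.items():
        body = regex.removeprefix("^").removesuffix("$")  # str.removeprefix needs Python 3.9+
        # The value must end at whitespace or at the end of the text, not mid-token.
        searchers[label] = re.compile(rf"\b{re.escape(label)}: ({body})(?!\S)")
    return searchers


def parse_chunk(chunk, searchers):
    """Return the (label, value) matches in text order plus a per-label Counter."""
    found = []
    for label, pattern in searchers.items():
        found.extend((m.start(), label, m.group(1)) for m in pattern.finditer(chunk))
    matches = [(label, value) for _, label, value in sorted(found)]
    return matches, Counter(label for label, _ in matches)


def new_run_dir(base="output"):
    """Create a fresh output folder named after the moment the run starts."""
    path = os.path.join(base, datetime.now().strftime("%Y%m%d-%H%M%S-%f"))
    os.makedirs(path)
    return path


def append_logs(run_dir, stream_name, chunk, searchers):
    """Append one chunk's results to the stream's matches log and counts log."""
    matches, counts = parse_chunk(chunk, searchers)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(os.path.join(run_dir, f"{stream_name}_matches.log"), "a", encoding="utf-8") as f:
        for label, value in matches:
            f.write(f"{stamp}\t{label}\t{value}\n")
    with open(os.path.join(run_dir, f"{stream_name}_counts.log"), "a", encoding="utf-8") as f:
        # Every configured label is written, including labels with a zero count.
        f.write(stamp + "\t" + "\t".join(f"{lab}={counts[lab]}" for lab in searchers) + "\n")
    return matches, counts
```

A parser loop would call `new_run_dir()` once at start-up and then `append_logs(...)` for every chunk read from every stream.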
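For the third log, the posting asks for an LDA package or method. An existing implementation would normally be used (for example scikit-learn's `LatentDirichletAllocation` in Python); to keep this sketch dependency-free, it instead implements LDA with a small collapsed Gibbs sampler. Several assumptions are made: each sentence of a chunk is treated as one LDA document, `excluded` holds the `LABEL: value` snippets already matched by the regex patterns, terms occurring only once in the chunk are dropped (one reading of "occurrence > 1 & not in regex patterns"), and the topic count, iteration count and `alpha`/`beta` priors are illustrative. The function names are hypothetical.

```python
import random
import re
from collections import Counter


def documents(chunk, excluded=()):
    """Split a chunk into sentence 'documents' of lower-case words.

    Strings in `excluded` (e.g. matched 'LABEL: value' snippets) are removed first,
    and only terms that occur more than once in the chunk are kept.
    """
    for text in excluded:
        chunk = chunk.replace(text, " ")
    docs = [re.findall(r"[a-z]+", s.lower()) for s in re.split(r"[.!?]+", chunk)]
    counts = Counter(w for doc in docs for w in doc)
    docs = [[w for w in doc if counts[w] > 1] for doc in docs]
    return [doc for doc in docs if doc]


def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * n_topics for _ in docs]        # topic counts per document
    n_kw = [Counter() for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                         # total tokens per topic
    z = []                                       # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(topic t) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw


def top_terms(chunk, excluded=(), n_topics=3, top_n=10, iters=200, seed=0):
    """Up to `top_n` highest-count terms for each LDA topic in one chunk."""
    docs = documents(chunk, excluded)
    if not docs:
        return []
    n_kw = lda_gibbs(docs, n_topics, iters=iters, seed=seed)
    return [[w for w, c in topic.most_common(top_n) if c > 0] for topic in n_kw]
```

The per-topic term lists could then be appended, with a timestamp, to each stream's third log file.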
Project ID: 30617901

About the project

3 proposals
Remote project
Active 3 years ago

3 freelancers are bidding on average $92 USD for this job.
Hi, I graduated with a Bachelor of Statistics. I have experience using R, IBM SPSS, IBM Amos, IBM Modeler, and Tableau, which I learned in college. I am also a specialist in basic statistical analysis (descriptive analysis, graphs, charts), correlation methods (chi-square, gamma, mean test, ANOVA), regression methods (linear, nonlinear, logistic, spatial), forecasting methods (ARIMA, fuzzy, wavelet, basic forecasting), data mining, and factor analysis. During my studies, I worked part-time in data entry at a survey institute where I studied, namely Haluoleo Institute. I also worked part-time as an online data analyst at a start-up in Singapore, namely Grid Synergy, where I visualized data using Excel and Tableau. If I am hired, I will work carefully and responsibly. Best regards, Aulia Atikah Putri
$100 USD in 5 days
5.0 (10 reviews)
3.1
Hello! This is Artem from Russia. I have been working as a desktop app developer for the last 6 years. I have checked the project description and I think I can help you with this project. I am fully comfortable working with C++, Python, and R. I have some questions about the project description and hope to discuss further details via chat. Thanks. Artem.
$100 USD in 7 days
5.0 (1 review)
0.4
the reason why something is done or used : the aim or intention of something. : the feeling of being determined to do or achieve something. : the aim or goal of a person : what a person is trying to do, become, etc.
$75 USD in 7 days
0.0 (0 reviews)
0.0

About the client

New York, United States
5.0
1
Member since June 21, 2021


Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)