
Methodology Plan

Student’s Name

Institutional Affiliation

Professor’s Name

Course Name and Code

Date

Methodology Plan
This paper presents the methodology plan for the study, covering the problem statement, the research approach and strategy, the sources and data collection, the data collection and analysis methods, the ethical considerations, and the anticipated limitations of the study. The study topic is “Perceived efficacy of the various anomaly detection techniques employed in data science during the enforcement of data security”.
Problem Statement
There are various data science techniques geared towards the enhancement of data security, which raises the question of which approach would be the most beneficial to implement in an algorithm dedicated to anomaly detection. This study will consider ten anomaly detection techniques: k-Means, k-Medoids, EM clustering, outlier detection algorithms, classification tree, fuzzy logic, naïve Bayes networks, genetic algorithm, neural networks, and support vector machine. These methods are largely derived from a survey on anomaly detection using data mining techniques conducted by Agrawal and Agrawal (2015). The study seeks to answer the research question, “Which anomaly detection technique is perceived by most data scientists as the most efficient to use?”
To briefly review these anomaly detection techniques: k-Means refers to a clustering technique in which k disjoint clusters are defined based on the grouped objects’ feature values, where k is a parameter defined by the user. Time intervals with abnormal and normal traffic are separated in the dataset, and the resulting cluster centroids are then used to detect anomalies in new data. k-Medoids refers to an algorithm that is essentially similar to k-Means, except that it represents each cluster by its most central object rather than by the mean; the k-Medoids technique can also detect anomalies arising from unknown intrusions. EM clustering is functionally an extension of k-Means, though it assigns objects to clusters based on similarity in means and imposes no strict inter-cluster boundaries; it is reputed to be more effective than k-Means and k-Medoids. Outlier detection algorithms are engineered to find, and single out, patterns in datasets that do not conform to expected, predefined behavior. Several outlier detection protocols exist, since the definition of an outlier depends on reference parameters such as the mean and the median; approaches within this family include nearest-neighbour algorithms, the density-based local outlier approach, and distance-based outlier detection.
The classification tree technique is essentially a decision tree in which nodes represent attribute tests and branches represent the test results. Fuzzy logic, which is derived from fuzzy set theory, classifies data on the basis of various statistical metrics and subsequently labels them as malicious or normal. The naïve Bayes network technique makes use of probabilistic graph models. The genetic algorithm technique is engineered in a fashion reminiscent of the evolutionary phenomena of selection, mutation, inheritance, and crossover, and is deemed efficient at detecting anomalies with significantly low false-positive rates. Neural networks (NN) are designed to work like a biological brain, with characteristic sets of interconnected nodes. Support vector machine (SVM) techniques are used for pattern recognition, and the SVM technique reportedly outperforms NN in terms of accuracy and false-alarm rate.
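To make the clustering-based approach concrete, below is a minimal sketch of k-Means-based anomaly detection, assuming a numeric feature matrix of traffic records; the placeholder data, the number of clusters, and the 95th-percentile distance threshold are illustrative assumptions rather than part of any surveyed implementation.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder data standing in for real traffic features (500 records, 4 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))

# Cluster the records into k = 5 groups (k is an illustrative choice).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance of each record to the centroid of its assigned cluster.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
distances = np.linalg.norm(X - assigned_centroids, axis=1)

# Flag the records farthest from their centroid as candidate anomalies
# (the 95th-percentile cut-off is an arbitrary illustrative threshold).
threshold = np.percentile(distances, 95)
anomalies = np.where(distances > threshold)[0]
print(f"{len(anomalies)} candidate anomalies out of {len(X)} records")

The k-Medoids and EM variants follow the same overall pattern, differing mainly in how cluster centres are defined and how membership is assigned.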
Research Approach and Strategy
This study employs a descriptive cross-sectional research design, which is deemed the most suitable given the nature of the study. It is a descriptive study because it seeks to capture the current perceptions of various data scientists regarding the efficacy of the aforementioned anomaly detection techniques; the emphasis is on the “what” rather than the “why” or the “how” (Nassaji, 2015). It is also a cross-sectional study since it will be done at only one point in time (Setia, 2016; Wang & Cheng, 2020), unlike longitudinal studies, which are done at more than one point in time (Caruana et al., 2015).
Sources and Data Collection
The sources of data in this survey are data scientists working in various fields related to data science; however, those working in cybersecurity in various institutions are of particular interest to this study, because they are the most likely to be familiar with the anomaly detection techniques employed in algorithms dedicated to enhancing data security. The study targets 100 participants, who will be randomly selected. A structured, self-administered questionnaire will capture sociodemographic characteristics as well as the actual study questions. The sociodemographic characteristics to be included are gender, age, geographical address, occupation, rank or employment position, and level of experience (number of years of experience). The question on occupation is intentionally included as a measure of the validity and reliability of the questionnaire, since the study is strictly meant for those working as data scientists, particularly in the domain of cybersecurity. It is this particular group whose opinions will contribute to unraveling the perceived efficacy of the various anomaly detection techniques.
The study questions section will open with a question on whether the participant is aware of the ten anomaly detection techniques included in this study, to which they will answer either “yes” or “no”. They will then rate their level of knowledge of the ten anomaly detection methods on a visual analog scale (VAS) of 0 to 10, where 0 implies “least knowledgeable” and 10 implies “completely knowledgeable”. This will be followed by a list of the anomaly detection techniques, for each of which the participant will be required to rate the level of efficacy on a VAS running from 0 to 10, where 0 implies “least effective” and 10 implies “most effective”. A visual analog scale has been chosen for this study, as opposed to a Likert scale, because it is deemed by researchers to have better internal consistency and, therefore, superior metrical characteristics; visual analog scales also have better inter-rater reliability and test-retest reliability (Brazier & Ratcliffe, 2017; Reips & Funke, 2008).
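For illustration only, a single completed response could be encoded along the following lines; the field names and the example values are hypothetical and do not come from the actual instrument.

# Illustrative encoding of one questionnaire response; all fields are placeholders.
TECHNIQUES = [
    "k-Means", "k-Medoids", "EM clustering", "Outlier detection",
    "Classification tree", "Fuzzy logic", "Naive Bayes network",
    "Genetic algorithm", "Neural network", "Support vector machine",
]

response = {
    "participant_id": "P001",        # code number, not a personal identifier
    "aware_of_techniques": True,     # opening yes/no awareness question
    "knowledge_vas": 7,              # self-rated knowledge on the 0-10 VAS
    # Perceived efficacy on the 0-10 VAS, one rating per technique.
    "efficacy_vas": {name: None for name in TECHNIQUES},
}
response["efficacy_vas"]["Support vector machine"] = 8   # example rating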
To ensure that the questionnaire, which is the data collection tool, is comprehensive enough for the study, a pilot study consisting of ten participants will be conducted. The pilot study will also enable the researcher to pretest the clarity of the questionnaire. If the questionnaire proves comprehensive and free of errors, copies will be made both in print and online. The online versions of the questionnaire will be prepared using online survey platforms such as SurveyPlanet, SurveyMonkey, or Google Forms, depending on what the researcher finds convenient. These online forms will prove most useful where a would-be participant is not physically accessible or is not available for a one-on-one administration of the questionnaire. The questionnaires will be made concise and to the point so that participants are not put off by them; completing a questionnaire is estimated to take ten to fifteen minutes.
Data Collection & Analysis Methods
Once all the required data have been satisfactorily collected, they will be entered into SPSS (Statistical Package for the Social Sciences) version 25 for analysis. The data analysis will be organized into three phases as follows:
Phase one: This phase will involve cleaning the collected data to remove respondents who do not properly qualify as participants in the study yet proceeded to complete the questionnaires. It also involves transcribing all the collected data into the SPSS software, during which the data will be categorized according to their appropriate data types to enhance the validity and reliability of the analysis output. Data cleaning will be carried out as the responses come in so that, where a questionnaire is rejected, additional participants can be recruited to cover the deficit. This process will be carried out diligently until the desired sample size has been achieved.
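As a concrete illustration of this screening step (a sketch only, assuming hypothetical column names such as "occupation", "domain", and "efficacy_*" in the exported responses), the eligibility filter could look as follows:

import pandas as pd

# Hypothetical cleaning step for phase one: keep only respondents who report
# working as data scientists in the cybersecurity domain.
raw = pd.read_csv("responses.csv")          # exported questionnaire responses
eligible = raw[
    (raw["occupation"].str.lower() == "data scientist")
    & (raw["domain"].str.lower() == "cybersecurity")
].copy()

# Cast the VAS rating columns to numeric so later analysis treats them correctly.
vas_cols = [c for c in eligible.columns if c.startswith("efficacy_")]
eligible[vas_cols] = eligible[vas_cols].apply(pd.to_numeric, errors="coerce")

shortfall = max(0, 100 - len(eligible))
print(f"Kept {len(eligible)} of {len(raw)} responses; recruit {shortfall} more to reach the target sample.")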
Phase two: This phase involves the actual analysis of the collected data. Two types of analysis will be performed: descriptive statistical analysis and inferential statistical analysis. The descriptive statistics to be generated include frequencies, the mean, the mode, the median, and so on. Inferential statistics, on the other hand, refer to the statistical analysis methods used to test hypotheses in research settings; for this study, one inferential method that will be applied is MANOVA (multivariate analysis of variance). Inferential statistics will be performed at a confidence level of 95% (alpha = 0.05).
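Although the analysis itself will be run in SPSS, the following sketch shows an equivalent descriptive summary and MANOVA in Python purely for illustration; the file name, the efficacy columns, and the grouping variable (experience_level) are assumptions rather than elements of the actual dataset.

import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Illustrative equivalent of the planned SPSS analysis; column names are placeholders.
df = pd.read_csv("clean_responses.csv")

# Descriptive statistics (mean, median, quartiles, etc.) for a subset of the VAS ratings.
vas_cols = ["efficacy_kmeans", "efficacy_svm", "efficacy_nn"]
print(df[vas_cols].describe())

# MANOVA: do the efficacy ratings jointly differ across experience levels? (alpha = 0.05)
manova = MANOVA.from_formula(
    "efficacy_kmeans + efficacy_svm + efficacy_nn ~ experience_level", data=df
)
print(manova.mv_test())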
Phase three: This phase involves the presentation of the data in the form of tables and charts as deemed appropriate by the researcher. Tabulation summarizes the data more efficiently than prose narration; however, brief narrations will accompany the tabulated analysis results. Charts such as bar charts and pie charts will be used to bring a visual effect to the analysis results where appropriate. Nonetheless, care will be taken not to needlessly duplicate the results: repetition will be avoided by using either a chart or a table, but not both, as much as possible. Performing inferential statistical tests allows for generalizability of the obtained results, since such tests corroborate the validity and reliability of the results.
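As one illustration of the intended chart output, a bar chart of mean perceived efficacy per technique could be produced as sketched below; the values shown are placeholders, not study results.

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder means for three techniques; real values would come from the analysis.
mean_efficacy = pd.Series(
    {"k-Means": 6.1, "Support vector machine": 7.4, "Neural network": 7.0}
).sort_values()

mean_efficacy.plot(kind="barh")
plt.xlabel("Mean perceived efficacy (0-10 VAS)")
plt.title("Perceived efficacy by anomaly detection technique (illustrative)")
plt.tight_layout()
plt.show()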
Ethical Considerations and Limitations
Ethical concerns in this study include adherence to the participants’ rights to anonymity, confidentiality, and privacy during and after the study. To address this, the researcher shall use code numbers for the participants instead of their actual personal identifiers such as official names and identity card numbers. To ensure confidentiality and privacy, the researcher shall protect all data from unauthorized access as follows: hard-copy data will be kept in a safety cabinet under lock and key, while soft-copy data will be encrypted and password-protected. The study does not involve any interventions that are invasive to the participants, so no potential for physical harm is anticipated. The researcher shall use easy-to-understand language in the questionnaire so that the data collection process remains as smooth as possible. Participation in this study is voluntary, meaning that no compensation in any form shall be awarded to any participant; this fact shall be made plain in the consent explanation form so that unforeseen adverse incidents are avoided.
No participant shall be coerced into participating in the study. The choice to participate is completely voluntary, with the understanding that no monetary or other form of compensation shall be given. Participants are also free to withdraw from the study at any point without fear of intimidation or prejudice. The data provided by the participants shall not be shared or published in any way whatsoever without their consent, which is why there will be a consent form to sign once the would-be participants have agreed, of their own volition, to take part in the study. Participants are allowed to seek clarification from the researcher regarding anything at any point during the study. The researcher’s contact details shall be provided in both the consent explanation form and the consent form, which shall also be countersigned by the researcher. These elements of ethical consideration have been largely borrowed from Creswell and Creswell (2018).
Some of the anticipated limitations of the study include participants who fail to provide honest information regarding their familiarity with, and level of knowledge of, the anomaly detection techniques, which might reduce the validity and reliability of the results. Another issue is that the study will have a geographical bias in that it will not be conducted globally; this is an inherent bias in most research studies anyway (Ross & Bibler Zaidi, 2019).

References

Agrawal, S., & Agrawal, J. (2015). Survey on Anomaly Detection using Data Mining Techniques. Procedia Computer Science, 60, 708-713. https://doi.org/10.1016/j.procs.2015.08.220
Brazier, J., & Ratcliffe, J. (2017). Measurement and valuation of health for economic evaluation. International Encyclopedia of Public Health, 586-593. https://doi.org/10.1016/b978-0-12-803678-5.00457-4
Caruana, E., Roman, M., Hernández-Sánchez, J., & Solli, P. (2015). Longitudinal studies. PubMed Central. Retrieved May 30, 2021, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4669300/
Creswell, J., & Creswell, J. (2018). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches (5th ed.). SAGE Publications, Inc.
Nassaji, H. (2015). Qualitative and descriptive research: Data type versus data analysis. Language Teaching Research, 19(2), 129-132. https://doi.org/10.1177/1362168815572747
Reips, U., & Funke, F. (2008). Interval-level measurement with visual analog scales in Internet-based research: VAS Generator. Behavior Research Methods, 40(3), 699-704. https://doi.org/10.3758/brm.40.3.699
Ross, P., & Bibler Zaidi, N. (2019). Limited by our limitations. Perspectives On Medical Education, 8(4), 261-264. https://doi.org/10.1007/s40037-019-00530-x
Setia, M. (2016). Methodology series module 3: Cross-sectional studies. Indian Journal of Dermatology, 61(3), 261. https://doi.org/10.4103/0019-5154.182410
Wang, X., & Cheng, Z. (2020). Cross-Sectional Studies. Chest, 158(1), S65-S71. https://doi.org/10.1016/j.chest.2020.03.012
