Open Access 2024 | Original Paper | Book Chapter

Classification of Static Poses Based on Key Point Detection for Application of Incriminated Image Files

Author: Antonia Schönbrodt

Published in: First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow

Publisher: Springer Fachmedien Wiesbaden

Abstract

The role of artificial intelligence, particularly in enhancing decision-making processes and facilitating automation, has become indispensable in today’s society. For several years, law enforcement agencies have also used the results of AI research to analyze ever-growing volumes of data. This approach plays an important role in the detection of child and adolescent pornography. Research in this field has traditionally depended on the findings of skin filtering studies. However, several scientific publications and empirical studies show that classification based on nudity levels is insufficient. This work takes a more promising approach by leveraging the knowledge of an existing system and combining it with a self-developed model to improve the detection of incriminated image files. For this implementation, techniques from motion analysis were used to open up pose recognition as a new field of application.

Supported by DigiFors GmbH.

1 Introduction

The idea of using artificial intelligence to assist humans and to drive innovation has been around since the field’s inception in 1950 [1]. Its use is already established and constantly expanding in many fields, such as business and the automotive industry [2]. Nevertheless, artificial intelligence has received insufficient attention in the context of law enforcement. As digitization continues, ever-increasing amounts of data are generated that law enforcement cannot handle with its own resources alone [3]. To facilitate the analysis process for investigators, a research field emerged a few years ago in which neural networks are used to examine and evaluate these volumes of data. The surge in data volume is especially noteworthy in less visible areas of crime, including the analysis and detection of child and adolescent pornography [4]. Here, neural networks not only make the analysis more efficient but also reduce the psychological burden on investigators. However, most of the available models are still in the testing phase. For this reason, a bachelor’s thesis was written that provides the foundation for the research presented here. It is intended to change the perspective on the use of artificial intelligence for the detection of child and adolescent pornography. To this end, existing methods were partially redesigned and adapted to a new subject area, which makes them more robust when handling large amounts of data. In this way, the resulting systems can be adapted to the needs and interests of different users and can transform the analysis of this offense domain.
Thus, the goal of this work is to determine whether the recognition of criminally relevant images can be improved by letting knowledge from pose recognition play a decisive role in identifying incriminated images. The basis for this idea is the aforementioned bachelor’s thesis, which was further adapted and improved after its completion.

2 Scientific Framework

2.1 State of Research – Artificial Neural Networks for Child Pornography Detection

The emerging field of artificial intelligence now makes it possible to analyze large amounts of data faster, more efficiently, and in a more targeted manner [5]. However, the most promising scientific contributions are still at the research stage [6]. In preparation for the aforementioned bachelor’s thesis, systems dealing with this area of crime were examined in practice at police agencies. None of the models tested specifically addressed the problem, as most research in this area relied on the skin filter alone. A relevant scientific publication on this topic was written by Nicole Garbers and Michael Brodthage [6]. Their paper examines the challenges of employing artificial intelligence to detect child pornography content. In this 2021 work, they describe the limitations of the skin filter commonly used for classification and ways to improve it. According to them, the skin filter alone does not provide a suitable preselection [6]. One problem with nudity-level classification is that the neural networks do not reliably recognize skin across different ethnicities [7]. In addition, misclassifications occur for everyday images. Selfies, for example, are frequently misclassified because skin covers a large proportion of the image [8]. Furthermore, the evidence under analysis often contains borderline cases in which minors are made to pose sexually while clothed. These considerations show that skin filter results for pornography detection [9] cannot be used on their own in such cases. Moreover, the views of Garbers & Brodthage were confirmed by the practical observations of the models examined.
The multilayered, complex nature of this phenomenon and the limitations listed by Garbers and Brodthage mean that the analysis of nudity levels must be improved by considering further characteristics. Since a final thesis offers only a limited framework for research, the considerations had to be narrowed down. The most relevant scientific area here is motion analysis based on joint points [10-12]. Despite the scientific dissemination of this topic, posing was not recognized as relevant in court for a long time. This changed with the 49th Criminal Law Amendment Act [13]. Since this amendment, recordings without a direct sexual act are also considered criminal; the pose itself is sufficient [13]. Based on these arguments, this work explores whether the detection of criminally relevant pornographic material can be improved by determining and classifying static poses.

2.2 Pose Detection

According to the Criminal Code [§184 para. 1 no. 1 lit. B], posing of a partially or fully clothed child is punishable. In this context, the legislator defines posing as an “unnatural and sexually accentuated posture”. In this category, children are induced to assume sexualized, provocative postures in imitation of adult models [13]. To detect posing, a uniform determination scheme must first be established. In the context of this work, unnatural poses are static postures that do not correspond to the typical behavioral patterns of children. This category includes presenting the buttocks to the camera, rolling the body on the floor, and spreading the legs. It also covers deliberate postures in which certain body regions are accentuated by particular positions of the arms or legs. Furthermore, images that focus on touching the chest or intimate area are counted as sexually emphasized postures. According to [14], estimating and tracking human pose is a computer vision task based on the recognition, assignment, and tracking of semantic key points. Various software applications already exist that support locating joint points and thus determining poses. During the research period, several of these pose detection systems were evaluated and adapted to the planned procedure. The well-known systems AlphaPose [15] and Mask R-CNN [16] turned out to be unsuitable for the intended classification. Mask R-CNN is based on instance segmentation, which can produce inaccurate human bounding boxes and, in turn, incorrect localization of joints; moreover, it was not primarily designed for pose estimation [17]. AlphaPose, in turn, is designed primarily for estimating the poses of multiple individuals.
The scientific review by Siddharth Sharma also notes that this model produces more failure cases in pose estimation than other models [17]. To compare the favored system, OpenPose, directly with another suitable system, local tests comparing OpenPose with MediaPipe were performed as part of the bachelor’s thesis. These tests showed that OpenPose detects skeletons more reliably. In particular, it detected hidden body parts better than MediaPipe, owing to its additional handling of occlusion.

2.3 OpenPose

OpenPose is a pose recognition system [18] that was developed based on the COCO dataset. It is the first “open source real-time system for 2D pose estimation” [18] able to determine key points of the human body. Keypoints are placed at significant locations on the body, such as the shoulders, knees, and elbows [19]. The underlying convolutional neural network follows a so-called bottom-up approach: the individual keypoints in an image are detected first, and the detected skeletal parts are then assembled into complete persons [19]. Such approaches are comparatively robust and can decouple runtime complexity from the number of people in the image [20]. OpenPose was the first bottom-up approach to use association scores that encode limb position and orientation via so-called Part Affinity Fields (PAFs) [18]. PAFs are a set of 2D vector fields, one per limb, each encoding the direction in which the limb points [18]. After an initial estimate, these fields are refined over several stages of a multi-stage network. In a subsequent grouping step, candidate joints are connected and the candidate connections are ranked by their affinity. This principle relies on the fact that the joints to be detected are connected by limbs. Finally, the detected points are connected to form the pose skeleton: first the connections of one group of joints are established, then those of the remaining limbs [18]. Combined with the body-and-foot keypoint detector, this yields a complete human skeleton with at least 25 keypoints, which can be used for further analysis.
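The affinity ranking described above can be illustrated with a small sketch. It approximates the PAF line integral from the OpenPose paper [18] by sampling points on the segment between two candidate joints and averaging how well the field vector at each point aligns with the limb direction. The field is passed in as a plain Python function, and the sample count of 10 is an assumption. This is a simplified illustration, not OpenPose's C++ implementation, which adds further checks.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
Field = Callable[[float, float], Point]  # returns the PAF vector at pixel (x, y)


def paf_limb_score(paf: Field, joint_a: Point, joint_b: Point, num_samples: int = 10) -> float:
    """Average alignment between a Part Affinity Field and the candidate limb a -> b.

    Sample points on the segment between the two candidate joints and average
    the dot product of the field vector with the limb's unit direction.
    """
    if num_samples < 2:
        raise ValueError("need at least two samples along the limb")
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector pointing from joint a to joint b
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)  # evenly spaced positions from 0 (joint a) to 1 (joint b)
        vx, vy = paf(joint_a[0] + t * dx, joint_a[1] + t * dy)
        total += vx * ux + vy * uy  # large when the field points along the limb
    return total / num_samples


if __name__ == "__main__":
    # Toy field: every pixel "points right", as if a horizontal limb were present.
    horizontal = lambda x, y: (1.0, 0.0)
    print(paf_limb_score(horizontal, (0.0, 0.0), (10.0, 0.0)))  # prints 1.0 (aligned)
    print(paf_limb_score(horizontal, (0.0, 0.0), (0.0, 10.0)))  # prints 0.0 (perpendicular)
```

Candidate pairs with high scores are the ones kept when joints are grouped limb by limb into skeletons. The sketch covers only the scoring step, not the matching.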

3 Methods – General Procedure

3.1 Assembling the Data Material

Different datasets were compiled for pose recognition and for training the neural network. Given legal and ethical constraints, incriminated image files cannot be used to create training material, as the possession of child and youth pornographic content remains prohibited, even for scientific purposes [12]. Working with artificial neural networks requires several datasets whose diversity makes the network robust when faced with large amounts of data. First, a training dataset is required to fit and update the network. To diversify this dataset, the classes under consideration must include images of people of different ethnicities, group shots, and varied backgrounds [6]. In addition, the image scenes must cover diverse camera angles and situations. The freely available Figshare collection [21] was used as the pornographic dataset. However, this considerably reduced the number of usable image files. Since the goal of this research is to distinguish between posing and no-posing images, everyday images were added to the dataset as the opposing class. Together, both domains form the training dataset used in this work. Given the limited time for the underlying bachelor’s thesis and the limited datasets available, the work had to rely on this small amount of data. To check whether the learned model generalizes or merely reproduces the training data, a validation dataset is necessary; the corresponding data from Figshare [21] was used for this purpose as well. Finally, another independent dataset is needed to test the neural network. The test dataset from Figshare [21] contains not only pornographic images but also everyday images. Both classes are needed to test whether the classifier meets its objective.
Only by comparing non-pornographic and pornographic poses can one see whether the system classifies well.

3.2 Created Workflow

The central hypothesis of the underlying bachelor’s thesis is that sexual poses can be classified on the basis of defined key points. For this purpose, a dedicated workflow was developed that reuses an existing AI system and combines its knowledge with a self-developed model.

This combined construct determines whether a human pose is classified as posing or no posing. The components were combined in a Python program to form a packaged analysis system. The architecture of the workflow is shown in Fig. 1. For a better understanding, the pipeline is briefly summarized first. The image files to be analyzed are passed to the program and first processed by the embedded OpenPose framework, which creates a skeleton (rig) for each person. These results are adapted to the task and then passed to a neural network built specifically for this problem, whose structure is described in more detail below. This network divides the image data into the classes posing and no posing and sorts them into folders of the same name, which can be inspected at the end of the pipeline. The entire process is structured so that it is clear at the end which image files were classified as posing and therefore need to be examined more closely by investigators. The output of the workflow is therefore not the human skeletons created in the intermediate steps, but the originally submitted images. This representation ensures that each image can be uniquely assigned, so the decision of the neural network can be better observed and evaluated, and misclassifications are easier to spot in the original files. Inaccuracies can thus be identified and corrected faster: the misclassified images are fed back into the training process to improve the neural network.
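The final step of the workflow places each original image, not its skeleton, into a folder named after the predicted class. The following minimal sketch shows that step under stated assumptions. The function name sort_originals, the dictionary-based interface and the folder names are hypothetical, and the OpenPose and classifier stages are omitted.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sort_originals(originals: Dict[str, Path], predictions: Dict[str, str], output_dir: Path) -> Dict[str, int]:
    """Copy each original image into output_dir/<predicted class>/ and count per class.

    originals:   image id -> path of the untouched input image
    predictions: image id -> class name returned by the classifier (hypothetical names)
    """
    counts: Dict[str, int] = {}
    for image_id, label in predictions.items():
        target_dir = output_dir / label
        target_dir.mkdir(parents=True, exist_ok=True)  # one folder per class
        source = originals[image_id]
        shutil.copy2(source, target_dir / source.name)  # copy the original, not the skeleton
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first, second = root / "a.jpg", root / "b.png"
        first.write_bytes(b"...")
        second.write_bytes(b"...")
        print(sort_originals({"a": first, "b": second}, {"a": "posing", "b": "no_posing"}, root / "out"))
```

Keying both dictionaries by an image id keeps the link between an original file and its prediction explicit. That link is what lets investigators open the original image rather than the skeleton.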

Prediction of person-specific body skeletons by OpenPose. The OpenPose motion analysis system was used for pose determination. In this study, only the demo version of OpenPose was used because, unlike the more comprehensive version, it does not require an NVIDIA graphics card. It has not yet been possible to compare the recognition results with those of the full version; whether changing how the program is run improves the results will be examined in future research. The demo version generates human skeletons, also called rigs, from the data to be analyzed. To narrow the generated skeletons to the phenomena under consideration, slight changes were made to the OpenPose command-line call.

The flag --disable_blending was set so that skeletons are rendered on a black background rather than over the original image. This allows the neural network to analyze only the keypoints and avoids distortions caused by prominent backgrounds. In addition, the number of keypoints to be classified was increased by also enabling the detection of feet and hands. Foot detection is already integrated by default in the demo version; hand detection was added with the flag --hand. These body regions play an important role in detecting posing images, since hands in particular can indicate a deliberately accentuated movement. However, OpenPose does not recognize all poses correctly. Deviations often occur, for example, when the spine is bent, the legs are stretched in the air, or the buttocks are held up to the camera. Efforts are currently underway to counteract this problem by training the neural network specifically on such borderline cases to make it more robust. In addition, a Linux-based variant of the workflow is being developed to incorporate the full version of OpenPose.
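The paper does not print its exact OpenPose call, so the following is only a sketch of how the described settings could be passed from Python. --image_dir, --write_images, --disable_blending, --hand, --display and --write_json are real OpenPose demo flags. The binary path (for example bin\OpenPoseDemo.exe in the Windows portable demo) and the optional JSON output are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def openpose_command(openpose_bin: Path, image_dir: Path, render_dir: Path,
                     json_dir: Optional[Path] = None) -> List[str]:
    """Build an OpenPose demo call matching the settings described in the text."""
    cmd = [
        str(openpose_bin),
        "--image_dir", str(image_dir),      # process every image in this folder
        "--write_images", str(render_dir),  # save rendered skeletons (<name>_rendered.png)
        "--disable_blending",               # draw skeletons on black instead of on the photo
        "--hand",                           # add hand keypoints; the default BODY_25 model has feet
        "--display", "0",                   # run without opening a preview window
    ]
    if json_dir is not None:
        cmd += ["--write_json", str(json_dir)]  # optional: raw keypoint coordinates as JSON
    return cmd


def run_openpose(cmd: List[str], openpose_root: Path) -> None:
    # By default the demo looks for its models relative to the working
    # directory, so it is started from the OpenPose root folder.
    subprocess.run(cmd, cwd=openpose_root, check=True)
```

Building the argument list separately from running it makes the settings easy to inspect and test without OpenPose installed.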

Labeling of the inserted data sets – preparation of the training phase. Before poses can be determined, the training images must be carefully evaluated and categorized. The neural network learns these differences so that it can classify unknown images. Careful preparation is pivotal for successful training. After the initial image files have been passed through the pipeline, OpenPose adds keypoints to them, and, owing to the changed command-line settings, the background is removed. All that remains is the pure skeleton. In preparation for training, the quality of the individual skeletons must then be assessed. Only results in which OpenPose detected nearly all key points are used for labeling the training data, that is, outputs that hardly deviate from a complete keypoint detection. The absence of single small body parts that can still be visually inferred, such as a lower leg, is within the tolerance range. Fig. 2 shows examples of such good results. This selection is intended to make the later trained network more robust and to prevent misinterpretations.
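In the study, well-detected skeletons were selected by visual inspection. If OpenPose is also run with --write_json, a pre-filter like the following hypothetical sketch could shortlist candidates automatically. It relies on OpenPose's JSON layout, a "people" list whose "pose_keypoints_2d" entry is a flat list of x, y, confidence triples, with confidence 0 for undetected points. The 90% threshold is an illustrative assumption.

```python
import json
from typing import List

BODY_25_POINTS = 25  # keypoints in OpenPose's default body-and-foot model


def detected_keypoints(flat: List[float]) -> int:
    """Count keypoints with non-zero confidence in OpenPose's flat [x, y, c, ...] list."""
    return sum(1 for confidence in flat[2::3] if confidence > 0)


def is_well_detected(json_text: str, min_fraction: float = 0.9) -> bool:
    """True if at least one person in an OpenPose JSON file has most body keypoints."""
    people = json.loads(json_text).get("people", [])
    if not people:
        return False  # OpenPose produced no skeleton for this image
    best = max(detected_keypoints(p.get("pose_keypoints_2d", [])) for p in people)
    return best >= min_fraction * BODY_25_POINTS
```

A filter like this would only shortlist candidates. The tolerance for missing body parts described above still calls for a human check.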

The two classes, posing and no posing, are assigned by labeling: characteristic sexual poses are selected and assigned to the class posing. Various applications exist for assigning images to classes; this work used the tool LabelImg. With this tool, the image files with the most recognized keypoints are loaded and assigned to the desired class by hand. Each file is given a bounding box, which is then assigned the class posing or no posing. This works like object annotation, where objects such as ships are assigned to the class ship. The class information and box coordinates are stored in an XML file. These files are used in the subsequent steps and then passed to the self-developed network.
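By default, LabelImg saves annotations in the Pascal VOC XML layout: an annotation element with a filename and one object per box, each holding a name and a bndbox with xmin, ymin, xmax and ymax. A minimal sketch for reading such a file with the standard library follows. The sample file name and class value are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def read_labelimg_xml(xml_text: str) -> Tuple[str, List[Tuple[str, Box]]]:
    """Return the image file name and all (class, (xmin, ymin, xmax, ymax)) boxes."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename", default="")
    boxes: List[Tuple[str, Box]] = []
    for obj in root.iter("object"):
        label = obj.findtext("name", default="")
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip malformed objects without coordinates
        coords = tuple(int(float(bndbox.findtext(tag, default="0")))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, coords))
    return filename, boxes


SAMPLE = """<annotation>
  <filename>img_001_rendered.png</filename>
  <object>
    <name>no_posing</name>
    <bndbox><xmin>12</xmin><ymin>30</ymin><xmax>210</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(read_labelimg_xml(SAMPLE))
```

The classifier works at the image level, so the class name of the single box per image can serve as that image's label.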

Construction and training of the neural network. As a proof of concept for classification, a simple neural network with one convolutional layer was attached to the existing OpenPose system to process the results of this upstream system. This network did not perform satisfactorily, however: because of its architecture and the resulting output, it detected only one class. Over the course of the research, the proof of concept was therefore revised into a convolutional neural network with pooling layers. The individual layers of the revised self-developed network are shown in Fig. 3. To reduce the computational load, the output images of the previous network (OpenPose) were first scaled to an input size of 64 x 64 pixels using the Resize function with a size of (64, 64).

Previous studies frequently used higher resolutions, since these allow more information to be extracted from an image. However, as this study focuses on pose detection rather than detailed image analysis, a lower resolution was selected. The input has three channels, one each for the colors red, green, and blue. The input layer is followed by two convolutional layers, each with a ReLU activation, which learn features from the images. A subsequent pooling layer then reduces the data by keeping the strongest features and discarding the weaker ones. The resulting feature maps are flattened into a one-dimensional vector with the view() function and passed to two linear layers at the end of the network, which perform the classification. Once the layers are defined, the forward() method chains them together and executes the forward pass, so that an input image passes through all layers until the output is available. In the output layer, the softmax function is used as the activation function instead of ReLU. It was deliberately chosen because it is designed for classification problems: the last layer of a network produces unbounded real-valued scores, which are difficult to interpret and process further, and softmax converts these scores into a probability distribution over the classes. During training, the gradients are reset to zero before each update of the weights.
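The layer sequence above can be sketched in PyTorch as follows. Only the layer types and the 64 x 64 RGB input follow the text. Channel counts, kernel sizes and the hidden width are illustrative assumptions, since the actual values are only shown in Fig. 3. The text applies softmax in the output layer, but PyTorch's nn.CrossEntropyLoss already applies log-softmax internally and expects raw scores. This sketch therefore returns raw scores from forward() and applies softmax only when probabilities are needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Cnnet(nn.Module):
    """Two conv layers + max pooling + two linear layers for 64x64 RGB inputs.

    Channel counts, kernel sizes and the hidden width are illustrative
    assumptions; only the layer types follow the text.
    """

    def __init__(self, num_classes: int = 2) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)   # 3 input channels: red, green, blue
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool = nn.MaxPool2d(2, 2)                # keeps the strongest value in each 2x2 window
        self.fc1 = nn.Linear(16 * 13 * 13, 120)
        self.fc2 = nn.Linear(120, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))  # 64x64 -> 60x60 -> 30x30
        x = self.pool(F.relu(self.conv2(x)))  # 30x30 -> 26x26 -> 13x13
        x = x.view(x.size(0), -1)             # flatten each image to a 1D vector
        x = F.relu(self.fc1(x))
        return self.fc2(x)                    # raw scores; CrossEntropyLoss expects these


def predict_proba(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Softmax turns the raw scores into class probabilities that sum to 1."""
    model.eval()
    with torch.no_grad():
        return F.softmax(model(images), dim=1)


if __name__ == "__main__":
    net = Cnnet()
    batch = torch.rand(4, 3, 64, 64)  # stands in for four resized skeleton renderings
    print(predict_proba(net, batch).shape)  # torch.Size([4, 2])
```

The size comments show why the first linear layer expects 16 * 13 * 13 inputs. Each 5x5 convolution trims 4 pixels per side length, and each pooling step halves it.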
Using PyTorch’s Autograd module, derivatives are computed and the model’s weights are updated automatically. For PyTorch to determine not only whether the network’s predictions are right or wrong but also how far they are off, a loss function is needed. This study uses the cross-entropy loss, which is also commonly used for classification tasks in other work. During the training loop, the loss function compares the predictions with the actual labels. The resulting error is propagated backward with the backward() method so that all layers, including the early ones, can learn from the differences. The optimizer then uses the resulting gradients to adjust the weights of each layer. In the training loop, these components are applied to the network. In the code, the self-developed neural network is called Cnnet.
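The cross-entropy loss can be worked through by hand for a single image. Softmax turns the network's raw scores into probabilities, and the loss is the negative logarithm of the probability assigned to the true class. The small pure-Python sketch below computes this quantity for one image. It is not PyTorch's implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    scores = [2.0, 0.0]  # raw scores for (posing, no posing), made-up values
    print(softmax(scores))           # about [0.881, 0.119]
    print(cross_entropy(scores, 0))  # about 0.127: confident and correct, small loss
    print(cross_entropy(scores, 1))  # about 2.127: confident and wrong, large loss
```

The loss grows sharply when the network is confident in the wrong class. These are the "differences" that backward() propagates through the network.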

Due to the available resources and the limited time for this research, training was limited to 50 epochs, i.e., the training process iterates over the training data 50 times. This number already produces representative results without drastically extending the training time. In addition, the training accuracy and the loss were recorded in order to evaluate how well the network works. In summary, training combines the Cnnet model, the cross-entropy loss, backpropagation with backward(), and the optimizer over 50 epochs.
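A training loop along these lines might look like the following minimal PyTorch sketch. The paper does not name its optimizer or learning rate, so Adam with lr=1e-3 is an assumption. The toy model and random tensors only demonstrate the mechanics and stand in for Cnnet and the real data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model: nn.Module, loader: DataLoader, epochs: int = 50, lr: float = 1e-3):
    """Train with cross-entropy and record (mean loss, accuracy) per epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer and learning rate
    history = []
    for _ in range(epochs):
        model.train()
        total_loss, correct, seen = 0.0, 0, 0
        for images, labels in loader:
            optimizer.zero_grad()                 # reset gradients from the previous batch
            logits = model(images)
            loss = criterion(logits, labels)      # compare predictions with true labels
            loss.backward()                       # autograd: gradients for every weight
            optimizer.step()                      # adjust the weights of each layer
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
        history.append((total_loss / seen, correct / seen))
    return history


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(64, 3, 8, 8)
    y = (x.mean(dim=(1, 2, 3)) > 0).long()   # toy labels, not real data
    loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
    for epoch, (loss, acc) in enumerate(train(toy_model, loader, epochs=5), start=1):
        print(f"epoch {epoch}: loss={loss:.4f} accuracy={acc:.2%}")
```

Recording the mean loss and accuracy per epoch produces the kind of start and end values reported in the results.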

However, if only training data is used, the neural network may overfit. This means that the system recognizes and evaluates the images it was trained on well but fails to generalize to images that are not part of the training dataset. For this reason, additional datasets, namely the validation dataset and the test dataset, are mandatory. To this end, a validation loop is run after the training loop to ensure that the model is not overfitted to the training data and also recognizes independent data. The procedure described above is repeated with the validation data: the dataset is again processed batch by batch and the loss is calculated. However, no backward() call or optimization step is needed here, since the model’s parameters do not change in this step. The test loop follows a similar procedure, and evaluation is performed independently of training and validation. To measure learning and recognition success, the accuracy of the predictions was computed for the training and validation datasets and reported after each phase. Thus, after each training phase, a percentage shows how well the system classified the data. Afterwards, the trained model can be passed to the pipeline.
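The validation and test loops differ from training only in that gradients are neither computed nor applied. A minimal PyTorch sketch of such an evaluation function follows. The function name and return format are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # no gradients: weights are not changed during validation or testing
def evaluate(model: nn.Module, loader: DataLoader):
    """Return (mean loss, accuracy) on a validation or test loader."""
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum per batch, divide once at the end
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        logits = model(images)
        total_loss += criterion(logits, labels).item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```

Because there is no backward() or optimizer.step() here, running this function any number of times leaves the model unchanged. That is why the same function can serve both the validation loop and the test loop.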

Composition of the individual components. After the self-developed neural network has been trained, the two systems must be connected into a single unit so that the results of the motion analysis system are passed to the classifier. For this purpose, the two sequences are merged in the code pipeline. To allow further modifications of the network, its architecture and training were deliberately kept in a separate program, whose result was saved in a file named new_model.pt. This file is then loaded into the pipeline with a single program line. This ensures that other, possibly better networks can be trained separately and later integrated into the pipeline with little effort, which makes the pipeline more flexible. Based on its training, the neural network can now evaluate the captured image files and assign them to the two classes posing and no posing by sorting them into two folders of the same name. Since the network receives the background-free skeletons as input, it is these skeleton images that would end up in the class folders. However, this representation is of little use to anyone who is not closely familiar with the method. A program was therefore written that places the original files back into the folders at the end. However, OpenPose automatically changes the file extension of the images once the skeleton has been determined: standard extensions such as .jpg, .png, or .bmp are replaced by _rendered.png. This makes it difficult to reconstruct the original files.
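Keeping training separate and loading the result with a single line can be sketched with PyTorch's save and load functions. The paper does not say whether new_model.pt stores the whole model object or only its weights. This sketch assumes the common state_dict approach, in which the pipeline rebuilds the architecture and then loads the weights. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

import torch
import torch.nn as nn


def save_model(model: nn.Module, path: Path) -> None:
    torch.save(model.state_dict(), path)  # store the learned weights only


def load_model(build: Callable[[], nn.Module], path: Path) -> nn.Module:
    """Rebuild the architecture, then load the saved weights into it."""
    model = build()
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # the pipeline only runs inference
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "new_model.pt"
        save_model(nn.Linear(4, 2), path)  # stands in for the trained Cnnet
        classifier = load_model(lambda: nn.Linear(4, 2), path)  # the one line the pipeline needs
        print(classifier)
```

Swapping in a better network then only means changing the build function and the weight file.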
In the underlying bachelor’s thesis, an attempt was made to solve the problem by removing the suffix _rendered.png and replacing it by trial with .png or .jpg. As a result, 34% of the image files could not be assigned, because their original extensions were not taken into account when the file names were restored. To counteract this problem, later work introduced a working directory with a folder in which all image files were stored again before the key points were determined. In this step, the file extension of every image was duplicated: for example, the file neuronalesNetz.webp became neuronalesNetz.webp.webp. The renamed file was placed in an intermediate folder “Zwischenstand” (German for “intermediate state”). The data was then passed to OpenPose for keypoint detection, which again replaced the final extension with its fixed suffix, so that the file name became neuronalesNetz.webp_rendered.png. A Python program then removes the suffix added by the pose estimation system, leaving the extension of the original file. Since the original extension is thus known, all files can finally be sorted into the respective label folders.
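The renaming trick can be expressed as three small string functions. The middle function only simulates OpenPose's naming as the text describes it (the last extension is dropped and _rendered.png is appended). It is an assumption for illustration, not OpenPose code. Without the duplication step, the stem of neuronalesNetz.webp would be neuronalesNetz, and the original extension would be lost.

```python
from pathlib import Path

RENDER_SUFFIX = "_rendered.png"  # suffix OpenPose appends to rendered images


def duplicate_extension(name: str) -> str:
    """Step 1, before OpenPose: repeat the extension, e.g. a.webp -> a.webp.webp."""
    return name + Path(name).suffix


def openpose_rendered_name(name: str) -> str:
    """Step 2, simulated: OpenPose drops the last extension and appends _rendered.png."""
    return Path(name).stem + RENDER_SUFFIX


def restore_original_name(rendered: str) -> str:
    """Step 3, after OpenPose: strip the suffix; the duplicated extension remains."""
    if not rendered.endswith(RENDER_SUFFIX):
        raise ValueError(f"not an OpenPose rendering: {rendered!r}")
    return rendered[: -len(RENDER_SUFFIX)]


if __name__ == "__main__":
    staged = duplicate_extension("neuronalesNetz.webp")  # neuronalesNetz.webp.webp
    rendered = openpose_rendered_name(staged)            # neuronalesNetz.webp_rendered.png
    print(restore_original_name(rendered))               # neuronalesNetz.webp
```

This approach works for any extension, including .webp or .bmp, because nothing has to be guessed. That is the failure mode behind the 34% of unassigned files.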

4 Results

The training of the neural network produced satisfactory results. At the beginning of training, the accuracy of the predictions was only 50.75% for the training dataset and 52.64% for the validation dataset. These values improved strongly over the 50 training epochs: the model reached an accuracy of 97.59% on the training dataset and 98.51% on the validation dataset. The loss also decreased, from 0.687834 to 0.085758 for the training dataset and from 0.702857 to 0.024533 for the validation dataset. The total training time was 3425.92 seconds. Restructuring and adjusting the pipeline further improved the developed workflow. For testing, the corresponding reduced dataset from Figshare [21] with 532 images was fed into the pipeline.

Because the way file extensions are determined was changed, all image files could now be recognized and subsequently processed by the constructed model, unlike in the underlying bachelor thesis. In total, 150 images were placed in the no posing folder and 382 images in the posing folder. Fig. 4 shows the detailed results. The basic idea of this research, detecting posing on the basis of pose estimation systems, proved applicable to many images. In addition, the revised output representation was fully implemented. The developed pipeline reliably classified posing images as such even when the depicted persons showed postures in which, for example, the legs are spread or the buttocks are held toward the camera. However, the system had difficulty recognizing postures involving bent legs and a curved spine. This is reflected in the 16 misclassified image files in the no posing category, most of which show exactly this problem. To mitigate this known problem, the neural network is currently being trained with such problematic image files to improve the recognition rate for these poses. It was also shown that the trained neural network exhibits slight over-fitting: 153 images showing apparently everyday poses were sorted into the posing category. In most of these images, the people use their hands to express certain situations. Because the training specifically targeted the hand region in order to detect a playful accentuation of the hands, certain hand positions were misinterpreted by the system. The studies also showed that OpenPose, the first system in the pipeline, had problems identifying skeletons for persons without a recognizable eye region or with half of the body covered.
For example, no skeletons were produced for individuals whose eyes were not visible. As a consequence, the downstream neural network cannot classify potentially sexually suggestive poses as posing if OpenPose does not first provide a substantial skeleton. After the results for the incriminated dataset with adult subjects had been obtained, a dataset of everyday images of children was fed into the pipeline. This step was performed to verify whether the pipeline, whose functioning had already been confirmed, also transfers to the body proportions of children. Particular attention was paid to whether the interpretation of skeletal parts carried over. The tests showed that both systems applied what they had learned and adapted it to the respective image files.
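The counts reported for the Figshare test run can be summarized as a confusion matrix. The sketch below rests on assumptions the text does not state explicitly: the 16 misclassified files in the no posing folder are posing images, the 153 everyday images are the only errors in the posing folder, and every other image was sorted correctly. The derived figures are therefore illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # posing images sorted into "posing"
    fp: int  # everyday images sorted into "posing"
    fn: int  # posing images sorted into "no posing"
    tn: int  # everyday images sorted into "no posing"

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def from_folder_counts(in_posing: int, in_no_posing: int, fp: int, fn: int) -> Confusion:
    """Derive a confusion matrix from folder sizes and known error counts,
    assuming every image not counted as an error was sorted correctly."""
    return Confusion(tp=in_posing - fp, fp=fp, fn=fn, tn=in_no_posing - fn)


counts = from_folder_counts(in_posing=382, in_no_posing=150, fp=153, fn=16)
print(counts)
print(f"accuracy={counts.accuracy:.3f} precision={counts.precision:.3f} "
      f"recall={counts.recall:.3f} f1={counts.f1:.3f}")
# Confusion(tp=229, fp=153, fn=16, tn=134)
# accuracy=0.682 precision=0.599 recall=0.935 f1=0.730
```

Under these assumptions, recall on the test set would remain high while precision would be limited by the false positives, which is consistent with the hand-position over-fitting described above and noticeably below the 98.51% validation accuracy reported for training.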

5 Discussion and Limitations

The results of the research have shown that motion analysis methods can also be used to detect sexually suggestive poses. Furthermore, the pipeline showed that the pose estimation system described above can be combined with a downstream neural network. Since the present work deals exclusively with the analysis of poses in a pornographic context, this method alone cannot be confirmed to fundamentally improve the recognition of child and adolescent pornographic files. To further close the gap in research in this area, the proposed methodology must be combined with the other preliminary considerations from the research. This work is intended to draw more attention to the issue of child and adolescent pornography and to provide further suggestions for improving outdated rating systems.
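The handoff between the two networks can be made explicit by checking each OpenPose skeleton before it reaches the classifier, since the results showed that the classifier cannot work without a substantial skeleton. The sketch below reads JSON in the format OpenPose writes with its `--write_json` option: a `people` list whose entries carry a flat `pose_keypoints_2d` array of x, y, confidence triples. It keeps only skeletons with enough confident keypoints. The BODY_25 layout and both thresholds are assumptions for illustration, not values from this study.

```python
import json

# Assumptions for illustration only: OpenPose run with the BODY_25 model,
# and hypothetical thresholds for what counts as a "substantial" skeleton.
NUM_KEYPOINTS = 25
MIN_CONFIDENCE = 0.1
MIN_VALID_KEYPOINTS = 12

Skeleton = list[tuple[float, float, float]]


def parse_people(openpose_json: str) -> list[Skeleton]:
    """Turn each person's flat [x0, y0, c0, x1, y1, c1, ...] list into triples."""
    data = json.loads(openpose_json)
    skeletons = []
    for person in data.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        skeletons.append(
            [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat) - 2, 3)]
        )
    return skeletons


def is_substantial(skeleton: Skeleton) -> bool:
    """True if the skeleton has all keypoint slots and enough confident ones.

    OpenPose reports undetected keypoints as (0, 0, 0), so they fail the check.
    """
    if len(skeleton) != NUM_KEYPOINTS:
        return False
    confident = sum(1 for _, _, c in skeleton if c >= MIN_CONFIDENCE)
    return confident >= MIN_VALID_KEYPOINTS


def usable_skeletons(openpose_json: str) -> list[Skeleton]:
    """Skeletons worth classifying; an empty list marks the image for review."""
    return [s for s in parse_people(openpose_json) if is_substantial(s)]
```

One possible design choice is to route images that yield an empty list to manual review rather than to no posing. This would keep close-ups without a visible eye region from being silently treated as harmless.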

5.1 Classification Problem

One problem that emerged in the course of this research and from its results is that the self-created model can only act in combination with the upstream OpenPose network. If OpenPose provides weak or no keypoints, the subsequent model cannot perform a reliable classification either. Although the best-performing of the tested pose estimation systems was chosen for this research, it also has its limitations. OpenPose was developed to recognize human poses and make predictions about them; however, there are as yet no scientific publications on whether it can also be used to detect sexual posing. The present results are therefore a first attempt to apply OpenPose's skeleton predictions in a different domain. It was confirmed that OpenPose has problems when the spine of the respective person is curved, so the results were inconsistent only for certain lying positions. In addition, images in which no eyes are visible cannot be skeletonized, which means that close-ups of buttocks or genitals are not analyzed for posing. These inconsistencies may have arisen because OpenPose was actually designed to recognize people in everyday situations. Should OpenPose become further established for the analysis of pornographic files, individual parts of its source code could be revised to adapt the recognition to the problem at hand. Future research should therefore investigate how these discrepancies can be reduced, or whether other systems perform better at evaluating sexually suggestive poses. The issue could also be mitigated by enhancing the analytical capabilities of OpenPose. First, the full version of OpenPose could be integrated into the pipeline instead of the demo version, which seems to have problems recognizing certain postures.
The full version has not yet been implemented because the computer used to obtain the data does not have an NVIDIA graphics card; the demo version was integrated first because it has no extensive technical requirements. Second, the network must be trained specifically with images that were misclassified or for which the skeletons were only partially displayed. Further research will attempt to improve the results in this way. The over-fitting of the neural network mentioned above took the form of overtraining on hand positions: the results and the output image files showed that many files were sorted into the posing category on the basis of accentuated hand positions. This over-fitting should be reduced by training with a much larger amount of data and by labeling the poses more distinctively. This adaptation should prevent the network from deciding whether a pose is sexually stimulating based on arm positions alone, and instead make it always take the lower body region into account. The neural network was almost one hundred percent accurate in assigning poses with spread legs to the posing category.
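One way to check for the hand-position over-fitting described above is an occlusion test: hide the arm keypoints, classify again, and see how much the posing score drops. The sketch below is a generic version of this diagnostic, not part of the author's pipeline. It assumes BODY_25 indices and uses the body model's elbow and wrist points as a stand-in for the hand region (OpenPose's separate hand keypoints are not modeled). The classifier can be any callable that returns a posing score.

```python
from typing import Callable, Iterable, Sequence

Keypoint = tuple[float, float, float]  # (x, y, confidence)
Classifier = Callable[[Sequence[Keypoint]], float]  # returns a posing score

# Assumed BODY_25 indices: 3 RElbow, 4 RWrist, 6 LElbow, 7 LWrist.
ARM_KEYPOINTS = (3, 4, 6, 7)


def mask_keypoints(skeleton: Sequence[Keypoint], indices: Iterable[int]) -> list[Keypoint]:
    """Replace the given keypoints with (0, 0, 0), OpenPose's 'not detected' value."""
    hidden = set(indices)
    return [(0.0, 0.0, 0.0) if i in hidden else kp for i, kp in enumerate(skeleton)]


def arm_sensitivity(classify: Classifier, skeleton: Sequence[Keypoint]) -> float:
    """Drop in posing score when the arm keypoints are hidden.

    A large drop on an everyday image suggests the decision rests mainly on
    arm and hand placement rather than on the lower body.
    """
    return classify(skeleton) - classify(mask_keypoints(skeleton, ARM_KEYPOINTS))
```

Averaging `arm_sensitivity` over the everyday images that were sorted into posing, and comparing it with the average over correctly classified images, would indicate whether retraining with more data and more distinct labels actually shifts the decision toward the lower body.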

5.2 Limitations of This Work

The classification problem described above is also a limitation of this work. Another limitation is that a lack of available datasets hampered the analysis of the research objective: reliable, problem-specific material was scarce. The datasets used, each containing approximately 3,000 files, already contained fewer files than comparable datasets used to train other neural networks, yet they had to be reduced further. Expanding the datasets was not possible within the available time, as the images to be classified had to be labeled individually by hand. With the reduced datasets, it had to be accepted that the neural network would not achieve the desired results. Training with more data would increase the variability of the training material and thus the likelihood of recognizing other poses. Furthermore, because of the legal situation, the main task, improving the detection of child and adolescent pornography, could not be fully accomplished. The developed pipeline showed that images can be classified into the two categories posing and no posing using the created workflow. However, since the workflow could not be applied directly to child pornographic files, only the recognition and marking of child skeletons was tested, and this worked without problems. Because the system already recognized everyday child skeletons, it can be assumed that the findings on posing can also be transferred to children and adolescents.
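Transferring the pipeline to children raises the question of body size. A common preprocessing step, shown here only as a hypothetical sketch and not as part of the described pipeline, makes keypoints independent of image position and person size. It moves the neck to the origin and divides by the neck-to-mid-hip distance. It assumes BODY_25 indices (1 = Neck, 8 = MidHip). It does not compensate for children's different relative limb lengths, which would still have to be covered by training data.

```python
import math
from typing import Optional, Sequence

Keypoint = tuple[float, float, float]  # (x, y, confidence)

NECK, MID_HIP = 1, 8  # assumed BODY_25 indices


def normalize_skeleton(
    skeleton: Sequence[Keypoint], min_conf: float = 0.1
) -> Optional[list[Keypoint]]:
    """Center on the neck and scale by torso length (neck to mid-hip).

    Returns None if either reference point is missing or the torso length is 0.
    Undetected keypoints stay marked as (0, 0, 0).
    """
    nx, ny, nc = skeleton[NECK]
    hx, hy, hc = skeleton[MID_HIP]
    if nc < min_conf or hc < min_conf:
        return None
    torso = math.hypot(hx - nx, hy - ny)
    if torso == 0:
        return None
    normalized = []
    for x, y, c in skeleton:
        if c < min_conf:
            normalized.append((0.0, 0.0, 0.0))
        else:
            normalized.append(((x - nx) / torso, (y - ny) / torso, c))
    return normalized
```

With this step, an adult skeleton and a half-size, shifted copy of it map to the same feature values, so the classifier sees pose shape rather than absolute size.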

6 Summary

The present study aimed to design a workflow to improve the detection of child and adolescent pornographic files. To this end, insights from the field of motion analysis were used to distinguish posing images from everyday images by classification. To answer the guiding question, a motion analysis system was connected to a neural network to draw conclusions about the nature of the data. Of the systems tested, OpenPose proved to be the best motion analysis system for detecting sexually suggestive poses. The results confirmed that combining the two models into a pipeline is suitable for detecting sexually suggestive poses of adults. In addition, the self-created neural network was integrated into the pipeline so that it can be replaced by modifying a single program line. Better and deeper neural networks can therefore be inserted into the working pipeline with little effort, which could yield more accurate results. Tests also showed that the system can be applied to the physique of children. It can therefore be assumed that the findings for sexually suggestive poses also transfer to the physique of children and adolescents. Thus, the basic goal of this thesis was achieved and the research question was answered affirmatively. Furthermore, posing recognition can be used in other areas as well. For example, the knowledge gained in this work could be used to recognize intimate interactions between people, since elements of posing appear in many such interactions; spreading the legs and positioning the buttocks toward the camera occur frequently in these scenarios. To implement this idea with the present pipeline, a more elaborate network would need to be created and trained with appropriate data. The trained model would then be saved and passed to the pipeline through a single program line.
It should also be noted that the findings of this research cannot by themselves improve the analysis of child pornographic files; several approaches must always be combined to pursue this goal. For example, a network could be created that both detects posing and predicts certain sexually suggestive objects that indicate a pornographic context. In this way, the amount of data to be analyzed can be reduced further and the detection of incriminated files improved.

Acknowledgements

At this point I would like to thank all those who have supported and motivated me during the preparation of this scientific work. First of all, I would like to thank my superior Nico Müller, who supervised and reviewed my bachelor thesis and this paper, for giving me the opportunity to work on this challenging topic and to develop my own programming solutions and research. Thanks to him and my colleagues, my interest in digital forensics and AI-based solutions was awakened, and I hope to build on this knowledge in this field. Finally, I would like to thank my family and friends, who have always supported me in my scientific endeavors and were willing to engage with the theories I developed.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/deed.de), which permits use, reproduction, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third-party material in this chapter are also subject to the chapter's Creative Commons license, unless indicated otherwise in the figure caption. If material is not covered by the chapter's Creative Commons license and the intended use is not permitted by statutory regulation, permission for the reuses listed above must be obtained from the respective copyright holder.


1. Mainzer, K. (2016). Künstliche Intelligenz – Wann übernehmen die Maschinen? (1st ed.). Springer. https://doi.org/10.1007/978-3-662-48453-1.

2. Cremers, A., Englander, et al. (2019). Vertrauenswürdiger Einsatz von Künstlicher Intelligenz. Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS.

3. Schwander, M. Kinderpornografie im Netz – Neue Dimensionen des sexuellen Missbrauchs. Kultur neu entdecken – SWR2 (S. Striegl & L. Meyer-Blankenburg, Eds.). SWR, 4 Aug. 2022. http://www.swr.de/swr2/wissen/kinderpornografie-im-netz-neuedimensionen-des-sexuellen-missbrauchs-swr2-wissen-2022-08-04-100.pdf. 2 Oct. 2022.

4. Kuhnen, K. (2007). Was ist Kinderpornographie? 9. Hogrefe Verlag GmbH & Co. KG, Göttingen. ISBN 978-3-8017-2085-8.

5. Microsoft. Künstliche Intelligenz bewährt sich im Einsatz gegen Kinderpornografie. Microsoft, 25 May 2021. http://www.news.microsoft.com/de/kuenstliche-intelligenz-im-einsatz-gegen-kinderpornografie. 27 Sept. 2022.

6. Garbers, N., & Brodthage, M. (2021). Herausforderungen beim Einsatz Künstlicher Intelligenz. In: Lecture Notes in Informatics (LNI), Informatik 2021, pp. 879–889. Gesellschaft für Informatik e.V.

7. Kaplan, S., Handelmann, D., & Handelmann, A. Sensitivity of neural networks to corruption of image classification. AI and Ethics, pp. 425–434. Springer Nature Switzerland AG, Switzerland, 6 Nov. 2020.

8. Banaeeyan, R., Karim, H. A., Lye, H., Fauzi, M. F. A., Mansor, S., & See, J. (2019). Automated nudity recognition using very deep residual learning network. International Journal of Recent Technology and Engineering, 8(3S), 136–141.

9. Moustafa, M. (2015). Applying deep learning to classify pornographic images and videos. arXiv:1511.08899.

10. Zecha, D., & Lienhart, R. Bestimmung intrazyklischer Phasengeschwindigkeiten. Institut für Informatik, Universität Augsburg, Augsburg, Germany, July 2022.

11. Cao, X., Li, X., Ma, L., & Feng, X. AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation. Cornell University, 11 May 2022.

12. Brandl, C., Bonin, D., Mertens, A., Wischniewski, S., & Schlick, C. M. Digitalisierungsansätze ergonomischer Analysen und Interventionen am Beispiel der markerlosen Erfassung von Körperhaltungen bei Arbeitstätigkeiten. Springer-Verlag, Berlin Heidelberg, 27 July 2016. https://doi.org/10.1007/s41449-016-0016-9.

13. Eisele, J., & Franosch, R. (2016). Posing und der Begriff der Kinderpornografie in § 184b StGB. Zeitschrift für Internationale Strafrechtsdogmatik, 8, 519–525.

14. Odemakinde, E. Estimation with Deep Learning – Ultimate Overview in 2022. http://www.viso.ai/deep-learning/pose-estimation-ultimate-overview/. 29 Sept. 2022.

15. Fang, H.-S., Xie, S., et al. RMPE: Regional Multi-Person Pose Estimation. arXiv:1612.00137v5, 4 July 2018.

16. He, K., Gkioxari, G., Dollár, P., & Girshick, R. Mask R-CNN. arXiv:1703.06870v3, 24 Jan. 2018.

17. Sharma, S. A Short Guide to Pose Estimation in Computer Vision. https://www.medium.com/@siddrrsh/a-short-guide-to-pose-estimation-in-computer-vision-3ea708dd9155. 2 Apr. 2020.

18. Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. Cornell University. 10.48550. 30 May 2019.

19. Uhrmacher, L. Gestenerkennung für den RoboCup mit OpenPose und neuronalen Netzen. Bachelor's thesis, Universität Hamburg, 29 May 2020.

20. Bösch, G. A Guide to OpenPose in 2022. viso.ai. http://www.viso.ai/deep-learning/openpose/. 29 Sept. 2022.

21. Figshare. http://www.figshare.com/articles/dataset/Adultcontentdataset/13456484/1. 20 Dec. 2020.

Title: Classification of Static Poses Based on Key Point Detection for Application of Incriminated Image Files
Author: Schönbrodt Antonia
Publisher: Springer Fachmedien Wiesbaden
Book: First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow
Print ISBN: 978-3-658-43704-6
Electronic ISBN: 978-3-658-43705-3
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-658-43705-3_6
